0 is ambiguous

Michał ‘mina86’ Nazarewicz | 24 October 2010

It has been a long time since my last entry. In fact, it has been so long that someone has already pointed it out, which finally pushed me into writing something. Inspired by Adriaan de Groot’s entry, I decided to write about 0, NULL and the upcoming nullptr.

I will try to be informative and explain what the whole buzz is about, and then give my opinion about nullptr. Let us first look at how a null pointer can be denoted in C and C++.

The confusing 0

In C and C++, the literal 0 has two meanings: it is either an octal (yes, octal) literal representing the number zero, or a null pointer. Which meaning applies depends on the context, which the compiler can usually figure out. For instance:

void number(int);
void pointer(char *);

int main(void) {
	char ch, *ptr;
	ch = 0;		/* number zero (ch is of type char) */
	ptr = 0;	/* null pointer (ptr is of type pointer to char)*/
	number(0);	/* number zero (number accepts int as argument) */
	pointer(0);	/* null pointer (pointer accepts pointer to
			 * char as argument) */
	return 0;	/* number zero (main returns int) */
}
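Note that it is a constant zero that doubles as a null pointer, not the value zero as such: an ordinary int expression never converts implicitly to a pointer, even if it happens to hold zero at run time. The following C++ sketch (C++11 or later; the variable names are made up for illustration) checks both facts at compile time:

```cpp
#include <type_traits>

// A constant zero in a pointer context is a null pointer constant...
constexpr char *null_from_constant = 0;

// ...but an ordinary int expression never converts implicitly to a
// pointer, whatever its value at run time. std::is_convertible asks
// exactly that question about an arbitrary (non-constant) int.
constexpr bool int_converts_to_pointer = std::is_convertible<int, char *>::value;

static_assert(null_from_constant == 0, "0 in a pointer context is a null pointer");
static_assert(!int_converts_to_pointer, "a run-time int is not a pointer");
```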

However, if a function lacks a prototype or takes a variable number of arguments, the available information may be insufficient to figure out what the programmer meant. A good example is the printf function from the standard library:

#include <stdio.h>

int main(void) {
	printf("%p\n", 0);	/* ??? */
	return 0;
}

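In those situations, the first meaning is assumed: 0 is passed as an int even where the function expects a pointer, and an int need not even be the same size as a pointer. Since calling functions without prototypes is not so common nowadays, this is not such a big issue. Still, there are two things to keep in mind: (i) always provide function prototypes, and (ii) when calling functions with a variable number of arguments, use (foo *)0 to mean a null pointer, where foo * is the pointer type the function expects (void * in the case of printf’s %p).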
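Rule (ii) matters most for functions whose argument list ends with a null pointer sentinel, in the style of POSIX execl(), which requires its list to be terminated by (char *)0. Below is a minimal C++ sketch of such a function; count_strings and demo_variadic_nulls are hypothetical names. Since va_arg reads a const char *, the terminator must really be one: a bare 0 arrives as an int, which on common 64-bit platforms is not even the same size as a pointer.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: counts its string arguments up to a terminating
// null pointer, in the style of POSIX execl().
std::size_t count_strings(const char *first, ...) {
    std::size_t count = 0;
    std::va_list args;
    va_start(args, first);
    // va_arg reads a const char *, so every variadic argument, including
    // the terminator, must have exactly that type.
    for (const char *s = first; s != NULL; s = va_arg(args, const char *))
        ++count;
    va_end(args);
    return count;
}

void demo_variadic_nulls() {
    // Correct: the sentinel is a null pointer of the expected type.
    assert(count_strings("a", "b", (const char *)0) == 2);

    // Wrong: a bare 0 would be passed as an int.
    // count_strings("a", "b", 0);

    // The same rule applies to printf: %p expects a void *.
    std::printf("%p\n", (void *)0);
}
```

Calling demo_variadic_nulls() prints a null pointer in whatever format the implementation uses for %p.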

The confusing NULL

Since the context is not always obvious, some programmers prefer to indicate the intended meaning explicitly by using different notations for numbers and pointers. The NULL macro was introduced for this purpose. The standard requires it to be defined in such a way that it can be used in a pointer context to mean a null pointer. In C, the macro is often defined as:

#define NULL ((void *)0)

This guarantees not only that NULL means a null pointer in a pointer context, but also that a diagnostic message must be issued when one tries to use NULL as a number, e.g.:

$ cat test.c
#include <stddef.h>

int main(void) {
	return NULL;
}
$ gcc -ansi -pedantic test.c
test.c: In function ‘main’:
test.c:4: warning: return makes integer from pointer without a cast
$

But then comes C++, with its stricter typing, which says that implicit conversions from pointer to void to any other pointer type are invalid. This renders the following invalid C++ code:

int main(void) {
	char *ptr = (void *)0;
	return 0;
}
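In other words, C++ keeps the implicit conversion from an object pointer to void * but drops the reverse one. A short sketch of both directions, checked at compile time (C++11 or later; as_char_pointer and null_char_pointer are illustrative names):

```cpp
#include <type_traits>

// C++ keeps the implicit conversion from an object pointer to void *...
static_assert(std::is_convertible<char *, void *>::value,
              "char * converts to void * implicitly");
// ...but rejects the reverse one, which C allows.
static_assert(!std::is_convertible<void *, char *>::value,
              "void * does not convert to char * implicitly in C++");

// A void * value therefore has to be converted explicitly.
char *as_char_pointer(void *p) {
    return static_cast<char *>(p);
}

// A plain 0, on the other hand, is still a valid null pointer constant.
constexpr char *null_char_pointer() {
    return 0;
}
```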

The easiest way around this is to define NULL as plain 0 or 0L, and this is in fact what often happens. GCC uses its __null extension instead, but __null still seems to be treated like 0L in some contexts:

$ cat test.c
#include <stddef.h>

int main(void) {
	int ret = NULL;	/* no complaints */
	ret += __null;
	ret += NULL;
	return ret;
}
$ g++  -ansi -pedantic test.c
test.c: In function ‘int main()’:
test.c:5: warning: NULL used in arithmetic
test.c:6: warning: NULL used in arithmetic
$

Whenever you use NULL, you have to keep in mind that you never know what it really is. In particular, this means that the following code may or may not be valid, depending on what NULL expands to after preprocessing:

#include <stdio.h>

int main(void) {
	printf("%p\n", NULL);	/* ((void *)0)? 0? 0L? __null? */
	return 0;
}
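The portable fix is the same as for 0: cast NULL to the type the format expects, so the variadic call receives a genuine void * however NULL is spelled. A minimal sketch; format_null and demo_format_null are made-up names, and since the exact text produced by %p is implementation-defined, the demo only compares it with formatting (void *)0 directly:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Formats a null pointer with %p into buf. Whatever NULL expands to
// (0, 0L, __null and so on), the cast hands snprintf a genuine void *.
int format_null(char *buf, std::size_t size) {
    return std::snprintf(buf, size, "%p", (void *)NULL);
}

void demo_format_null() {
    char via_null[64];
    char via_zero[64];
    format_null(via_null, sizeof via_null);
    std::snprintf(via_zero, sizeof via_zero, "%p", (void *)0);
    // Both calls pass the same null void *, so the output matches.
    assert(std::strcmp(via_null, via_zero) == 0);
}
```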

Function overloading

And since we are now in the realm of C++, we can deal with the biggest issue. As you may already know, C++ allows function overloading: several functions with the same name can coexist as long as their parameter lists differ. As a consequence, the function name alone is not enough to determine the function’s prototype. Let us consider the following simple code:

void print(int num);
void print(void *ptr);

int main(void) {
	print(0);
	return 0;
}

The above will invoke the first function, because 0 is an int and print(int) is an exact match. To call the second one, we have to use a cast, since, as we already know, using NULL is not reliable:

int main(void) {
	print((void *)0);
	return 0;
}

The lesson here is that in C++ you must always use (foo *)0 when you want to pass a null pointer to a function. Even if the call is not ambiguous at this time, someone may later add an overloaded version of the function, which will break the code (by making the call ambiguous or, worse, by silently selecting a different function) if the cast was not used in the first place.
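The scenario from the last paragraph can be reproduced at compile time. In this sketch the two namespaces stand for two versions of a hypothetical library, and each print overload reports which one was chosen through an illustrative enumerator (C++11 or later):

```cpp
// Which overload was selected (illustrative).
enum class Called { Int, Pointer };

// Version 1 of a hypothetical library: only a pointer overload.
namespace v1 {
constexpr Called print(void *) { return Called::Pointer; }
}

// Version 2 adds an int overload.
namespace v2 {
constexpr Called print(int) { return Called::Int; }
constexpr Called print(void *) { return Called::Pointer; }
}

// The same call site silently switches to a different function...
static_assert(v1::print(0) == Called::Pointer, "only candidate: 0 converts to void *");
static_assert(v2::print(0) == Called::Int, "0 is an int, an exact match for print(int)");

// ...while the cast keeps its meaning across versions.
static_assert(v1::print((void *)0) == Called::Pointer, "");
static_assert(v2::print((void *)0) == Called::Pointer, "");
```

The call print(0) changes meaning after the upgrade without any diagnostic, while the call with a cast does not.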

Useless nullptr

To address those issues, the C++ working group defined a new nullptr keyword, which evaluates to a null pointer constant of type std::nullptr_t. It can be implicitly converted to any pointer type but, at the same time, cannot be converted to a number.
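These conversion rules can be spelled out with the standard type traits. A compile-time sketch, assuming a C++11 (or later) compiler; S is an illustrative struct used only to form a pointer-to-member type, a conversion nullptr also supports:

```cpp
#include <cstddef>
#include <type_traits>

// nullptr has its own type, which is not itself a pointer type.
static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value, "");
static_assert(!std::is_pointer<std::nullptr_t>::value, "");

// It converts implicitly to pointer types...
static_assert(std::is_convertible<std::nullptr_t, char *>::value, "");
static_assert(std::is_convertible<std::nullptr_t, const wchar_t *>::value, "");

// ...and to pointer-to-member types,
struct S { int member; };
static_assert(std::is_convertible<std::nullptr_t, int S::*>::value, "");

// but not to integer types.
static_assert(!std::is_convertible<std::nullptr_t, int>::value, "");
static_assert(!std::is_convertible<std::nullptr_t, long>::value, "");
```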

This seems like a good thing, right? Unfortunately, there are two issues with this approach. The first one is that it does not fix the problem completely. For instance, the following is still ambiguous:

void print(char *);
void print(wchar_t *);

int main(void) {
	print(nullptr);
}
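With two overloads of print, one taking char * and one taking wchar_t *, a call print(nullptr) is rejected as ambiguous, and the cure is the one recommended for 0: name the pointer type. A compile-time sketch (C++11 or later), with the overloads made constexpr and returning an illustrative enumerator so the chosen one can be checked:

```cpp
// Which overload was selected (illustrative).
enum class Called { Narrow, Wide };

constexpr Called print(char *) { return Called::Narrow; }
constexpr Called print(wchar_t *) { return Called::Wide; }

// print(nullptr);  // error: ambiguous, nullptr converts equally well to both

// Naming the pointer type resolves the call, whether the operand is
// nullptr or 0.
static_assert(print((char *)nullptr) == Called::Narrow, "");
static_assert(print((wchar_t *)0) == Called::Wide, "");
```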

The second one is that yet another keyword has been introduced, which serves no purpose. NULL was a perfectly fine identifier that could have been used without polluting the keyword namespace or adding another identifier people need to understand.

My recommendation is not to bother with it. Just ignore it and stick to NULL where the context is clear, and to (foo *)0 when calling a function that has multiple overloads.

UPDATE: It has been pointed out to me that nullptr does in fact fix one problem: it prevents an incorrect function from being called. This is true, but it does not change my reasoning much. I still recommend ignoring nullptr and using NULL, which is more portable; once nullptr arrives, compilers can simply define NULL as nullptr anyway. This is also what the C++ committee should do, instead of adding to the confusion by introducing yet another way of representing a null pointer.
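The one problem nullptr does fix is easy to demonstrate. With an int overload and a pointer overload (illustrative declarations, C++11 or later), 0, and NULL wherever it is defined as 0, selects print(int), while nullptr can only bind to the pointer:

```cpp
// Which overload was selected (illustrative).
enum class Called { Int, Pointer };

constexpr Called print(int) { return Called::Int; }
constexpr Called print(void *) { return Called::Pointer; }

// 0 is an int, so print(int) wins...
static_assert(print(0) == Called::Int, "");
// ...whereas nullptr cannot become an int, leaving print(void *) as the
// only viable candidate.
static_assert(print(nullptr) == Called::Pointer, "");
```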

Null pointer representation

The last thing I want to talk about is the representation of a null pointer. There are two misconceptions stemming from the fact that 0 is used to mean a null pointer. The first one is that a null pointer is in fact represented by a “zero value” (i.e. all bits clear). The second one is that, when an implementation uses a different representation, assigning 0 to a pointer does not yield a null pointer.

Both of those are incorrect. In some implementations (microcontrollers, x86 real mode, …), a pointer with all bits clear is a perfectly fine pointer to real memory, so using a different representation for a null pointer (for instance, all bits set) may make more sense. At the same time, it is the compiler’s responsibility to use the correct representation whenever 0 is used to mean a null pointer.
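The distinction matters when a structure is zeroed wholesale. In the sketch below (C++11 or later; Node, make_node and clear_node_bits are hypothetical names), the first function relies on the compiler translating the constant 0, while the second merely clears bits, which yields null pointers only on implementations that represent them as all bits clear:

```cpp
#include <cstring>

struct Node {
    Node *next;
};

// Portable: the compiler turns the constant 0 in a pointer context into
// whatever representation the platform uses for a null pointer.
// (This covers constant zeros only; converting an int variable that merely
// holds zero to a pointer is an implementation-defined conversion.)
constexpr Node make_node() { return Node{0}; }

// Not portable: this clears every bit of next, which produces a null
// pointer only where null pointers are represented as all bits clear.
void clear_node_bits(Node &n) { std::memset(&n, 0, sizeof n); }

static_assert(make_node().next == 0, "0 yields a null pointer on any implementation");
```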

UPDATE: Przemoc has pointed me to one of the comp.lang.c FAQ entries, which gives examples of machines whose null pointers are not represented by all bits clear, as well as machines that use different representations for pointers of different types (something I have not mentioned in this article). Also, the recurring issue of not being able to comment has been fixed (hopefully for good this time).

Summary

To sum things up, here are all the tips that you should keep in mind while programming in C or C++:

  1. Always define function prototypes. Always.
  2. When a function takes a variable number of arguments, use (foo *)0 to mean a null pointer.
  3. In C++, when passing a null pointer as a function argument, use (foo *)0. This also applies to single-argument constructors, where no explicit function call is made.
  4. When the new C++ standard comes around, just ignore nullptr: it solves hardly any problems and only adds to the confusion.
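Tip 3 deserves an illustration, since a converting constructor takes part in overload resolution even though no call is written out. A compile-time sketch (C++11 or later), with a hypothetical Value class that can be built from either an int or a const char *:

```cpp
// How a Value was constructed (illustrative).
enum class Built { FromInt, FromPointer };

// Hypothetical class with two converting constructors.
struct Value {
    Built how;
    constexpr Value(int) : how(Built::FromInt) {}
    constexpr Value(const char *) : how(Built::FromPointer) {}
};

constexpr Built store(Value v) { return v.how; }

// No constructor call is written out, yet one is chosen by overload
// resolution, exactly as for an ordinary overloaded function.
constexpr Value from_zero = 0;
constexpr Value from_null = (const char *)0;

static_assert(from_zero.how == Built::FromInt, "");
static_assert(from_null.how == Built::FromPointer, "");

// The same happens when a Value parameter is initialised from an argument.
static_assert(store(0) == Built::FromInt, "");
static_assert(store((const char *)0) == Built::FromPointer, "");
```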