0 is ambiguous

Posted by Michał ‘mina86’ Nazarewicz on 24th of October 2010

It has been a long time since my last entry, so inspired by Adriaan de Groot’s entry, I decided to write something about 0, NULL and upcoming nullptr.

I will try to be informative and explain what the whole buzz is about and then give my opinion about nullptr. Let us first inspect how a null pointer can be denoted in C and C++.

The confusing 0

In C and C++, the literal 0 has two meanings: it’s either an octal (yes, octal, here’s a fun fact of the day for you ;) ) literal representing the number zero or a null pointer. Which meaning is used depends on context, which the compiler can usually figure out. For instance:

void takes_number(int);
void takes_pointer(long *);

int main(void) {
	char ch = 0;	/* number zero */
	char *ptr = 0;	/* null pointer */
	takes_number(0);	/* number zero (argument is an int) */
	takes_pointer(0);	/* null pointer (argument is a pointer) */
	return 0;	/* number zero (main returns int) */
}

However, if a function lacks a prototype or takes a variable number of arguments, the available information may be insufficient to figure out what the programmer meant. A good example is the printf function from the standard library:

#include <stdio.h>

int main(void) {
	printf("%p\n", 0);
	return 0;
}

In such situations, the first meaning prevails (i.e. a number), which in turn makes the above undefined behaviour (an int is passed where a pointer is expected).

Based on that, two things to keep in mind are to always provide function prototypes and to prefer an explicit (void *)0 to mean a null pointer when calling variadic functions.
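To make the second point concrete, here is a minimal C++ sketch of a sentinel-terminated variadic function. The names count_strings and count_strings_demo are made up for illustration and do not come from any library. Because the compiler knows nothing about the types of the trailing arguments, the terminator has to be spelled as an explicit pointer; a bare 0 would be passed as an int. I cast to const char * rather than void * only so that the argument matches the type va_arg reads.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>

// Hypothetical helper: counts the strings passed before a null pointer
// sentinel.  The sentinel itself is not counted.
std::size_t count_strings(const char *first, ...) {
	std::size_t count = 0;
	va_list ap;
	va_start(ap, first);
	for (const char *s = first; s != 0; s = va_arg(ap, const char *))
		++count;
	va_end(ap);
	return count;
}

// Usage: the sentinel is an explicit pointer, never a bare 0.
void count_strings_demo() {
	assert(count_strings("foo", "bar", (const char *)0) == 2);
}
```

Writing count_strings("foo", "bar", 0) instead would compile just as happily, which is exactly the problem.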

The confusing NULL

To disambiguate the intended context, the NULL macro can be used. The standard requires that it is defined such that it can be used in pointer context to mean a null pointer. In C, the macro is often defined as:

#define NULL ((void *)0)

This fulfils the aforementioned requirement and in addition guarantees a warning when it is used as a number, i.e.:

$ cat test.c
#include <stddef.h>

int main(void) {
	return NULL;
}
$ gcc -ansi -pedantic test.c
test.c: In function ‘main’:
test.c:4: warning: return makes integer from pointer without a cast
$

And all was good until C++ came along with its stricter typing rules. In C++, implicit conversion from void* to another pointer type is not allowed, which makes the aforementioned definition unusable:

#define OLD_NULL ((void *)0)

int main(void) {
	char *ptr = OLD_NULL;  /* compile time error */
	return 0;
}

The easiest solution is to define NULL as a plain 0 (or 0L). GCC is a bit smarter and uses the __null extension, but confusingly even that is treated like 0 in some contexts:

$ cat test.c
#include <stddef.h>

int main(void) {
	int ret = NULL;	/* no complaints */
	ret += __null;
	ret += NULL;
	return ret;
}
$ g++  -ansi -pedantic test.c
test.c: In function ‘int main()’:
test.c:5: warning: NULL used in arithmetic
test.c:6: warning: NULL used in arithmetic
$

Whenever you use NULL, you have to keep in mind that you never know what it really is. In particular, this means that the following code may or may not be valid:

#include <stdio.h>

int main(void) {
	printf("%p\n", NULL);	/* ((void *)0)? 0? 0L? __null? */
	return 0;
}
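A hedged way out, in the same spirit as the (void *)0 advice earlier, is to cast NULL to void * before handing it to a variadic function, so that a pointer gets passed whatever the macro expands to. The sketch below is C++; format_null and format_null_demo are hypothetical names, and since the text produced by %p is implementation-defined, the code only checks that something was written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: formats a null pointer with %p into buf and returns
// what snprintf returns (the length of the formatted text).
int format_null(char *buf, std::size_t size) {
	// The cast makes the argument a void * whether NULL is 0, 0L or __null.
	return std::snprintf(buf, size, "%p", static_cast<void *>(NULL));
}

// Usage: print whatever this implementation uses to display a null pointer.
void format_null_demo() {
	char buf[64];
	int len = format_null(buf, sizeof buf);
	assert(len > 0);
	std::puts(buf);
}
```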

Function overloading

Fortunately (at least in the context of null pointers), variadic functions aren’t that common. Function overloading poses more of a problem, since even with full prototypes it’s not always possible to determine argument types from the function name and its arity alone. For example:

#include <stddef.h>

void print(int num);
void print(long *ptr);

int main(void) {
	print(0);		/* first function */
	print((long *)0);	/* second function */
	print(NULL);		/* ??? */
	return 0;
}

The lesson here is that (especially in C++) the NULL macro is ambiguous as well, and when dealing with overloaded functions an explicit cast might be necessary.
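Here is a small C++11 sketch that lets the compiler itself confirm the first two calls. The enum and the chosen functions are hypothetical stand-ins for the print overloads above; each returns a tag so that the selected overload can be checked with static_assert.

```cpp
// Hypothetical stand-ins for the two print overloads; each returns a tag
// identifying which overload the compiler picked.
enum Overload { kNumber, kPointer };

constexpr Overload chosen(int) { return kNumber; }
constexpr Overload chosen(long *) { return kPointer; }

// A bare 0 is an exact match for int, so the number overload wins.
static_assert(chosen(0) == kNumber, "0 selects the int overload");

// Spelling out the pointer type leaves no room for doubt.
static_assert(chosen(static_cast<long *>(0)) == kPointer,
	"an explicit cast selects the pointer overload");

// chosen(NULL) is deliberately left out: what it does depends on how NULL
// is defined (it may pick the int overload or be rejected as ambiguous).
```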

So what about nullptr?

To help address those issues, C++11 introduced the nullptr keyword. It evaluates to a std::nullptr_t object which can be implicitly converted to a null pointer of any pointer type (but not to a number).
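As a minimal sketch of what that buys us (C++11 assumed; the enum and the chosen functions are hypothetical and exist only so the result can be checked at compile time):

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical overloads returning a tag that identifies the one picked.
enum Overload { kNumber, kPointer };

constexpr Overload chosen(int) { return kNumber; }
constexpr Overload chosen(long *) { return kPointer; }

// nullptr has its own type, distinct from every integer type.
static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value,
	"nullptr is a std::nullptr_t");

// It converts to long * but not to int, so only one overload is viable.
static_assert(chosen(nullptr) == kPointer,
	"nullptr selects the pointer overload");

// int n = nullptr;  /* would not compile: no conversion to a number */
```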

Unfortunately, one problem remains. If multiple pointer types are acceptable in a given context, the compiler cannot determine which one to use. This is easiest to see with an overloaded function taking different types of pointers, as in the following example:

void print(char *);
void print(wchar_t *);

int main(void) {
	print(nullptr);	/* error: call is ambiguous */
}

But at least any ambiguity results in failure to build rather than the compiler silently choosing one of the options (which may or may not be what we want).
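One way to resolve the ambiguity is to spell out the pointer type. The following is a sketch under the same assumptions as the example above (C++11; the enum and the chosen functions are hypothetical stand-ins for the two print overloads):

```cpp
// Hypothetical stand-ins for print(char *) and print(wchar_t *).
enum Overload { kNarrow, kWide };

constexpr Overload chosen(char *) { return kNarrow; }
constexpr Overload chosen(wchar_t *) { return kWide; }

// chosen(nullptr) would be rejected as ambiguous, but a cast settles it.
static_assert(chosen(static_cast<char *>(nullptr)) == kNarrow,
	"cast to char * selects the narrow overload");
static_assert(chosen(static_cast<wchar_t *>(nullptr)) == kWide,
	"cast to wchar_t * selects the wide overload");
```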

My other criticism is that NULL was (and still is) a perfectly fine identifier which served us for years, yet the committee decided to throw it away like yesterday’s jam and instead pollute the keyword namespace and people’s minds with yet another name that means ‘a null pointer’. The standard could have defined a _Null keyword and mandated that NULL expand to _Null, which would have been much more straightforward.

I used to be a wee bit critical of this new keyword. I’m still not in love with it, but I also recognise I’m in the minority (perhaps even a minority of one). As such, nullptr is the best we’re gonna get going forward, though spelling out the pointer type explicitly is also a perfectly valid solution and don’t let others tell you otherwise. ;)

Null pointer representation

The last thing I want to talk about is a null pointer’s representation. There are two misconceptions that come from the fact that 0 is used to mean a null pointer. The first one is that a null pointer is in fact represented by a ‘zero value’ (i.e. all bits clear). The second one is that when an implementation uses a different representation, assigning 0 to a pointer does not yield a null pointer.

Both of those are incorrect. In various implementations zero is a perfectly cromulent address and other representations for a null pointer (e.g. all bits set) may make more sense. Regardless, 0 always means a null pointer in pointer context and it’s the compiler’s responsibility to translate it into the correct representation.
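The distinction between the constant 0 and the bits in memory can be sketched as follows. This is C++11 with hypothetical function names: the first function is portable everywhere, while the second merely reports what the implementation it runs on happens to do.

```cpp
#include <cstring>

// Always portable: comparing against 0 tests for a null pointer, whatever
// bit pattern the implementation uses to represent it.
constexpr bool is_null(const void *ptr) { return ptr == 0; }

static_assert(is_null(0), "0 in pointer context is a null pointer");

// Implementation-specific: reports whether this platform happens to
// represent a null pointer with all bits clear.
bool null_is_all_bits_zero() {
	void *ptr = 0;
	unsigned char zeros[sizeof ptr] = {0};
	return std::memcmp(&ptr, zeros, sizeof ptr) == 0;
}
```

On mainstream platforms the second function returns true, but that is an observation about the platform rather than a guarantee. Code that produces null pointers by filling memory with zero bytes relies on it.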

Summary

To sum things up, here are all the tips that you should keep in mind while programming in C or C++:

  1. Always provide function prototypes.
  2. In C++ use nullptr or (T*)0 to mean a null pointer.
  3. In C, if you’re using NULL, beware of variadic functions and the false sense of security the macro might give you.

I used to recommend ignoring the nullptr keyword, but nowadays people will look at you funny if you try using NULL.