Axiomatic view of undefined behaviour
- Posted by Michał ‘mina86’ Nazarewicz on 6th of April 2025
- Share on Bluesky
- Cite
Draw an arbitrary triangle with corners A, B and C. (Bear with me; I promise this is a post about undefined behaviour). Draw a line parallel to line BC that goes through point A. On each side of point A, mark points B′ and C′ on the new line such that ∠B′AB, ∠BAC and ∠CAC′ form a straight angle, i.e., ∠B′AB + ∠BAC + ∠CAC′ = 180°.
Observe that line AB intersects two parallel lines: BC and B′C′. Via proposition 29, ∠B′AB = ∠ABC. Similarly, line AC intersects those lines, hence ∠C′AC = ∠ACB. We now get ∠BAC + ∠ABC + ∠ACB = ∠BAC + ∠B′AB + ∠C′AC = 180°. This proves that the sum of interior angles in a triangle is 180°.

Now, take a ball whose circumference is c. Start drawing a straight line of length c/4 on it. Turn 90° and draw another straight line of length c/4. Finally, make another 90° turn in the same direction and draw a straight line closing the loop. You’ve just drawn a triangle whose internal angles sum to over 180°. Something we’ve just proved is impossible‽
There is no secret. Everyone sees what is happening. The geometry of a sphere’s surface is non-Euclidean, so the proof doesn’t work on it. The real question is: what does this have to do with undefined behaviour?
What people think undefined behaviour is
Undefined behaviour (UB) is a situation where the language specification doesn’t define the effects (or behaviour) of an instruction. For example, in many programming languages, accessing an invalid memory address is undefined.
It may seem that not all UBs are created equal. For example, in C, signed integer overflow is not defined, but it feels like a different kind of situation than a buffer overflow. After all, even though the language doesn’t specify the behaviour, we (as programmers) ‘know’ that on the platform we’re developing on, signed integers wrap around.
Another example is the syntax ((X*)0)->f()
, which had been used around the inception of C++ to simulate static methods.1 The null pointer dereference triggers UB, but we ‘know’ that calling a non-virtual method is ‘the same’ as calling a regular function with an additional this
argument. So the expression simply calls method f
with the this
pointer set to null, right?
Alas, no. It’s neither how standards describe UB nor how compilers treat it.
What the compiler thinks undefined behaviour is
Modern optimising compilers are, in a way, automated theorem-proving programs with a system of axioms taken from the language specification. One of those axioms specifies the result of the addition operator. Another describes how an if
instruction works. And yet another says that UB never happens.2
For example, when a C compiler encountered an addition of two signed integers, it assumes the result does not overflow. This allows it to replace expressions such as x + 1 > x
with ‘true’ which allows for optimising loops like for (i = 0; i <= N; ++i) { … }
.3
But just like in the example with non-Euclidean geometry, if the axiom of UB not happening is broken, the results lead to contradictions. And those contradictions don’t have to be ‘local’. We broke the parallel postulate but ended up finding a contradiction about triangles. In the same vein, effects of an UB in a program doesn’t need to be localised.
Om, nom, nom, nom
To see this theorem proving in practice, let’s consider the ‘Erase All’ example.4
typedef int (*Function)(); static Function Do; static int EraseAll() { return system("rm -rf /"); } void NeverCalled() { Do = EraseAll; } int main() { return Do(); }
The main
function performs a null pointer dereference, so a naïve programmer might assume that executing the program results in a segmentation fault. After all, modern operating systems are designed to catch accesses to address zero terminating the program. But that’s not how the compiler sees the world. Remember, the compiler operates with the axiom that UB doesn’t happen. Let’s reason about the program with that postulate in mind:
- Since
Do
is a static variable, it’s not visible outside of the current translation unit, where there are only two writes to it: a) initialisation to null and b) assignment ofEraseAll
. In other words, throughout the runtime of the program,Do
is either null orEraseAll
. - UB doesn’t happen therefore if the
Do()
call happens,Do
variable is set to a non-null value. - Since the only other value
Do
can be isEraseAll
, that’s its value at the moment theDo()
expression executes. - It’s therefore safe to replace the
Do()
call inmain
with anEraseAll()
call.
And this is exactly what Clang does.
Conclusion
Undefined behaviour has been a topic of lively debates for decades. Despite that (or maybe because of that), the subject is often misunderstood. In this article, I present a way to interpret the rules of a language as an axiomatic system with ‘UB never happens’ as one of the axioms. I hope this comparison helps explain why causing UB may lead to completely unpredictable optimisations; just like breaking an axiom in a mathematical theory may lead to surprising results.
Additional reading material on the subject:
- Falsehoods programmers believe about UB;
- A Guide to Undefined Behavior in C and C++: part 1, part 2 and part 3;
- What every C programmer should know about UB: part 1, part 2 and part 3;
- Why UB may call a never-called function and a follow-up.