Will the real ARG_MAX please stand up? Part 1
Posted by Michał ‘mina86’ Nazarewicz on 14th of March 2021
arg max is the set of values from a function’s domain at which the function reaches its maximum. That’s certainly an arg max, but it is spelled without an underscore and thus not the one we are searching for. No, this article is about the ARG_MAX that limits the length of arguments to an executable.
Or in other words, why you are getting:
bash: command: Argument list too long
The ARG_MAX parameter is common among UNIX-like platforms, but since most such systems have fallen into obscurity, are BSD or rubbish systems, herein I will focus on a GNU/Linux environment running on an x86_64 platform. While this limits the applicability of the information, many concepts will apply to other systems and architectures as well.
Experimentation
The value of ARG_MAX is no secret and can be retrieved with the getconf utility. On an x86_64 platform it’s 2097152 (or 2 MiB) by default. To test this limit we can use pretty much any command; echo should do. A simple experiment (shown in the listing below) confirms that the tool can be called with a one-million-character-long argument. On the other hand, further investigation demonstrates that an argument three times as long works just as well, which shouldn’t be the case.
It turns out that even in minimal shells such as dash and posh the echo command is a built-in. Its execution is performed entirely within the shell and as such isn’t subject to kernel-imposed limitations. For the ARG_MAX limit to take effect, the execve system call has to be used. This can be done by executing the /bin/echo binary instead.
$ getconf ARG_MAX
2097152
$ echo ' head -c "${1?}" /dev/zero |tr "\0" A ' >gen-str
$ chmod 755 gen-str
$ s=$(./gen-str 1000000)
$ echo ${#s}
1000000
$ echo "$s" |wc -c
1000001
$ echo "$s$s$s" |wc -c
3000001
$ type echo
echo is a shell builtin
$ /bin/echo "$s"
sh: /bin/echo: Argument list too long
But this time, after redoing the experiment with that executable, the command fails even for a mere one million characters, which should be within the ARG_MAX limit. Has getconf lied to us? Is the limit some other value? With some trial and error we can determine that the longest argument the kernel will accept is 131072 bytes long including the terminating NUL byte. However, that’s clearly not the end of the story, since it’s easy to see that while a single argument is limited to 128 KiB, the whole command line is not.
$ /bin/echo "$(./gen-str 131072)"
sh: /bin/echo: Argument list too long
$ s=$(./gen-str 131071)
$ /bin/echo "$s" |wc -c
131072
$ /bin/echo "$s" "$s" "$s" |wc -c
393216
$ /bin/echo "$s" "$s" "$s" "$s" "$s" "$s" "$s" "$s" \
            "$s" "$s" "$s" "$s" "$s" "$s" "$s" "$s"
sh: /bin/echo: Argument list too long
$ t=$(./gen-str $((131071 - 3422)))
$ /bin/echo "$s" "$s" "$s" "$s" "$s" "$s" "$s" "$s" \
            "$s" "$s" "$s" "$s" "$s" "$s" "$s" "$t" |wc -c
2093730
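The trial and error used to find the single-argument limit can be automated with a binary search. What follows is a minimal sketch rather than a definitive tool: the helper names gen_str and accepts are made up for illustration, /bin/true is assumed to exist, and the 4 MiB upper bound is assumed to be rejected. On the setup described in this article it should settle on 131071 characters, but other kernels may well give a different answer.

```shell
#!/bin/sh
# Hypothetical helpers; not part of any standard tool.

# Print N copies of the letter A (same idea as the gen-str script).
gen_str() { head -c "$1" /dev/zero | tr '\0' A; }

# Succeed if /bin/true can be executed with one N-character argument.
accepts() { /bin/true "$(gen_str "$1")" 2>/dev/null; }

# Invariant: lo is accepted, hi is assumed to be rejected.
lo=1
hi=4194304
while [ $((hi - lo)) -gt 1 ]; do
    mid=$(( (lo + hi) / 2 ))
    if accepts "$mid"; then lo=$mid; else hi=$mid; fi
done
echo "longest accepted argument: $lo characters ($((lo + 1)) bytes with NUL)"
```

The search only makes sense if acceptance is monotonic in the argument length, which matches what the experiments above suggest.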
To determine the apparent limit we can try passing more and more arguments until we get the ‘Argument list too long’ error message. As we’ll see, the result one gets is highly dependent on the environment. In the shell session shown in the listing above, I got a limit of 2093730 characters, which is only 3422 bytes shy of the 2 MiB we got as the value of ARG_MAX. What could account for this difference?
Since echo prints a white-space character after each argument, the number reported by wc -c does effectively count the NUL bytes, so that cannot be where the missing bytes went. There’s one other aspect of command line arguments that has been overlooked so far. By UNIX convention, the first argument of a program is its name. In the above case that’s /bin/echo, which accounts for ten characters (again, the NUL byte is counted). While it’s something, it still leaves 3412 bytes unaccounted for.
The next breakthrough comes once we realise that command line arguments are not the only way information is passed to an application. The other, commonly overlooked method is environment variables such as PATH, TERM or USER. All of them contribute to the limit in the same way command line arguments do. To measure how much, we can simply invoke env |wc, which will produce the number of variables as well as their total length. This isn’t robust against variables containing newline characters, but other than that it correctly measures the used space: the newline printed after each variable stands in for the NUL byte terminating it.
$ env |wc
     45      45    2906
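Variables containing newlines can be handled by asking env to separate entries with NUL bytes instead. This is only a sketch and assumes GNU coreutils env, since the -0 option is not available in every implementation. Note also that the shell may add a variable of its own, such as _, when it launches env, so the numbers describe the environment env itself received.

```shell
#!/bin/sh
# Total bytes occupied by the environment strings, one NUL per variable.
env_bytes=$(env -0 | wc -c)
# Number of variables: keep only the NUL terminators and count them.
env_count=$(env -0 | tr -cd '\0' | wc -c)
# Each variable additionally needs an 8-byte pointer in environ (x86_64).
echo "$env_count variables, $env_bytes bytes of strings, $((env_count * 8)) bytes of pointers"
```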
Environment variables explain a further 2906 bytes, which gets the discrepancy down to 506 bytes. Close, but no cigar just yet.
The third thing to consider is how arguments and environment variables are passed to an application. They end up in the argv and environ arrays respectively, which take up space. In the example above there are 17 arguments and 45 environment variables (as seen from the output of env |wc), so a total of 62 strings. Each requires an eight-byte pointer in the corresponding array, which in total amounts to 496 bytes. This still leaves 10 bytes. The argv and environ arrays are NULL-terminated but that is not counted against the limit: if we were to include those two NULL pointers (16 bytes) we would overshoot the tally by 6 bytes. Something else is afoot.
The final bit of the puzzle is the auxiliary vector, a Linux-specific mechanism which the kernel uses to pass additional information to user space. One of the pieces of data this vector includes is the path used to launch the executable. In the example above this path is once again /bin/echo and thus perfectly accounts for the remaining ten bytes.
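The whole tally can be double-checked with shell arithmetic. The numbers below are the ones from the session above; the variable names are only labels chosen for this sketch.

```shell
#!/bin/sh
arg_max=2097152                 # getconf ARG_MAX
args=2093730                    # 16 arguments with their NUL bytes (wc -c)
argv0=10                        # "/bin/echo" plus NUL as argv[0]
env_bytes=2906                  # environment strings (env |wc)
pointers=$(( (17 + 45) * 8 ))   # argv and environ entries, 8 bytes each
execfn=10                       # "/bin/echo" plus NUL, referenced from auxv
total=$(( args + argv0 + env_bytes + pointers + execfn ))
echo "used $total of $arg_max, unexplained: $((arg_max - total))"
# prints: used 2097152 of 2097152, unexplained: 0
```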
Verification
To verify all the findings we can try calling different binaries with and without environment variables present. When doing that from a shell it’s important to take note of any variables that the shell might automatically create when calling programs. I’ve found that posh is particularly well behaved in this regard. While it makes sure a PATH variable is present when it starts, it lets the user remove it and later doesn’t try to create any more variables when invoking commands.
$ env -i posh
$ unset PATH; env
$ s=$(./gen-str 131071)
$ check() {
    "$1" "$s" "$s" "$s" "$s" "$s" "$s" "$s" "$s" \
         "$s" "$s" "$s" "$s" "$s" "$s" "$s" \
         "$(./gen-str $2)"
  }
# 17 arguments, 8 bytes per pointer
# 10-byte-long path, once in argv and once in auxv
$ check /bin/echo $((131071 - 17*8 - 2*10)) |wc -c
2096996
$ check /bin/false $((131071 - 17*8 - 2*10))
posh: /bin/false: Argument list too long
# this time 11-byte-long path
$ check /bin/false $((131071 - 17*8 - 2*11))
$ foo=bar; export foo
$ check /bin/echo $((131071 - 17*8 - 2*10))
posh: /bin/echo: Argument list too long
# Additional 8-byte pointer in environ array plus
# ‘foo=bar’ (inc. NUL byte) takes another 8 bytes
$ check /bin/echo $((131071 - 17*8 - 1*8 - 2*10 - 8)) |wc -c
2096980
Calling programs whose paths have different lengths, with and without environment variables present, gives results that are consistent with all the rules analysed earlier.
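The accounting rules used in the session above can be sketched as a small function. The name exec_cost is hypothetical, and the function assumes an empty environment (as in the env -i posh session) and 8-byte pointers as on x86_64; it is a back-of-the-envelope model, not a reimplementation of what the kernel does.

```shell
#!/bin/sh
# Hypothetical helper: bytes charged for running a program with the given
# arguments, assuming an empty environment and 8-byte pointers.
# Usage: exec_cost PATH [ARG...]
# The body runs in a subshell so that LC_ALL does not leak to the caller.
exec_cost() (
    LC_ALL=C                      # make ${#arg} count bytes, not characters
    total=$(( ${#1} + 1 ))        # copy of the path referenced from auxv
    for arg in "$@"; do           # "$@" starts with the path, i.e. argv[0]
        total=$(( total + ${#arg} + 1 + 8 ))   # string, NUL and pointer
    done
    echo "$total"
)

exec_cost /bin/echo foo   # prints 40
```

Fed with /bin/echo, fifteen 131071-character arguments and one of 131071 - 17*8 - 2*10 characters, the same arithmetic gives exactly 2097152, which is why that call is the longest one that still succeeds.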
Pushing the limits
Knowing the limit, the next step is to understand how, if at all, it can be changed. There’s no setconf counterpart to the getconf tool which would allow setting the parameter. Indeed, we could search far and wide for a method to directly change ARG_MAX and not find anything. Nonetheless, there is a way to influence the value on Linux.
To realise how, we should consider what all of the objects counted in the limit (command line arguments, environment variables and the auxiliary vector) have in common. More specifically, where are they located? With address space randomisation the exact addresses will vary even between runs of the same application, but looking at example addresses proves helpful nonetheless:
Expression | Result |
---|---|
getauxval(AT_EXECFN) | 0x7fffac2fcfed |
environ[0] | 0x7fffac2fc473 |
argv[0] | 0x7fffac2fc468 |
sbrk(0) | 0x555bd584a000 |
&global | 0x555bd4c6a040 |
Yes, that’s it. The first three addresses sit next to each other near the top of the address space, far away from the heap (sbrk(0)) and global variables: all objects of interest are stored on the stack. It stands to reason then that changing the maximum stack size might influence the ARG_MAX value. This hypothesis can be tested with the help of the ulimit built-in, which allows reading and modifying resource limits such as the maximum stack size.
$ ulimit -Ss 1024; getconf ARG_MAX
262144 # 256 KiB
$ ulimit -Ss 512; getconf ARG_MAX
131072 # 128 KiB
$ ulimit -Ss 256; getconf ARG_MAX
131072 # 128 KiB
$ ulimit -Ss $((1024 * 1024)); getconf ARG_MAX
268435456 # 256 MiB
Correlating the maximum stack size with the value of the ARG_MAX limit, we can easily see that the latter is set to one quarter of the former, with the additional restriction that ARG_MAX is no lower than 128 KiB. An upper limit, on the other hand, doesn’t appear to exist. Or does it? Let’s try a stack size of 42 MiB:
$ ulimit -Ss $((42 * 1024))
$ getconf ARG_MAX
11010048
$ s=$(./gen-str 131071)
$ /bin/true "$s" "$s" "$s" "$s" "$s" "$s" "$s" "$s" \
            "$s" "$s" "$s" "$s" "$s" "$s" "$s" "$s" \
            "$s" "$s" "$s" "$s" "$s" "$s" "$s" "$s" \
            "$s" "$s" "$s" "$s" "$s" "$s" "$s" "$s" \
            "$s" "$s" "$s" "$s" "$s" "$s" "$s" "$s" \
            "$s" "$s" "$s" "$s" "$s" "$s" "$s" "$s"
sh: /bin/true: Argument list too long
It turns out that even if getconf reports a larger value of ARG_MAX, the effective limit is always capped at 6 MiB.
At the other end, even if ARG_MAX is reported to be 128 KiB, the effective limit will never reach more than the maximum stack size†. For example:
$ ulimit -Ss 100
$ exec env -i posh
$ unset PATH
$ getconf ARG_MAX
131072
$ /bin/true "$(../gen-str 102400)"
posh: /bin/true: Argument list too long
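The relationship between the stack limit and the two flavours of the argument limit, as observed in the experiments above, can be summarised in a few lines of shell. Both function names are made up for this sketch, the argument is the soft stack limit in KiB as printed by ulimit -Ss (a numeric value is assumed, so ‘unlimited’ is not handled), and the second function gives only a rough upper bound.

```shell
#!/bin/sh
# Hypothetical helpers; sizes follow the observations in this article.

# Value getconf ARG_MAX is expected to report for a stack limit in KiB.
reported_arg_max() {
    quarter=$(( $1 * 1024 / 4 ))
    if [ "$quarter" -lt 131072 ]; then echo 131072; else echo "$quarter"; fi
}

# Rough upper bound on what execve actually accepts: the reported value,
# capped at 6 MiB and at the stack size itself.
effective_arg_max() {
    limit=$(reported_arg_max "$1")
    stack=$(( $1 * 1024 ))
    if [ "$limit" -gt 6291456 ]; then limit=6291456; fi
    if [ "$limit" -gt "$stack" ]; then limit=$stack; fi
    echo "$limit"
}

reported_arg_max 43008    # 42 MiB stack: prints 11010048
effective_arg_max 43008   # prints 6291456
effective_arg_max 100     # prints 102400
```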
The real argument limit
To quickly recap all the information:
- The maximum length of the arguments to the execve syscall is one quarter of the maximum stack size, but no less than 128 KiB and no more than 6 MiB.
- The limit covers: i) all command line arguments, ii) all environment variables, iii) pointers to the former two present in the argv and environ arrays and iv) the path to the executable used to run the program.
- In addition, regardless of how high the limit is, a single command line argument or environment variable cannot exceed 128 KiB. The size of an environment variable is calculated as the size of the name=value string.
- String size includes the terminating NUL byte, i.e. it’s the string length plus one.
And that’s it for now. In the next part we’re going to look at the kernel code responsible for implementing the limit so stay tuned (or should I say smash that Atom feed button?).
† Roughly speaking. It’s of course impossible to use the entire stack for arguments and environment variables since there would be no space left for any other information or for setting up a stack frame. As such, the actual argument limit is capped at less than the maximum stack size.
PS. By the way, despite all the BSD influences in XNU, I don’t consider macOS to be a BSD.