Posted on Mar 25, 2023

Various Things About Command-line Arguments for Linux Processes

#linux #command #arguent

Introduction

When you execute a command like "foo bar baz", the command-line arguments are "foo", "bar", and "baz". You might expect the arguments to be only "bar" and "baz", but by definition the command name itself counts as the first argument.

In C and C++, command-line arguments can be referenced through the argv array argument of the main function. In the example above, the executable name is stored in argv[0], "bar" is in argv[1], and "baz" is in argv[2]. The equivalents of argv are "$0", "$1", "$2", ... in shell scripts, sys.argv in Python, os.Args in Go, and so on. However, scripting languages such as shell and Python do not expose the process's command-line arguments as directly as C does; they show a slightly modified view. This will be discussed later.

About the first element of command-line arguments

The first element of the command-line arguments (hereafter referred to as argv[0]) conventionally contains the name of the executable. When a program is executed, regardless of the language it is written in, the execve() system call shown below is eventually called, with the program's executable name in the pathname argument and the command-line arguments in the argv argument.

int execve(const char *pathname, char *const argv[], char *const envp[]);

By convention, the caller specifies the same value for pathname and argv[0].

When would you set argv[0] to something other than the name of the executable? For example, when bash is a login shell, argv[0] is not "bash" but "-bash", with a "-" at the beginning. This lets bash know at startup whether it is a login shell and branch its processing accordingly (for example, by loading different configuration files). We will verify the value of argv[0] for bash in a later section.
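To make the separation between pathname and argv[0] concrete, here is a minimal Python sketch. It assumes a Linux system with procfs and a sleep command on PATH; the helper names read_argv and spawn_with_argv0 are made up for this illustration. Python's subprocess.Popen accepts an executable= parameter that is used as the program to run, while the first list element is passed through as argv[0], which mirrors the "-bash" convention.

```python
import shutil
import subprocess


def read_argv(pid: int) -> list[bytes]:
    """Return the argv of a process as recorded in /proc/<pid>/cmdline."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        raw = f.read()
    # Each argument is NUL-terminated, so drop the empty piece after the last NUL.
    return raw.split(b"\0")[:-1]


def spawn_with_argv0(argv0: str) -> list[bytes]:
    """Run sleep with a caller-chosen argv[0] and return the argv it received."""
    sleep_path = shutil.which("sleep")  # assumption: sleep is installed and on PATH
    if sleep_path is None:
        raise RuntimeError("sleep not found on PATH")
    # executable= is the program that gets executed; the list becomes argv.
    proc = subprocess.Popen([argv0, "30"], executable=sleep_path)
    try:
        return read_argv(proc.pid)
    finally:
        proc.kill()
        proc.wait()


if __name__ == "__main__":
    # Same trick a login program uses for "-bash": a leading "-" in argv[0].
    print(spawn_with_argv0("-sleep"))
```

The program being run is still sleep; only the name it is told it was invoked under changes.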

When the executable is run through an interpreter, as with a bash script, argv[0] holds the interpreter's executable name, not the script name. For example, if there is a bash script called "test.sh" and you execute "./test.sh", argv[0] is the executable name of bash and "./test.sh" is stored in argv[1]. This is inconvenient for script authors, so inside the script bash exposes the script name (argv[1] of the process) as "$0" and the arguments after it as "$1", "$2", and so on. We will verify this in a later section as well.

Verifying the values of a process's command-line arguments using procfs

The command-line arguments of each process can be read from /proc/<pid>/cmdline. For example, on the Linux machine where the author is currently logged in via ssh, the command-line arguments for rsyslogd, which collects system logs, were as follows:

sat@tea:~$ pgrep rsyslogd
568
sat@tea:~$ cat /proc/568/cmdline
/usr/sbin/rsyslogd-n-iNONEsat@tea:~$

The output looks a bit odd. Option-like strings are glued directly onto the executable name "/usr/sbin/rsyslogd", and there is no newline before the next prompt. This is not because /proc/<pid>/cmdline emits the arguments without any delimiter. In fact, each argument is terminated by a null character (a byte with the value 0, written "\0" in C), which the terminal does not display. We can confirm this with a binary dump tool such as hexdump.

$ hexdump -c /proc/568/cmdline
0000000   /   u   s   r   /   s   b   i   n   /   r   s   y   s   l   o
0000010   g   d  \0   -   n  \0   -   i   N   O   N   E  \0
000001d
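The same null-separated layout can be decoded programmatically. The following is a small Python sketch, assuming a Linux procfs; split_cmdline and read_cmdline are hypothetical helper names, not part of any library.

```python
import os


def split_cmdline(raw: bytes) -> list[str]:
    """Split the raw contents of /proc/<pid>/cmdline into arguments."""
    if raw.endswith(b"\0"):
        raw = raw[:-1]  # the last argument is NUL-terminated too
    if not raw:
        return []  # e.g. kernel threads have an empty cmdline
    return [arg.decode("utf-8", errors="replace") for arg in raw.split(b"\0")]


def read_cmdline(pid: int) -> list[str]:
    """Read and split the command-line arguments of a running process."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        return split_cmdline(f.read())


if __name__ == "__main__":
    # The bytes shown in the hexdump above, written out by hand.
    print(split_cmdline(b"/usr/sbin/rsyslogd\0-n\0-iNONE\0"))
    # The arguments of this Python process itself.
    print(read_cmdline(os.getpid()))
```

Splitting on the null byte rather than on whitespace matters because an argument may itself contain spaces.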

Let's take a look at the argv[0] of the bash instances on the system. The last field of the ps ax output shows the command-line arguments separated by spaces, so we'll use this to list the existing bash instances on the system.

sat@tea:~$ ps ax | grep bash
   5239 pts/3    Ss+    0:00 /usr/bin/bash --init-file /home/sat/.vscode-server/bin/74b1f979648cc44d385a2286793c226e611f59e7/out/vs/workbench/contrib/terminal/browser/media/shellIntegration-bash.sh
   8725 pts/4    Ss     0:00 -bash
   8907 pts/4    S      0:00 /bin/bash ./test.sh
   8909 pts/4    S      0:00 /bin/bash ./test.sh
   8929 pts/4    S+     0:00 grep --color=auto bash

We can see that the processes with pid 5239, 8725, 8907, and 8909 are bash instances. Among them, the process with pid=8725, where the first character of argv[0] is "-", is the login shell where the author is running the above commands.
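A rough Python counterpart of the "ps ax | grep bash" check might look like the sketch below. It assumes a Linux procfs that the current user can read; iter_process_argv and is_login_shell are illustrative names, and the leading "-" test is only the convention described above, not a guarantee.

```python
import os


def iter_process_argv():
    """Yield (pid, argv) for every process whose cmdline is readable and non-empty."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                raw = f.read()
        except OSError:
            continue  # the process exited or is not readable
        if not raw:
            continue  # kernel threads and zombies have no arguments
        if raw.endswith(b"\0"):
            raw = raw[:-1]
        yield int(entry), raw.split(b"\0")


def is_login_shell(argv0: bytes) -> bool:
    """Apply the convention that a login shell's argv[0] starts with '-'."""
    return argv0.startswith(b"-")


if __name__ == "__main__":
    for pid, argv in iter_process_argv():
        # Strip the login-shell dash and any directory part before comparing.
        name = os.path.basename(argv[0].lstrip(b"-"))
        if name == b"bash":
            kind = "login shell" if is_login_shell(argv[0]) else "non-login"
            print(pid, kind, argv)
```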

Returning to the rsyslogd example: in reality, argv[0] is "/usr/sbin/rsyslogd", argv[1] is "-n", and argv[2] is "-iNONE".

Let's also take a look at an example of a bash script. The script we'll use here outputs "$0" and then sleeps indefinitely.

sat@tea:~$ cat test.sh
#!/bin/bash
echo $0
sleep infinity
sat@tea:~$ ./test.sh
./test.sh # Output of `$0`
^Z
[2]+  Stopped                 ./test.sh
sat@tea:~$ bg
[2]+ ./test.sh &
sat@tea:~$ hexdump -c /proc/8909/cmdline
0000000   /   b   i   n   /   b   a   s   h  \0   .   /   t   e   s   t
0000010   .   s   h  \0
0000014

While the value of "$0" is the script name "./test.sh", the value of argv[0] is "/bin/bash", and the script name is in argv[1]. This means that although it appears to the user as if they are directly executing the "./test.sh" file, the actual program running is bash. Bash interprets and executes the script, mapping argv[1] and beyond to $0 and later variables within the script.
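Python scripts show the same kind of shift. The sketch below, which assumes Linux procfs and uses a throwaway script name (show_args.py) and helper name (compare_views) invented for this example, runs a small script through the interpreter and reports both sys.argv and the process-level argv read from /proc/self/cmdline.

```python
import json
import os
import subprocess
import sys
import tempfile

# Child script: prints what Python exposes and what the kernel recorded.
SCRIPT = '''\
import json
import sys

with open("/proc/self/cmdline", "rb") as f:
    raw = f.read()
process_argv = [a.decode() for a in raw.split(b"\\0")[:-1]]
print(json.dumps({"sys_argv": sys.argv, "process_argv": process_argv}))
'''


def compare_views(extra_args: list[str]) -> dict:
    """Run the child script and return both views of its arguments."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "show_args.py")
        with open(script, "w") as f:
            f.write(SCRIPT)
        result = subprocess.run(
            [sys.executable, script, *extra_args],
            capture_output=True, text=True, check=True,
        )
    return json.loads(result.stdout)


if __name__ == "__main__":
    views = compare_views(["bar", "baz"])
    print("sys.argv     :", views["sys_argv"])
    print("process argv :", views["process_argv"])
```

Here sys.argv starts at the script path, while the process-level argv has the interpreter in front of it, just as "$0" in bash corresponds to argv[1] of the bash process.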

Differences between the command name and command line arguments held by the kernel

Please note that the argv[0] discussed in this article, which "usually contains the executable file name," is different from the command name seen by the kernel, which is displayed by commands like ps. For more information on the command name from the kernel's perspective, please refer to the following article.

https://dev.to/satorutakeuchi/command-name-from-the-perspective-of-the-linux-kernel-257l

The command name seen by the kernel is the "first 15 bytes of the basename of the executable file name" and is different from argv[0].
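On Linux, the kernel's command name for a process can be read from /proc/<pid>/comm, so the two values can be put side by side. This is a minimal sketch assuming procfs is mounted; kernel_comm and process_argv0 are hypothetical helper names.

```python
import os


def kernel_comm(pid: int) -> str:
    """Return the command name the kernel keeps for a process."""
    with open(f"/proc/{pid}/comm") as f:
        return f.read().rstrip("\n")


def process_argv0(pid: int) -> str:
    """Return argv[0] of a process from /proc/<pid>/cmdline."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        raw = f.read()
    # argv[0] is everything before the first NUL byte.
    return raw.split(b"\0")[0].decode("utf-8", errors="replace")


if __name__ == "__main__":
    pid = os.getpid()
    # For a Python process these typically differ: comm is a short basename,
    # while argv[0] is whatever the caller passed to execve().
    print("comm   :", kernel_comm(pid))
    print("argv[0]:", process_argv0(pid))
```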

Conclusion

I hope this article will reduce confusion about command names, executable file names, command line arguments, and the command line arguments that can be referenced from within the program's source code.