After re-reading some of the papers from Bell Labs, something clicked in my mind, and I'm hooked. I'm now reading "The UNIX Programming Environment", by Kernighan and Pike. It's got that fun style you're probably familiar with if you've read K&R or the blue book. One of the first exercises, in the chapter on the file system:
(harder) How does the pwd command operate?
Seems like a fun one. My first guess is that it uses $PWD from the environment. Let's test that.
~ % PWD=/usr/local pwd
/home/gg
There's a -L flag that seems to use $PWD. Maybe that one would do?
~ % PWD=/usr/local pwd -L
/home/gg
Not that either. Wait a second, what's pwd again?
~ % type pwd
pwd is a shell builtin
Hah, so I was calling the wrong one. Replacing pwd with /bin/pwd in the queries above gives the same results, though.
My next hypothesis is that it somehow expands . to an absolute path. I'm not aware of a UNIX command that performs such an expansion, so I try man -k with a few keywords. Nothing.
Maybe pwd(1) will help? It's not terribly descriptive (it's such a simple utility, after all). It doesn't explain the implementation at all, but it does point me to getcwd(3). Alright, let's just look at the (OpenBSD) source.
int
main(int argc, char *argv[])
{
	int ch, lFlag = 0;
	const char *p;
	/* pledge(), parse flags... */
	if (lFlag)
		p = getcwd_logical();
	else
		p = NULL;
	if (p == NULL)
		p = getcwd(NULL, 0);
	if (p == NULL)
		err(EXIT_FAILURE, NULL);
	puts(p);
	exit(EXIT_SUCCESS);
}
Unless -L is passed, it just calls getcwd. Let's see what that "logical" function does:
static char *
getcwd_logical(void)
{
	char *pwd, *p;
	struct stat s_pwd, s_dot;
	/* Check $PWD -- if it's right, it's fast. */
	pwd = getenv("PWD");
	if (pwd == NULL)
		return NULL;
	if (pwd[0] != '/')
		return NULL;
	/* check for . or .. components, including trailing ones */
	for (p = pwd; *p != '\0'; p++)
		if (p[0] == '/' && p[1] == '.') {
			if (p[2] == '.')
				p++;
			if (p[2] == '\0' || p[2] == '/')
				return NULL;
		}
	if (stat(pwd, &s_pwd) == -1 || stat(".", &s_dot) == -1)
		return NULL;
	if (s_pwd.st_dev != s_dot.st_dev || s_pwd.st_ino != s_dot.st_ino)
		return NULL;
	return pwd;
}
So -L does check $PWD, but it only returns it if it is an absolute path with no . or .. components, and if it points to the same inode, on the same device, as the current directory. You can't just override it to be anything you want: if any of those checks fails, pwd falls back to the libc call to getcwd.
Makes me wonder what use this -L flag is in the first place. Maybe it has to do with symlinks?
/tmp % mkdir one
/tmp % ln -s one two
/tmp % cd one
/tmp/one % /bin/pwd
/tmp/one
/tmp/one % cd ../two
/tmp/two % /bin/pwd
/tmp/one
/tmp/two % /bin/pwd -L
/tmp/two
Makes sense. Anyway, that's not a very satisfying answer. I doubt the authors' intended answer would have been "defer to the libc".
OK, the OpenBSD source didn't help. But Plan 9 is Unicibus ipsis Unicior (more Unix than the Unixes themselves), so maybe we can find the answer there. Let's inspect pwd(1):
DESCRIPTION
Pwd prints the path name of the working (current) directory.
Pwd is guaranteed to return the same path that was used to
enter the directory. If, however, the name space has
changed, or directory names have been changed, this path
name may no longer be valid. (See fd2path(2) for a
description of pwd's mechanism.)
Hah, that was helpful! Now, from fd2path(2):
As an example, getwd(2) is implemented by opening . and
executing fd2path on the resulting file descriptor.
By the way, it turns out that fd2path(2) is a fascinating topic in its own right (cf. "Lexical File Names in Plan 9 or: Getting Dot-Dot Right").
So my hypothesis about expanding . to an absolute path was correct, at least when it comes to Plan 9. Another cool thing about Plan 9 is that it lets me read a directory like any other file ("everything is a file", right?):
% cat . > foo
% cat foo
I can then run foo through hexdump and see what's in there.
So that was it: a brief excursion into different implementations of a simple UNIX command. The difference in complexity is palpable. The Plan 9 documentation is fun to read, and so is the code.