Wednesday, December 9, 2009

Octal NUL

"Well, here's another nice mess you've gotten me into." -- Oliver Hardy

It's disappointing, but not surprising, to see edge cases behave differently in different programs. It is surprising when they're inconsistent within one program: bash.

Oh, the behaviors are standards-conforming and well documented. Still, watch this, keeping in mind that printf and echo are shell built-ins:
# within single quotes, the shell doesn't expand metacharacters
$ echo 'a\0000b' | od -c
0000000 a \ 0 0 0 0 b \n
0000010
# but echo -e interprets \0 followed by up to three octal digits (\0nnn) as a single character
$ echo -e 'a\0000b' | od -c
0000000 a \0 b \n
0000004
# printf, however, reads at most three octal digits after the backslash (\nnn),
# so the fourth 0 is printed literally
$ printf 'a\0000b' | od -c
0000000 a \0 0 b
0000004
# but $'...' is decoded by the shell itself, using ANSI-C escape rules,
# and the result is held as a C string, which ends at the first \0
$ echo -e $'a\0000b' | od -c
0000000 a \n
0000002
In the last case, the shell decodes the octal escape itself, so the string is already cut off at the NUL before echo ever sees it.

Want proof?
$ cat <<< 'a\0000b' | od -c
0000000 a \ 0 0 0 0 b \n
0000010
$ cat <<< $'a\0000b' | od -c
0000000 a \n
0000002
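Another way to check, as a minimal sketch: ask the shell how long each string is before any command gets involved. The variable names plain and ansi are made up for this example.

```shell
#!/usr/bin/env bash
# Compare what the shell stores for each quoting style.
# The variable names are illustrative only.

plain='a\0000b'     # single quotes: the backslash and digits stay literal
ansi=$'a\0000b'     # $'...': the shell decodes \000 and the string ends there

echo "plain: ${#plain} characters"
echo "ansi:  ${#ansi} characters"
```

In bash this should report 7 characters for plain and 1 for ansi, which lines up with the od output above: the truncation has already happened by the time cat or echo runs.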
Note also that much of this quirkiness only appears when you start using four-digit octal representations and messing around with NUL (\0). Try keeping all those details in your head, bucko!
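One way to sidestep the four-digit ambiguity, sketched on the assumption that you just want the bytes and can live without echo: give printf exactly three octal digits, and pass the surrounding text as %s arguments so nothing in it gets decoded. The function name join_with_nul is hypothetical, invented for this example.

```shell
#!/usr/bin/env bash
# join_with_nul is a hypothetical helper, not a standard command.
# It writes its two arguments separated by a single NUL byte.
join_with_nul() {
    # \000 is exactly three octal digits, so printf reads it as NUL
    # and nothing more; %s copies each argument without decoding it.
    printf '%s\000%s' "$1" "$2"
}

# Three bytes come out: a, NUL, b (no trailing newline).
join_with_nul a b | od -c
```

Because the text travels through %s rather than the format string, a digit that happens to follow the NUL can never be swallowed into the escape.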

Me, I can't. Or won't. Good thing it's all documented in the man page.

"A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines." -- Emerson

Hat Tip: I started looking at this after puzzling over a line in Hal Pomeranz's latest Command-Line Kung Fu column.
