Friday, December 24, 2010

Handling Failure

Here are two coding styles for handling failure. I read a lot of code that uses the first. I always use the second.

(1)

do-something;

if (success) {
    do-this;
    do-that;
    ...
    [many lines of stuff]
    do-some-other-thing;
} else {
    print-error-message-and-exit;
}

(2)

do-something;

if (failure) {
    print-error-message-and-exit;
}

do-this;
do-that;
...
[many lines of stuff]
do-some-other-thing;

I could go on at length to try to justify my strong preference, but I won't. In the end, it's just taste.

No, I take it back, I will rationalize a little. Here are some reasons I prefer #2.

  • It's shorter overall.
  • The short alternative comes first, so you don't overlook it in the noise.
  • There's one fewer nesting level to puzzle through. This also occasionally permits longer lines without word-wrap.
  • It feels more like exception handling, and saying what I mean: A problem? Let's bail.
In Perl, the solution is even more stereotyped and easy-to-read:
do-something OR die "it didn't work";
...
In bash, I use a shell function, die(), which I wrote to let me code in the same style:
do-something || die "It didn't work"
...
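For completeness, here's a minimal die() in the spirit described above. The post doesn't show the actual definition, so the message format is my own sketch:

```shell
# A minimal die(): print the arguments to stderr, prefixed with the
# script's name, and exit nonzero.  (My sketch -- the exact format
# is an assumption.)
die() {
    echo "${0##*/}: $*" >&2
    exit 1
}

# Used in the style shown above:
#   do-something || die "It didn't work"
```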

Monday, November 29, 2010

Process Whack-a-Mole: An Alternative Approach

Gosh. I haven't blogged in, like, forever. Inspired, as is so often the case, by Hal Pomeranz and friends, here's something.

Over at his Command-Line Kung Fu blog, Hal has posted a solution to the problem of writing a process-whack-a-mole command, that watches for any new process and kills it.

(Hal's self-imposed constraint is always that his solutions really be command lines -- they can't be full-blown scripts -- and that they not use Perl, or other languages that are powerful enough that their use would feel like cheating.)

Nothing wrong with his approach at all, but here's another. (I mailed it to Hal, who said, "Post it somewhere, Dude.")

$ ps -e o pid,ppid,cmd | grep -v $$ > baseline; while : ; do
> sleep 5
> ps -e o pid,ppid,cmd | grep -v $$ > now
> join -v 2 baseline now
> done

This is, at least, a single command line. :-)

It shows off a couple of things:
  1. join -v
  2. the output of ps is pre-sorted by process-number
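A toy demonstration of the join -v trick at the heart of the loop (the file contents here are made up for the example):

```shell
# With -v 2, join prints only the lines of the second file whose
# join field (the pid, in the whack-a-mole loop) has no match in
# the first file -- i.e., exactly the new processes.
printf '100 init\n200 sshd\n' > baseline
printf '100 init\n200 sshd\n300 evil\n' > now
join -v 2 baseline now    # -> 300 evil
rm -f baseline now
```

Note that join wants its inputs sorted on the join field, which is why the pre-sorted output of ps matters.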
The didactic downside? It doesn't show off arrays or "[[" . (I spent a minute in philosophical reverie, contemplating the elevated virtue of using files to store data for shell scripts, but then realized this is engineering and snapped out of it.)

Oh, and I completely love Hal's "| grep -v $$", which I never would have thought of.

On reflection, the only thing the join would normally see is the ps itself, which is gone by the time of the join, so for the real deal, this can be pared down a titch.

# ps -e o pid,cmd > baseline; while : ; do
> sleep 5
> ps -e o pid,cmd > now
> kill -9 $(join -v2 -o2.1 baseline now)
> done 2>/dev/null &

Just kill everything new and ignore the error message from not being able to kill the ps:

"Kill them all and let The Kernel sort them out."


Wednesday, June 16, 2010

Just in case

I just learned something new about shell syntax -- specifically, about the case statement.

The shell's big enough that I expect I'll be able to keep learning things about it for a long time, even though I've been writing shell scripts for ... lessee ... about 30 years.

Here's today's:

I'm used to writing this:
case $key in
whatever) do-something ;;
*) some-default-behavior ;;
esac
I just read a Fedora system script that looks like this, instead:
case $key in
(whatever) do-something ;;
(*) some-default-behavior ;;
esac
The leading paren is optional, but legal. And not just for bash, but for any POSIX shell. Amazing. Also, the last item doesn't have to have the semis, so it could even be this:
case $key in
(whatever) do-something ;;
(*) some-default-behavior
esac
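If you want to convince yourself, this little test runs under bash and dash alike:

```shell
# Leading parens on case patterns -- legal in any POSIX shell.
key=whatever
case $key in
(whatever) echo do-something ;;
(*) echo some-default-behavior
esac
# prints "do-something"
```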




Wednesday, May 26, 2010

Back Off Man, I'm a Scientist

"I have no faith in anything short of actual measurement & the Rule of Three"
-- Charles Darwin

In an interminable thread about recycling computer parts, on the Boulder Linux Users' Group mailing list, Davide Del Vento injects a completely unsubstantiated and largely dubious claim:
The point of the discussion is what we agree is ok to "kill" or "destroy", and what is not. Is it ok to kill a cow to feed my kids? Most people would say "yes". Is it ok to kill Jeffrey Haemer to feed my kids? Most people would say "no".
Doing science means collecting data. The polls in the left sidebar test his claim.

We shall see, eh?

Thursday, April 8, 2010

Performance Tuning Shell Scripts? Why, yes.

Can I write a command-line that lists open TCP ports as quickly as nmap? No way. But can I make one that's fast? Yes, indeedy.

The trick is to do it all in a single shell command.

I always read Hal Pomeranz's weekly, Command Line Kung Fu blog. Inevitably, either I learn something because he tells me, or I learn something because it gets me thinking about how I might do what he's done differently. (That's not an exclusive-or.)

This week, Hal writes a command-line that looks for open TCP ports.

Here's his command. (For a dramatic reading, see his post.)
for ((i=1; i<65535; i++)); do ( echo > /dev/tcp/localhost/$i ) 2>/dev/null && echo $i; done
It isn't fast, and he ends with, "But really, if speed were a factor you'd be using nmap instead."

But can I get it to run faster? At least a little? Why, yes I can.

Replacing this:

for (( i=1; i<65535; i++ ))

by this:

for i in {1..65535}
and this:

echo > /dev/tcp/localhost/$i
by this:

> /dev/tcp/localhost/$i (or even < /dev/tcp/localhost/$i )
make minor improvements.

But a bigger win comes from getting rid of subshells.

The parens around the echo create a subshell, which requires a fork() and an exec(), each time through the loop.

By getting rid of those, and discarding error messages at the end of the loop, all the work takes place right in the parent shell.

How much does that improve things? A lot. Here are the numbers.

$ time nmap -p1-65535 --open localhost

real 0m1.366s
user 0m0.280s
sys 0m0.850s

$ time for (( i=1; i<65535; i++ )); do ( echo > /dev/tcp/localhost/$i ) 2>/dev/null && echo $i; done

real 1m55.727s
user 0m12.640s
sys 1m28.200s

$ time for i in {1..65536} ; do >/dev/tcp/localhost/$i && echo $i; done 2>/dev/null

real 0m6.203s
user 0m3.290s
sys 0m2.780s

Tom Christiansen claims he can usually write Perl scripts that run within a factor of 'e' (2.718281828...) of the equivalent C program. Here, I'm only doing half that well, but that's not bad.

Even in the shell, sometimes a little tweak makes a big difference.

I'd offer extra points to the reader who knows an attribution for the quote "Make it work, then make it fast," but that would require readers.

It was, however, Frank Zappa who said, "Speed will turn you into your parents."

Monday, March 29, 2010

There's a Lesson Here, but I Can't Remember What

Perhaps you are graced with a mind like a steel trap. I have always had a mind like a steel colander.

I frequently read stuff I wrote, and think, "That's clever. Too bad I can't remember ever knowing that, much less writing about it."

Just now, I was reading an RHCE-prep guide that was explaining pr. I thought, "pr? Geez. There's some ancient history. Next they'll be explaining FORTRAN line-printer-carriage-control codes."

This starts me reminiscing.

"There was an old, Software Tools filter, in RATFOR, that would interpret those codes, called asa (a reference to the American Standards Association, a progenitor of ANSI's). At some point, it was ported to C/Unix. I should see if it's on my Ubuntu desktop."
(As an aside, and before I forget to say it, Kernighan and Plauger's Software Tools is the best book ever written about software engineering.)
It's not there. I think, "Well, okay, I'll install it."

I try apt-cache search and don't find it. Rats.

I google for an Ubuntu version. Nothing. A Linux version? Nothing. Humph.

Well, surely it was in UNIX Version 7. I remember some work Tom Christiansen put in, collecting Perl implementations of old, V7 commands. Maybe he found an implementation of asa(1) that I can just port.

Except I can't remember what he called his collection. I go back to googling, this time for Tom's collection. After a bunch of failed tries, I finally get a hit. You guessed it: a column by Jeff Copeland and, um, me -- Software Ptools -- which I have no recollection of ever having written. How embarrassing.

I should have given up right there, while I was behind, but Noooo .... ("What would you pay? But wait! There's less!")

Had we provided the name of Tom's project? Sho 'nuff: "Perl Power Tools." Maddeningly, the link in our column has gone dead. The universe hates me and there's no beer in the fridge.

I scroll down, hoping for another link. Ooh! Look! There's code! We implement a V7 command, right there in the column, to contribute to PPT ourselves.

We implement asa(1)

Oh, ow.

(I've now learned that the entire Perl Power Tools project has been moved to the CPAN by Casey West.)

Sunday, March 14, 2010

Estimating: The Envelope, Please.

How much does it cost Amazon to ship me a Kindle book? About a nickel.

How much did it cost us to get letters saying we're going to get census forms? About $50 million.

How do I get these? Back-of-the-envelope calculations.

Back-of-the-envelope calculations are the quick calculations we do, from simple assumptions, to give us a sense of rough sizes. They may not let us tell whether the answer is 5 or 9, but they can let us see the answer isn't 5 billion -- a 5 followed by 9 zeros.

My sister, Jo, the Tattooed Lady, wondered out loud, this week, "... just how many millions of dollars it cost The US Commerce Dept (read 'us, the taxpayers'), to send everybody in the US a letter this week that says that they will be sending us a census report to fill out. 'Ooooooo. Look out!!!! Here it comes!!!' "

Let's do a back-of-the-envelope calculation. (No pun intended.) It's not hard.

How much does it cost to send a letter? A first-class stamp costs $0.44. The USPS loses money, which is why they want to cut back to 5-day-a-week delivery. So the real cost of processing and delivering a letter is something like $0.50. Could it be $0.30? Or $0.72? Maybe. But it's less than five bucks and more than a farthing.

What's the cost of producing each letter -- printing, stuffing, and so on? At Kinko's, they'd charge you somewhere between a nickel and a dime. Ditto for the public library. Real money, but we're still talking a total cost of around half a buck per letter.

They sent one to each household, and America has over 100 million of those.

We paid at least fifty million bucks for those letters. $50,000,000 . As Jo says, "Here it comes."

But what did we pay to draft the letter, translate it into a bunch of other languages, and get all that approved and processed through our Federal bureaucracy? Probably not even an extra fifty million.

Here's a second example: What sort of profit is Amazon making on Kindle books? I wondered this last year when I bought my Kindle.

Let's see .... Once they've paid the publisher for the book, they probably get a machine-readable version for next-to-nothing -- maybe free. Converting to the Kindle data format is probably done by a piece of software that they wrote once, and amortize across all their books, which means that probably doesn't contribute much either. Amazon's big cost is probably delivery -- what they pay Sprint to get it to us.

So how much is that? Hand me that envelope.

They'll sell me a subscription to a blog for about $2/month. The content is free if I have a browser, and I can't imagine they're trying to make a lot of money from these, either. The $2 is probably Amazon's delivery cost.

The kind of person who reads a blog on his Kindle is a junkie who, I'll guess, might read it three times a day. That's 30*3 = 90, or about a hundred deliveries a month: two cents a delivery. Books are bigger, but they come over so fast that I bet connection-set-up and -tear-down costs dominate the price.

Amazon sells new releases at $9.99. This calculation says almost all of that is profit. Their delivery cost, I guessed, was under a nickel a copy.

How close did I come? In a January press release, Amazon revealed it was "less than six cents."

When the government can send us useless letters by Kindle (or email), they'll cost us far less.

"But what could the government do with its vast inventory of surplus envelopes?" the politicians will ask.

Two suggestions come to mind.

Thursday, March 11, 2010

Collatz Conjecture

I like this shell script, by Kyle Anderson.

I found out about it because Paul Hummer's created a Northern Colorado Linux Blog aggregator.

Thanks, Paul! And Kyle.

Tuesday, March 9, 2010

Generating Arbitrary Numbers

Sometimes, "arbitrary" and "random" aren't synonyms. Here's an example of how to generate the former without their being the latter.

One nice thing about knowing people who make me think is that it gives me things to post about. For example, Hal Pomeranz, Ed Skoudis, Tim Medin, and Paul Asadoorian have a weekly blog, called Command Line Kung Fu, that compares and contrasts command-line tricks for different operating systems.

I only ever use Linux, so I read Hal's stuff and skip the Windows and DOS stuff. Even with this, every week or two Hal's post makes me think, "Wait! Here's something he didn't mention!" (typically because it's slightly off-topic).

In this week's column they generate random time intervals.

Here's Hal's punchline:
[...] in larger enterprises you might have hundreds or thousands of machines that all need to do the same task at a regular interval. Often this task involves accessing some central server-- grabbing a config file or downloading virus updates for example. If a thousand machines all hit the server at exactly the same moment, you've got a big problem. So staggering the start times of these jobs across your enterprise by introducing a random delay is helpful. You could create a little shell script that just sleeps for a random time and then introduce it at the front of all your cron jobs like so:

0 * * * * /usr/local/bin/randsleeper; /path/to/regular/cronjob
(The column sketches how to implement 'randsleeper'.)

Yep. This works fine.

But as it stands, the cronjob could kick off one job at 9:59, and the next one at 10:00. What if I want to spread my machines across the hour, but want each machine to use a fixed timeslot, so the elapsed time between runs is a full hour for any given machine?

Here's one way:
  1. Pick an arbitrary machine-specific number, like the IPv6 address or the MAC address of the Ethernet card,
  2. Convert it to an integer.
  3. Take it mod the time interval.
  4. Use that number for the time to start the job.
Here's code to do that, which, as always, I grow, bit-by-bit, on the command line, by getting a little piece right, recalling that piece, and adding another step.

  • Step 1:
Get a unique, but arbitrary, machine-specific identifier (the MAC address of the first NIC).
$ ifconfig | awk '/HWaddr/ {print $NF; exit 0}'
00:1e:c9:3d:c0:0c
  • Step 2:
Strip the colons
$ ifconfig | awk '/HWaddr/ {print $NF; exit 0}' | sed 's/://g'
001ec93dc00c
And interpret the result as a hex number. (The shell requires hex numbers begin with "0x", so I'll just tack that on.)
$ echo $(( 0x$(ifconfig | awk '/HWaddr/ {print $NF; exit 0}' | sed 's/://g') ))
132225286156
  • Step 3:
Mod it by the number of seconds in an hour, to get an arbitrary second.
$ echo $(( 0x$(ifconfig | awk '/HWaddr/ {print $NF; exit 0}' | sed 's/://g') % (60*60) ))
556
  • Step 4:
Always sleep until that many seconds after the hour, then kick off the job.
$ crontab -l > Cronjobs
$ echo "0 * * * * sleep \$(( 0x\$(ifconfig | awk '/HWaddr/ {print \$NF; exit 0}' | sed 's/://g') % (60*60) )); /path/to/regular/cronjob" >> Cronjobs
$ crontab Cronjobs
Ta-da.

(For step four, I'd probably actually kick off crontab -e and paste the line in; otherwise there are just too many ugly backslashes to get wrong.)

Warning: This will not work if your machines' MAC addresses cluster around the same value, mod 3600. :-)

Tuesday, March 2, 2010

Better Safe Than Sorry: Writing Code that Writes Safer Code

I write code that writes code. A lot. On the command line. It's safer.

Hal Pomeranz and co-conspirators have another fine post up about command-line programming. In it, they write a clever loop to rename a list of numbered attachments.

Here's Hal's code:

$ cat id-to-filename.txt | while read id file; do mv attachment.$id "$file"; done
(His input file is a two-column list, like this:
$ cat id-to-filename.txt
...
43567 sekrit plans.doc
44211 pizza-costs.xls
...
And, actually, Hal takes the list from stdin, with a less-than sign. Blogger whines and eats my posts when I use those -- it thinks I'm opening an unclosed HTML tag. What a pain.)

The quotes are there because without them the code tries to do this:

mv attachment.43567 sekrit plans.doc
which gets the mysterious message back
mv: target `plans.doc' is not a directory
$
Uh-oh.

When this happens, I usually don't know what the message means. Figuring it out eats time. Plus, with my luck, some files have been moved but others haven't. Recovering from that eats even more time.

Here's what I type instead:
  • First step: I write code that says what I'd like to do.
$ cat id-to-filename.txt | while read id file; do echo "mv attachment.$id $file"; done

...
mv attachment.43567 sekrit plans.doc
mv attachment.44211 pizza-costs.xls
...

Often, when I do this, I scan the output, notice something's going to go wrong, and fix it.

"Oh. Oops. I need quotes. I'm an idiot."

Note that no files were moved; my code's only echoing commands.
  • Next step: I recall my command-line, with an up-arrow, and add fixes. I keep doing that until the commands I see are the ones I actually want.
$ cat id-to-filename.txt | while read id file; do echo "mv attachment.$id '$file' "; done

...
mv attachment.43567 'sekrit plans.doc'
mv attachment.44211 'pizza-costs.xls'
...
Look okay? Yep.
  • Last step: I recall the previous command, one final time, and pipe it to a subshell, which executes the commands my code writes.
$ cat id-to-filename.txt | while read id file; do echo "mv attachment.$id '$file' "; done | bash

$
When I'm nervous about what I'm doing, I even try out the first line by itself, like this:
$ cat id-to-filename.txt | while read id file; do echo "mv attachment.$id '$file' "; done | head -1 | bash

$
I check the result, and if I've done the right thing I go ahead and run the rest.
$ cat id-to-filename.txt | while read id file; do echo "mv attachment.$id '$file' "; done | sed 1d | bash

$
"Never write code on the command line when you can write code that writes code on the command line," I always say.

Monday, March 1, 2010

NFS Made Easier

Automounting disks is magic. Autodetecting what to automount is magic-er.

Last weekend I set up my pogoplug as an NFS server, and installed and configured autofs to look for specific directories on the pogoplug. This weekend, I revisited that configuration and learned I was working too hard.

In /etc/auto.master, the entry "/net -hosts" says, "When I type 'ls /net/foo', do these steps:"
  1. look for a host named 'foo',
  2. ask foo what filesystems it's exporting,
  3. mount them under /net/foo,
  4. now do the 'ls'
No modifying /etc/auto.misc every time you want to automount a new machine: the machine just appears. There is an /etc/auto.net, but it's a script that autofs uses to ask a remote host what it's exporting.
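In other words, the whole configuration is one line. A sketch of a stock /etc/auto.master, assuming the server's hostname is pogoplug:

```
# /etc/auto.master
/net    -hosts

# That's it.  Then, for example,
#   ls /net/pogoplug
# asks pogoplug for its exports, mounts them, and lists them.
```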

Using remote filesystems could be even easier and more transparent (I could, for example, imagine having upstart manage the whole process, and having autofs be installed by default) but not much.

Tuesday, February 23, 2010

The Magic SysRq Key

My netbook hangs about once a day. I can now reboot it more easily, thanks to the magic SysRq key.

I'm not sure why it hangs, and I'm not very worried about it. Netbook releases are new, and I remember the same kinds of problems when Linux notebooks were new. I had one laptop that worked perfectly except for sound, the mouse, and networking. :-) The next release fixed all those. (Even now, I have a pair of old notebooks running Jaunty Jackalope because Karmic Koala won't run their fans. They overheat and shut themselves off within minutes.)
I even remember the problems when Unix PCs were new. I was in Dave Barach's living room, in 1983, when he got a call from our Maryland office, which had just received the editor Dave had sent them for our as-yet-unreleased PC/IX product, done under contract to IBM, which was to run on stock IBM PC/XTs.
I could only hear Dave's side of the conversation.
What do you mean, "The color's wrong"? What do you mean, "color"?
We'd never seen a PC with a color monitor. Weren't all displays monochromatic? Once we found out how they worked, Dave fixed the problem.
I am, however, discomfited by having to power-cycle a box -- I worry about unsynced disks and stuff. Luckily, some neuron fired and I remembered the magic SysRq key. I have had to use it so little, over the years, that I didn't remember any details, but Wikipedia came to my rescue. It does the trick, and lets me do a soft re-boot from my keyboard.

(My keyboard doesn't actually have a key labelled "SysRq," but Alt-"Print Scrn"-k works just as well.)

No muss, no fuss, no bad blocks. Magic.

Thrupence and sixpence every day.

Space, The Final Frontier

Last night, I gave my Dell netbook a terabyte drive.

A few weeks ago, Scott Mann talked me into buying a Pogoplug, an embedded-Linux device, about the size of my palm, with an ethernet port and four USB ports to hang external disks off of. Plug it in, plug in a drive (or four), and you're done: no muss, no fuss. Kristina gave me a terabyte drive for Christmas, so they're now up and running in one corner of a bookshelf.

Unfortunately, it's built to serve disks up through a web interface, so each byte goes out the door, off to their servers in San Jose or Boca Raton or Minneapolis or wherever they are, and back down to wherever it's wanted.

But wait! It's a Linux box.

The OpenPogo community has a repository of downloadable packages that you can use to customize it in a variety of ways. (It's a Debian-based distro, and the package manager is ipkg.) I turned it into an NFS server by installing unfs3, exported the disk, and it was instantly visible locally.

On my netbook, I then installed an automounter, autofs, and -- ta-daa! -- now the disk's there whenever I look at it.

Space. The final frontier. [ Cue Star Trek theme. ]

Tuesday, February 16, 2010

Where's My Jet Pack?

At some point, I'll outgrow the wonder of new technology, but I hadn't by this weekend.

My bedside computer is a Dell mini-10 netbook. I've had it a week. Two gig of memory, 160 gig of hard disk, three pounds. The 6-cell battery lasts for hours. Whoof.

I bought it, on-line, from Wal-Mart, who delivered it in three days. Same week. I installed three different OS's on it, and settled on Ubuntu 9.10 (Karmic Koala).

Ubuntu is a lineal descendant of Unix, which I first installed on an IBM PC/XT, of which this Dell is, itself, a lineal descendant. The XT had an Intel 8088 processor, 256Kb of memory, and a 10Mb hard disk with another 10Mb expansion chassis to give it enough space to host multiple users. Those, and its cathode-ray-tube screen, took up a big chunk of a desk.

A megabyte, for those who don't remember them, is a milli-gig. A kilobyte is a milli-meg. "You had ones *and* zeroes?"

Kristina doubled the netbook's memory with a kit, also from Wal-Mart, helped by a call to a friendly Dell tech support guy, in Chennai, India.

Let me just pause to say that again: a phone call to Chennai, India. My mother's childhood phone number was 1. They had the first phone in Haynesville, Louisiana. To call my grandmother, we talked to operators. "I think Stella's down 't the beauty parlor. I'll ring down there."

This weekend, on my computer, I watched Spartacus, in bed. I downloaded and read a Kindle book, bought on-line from Amazon.com, with "Kindle for PC." I made a Skype video call to my sister, Jo, in Oregon. I did a software release at work from my living room while I was arranging for a barbershop quartet, from the Boulder Timberliners, to come serenade Kristina for Valentine's Day, at a restaurant I took her to.

I arranged it on my cell phone -- you know, the phone I carry in my pocket? The one running Linux? With the videocamera in it? That I get email on? That could give me turn-by-turn, voice directions to get to the restaurant? Which it could do by knowing where I was from the signals it was getting from the GPS satellite, in outer space?

My father helped open Vandenberg A.F.B., America's first operational missile base. He had computers with big cabinets that held tape drives with 7" reels. The computers and drives and disks took up big rooms and had their own air conditioning. They had bugs, too. We watched the first Atlas ICBMs launch, go astray, and then blow up. Made for cool sunsets.

Of course, we drove to the Valentine's Day dinner in a car. When he was a kid, my father's little sister was run over and killed, in Brooklyn, by a horse cart.

Time to get up and shower, so I can find out what new things I'll see today.

I feel like Duck Dodgers.