Tuesday, September 1, 2009

Another Challenge From Hal & Ed: Random Numbers In the Shell

Hal Pomeranz and Ed Skoudis have a weekly blog, Command Line Kung Fu, where they set and solve interesting problems in both the Unix/Linux shell and Windows's CLI, COMMAND.COM.

This week's was how to generate random 8-digit integers.

Obligatory pedantic sidebar, which you can skip:

"Random" can mean a lot of things. If I flip a coin, and announce either "00000000" or "00000001", that's an 8-digit, random number. Ho hum. What I'm looking for here is integers uniformly distributed in the interval 0000000..99999999. I'll write this as U(0,10^8-1).
The problem is trickier than you might think. For example, the shell's random-number generator, $RANDOM, is U(0,2^15-1). Not big enough.

How about, say, summing calls to $RANDOM until the number is big enough?

Nope. Adding a bunch of uniformly-distributed numbers gives a random number, all right. But it isn't uniformly distributed anymore. It's a clumped-up, bell curve.

Hal explores several routes, culminating with a good one-liner:

$ head /dev/urandom | tr -dc 0-9 | cut -c1-8
But is there a way with the shell's own built-ins?

Yep. The answer's below. What's more, because this way doesn't call outside utilities or fork subshells, it should, in theory, be faster. And it is.

How much faster?

On my machine, the one-liner above generates 10,000 random numbers in just under 40 seconds. The code below does the same 10,000 in about a quarter of a second.

I've written it out as a script, with a dramatic reading in the comments, but if I were using it in something else, I'd ditch the big comments and put the code in-line.
#!/bin/bash
## Generate a uniformly distributed,
## 8-digit, random number: U(0,99999999)

## First, the logic

# (1) Get a uniformly-distributed random number,
# N = $RANDOM, up to R = 2^15-1 = 32767.
# That's U(0,R)

# (2) Stretch out the interval covered,
# up to nearly T = 10^8-1
# by multiplying by C = K/R.
# (We'll figure out K in a minute)
# This turns 0,1, 2, 3, ... into 0, C*1, C*2, ... K

# (3) Now add a random shift S = U(0,C-1)
# to turn these into 0, 0+1, 0+2, ... 0+C-1,
# C*1+0, C*1+1, ..., K+0, K+1, ... K+C-1.
# We can get S from another random number M = $RANDOM % C
# (% is the 'mod' operator)

# What's K? We want K+C-1 = T = 10^8-1, so we solve:
# K+C-1 = K+(K/R)-1 = 10^8-1; K[ (R+1)/R ] = 10^8;
# K = (10^8)*R/(R+1)

# For big R, that's so close to T
# that I'll just use K=T, C=T/R
# Our U(0,10^8-1) number is N*C + M

## Next, the code

# calculate the constants
(( R = 2**15-1, T = 10**8-1, C = T/R ))
readonly R T C

# do the calculation, and print the result out
# with 8, full digits.
printf "%8.8u\n" $(( RANDOM*C + ( RANDOM%C ) ))
Nothing up my sleeve ... Presto! Not bad, for a shell.


Update: Hal helpfully points out several things.
  • I misspelled his name.
Oof. Sorry. Fixed. Thanks.
  • There may be even faster solutions.
if the task were to generate 10K 8-digit
random numbers, I'd just suck 80K digits out of /dev/urandom and
chop them up into 8-digit chunks. That would be considerably
faster than running my command line 10K times in a row.
Maybe so .... Is there a way to do that in the shell? It's thought-provoking and an interesting challenge.

You couldn't store the 80K digits in the code, but you could keep from having to store them in a file, and the attendant I/O slowdown, by just providing them as output from a pipe.

(I'm told that in Windows, pipes are implemented with intermediate files. Not so in Unix and its offspring--the system really does hand data directly from one process to another.)

So, is there an easy way, in a shell script, to pull 8 digits at a time out of standard in?

  • Execution efficiency isn't why you use the shell.
If the task is to generate a single 8-digit number, I claim my solution
is better from a typing perspective. :-)
Just so. Hal's exactly right.

1 comment:

mrkdvsn said...

Obviously, the collection of a writer must be a bundle of creativity. New ideas of writing become the reason of success. It makes your work worthwhile. Everyone wants to praise your writing because they bound by creativity. mediumschap opleiding