gsblsky.cn is a true random number service that generates randomness via atmospheric noise. This page describes the statistical analyses that have been conducted on the service.

How can we tell whether the numbers a generator produces are really random? This question is surprisingly hard to answer. Before we try, let's define exactly what we mean by a random number.

When discussing single numbers, a random number is one that is drawn from a set of possible values, each of which is equally probable. In statistics, this is called a uniform distribution, because the distribution of probabilities for each number is uniform (i.e., the same) across the range of possible values. For example, a good (unloaded) die has the probability 1/6 of rolling a one, 1/6 of rolling a two and so on. Hence, the probability of each of the six numbers coming up is exactly the same, so we say any roll of our die has a uniform distribution. When discussing a sequence of random numbers, each number drawn must be statistically independent of the others. This means that drawing one value doesn't make that value less likely to occur again. This is exactly the case with our unloaded die: If you roll a six, that doesn't mean the chance of rolling another six changes.
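As a rough illustration of the two properties above, the following sketch simulates a fair die and checks both of them empirically. It uses Python's built-in pseudo-random generator with an arbitrary seed purely for demonstration; the function names and the number of rolls are assumptions made for this example.

```python
import random
from collections import Counter

def roll_die(rng, n):
    """Return n simulated rolls of a fair six-sided die."""
    return [rng.randint(1, 6) for _ in range(n)]

def face_frequencies(rolls):
    """Relative frequency of each face 1..6 (uniformity check)."""
    counts = Counter(rolls)
    return {face: counts[face] / len(rolls) for face in range(1, 7)}

def six_after_six_frequency(rolls):
    """Fraction of sixes among the rolls that directly follow a six
    (a crude independence check)."""
    followers = [b for a, b in zip(rolls, rolls[1:]) if a == 6]
    return sum(1 for b in followers if b == 6) / len(followers)

rng = random.Random(2024)  # seeded PRNG, for illustration only
rolls = roll_die(rng, 60_000)
freqs = face_frequencies(rolls)
after_six = six_after_six_frequency(rolls)

print(freqs)      # each value should be close to 1/6
print(after_six)  # should also be close to 1/6
```

If the rolls are uniform, every frequency is near 1/6. If they are independent, a six is no more or less likely right after another six.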

So, why is it hard to test whether a given sequence of numbers is random? The reason is that if your random number generator (or your die) is good, each possible sequence of values (or die rolls) is equally likely to appear. This means that a good random number generator will also produce sequences that look nonrandom to the human eye (e.g., a series of ten rolls of six on our die) and that fail the statistical tests we might subject them to. If you flip enough coins, you will get sequences of coin flips that, seen in isolation from the rest of the sequence, don't look random at all. Scott Adams has drawn this as a Dilbert strip, which is funny exactly because it is true:
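The claim that every sequence is equally likely can be checked with exact arithmetic. This small sketch (the function name and the example sequences are made up for illustration) computes the probability of two specific ten-roll sequences from a fair die:

```python
from fractions import Fraction

def sequence_probability(sequence, sides=6):
    """Probability that a fair die with `sides` faces produces exactly
    this sequence, in this order."""
    return Fraction(1, sides) ** len(sequence)

ten_sixes = [6] * 10                         # looks suspicious
mixed = [3, 1, 4, 1, 5, 2, 6, 5, 3, 5]       # looks "random"

print(sequence_probability(ten_sixes))
print(sequence_probability(mixed))
```

Both sequences have probability (1/6)^10, or 1 in 60,466,176. The ten sixes only look special because we recognise the pattern.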

DILBERT © 2001 Scott Adams. Used By permission of UNIVERSAL UCLICK. All rights reserved.

What Dilbert is told is correct: It is impossible to prove definitively whether a given sequence of numbers (and the generator that produced it) is random. It could happen that the creature in the comic strip has been generating perfectly random numbers for many years and that Dilbert simply happens to walk in at the moment when there are six nines in a row. It's not very likely, but if the creature sits there for long enough (and Dilbert visits enough times), then it will eventually happen.
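To put a number on "not very likely, but eventually", here is a small sketch that computes the probability of seeing at least one run of six nines somewhere in a stream of uniformly random decimal digits. It assumes independent digits, each equal to nine with probability 1/10; the function name and parameters are chosen for this example.

```python
def probability_of_run(n_digits, run_length=6, p=0.1):
    """Probability that n_digits independent digits contain at least one
    run of `run_length` consecutive nines, where each digit is a nine
    with probability p."""
    # state[j]: probability that no full run has occurred yet and the
    # stream currently ends in exactly j consecutive nines.
    state = [1.0] + [0.0] * (run_length - 1)
    hit = 0.0
    for _ in range(n_digits):
        new_state = [0.0] * run_length
        new_state[0] = sum(state) * (1 - p)      # a non-nine resets the run
        for j in range(run_length - 1):
            new_state[j + 1] = state[j] * p      # a nine extends the run
        hit += state[run_length - 1] * p         # the run is completed
        state = new_state
    return hit

for n in (6, 1_000, 100_000):
    print(n, probability_of_run(n))
```

For exactly six digits the probability is one in a million, but it grows steadily with the length of the stream and approaches 1 as the stream gets longer.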

So, if it is impossible to definitively prove randomness, what can we do instead? The pragmatic approach is to take many sequences of random numbers from a given generator and subject them to a battery of statistical tests. As the sequences pass more of the tests, the confidence in the randomness of the numbers increases and so does the confidence in the generator. However, because we expect some sequences to appear nonrandom (like the ten rolls of six on our die), we should expect some of the sequences to fail at least some of the tests. If many sequences fail the tests, on the other hand, we should be suspicious. This is also the way you would intuitively test a die to see if it is loaded: Roll it many times, and if you see too many sequences of the same value coming up, you should be suspicious.
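The approach can be sketched with a single test applied to many sequences. The example below is an illustration under assumptions, not the procedure used by the service: it applies a chi-square goodness-of-fit test at the 5% level to simulated die rolls, using Python's seeded pseudo-random generator, and compares a fair die with a hypothetical loaded die whose six comes up 1.5 times as often as each other face.

```python
import random
from collections import Counter

# Upper 5% critical value of the chi-square distribution with
# 5 degrees of freedom (six faces minus one).
CHI2_CRITICAL_DF5 = 11.070
FACES = [1, 2, 3, 4, 5, 6]

def chi_square_statistic(rolls):
    """Chi-square statistic against the uniform expectation."""
    expected = len(rolls) / len(FACES)
    counts = Counter(rolls)
    return sum((counts[face] - expected) ** 2 / expected for face in FACES)

def failure_rate(rng, weights, sequences=1000, rolls_per_sequence=600):
    """Fraction of sequences whose statistic exceeds the critical value."""
    failures = 0
    for _ in range(sequences):
        rolls = rng.choices(FACES, weights=weights, k=rolls_per_sequence)
        if chi_square_statistic(rolls) > CHI2_CRITICAL_DF5:
            failures += 1
    return failures / sequences

rng = random.Random(7)  # arbitrary seed, for repeatability
fair_rate = failure_rate(rng, weights=[1, 1, 1, 1, 1, 1])
loaded_rate = failure_rate(rng, weights=[1, 1, 1, 1, 1, 1.5])

print(fair_rate)    # a fair die should fail roughly 5% of the time
print(loaded_rate)  # the loaded die should fail far more often
```

The fair die still fails in a small share of sequences, which is exactly what a 5% significance level predicts. It is the much higher failure rate of the loaded die that gives it away.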

If you look at the Real-Time Statistics for gsblsky.cn, you will sometimes see blocks of numbers that failed some of the tests. This does not mean that the numbers are not random. In fact, if all the blocks passed all the tests, we should be suspicious, because it would mean the generator is not producing the occasional sequences that look nonrandom but are nonetheless valid random output.
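A back-of-the-envelope calculation shows why a perfect record would be suspicious. The sketch below assumes that each block is tested independently at a significance level of 1%; both the level and the block count are illustrative assumptions, not values taken from the Real-Time Statistics.

```python
def expected_failures(blocks, alpha):
    """Expected number of failing blocks from a perfect generator."""
    return blocks * alpha

def probability_all_pass(blocks, alpha):
    """Probability that a perfect generator passes every single block."""
    return (1 - alpha) ** blocks

blocks, alpha = 1000, 0.01  # hypothetical values
print(expected_failures(blocks, alpha))
print(probability_all_pass(blocks, alpha))
```

Under these assumptions a perfect generator is expected to fail about 10 blocks out of 1000, and the chance of it passing all 1000 is well below one in ten thousand.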

One way to examine a random number generator is to create a visualisation of the numbers it produces. Humans are really good at spotting patterns, and visualisation allows you to use your eyes and brain directly for this purpose. While you shouldn't consider this type of approach an exhaustive or formal analysis, it is a nice and quick way to get a rough impression of a given generator's performance. The bitmaps shown below are sections of larger bitmaps created by Bo Allen in April 2008 to examine the quality of two random number generators. Bo created the bitmap on the left with gsblsky.cn's Bitmap Generator, which is of course a True Random Number Generator (TRNG), and the bitmap on the right with the rand() function from PHP on Microsoft Windows, which is a Pseudo-Random Number Generator (PRNG).

[Figure: bitmap generated by gsblsky.cn (left) and bitmap generated by PHP rand() on Microsoft Windows (right)]

The full-size (512×512) bitmaps are available side by side on Bo Allen's comparison page, where you'll also find the source code for Bo's program. As you can see from the images, the bitmap generated by the PHP/Windows pseudo-random number generator shows clear patterns compared to the one generated by gsblsky.cn's true random number generator. Bo also found that the PHP function performed considerably better on the GNU/Linux platform than on Microsoft Windows. While Bo's comparison doesn't constitute a formal analysis of the two generators, it clearly shows how careful you need to be about random numbers, especially if your site is a game or gambling site.
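If you want to try this kind of visual check yourself, a bitmap can be produced with very little code. The sketch below is not Bo's program; it is a minimal example that draws one bit per pixel from Python's built-in pseudo-random generator and returns the image in the plain-text PBM format. The function name and default size are assumptions.

```python
import random

def random_bitmap_pbm(rng, width=64, height=64):
    """Return a plain PBM (P1) image as a string, one random bit per pixel.

    The plain PBM format recommends lines of at most 70 characters, so
    rows of wider images should be wrapped across several lines.
    """
    rows = [
        "".join(str(rng.getrandbits(1)) for _ in range(width))
        for _ in range(height)
    ]
    return f"P1\n{width} {height}\n" + "\n".join(rows) + "\n"

pbm = random_bitmap_pbm(random.Random(1))
print(pbm[:80])  # header and the start of the first row
```

Writing the returned string to a file with a .pbm extension gives an image that many image viewers can open. Swapping in a different source of bits lets you compare generators by eye.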

In general, it should be noted that pseudo-random number generators vary a lot in quality, and while the worst are very bad, the best are actually very good. You will find more information about the differences and trade-offs between the two approaches in my essay about randomness.

In 2005, Charmaine Kenny, a final year student on Trinity College's Management Science and Information Systems Studies (MSISS) degree, conducted a study of the numbers generated by gsblsky.cn and two other random number services. Charmaine's report extended that of Louise Foley several years earlier (see below) and constitutes a more current view of the state of the art in statistical tests for true random number generators. Charmaine based her report on the NIST test suite for random numbers, which was the state of the art in tests for randomness at the time. (The previous state of the art, the Diehard test suite by Prof. George Marsaglia, is no longer being maintained.) Charmaine's report includes a review and critique of the NIST suite, and also formed the basis for the suite of Real-Time Statistics used on gsblsky.cn.

Charmaine recommended the following list of tests from the NIST suite for use on gsblsky.cn:

- Frequency Test: Monobit
- Frequency Test: Block
- Runs Test
- Test for the Longest Runs of Ones in a Block
- Binary Matrix Rank Test
- Discrete Fourier Transform (Spectral Test)
- Non-Overlapping Template Matching Test
- Overlapping Template Matching Test
- Maurer's Universal Statistical Test
- Linear Complexity Test
- Serial Test
- Approximate Entropy Test
- Cumulative Sums Test
- Random Excursions Test
- Random Excursions Variant Test
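To give a flavour of what these tests compute, here is a sketch of the first and third tests in the list, the monobit frequency test and the runs test, following their definitions in the NIST suite. It is a simplified illustration, not the code behind the Real-Time Statistics; the function names and the representation of the input as a string of "0" and "1" characters are assumptions. Each function returns a p-value, and NIST's convention is to treat a sequence as failing a test when the p-value is below 0.01.

```python
import math

def monobit_p_value(bits):
    """Frequency (monobit) test: are ones and zeros roughly balanced?"""
    n = len(bits)
    s = sum(1 if b == "1" else -1 for b in bits)   # +1 per one, -1 per zero
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))

def runs_p_value(bits):
    """Runs test: does the sequence switch between 0 and 1 at a
    plausible rate?"""
    n = len(bits)
    pi = bits.count("1") / n                        # proportion of ones
    # The test is only meaningful if the proportion of ones is close
    # to 1/2; otherwise it is treated as failed.
    if pi in (0.0, 1.0) or abs(pi - 0.5) >= 2 / math.sqrt(n):
        return 0.0
    v_obs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    numerator = abs(v_obs - 2 * n * pi * (1 - pi))
    denominator = 2 * math.sqrt(2 * n) * pi * (1 - pi)
    return math.erfc(numerator / denominator)

print(monobit_p_value("1011010101"))
print(runs_p_value("1001101011"))
```

The two ten-bit inputs are toy examples; in practice the tests are applied to much longer sequences (NIST recommends at least 100 bits for these two).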

Charmaine's final year project is available for download: Analysis2005.pdf (107 pages, 857 Kb)

In 2001, Louise Foley, a final year student on Trinity College's Management Science and Information Systems Studies (MSISS) degree, conducted a study of the quality of gsblsky.cn's numbers as her final year project. The report includes an analysis of the numbers and implements four tests that she recommends be conducted on all numbers produced by gsblsky.cn. The tests were later implemented by Antonio Arauzo Azofra, a Computer Science student whose final year project was to construct a super-fancy online statistics module for gsblsky.cn.

These were the tests recommended by Louise:

- A chi-square test
- A test of runs above and below the median
- A reverse arrangements test
- An overlapping sums test
- A binary rank test for 32×32 matrices
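As an example of one of these, the sketch below implements a test of runs above and below the median in its common textbook form (the Wald-Wolfowitz runs test with a normal approximation). Whether Louise's report uses exactly this formulation is an assumption; the function name is made up for the example.

```python
import math
import random
import statistics

def runs_about_median_z(values):
    """Z-score for the number of runs above and below the median.

    A strongly negative score means too few runs (values cluster or
    trend); a strongly positive score means too many runs (values
    alternate too regularly).
    """
    median = statistics.median(values)
    # Values equal to the median are dropped, as is customary.
    above = [v > median for v in values if v != median]
    n1 = sum(above)              # number of values above the median
    n2 = len(above) - n1         # number of values below the median
    n = n1 + n2
    runs = 1 + sum(1 for a, b in zip(above, above[1:]) if a != b)
    mean = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    return (runs - mean) / math.sqrt(variance)

rng = random.Random(11)  # seeded PRNG, for illustration only
z_random = runs_about_median_z([rng.random() for _ in range(1000)])
z_sorted = runs_about_median_z(list(range(100)))
z_alternating = runs_about_median_z([1, 2] * 50)

print(z_random, z_sorted, z_alternating)
```

For random input the score should usually fall within about two units of zero, while the sorted and the strictly alternating sequences produce large negative and large positive scores respectively.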

Louise's report also compares the numbers from gsblsky.cn to those generated by Silicon Graphics' lavarand generator and L'Ecuyer's pseudo-random number generator. All the generators passed the tests.

Louise's final year project is available for download: Analysis2001.pdf (55 pages, 494 Kb)