4 Comments
Jul 19Liked by Jacob Bayless

I'm reminded of random number generators. Designing them seems to be a dark art, but one thing they try to do is to maximize the period. Eventually every finite state machine must repeat, but by having a long period, they put that off as long as possible.

From: https://www.pcg-random.org/rng-basics.html

> A pseudo-random number generator is like a big codebook. In fact, a 32-bit generator with a period of 2^32 is like a codebook that is 16 GB in size, which is larger than the compressed download for the whole of Wikipedia. But a 32-bit generator with a period of 2^64 is like 64 EiB (exbibytes), or about six times a mid-1999 estimate of the information content of all human knowledge. How big would an actual book of 2^64 random numbers be? It would be so large, it would require the entire biomass of planet earth to make!

> One difference from a spy's codebook is that in a typical random generator everyone has the book. Thus, whereas spies would keep their codebook secret and could allow anyone to know the page number, random number generators do the opposite—the sequence of the generator (the codebook in our analogy) is known, but if we want to avoid prediction, no one must know where in that sequence we are looking. If the sequence is large enough (and there is no discernible pattern to the numbers), it will be infeasible to figure out what numbers are coming next.

Also, if you have a large enough state space and a clever design, you can set up a random number generator to generate Shakespeare:

From: https://www.pcg-random.org/party-tricks.html

> An argument for k-dimensional equidistribution goes like this: Suppose you bought a lottery ticket every week—how would you feel if you discovered that the clerk at the store was handing you a fake ticket and pocketing the money because, at 259 million to one, you were never going to win anyway. You might, rightly, feel cheated. Thus as unlikely as any particular k-tuple might be, we ought to have a real chance, however remote, of producing it.

> An argument against providing k-dimensional equidistribution (for large k) is that infinitesimal probabilities aren't worth worrying about. You probably aren't going to win the lottery, and your monkeys won't write Shakespeare.

> But what if rather than blindly search for Shakespeare, we manufacture a generator that is just on the cusp of producing it. [...]

Expand full comment
author

There's a close connection! Many pseudorandom number generators are built using chaotic maps (which is what the pure-math people call chaotic systems). They tend to be discrete-time systems which means that they can get away with fewer than 3 dimensions (because the state vector can "teleport" without crossing itself) but they repeat eventually because the state vector has finite resolution.

It's definitely a dark art, and not only do you want the period to be as long as possible, you also want to make sure that every finite-length sub-sequence reoccurs as often as possible. Otherwise, an attacker could try to learn the state vector and guess the next random number by recording the previous k samples, and matching that pattern to some unique index, as you mentioned.

RNGs don't really have control inputs as far as I'm aware, so there's not much we can do to apply feedback to guide their future evolution into some desirable state (and that's probably a good thing...), but it's definitely an interesting application of chaos theory!

This suggests another interesting question -- with a chaotic system, we may struggle to predict the future state due to finite measurement precision of the present state. But what if we can supplement that with imprecise measurements of several previous states? Sensitivity to initial conditions goes both ways (for positive and negative time), meaning that other states within measurement error of the present state would have very different past histories than the trajectory we've observed; therefore we can rule them out as possibilities, and be more certain about the true state. If the system has "cryptographically-secure" chaos, that may not help much, but perhaps there are practical chaotic systems where we can radically improve our predictive power this way. I'm not sure.

Looking for Shakespeare in a PRNG also reminds me of looking for sequences within the digits of pi. I suppose some programs that calculate pi's digits can be formulated as a recurrence relationship, so maybe there's a family of pi-calculating chaotic maps...?

Expand full comment
Jul 19·edited Jul 19Liked by Jacob Bayless

In this case, the trick is that PCG has an “extended” generation scheme with a large state space that consists of many linear congruentual generators. Each of these can be set up to reach a particular value in a given number of steps, which when combined, output a particular value.

Unlike some generators, PCG doesn’t reveal all of its internal state in the random number sequence - there is a mixer function. So this predictability isn’t as bad as it might seem.

There are also “counter based” random number generators where the internal state is trivially predictable (it’s a simple counter), but the mix function is complicated and difficult to reverse. A cryptography secure random number generator can be counter-based if the mixer is good enough - a secure hash. These are too slow for some purposes, though.

Regarding the repetition of subsequences, the goal seems to be to have them repeat as often as they normally would by chance for a true random number generator. However, there’s a limit to how much of the period you can use - another reason why having a large enough period is important.

Expand full comment
Jul 17Liked by Jacob Bayless

God may not play dice, but she does play billiards.

Fantastic article, I'm going to come back and savor it.

Expand full comment