Douxx

Posted on Feb 7

How I Built a Random Number Generator (Sort Of)

#programming #radio #python #opensource

TL;DR: I made an Hybrid Hardware Random Number Generator (HHRNG) using radio noise. You can find the full source code on my GitHub.

Generating randomness is fascinating, and I always wanted to go deeper than just importing some library into my python project and calling .random(). So I set out to build my own library with its own .random() function. Revolutionary, I know.

But I didn't want to just cobble together pseudo-random values. I wanted to build a hardware random number generator (HRNG): one that uses actual physical processes to generate entropy. Think Linux's /dev/random, which harvests entropy from environmental noise: keyboard timings, mouse movements, disk I/O patterns. Windows also have a component named CNG (Cryptography Next Generation) that uses similar inputs.

When I hear noise, the first thing that comes to mind is this:

For the uninitiated, this is an SDR (Software Defined Radio), a real-time visualization of the radio spectrum around me. Notice the constant flickering at the bottom? That's noise, pure and simple. Nothing is transmitting there, yet the intensity is constantly dancing. It's the same static you hear when tuning to an empty FM frequency. If we could capture and measure that movement, we'd have a source of true randomness, unpredictable and nearly impossible to manipulate.

So, based on that, I decided to get to work. Note that I will be using a SDR blog v4 to capture radio signals and compute them.

Here is a global overview of what I needed to do to get this project done:

☐ Find a way of getting the radio signals on my PC
☐ Process them to generate randomness
☐ Create the core functions
☐ Add other functions on top of that
☐ Check that everything works and isn't easily affected by external events

Setting Up the Foundation

So, first of all, I needed to find a way to programmatically access my radio data, served as samples. I'd tried this on Windows before with no luck, so this time I went straight to a template that connects over the network on an rtl_tcp server running on one of my Raspberry Pis.

Once everything setup, I was able to receive samples. The code is quite simple, it uses a socket to connect to the rtl_tcp server, and then reads samples from it. Here is a part of the read_samples function, that reads and processes the signals:

As we can see, it computes IQ samples. I and Q refer to two components of a complex baseband signal. They capture both amplitude and phase of the RF signal.

Well, anyways:

☑ ~~Find a way of getting the radio signals on my PC~~

Creating Randomness

The next step is building the "randomizer". It is basically a function that will take, as an input, samples from our SDR, and output a seed we will base our calculations on later.

To do this, I've tried a couple of ways (2, actually), but the most efficient method was using Phase difference.

The phase is the angle of the signal at a given moment. It can be easily calculated using both values of an IQ sample using this formula:
angle = atan2(Q, I).

We'll do this for both the current iteration value (n), and the previous one (n-1).

After getting both angles, we will retrieve the phase difference, that will tell us in which direction the signal rotated since the previous sample. It can be done like this: delta = (current_angle - previous_angle + π) % 2π - π

The wrapping in π is to ensure that the result stays between -π and +π, and doesn't make big jumps, like going from 2° to -359° just by rotating 3°.

We got the rotation angle, but that value is still too complex to be properly processed. We will reduce it to a simple bit. The easiest way to do this is by checking its direction:
bit = delta > 0 ? 1 : 0

Now that we got a bit, we're simply repeating this n samples times, to get a randomly generated bits array.

But there's a problem. After running the program, and logging some statistics, I observed more 1s than 0s, which isn't great, since the program would tend to go on the 1 side more than the 0 one.

Fixing that is easy, and there are plenty of methods available. I used a very simple one: XOR-ing all the values together, to spread the 1s and 0s. For those who don't know what it implies, it is a simple bit operation:

0 & 0 -> 0
0 & 1 -> 1
1 & 0 -> 1
1 & 1 -> 0

I used that to compare both bits each others, and the results were perfect: we went from a near 20% difference to a max of around 2%.

Finally, for convenience, I'm hashing the results to get an uniform seed to continue with (using SHA256).

☑ ~~Process the signals to generate randomness~~

Code explained in this part

Building the Core Functionalities

We now have our random seed source, but now we have to let the user actually use it.

My first plan: every time someone calls .random(), grab fresh samples from the SDR, generate a seed, and convert it to a float.

This was painfully slow: we were hitting the hardware for every single random number.

Solution: make it hybrid. Use a fast Pseudo-Random Number Generator (PRNG) that gets periodically reseeded with fresh radio entropy, instead of hitting the SDR every time.

This was fairly straightforward to achieve:

We first update our seeder to update its seed each X milliseconds, using a runner thread.
Then, we create a Pseudo-Random Number generator. I used a simple Linear Congruential Generator (LCG), made with the help of this very good video.
We change our code to constantly feed the generator with our new seed.

Now the code uses the LCG as a base, but it is always reseeded with our seeds taken from our radio samples.

Now, let's build our core random class, that we will call RFDom. We will optionally pass arguments to it, such as the rtl_tcp host, frequency ranges to scan, gain, and other settings.

.random() (returns [0.0, 1.0)) was pretty simple to do, as it used the same code as the youtube tutorial, same for .randint(a: int, b: int), that returns [a, b]. I've just had to implement a include: bool parameter to the LCG API, but that was quite easy.

LCG .get_float()

RFDom .random()

And done !

☑ ~~Create the core functions~~

Adding Other Stuff

After adding .random() -> [0.0, 1.0), .uniform(a: float, b: float) -> [a, b], and .randint(a: int, b: int) -> [a, b] I've looked to the core functions of the python random library, and tried to replicate them. Some of them were really easy, some were quite a bit harder to do. I'm not going to explain all of them, since it's just a bunch of formulas, but here are two that I found quite interesting:

1. Choices

The .choices() func takes at max 4 arguments:

population: a sequence of objects
weight (optional): the probability weight of each object to get picked
cum_weights (optional): preprocessed cumulative weights
k (optional): the number of objects to pick

The way it works is pretty simple, even tho it was a bit hard to implement:

Each population object has a weight, either the same for every object, or a given one for each object.
If no weight (or cum_weights) has been provided, simply take a random object in the population array k times.

If a weight array is provided, we will need to build a cum_weight list, if it isn't already provided.

The cumulative weights represent the range where a number falls into an object, like this:

objects -> ["apple", "banana", "orange"]
weights -> [4,        1,       6]

cum_weights = [4,     5,       11]

=> 
apple -> [0, 4)
banana -> [4, 5)
orange -> [5, 11)

Choose a random object k times, based on the weights:

we use .uniform(0, max_weight) to get a float value, get the object falling in that range, add it to the list

Return the list: we return a list of k elements chosen randomly

2. shuffle

The .shuffle() function is much simpler, but I wanted to talk about the algorithm.
It takes a list as the only argument and edits it in-place (doesn't return it, but changes the original).

To do this, we're using the Fisher-Yates algorithm:

Starting from the beginning of the list, for each index i, a random index j is chosen from the range [i, n - 1], and the elements at positions i and j are swapped.
Repeat until everything has been shuffled.

It is represented by this code:

☑ ~~Add other functions on top of that~~

Trying to break the randomizer

My first idea (before even starting the project) was to try to break it out by overflowing the bandwidth using this tool.

I then made this simple code to have an eye on the values

I ran it, first without any special radio broadcast or whatsoever.

Then, I ran the program, and I observed. No changes at all, but that was predictable. Even with the bandwidth saturated, oscillations in the signal are still there, and the phase difference still works. I also later had an AI analyze the results, which confirmed there weren't any major changes.

☑ ~~Check that everything works and isn't easily affected by external events~~

For the End

This project began with a simple question: where does randomness actually come from ?, and turned into a deep dive into radio physics, signal processing, entropy extraction, and practical limits.

Building this taught me that randomness isn't magic: it's trade-offs. Hardware entropy is slow but truly random. PRNGs are fast but predictable. Combining both? That's where it gets interesting.

Is this generator cryptographically secure ? Probably not. Is it faster than Python's built-in random ? Definitely not. Would I recommend using this in production (or in general) ? Absolutely not.

But that was never the point.

In the end, I won't replace the casual import random when I need randomness, but I learned how this works behind the scenes, even if it's only a fraction of the modern random generators.

If you're interested into seeing the full code, I uploaded it to my GitHub. Usage examples can be found in the /examples/ dir, and installation instructions on the README.

Questions? Ideas?

If you have any questions about the implementation, suggestions for improvements, or you've experimented with similar approaches, feel free to drop a comment below. I'm curious to hear what you think!