Building a Probability-Based Random Number Generator

Recently, I tackled an interesting coding task; the objective was to implement a method, next_num(), that generates random numbers based on predefined probabilities. Below, I’ll outline the task, describe my approach, and provide a link to the GitHub repository containing the complete code and unit tests.

Task Overview

The task was to implement a method, next_num(), which generates random numbers from a predefined list based on associated probabilities. Here’s a quick summary of the requirements:

Input:
- A list of random numbers: [-1, 0, 1, 2, 3]
- A list of corresponding probabilities: [0.01, 0.3, 0.58, 0.1, 0.01]
Output:
- A random number from the list based on the specified probabilities.
Performance:
- The method should be well-structured and tested, as if it were to be deployed in a production environment.

Solution Approach

1. Understanding Probability Distribution

The task involves creating a probability distribution over a set of numbers. Each number in the list has a probability associated with it, indicating how likely it is to be chosen. For example:

The number 0 has a probability of 0.3, meaning it should be selected approximately 30% of the time when the function is called.
The number 1 has the highest probability of 0.58, so it should be the most frequently selected number.

Importantly, the probabilities must sum to 1. If they do not, we need to normalize them.

2. Normalization

If the given probabilities do not sum to 1, we can normalize them by dividing each probability by the total sum of the probabilities.

Formula for Normalization:

P' = \frac{P_i}{\sum P}

where:

$P'$ is the normalized probability,
$P_i$ is the original probability, and
$\sum P$ is the total sum of all probabilities.

3. Cumulative Distribution Function (CDF)

To facilitate random selection based on the given probabilities, we construct a Cumulative Distribution Function (CDF). The CDF is a running total of the probabilities.

For a list of normalized probabilities $P'$ , the CDF can be calculated as follows:

CDF_i = \sum_{j=1}^{i} P'_j

where $CDF_i$ is the cumulative probability up to the $i$ -th number.

4. Random Selection

To select a number based on the defined probabilities, generate a random number $r$ in the range $[0, 1)$ . Compare this random number $r$ to the CDF:

If $r < CDF_1$ , select the first number.
If $CDF_1 \leq r < CDF_2$ , select the second number, and so forth.

Why this works?

Over a large number of trials, the relative frequencies of each number should converge to the defined probabilities. For example, if the probability of selecting the number 1 is 0.58, then out of 1000 selections, you would expect approximately 580 of them to be the number 1.

Conclusion

The implementation of the next_num() function serves as a practical example of how to create a probability-based random number generator in Python. The accompanying unit tests provide additional confidence that the function operates correctly and adheres to the specified probabilities.

You can find the complete code and unit tests here on Github

Feel free to reach out with any questions or insights!