Recently, I tackled an interesting coding task; the objective was to implement a method, next_num()
, that generates random numbers based on predefined probabilities.
Below, I’ll outline the task, describe my approach, and provide a link to the GitHub repository containing the complete code and unit tests.
Task Overview
The task was to implement a method, next_num()
, which generates random numbers from a predefined list based on associated probabilities. Here’s a quick summary of the requirements:
- Input:
- A list of random numbers: [-1, 0, 1, 2, 3]
- A list of corresponding probabilities: [0.01, 0.3, 0.58, 0.1, 0.01]
- Output:
- A random number from the list based on the specified probabilities.
- Performance:
- The method should be well-structured and tested, as if it were to be deployed in a production environment.
Solution Approach
1. Understanding Probability Distribution
The task involves creating a probability distribution over a set of numbers. Each number in the list has a probability associated with it, indicating how likely it is to be chosen. For example:
- The number 0 has a probability of 0.3, meaning it should be selected approximately 30% of the time when the function is called.
- The number 1 has the highest probability of 0.58, so it should be the most frequently selected number.
Importantly, the probabilities must sum to 1. If they do not, we need to normalize them.
2. Normalization
If the given probabilities do not sum to 1, we can normalize them by dividing each probability by the total sum of the probabilities.
Formula for Normalization:
where:
- is the normalized probability,
- is the original probability, and
- is the total sum of all probabilities.
3. Cumulative Distribution Function (CDF)
To facilitate random selection based on the given probabilities, we construct a Cumulative Distribution Function (CDF). The CDF is a running total of the probabilities.
For a list of normalized probabilities , the CDF can be calculated as follows:
where is the cumulative probability up to the -th number.
4. Random Selection
To select a number based on the defined probabilities, generate a random number in the range . Compare this random number to the CDF:
- If , select the first number.
- If , select the second number, and so forth.
Why this works?
Over a large number of trials, the relative frequencies of each number should converge to the defined probabilities. For example, if the probability of selecting the number 1
is 0.58
, then out of 1000 selections, you would expect approximately 580 of them to be the number 1
.
Conclusion
The implementation of the next_num()
function serves as a practical example of how to create a probability-based random number generator in Python. The accompanying unit tests provide additional confidence that the function operates correctly and adheres to the specified probabilities.
You can find the complete code and unit tests here on Github
Feel free to reach out with any questions or insights!