Birth Month Simulation

Birth Month Simulation#

We can use the built in random choice feature of numpy to simulate non-uniformly distributed events. For instance below we consider the situation where we want to simulate the choice of a costumer at a chips shop. The costumer can chooce to have mayonaise, ketchup, curry or peanut sauce on their chips. We simulate the choice the costumer makes by assigning each choice a certain probability.

import micropip

await micropip.install("seaborn")
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
food = ["mayonaise", "ketchup", "curry", "peanut sauce"]
p = [0.6, 0.1, 0.15, 0.15]
food_choice = np.random.choice(food, p=p)
print(food_choice)

Now let’s consider simulation of the birth months. So we will simulate a town of people together with their birth months. We use three different models for the birth months distributions. Namely, the uniform distribution over months, a uniform model over days that does not take leap days into account and a uniform model over days that does take leap days into account.

months = [
    "January",
    "February",
    "March",
    "April",
    "May",
    "June",
    "July",
    "August",
    "September",
    "October",
    "November",
    "December",
]

p_uniform_months = [1 / 12 for i in range(12)]

p_uniform_no_leap_days = [
    31 / 365,
    28 / 365,
    31 / 365,
    30 / 365,
    31 / 365,
    30 / 365,
    31 / 365,
    31 / 365,
    30 / 365,
    31 / 365,
    30 / 365,
    31 / 365,
]

p_uniform_with_leap_days = [
    4 * 31 / (4 * 365 + 1),
    (4 * 28 + 1) / (4 * 365 + 1),
    4 * 31 / (4 * 365 + 1),
    4 * 30 / (4 * 365 + 1),
    4 * 31 / (4 * 365 + 1),
    4 * 30 / (4 * 365 + 1),
    4 * 31 / (4 * 365 + 1),
    4 * 31 / (4 * 365 + 1),
    4 * 30 / (4 * 365 + 1),
    4 * 31 / (4 * 365 + 1),
    4 * 30 / (4 * 365 + 1),
    4 * 31 / (4 * 365 + 1),
]


def town_of_people(size=20000, model=p_uniform_months):
    return np.random.choice(months, size=size, p=model)

Finally let’s have a look at our simulated distribution. For more on plotting have a look at the visualiation chapter. Below we load our simulated data into a dataframe and subsquently plot the proportions of each model into a histogram.

simulated_uniform_months_model = town_of_people()
simulated_uniform_no_leap_days_model = town_of_people(model=p_uniform_no_leap_days)
simulated_uniform_with_leap_days_model = town_of_people(model=p_uniform_with_leap_days)

df_simulated_uniform_months_model = pd.DataFrame(
    simulated_uniform_months_model, columns=["Month"]
)
df_simulated_uniform_no_leap_days_model = pd.DataFrame(
    simulated_uniform_no_leap_days_model, columns=["Month"]
)
df_simulated_uniform_with_leap_days_model = pd.DataFrame(
    simulated_uniform_with_leap_days_model, columns=["Month"]
)


plt.figure(figsize=(15, 5))

plt.subplot(1, 3, 1)
sns.countplot(
    x="Month", data=df_simulated_uniform_months_model, order=months, stat="proportion"
)
plt.title("Distribution: uniform on months")
plt.xlabel("Month")
plt.ylabel("Proportion")
plt.xticks(rotation=45)


plt.subplot(1, 3, 2)
sns.countplot(
    x="Month",
    data=df_simulated_uniform_no_leap_days_model,
    order=months,
    stat="proportion",
)
plt.title("Distribution: uniform on days (no leap days)")
plt.xlabel("Month")
plt.ylabel("Proportion")
plt.xticks(rotation=45)


plt.subplot(1, 3, 3)
sns.countplot(
    x="Month",
    data=df_simulated_uniform_with_leap_days_model,
    order=months,
    stat="proportion",
)
plt.title("Distribution: uniform on days (with leap days)")
plt.xlabel("Month")
plt.ylabel("Proportion")
plt.xticks(rotation=45)

plt.tight_layout()
plt.show()