Imagine you run an e-commerce platform and want to personalize your email campaigns based on user activity from the past week. If a user has been less active than in previous weeks, you plan to send them a discount offer.
You've gathered user statistics and noticed the following for a user named John:
- John visited the platform for the first time 15 days ago.
- During the first 7-day window (days 1–7), he made 9 visits.
- During the next 7-day window (days 2–8), he made 8 visits.
- In total we have 9 values, one for each 7-day window from days 1–7 through days 9–15.
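The nine weekly values come from sliding a 7-day window over 15 days of activity. A minimal sketch of that step is below. The `daily_visits` numbers are invented for illustration (the article does not give John's daily counts); they are merely chosen so that the window sums match the weekly values used in the rest of the article.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical daily visit counts for days 1..15 (made up for illustration).
daily_visits = np.array([2, 2, 1, 0, 2, 0, 2, 1, 0, 0, 3, 0, 2, 1, 0])

window = 7
# Each row is one 7-day window: days 1-7, days 2-8, ..., days 9-15.
windows = sliding_window_view(daily_visits, window)
weekly_visits = windows.sum(axis=1)

print(len(weekly_visits))  # 15 - 7 + 1 = 9 windows
print(weekly_visits)       # [9 8 6 5 8 6 8 7 6]
```

The last entry is the most recent week; the first eight form the history it is compared against.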
Now you want to evaluate how extreme the latest value is compared to the previous ones.
import numpy as np
visits = np.array([9, 8, 6, 5, 8, 6, 8, 7])
num_visits_last_week = 6
Let's build an empirical CDF of these values.
import numpy as np
import matplotlib.pyplot as plt

# Distinct visit counts (sorted) and how often each one occurs
values, counts = np.unique(visits, return_counts=True)
probabilities = counts / counts.sum()
# Cumulative share of weeks with at most `values[i]` visits
cdf = np.cumsum(probabilities)
plt.scatter(values, cdf, color='black', linewidth=10)
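As a sanity check on the CDF values, the same quantity can be read straight off the sorted sample without building the `values` and `cdf` arrays. The helper name `ecdf` is an assumption introduced for this sketch, not something from the article.

```python
import numpy as np

visits = np.array([9, 8, 6, 5, 8, 6, 8, 7])

def ecdf(sample, x):
    """Fraction of observations in `sample` that are less than or equal to x."""
    sorted_sample = np.sort(sample)
    # side="right" counts every element <= x
    return np.searchsorted(sorted_sample, x, side="right") / sorted_sample.size

print(ecdf(visits, 6))  # 0.375
```

Three of the eight historical weeks had 6 or fewer visits, which gives 3 / 8 = 0.375.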
Now we need to reconstruct a continuous function from these points. We'll use spline interpolation.
from scipy.interpolate import make_interp_spline

# Dense grid over the observed range for a smooth curve
x_new = np.linspace(values.min(), values.max(), 300)
spline = make_interp_spline(values, cdf, k=3)  # cubic spline
cdf_smooth = spline(x_new)
plt.plot(x_new, cdf_smooth, label='Spline CDF', color='black', linewidth=4)
plt.scatter(values, cdf, color='black', linewidth=10)
# Highlight the last two points
plt.scatter(values[-2:], cdf[-2:], color='#f95d5f', linewidth=10, zorder=5)
plt.show()
Not bad. However, there is a small problem between the two red dots: a CDF must be monotonically increasing, and the cubic spline is not monotonic there. Let's fix this with a Piecewise Cubic Hermite Interpolating Polynomial (PCHIP).
from scipy.interpolate import PchipInterpolator

# PCHIP preserves the monotonicity of the input points
spline_monotonic = PchipInterpolator(values, cdf)
cdf_smooth = spline_monotonic(x_new)
plt.plot(x_new, cdf_smooth, color='black', linewidth=4)
plt.scatter(values, cdf, color='black', linewidth=10)
plt.show()
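Monotonicity can also be checked numerically rather than by eye. The sketch below rebuilds both interpolants and tests whether consecutive values on the grid ever decrease. The helper `is_non_decreasing` and its tolerance are assumptions made for this example.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator, make_interp_spline

visits = np.array([9, 8, 6, 5, 8, 6, 8, 7])
values, counts = np.unique(visits, return_counts=True)
cdf = np.cumsum(counts) / counts.sum()
x_new = np.linspace(values.min(), values.max(), 300)

def is_non_decreasing(y, tol=1e-12):
    """True if no step in y goes down by more than a small tolerance."""
    return bool(np.all(np.diff(y) >= -tol))

cubic_ok = is_non_decreasing(make_interp_spline(values, cdf, k=3)(x_new))
pchip_ok = is_non_decreasing(PchipInterpolator(values, cdf)(x_new))

print("cubic spline monotonic:", cubic_ok)
print("PCHIP monotonic:", pchip_ok)
```

Because PCHIP preserves the monotonicity of the data it interpolates, its check should pass whenever the input CDF points are themselves increasing.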
Alright, now it's perfect.
To calculate the p-value for our current observation (6 visits during the last week), we need the area of the filled region, which is the value of the smoothed CDF at that point.
To do so, let's create a simple function calculate_p_value:
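def calculate_p_value(x):
    # Below the smallest observed value the CDF is 0; above the largest it is 1
    if x < values.min():
        return 0.0
    elif x > values.max():
        return 1.0
    else:
        # Inside the observed range, read the value off the monotonic interpolant
        return float(spline_monotonic(x))

p_value = calculate_p_value(num_visits_last_week)
print(f"Probability of getting at most {num_visits_last_week} visits equals: {p_value}")

Probability of getting at most 6 visits equals: 0.375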
So the probability is quite high (we could compare it to a threshold of 0.1, for instance), and we decide not to send the discount to John. The same calculation needs to be done for all users.
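One way to repeat the calculation for every user is to wrap the steps in a single function and loop over the user base. This is only a sketch under stated assumptions: the function name `weekly_p_value`, the `users` dictionary, and the user "alice" with her numbers are all hypothetical; only John's values and the 0.1 threshold come from the text above.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def weekly_p_value(history, latest):
    """Smoothed-CDF value of `latest` given a user's past weekly visit counts."""
    values, counts = np.unique(history, return_counts=True)
    cdf = np.cumsum(counts) / counts.sum()
    if latest < values.min():
        return 0.0
    if latest > values.max():
        return 1.0
    if values.size == 1:
        # Only one distinct value was ever observed and `latest` equals it
        return 1.0
    return float(PchipInterpolator(values, cdf)(latest))

# user -> (past weekly visit counts, visits in the most recent week)
users = {
    "john": ([9, 8, 6, 5, 8, 6, 8, 7], 6),         # values from the article
    "alice": ([10, 12, 11, 9, 13, 12, 10, 11], 4),  # hypothetical user
}

threshold = 0.1
decisions = {
    name: weekly_p_value(history, latest) < threshold
    for name, (history, latest) in users.items()
}
print(decisions)  # True means "send a discount"
```

In this sketch a user receives a discount only when their latest week falls in the lowest 10% of their own history, so John is skipped while the unusually inactive hypothetical user is flagged.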