MLE offers a framework that tackles exactly this question. It introduces a likelihood function, which is a function that yields another function. This likelihood function takes a vector of parameters, often denoted as theta, and produces a probability density function (PDF) that depends on theta.
The probability density function (PDF) of a distribution is a function that takes a value, x, and returns its probability under the distribution. Therefore, likelihood functions are typically expressed as follows:
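Using $f(x;\theta)$ to denote the PDF with parameter vector $\theta$, one common way to write this is:

$$L(\theta)(x) = f(x;\, \theta)$$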
The value of this function indicates the likelihood of observing x from the distribution defined by the PDF with theta as its parameters.
The goal
When building a forecast model, we have data samples and a parameterized model, and our goal is to estimate the model's parameters. In our examples, such as the Regression and MA models, these parameters are the coefficients in the respective model formulas.
The equivalent in MLE is that we have observations and a PDF for a distribution defined over a set of parameters, theta, which are unknown and not directly observable. Our goal is to estimate theta.
The MLE approach consists of finding the set of parameters, theta, that maximizes the likelihood function given the observed data, x.
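In other words, denoting the estimate by $\hat{\theta}$, something like:

$$\hat{\theta} = \arg\max_{\theta}\, L(\theta)(x)$$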
We assume our samples, x, are drawn from a distribution with a known PDF that depends on a set of parameters, theta. This implies that, under the true parameters, observing x should be highly probable. Therefore, finding the theta values that maximize the likelihood function's value on our samples should reveal the true parameter values.
Conditional likelihood
Notice that we haven't made any assumptions about the distribution (PDF) on which the likelihood function is based. Now, let's assume our observation X is a vector (x_1, x_2, …, x_n). We'll consider a probability function that represents the probability of observing x_n conditional on having already observed (x_1, x_2, …, x_{n-1}):
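In the notation from above, that probability is:

$$f(x_n \mid x_{n-1}, \dots, x_1;\, \theta)$$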
This represents the likelihood of observing just x_n given the previous values (and theta, the set of parameters). Now, we define the conditional likelihood function as follows:
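Mirroring the earlier definition, one way to write this conditional likelihood function is:

$$L(\theta \mid x_1, \dots, x_{n-1})(x_n) = f(x_n \mid x_{n-1}, \dots, x_1;\, \theta)$$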
Later, we'll see why it's useful to work with the conditional likelihood function rather than the exact likelihood function.
Log-Likelihood
In practice, it's often convenient to use the natural logarithm of the likelihood function, known as the log-likelihood function:
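Denoting it by $\ell$:

$$\ell(\theta \mid x) = \log L(\theta \mid x)$$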
This is more convenient because we often work with a likelihood function that is a joint probability function of independent variables, which translates into a product of each variable's probability. Taking the logarithm converts this product into a sum.
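For example, if the samples are independent so that the likelihood factors into a product, then:

$$\log \prod_{i=1}^{n} f(x_i;\, \theta) = \sum_{i=1}^{n} \log f(x_i;\, \theta)$$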
For simplicity, I'll demonstrate how to estimate the most basic moving average model, MA(1):
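Concretely, the process (matching the data-generating code below) is:

$$x_t = \alpha + \beta\,\epsilon_{t-1} + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, \sigma^2)$$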
Here, x_t represents the time-series observations, alpha and beta are the model parameters to be estimated, and the epsilons are random noise terms drawn from a normal distribution with zero mean and some standard deviation, sigma, which will also be estimated. Therefore, our "theta" is (alpha, beta, sigma), which we aim to estimate.
Let's define our parameters and generate some synthetic data using Python:
import pandas as pd
import numpy as np

STD = 3.3
MEAN = 0
ALPHA = 18
BETA = 0.7
N = 1000

# draw the noise terms and build the MA(1) series x_t = ALPHA + BETA * e_{t-1} + e_t
df = pd.DataFrame({"et": np.random.normal(loc=MEAN, scale=STD, size=N)})
df["et-1"] = df["et"].shift(1, fill_value=0)
df["xt"] = ALPHA + (BETA * df["et-1"]) + df["et"]
Note that we set the standard deviation of the error distribution to 3.3, alpha to 18, and beta to 0.7. The data looks like this:
Likelihood function for MA(1)
Our objective is to construct a likelihood function that answers the question: how likely is it to observe our time series X = (x_1, …, x_n), assuming it was generated by the MA(1) process described earlier?
The challenge in computing this probability lies in the mutual dependence among our samples (as is evident from the fact that both x_t and x_{t-1} depend on epsilon_{t-1}), making it non-trivial to determine the joint probability of observing all the samples (known as the exact likelihood).
Therefore, as discussed previously, instead of computing the exact likelihood, we'll work with a conditional likelihood. Let's begin with the likelihood of observing a single sample given all previous samples:
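In the notation from before, with $\theta = (\alpha, \beta, \sigma)$, that is:

$$f(x_t \mid x_{t-1}, \dots, x_1;\, \theta)$$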
This is much simpler to calculate because:
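The key observation (and what the code below implements) is that, given the previous observations and $\theta$, the previous noise term is fully determined by the recursion

$$\epsilon_t = x_t - \alpha - \beta\,\epsilon_{t-1}, \qquad \epsilon_0 = 0,$$

so, conditional on the past, $x_t$ is simply a normal random variable centered at $\alpha + \beta\,\epsilon_{t-1}$:

$$f(x_t \mid x_{t-1}, \dots, x_1;\, \theta) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(x_t - \alpha - \beta\,\epsilon_{t-1})^2}{2\sigma^2}\right)$$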
All that remains is to calculate the conditional likelihood of observing all the samples:
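Multiplying the per-sample conditionals gives:

$$L(\theta \mid X) = \prod_{t=1}^{n} f(x_t \mid x_{t-1}, \dots, x_1;\, \theta)$$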
Applying a natural logarithm gives:
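Up to the treatment of the very first noise term, this works out to:

$$\ell(\theta \mid X) = \sum_{t=1}^{n} \log f(x_t \mid x_{t-1}, \dots, x_1;\, \theta) = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{t=1}^{n}\left(x_t - \alpha - \beta\,\epsilon_{t-1}\right)^2$$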
which is the function we want to maximize.
Code
We'll use the GenericLikelihoodModel class from statsmodels for our MLE estimation. As outlined in the tutorial on the statsmodels website, we simply need to subclass it and implement our likelihood function calculation:
from scipy import stats
from statsmodels.base.model import GenericLikelihoodModel
import statsmodels.api as sm


class MovingAverageMLE(GenericLikelihoodModel):

    def initialize(self):
        super().initialize()
        # register the extra parameters (beta and the noise std) alongside the intercept
        extra_params_names = ['beta', 'std']
        self._set_extra_params_names(extra_params_names)
        self.start_params = np.array([0.1, 0.1, 0.1])

    def calc_conditional_et(self, intercept, beta):
        # recursively recover the noise terms: e_t = x_t - alpha - beta * e_{t-1}, with e_0 = 0
        df = pd.DataFrame({"xt": self.endog})
        ets = [0.0]
        for i in range(1, len(df)):
            ets.append(df.iloc[i]["xt"] - intercept - (beta * ets[i - 1]))
        return ets

    def loglike(self, params):
        # conditional log-likelihood: sum of normal log-densities of the recovered noise terms
        ets = self.calc_conditional_et(params[0], params[1])
        return stats.norm.logpdf(
            ets,
            scale=params[2],
        ).sum()
The function loglike is the essential one to implement. Given the current parameter values params and the dependent variable (in this case, the time series samples), which is stored as the class member self.endog, it calculates the conditional log-likelihood value, as we discussed earlier.
Now let's create the model and fit it on our simulated data:
df = sm.add_constant(df)  # add intercept for estimation (alpha)
model = MovingAverageMLE(df["xt"], df["const"])
r = model.fit()
r.summary()
and the output is:
And that's it! As demonstrated, MLE successfully estimated the parameters we chose for the simulation.
Estimating even a simple MA(1) model with maximum likelihood demonstrates the power of this method, which not only lets us make efficient use of our data but also provides a solid statistical foundation for understanding and interpreting the dynamics of time series data.
Hope you liked it!
Unless otherwise noted, all images are by the author.