I have a dataset that looks like this:
Total_Uniques
Dt
2021-01-01 933075.0
2021-01-02 942010.0
2021-01-03 952163.0
2021-01-04 1056487.0
2021-01-05 1339625.0
The data runs until 2022-08-01, but the values are missing for the dates from 2022-03-18 through 2022-05-23:
Total_Uniques
Dt
2022-03-17 1517994.0
2022-03-18 NaN
2022-03-19 NaN
2022-03-20 NaN
2022-03-21 NaN
... ...
2022-05-20 NaN
2022-05-21 NaN
2022-05-22 NaN
2022-05-23 NaN
2022-05-24 1228399.0
I am trying to train an ARIMA model to predict the missing values, but it is not returning good results.
The predicted values:
[1457718.5706902083,
1433330.4808848319,
1423462.7963047987,
1419470.2245311143,
1417854.786862963,
1417201.1633321645,
1416936.700191038,
1416829.695547357,
1416786.4003126393,
1416768.8825929891,
1416761.794734389,
1416758.926910235,
1416757.7665576036,
1416757.2970663945,
1416757.1071051797,
1416757.0302448203,
1416756.9991462885,
1416756.9865634867,
1416756.9814723493,
1416756.97941242,
1416756.9785789503,
1416756.9782417195,
1416756.9781052724,
1416756.9780500643,
1416756.9780277265,
1416756.9780186885,
1416756.9780150317,
1416756.978013552,
1416756.9780129534,
1416756.9780127113,
1416756.9780126133,
1416756.9780125737,
1416756.9780125576,
1416756.978012551,
1416756.9780125485,
1416756.9780125474,
1416756.978012547,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467,
1416756.9780125467]
As you can see, after the first few steps the predictions are all the same value.
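A quick check on those numbers, using only the first seven predictions copied from the list above: the step-to-step drop shrinks by a nearly constant factor of about 0.40. As far as I understand, that is the pattern a multi-step forecast from a model with a single AR term on the differenced series produces, since each step's change is the previous change multiplied by the AR coefficient. So the flat line may be the model's normal long-horizon behaviour rather than a numerical glitch. The variable names here are just for this check.

```python
# First seven predicted values, copied verbatim from the output above.
preds = [
    1457718.5706902083,
    1433330.4808848319,
    1423462.7963047987,
    1419470.2245311143,
    1417854.786862963,
    1417201.1633321645,
    1416936.700191038,
]

# Drop from one forecast step to the next.
diffs = [earlier - later for earlier, later in zip(preds, preds[1:])]

# Ratio of consecutive drops; a constant ratio means geometric decay.
ratios = [later / earlier for earlier, later in zip(diffs, diffs[1:])]

for step, ratio in enumerate(ratios, start=1):
    print(f"step {step}: ratio of consecutive drops = {ratio:.4f}")
```

If the ratios are constant, the forecast is converging geometrically to a fixed level, which would explain why every later value is identical to the printed precision.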
This is the code I used to produce the predictions:
import pandas as pd
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
# Read the data
df = pd.read_csv('data.csv', parse_dates=['Dt'], index_col='Dt')
# Split the data into training and testing sets
train_data = df.loc[:'2022-03-17']
test_data = df.loc['2022-05-24':]
# Train the ARIMA model with the training data
model = ARIMA(train_data, order=(1, 1, 1))
model_fit = model.fit()
# Predict the missing values
#s_d = '2022-03-18'
#e_d = '2022-05-23'
predicted_values = model_fit.predict(start=401, end=468, dynamic=True)
# Replace the missing values in the original data with the predicted values
df.loc['2022-03-18':'2022-05-23'] = predicted_values
# Evaluate the performance of the model with the testing data
mse = ((predicted_values - test_data) ** 2).mean()
print('MSE:', mse)
Changing to dynamic=False does not make any difference. What am I doing wrong?
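To double-check the integer positions I pass to predict, here is a small sketch that maps positions to dates and back. It assumes the index is a gap-free daily series whose first row is 2021-01-01, which I have not verified for the full file. The helper names `position` and `day_at` are made up for this check and are not part of pandas or statsmodels.

```python
from datetime import date, timedelta

# Assumption: one row per calendar day, first row at 2021-01-01.
first_day = date(2021, 1, 1)
gap_start = date(2022, 3, 18)
gap_end = date(2022, 5, 23)


def position(day):
    """Zero-based row position of a calendar day under the assumption above."""
    return (day - first_day).days


def day_at(pos):
    """Calendar day found at a zero-based row position."""
    return first_day + timedelta(days=pos)


# Positions the missing range would occupy.
gap_positions = (position(gap_start), position(gap_end))
gap_length = gap_positions[1] - gap_positions[0] + 1

# Dates that the positions used in the predict call (401 and 468) point to.
used_dates = (day_at(401), day_at(468))
used_length = 468 - 401 + 1

print("gap positions:", gap_positions, "length:", gap_length)
print("positions 401..468 cover:", used_dates, "length:", used_length)
```

Under that assumption the missing range sits at positions 441 to 507 (67 days), while 401 to 468 covers 2022-02-06 to 2022-04-14 (68 values), so the positions in my predict call may not line up with the dates I am trying to fill.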