Stock price prediction#

Objective

predict the stock price over a 60-day horizon
compare the models

Selected models

ARMA
SARIMA
XGBoost
Extra Trees (a variant of Random Forest)
Support Vector Machine (SVM)
Prophet

Table. Prediction models

Model	Detrending	Seasonality	Type
ARMA	Linear moving average	Linear moving average	Time series
SARIMA	Linear moving average	Linear moving average	Time series
XGBoost	Linear regression	Monthly	Machine Learning
ExtraTrees	Linear regression	Monthly	Machine Learning
Support Vector Machine	Linear regression	Monthly	Machine Learning
Prophet	No detrending	Automatic	Other

Evaluation criteria

train/test split (test = 60 days)
AIC
MSE (reported below as RMSE, alongside MAE)
visually (the predicted curve must not behave erratically)
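
As a reminder of what the two error metrics measure, here is a minimal sketch in plain Python. The helper names `mae` and `rmse` and the sample prices are made up for illustration; the notebook itself relies on scikit-learn's metric functions.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Made-up prices, for illustration only
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [101.0, 100.0, 101.0, 109.0]

print(mae(actual, predicted))
print(rmse(actual, predicted))
```

Because RMSE squares the errors before averaging, the single 4-point miss weighs more heavily in it than in MAE.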

Imports#

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
import xgboost
from prophet import Prophet
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import (
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
)
from sklearn.svm import SVR
from statsmodels.tsa.statespace.sarimax import SARIMAX

from src.functions.arima_parameters import arima_parameters, seasonal_order
from src.utils import init_notebook

init_notebook()

data_folder = "data/processed_data/detrend_data/LinearMADetrend/window-100"
stock_name = "AAPL"

df = pd.read_csv(
    f"{data_folder}/{stock_name}.csv", parse_dates=["Date"], index_col="Date"
)
print(f"{df.shape = }")

df.shape = (756, 6)

prediction_results_dict = {}

SARIMA#

Predict new price#

# Take close price as target variable
price = df["Close"]

# Fit a SARIMAX model using the imported (p, d, q) and seasonal orders
model = SARIMAX(price, order=arima_parameters, seasonal_order=seasonal_order)
fitted_arima = model.fit()

# Display model summary
print(fitted_arima.summary())

RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =           21     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  2.09762D+00    |proj g|=  9.04128D-02

/home/runner/.cache/pypoetry/virtualenvs/stock-analysis-DF-fhKMw-py3.10/lib/python3.10/site-packages/statsmodels/tsa/base/tsa_model.py:473: ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.
  self._init_dates(dates, freq)
 This problem is unconstrained.

At iterate    5    f=  2.08571D+00    |proj g|=  2.66727D-02

At iterate   10    f=  2.08366D+00    |proj g|=  1.10199D-02

At iterate   15    f=  2.08036D+00    |proj g|=  2.48142D-02

At iterate   20    f=  2.07949D+00    |proj g|=  1.87000D-02

At iterate   25    f=  2.07880D+00    |proj g|=  7.76589D-02

At iterate   30    f=  2.07724D+00    |proj g|=  5.55703D-03

At iterate   35    f=  2.07699D+00    |proj g|=  9.37390D-03

At iterate   40    f=  2.07618D+00    |proj g|=  1.06681D-02

At iterate   45    f=  2.07585D+00    |proj g|=  1.99382D-02

At iterate   50    f=  2.07546D+00    |proj g|=  2.08780D-02

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
   21     50     56      1     0     0   2.088D-02   2.075D+00
  F =   2.0754648378324383     

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT                 
                               SARIMAX Results                                
==============================================================================
Dep. Variable:                  Close   No. Observations:                  756
Model:             SARIMAX(10, 0, 10)   Log Likelihood               -1569.051
Date:                Mon, 05 Feb 2024   AIC                           3180.103
Time:                        09:25:27   BIC                           3277.292
Sample:                             0   HQIC                          3217.538
                                - 756                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          1.2920      0.216      5.980      0.000       0.869       1.715
ar.L2         -0.5610      0.267     -2.099      0.036      -1.085      -0.037
ar.L3          0.2448      0.192      1.277      0.202      -0.131       0.621
ar.L4          0.3863      0.143      2.705      0.007       0.106       0.666
ar.L5         -0.6578      0.137     -4.814      0.000      -0.926      -0.390
ar.L6          0.2803      0.139      2.019      0.043       0.008       0.552
ar.L7          0.4084      0.133      3.072      0.002       0.148       0.669
ar.L8         -0.8304      0.181     -4.595      0.000      -1.185      -0.476
ar.L9          1.1218      0.271      4.142      0.000       0.591       1.653
ar.L10        -0.7089      0.173     -4.096      0.000      -1.048      -0.370
ma.L1         -0.4244      0.219     -1.934      0.053      -0.855       0.006
ma.L2          0.2860      0.167      1.717      0.086      -0.040       0.612
ma.L3         -0.0823      0.137     -0.600      0.548      -0.351       0.186
ma.L4         -0.4330      0.149     -2.897      0.004      -0.726      -0.140
ma.L5          0.3459      0.141      2.462      0.014       0.071       0.621
ma.L6         -0.1033      0.138     -0.750      0.453      -0.373       0.167
ma.L7         -0.3871      0.146     -2.645      0.008      -0.674      -0.100
ma.L8          0.4684      0.181      2.592      0.010       0.114       0.823
ma.L9         -0.7679      0.189     -4.054      0.000      -1.139      -0.397
ma.L10         0.1254      0.038      3.267      0.001       0.050       0.201
sigma2         3.6688      0.133     27.535      0.000       3.408       3.930
===================================================================================
Ljung-Box (L1) (Q):                   0.01   Jarque-Bera (JB):               292.50
Prob(Q):                              0.91   Prob(JB):                         0.00
Heteroskedasticity (H):               7.28   Skew:                             0.15
Prob(H) (two-sided):                  0.00   Kurtosis:                         6.03
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

/home/runner/.cache/pypoetry/virtualenvs/stock-analysis-DF-fhKMw-py3.10/lib/python3.10/site-packages/statsmodels/base/model.py:607: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
  warnings.warn("Maximum Likelihood optimization failed to "
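
The AIC and BIC in the summary above can be checked by hand from the log-likelihood. The sketch below assumes the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, with k = 21 estimated parameters (10 AR terms, 10 MA terms and sigma2); the function names are illustrative.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Values read from the SARIMAX summary above
log_likelihood = -1569.051
n_params = 21  # 10 AR + 10 MA coefficients + sigma2
n_obs = 756

print(round(aic(log_likelihood, n_params), 3))
print(round(bic(log_likelihood, n_params, n_obs), 3))
```

Both values should agree with the reported 3180.103 and 3277.292 up to the rounding of the printed log-likelihood.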

# Make predictions

forecast_steps = 60  # N days to forecast
forecast = fitted_arima.get_forecast(steps=forecast_steps)

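# price.index carries no frequency (see the warnings above), so freq=None would
# fall back to calendar days; business days ("B") skip weekends instead
date_range = pd.date_range(price.index[-1], periods=forecast_steps + 1, freq="B")
forecast_index = date_range[1:]  # Exclude price.index[-1]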

predicted_values = forecast.predicted_mean

/home/runner/.cache/pypoetry/virtualenvs/stock-analysis-DF-fhKMw-py3.10/lib/python3.10/site-packages/statsmodels/tsa/base/tsa_model.py:836: ValueWarning: No supported index is available. Prediction results will be given with an integer index beginning at `start`.
  return get_prediction_index(
/home/runner/.cache/pypoetry/virtualenvs/stock-analysis-DF-fhKMw-py3.10/lib/python3.10/site-packages/statsmodels/tsa/base/tsa_model.py:836: FutureWarning: No supported index is available. In the next version, calling this method in a model without a supported index will result in an exception.
  return get_prediction_index(
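
The warnings above indicate that the forecast comes back with an integer index, so the dates have to be rebuilt separately. A minimal sketch of generating weekday-only forecast dates in plain Python follows; `next_weekdays` is a hypothetical helper, and market holidays are not handled.

```python
from datetime import date, timedelta


def next_weekdays(last_day, n_days):
    """Return the n_days weekdays (Monday to Friday) that follow last_day."""
    days = []
    current = last_day
    while len(days) < n_days:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            days.append(current)
    return days


# Example date: Thursday 30 December 2021
forecast_days = next_weekdays(date(2021, 12, 30), 3)
print(forecast_days)
```

The weekend of 1 and 2 January 2022 is skipped, so the three dates are 31 December, 3 January and 4 January.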

plot_n_days_prior_pred = 2 * forecast_steps

plt.plot(price[-plot_n_days_prior_pred:], label="Original price")
plt.plot(forecast_index, predicted_values, label="SARIMA predictions", color="red")
plt.title("SARIMA predictions for Apple stock price")
plt.legend()


# Display limited number of date index
plt.gca().xaxis.set_major_locator(mdates.MonthLocator(interval=3))
# Rotate x-axis labels
plt.gcf().autofmt_xdate()

plt.show()

../_images/4648e1e717bb70f73a5d8e5bb14915299606ba4eb455a43671044c63c6101a96.png

Train test split#

train_test_split_date = pd.Timestamp("2021-10-01")
train, test = (
    price[price.index <= train_test_split_date],
    price[price.index > train_test_split_date],
)

model = SARIMAX(train, order=arima_parameters, seasonal_order=seasonal_order)
result = model.fit()

/home/runner/.cache/pypoetry/virtualenvs/stock-analysis-DF-fhKMw-py3.10/lib/python3.10/site-packages/statsmodels/tsa/base/tsa_model.py:473: ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.
  self._init_dates(dates, freq)
 This problem is unconstrained.

RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =           21     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  2.08236D+00    |proj g|=  2.78026D+00

At iterate    5    f=  2.06322D+00    |proj g|=  1.87002D-01

At iterate   10    f=  2.05780D+00    |proj g|=  1.38443D-01

At iterate   15    f=  2.05447D+00    |proj g|=  1.85209D-01

At iterate   20    f=  2.05395D+00    |proj g|=  5.90870D-02

At iterate   25    f=  2.05044D+00    |proj g|=  1.18707D-01

At iterate   30    f=  2.04875D+00    |proj g|=  6.01596D-02

At iterate   35    f=  2.04800D+00    |proj g|=  3.00315D-02

At iterate   40    f=  2.04743D+00    |proj g|=  6.00667D-02

At iterate   45    f=  2.04602D+00    |proj g|=  3.24028D-02

At iterate   50    f=  2.04558D+00    |proj g|=  1.63582D-02

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
   21     50     55      1     0     0   1.636D-02   2.046D+00
  F =   2.0455840523131141     

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT                 

/home/runner/.cache/pypoetry/virtualenvs/stock-analysis-DF-fhKMw-py3.10/lib/python3.10/site-packages/statsmodels/base/model.py:607: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
  warnings.warn("Maximum Likelihood optimization failed to "

forecast_steps = len(test)
forecast = result.get_forecast(steps=forecast_steps)
predicted_values = forecast.predicted_mean

/home/runner/.cache/pypoetry/virtualenvs/stock-analysis-DF-fhKMw-py3.10/lib/python3.10/site-packages/statsmodels/tsa/base/tsa_model.py:836: ValueWarning: No supported index is available. Prediction results will be given with an integer index beginning at `start`.
  return get_prediction_index(
/home/runner/.cache/pypoetry/virtualenvs/stock-analysis-DF-fhKMw-py3.10/lib/python3.10/site-packages/statsmodels/tsa/base/tsa_model.py:836: FutureWarning: No supported index is available. In the next version, calling this method in a model without a supported index will result in an exception.
  return get_prediction_index(

plot_n_days_prior_pred = 2 * forecast_steps

plt.plot(train[-plot_n_days_prior_pred:], label="Original training price")
plt.plot(test, label="Original test price")
plt.plot(test.index, predicted_values, label="SARIMA predictions", color="red")
plt.title("SARIMA predictions for Apple stock price")
plt.legend()


# Display limited number of date index
plt.gca().xaxis.set_major_locator(mdates.MonthLocator(interval=3))
# Rotate x-axis labels
plt.gcf().autofmt_xdate()

plt.show()

../_images/ad6b539ece72ba01a74ae74f0277c38d1f15b3777671ac9c181f1518dad0bab6.png

Predicting Apple's price at a 2-month horizon#

We now recombine the SARIMA predictions for the stochastic component with the trend to obtain a forecast of Apple's stock price.

# Import original stock price time series
data_folder = "data/raw_data"
original_data = pd.read_csv(
    f"{data_folder}/{stock_name}.csv", parse_dates=["Date"], index_col="Date"
)

# Get only close price
original_data = original_data["Close"]

# Train/test split, masking on the series' own index
original_data_train, original_data_test = (
    original_data[original_data.index <= train_test_split_date],
    original_data[original_data.index > train_test_split_date],
)

# Drop the date index so that weekend gaps do not matter
original_data_train = original_data_train.reset_index(drop=True)
original_data_test = original_data_test.reset_index(drop=True)

Trend#

We rebuild the trend by extrapolating it with the moving-average method used for detrending.

# Extend the series one step at a time: each new point is the mean of the
# last 100 values, which may include previously predicted points
for _ in range(forecast_steps):
    rolling_mean = original_data_train.rolling(window=100, center=False).mean()
    pred_trend = rolling_mean.iloc[-1]
    pred_index = original_data_train.index[-1] + 1
    original_data_train = pd.concat(
        [original_data_train, pd.Series([pred_trend], index=[pred_index])]
    )
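
To see what this loop does on a small scale, here is a sketch in plain Python with a window of 3 instead of 100 and a made-up series. The helper `extend_with_rolling_mean` is hypothetical and assumes the series holds at least `window` values.

```python
def extend_with_rolling_mean(values, window, steps):
    """Append `steps` values, each the mean of the last `window` entries."""
    extended = list(values)
    for _ in range(steps):
        # Later steps average over earlier predictions as well
        extended.append(sum(extended[-window:]) / window)
    return extended


# Made-up series, window of 3 for readability
series = [10.0, 12.0, 14.0]
print(extend_with_rolling_mean(series, window=3, steps=2))
```

The first appended value is the mean of 10, 12 and 14; the second averages 12, 14 and that first prediction, which is why the extrapolated trend flattens out rather than continuing the slope.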

plt.plot(
    original_data_train.iloc[-300:-forecast_steps], color="blue", label="Original price"
)
plt.plot(
    original_data_train.iloc[-forecast_steps:],
    color="red",
    label="Predicted trend price",
)
plt.legend()

<matplotlib.legend.Legend at 0x7fcae9bd1900>

../_images/ac89b7b04ffc18242089f69b06c9723f995abe2a0f91c7b2289b9215a12789e3.png

Seasonality and stochasticity#

# Make SARIMA predicted values begin at zero
predicted_values -= predicted_values.iloc[0]

# Add the zero-based SARIMA forecast to the first extrapolated trend value
add_components = original_data_train.iloc[-forecast_steps] + predicted_values

# Put predicted data in train series set
original_data_train.iloc[-forecast_steps:] = add_components
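
The arithmetic of this recomposition step can be sketched in plain Python: the detrended forecast is shifted so that it starts at zero, then added to a single trend level. The function name `recompose` and the numbers are made up for illustration.

```python
def recompose(trend_level, stochastic_forecast):
    """Shift the forecast to start at zero, then add it to a trend level."""
    start = stochastic_forecast[0]
    return [trend_level + (value - start) for value in stochastic_forecast]


# Made-up numbers: a trend level of 150 and a short detrended forecast
detrended_forecast = [2.0, 3.5, 1.0, 4.0]
print(recompose(150.0, detrended_forecast))
```

By construction the recomposed forecast starts exactly at the trend level, so the predicted curve joins the extrapolated trend without a jump.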

Evaluating the prediction#

# Set index for test data i.e. actual data
original_data_test.index = original_data_train.index[-forecast_steps:]

plt.plot(
    original_data_train.iloc[-200:-forecast_steps], color="blue", label="Original price"
)
plt.plot(
    original_data_train.iloc[-forecast_steps:],
    color="red",
    label="Predicted price with trend + seasonality + stochasticity",
)
plt.plot(original_data_test, color="green", label="Actual price")
plt.legend()

<matplotlib.legend.Legend at 0x7fcae9a7b910>

../_images/19ebdcf6cae6ce3bed83ef3792bcf2a39a9b1195a0db97ec95fa53cf0c4c12e6.png

y_true = original_data_test
y_pred = original_data_train[-forecast_steps:]

mae = mean_absolute_error(y_true=y_true, y_pred=y_pred)
rmse = mean_squared_error(y_true=y_true, y_pred=y_pred, squared=False)

print(f"{mae = }")
print(f"{rmse = }")


prediction_results_dict["SARIMA"] = [rmse, mae]

mae = 10.698679992853735
rmse = 13.862872951424345

XGBoost#

Data processing#

# Reload the data (without detrending)
data_folder = "data/raw_data"
stock_name = "AAPL"
df = pd.read_csv(
    f"{data_folder}/{stock_name}.csv", parse_dates=["Date"], index_col="Date"
)
print(f"{df.shape = }")

df.shape = (756, 6)

train_start_date = "2019"
train_end_date = "2021-10-01"
# Label slices are inclusive at both ends, so 2021-10-01 lands in both sets
df_train = df.loc[train_start_date:train_end_date].copy()
df_test = df.loc[train_end_date:].copy()

df_train["time_dummy"] = range(len(df_train))
df_test["time_dummy"] = range(len(df_test))
df_test["time_dummy"] += len(df_train)
df_train["day"] = df_train.index.day
df_test["day"] = df_test.index.day

df_train["time_dummy"].tail()

Date
2021-09-27    689
2021-09-28    690
2021-09-29    691
2021-09-30    692
2021-10-01    693
Name: time_dummy, dtype: int64

df_test["time_dummy"].head()

Date
2021-10-01    694
2021-10-04    695
2021-10-05    696
2021-10-06    697
2021-10-07    698
Name: time_dummy, dtype: int64
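
The two outputs above show 2021-10-01 both as the last training row and as the first test row, because pandas label slices include both endpoints. The sketch below mirrors that behavior with plain date comparisons on made-up trading days and shows how a strict comparison on the test side keeps the two sets disjoint.

```python
from datetime import date

# Made-up trading days around the split date
days = [date(2021, 9, 30), date(2021, 10, 1), date(2021, 10, 4)]
split = date(2021, 10, 1)

# Inclusive on both sides, mirroring df.loc[:split] and df.loc[split:]
train_inclusive = [d for d in days if d <= split]
test_inclusive = [d for d in days if d >= split]
overlap = set(train_inclusive) & set(test_inclusive)

# A strict comparison on the test side removes the shared day
test_strict = [d for d in days if d > split]

print(overlap)
print(set(train_inclusive) & set(test_strict))
```

In pandas, one way to get the strict version would be a boolean mask such as `df[df.index > train_end_date]`, as the SARIMA section does.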

x_col = ["time_dummy", "day"]
y_col = ["Close"]

x = df_train[x_col]
y = df_train[y_col]

x_test = df_test[x_col]
y_test = df_test[y_col]

Model training#

lr = LinearRegression()

xgb = xgboost.XGBRegressor(random_state=0, n_jobs=-2, colsample_bytree=0.3, max_depth=3)

lr.fit(x, y)

LinearRegression()


y_residuals = y - lr.predict(x)
xgb.fit(x, y_residuals)

XGBRegressor(base_score=None, booster=None, callbacks=None,
             colsample_bylevel=None, colsample_bynode=None,
             colsample_bytree=0.3, device=None, early_stopping_rounds=None,
             enable_categorical=False, eval_metric=None, feature_types=None,
             gamma=None, grow_policy=None, importance_type=None,
             interaction_constraints=None, learning_rate=None, max_bin=None,
             max_cat_threshold=None, max_cat_to_onehot=None,
             max_delta_step=None, max_depth=3, max_leaves=None,
             min_child_weight=None, missing=nan, monotone_constraints=None,
             multi_strategy=None, n_estimators=None, n_jobs=-2,
             num_parallel_tree=None, random_state=0, ...)

def xgb_prediction(xgb: xgboost.XGBRegressor, lr: LinearRegression, x):
    lr_predict = lr.predict(x).reshape(-1, 1)
    y_pred = xgb.predict(x).reshape(-1, 1)

    return y_pred + lr_predict
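
The pattern used here (fit a linear trend, train a second model on the residuals, then add both predictions) can be illustrated without any library. In the sketch below, a per-key average of residuals stands in for XGBoost; that substitution, the helper names and the data are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


def fit_residual_means(keys, residuals):
    """Stand-in residual model: average residual for each key."""
    sums, counts = {}, {}
    for key, res in zip(keys, residuals):
        sums[key] = sums.get(key, 0.0) + res
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


# Made-up data: linear trend plus a pattern repeating every 4 steps
pattern = [-1.0, 1.0, 1.0, -1.0]
time_dummy = list(range(8))
phase = [t % 4 for t in time_dummy]  # plays the role of the "day" feature
close = [10.0 + 2.0 * t + pattern[p] for t, p in zip(time_dummy, phase)]

# Step 1: linear trend; step 2: model the residuals; step 3: add both
slope, intercept = fit_line(time_dummy, close)
residuals = [y - (slope * t + intercept) for t, y in zip(time_dummy, close)]
residual_model = fit_residual_means(phase, residuals)


def hybrid_predict(t, p):
    """Linear trend prediction plus the residual model's correction."""
    return slope * t + intercept + residual_model[p]


print(hybrid_predict(8, 0))
```

The linear part carries the extrapolation beyond the training range, which tree-based models cannot do on their own, while the residual model adds the repeating pattern back.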

plt.title("XGBoost prediction on the training set")
plt.plot(xgb_prediction(xgb, lr, x))

[<matplotlib.lines.Line2D at 0x7fcae99a48e0>]

../_images/4f5f6e39367e588fa47e8e7ab4dbae21be22915bf975ddfa359004999d497c94.png

plt.title("XGBoost prediction on the test set")
y_pred = xgb_prediction(xgb, lr, x_test)
plt.plot(y_pred)

[<matplotlib.lines.Line2D at 0x7fcae9b13910>]

../_images/b2815ccc56e9ef8becd9bbbb72037813b1ff78fc20fe2383722d38336c6f80aa.png

y_pred = pd.DataFrame(y_pred)
y_pred.index = df_test.index

plt.title("XGBoost prediction (time dummy + monthly seasonality)")
plt.plot(df_train[["Close"]])
plt.plot(df_test[["Close"]], label="Original")
plt.plot(y_pred, label="Linear regression + XGBoost")
plt.legend()
_ = plt.xticks(rotation=45, ha="right")

../_images/f26dbf00c559941c9d4983f342ced4d91bfb4db766acaf1f5f0e889efc999770.png

rmse = mean_squared_error(y_test, y_pred, squared=False)
mae = mean_absolute_error(y_test, y_pred)


print(f"RMSE: {rmse}")
print(f"MAE: {mae}")

prediction_results_dict["XGBoost"] = [rmse, mae]

RMSE: 13.252235166510234
MAE: 9.821550637003275

Extra Trees#

Model training#

lr = LinearRegression()

et = ExtraTreesRegressor(random_state=0, n_jobs=-2)

lr.fit(x, y)

LinearRegression()

y_residuals = y - lr.predict(x)
# Pass a 1-D target to avoid scikit-learn's DataConversionWarning
et.fit(x, y_residuals.values.ravel())

ExtraTreesRegressor(n_jobs=-2, random_state=0)

def et_prediction(et: ExtraTreesRegressor, lr: LinearRegression, x):
    lr_predict = lr.predict(x).reshape(-1, 1)
    y_pred = et.predict(x).reshape(-1, 1)

    return y_pred + lr_predict

y_pred = et_prediction(et, lr, x_test)
y_pred = pd.DataFrame(y_pred)
y_pred.index = df_test.index

plt.title("ExtraTrees prediction (time dummy + monthly seasonality)")
plt.plot(df_train[["Close"]])
plt.plot(df_test[["Close"]], label="Original")
plt.plot(y_pred, label="Linear regression + ExtraTrees")
plt.legend()
_ = plt.xticks(rotation=45, ha="right")

../_images/8aaa092cef3b80188e8bb46d80d6eba93adaa304d83c812e7aca3b00d4774625.png

rmse = mean_squared_error(y_test, y_pred, squared=False)
mae = mean_absolute_error(y_test, y_pred)


print(f"RMSE: {rmse}")
print(f"MAE: {mae}")

prediction_results_dict["ExtraTrees"] = [rmse, mae]

RMSE: 13.201822820070818
MAE: 9.74557499696818

SVM#

Model training#

lr = LinearRegression()

svr = SVR()

lr.fit(x, y)

LinearRegression()

y_residuals = y - lr.predict(x)
# Pass a 1-D target to avoid scikit-learn's DataConversionWarning
svr.fit(x, y_residuals.values.ravel())

SVR()

def svr_prediction(svr: SVR, lr: LinearRegression, x):
    lr_predict = lr.predict(x).reshape(-1, 1)
    y_pred = svr.predict(x).reshape(-1, 1)

    return y_pred + lr_predict

y_pred = svr_prediction(svr, lr, x_test)
y_pred = pd.DataFrame(y_pred)
y_pred.index = df_test.index

plt.title("Support Vector prediction (time dummy + monthly seasonality)")
plt.plot(df_train[["Close"]])
plt.plot(df_test[["Close"]], label="Original")
plt.plot(y_pred, label="Linear regression + Support Vector")
plt.legend()
_ = plt.xticks(rotation=45, ha="right")

../_images/166238762197d567bbdd034d688ce940a0bcae9ed3c88b25202f65068fb77847.png

rmse = mean_squared_error(y_test, y_pred, squared=False)
mae = mean_absolute_error(y_test, y_pred)


print(f"RMSE: {rmse}")
print(f"MAE: {mae}")

prediction_results_dict["Support Vector Machine"] = [rmse, mae]

RMSE: 10.297525832287155
MAE: 8.73950426028892

Prophet#

Preprocessing for Prophet#

# Reload the data (without detrending)
data_folder = "data/raw_data"
stock_name = "AAPL"
df = pd.read_csv(
    f"{data_folder}/{stock_name}.csv", parse_dates=["Date"], index_col="Date"
)
print(f"{df.shape = }")

df.shape = (756, 6)

df_train = df.loc[train_start_date:train_end_date]

df_train.shape

(694, 6)

x = df_train[[]].copy()

x["ds"] = df_train.index
x["y"] = df_train[["Close"]]

x.head()

	ds	y
Date
2019-01-02	2019-01-02	39.480000
2019-01-03	2019-01-03	35.547501
2019-01-04	2019-01-04	37.064999
2019-01-07	2019-01-07	36.982498
2019-01-08	2019-01-08	37.687500

Prediction#

Computing the prediction#

model = Prophet()
model.fit(x)

09:25:35 - cmdstanpy - INFO - Chain [1] start processing

09:25:35 - cmdstanpy - INFO - Chain [1] done processing

<prophet.forecaster.Prophet at 0x7fcae9b30520>

future = x_test.copy()
future["ds"] = x_test.index

forecast = model.predict(future)
forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail()

	ds	yhat	yhat_lower	yhat_upper
58	2021-12-23	163.485849	157.429297	169.888856
59	2021-12-27	164.851525	158.423615	171.336812
60	2021-12-28	165.169482	158.395403	171.728886
61	2021-12-29	165.506546	159.233661	171.649162
62	2021-12-30	165.524170	159.169593	172.744326

Plotting the prediction#

fig, ax1 = plt.subplots(figsize=(10, 10))
fig1 = model.plot(forecast, ax=ax1)
df[["Close"]].loc[train_end_date:].plot(ax=ax1, color="orange")
plt.legend()

/home/runner/.cache/pypoetry/virtualenvs/stock-analysis-DF-fhKMw-py3.10/lib/python3.10/site-packages/prophet/plot.py:72: FutureWarning: The behavior of DatetimeProperties.to_pydatetime is deprecated, in a future version this will return a Series containing python datetime objects instead of an ndarray. To retain the old behavior, call `np.array` on the result
  fcst_t = fcst['ds'].dt.to_pydatetime()
/home/runner/.cache/pypoetry/virtualenvs/stock-analysis-DF-fhKMw-py3.10/lib/python3.10/site-packages/prophet/plot.py:73: FutureWarning: The behavior of DatetimeProperties.to_pydatetime is deprecated, in a future version this will return a Series containing python datetime objects instead of an ndarray. To retain the old behavior, call `np.array` on the result
  ax.plot(m.history['ds'].dt.to_pydatetime(), m.history['y'], 'k.',

<matplotlib.legend.Legend at 0x7fcae666e860>

../_images/ec9d4ef5b82111278034d1c3c627bacc164b755186e94d05c8ceb536f6319394.png

Decomposition#

fig2 = model.plot_components(forecast)

/home/runner/.cache/pypoetry/virtualenvs/stock-analysis-DF-fhKMw-py3.10/lib/python3.10/site-packages/prophet/plot.py:228: FutureWarning: The behavior of DatetimeProperties.to_pydatetime is deprecated, in a future version this will return a Series containing python datetime objects instead of an ndarray. To retain the old behavior, call `np.array` on the result
  fcst_t = fcst['ds'].dt.to_pydatetime()
/home/runner/.cache/pypoetry/virtualenvs/stock-analysis-DF-fhKMw-py3.10/lib/python3.10/site-packages/prophet/plot.py:351: FutureWarning: The behavior of DatetimeProperties.to_pydatetime is deprecated, in a future version this will return a Series containing python datetime objects instead of an ndarray. To retain the old behavior, call `np.array` on the result
  df_y['ds'].dt.to_pydatetime(), seas[name], ls='-', c='#0072B2')
/home/runner/.cache/pypoetry/virtualenvs/stock-analysis-DF-fhKMw-py3.10/lib/python3.10/site-packages/prophet/plot.py:354: FutureWarning: The behavior of DatetimeProperties.to_pydatetime is deprecated, in a future version this will return a Series containing python datetime objects instead of an ndarray. To retain the old behavior, call `np.array` on the result
  df_y['ds'].dt.to_pydatetime(), seas[name + '_lower'],

../_images/92630775e1bab29c4bdc0a280ad2a47809f9cdabce07214503a938a7eb9da914.png

Prediction metrics#

y_true = df[["Close"]].loc[train_end_date:]
y_pred = forecast[["yhat"]].iloc[-y_true.shape[0] :]

rmse = mean_squared_error(y_true, y_pred, squared=False)
mae = mean_absolute_error(y_true, y_pred)
mape = mean_absolute_percentage_error(y_true, y_pred)

print(f"RMSE: {rmse}")
print(f"MAE: {mae}")

RMSE: 9.228347960218333
MAE: 7.378329691676633

prediction_results_dict["Prophet"] = [rmse, mae]

Model comparison#

prediction_results_df = pd.DataFrame(prediction_results_dict).T
prediction_results_df.columns = ["RMSE", "MAE"]
# prediction_results_df

prediction_results_df.plot(kind="bar")
plt.title("Comparison of prediction models")

plt.xticks(rotation=45, ha="right")

(array([0, 1, 2, 3, 4]),
 [Text(0, 0, 'SARIMA'),
  Text(1, 0, 'XGBoost'),
  Text(2, 0, 'ExtraTrees'),
  Text(3, 0, 'Support Vector Machine'),
  Text(4, 0, 'Prophet')])

../_images/a90dbb0c19f79978e4073da0ee08b24470e38189bfc5d2d9637bf9b1590e2f33.png

# print(prediction_results_df.to_markdown())

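Table. Comparison of prediction models

	RMSE	MAE	Visual check
ARMA	17.3524	14.9797	✅
SARIMA	13.6817	11.6047	✅
XGBoost	13.2914	9.85264	✅
ExtraTrees	13.2018	9.74557	✅
Support Vector Machine	10.2975	8.7395	❌
Prophet	8.97155	7.26206	✅

Some values in this table differ slightly from the metrics printed in the sections above, and the ARMA row has no corresponding section on this page.

Best model: Prophet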
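
The ranking can also be derived programmatically. The sketch below copies the [RMSE, MAE] pairs printed in the sections above, rounded to two decimals, into a plain dictionary and sorts the models by RMSE; the variable names are illustrative.

```python
# [RMSE, MAE] as printed in the sections above, rounded to two decimals
results = {
    "SARIMA": [13.86, 10.70],
    "XGBoost": [13.25, 9.82],
    "ExtraTrees": [13.20, 9.75],
    "Support Vector Machine": [10.30, 8.74],
    "Prophet": [9.23, 7.38],
}

# Sort model names by RMSE, lowest error first
ranking = sorted(results, key=lambda name: results[name][0])
best_model = ranking[0]

print(ranking)
print(best_model)
```

A purely numeric ranking ignores the visual check: the Support Vector Machine comes second on RMSE even though its curve fails that check.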
