Clustering#

Importing tools / dataset#

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import prince
import seaborn as sns
from sklearn.cluster import (
    DBSCAN,
    OPTICS,
    AffinityPropagation,
    AgglomerativeClustering,
    KMeans,
    MeanShift,
)
from sklearn.compose import ColumnTransformer
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import RobustScaler, StandardScaler

from src.clustering import initiate_cluster_models
from src.config import data_folder, seed
from src.constants import var_categoriques, var_numeriques
from src.utils import init_notebook
init_notebook()
df = pd.read_csv(
    f"{data_folder}/data-cleaned-feature-engineering.csv",
    sep=",",
    index_col="ID",
    parse_dates=True,
)
composantes_acp = pd.read_csv(f"{data_folder}/composantes_acp.csv", index_col="ID")
composantes_acm = pd.read_csv(f"{data_folder}/composantes_acm.csv", index_col="ID")

Global variables#

var_categoriques_extra = ["NbAcceptedCampaigns", "HasAcceptedCampaigns", "NbChildren"]

var_categoriques_fe = var_categoriques + var_categoriques_extra

Clustering#

Data preparation#

We start by merging the quantitative variables with the individuals' coordinates from the MCA.

X_clust = pd.concat((df[var_numeriques], composantes_acm), axis=1)
X_clust.head()
Year_Birth Income Recency MntWines MntFruits MntMeatProducts MntFishProducts MntSweetProducts MntGoldProds NumDealsPurchases ... ACM4 ACM5 ACM6 ACM7 ACM8 ACM9 ACM10 ACM11 ACM12 ACM13
ID
5524 1957 58138.0 58 635 88 546 172 88 88 3 ... -0.909429 0.439397 0.662880 -0.292805 -0.039068 0.196400 -0.843529 -0.898797 0.174169 0.960631
2174 1954 46344.0 38 11 1 6 2 1 6 2 ... -0.402198 0.129620 0.171945 -0.143049 -0.116974 0.105742 0.259841 0.262817 -0.093681 0.033373
4141 1965 71613.0 26 426 49 127 111 21 42 1 ... -0.423124 -0.465975 0.553202 0.541437 -1.880626 0.028891 -0.374510 -0.919603 0.287754 -0.350058
6182 1984 26646.0 26 11 4 20 10 3 5 2 ... -0.275507 -0.245603 0.246638 0.239619 -0.942750 -0.002593 -0.333331 0.052414 -0.412546 -0.177088
5324 1981 58293.0 94 173 43 118 46 27 15 5 ... 0.554819 0.142063 -0.591854 -0.680550 0.175540 -0.518129 -0.348914 0.007669 -0.491990 -0.319610

5 rows Ă— 27 columns

preprocessor = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("scaler", RobustScaler(), var_numeriques),
    ],
)
df_apres_scale = pd.DataFrame(
    preprocessor.fit_transform(X_clust),
    columns=X_clust.columns,
    index=df.index,
)
df_apres_scale.head()
Year_Birth Income Recency MntWines MntFruits MntMeatProducts MntFishProducts MntSweetProducts MntGoldProds NumDealsPurchases ... ACM4 ACM5 ACM6 ACM7 ACM8 ACM9 ACM10 ACM11 ACM12 ACM13
ID
5524 -0.722222 0.187662 0.18 0.957469 2.50000 2.238318 3.386243 2.424242 1.361702 0.5 ... -0.909429 0.439397 0.662880 -0.292805 -0.039068 0.196400 -0.843529 -0.898797 0.174169 0.960631
2174 -0.888889 -0.175314 -0.22 -0.337137 -0.21875 -0.285047 -0.211640 -0.212121 -0.382979 0.0 ... -0.402198 0.129620 0.171945 -0.143049 -0.116974 0.105742 0.259841 0.262817 -0.093681 0.033373
4141 -0.277778 0.602373 -0.46 0.523859 1.28125 0.280374 2.095238 0.393939 0.382979 -0.5 ... -0.423124 -0.465975 0.553202 0.541437 -1.880626 0.028891 -0.374510 -0.919603 0.287754 -0.350058
6182 0.777778 -0.781546 -0.46 -0.337137 -0.12500 -0.219626 -0.042328 -0.151515 -0.404255 0.0 ... -0.275507 -0.245603 0.246638 0.239619 -0.942750 -0.002593 -0.333331 0.052414 -0.412546 -0.177088
5324 0.611111 0.192433 0.90 -0.001037 1.09375 0.238318 0.719577 0.575758 -0.191489 1.5 ... 0.554819 0.142063 -0.591854 -0.680550 0.175540 -0.518129 -0.348914 0.007669 -0.491990 -0.319610

5 rows Ă— 27 columns
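`RobustScaler` is used here rather than `StandardScaler` because the spending variables contain outliers: it centers each feature on its median and scales by the interquartile range, so extreme values barely affect how the bulk of the data is rescaled. A minimal illustration on toy data:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# One feature whose bulk lies in [1, 4], plus a single extreme outlier
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# RobustScaler subtracts the median (3.0) and divides by the IQR (4.0 - 2.0 = 2.0):
# the outlier stays extreme but does not distort the scaling of the other points
scaled = RobustScaler().fit_transform(x)
print(scaled.ravel().tolist())  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

With `StandardScaler`, the outlier would inflate the standard deviation and squash the four regular points toward zero.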

df_avec_clusters = df_apres_scale.copy()

Different clustering algorithms#

We choose to test two types of clustering models:

  1. models for which the number of clusters must be chosen

  2. models that determine the number of clusters themselves

This will let us compare against the number of clusters found by the second type of algorithm.

For the models that require choosing the number of clusters, we test 2 to 5 clusters (inclusive), since too many clusters would initially be harder for the marketing team to interpret.

Table. Clustering methodology

|  |  |
|:---|:---|
| Algorithms | With a chosen number of clusters (between 2 and 5)<br>Without a chosen number of clusters |
| Selection criteria | Cluster size distribution<br>Cluster metrics<br>Manual selection of clusterings from their plots |
| Metrics | Silhouette score (between -1 and 1, close to 1 = better clusters)<br>Calinski-Harabasz (between 0 and $+\infty$, higher = better separation)<br>Davies-Bouldin (between 0 and $+\infty$, close to 0 = better clusters) |
| Cluster plots | On PCA axes 1-4<br>On MCA axes 1-4<br>Against the quantitative variables<br>Against the qualitative variables |
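The three metrics above can be sanity-checked on synthetic data: well-separated blobs should score close to 1 on Silhouette, high on Calinski-Harabasz, and close to 0 on Davies-Bouldin. A small sketch (the blob centers and parameters are illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Three well-separated blobs: all three metrics should agree this clustering is good
X, _ = make_blobs(
    n_samples=300, centers=[[0, 0], [10, 0], [0, 10]], cluster_std=0.5, random_state=0
)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print(silhouette_score(X, labels))         # close to 1 = better clusters
print(calinski_harabasz_score(X, labels))  # higher = better separation
print(davies_bouldin_score(X, labels))     # close to 0 = better clusters
```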

Table. Clustering algorithms tested

| Number of clusters | Algorithms |
|:---|:---|
| Chosen (2 to 5) | KMeans<br>Gaussian Mixture (GMM)<br>Agglomerative Hierarchical Clustering (CAH)<br>(Ward's method, single/complete/average linkage) |
| Decided by the algorithm | OPTICS<br>Mean Shift<br>Affinity Propagation |
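
The helper `initiate_cluster_models` comes from `src.clustering`, which is not shown here. A plausible sketch of what it builds, i.e. a name → model dictionary covering the algorithms in the table above (the exact implementation is an assumption; the names match the results table below):

```python
from sklearn.cluster import (
    OPTICS,
    AffinityPropagation,
    AgglomerativeClustering,
    KMeans,
    MeanShift,
)
from sklearn.mixture import GaussianMixture


def initiate_cluster_models_sketch(n_min, n_max, seed):
    """Hypothetical reimplementation: one entry per (algorithm, number of clusters)."""
    models = {}
    for k in range(n_min, n_max):
        models[f"KMeans{k}"] = KMeans(n_clusters=k, n_init=10, random_state=seed)
        models[f"GMM{k}"] = GaussianMixture(n_components=k, random_state=seed)
        models[f"CAH (Ward) {k}"] = AgglomerativeClustering(n_clusters=k, linkage="ward")
        for linkage in ("single", "complete", "average"):
            models[f"CAH ({linkage} linkage) {k}"] = AgglomerativeClustering(
                n_clusters=k, linkage=linkage
            )
    # Models that pick the number of clusters themselves
    models["OPTICS"] = OPTICS()
    models["MeanShift"] = MeanShift()
    models["AffinityPropagation"] = AffinityPropagation(random_state=seed)
    return models
```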

NB_CLUSTER_MIN = 2
NB_CLUSTER_MAX = 6  ## exclusive
model_clusters = initiate_cluster_models(
    NB_CLUSTER_MIN,
    NB_CLUSTER_MAX,
    seed,
)
cluster_metrics = []

for model_name, model in model_clusters.items():
    if isinstance(model, GaussianMixture):  ## special case: GaussianMixture has no labels_ attribute
        df_avec_clusters[model_name] = model.fit_predict(df_apres_scale)
    else:
        model.fit(df_apres_scale)
        df_avec_clusters[model_name] = model.labels_

    df_avec_clusters[model_name] = pd.Categorical(
        df_avec_clusters[model_name].astype(str)
    )

    nb_clusters = df_avec_clusters[model_name].nunique()

    repartition = list(
        df_avec_clusters[model_name].value_counts(normalize=True).round(2).astype(str)
    )  ## todo: drop astype(str) if it is not needed (to be tested)

    cluster_metrics.append(
        [
            model_name,
            nb_clusters,
            " | ".join(repartition),
            silhouette_score(
                df_apres_scale, df_avec_clusters[model_name], random_state=seed
            ),  ## close to 1 = better
            calinski_harabasz_score(
                df_apres_scale,
                df_avec_clusters[model_name],
            ),  ## higher = better
            davies_bouldin_score(
                df_apres_scale, df_avec_clusters[model_name]
            ),  ## close to 0 = better
        ]
    )
/home/runner/.cache/pypoetry/virtualenvs/customer-base-analysis-F-W2gxNr-py3.10/lib/python3.10/site-packages/sklearn/cluster/_kmeans.py:1416: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
  super()._check_params_vs_input(X, default_n_init=10)
/home/runner/.cache/pypoetry/virtualenvs/customer-base-analysis-F-W2gxNr-py3.10/lib/python3.10/site-packages/sklearn/cluster/_affinity_propagation.py:142: ConvergenceWarning: Affinity propagation did not converge, this model may return degenerate cluster centers and labels.
  warnings.warn(
pd.DataFrame(
    cluster_metrics,
    columns=[
        "Clustering algorithm",
        "Number of clusters",
        "Distribution",
        "Silhouette",
        "Calinski-Harabasz",
        "Davies-Bouldin",
    ],
)
Clustering algorithm Number of clusters Distribution Silhouette Calinski-Harabasz Davies-Bouldin
0 KMeans2 2 0.68 | 0.32 0.318482 754.878090 1.547400
1 KMeans3 3 0.46 | 0.29 | 0.25 0.167516 513.955410 1.982583
2 KMeans4 4 0.45 | 0.25 | 0.19 | 0.12 0.157546 402.274688 2.334645
3 KMeans5 5 0.44 | 0.23 | 0.19 | 0.12 | 0.02 0.170982 340.753601 2.058930
4 GMM2 2 0.52 | 0.48 0.207188 542.083968 1.740252
5 GMM3 3 0.52 | 0.31 | 0.18 0.062900 218.097499 3.099757
6 GMM4 4 0.36 | 0.3 | 0.27 | 0.06 0.087929 268.462211 4.010402
7 GMM5 5 0.43 | 0.34 | 0.13 | 0.07 | 0.02 0.129101 205.782852 3.352396
8 CAH (Ward) 2 2 0.72 | 0.28 0.314124 657.064212 1.618113
9 CAH (Ward) 3 3 0.44 | 0.28 | 0.28 0.138385 456.284028 2.071393
10 CAH (Ward) 4 4 0.41 | 0.28 | 0.28 | 0.02 0.147383 348.454297 1.794500
11 CAH (Ward) 5 5 0.41 | 0.28 | 0.22 | 0.06 | 0.02 0.142491 292.850286 2.186182
12 CAH (average linkage) 2 2 1.0 | 0.0 0.552208 29.958308 0.928812
13 CAH (average linkage) 3 3 1.0 | 0.0 | 0.0 0.523717 18.536309 0.735997
14 CAH (average linkage) 4 4 1.0 | 0.0 | 0.0 | 0.0 0.478857 14.305294 0.755391
15 CAH (average linkage) 5 5 1.0 | 0.0 | 0.0 | 0.0 | 0.0 0.458367 14.762254 0.896412
16 CAH (single linkage) 2 2 1.0 | 0.0 0.587091 26.332043 0.549969
17 CAH (single linkage) 3 3 1.0 | 0.0 | 0.0 0.568624 17.823639 0.468717
18 CAH (single linkage) 4 4 1.0 | 0.0 | 0.0 | 0.0 0.540606 14.538207 0.434358
19 CAH (single linkage) 5 5 1.0 | 0.0 | 0.0 | 0.0 | 0.0 0.518696 12.704006 0.417355
20 CAH (complete linkage) 2 2 1.0 | 0.0 0.552208 29.958308 0.928812
21 CAH (complete linkage) 3 3 1.0 | 0.0 | 0.0 0.540075 21.966639 0.879092
22 CAH (complete linkage) 4 4 0.72 | 0.27 | 0.0 | 0.0 0.317609 248.338621 1.259276
23 CAH (complete linkage) 5 5 0.71 | 0.27 | 0.01 | 0.0 | 0.0 0.290965 202.334927 1.242878
24 OPTICS 28 0.89 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01... -0.295888 7.766509 1.447745
25 MeanShift 22 0.84 | 0.04 | 0.02 | 0.02 | 0.02 | 0.02 | 0.01... 0.197199 42.398326 1.371609
26 AffinityPropagation 142 0.04 | 0.03 | 0.03 | 0.02 | 0.02 | 0.02 | 0.02... 0.098856 40.876485 1.657521

Selected clusterings:

  • KMeans 2

  • GMM 2

  • CAH (Ward 2)

We also examined some of the 3-cluster solutions, which helped identify certain individuals, but they are not as interpretable and actionable as the 2-cluster solutions.

Visualization#

def affiche_taille_clusters(nom_cluster):
    plt.title("Cluster sizes")
    sns.histplot(df_avec_clusters[nom_cluster], shrink=0.5)

    plt.show()
def affiche_clusters_acp(nom_cluster):
    _, ax = plt.subplots(1, 2, figsize=(12, 5))

    ax[0].set_title("Clusters on PCA axes 1-2")
    ax[1].set_title("Clusters on PCA axes 3-4")

    sns.scatterplot(
        composantes_acp,
        x="ACP1",
        y="ACP2",
        hue=df_avec_clusters[nom_cluster],
        alpha=0.8,
        ax=ax[0],
    )
    sns.scatterplot(
        composantes_acp,
        x="ACP3",
        y="ACP4",
        hue=df_avec_clusters[nom_cluster],
        alpha=0.8,
        ax=ax[1],
    )

    plt.show()
def affiche_clusters_acm(nom_cluster):
    _, ax = plt.subplots(1, 2, figsize=(12, 5))

    ax[0].set_title("Clusters on MCA axes 1-2")
    ax[1].set_title("Clusters on MCA axes 3-4")

    sns.scatterplot(
        composantes_acm,
        x="ACM1",
        y="ACM2",
        hue=df_avec_clusters[nom_cluster],
        alpha=0.8,
        ax=ax[0],
    )

    sns.scatterplot(
        composantes_acm,
        x="ACM3",
        y="ACM4",
        hue=df_avec_clusters[nom_cluster],
        alpha=0.8,
        ax=ax[1],
    )

    plt.show()
def affiche_clusters_var_quanti(nom_cluster):
    """Plot the quantitative variables by cluster."""
    for var in var_numeriques:
        _, ax = plt.subplots(1, 2, figsize=(10, 3))

        sns.boxplot(
            x=df[var],
            y=df_avec_clusters[nom_cluster],
            width=0.25,
            ax=ax[0],
        )

        sns.histplot(
            x=df[var],
            kde=True,
            ax=ax[1],
            hue=df_avec_clusters[nom_cluster],
            stat="probability",
            common_norm=False,
        )

        plt.show()
def affiche_clusters_var_quali(nom_cluster):
    """Plot the qualitative variables by cluster, and the clusters by variable."""
    for var in var_categoriques_fe:
        _, ax = plt.subplots(1, 2, figsize=(10, 4))

        sns.histplot(
            x=df[var].astype(str),
            ax=ax[0],
            hue=df_avec_clusters[nom_cluster],
            multiple="dodge",
            shrink=0.5,
            common_norm=True,
        )

        sns.histplot(
            hue=df[var].astype(str),
            ax=ax[1],
            x=df_avec_clusters[nom_cluster],
            multiple="dodge",
            shrink=0.5,
            common_norm=True,
        )

        plt.show()
def affiche_clusters(nom_cluster):
    """Plot all cluster views: sizes, PCA/MCA projections, variable distributions."""
    affiche_taille_clusters(nom_cluster)
    affiche_clusters_acp(nom_cluster)
    affiche_clusters_acm(nom_cluster)

    affiche_clusters_var_quanti(nom_cluster)
    affiche_clusters_var_quali(nom_cluster)
affiche_clusters("KMeans2")
[Figures: cluster sizes for KMeans2; clusters on PCA axes 1-4 and MCA axes 1-4; distributions of the quantitative and qualitative variables by cluster]
affiche_clusters("GMM2")
[Figures: cluster sizes for GMM2; clusters on PCA axes 1-4 and MCA axes 1-4; distributions of the quantitative and qualitative variables by cluster]
affiche_clusters("CAH (Ward) 2")
[Figures: cluster sizes for CAH (Ward) 2; clusters on PCA axes 1-4 and MCA axes 1-4; distributions of the quantitative and qualitative variables by cluster]

Conclusion#

Table. Typical customer profiles

| Profile | Proportion | Education | Income | Campaigns accepted | Children | Spending | Birth year | Website |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| Customers who buy | 30% | Bac to doctorate | High | 0-4 | 0 young children<br>0-1 teenagers | High | 1970 | Few visits |
| Customers who buy little or not at all | 68%-70% | Brevet to doctorate | Average | 0-1 | 0-3 children<br>(young and teenage) | Low | 1970 | Many visits |
| Customers who do not buy (n=3) | 2% | Brevet | Lowest | 0 | 0-1 young children<br>0 teenagers | None | 1980 | Many visits |

Note also that among the customers who buy, the campaign acceptance rate is much higher.

Going further#

  • test the stability of the clusters (here, the algorithms' initialization has a significant impact on the clusters found)

  • test the different parameters of each clustering algorithm for comparison

  • run the clustering algorithms on different subsets of variables to reveal different groups
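
The first point can be sketched by refitting the same algorithm under several seeds and comparing the resulting partitions with the adjusted Rand index (1 = identical labelings). The synthetic `X` below is a stand-in for `df_apres_scale`:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))  # stand-in for the scaled customer data

# Fit the same model under several seeds and compare each run to a reference run
labels_ref = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
scores = [
    adjusted_rand_score(
        labels_ref,
        KMeans(n_clusters=2, n_init=10, random_state=s).fit_predict(X),
    )
    for s in range(1, 5)
]
print(scores)  # values near 1 mean the partition is stable across initializations
```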

Saving the data#

## todo: save the clusters
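The saving step is still a todo; a minimal version could export only the selected labelings (column names follow the results table above; the output filename is an assumption, and the toy `df_avec_clusters` below stands in for the notebook's real one):

```python
import pandas as pd

# Stand-in for the notebook's df_avec_clusters (cluster labels stored as Categorical)
df_avec_clusters = pd.DataFrame(
    {
        "KMeans2": pd.Categorical(["0", "1", "0"]),
        "GMM2": pd.Categorical(["1", "1", "0"]),
        "CAH (Ward) 2": pd.Categorical(["0", "1", "1"]),
    },
    index=pd.Index([5524, 2174, 4141], name="ID"),
)

selected = ["KMeans2", "GMM2", "CAH (Ward) 2"]
# In the notebook, the path would be f"{data_folder}/clusters.csv"
df_avec_clusters[selected].to_csv("clusters.csv")
```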