logo_ige
logo_uga

Machine Learning to predict location of ice recrystallization



May - July 2022
UGA and IGE internship
M1 Statistics and Data Sciences (SSD)

Renan MANCEAUX
Supervisor : Thomas CHAUVE

Dimensional Reduction


8.6. t-distribued Stochastic Neighbor Embleding (t-SNE) on TJ dataset#

Exploration process before apply machine learning on craft data. Not-linear reduction of dimensions calculating similarities with Student-distribued probabilities to project individuals into 2 dimensional space, keeping smallest distances within neighbors.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray
import sklearn
import plotly.express as px

import sys
sys.path.append("../../scripts/")
import utils

from sklearn.manifold import TSNE

8.6.1. Loading data#

TJ_CI02 = utils.load_tj_data("../../data/TJ/TJ_CI02.npy").dropna()
TJ_CI04 = utils.load_tj_data("../../data/TJ/TJ_CI04.npy").dropna()
TJ_CI06 = utils.load_tj_data("../../data/TJ/TJ_CI06.npy").dropna()
TJ_CI09 = utils.load_tj_data("../../data/TJ/TJ_CI09.npy").dropna()
TJ_CI21 = utils.load_tj_data("../../data/TJ/TJ_CI21.npy").dropna()

TJ_CI02['batch'] = ['CI02'] * np.shape(TJ_CI02)[0]
TJ_CI04['batch'] = ['CI04'] * np.shape(TJ_CI04)[0]
TJ_CI06['batch'] = ['CI06'] * np.shape(TJ_CI06)[0]
TJ_CI09['batch'] = ['CI09'] * np.shape(TJ_CI09)[0]
TJ_CI21['batch'] = ['CI21'] * np.shape(TJ_CI21)[0]

data = pd.concat((TJ_CI02,TJ_CI04,TJ_CI06,TJ_CI09,TJ_CI21))

8.6.2. t-SNE on CI02 geom and craft data#

8.6.2.1. Variables selection#

data['RX'] = data['RX'].astype(object)
data['batch'] = data['batch'].astype(object)
X = data.loc[:,((data.columns != 'RX')&(data.columns != 'batch'))] 
y = data['RX']
b = data['batch'] 

8.6.2.2. Normalization#

norm_X = (X - X.mean())/X.std()

8.6.2.3. Apply t-SNE for class#

tsne = TSNE(n_components=2,perplexity=10,n_iter=2000, learning_rate='auto',init='random',verbose=1)
res_tsne  = tsne.fit_transform(norm_X)
[t-SNE] Computing 31 nearest neighbors...
[t-SNE] Indexed 722 samples in 0.000s...
[t-SNE] Computed neighbors for 722 samples in 0.162s...
[t-SNE] Computed conditional probabilities for sample 722 / 722
[t-SNE] Mean sigma: 1.123870
[t-SNE] KL divergence after 250 iterations with early exaggeration: 77.433929
[t-SNE] KL divergence after 2000 iterations: 1.281225

8.6.2.4. Projection of individuals on 2 dimensional space with labeling for pixel status#

import seaborn as sns

data['tsne-2d-one'] = res_tsne[:,0]
data['tsne-2d-two'] = res_tsne[:,1]

plt.figure(figsize=(16,10))
sns.scatterplot(
    x="tsne-2d-one", y="tsne-2d-two",
    hue="RX",
    palette=sns.color_palette("hls", 2),
    data=data,
    legend="full",
    alpha=1
)
plt.show()
../../_images/t_SNE_tj_14_0.png
plt.figure(figsize=(16,10))
sns.scatterplot(
    x="tsne-2d-one", y="tsne-2d-two",
    hue="batch",
    palette=sns.color_palette("colorblind", 5),
    data=data,
    legend="full",
    alpha=1
)
plt.show()
../../_images/t_SNE_tj_15_0.png