Employee Attrition
Employee turnover, especially among good employees, is expensive due to rehiring and training costs. Employees who are likely to leave can be identified using classification methods. The code below uses a deep neural network: the Multilayer Perceptron.
In [1]:
import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.model_selection import train_test_split
In [2]:
# random seed for later reproducibility
seed = 7
np.random.seed(seed)
In [3]:
data = pd.read_csv("/home/german/Desktop/lambda_bayes/atrition/data/WA_Fn-UseC_-HR-Employee-Attrition.csv")
data.head(2)
Out[3]:
In [4]:
data.columns
Out[4]:
Correlation of the variables, to find which ones are not useful and which are most strongly related.
In [5]:
data.corr()
Out[5]:
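As a small illustration of what the correlation matrix reveals (the column names match the real data, but the values here are made up): a constant column such as `EmployeeCount` has zero variance, so its correlation with every other variable is undefined (NaN), which is exactly why such columns are dropped later.

```python
import pandas as pd

# Toy frame standing in for the HR data (hypothetical values)
df = pd.DataFrame({
    'Age': [30, 40, 50, 60],
    'MonthlyIncome': [3000, 4500, 6000, 7500],
    'EmployeeCount': [1, 1, 1, 1],   # constant column: zero variance
})

corr = df.corr()
# A constant column correlates as NaN with everything, flagging it for removal.
print(corr['EmployeeCount'].isna().all())  # True
```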
In [6]:
sns.distplot(data.MonthlyIncome[data.Gender == 'Male'], bins = np.linspace(0,20000,60))
sns.distplot(data.MonthlyIncome[data.Gender == 'Female'], bins = np.linspace(0,20000,60))
plt.legend(['Males','Females'])
Out[6]:
Drop unneeded columns and apply "one hot encoding"
In [7]:
# Categorical columns to one-hot encode (this line only displays them; it does not modify data)
data[['BusinessTravel', 'Department', 'EducationField', 'Gender', 'JobRole', 'MaritalStatus', 'Over18', 'OverTime']]
data.columns
Out[7]:
In [8]:
# Drop columns with constant values
del data['Over18']
del data['StandardHours']
del data['EmployeeCount']
In [9]:
data.dtypes #object type
#data[['Age']].hist
data = pd.get_dummies(data, prefix=['BusinessTravel', 'Department', 'EducationField', 'Gender', 'JobRole', \
'MaritalStatus', 'OverTime'], columns=['BusinessTravel', 'Department',\
'EducationField', 'Gender', 'JobRole', 'MaritalStatus', 'OverTime'])
data.columns
Out[9]:
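A minimal sketch of what `pd.get_dummies` does to one of these columns (the frame below is hypothetical): each category becomes its own 0/1 column, while non-encoded columns are kept as-is.

```python
import pandas as pd

# Tiny frame standing in for the HR data (hypothetical values)
df = pd.DataFrame({'Gender': ['Male', 'Female'], 'Age': [30, 40]})

# Encode Gender into Gender_Female / Gender_Male indicator columns
out = pd.get_dummies(df, prefix=['Gender'], columns=['Gender'])
print(list(out.columns))  # ['Age', 'Gender_Female', 'Gender_Male']
```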
In [10]:
# data normalization
def preprocess(raw_X):
    from sklearn import preprocessing
    X = preprocessing.scale(raw_X)
    return X
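Note that `preprocessing.scale` here standardizes the whole dataset before the train/test split, which lets test-set statistics leak into training. A minimal sketch of the leakage-free alternative (toy matrices below are hypothetical stand-ins for the real features):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy matrices standing in for the real train/test features (hypothetical)
X_tr = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_te = np.array([[2.0, 20.0], [4.0, 40.0]])

scaler = StandardScaler().fit(X_tr)   # statistics come from the train set only
X_tr_s = scaler.transform(X_tr)
X_te_s = scaler.transform(X_te)       # test set reuses the train statistics

print(np.allclose(X_tr_s.mean(axis=0), 0.0))  # True: train columns are centered
```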
In [11]:
# convert 'Yes'/'No' values to 1s and 0s
yes_no = lambda x: 1 if x == 'Yes' else 0
data['Attrition'] = data.Attrition.apply(yes_no)
y=data['Attrition']
del data['Attrition']
del data['EmployeeNumber']
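The `yes_no` mapping can be checked in isolation on a throwaway series:

```python
import pandas as pd

yes_no = lambda x: 1 if x == 'Yes' else 0
s = pd.Series(['Yes', 'No', 'Yes'])
print(s.apply(yes_no).tolist())  # [1, 0, 1]
```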
Split the dataset into training and test sets, and check the data shapes for the model
In [12]:
data = preprocess(data)
X_train, X_test, y_train, y_test = train_test_split(data, y, test_size=0.33, random_state=seed)
# only the last bare expression in a cell is displayed, so print the rest
print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
Out[12]:
In [13]:
# import model (the `model` name is shadowed by the Sequential model built in the next cell)
import model
In [14]:
from keras.models import Sequential
from keras.layers import Dense, Dropout
import numpy as np
import pandas as pd
from keras.regularizers import l2
from keras.utils import np_utils
import seaborn as sns
drop = 0.3
# create model: 51 inputs -> 102 -> 40 -> 1 (sigmoid)
# (input_dim is only needed on the first layer; Keras infers the rest)
model = Sequential()
model.add(Dense(102, input_dim=51, kernel_initializer='uniform', activation='relu'))
model.add(Dropout(drop))
#model.add(Dense(80, kernel_initializer='uniform', activation='relu'))
#model.add(Dropout(drop))
model.add(Dense(40, kernel_initializer='uniform', activation='relu'))
model.add(Dropout(drop))
model.add(Dense(1, activation='sigmoid'))
In [15]:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
In [16]:
model.fit(X_train, y_train, epochs=100, batch_size=30, verbose=2)
Out[16]:
Final model prediction accuracy: 85.6%
In [17]:
metrics = model.evaluate(X_test, y_test, batch_size=128, verbose=2)
print('accuracy:')
print(metrics[1] * 100)
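Since most employees in this dataset stay, accuracy alone can look good while the model misses many actual leavers. A minimal sketch of a more informative check (the labels below are hypothetical stand-ins for `y_test` and the model's rounded predictions):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

# Hypothetical labels standing in for y_test and the model's rounded predictions
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])

cm = confusion_matrix(y_true, y_pred)
print(cm)                        # rows: true class, columns: predicted class
print(f1_score(y_true, y_pred))  # penalizes the missed leaver
# Accuracy here is 90%, yet half of the actual leavers were missed.
```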