Introduction to Keras: The Python Deep Learning library

WHAT IS KERAS?

It is a high-level neural networks API written in Python. The great thing about Keras is that it is capable of running on top of TensorFlow, Theano or CNTK. Theano and TensorFlow are two of the top numerical platforms in Python that provide the basis for deep learning research and development. Both are powerful libraries, but they can be difficult to use directly for creating deep learning models. The Keras Python library provides a clean way to create a range of deep learning models on top of Theano or TensorFlow.

WHY SHOULD YOU USE KERAS?

Keras was developed and is maintained by Google engineer François Chollet, with a focus on enabling fast experimentation. It was developed to make implementing deep learning models fast and easy for research and development.

You should use Keras if you need a deep learning library that:

  • Allows for easy and fast prototyping:
    •       User friendliness – It’s an API designed for human beings: the user experience comes first, it offers consistent and simple APIs, and it provides clear and actionable feedback upon user error.
    •       Modularity – A model is understood as a sequence or a graph of standalone modules. For example, neural layers, cost functions, optimizers, initialization schemes, activation functions and regularization schemes are all standalone modules that you can combine to create new models.
    •       Extensibility – It’s simple to add new modules and use them within the framework. It’s intended for researchers to explore new ideas.
  • Supports convolutional networks and recurrent networks, as well as combinations of the two.
  • Runs seamlessly on CPU and GPU.
  • Describes models in compact Python code that is easier to debug and simple to extend. There are no separate model files with custom file formats; everything is in native Python.

HOW TO INSTALL?

As we said, Keras abstracts the way models are built, and any supported engine can serve as the back-end computation engine. In our case we will use TensorFlow, Python 3.6 and the Miniconda package and environment management system.

To install Miniconda, navigate to its download page and choose the installer for your OS. Once the installation process is done, create an environment with the command below; this environment will be used throughout the rest of the post.
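
For example (assuming the environment name kerasEnv and Python 3.6, which match the conda output below):

conda create -n kerasEnv python=3.6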

Fetching package metadata ...........
Solving package specifications: .

Package plan for installation in environment …/miniconda3/envs/kerasEnv:

The following NEW packages will be INSTALLED:

bzip2:           1.0.6-1              conda-forge
ca-certificates: 2018.8.24-ha4d7672_0 conda-forge
certifi:         2018.8.24-py36_1001  conda-forge
libffi:          3.2.1-hfc679d8_5     conda-forge
ncurses:         6.1-hfc679d8_1       conda-forge
openssl:         1.0.2p-h470a237_0    conda-forge
pip:             18.1-py36_1000       conda-forge
python:          3.6.6-h5001a0f_0     conda-forge
readline:        7.0-haf1bffa_1       conda-forge
setuptools:      40.4.3-py36_0        conda-forge
sqlite:          3.25.2-hb1c47c0_0    conda-forge
tk:              8.6.8-ha92aebf_0     conda-forge
wheel:           0.32.1-py36_0        conda-forge
xz:              5.2.4-h470a237_1     conda-forge
zlib:            1.2.11-h470a237_3    conda-forge

Proceed ([y]/n)? y

sqlite-3.25.2- 100% |#################################################| Time: 0:00:01   1.70 MB/s
setuptools-40. 100% |################################################| Time: 0:00:00   1.82 MB/s
wheel-0.32.1-p 100% |################################################| Time: 0:00:00   4.76 MB/s
pip-18.1-py36_ 100% |################################################| Time: 0:00:01   1.56 MB/s
#
# To activate this environment, use:
# > source activate kerasEnv
#
# To deactivate an active environment, use:
# > source deactivate
#

To activate the newly created environment, type:

source activate kerasEnv

Now let’s install TensorFlow and Keras with pip, together with scikit-learn and pandas:

pip install tensorflow
pip install keras
pip install scikit-learn
pip install pandas

Once the installation is done, you will be able to see the installed libraries in the list after typing:

pip freeze

 

…
Keras==2.2.4
Keras-Applications==1.0.6
Keras-Preprocessing==1.0.5
tensorboard==1.11.0
tensorflow==1.11.0
…

The additional libraries we installed, scikit-learn and pandas, are used for dataset manipulation and other auxiliary operations on the dataset.

INTRO TO KERAS

Keras uses one of the predefined computation engines to perform computations on tensors. A tensor is a multidimensional array used in the backends for efficient symbolic computation, and tensors are the fundamental building blocks for creating neural networks and other machine learning algorithms. For more information, you can refer to our article where we investigate tensors in more detail.
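
As a quick illustration (a minimal sketch using the Keras backend API; the values are arbitrary), you can create and evaluate tensors directly through the backend:

from keras import backend as K

# create a constant 2-D tensor (a matrix) from arbitrary values
x = K.constant([[1.0, 2.0], [3.0, 4.0]])

# symbolic computation: multiply the matrix by its transpose
y = K.dot(x, K.transpose(x))

# evaluate the symbolic tensor into a concrete NumPy array
print(K.eval(y))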

There are two ways of creating models in Keras: sequential and functional composition. With sequential composition, predefined layers are stacked in a linear pipeline.

An example of sequential model creation looks like this:

from keras.models import Sequential
from keras.layers import Dense, Activation

HIDDEN_N = 64    # hypothetical layer sizes, chosen for illustration
HIDDEN_N_1 = 10

model = Sequential([
    Dense(HIDDEN_N, input_shape=(1024,)),
    Activation('relu'),
    Dense(HIDDEN_N_1),
    Activation('softmax'),
])

Example 1. Sequential model creation
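
Equivalently, layers can be added to a Sequential model one at a time with model.add(); a minimal sketch of the same model:

model = Sequential()
model.add(Dense(HIDDEN_N, input_shape=(1024,)))
model.add(Activation('relu'))
model.add(Dense(HIDDEN_N_1))
model.add(Activation('softmax'))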

The second way of creating models is the functional API, which makes it possible to define complex models with shared layers or multiple outputs.

An example of functional model creation looks like this:

from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(784,))
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)

model = Model(inputs=inputs, outputs=predictions)

 Example 2. Functional model creation
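
Since the functional API also supports shared layers, here is a minimal sketch (the layer sizes and names are hypothetical) in which one Dense layer instance is reused on two inputs, so both branches share the same weights:

from keras.layers import Input, Dense, concatenate
from keras.models import Model

shared = Dense(32, activation='relu')   # a single layer instance, reused below

input_a = Input(shape=(784,))
input_b = Input(shape=(784,))

# applying the same instance twice shares its weights between the branches
merged = concatenate([shared(input_a), shared(input_b)])
output = Dense(1, activation='sigmoid')(merged)

model = Model(inputs=[input_a, input_b], outputs=output)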

Keras has a number of prebuilt layers available for use. Check the table below.

Layer Description Signature
Dense Simple fully connected neural network layer which produces the output of the activation function act(dot(input, kernel) + bias), where act is one of the provided activation functions like softmax, relu, etc. keras.layers.Dense(units, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
Activation Applies the specified activation function to the output. Activation functions which are available: softmax, elu, selu, softplus, softsign, relu, tanh, sigmoid, hard_sigmoid, linear. keras.layers.Activation(activation)
Dropout Applies dropout regularization to the inputs at the specified rate. There is an interesting article on how dropout may help in preventing overfitting. keras.layers.Dropout(rate, noise_shape=None, seed=None)
Flatten Flattens the input. For example, an input of shape (None, 64, 32, 32) produces output of shape (None, 65536). keras.layers.Flatten(data_format=None)
Reshape Converts the input to a specific shape. keras.layers.Reshape(target_shape)
Permute Reorders the input dimensions with a defined pattern. keras.layers.Permute(dims)
RepeatVector Repeats the input n times. For example, an input of shape (samples, features) repeated n times produces output of shape (samples, n, features). keras.layers.RepeatVector(n)
Lambda Wraps an arbitrary expression as a Layer object. keras.layers.Lambda(function, output_shape=None, mask=None, arguments=None)
ActivityRegularization Applies an update to the cost function based on input activity. keras.layers.ActivityRegularization(l1=0.0, l2=0.0)
Masking Masks a sequence by using a mask value to skip timesteps. keras.layers.Masking(mask_value=0.0)
Conv1D,
Conv2D,
Conv3D
Applies 1D, 2D and 3D convolutions. keras.layers.Conv1D(filters, kernel_size, strides=1, padding='valid', data_format='channels_last', dilation_rate=1, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)

keras.layers.Conv2D(filters, kernel_size, strides=(1, 1), padding='valid', data_format=None, dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)

keras.layers.Conv3D(filters, kernel_size, strides=(1, 1, 1), padding='valid', data_format=None, dilation_rate=(1, 1, 1), activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)

SeparableConv1D,
SeparableConv2D
Applies separable 1D and 2D convolutions. keras.layers.SeparableConv1D(filters, kernel_size, strides=1, padding='valid', data_format='channels_last', dilation_rate=1, depth_multiplier=1, activation=None, use_bias=True, depthwise_initializer='glorot_uniform', pointwise_initializer='glorot_uniform', bias_initializer='zeros', depthwise_regularizer=None, pointwise_regularizer=None, bias_regularizer=None, activity_regularizer=None, depthwise_constraint=None, pointwise_constraint=None, bias_constraint=None)
Conv2DTranspose,
Conv3DTranspose
Transposed convolution layers. keras.layers.Conv2DTranspose(filters, kernel_size, strides=(1, 1), padding='valid', output_padding=None, data_format=None, dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)

keras.layers.Conv3DTranspose(filters, kernel_size, strides=(1, 1, 1), padding='valid', output_padding=None, data_format=None, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)

Cropping1D,
Cropping2D,
Cropping3D
Cropping layer for 1D, 2D and 3D inputs. keras.layers.Cropping1D(cropping=(1, 1))

keras.layers.Cropping2D(cropping=((0, 0), (0, 0)), data_format=None)

keras.layers.Cropping3D(cropping=((1, 1), (1, 1), (1, 1)), data_format=None)

UpSampling1D,
UpSampling2D,
UpSampling3D
Upsampling for 1D, 2D and 3D inputs. keras.layers.UpSampling1D(size=2)
keras.layers.UpSampling2D(size=(2, 2), data_format=None, interpolation='nearest')
keras.layers.UpSampling3D(size=(2, 2, 2), data_format=None)
ZeroPadding1D,
ZeroPadding2D,
ZeroPadding3D
Zero-padding layer for 1D, 2D and 3D inputs. keras.layers.ZeroPadding1D(padding=1)
keras.layers.ZeroPadding2D(padding=(1, 1), data_format=None)
keras.layers.ZeroPadding3D(padding=(1, 1, 1), data_format=None)
MaxPooling1D,
MaxPooling2D,
MaxPooling3D
Max pooling operation for 1D, 2D and 3D inputs. keras.layers.MaxPooling1D(pool_size=2, strides=None, padding='valid', data_format='channels_last')
keras.layers.MaxPooling2D(pool_size=(2, 2), strides=None, padding='valid', data_format=None)
keras.layers.MaxPooling3D(pool_size=(2, 2, 2), strides=None, padding='valid', data_format=None)
AveragePooling1D,
AveragePooling2D,
AveragePooling3D
Average pooling operation for 1D, 2D and 3D inputs. keras.layers.AveragePooling1D(pool_size=2, strides=None, padding='valid', data_format='channels_last')
keras.layers.AveragePooling2D(pool_size=(2, 2), strides=None, padding='valid', data_format=None)
keras.layers.AveragePooling3D(pool_size=(2, 2, 2), strides=None, padding='valid', data_format=None)
GlobalMaxPooling1D,
GlobalMaxPooling2D,
GlobalMaxPooling3D
Global max pooling operation for 1D, 2D and 3D inputs. keras.layers.GlobalMaxPooling1D(data_format='channels_last')
keras.layers.GlobalMaxPooling2D(data_format=None)
keras.layers.GlobalMaxPooling3D(data_format=None)
GlobalAveragePooling1D,
GlobalAveragePooling2D,
GlobalAveragePooling3D
Global average pooling operation for 1D, 2D and 3D inputs. keras.layers.GlobalAveragePooling1D(data_format='channels_last')
keras.layers.GlobalAveragePooling2D(data_format=None)
keras.layers.GlobalAveragePooling3D(data_format=None)
LocallyConnected1D,
LocallyConnected2D
Locally connected layer for 1D and 2D inputs. keras.layers.LocallyConnected1D(filters, kernel_size, strides=1, padding='valid', data_format=None, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
keras.layers.LocallyConnected2D(filters, kernel_size, strides=(1, 1), padding='valid', data_format=None, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
SimpleRNN,
GRU,
LSTM
Fully connected RNN, Gated Recurrent Unit and Long Short-Term Memory layers. keras.layers.SimpleRNN(units, activation='tanh', use_bias=True, kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal', bias_initializer='zeros', kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None, bias_constraint=None, dropout=0.0, recurrent_dropout=0.0, return_sequences=False, return_state=False, go_backwards=False, stateful=False, unroll=False)
keras.layers.GRU(units, activation='tanh', recurrent_activation='hard_sigmoid', use_bias=True, kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal', bias_initializer='zeros', kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None, bias_constraint=None, dropout=0.0, recurrent_dropout=0.0, implementation=1, return_sequences=False, return_state=False, go_backwards=False, stateful=False, unroll=False, reset_after=False)
keras.layers.LSTM(units, activation='tanh', recurrent_activation='hard_sigmoid', use_bias=True, kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal', bias_initializer='zeros', unit_forget_bias=True, kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None, bias_constraint=None, dropout=0.0, recurrent_dropout=0.0, implementation=1, return_sequences=False, return_state=False, go_backwards=False, stateful=False, unroll=False)
Embedding Takes a 2D tensor of shape (batch_size, seq_length) and produces output of shape (batch_size, seq_length, output_dim). keras.layers.Embedding(input_dim, output_dim, embeddings_initializer='uniform', embeddings_regularizer=None, activity_regularizer=None, embeddings_constraint=None, mask_zero=False, input_length=None)
Add,
Multiply,
Average,
Maximum,
Concatenate,
Dot
Layers that compute element-wise addition, multiplication, average, maximum, concatenation and dot product of a list of input tensors. keras.layers.Add()
keras.layers.Multiply()
keras.layers.Average()
keras.layers.Maximum()
keras.layers.Concatenate(axis=-1)
keras.layers.Dot(axes, normalize=False)
LeakyReLU,
PReLU,
ThresholdedReLU,
ELU
Advanced activation layers that apply the named activation function (leaky ReLU, parametric ReLU, thresholded ReLU, exponential linear unit) to the inputs. keras.layers.LeakyReLU(alpha=0.3)
keras.layers.PReLU(alpha_initializer='zeros', alpha_regularizer=None, alpha_constraint=None, shared_axes=None)
keras.layers.ThresholdedReLU(theta=1.0)
keras.layers.ELU(alpha=1.0)
BatchNormalization Normalizes the outputs of the previous layer so that they have a mean close to 0 and a standard deviation close to 1. keras.layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True, beta_initializer='zeros', gamma_initializer='ones', moving_mean_initializer='zeros', moving_variance_initializer='ones', beta_regularizer=None, gamma_regularizer=None, beta_constraint=None, gamma_constraint=None)

Table 1: Layers in Keras
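
As a small illustration of how several of these layers compose, here is a tiny convolutional classifier (a hypothetical sketch; the filter counts and input shape are arbitrary):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential([
    Conv2D(16, (3, 3), activation='relu', input_shape=(28, 28, 1)),  # 2D convolution over images
    MaxPooling2D(pool_size=(2, 2)),   # downsample feature maps
    Flatten(),                        # flatten to a vector for the Dense layer
    Dropout(0.5),                     # dropout regularization
    Dense(10, activation='softmax'),  # 10-class output
])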

One more thing before we move on to the application and code example. When you create a model, you have to compile it to start the learning phase. Let’s take example 1, where the model variable holds a reference to our model; in order to compile it, we call the compile method, which has the following signature:

compile(optimizer, loss=None, metrics=None, loss_weights=None, sample_weight_mode=None, weighted_metrics=None, target_tensors=None)

So our code from example 1 will look like:

model.compile(
  loss='<loss_function>',
  optimizer='adam',
  metrics=['<defined_metrics>']
)
Argument Description
optimizer Function (custom or one of those provided by Keras) used to update the parameters in the optimization iterations.
loss Objective function (or optimization score function) which evaluates how well the model performs.
metrics List of metrics to be collected while training the model. Keras has some predefined metrics which may be used, and it also allows custom metrics to be defined.

Table 2. Argument explanations from compile method
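
For instance, a concrete compile call for a multi-class classification model might look like this (the choices of loss, optimizer and metric here are just common defaults, not the only options):

model.compile(
    loss='categorical_crossentropy',   # suitable for one-hot encoded class labels
    optimizer='adam',
    metrics=['accuracy']
)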

Once the model is compiled, it is time to call the fit method, which has the following signature:

fit(x=None, y=None, batch_size=None, epochs=1, verbose=1, callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None)

Together with our example 1, it will look like:

model.fit(X_train, Y_train)
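
In practice you will usually also pass the batch size, the number of epochs and some validation data; a minimal sketch (the values are hypothetical):

model.fit(
    X_train, Y_train,
    batch_size=32,           # number of samples per gradient update
    epochs=10,               # number of passes over the training set
    validation_split=0.2,    # hold out 20% of the training data for validation
    verbose=2
)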

KERAS IN ACTION

In this section we will work on a model that predicts the man of the match from the FIFA 2018 statistics dataset, which is available on the Kaggle site. This dataset contains 27 columns, starting with a date column which represents when the game was played, team and opponent columns which represent the teams, goals scored, ball possession, etc. Once we have the feature set for our prediction problem, we will apply logistic regression in Keras.

Here is the full code of the logistic regression in Keras for predicting which team will win the man of the match:

import pandas as pd
import numpy as np

from keras.optimizers import RMSprop
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import np_utils
from keras.regularizers import L1L2

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

NUMBER_OF_FEATURES = 21
EPOCHS = 1300
BATCH_SIZE = 15

value_scale = StandardScaler()

# load dataset
dataset = pd.read_csv('../data/FIFA_2018_Statistics.csv')

# let's clean and prepare the dataset
dataset = dataset.drop(['Own goal Time', '1st Goal', 'Team', 'Date', 'Round', 'Opponent'], axis=1)
dataset['PSO'] = pd.get_dummies(dataset.PSO).Yes
dataset['Man of the Match'] = pd.get_dummies(dataset['Man of the Match']).Yes

# add a new column which shows who is the 'winner'
# (consecutive pairs of rows belong to the same game, so we group them in twos)
dataset['winner'] = dataset.groupby(
        np.repeat(
            [
               n for n in range(len(dataset) // 2)
            ], 2)
   )['Goal Scored'].transform(lambda x: x == max(x))

dataset['winner'] = dataset['winner'].map({True: 1, False: 0})

# get train and test data
X = value_scale.fit_transform(dataset.drop('Man of the Match', axis=1))
X[np.isnan(X)] = 0
Y = dataset['Man of the Match']
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=42, test_size=0.30)
Y_train = np_utils.to_categorical(Y_train, 2)
Y_test = np_utils.to_categorical(Y_test, 2)

if __name__ == "__main__":

    optimizer = RMSprop(0.001)
    model = Sequential()

    model.add(
        Dense(
            2,
            activation="softmax",
            input_dim=NUMBER_OF_FEATURES,
            kernel_regularizer=L1L2(l1=0.1, l2=0.1)
        )
    )

    model.compile(
        optimizer,
        "categorical_crossentropy",
        ['accuracy']
    )

    model.fit(
        X_train, Y_train,
        batch_size=BATCH_SIZE, epochs=EPOCHS,
        validation_data=(X_test, Y_test),
        verbose=2
    )

After loading the dataset from the file, we prepared the features for predicting the man of the match title, dropping some irrelevant information like date, round, own goal time, etc. At this point no deeper analysis across features has been done; we removed specific columns based on our intuition. In case we missed something, please let us know, we will be grateful. One more thing we added here is the column winner, which we found to be an interesting feature for logistic regression. Then we convert the man of the match flag with the Keras utility function to_categorical in order to get a binary class matrix, and we split the whole dataset into train and test data. We use the train data for the training phase and the test data to validate our model; the test data are not seen by the model during training. Finally, we scale our data with StandardScaler from the sklearn library (because some algorithms work better when data is normalized). We are not sure it is needed here, but try deleting the scaling and see how (or whether) that affects accuracy.

Once we are done with dataset preparation, we proceed to model creation. As mentioned, logistic regression will be used, and we will create our model with the sequential composition technique.

Our first layer is Dense, whose first parameter is the dimensionality of the output space (in our case 2, since we have a two-element vector that flags whether the team won the man of the match title or not). The second parameter is the activation function, in our case softmax, and the last is the kernel regularizer, which helps in avoiding the overfitting problem (we used L1L2).

Then we compiled the model with the RMSprop optimizer, and as the loss function we used categorical_crossentropy. The last thing, the metric we are interested in, is accuracy. In the fit method, we provided the train and test data as well as the number of epochs and the batch size.

 

In the RMSprop optimizer we only provided the learning rate, which controls how much we adjust the weights of our model with respect to the loss gradient.

Now let’s start the model and see what happens. In our case we set the number of epochs to 1300 and the batch size to 15, and we got a validation accuracy of 79.49%.

Epoch 1299/1300
– 0s – loss: 0.5640 – acc: 0.8764 – val_loss: 0.6787 – val_acc: 0.7949
Epoch 1300/1300
– 0s – loss: 0.5639 – acc: 0.8764 – val_loss: 0.6781 – val_acc: 0.7949
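
To get the final test-set score programmatically, you can also evaluate the trained model directly; a small sketch using the variables from the code above:

loss, acc = model.evaluate(X_test, Y_test, verbose=0)
print('Test accuracy: %.2f%%' % (acc * 100))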

If you have any questions, suggestions or comments, please feel free to share them with us.

The full code can be found in the GitHub repository https://github.com/dincaus/BPU-Keras-Log-Regression-1.
