Data Scientist at Cdiscount

In this tutorial, you will see how you can use a simple Keras model to train and evaluate an artificial neural network for multi-class classification problems.

Part-of-Speech tagging is a well-known task in Natural Language Processing. It refers to the process of classifying words into their parts of speech (also known as word classes or lexical categories). We will treat it as a supervised learning problem.


Artificial neural networks have been applied successfully to POS tagging and achieve strong performance. We will focus on the Multilayer Perceptron, a very popular network architecture that achieves strong results on Part-of-Speech tagging problems.

Let’s put it into practice

In this post you will get a quick tutorial on how to implement a simple Multilayer Perceptron in Keras and train it on an annotated corpus.

Ensuring reproducibility

To make sure our experiments are reproducible, we fix the random seed:
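A minimal sketch, assuming NumPy is the source of randomness (depending on the backend, Keras may also need its own seed set):

```python
import numpy as np

# Fix the random seed so results are reproducible across runs
np.random.seed(42)
```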

Getting an annotated corpus

The Penn Treebank is an annotated corpus of POS tags. A sample is available in the NLTK Python library, which bundles many corpora that can be used to train and test NLP models.

First of all, we download the annotated corpus:

Then we load the tagged sentences…

… and visualize one:

This yields a list of tuples (term, tag).

This is a multi-class classification problem with more than forty different classes.
POS tagging on Treebank corpus is a well-known problem and we can expect to achieve a model accuracy larger than 95%.

Datasets preprocessing for supervised learning

We split our tagged sentences into three datasets:

  • a training dataset which corresponds to the sample data used to fit the model,
  • a validation dataset used to tune the hyperparameters of the classifier, for example the number of units in the neural network,
  • a test dataset used only to assess the performance of the classifier.

We use approximately 60% of the tagged sentences for training, 20% as the validation set and 20% to evaluate our model.

Feature engineering

Our set of features is very simple.
For each term we create a dictionary of features derived from the sentence the term was extracted from.
These properties can include information about the previous and next words as well as prefixes and suffixes.
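One possible feature extractor along those lines (the exact feature set used in the original script may differ):

```python
def features(sentence, index):
    """Features for the term at `index` in `sentence` (a list of terms)."""
    term = sentence[index]
    return {
        'term': term,
        'is_first': index == 0,
        'is_last': index == len(sentence) - 1,
        'is_capitalized': term[0].upper() == term[0],
        'prefix-1': term[0],
        'suffix-1': term[-1],
        'prev_term': '' if index == 0 else sentence[index - 1],
        'next_term': '' if index == len(sentence) - 1 else sentence[index + 1],
    }
```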

We map our list of sentences to a list of feature dictionaries.

For training, validation and testing sentences, we split the attributes into X (input variables) and y (output variables).
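A sketch of that transformation; the helper names are ours, and the stand-in `features()` below just returns the term itself so the snippet is self-contained:

```python
def features(sentence, index):
    # Minimal stand-in feature extractor: just the term itself
    return {'term': sentence[index]}

def untag(tagged_sentence):
    # Drop the tags, keeping only the terms
    return [term for term, _ in tagged_sentence]

def transform_to_dataset(tagged_sentences):
    # Split tagged sentences into input features (X) and target tags (y)
    X, y = [], []
    for tagged in tagged_sentences:
        sentence = untag(tagged)
        for index, (_, tag) in enumerate(tagged):
            X.append(features(sentence, index))
            y.append(tag)
    return X, y
```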

Features encoding

Our neural network takes vectors as inputs, so we need to convert our dict features to vectors.
The DictVectorizer class built into scikit-learn provides a straightforward way to do that.

Our y vectors must be encoded as well. The output variable contains 49 different string values, which we encode as integers.

And then we need to convert those encoded values to dummy variables (one-hot encoding).
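Both steps sketched below; in older Keras versions the one-hot helper lives at `keras.utils.np_utils.to_categorical`:

```python
from sklearn.preprocessing import LabelEncoder
from keras.utils import to_categorical

# Encode string tags as integers...
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(['NNP', 'VBZ', 'NNP'])

# ...then expand them into one-hot (dummy) vectors
y_one_hot = to_categorical(y_encoded)
```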

Building a Keras model

Keras is a high-level framework for designing and running neural networks on multiple backends like TensorFlow, Theano or CNTK.


We want to create one of the most basic neural networks: the Multilayer Perceptron. This kind of linear stack of layers can easily be built with the Sequential model. It will contain an input layer, a hidden layer, and an output layer.
To mitigate overfitting, we use dropout regularization. We set the dropout rate to 20%, meaning that a randomly selected 20% of the neurons are ignored during each training update.

We use Rectified Linear Unit (ReLU) activations for the hidden layer, as they are among the simplest non-linear activation functions available.

For multi-class classification, we want to convert the output units' scores into probabilities, which the softmax function does. We therefore use the categorical cross-entropy loss function.
Finally, we choose the Adam optimizer, as it is well suited to classification tasks.
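Putting those choices together, a model-building function might look like this (the function name and parameters are ours; the actual layer sizes are set later):

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

def build_model(input_dim, hidden_neurons, output_dim):
    """MLP with one ReLU hidden layer, 20% dropout, and a softmax output."""
    model = Sequential()
    model.add(Dense(hidden_neurons, input_dim=input_dim, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(output_dim, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])
    return model
```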

Creating a wrapper between Keras API and Scikit-Learn

Keras provides a wrapper called KerasClassifier which implements the Scikit-Learn classifier interface.

All model parameters are defined below.
We need to provide a function that returns the structure of a neural network (build_fn).
The number of hidden neurons and the batch size are chosen somewhat arbitrarily. We set the number of epochs to 5 because with more iterations the Multilayer Perceptron starts overfitting (even with dropout regularization).
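A sketch of the wrapper setup, assuming Keras 2.x (recent TensorFlow/Keras releases moved this wrapper into the separate scikeras package); the hyperparameter values here are illustrative:

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.wrappers.scikit_learn import KerasClassifier

def build_model(input_dim=300, hidden_neurons=512, output_dim=49):
    # The MLP described above; default sizes are placeholders
    model = Sequential()
    model.add(Dense(hidden_neurons, input_dim=input_dim, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(output_dim, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model

# epochs=5 to limit overfitting; batch size chosen somewhat arbitrarily
clf = KerasClassifier(build_fn=build_model, epochs=5, batch_size=256, verbose=1)
```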

Training our Keras model

Finally, we can train our Multilayer Perceptron on the training dataset.

With the History callback returned by fit, we can visualize the model's log loss and accuracy over the training epochs.
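Since the full pipeline is not reproduced here, the sketch below trains a small MLP on synthetic data just to illustrate the fit call and how the History object is plotted; all data and layer sizes are made up:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend; drop this line to display the plots
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense, Dropout

# Tiny synthetic stand-in for the vectorized Treebank data
rng = np.random.RandomState(42)
X_train = rng.rand(300, 20)
y_train = np.eye(5)[rng.randint(0, 5, 300)]   # one-hot labels, 5 classes
X_val = rng.rand(100, 20)
y_val = np.eye(5)[rng.randint(0, 5, 100)]

model = Sequential()
model.add(Dense(32, input_dim=20, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(5, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])

# fit() returns a History object whose .history dict is filled per epoch
hist = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                 epochs=5, batch_size=32, verbose=0)

# Plot log loss and accuracy over epochs (metric key differs across versions)
acc_key = 'acc' if 'acc' in hist.history else 'accuracy'
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(hist.history['loss'], label='train')
ax1.plot(hist.history['val_loss'], label='validation')
ax1.set_title('loss')
ax1.legend()
ax2.plot(hist.history[acc_key], label='train')
ax2.plot(hist.history['val_' + acc_key], label='validation')
ax2.set_title('accuracy')
ax2.legend()
fig.savefig('training_history.png')
```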


After 2 epochs, we see that our model begins to overfit.

Evaluating our Multilayer Perceptron

Now that our model is trained, we can evaluate it (compute its accuracy):
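Evaluation boils down to a single call; the sketch below uses a throwaway model and synthetic data to show the shape of the API:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Synthetic stand-ins for the vectorized test set
rng = np.random.RandomState(0)
X_test = rng.rand(50, 20)
y_test = np.eye(5)[rng.randint(0, 5, 50)]

model = Sequential([Dense(5, input_dim=20, activation='softmax')])
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])

# evaluate() returns [loss, accuracy] because we compiled with metrics=['accuracy']
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
```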

We come close to 96% accuracy on the test dataset, which is quite impressive given the basic features we fed into the model.
Also keep in mind that 100% accuracy is not attainable even for human annotators: humans are estimated to agree on Part-of-Speech tags about 98% of the time.

Visualizing the model
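A textual summary of the architecture can be printed directly; the layer sizes below are placeholders matching the structure described earlier:

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential([
    Dense(512, input_dim=300, activation='relu'),  # sizes are placeholders
    Dropout(0.2),
    Dense(49, activation='softmax'),
])

# Print layers and parameter counts
model.summary()
```

For a graphical rendering, `keras.utils.plot_model(model, to_file='model.png')` can be used instead (it requires the pydot and graphviz packages).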


Save the Keras model

Saving a Keras model is pretty simple as a method is provided natively:
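For example, with a small stand-in model (recent Keras versions may prefer the `.keras` extension over HDF5):

```python
from keras.models import Sequential, load_model
from keras.layers import Dense

# A small model standing in for the trained MLP
model = Sequential([Dense(2, input_dim=4, activation='softmax')])
model.compile(loss='categorical_crossentropy', optimizer='adam')

# save() writes architecture, weights and training configuration to one file
model.save('keras_mlp.h5')

# Everything can later be restored in a single call
restored = load_model('keras_mlp.h5')
```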

This saves the architecture of the model, the weights as well as the training configuration (loss, optimizer).


References

  • Keras: The Python Deep Learning library: [doc]
  • Adam: A Method for Stochastic Optimization: [paper]
  • Improving neural networks by preventing co-adaptation of feature detectors: [paper]

In this post, you learned how to define a neural network for multi-class classification with the Keras library and evaluate its accuracy.
The script used to illustrate this post is provided here: [.py|.ipynb].

