- 1 Project Card
- 2 Week 14: Going deeper in LSTM
- 3 Week 13: Studying the prediction in sinusoidal functions with non-recurrent networks
- 4 Week 11-12: Studying the prediction in other functions with non-recurrent networks
- 5 Week 10: Studying the prediction in linear functions with non-recurrent networks
- 6 Week 5: Adapting scripts
- 7 Week 4: Sequence generator modification
- 8 Week 3: First predictor
- 9 Week 2: LSTM
- 10 Week 1: Sequence generator
Project Card
Project Name: Predicting images, learning time sequences
Author: Nuria Oyaga de Frutos [firstname.lastname@example.org]
Academic Year: 2017/2018
Degree: Computer Vision Master (URJC)
GitHub Repositories: TFM-Nuria-Oyaga
Tags: Deep Learning, Prediction, Time sequences
Week 14: Going deeper in LSTM
This week I studied LSTM networks more thoroughly. After reading several sources, I finally used this blog to get a better understanding of this kind of network.
On the other hand, I have made progress implementing these networks with the framework used in this project, Keras. The first thing to keep in mind when developing an LSTM network is that it works better if the data is normalized to the interval [0,1], because of its activation functions. We do this in our code with the following lines:
    from sklearn.preprocessing import MinMaxScaler

    scaler = MinMaxScaler(feature_range=(0, 1))
    train_set = scaler.fit_transform(train_set)
where MinMaxScaler applies the operation y = (x - min) / (max - min) to each feature.
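To make the scaling formula concrete, here is a minimal pure-Python sketch of the same operation on a single list of values. The function name `min_max_scale` and the handling of constant input are assumptions made for illustration; it does not reproduce scikit-learn's implementation.

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Rescale values linearly so the minimum maps to low and the maximum to high."""
    low, high = feature_range
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # Constant input (assumption): map every value to the lower bound
        return [low for _ in values]
    # y = (x - min) / (max - min), then stretched to [low, high]
    return [low + (v - v_min) / span * (high - low) for v in values]


scaled = min_max_scale([2.0, 4.0, 6.0, 10.0])
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

The smallest value maps to 0 and the largest to 1, so every input lies inside the range where the LSTM activations are most sensitive.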
The next consideration is that the input to every LSTM layer must be three-dimensional. The three dimensions are:
- Samples. One sequence is one sample. A batch is comprised of one or more samples.
- Time Steps. One time step is one point of observation in the sample.
- Features. One feature is one observation at a time step.
To reshape our data the following lines are necessary:
    import numpy as np

    trainX = np.reshape(trainX, (trainX.shape[0], trainX.shape[1], 1))
where trainX holds the observations from which the prediction is made for each sample.
Once the data are in the right shape for training, we define the network as follows:
    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    mod = Sequential()
    mod.add(LSTM(neurons, input_shape=(dim[0], dim[1]), activation='tanh', recurrent_activation='hard_sigmoid', unroll=True))
    mod.add(Dense(1))
    mod.compile(loss='mean_squared_error', optimizer='adam')
In this code it is important to take into account the following considerations:
- The parameter neurons is the number of LSTM cells (units) implemented in the layer; the structure of a cell is explained in the blog.
- In the input shape you must indicate the number of time steps (dim[0]) and the number of features of each one (dim[1]).
- Setting unroll=True unrolls the recurrence over the time steps instead of using a symbolic loop. This can speed up the network but uses more memory, so it is only suitable for short sequences.
The remaining steps of LSTM training are the same as for other kinds of networks in Keras.
Week 13: Studying the prediction in sinusoidal functions with non-recurrent networks
After achieving good results with simpler functions such as linear and quadratic ones, we have gone a step further and used non-recurrent neural networks to predict a more complex family of functions: sinusoids.
For this type of function I have built a data set in the same way as for the previous function types, by varying three parameters, which in this case are the amplitude, frequency and phase shift of a sine. The distribution of these parameters is shown below.
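A sample of this kind could be generated roughly as in the sketch below. The exact form of the sine, the helper name `sine_values` and the parameter ranges are assumptions chosen for illustration; the project's generator may differ.

```python
import math
import random


def sine_values(amplitude, frequency, phase, xs):
    """Evaluate y = amplitude * sin(2*pi*frequency*x + phase) at each x."""
    return [amplitude * math.sin(2 * math.pi * frequency * x + phase) for x in xs]


rng = random.Random(0)  # fixed seed so the draw is repeatable
# Hypothetical parameter ranges, used only for this example
amplitude = rng.uniform(1.0, 10.0)
frequency = rng.uniform(0.01, 0.1)
phase = rng.uniform(0.0, 2 * math.pi)

# Values of one randomly parameterized sine at 20 consecutive points
known = sine_values(amplitude, frequency, phase, range(20))
```

Each of the three parameters is drawn independently here, which is the simplest choice; any restriction on the range of the function would change the resulting distributions.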
Once the data set was created, I ran several tests varying the number of layers in the neural network and the number of neurons in each of them. The trained networks were then compared using the script developed for that purpose, and the following results were obtained:
The test was performed with the best network obtained in training (the pink one) and the results were as follows:
Although this network gives the best results, the mean absolute error and the mean relative error are not as good as we would like. To understand the error better, I computed the histogram of the relative error. It does not show the ideal behavior, which would be to have almost all the errors in the first bar.
These results show that this type of network reaches a limit as the data become more complex, which forces us to continue with more complex networks: recurrent neural networks, and specifically LSTMs. In the coming weeks we will explore these networks and adapt them to our needs.
Week 11-12: Studying the prediction in other functions with non-recurrent networks
After last week's first approach to prediction with non-recurrent neural networks, the study has been extended to other types of functions, and a new way of limiting the parameters has been introduced. Instead of limiting the parameters directly according to their values, as we did last week with the slope of the linear function, we now limit the maximum and minimum value that the function can take, that is, its range.
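One way to limit the range is rejection sampling: draw the parameters, evaluate the function over the interval of interest, and discard the draw if the function leaves the allowed range. The sketch below does this for a linear function. The name `sample_linear_in_range`, the proposal intervals and the bounds are all hypothetical.

```python
import random


def sample_linear_in_range(rng, y_min, y_max, x_max=29, max_tries=10000):
    """Draw (slope, intercept) until y = slope*x + intercept stays in [y_min, y_max] on [0, x_max]."""
    for _ in range(max_tries):
        slope = rng.uniform(-10.0, 10.0)       # hypothetical proposal interval
        intercept = rng.uniform(y_min, y_max)
        # A line reaches its extreme values at the two ends of the interval
        y_start, y_end = intercept, slope * x_max + intercept
        if y_min <= min(y_start, y_end) and max(y_start, y_end) <= y_max:
            return slope, intercept
    raise RuntimeError("no parameters found inside the requested range")


slope, intercept = sample_linear_in_range(random.Random(0), -100.0, 100.0)
```

Because some draws are rejected, the accepted parameters no longer follow the uniform distribution they were proposed from.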
With these new databases I have trained different networks for prediction on linear, quadratic and sinusoidal functions.
Because of the limitation mentioned above, the distribution of the parameters used to generate the samples in the three databases is no longer the uniform distribution from which they were drawn. To analyze these new distributions I have created a script that shows the histogram of each parameter and of the combinations of them that give rise to the final parameters. In the following images you can see the resulting distribution.
After this analysis of the dataset I trained a basic network (Simple MLP) with early stopping which, given the simplicity of the linear-function problem, works very well after very few training epochs. The following is the result of evaluating the resulting network on the test set.
For the quadratic function datasets the same study is carried out, and a 3D graph with the distribution of the parameters in each database is also shown.
The gap in the middle of the 3D chart, present in all the datasets, arises because the definition of a quadratic function does not allow the parameter "a" to take the value 0; therefore all values of "a" between -1 and 1 are removed.
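The exclusion of the interval (-1, 1) for "a" can be sketched as follows. Instead of discarding invalid draws, this version samples the magnitude and the sign separately, which gives the same gap directly. The function name and the upper bound `a_max` are hypothetical.

```python
import random


def sample_quadratic_a(rng, a_max=10.0):
    """Draw the leading coefficient of a*x^2 + b*x + c with 1 <= |a| <= a_max."""
    magnitude = rng.uniform(1.0, a_max)  # never smaller than 1
    sign = rng.choice((-1.0, 1.0))       # both signs equally likely
    return sign * magnitude


rng = random.Random(0)
a_values = [sample_quadratic_a(rng) for _ in range(1000)]
```

A histogram of `a_values` would show two separate blocks, one on each side of the empty interval around 0.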
The same network structure is trained with this dataset, using three different values of the early stopping "patience" parameter. This parameter sets the number of epochs that must pass without improvement in the monitored measure before training stops, so increasing it also increases the number of training epochs. I have created a script that compares different networks on the validation set and, in this case, provides the following results:
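To illustrate the patience parameter described above, the following sketch simulates when training would stop for a given list of validation losses. It is a simplified model of early stopping written for this explanation, not the Keras callback itself, and the loss values are made up.

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training stops under early stopping."""
    best = float("inf")
    waited = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    # Never ran out of patience: training uses every epoch
    return len(val_losses)


losses = [1.0, 0.8, 0.9, 0.7, 0.75, 0.72, 0.71, 0.6]  # made-up values
print(stopping_epoch(losses, patience=1))  # 3
print(stopping_epoch(losses, patience=3))  # 7
```

With a patience of 1 the run stops at the first setback and misses the later improvements, while a larger patience trains longer and can reach a lower loss.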
I chose the best network obtained in the previous comparison and performed the evaluation in the same way as before with the test dataset obtaining the following results:
Week 10: Studying the prediction in linear functions with non-recurrent networks
After resuming the work following a break of a few weeks, we have moved on to training non-recurrent networks that predict a sample of a linear function.
The first study was the training of a very simple perceptron with the samples generated by the previously developed code. This first network gave very bad results in terms of both absolute and relative error. To try to improve the prediction, network parameters such as the number of epochs, the number of layers and the number of neurons in each layer were modified. The results remained bad, so we proceeded to analyze the samples used in training and validation.
To analyze the different datasets, a script, available in GitHub, was written. It shows in a 3D graph the parameters a, b and c used to generate each of the samples. It also plots the mean value and the standard deviation of each parameter in each dataset (training, test and validation). With these charts we realized that any slope was being allowed, even vertical lines with infinite slope, which made the problem too general and considerably worsened the training.
To solve the problem, it is enough to limit the slope of the generated lines. In our case we decided to include in the set only those lines whose slope is between -60 and 60 degrees, restricting the training set to one type of straight line.
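The slope limit can be sketched by drawing the angle of the line and converting it to a slope with the tangent. Drawing the angle uniformly is an assumption, as is the function name `sample_slope`; the generator could equally draw the slope itself and reject values outside the limit.

```python
import math
import random


def sample_slope(rng, max_angle_deg=60.0):
    """Draw a slope whose angle with the x axis lies in [-max_angle_deg, max_angle_deg]."""
    angle = rng.uniform(-max_angle_deg, max_angle_deg)
    return math.tan(math.radians(angle))


rng = random.Random(0)
slopes = [sample_slope(rng) for _ in range(1000)]
# The steepest allowed line has slope tan(60 degrees), about 1.732
limit = math.tan(math.radians(60.0))
```

Keeping the angle away from 90 degrees bounds the slope, so no sample can approach a vertical line.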
After correcting the dataset, the initial network was trained again and gave very good results, which confirms the point of failure found in the previous study. Including vertical lines makes the training considerably worse because those samples are very different from the ordinary ones. We also tried reducing the number of epochs, and the results were analyzed with a script that compares different networks, available in GitHub. The best results, with the same network structure, are obtained with 20 training epochs.
Week 5: Adapting scripts
This week I only made the modifications needed for the existing scripts to work correctly with the new format in which the samples are stored in the data set.
The configuration file now also makes it possible, in addition to everything incorporated the previous week, to indicate whether the train-test-validation split should be made and in what proportion.
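A split of this kind can be sketched as below. The function name `split_dataset`, the seed and the order of the proportions (training, test, validation) are assumptions; the default values follow the 0.8-0.1-0.1 split mentioned in Week 3.

```python
import random


def split_dataset(samples, proportions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle the samples and split them into training, test and validation sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * proportions[0])
    n_test = round(len(items) * proportions[1])
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    validation = items[n_train + n_test:]  # whatever remains
    return train, test, validation


train, test, validation = split_dataset(range(100))
print(len(train), len(test), len(validation))  # 80 10 10
```

Giving the remainder to the last set means every sample ends up in exactly one set, even when the proportions do not divide the total evenly.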
Week 4: Sequence generator modification
The new sequence generator is prepared to use a configuration file in YML format where we can clearly and easily specify necessary parameters such as the number of samples, the number of known points of each sample, the separation between the last known sample and the one to be predicted (gap) or if we want to add noise in the generation of the samples.
If you want to add noise to the samples, you must include the parameters needed to generate it. We started with Gaussian noise, generated as an array with the same number of elements as each sample (including the one we want to predict). These elements are random values drawn from a normal distribution with the mean and variance specified in the configuration file. Below is a graphical example of a line contaminated with this type of noise:
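The noise generation described above can be sketched in a few lines. The function name `add_gaussian_noise` and the example values are hypothetical. Note that the standard library's `gauss` takes a standard deviation, so the variance is converted first.

```python
import random


def add_gaussian_noise(values, mean, variance, rng):
    """Add one independent normal draw to each value of the sample."""
    std = variance ** 0.5  # random.gauss expects the standard deviation
    noise = [rng.gauss(mean, std) for _ in values]
    return [v + n for v, n in zip(values, noise)]


rng = random.Random(0)
clean = [2.0 * x + 1.0 for x in range(30)]  # a line, including the point to predict
noisy = add_gaussian_noise(clean, mean=0.0, variance=1.0, rng=rng)
```

The noise list has one element per point of the sample, so the point to be predicted is contaminated in the same way as the known points.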
In addition, the way the sequences are stored has been modified. Each line corresponds to one sample generated from the indicated function; samples differ in the function parameters, which are generated randomly for each sample. Each sample is divided into two pieces that are stored in the file between "". The first piece holds the parameters used to generate the sample: function parameters, gap, type of function and noise parameters. The second piece stores the values that the function takes at the known points "x" and at the point to predict. The resulting file has the following form:
Finally, another script, "data_utils.py", has been developed with functions to manipulate the data stored in this format. It includes a function to read the data, another to separate the known samples from the one to be predicted, and a last one to draw a chosen sample.
Week 3: First predictor
Training a Multilayer Perceptron to predict
For the prediction task I have developed two scripts. The first one trains the network from the given training and test files and saves it. The second one evaluates the chosen network on the test file. Both scripts use a third one that contains functions for processing the data in those files.
The network consists of two Keras Dense layers, that is, layers in which each unit or neuron is connected to every neuron in the next layer. The first layer has 8 neurons and the second has only 1, which corresponds to the output of the network. It uses the ReLU activation function, Mean Squared Error as the loss function and the "Adam" optimizer. For more details on the structure used, you can consult the tutorial followed in this process. We will try different numbers of epochs to obtain the best possible result.
In the test part we simply read the file where the data is stored, feed it to the network and make predictions. The predictions are then compared with the real values to obtain an error vector. From this vector the percentage of errors within each range is calculated and shown as a bar graph. In addition, we calculate the average error and show the function and the predicted point for which the maximum error was obtained.
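The evaluation steps above can be sketched as follows, leaving out the plotting. The function name `error_report`, the range edges and the example values are hypothetical and only illustrate the calculation.

```python
def error_report(y_true, y_pred, edges=(0.01, 0.1, 1.0)):
    """Return the error vector, its mean, the index of the worst sample and the share per range."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    mean_error = sum(errors) / len(errors)
    worst = max(range(len(errors)), key=errors.__getitem__)
    # Ranges: [0, e0), [e0, e1), ..., [e_last, infinity)
    counts = [0] * (len(edges) + 1)
    for e in errors:
        counts[sum(e >= edge for edge in edges)] += 1
    percentages = [100.0 * c / len(errors) for c in counts]
    return errors, mean_error, worst, percentages


errors, mean_error, worst, percentages = error_report(
    [1.0, 2.0, 3.0, 4.0], [1.0, 2.5, 3.0, 6.0])
print(mean_error, worst, percentages)  # 0.625 3 [50.0, 0.0, 25.0, 25.0]
```

The index of the worst sample is what allows the corresponding function and predicted point to be drawn afterwards.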
We started with a neural network that is able to predict a point that follows a linear function. This required 100 epochs and gave an average error of 0.027.
Adapting the sequence generator
The sequence generator has been modified to generate a sequence of numbers following a linear, quadratic or sinusoidal function. Once the samples are generated they are shuffled so that their order is completely random, and they are split into the three necessary sets, training, test and validation, which are stored in three independent files.
This new configuration will facilitate the work of training, testing and validation of neural networks that allow us to predict a sample in a moment of time from the previous ones, as well as an easy interpretation of the files when opening them.
To begin with the prediction of points we will use this generator with approximately 50000 samples that will be divided in the form 0.8-0.1-0.1 for training, testing and validation. Each sample consists of the initial 20 points (x = 0: 19) and a future point that is defined by a gap of 10 units (x = 29). Additionally, the parameters necessary to obtain said sequence are stored at the beginning of each array.
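As a worked example of this layout, the sketch below builds one sample for a linear function: 20 known points at x = 0..19, followed by the point at x = 29. The function name `linear_sample` and the exact choice and order of the stored parameters are assumptions.

```python
def linear_sample(slope, intercept, n_known=20, gap=10):
    """Build one sample: parameters first, then the known points, then the point to predict."""
    params = [slope, intercept, gap]            # assumed parameter layout
    known = [slope * x + intercept for x in range(n_known)]
    x_target = n_known - 1 + gap                # 19 + 10 = 29
    target = slope * x_target + intercept
    return params + known + [target]


sample = linear_sample(2.0, 1.0)
print(len(sample), sample[-1])  # 24 59.0
```

With a slope of 2 and an intercept of 1, the last known value is 39 (at x = 19) and the value to predict is 59 (at x = 29).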
All the details of the code used can be found in my GitHub.
Week 2: LSTM
LSTMs (Long Short-Term Memory networks) are a type of RNN (Recurrent Neural Network) architecture that addresses the vanishing/exploding gradient problem and allows long-term dependencies to be learned. The central idea is a memory cell, a block that can maintain its state over time. This cell consists of an explicit memory, the cell state vector, and gating units that regulate the flow of information into and out of the memory.
In this structure we find two important components that will be responsible for the network learning:
- Cell state vector: This element represents the memory of the LSTM and undergoes changes via forgetting of old memory and addition of new memory.
- Gates: Their function is to control the flow of information in both directions, to the memory and from the memory. Each gate consists of a sigmoid neural net layer followed by a pointwise multiplication operator, and is controlled by a concatenation of the output from the previous time step and the current input, and optionally the cell state vector. There are three types of gates, depending on the type of information they handle:
- Forget Gate: Controls what information to throw away from memory.
- Input Gate: Controls what new information is added to cell state from current input.
- Output Gate: Conditionally decides what to output from the memory.
Now that we know the structure of the main component of this kind of network, we need to know how the memory is updated and how learning takes place. This is somewhat more complicated and will be covered in future sections, but the main idea of the memory update is to add to the cell state vector the old memory, filtered by the forget gate, and the new memory, filtered by the input gate.
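The memory update can be sketched for the simplest possible case: a single LSTM unit with a scalar input and no connection from the cell state to the gates. The function names and the weight layout are hypothetical, and this is a minimal illustration of the standard update equations rather than the Keras implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, w):
    """One time step of a single-unit LSTM.

    w maps each of 'f', 'i', 'o', 'g' to a tuple (w_x, w_h, bias).
    """
    def pre(name):
        w_x, w_h, b = w[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("f"))    # forget gate: how much old memory to keep
    i = sigmoid(pre("i"))    # input gate: how much new memory to add
    o = sigmoid(pre("o"))    # output gate: how much memory to expose
    g = math.tanh(pre("g"))  # candidate new memory
    c = f * c_prev + i * g   # updated cell state
    h = o * math.tanh(c)     # output of the unit
    return h, c


zero_weights = {name: (0.0, 0.0, 0.0) for name in "fiog"}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, w=zero_weights)
```

With all weights at zero every gate outputs 0.5 and the candidate is 0, so the cell keeps half of its old memory (c = 1.0) and adds nothing new.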
To implement a first simple example of this kind of network, a tutorial has been followed that shows how to develop an LSTM forecast model for a one-step univariate time series forecasting problem. The result of this implementation can be found in my GitHub.
Week 1: Sequence generator
The first step in addressing the problem of learning temporal sequences has been to create a sequence generator that provides examples for training neural networks. The generator is written in Python with OpenCV and produces two types of sequences: numerical values or frames.
The operation of the script is quite simple. First, to create the samples at different instants of time, we need a mathematical expression that relates the sample to the instant of time. In this case we have chosen the equation of uniform rectilinear motion, which gives at each instant the position of an object moving at the indicated speed.
    def get_position(x0, t, u_x):
        x = x0 + (t * u_x)  # URM
        return x
In order to generate the sequence, the temporal variable is modified obtaining the samples at time t, t + 1, t + 2, ..., t + n, corresponding to the n desired samples. In this case, it will be necessary to create the first 4 samples in a sequential manner and the sample number 10 which is the one to be predicted, creating 4 + 1 samples for a given speed.
Once we have the method to create the samples for one speed, we create several sequences for training the neural network by varying the speed within a certain range and obtaining the 4 + 1 samples for each value. In the case of frames, the speed is an integer, since we are dealing with pixels, and it is limited by the size of the image in the direction of the motion. For the numbers there is no restriction beyond the ones we want to impose.
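The generation of the 4 + 1 samples for several speeds can be sketched as follows. The helper `make_sequence`, the starting position, the speed range and the assumption that time starts at t = 0 with the sample to predict taken at t = 10 are all illustrative choices.

```python
def get_position(x0, t, u_x):
    """Uniform rectilinear motion: position at time t for speed u_x."""
    return x0 + t * u_x


def make_sequence(x0, u_x, n_known=4, target_t=10):
    """Return the known positions at t = 0..n_known-1 followed by the position at target_t."""
    known = [get_position(x0, t, u_x) for t in range(n_known)]
    target = get_position(x0, target_t, u_x)
    return known + [target]


# One sequence per speed, here for integer speeds 1 to 5 (hypothetical range)
sequences = [make_sequence(10, u_x) for u_x in range(1, 6)]
print(sequences[2])  # [10, 13, 16, 19, 40]
```

Each sequence holds the four consecutive positions and the later one to be predicted, so the list can be written to a text file one sequence per line.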
After establishing the operation to obtain the sequences and the samples of each of them it is necessary to store them. For numbers it's quite simple, all we need to do is create a text document that stores each of the sequences as an array.
For the frames the process is somewhat more complicated since we must interpret the numerical information obtained according to the desired frame. The desired frame in this case consists of a background in which we draw a ball that will be the object to move, and a rectangle of occlusion that will remain fixed. The samples obtained correspond to the center where we will draw the ball.
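    import cv2
    import numpy as np

    def create_frame(x, size):
        # Black background; size is assumed to be (height, width, channels)
        frame = np.zeros(size, dtype=np.uint8)
        # Ball drawn at the sampled position
        cv2.circle(frame, (int(x), 128), 10, (0, 255, 0), -1)
        # Fixed occlusion rectangle spanning the image height (size[0], assumed)
        cv2.rectangle(frame, (100, 0), (170, size[0]), (255, 0, 0), -1)
        return frame
    ...
    cv2.imwrite('frames/frame_' + str(u_x) + '_' + str(t) + '.png', create_frame(get_position(10, t, u_x), imSize))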