- 1 Project Card
- 2 Progress
- 2.1 Week 23: fixing bugs, datasets parsing
- 2.2 Week 22: starting to extract results, demo video, new user options
- 2.3 Week 21: new trackers and some bugs solved
- 2.4 Week 20: revisiting the previous work on the project report
- 2.5 Week 19: Keras models
- 2.6 Week 18: Offline mode and close
- 2.7 Week 17: Using dl-objectdetector
- 2.8 Week 16: Refinement of synchronism and tracker
- 2.9 Week 15: Camera buffer and tracker updates
- 2.10 Week 14: Net result bug in GUI fixed
- 2.11 Week 13: Circular buffer (first version) and buffer in Cam
- 2.12 Week 12: Tracker fixes and GUI changes
- 2.13 Week 11: First prototype of the delay buffer
- 2.14 Week 10: Continuing with the improvements
- 2.15 Week 9: Mask R-CNN improvements
- 2.16 Week 8: Optimization step
- 2.17 Week 7: Run continuous mode added
- 2.18 Week 6: Starting to build the object segmentator component
- 2.19 Week 5: Mask R-CNN code review and test
- 2.20 Week 4: State of the Art work
- 2.21 Week 3
- 2.22 Week 2: Code review and testing
- 2.23 Week 1: Getting started
Project Name: Object Tracker
Author: Alexandre Rodríguez Rendo [email@example.com]
Academic Year: 2017/2018
Degree: Computer Vision Master (URJC)
GitHub Repositories: 2017-tfm-alexandre-rodriguez
Tags: Deep Learning, object segmentation, object detection, object tracking
Week 23: fixing bugs, datasets parsing
This week several pending bugs were fixed. First, logging now works correctly for the Tensorflow networks; in previous versions only the logging of Keras networks worked. Also, the first and last frames of a local video source are now processed and logged. An initial version of the ground truth converters is available at https://github.com/RoboticsURJC-students/2017-tfm-alexandre-rodriguez/blob/develop/dl_objecttracker/groundtruths/convert_gt_to_pascalvoc.py; it partially parses the OTB and NFS datasets. The parsing is only partial because the classes in those datasets do not always match the classes the neural network knows (COCO, VOC datasets...), so for the moment the class is hardcoded. Parsing the MOT dataset requires a mechanism for assigning IDs, which is in progress. The ROS image source is also in progress.
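To illustrate what such a converter does, here is a minimal sketch (not the code in convert_gt_to_pascalvoc.py) that turns OTB-style annotation lines into ground truth lines in the <class_name> <left> <top> <right> <bottom> layout used by Object-Detection-Metrics. It assumes each OTB line holds x, y, width and height separated by commas, tabs or spaces, and it receives the class as a parameter, mirroring the hardcoded class described above. The function names are hypothetical.

```python
import re


def otb_line_to_gt(line, class_name="person"):
    """Convert one OTB-style line "x,y,w,h" into
    "<class_name> <left> <top> <right> <bottom>".

    Assumptions: values are separated by commas, tabs or spaces, and the
    class is hardcoded by the caller because OTB/NFS classes do not always
    match the COCO/VOC classes of the network.
    """
    x, y, w, h = (float(v) for v in re.split(r"[,\s]+", line.strip()))
    left, top = int(round(x)), int(round(y))
    right, bottom = int(round(x + w)), int(round(y + h))
    return "{} {} {} {} {}".format(class_name, left, top, right, bottom)


def convert_otb_file(lines, class_name="person"):
    """Map each frame number (starting at 1) to its ground truth line,
    ignoring empty lines."""
    return {
        frame: otb_line_to_gt(line, class_name)
        for frame, line in enumerate(lines, start=1)
        if line.strip()
    }
```

Passing the lines of a sequence's annotation file (groundtruth_rect.txt is the usual OTB name, worth checking against the downloaded copy) returns one ground truth line per frame.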
Week 22: starting to extract results, demo video, new user options
These last two weeks were mainly spent introducing a way to obtain statistics from the application. The results of both the Net (neural network detections) and the Tracker are now logged into .yaml files with the following structure:
- 60        <-- frame number
- person    <-- class
- 0.9       <-- confidence
- - 130     <-- left
  - 148     <-- top
- - 170     <-- right
  - 340     <-- bottom
These results are written to the log at the end of the application's execution. To get statistics, the idea is to use the Pascal VOC performance measurements: precision, recall, the precision x recall curve and AP (average precision). The repository https://github.com/rafaelpadilla/Object-Detection-Metrics already implements these metrics, so you only need to convert your results and the ground truth to the format it uses.
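The Object-Detection-Metrics repository computes these metrics for you; the sketch below only shows the idea behind AP. It assumes the detections of one class have already been sorted by decreasing confidence and matched against the ground truth (a detection counts as a true positive when its IoU with a not-yet-matched ground truth box is at least 0.5, the Pascal VOC criterion). It uses the every-point interpolation adopted by Pascal VOC from 2010 on (VOC 2007 used 11 recall points instead). The function name is hypothetical.

```python
def voc_average_precision(is_true_positive, num_ground_truth):
    """Every-point interpolated AP, Pascal VOC 2010+ style.

    is_true_positive: one bool per detection of a class, already sorted by
        decreasing confidence (True if it matched a ground truth box).
    num_ground_truth: number of ground truth boxes of that class (> 0).
    """
    precision, recall = [], []
    tp = fp = 0
    for hit in is_true_positive:
        if hit:
            tp += 1
        else:
            fp += 1
        precision.append(tp / float(tp + fp))
        recall.append(tp / float(num_ground_truth))

    # Sentinels at both ends of the precision x recall curve.
    mrec = [0.0] + recall + [1.0]
    mpre = [0.0] + precision + [0.0]
    # Interpolation: precision at recall r is the max precision at recall >= r.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Area under the interpolated curve (only steps where recall grows count).
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))
```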
Format used in the detection files (your results), one line per detection as <class_name> <confidence> <left> <top> <right> <bottom>:
bottle 0.14981 80 1 295 500
bus 0.12601 36 13 404 316
horse 0.12526 430 117 500 307
pottedplant 0.14585 212 78 292 118
tvmonitor 0.070565 388 89 500 196
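Format used in the ground truth files, one line per object and without a confidence value. The default layout is <class_name> <left> <top> <right> <bottom>; this example from the repository uses the alternative <class_name> <left> <top> <width> <height> layout: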
bottle 6 234 39 128
person 1 156 102 180
person 36 111 162 305
person 91 42 247 458
The results in the .yaml files are easily converted to the required format using a bash script (plus a Python script) written specifically for this offline format conversion: https://github.com/RoboticsURJC-students/2017-tfm-alexandre-rodriguez/tree/develop/dl_objecttracker/detections. The repository also supports other formats, but I found this one the most readable and the easiest to implement in my application. In terms of datasets, I did some research to find out which are the state-of-the-art datasets in multi-object tracking. Some of the most significant are MOT (https://motchallenge.net/), VOT (http://votchallenge.net/), OTB, PETS (http://www.cvg.reading.ac.uk/PETS2009/) and NFS (http://ci2cv.net/nfs/index.html). With https://github.com/jvlmdr/trackdat you can easily download most of these datasets and many more. The next step is to convert the ground truth of the datasets used to the format described in https://github.com/rafaelpadilla/Object-Detection-Metrics#create-the-ground-truth-files in order to obtain the statistics.
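A minimal sketch of the .yaml-to-detection-file step, assuming that each logged result, once loaded (for instance with PyYAML's yaml.safe_load), has the structure shown earlier: [frame, class, confidence, [left, top], [right, bottom]]. It is not the bash/Python script from the repository, and the function name is hypothetical.

```python
from collections import defaultdict


def results_to_detection_lines(entries):
    """Group logged results by frame as Object-Detection-Metrics lines.

    Assumption: each entry is [frame, class_name, confidence,
    [left, top], [right, bottom]].
    Returns {frame: ["<class> <confidence> <left> <top> <right> <bottom>", ...]}.
    """
    lines_per_frame = defaultdict(list)
    for frame, class_name, confidence, (left, top), (right, bottom) in entries:
        lines_per_frame[frame].append("{} {} {} {} {} {}".format(
            class_name, confidence, left, top, right, bottom))
    return dict(lines_per_frame)
```

Each list of lines would then be written to its own text file, since Object-Detection-Metrics expects one detection file per image, named like the corresponding ground truth file.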
Also, objecttracker.yml (and the code) was modified to let the user choose between OpenCV and dlib tracking and, when using OpenCV tracking, to select the tracker among KCF, BOOSTING, MIL, TLD, MEDIANFLOW, CSRT and MOSSE (the defaults are OpenCV tracking with MOSSE).
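One way to map the tracker name read from objecttracker.yml to an OpenCV object is a small lookup table. This is a sketch under assumptions: the table lists the constructor names exposed by the opencv-contrib tracking module, and because newer OpenCV 4.x releases only expose several of them under the cv2.legacy namespace, the function looks there too. The cv2 module is passed in as a parameter so the logic can be tried without OpenCV installed; create_opencv_tracker and OPENCV_TRACKERS are hypothetical names.

```python
# Constructor names from OpenCV's contrib tracking module. In newer 4.x
# releases several of them are only exposed under cv2.legacy.
OPENCV_TRACKERS = {
    "KCF": "TrackerKCF_create",
    "BOOSTING": "TrackerBoosting_create",
    "MIL": "TrackerMIL_create",
    "TLD": "TrackerTLD_create",
    "MEDIANFLOW": "TrackerMedianFlow_create",
    "CSRT": "TrackerCSRT_create",
    "MOSSE": "TrackerMOSSE_create",
}


def create_opencv_tracker(name, cv2_module):
    """Return a new tracker for a name such as 'MOSSE' read from the config."""
    key = name.upper()
    if key not in OPENCV_TRACKERS:
        raise ValueError("Unknown tracker {!r}; expected one of: {}".format(
            name, ", ".join(sorted(OPENCV_TRACKERS))))
    constructor_name = OPENCV_TRACKERS[key]
    # Look in the main namespace first, then in cv2.legacy if it exists.
    for namespace in (cv2_module, getattr(cv2_module, "legacy", None)):
        if namespace is not None and hasattr(namespace, constructor_name):
            return getattr(namespace, constructor_name)()
    raise RuntimeError("{} is not available in this OpenCV build".format(
        constructor_name))
```

With OpenCV installed, create_opencv_tracker("MOSSE", cv2) would return a tracker that is initialised with tracker.init(frame, bbox) and advanced with tracker.update(frame).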
The next video provides an idea of the current state of the dl-objecttracker:
- Neural network detections: mask_rcnn_inception_v2_coco_2018_01_28
- Tracking: MOSSE OpenCV tracker
Regarding input sources, the application supports local video (as before) and live video from a local camera through OpenCV. I am working on adding video input through ROS.
Week 21: new trackers and some bugs solved
This week was dedicated to introducing the MOSSE and CSRT trackers included in recent OpenCV versions; the dlib tracker was also tested. To use the new OpenCV trackers, I installed from source the latest OpenCV version available at the moment (4.0.1) so that it works alongside the JdeRobot environment. Both MOSSE and CSRT are more accurate than the trackers tested previously. MOSSE is extremely fast but not as accurate as CSRT. The dlib tracker also gave very positive results in terms of accuracy, but its speed seems to decrease as the number of tracked objects grows. For the moment, the chosen tracker is MOSSE.
Apart from that, some bugs related to the bounding box coordinates were fixed. The GUI-off mode was also improved by tagging each image with its frame number. Finally, when using a local video, the last frames of the video left in the buffer were not being processed; this was solved.
The next steps include extracting statistics and performance measurements from the application with the different configurations: IoU (intersection over union) of detection and tracking on datasets, speed in FPS, etc.
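IoU compares two boxes by dividing the area of their intersection by the area of their union. A minimal sketch, assuming boxes in the (left, top, right, bottom) order used in the .yaml log and treating coordinates as continuous (some Pascal VOC tools add 1 pixel to widths and heights instead):

```python
def iou(box_a, box_b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    # Overlap along each axis; zero when the boxes do not intersect.
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / float(union) if union > 0 else 0.0
```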
Week 20: revisiting the previous work on the project report
I reviewed the work done in Week 4 to check which updates it needs to reflect the current State of the Art. I also went over the structure of the final report, so that I can keep writing it in parallel with the latest changes in the application.
The MOSSE and CSRT trackers could not be tested inside the application because of the OpenCV version currently available in JdeRobot, but a tracker using the dlib library is going to be reviewed and hopefully tested (https://www.pyimagesearch.com/2018/10/29/multi-object-tracking-with-dlib/). dlib is an open-source library of machine learning algorithms and tools that is commonly used in both industry and academia.
Week 19: Keras models
This week the Keras network models were finally introduced into the application. The supported models include the SSD_300x300 and SSD_512x512 architectures. The next step is to test new trackers.
Week 18: Offline mode and close
Last week I solved some small pending tasks. I refactored the offline mode to adapt it to the new architecture of the Network side. This mode already worked at the very beginning of the project and lets the user run the program without the GUI, saving the results of the application as .jpg images. Apart from that, the GUI can now be fully closed by clicking the Close button. In previous versions, the program kept running in the background and had to be stopped from the terminal.
The last version is available at https://github.com/RoboticsURJC-students/2017-tfm-alexandre-rodriguez/tree/develop/dl_objecttracker. The component was renamed from dl_objectsegmentator to dl_objecttracker.
Week 17: Using dl-objectdetector
With the previous stable version of the tracker working, I started to add a larger number of neural networks for object detection by using dl-objectdetector from JdeRobot (https://github.com/JdeRobot/dl-objectdetector). For that purpose, I first tested the component with both Keras and Tensorflow models, and then I integrated it into dl-objecttracker. At the moment it only uses Tensorflow models, but a Keras version will be ready soon. I tested some models from the Tensorflow detection model zoo, such as SSD and Mask R-CNN (https://github.com/tensorflow/models/blob/master/research/object_detection/).
Apart from that, new sources are going to be added to feed the program: local camera, local video and stream (ROS/ICE). For now, the program is tested only with local video because the other sources still have some bugs that need to be fixed.
Week 16: Refinement of synchronism and tracker
These last weeks I have been working on a stable tracker that improves the synchronization between the different branches of the application (Camera, Net, GUI) and the tracking itself. For that purpose, the internal logic of the tracker was modified so that the tracking works more flexibly with the buffer it takes as input. The result is a tracker with 3 modes: slow, normal and fast (depending on the average tracking FPS over the previous frames). If the tracking is running slow, some frames in the buffer are skipped to keep the buffer from growing more than expected. If the tracking is running fast, the tracker slows down so that it does not finish tracking before the neural network gives a new result.
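The three-mode logic can be sketched as follows. This is not the tracker's actual code: the camera rate target_fps, the tolerance band, the averaging window and the class name TrackingPacer are hypothetical, and the skip and wait amounts are just one reasonable rule.

```python
from collections import deque


class TrackingPacer:
    """Classify tracking speed as slow/normal/fast from recent FPS values.

    target_fps: rate at which the camera fills the buffer (assumed known).
    tolerance: relative band around target_fps considered 'normal'.
    window: number of recent frames used for the FPS average.
    """

    def __init__(self, target_fps, tolerance=0.2, window=10):
        self.target_fps = float(target_fps)
        self.tolerance = tolerance
        self.recent_fps = deque(maxlen=window)

    def record(self, seconds_per_frame):
        """Store how long the tracker needed for the last frame."""
        self.recent_fps.append(1.0 / seconds_per_frame)

    def average_fps(self):
        if not self.recent_fps:
            return self.target_fps
        return sum(self.recent_fps) / len(self.recent_fps)

    def mode(self):
        fps = self.average_fps()
        if fps < self.target_fps * (1 - self.tolerance):
            return "slow"
        if fps > self.target_fps * (1 + self.tolerance):
            return "fast"
        return "normal"

    def frames_to_skip(self):
        """Slow mode: skip buffered frames so the buffer stops growing."""
        if self.mode() != "slow":
            return 0
        # While one frame is tracked, target_fps / fps new frames arrive.
        return max(0, int(round(self.target_fps / self.average_fps())) - 1)

    def extra_wait(self):
        """Fast mode: seconds to wait per frame so tracking does not run
        out of frames before the network delivers a new detection."""
        if self.mode() != "fast":
            return 0.0
        return 1.0 / self.target_fps - 1.0 / self.average_fps()
```

Averaging over a short window keeps a single slow frame from switching the tracker into slow mode.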
Regarding the multi-object tracking problem (tracking several objects lowers the FPS rate), I looked for new trackers following these posts (LearnOpenCV: https://www.learnopencv.com/multitracker-multiple-object-tracking-using-opencv-c-python/; pyimagesearch: https://www.pyimagesearch.com/2018/07/30/opencv-object-tracking/). I am currently working with the TLD tracker, but it has some problems with false positives. However, it is the best option available for my purpose in OpenCV 3.3.1 (the version included in JdeRobot). One of the next steps is to test the MOSSE and CSRT trackers, which are available in more recent OpenCV versions and look promising, especially MOSSE given the speed requirements.
At the same time, I am going to include the JdeRobot object detector (https://github.com/JdeRobot/dl-objectdetector) to make use of other networks apart from the current Mask R-CNN.
Week 15: Camera buffer and tracker updates
Now the GUI uses the images coming from the Camera buffer directly instead of keeping its own buffer. The tracker also has a mechanism that prevents the tracking process from running much faster than the neural network detections.
Week 14: Net result bug in GUI fixed
The neural network result shown in the GUI sometimes displayed the image segmented by the Mask R-CNN together with a bounding box from the tracker; this was fixed. The next necessary improvement is to move the buffer completely to Cam, so that the GUI branch no longer keeps its own copy.
Other future tasks include the incorporation of the DetectionSuite (https://github.com/JdeRobot/dl-DetectionSuite) and the ObjectDetector (https://github.com/JdeRobot/dl-objectdetector) components from JdeRobot.
Week 13: Circular buffer (first version) and buffer in Cam
With the first version of the delay buffer running, the next necessary step is to implement a circular buffer to keep this buffer from growing in size. But first, the delay buffer (and also the instructions that control the different branches) was moved to the Cam branch too, so that the application can run without the GUI. At the moment, this GUI-off option saves the results that would be displayed in the 'Combined' window as .jpg files. Now, to execute the main program you type the following in the terminal, ending with off if you do not want the GUI to start or on if you do:
python2 objectsegmentator.py objectsegmentator.yml off
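A sketch of how the main script could read these two arguments. The function name and the choice of defaulting to on when the flag is missing are assumptions, not the behaviour of objectsegmentator.py.

```python
def parse_command_line(argv):
    """Parse 'objectsegmentator.py <config.yml> [on|off]'.

    Returns (config_path, gui_enabled). Assumption: when the on/off flag
    is omitted the GUI is started; any other value is rejected.
    """
    if len(argv) < 2:
        raise SystemExit("usage: {} <config.yml> [on|off]".format(argv[0]))
    config_path = argv[1]
    flag = argv[2].lower() if len(argv) > 2 else "on"
    if flag not in ("on", "off"):
        raise SystemExit("GUI flag must be 'on' or 'off', not {!r}".format(flag))
    return config_path, flag == "on"


# Typical call from the main script:
#   import sys
#   config_path, gui_enabled = parse_command_line(sys.argv)
```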
The circular buffer is meant to control the buffer size, which tends to grow because of the changes in tracker speed (FPS). For this reason, this first version has to handle two main situations: a fast tracker and a slow tracker (measured by FPS rate). In the first case, the old frames in the buffer are discarded before the next tracking step. In the second case, some frames are skipped. With these changes, the tracking and the segmentation stay closer to the real frames captured by the camera and, as I said before, the buffer no longer grows without control.
But, as usually happens, this first version has some bugs that need to be fixed. For example, sometimes old frames are still 'alive' and it takes the program some time to catch up with the latest frames in the buffer.
For the moment, this buffer upgrade is not available in the GUI-off mode.
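A circular buffer with both policies can be sketched with Python's collections.deque, whose maxlen argument drops the oldest element automatically once the buffer is full. The class and method names are hypothetical: take_latest corresponds to the fast-tracker case (discard old frames) and take_next with skip > 0 to the slow-tracker case (skip frames).

```python
from collections import deque


class FrameBuffer:
    """Circular buffer of (frame_number, frame) pairs.

    deque(maxlen=capacity) discards the oldest entry automatically when a
    new frame arrives and the buffer is full, so its size stays bounded.
    """

    def __init__(self, capacity):
        self.frames = deque(maxlen=capacity)

    def push(self, frame_number, frame):
        self.frames.append((frame_number, frame))

    def take_latest(self):
        """Fast tracker: jump to the newest frame and drop the older ones."""
        if not self.frames:
            return None
        latest = self.frames[-1]
        self.frames.clear()
        return latest

    def take_next(self, skip=0):
        """Slow tracker: drop up to `skip` old frames (always keeping at
        least one), then return the oldest remaining frame."""
        for _ in range(min(skip, max(0, len(self.frames) - 1))):
            self.frames.popleft()
        return self.frames.popleft() if self.frames else None
```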
Week 12: Tracker fixes and GUI changes
So, this week I started by solving some problems in the tracker that affected the flow of the application (some still need to be fixed). I also had a look at the different types of trackers implemented in OpenCV (at the moment I am using TLD). For this purpose I used Satya Mallick's website LearnOpenCV (https://www.learnopencv.com/object-tracking-using-opencv-cpp-python/) and the OpenCV documentation. I tested all the other trackers mentioned in the post (including GOTURN, which uses deep learning with an offline-trained model), but I found that the current one is the best for this purpose.
Furthermore, some bugs related to the GUI buttons and their behavior in the new buffer architecture were fixed (as with the tracker, I guess some still need fixing). I included a new GUI layout with 4 images: the live input video, the combined result of the tracker and the neural network, and the separate results of each of them. The images are now tagged with the frame number, which makes the application easier to understand (and to debug).
The next image (available at the link) shows the current state of the application:
Week 11: First prototype of the delay buffer
This week I implemented a first prototype of the delay buffer, fixing some bugs of the previous version, although a few small failures are still pending. On the other hand, the Docker container has not yet been configured to allow graphical sessions, so the work in it is paused until then. The next steps include moving the buffer to the Cam branch, a circular buffer and an improved visualization (some GUI changes).
Week 10: Continuing with the improvements
These last 3 weeks I have been working on two different approaches to improve the behaviour of the application. On the one hand, I built a first prototype of the application with a delay buffer. This buffer shows all the detections and segmentations in the GUI with a delay given by the current length of the buffer, but in exchange every frame fed to the Net is the latest one (this did not happen before). The current buffer has some small bugs that need to be fixed. Another proposal was a double-buffer technique, but it is not implemented yet.
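The delay buffer idea can be sketched as a queue that the Camera appends to, from which the Net reads only the newest frame while the GUI consumes frames in arrival order. This is a hypothetical minimal version (DelayBuffer and its methods are illustrative names); the displayed delay equals the number of frames still waiting.

```python
from collections import deque


class DelayBuffer:
    """Camera pushes frames; the Net peeks at the newest one while the GUI
    pops the oldest, so the display lags by the buffer length."""

    def __init__(self):
        self._pending = deque()

    def push(self, frame_number, frame):
        self._pending.append((frame_number, frame))

    def newest_for_net(self):
        """The Net always works on the most recent frame."""
        return self._pending[-1] if self._pending else None

    def next_for_gui(self):
        """The GUI shows frames in order, one by one."""
        return self._pending.popleft() if self._pending else None

    def delay(self):
        """Current display delay, in frames."""
        return len(self._pending)
```

Nothing bounds the queue length here, which is why a circular buffer becomes necessary once the GUI falls behind the camera.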
On the other hand, I had access to a GPU with CUDA in a Docker container (thanks to Francisco Rivas) with all the necessary packages installed (Tensorflow, Keras, ...), and I launched the program there without errors. The performance has not been measured yet because some features still need to be installed in the Docker container. In this setup, the program captures a recorded video (served by the cameraserver) instead of the webcam, because the hardware used has no real camera.
Some issues were fixed so that the program downloads the COCO-trained weights of the Mask R-CNN model when they have not been downloaded yet.
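The check itself is simple. The matterport Mask R-CNN demo does the same with if not os.path.exists(COCO_MODEL_PATH): utils.download_trained_weights(COCO_MODEL_PATH). The sketch below keeps the download function as a parameter (ensure_weights is a hypothetical name) so it can be tried without network access.

```python
import os


def ensure_weights(weights_path, download):
    """Call download(weights_path) only if the file is not on disk yet.

    With the matterport code, `download` would be
    utils.download_trained_weights. Returns True when a download was
    triggered, False otherwise.
    """
    if os.path.exists(weights_path):
        return False
    download(weights_path)
    return True
```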
Week 9: Mask R-CNN improvements
This week I tested the Mask R-CNN with different image sizes to reduce the execution time of a segmentation. The minimum size accepted by the net that still obtains results and maintains the original aspect ratio is 540x404. The execution time now ranges between 23 and 25 seconds (using CPU).
I also improved the architecture of the program, which now has 4 branches running independently: Camera, GUI, Net and Tracker. The temporary tracker uses the multitracker implemented in OpenCV (https://github.com/opencv/opencv_contrib/blob/master/modules/tracking/samples/multitracker.py). At the moment, the continuous mode works with the detections (and segmentations) given by the Mask R-CNN (after a considerable amount of time), followed by tracking-by-detection. In future improvements I expect to use a GPU to accelerate the detections.
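The four-branch structure can be illustrated with Python 3 threads connected by queues. This is a hypothetical sketch rather than the component's code: the Camera publishes every frame to the Tracker and the most recent one to the Net, the Net always works on the newest frame (skipping any it was too slow for), and the GUI branch is replaced by collecting the results so the example runs headless. detect and track are placeholder callables standing in for the Mask R-CNN and the OpenCV multitracker.

```python
import queue
import threading
import time


class LatestFrame:
    """Thread-safe slot holding only the most recent (number, frame) pair."""

    def __init__(self):
        self._lock = threading.Lock()
        self._item = None

    def set(self, item):
        with self._lock:
            self._item = item

    def get(self):
        with self._lock:
            return self._item


def camera_branch(frames, latest, tracker_queue, done):
    """Camera: every frame goes to the Tracker, the newest one to the Net."""
    for number, frame in enumerate(frames):
        latest.set((number, frame))
        tracker_queue.put((number, frame))
    tracker_queue.put(None)  # sentinel: the video is over
    done.set()


def net_branch(latest, detect, results, done):
    """Net: process the newest frame whenever idle, skipping older ones."""
    last_number = None
    while True:
        finished = done.is_set()  # read first so the last frame is not missed
        item = latest.get()
        if item is not None and item[0] != last_number:
            last_number = item[0]
            results.put(("net", item[0], detect(item[1])))
        elif finished:
            return
        else:
            time.sleep(0.001)  # nothing new yet


def tracker_branch(tracker_queue, track, results):
    """Tracker: follow every frame in order until the sentinel arrives."""
    while True:
        item = tracker_queue.get()
        if item is None:
            return
        results.put(("tracker", item[0], track(item[1])))


def run_pipeline(frames, detect, track):
    latest, done = LatestFrame(), threading.Event()
    tracker_queue, results = queue.Queue(), queue.Queue()
    branches = [
        threading.Thread(target=camera_branch, args=(frames, latest, tracker_queue, done)),
        threading.Thread(target=net_branch, args=(latest, detect, results, done)),
        threading.Thread(target=tracker_branch, args=(tracker_queue, track, results)),
    ]
    for branch in branches:
        branch.start()
    for branch in branches:
        branch.join()
    # A GUI branch would draw these results; here they are just returned.
    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected
```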
In this video, the current functionality of the component can be seen:
Week 8: Optimization step
With the Mask R-CNN running, the next necessary step is to optimize it so that the application can work in real-time conditions. For this purpose, we considered two possible solutions.
The first one was to use GPU support, which could reduce the execution time considerably. I tried to use the GPU that Google provides for free in Google Colab (https://colab.research.google.com/), but I had some problems installing the project's dependencies, so I will try the GPU available at the JdeRobot lab instead.
The second improvement could come from incorporating a feature-tracking algorithm into the application; this has not been implemented yet. Furthermore, I measured the execution times of the application with different input images, comparing the performance with one or several objects and with larger or smaller areas to segment. The conclusion is that these parameters have little influence on the final execution time. After that I tried smaller input images, but the network model seems to have problems with some image sizes.
Week 7: Run continuous mode added
This week I added the 'Run continuous' mode to the application, which lets the user segment the webcam video stream in 'real time'. This real time is obviously limited by the time the computer needs to process each image with the Mask R-CNN, which in my case is about 25 seconds (using CPU). The following image shows one of the results achieved with this new mode:
Week 6: Starting to build the object segmentator component
With the objective of building a visual memory for a robot, the first required step is to develop an object segmentator component that runs in real time on a video stream. It will be built with a structure of 3 branches: Camera, GUI and Net. The first approach will be an application that uses the video from the cameraserver component, with two toggle buttons that let the user choose between passing a single frame or a continuous sequence of frames from the camera to the net. For this purpose, I reuse parts of the code already written in https://github.com/JdeRobot/dl-objectdetector, thanks to Nacho.
This week I got the Mask R-CNN model working in real time on a single frame from the camera. So far the net has recognized objects such as 'person', 'cell-phone', 'bed', 'apple' and more without problems. The following video shows the application working (this is an early implementation: the Net runs on a laptop without a GPU and with a poor camera, so the results could be improved).
Week 5: Mask R-CNN code review and test
The current task is to understand the Mask R-CNN implementation available at https://github.com/matterport/Mask_RCNN and test it on my personal laptop. To do so, I started by running a demo example that uses a model pre-trained on MS COCO to segment objects in your own images (https://github.com/matterport/Mask_RCNN/blob/master/demo.ipynb). The first step to run the demo is to clone the Mask R-CNN repository mentioned before. Furthermore, the demo has the following main installation requirements: pycocotools and Keras with the Tensorflow backend. To install pycocotools (with Python 3) you need to type these instructions in the terminal:
git clone https://github.com/pdollar/coco.git
cd coco/PythonAPI
make
sudo make install
sudo python3 setup.py install
and then append your local COCO folder to the Python system path.
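A minimal sketch of this step, assuming the repository was cloned into a hypothetical ~/coco folder (the pycocotools package lives in its PythonAPI subfolder):

```python
import os
import sys

# Hypothetical clone location; point this at your own coco/PythonAPI folder.
COCO_API_DIR = os.path.expanduser("~/coco/PythonAPI")

# Make the locally built pycocotools importable from this interpreter.
if COCO_API_DIR not in sys.path:
    sys.path.append(COCO_API_DIR)
```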
The next image shows one of the results that can be achieved when you run this demo on test images (from the folder 'images' of the Mask R-CNN repository):
Week 4: State of the Art work
During the Christmas holidays I wrote a small part of the TFM report, which includes the Introduction and the State of the Art / Related Works sections. In this document I discuss robotics in Computer Vision, neural networks, and methods for tracking, detection and instance segmentation such as Mask R-CNN, among other topics. The document is available at my GitHub repository https://github.com/RoboticsURJC-students/2017-tfm-alexandre-rodriguez/blob/master/latex/Estado_del_Arte_INVESTIGACION_Alexandre_Rodriguez_Rendo.pdf
Week 3
This week I ran David's code once I had the Keras model to feed the classifier, as can be seen in the following video:
After that I studied the code and the implementation of the digit classifier. The project follows the design below, from a high-level and a low-level point of view (images are from David's project):
Furthermore, I finished testing Marcos' project and launched it successfully. The next video shows the tracking performance over a set of frames from MOT16-04:
This project uses a hybrid approach that combines neural-network-based detection with feature-based tracking. The neural network gives the system better detections but cannot work in real time, so it returns detections every 30 frames. Meanwhile, the feature tracking component, which does work in real time, computes the tracking between those frames. The process is shown in the next figure (image from Marcos' project):
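The scheduling of such a hybrid approach can be sketched as a loop that calls the slow detector every 30 frames and the fast tracker in between. detect and track are placeholder callables, not functions from Marcos' project, and re-initialising the tracked boxes from the latest detections is an assumption of this sketch.

```python
def hybrid_tracking(frames, detect, track, detection_interval=30):
    """Alternate a slow detector and a fast tracker.

    detect(frame) -> list of boxes (slow, accurate, e.g. a neural network).
    track(frame, boxes) -> boxes propagated to this frame (fast).
    Returns a list of (frame_index, source, boxes).
    """
    results = []
    boxes = []
    for index, frame in enumerate(frames):
        if index % detection_interval == 0:
            boxes = detect(frame)  # refresh boxes from the detector
            source = "detector"
        else:
            boxes = track(frame, boxes)  # follow the last known boxes
            source = "tracker"
        results.append((index, source, boxes))
    return results
```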
I also read the article on a recent object detection and segmentation technique called Mask R-CNN. The paper is available at https://arxiv.org/pdf/1703.06870.pdf and a Python implementation is available at https://github.com/matterport/Mask_RCNN. This framework efficiently detects objects in an image while generating a high-quality segmentation mask for each instance. It can easily be generalized to other tasks, such as human pose keypoint detection. The method extends Faster R-CNN by adding a branch that predicts an object mask in parallel with the existing branch for bounding box recognition. This decouples mask and class prediction, which improves performance. The authors also introduce a layer called RoIAlign, which removes the harsh quantization of RoIPool and thus fixes the pixel-to-pixel misalignment between network inputs and outputs in Faster R-CNN, preserving exact spatial locations.
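The core of RoIAlign can be sketched in pure Python for a single-channel feature map: the RoI and its bins keep fractional coordinates (no rounding, unlike RoIPool), each bin samples a regular grid of points, each point is bilinearly interpolated, and the samples are averaged (the paper allows max or average). Clamping coordinates at the border and the fixed sampling_ratio are simplifications of this sketch; real implementations work on batches and channels and treat borders differently.

```python
import math


def bilinear(feature, y, x):
    """Bilinearly interpolate a 2D feature map (list of rows) at (y, x).
    Coordinates are clamped to the map, a simplification."""
    height, width = len(feature), len(feature[0])
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    ly, lx = y - y0, x - x0
    return ((1 - ly) * (1 - lx) * feature[y0][x0] + (1 - ly) * lx * feature[y0][x1]
            + ly * (1 - lx) * feature[y1][x0] + ly * lx * feature[y1][x1])


def roi_align(feature, roi, output_size=2, sampling_ratio=2):
    """RoIAlign for one RoI (y_start, x_start, y_end, x_end) given in
    feature-map coordinates. No rounding of the RoI or bin edges; each bin
    averages sampling_ratio x sampling_ratio interpolated points."""
    y_start, x_start, y_end, x_end = roi
    bin_h = (y_end - y_start) / float(output_size)
    bin_w = (x_end - x_start) / float(output_size)
    output = []
    for by in range(output_size):
        row = []
        for bx in range(output_size):
            total = 0.0
            for sy in range(sampling_ratio):
                for sx in range(sampling_ratio):
                    # Regularly spaced sample point inside the current bin.
                    y = y_start + by * bin_h + (sy + 0.5) * bin_h / sampling_ratio
                    x = x_start + bx * bin_w + (sx + 0.5) * bin_w / sampling_ratio
                    total += bilinear(feature, y, x)
            row.append(total / (sampling_ratio * sampling_ratio))
        output.append(row)
    return output
```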
Mask R-CNN outperforms the existing state-of-the-art techniques on the COCO suite of challenges (http://cocodataset.org/#home), and it also gives better results in instance segmentation on the Cityscapes dataset (https://www.cityscapes-dataset.com/). The next images show some of these results in object segmentation (left) and keypoint detection for human pose (right):
Week 2: Code review and testing
This second week, the proposed task was to execute and study the code of the Final Master Project of Marcos Pieras and the Final Grade Project of David Pascual. The code is available at their GitHub repositories https://github.com/RoboticsURJC-students/2015-TFM-Marcos-Pieras/tree/master/TrackingComponent (Marcos) and https://github.com/RoboticsURJC-students/2016-tfg-david-pascual (David).
Once you have downloaded the repositories, you need to open a new terminal and type:
python digitclassifier.py digitclassifier.cfg
But, as expected, it is not so easy. First, I had to install all the necessary dependencies for each project, which included Keras with the Tensorflow and Theano backends, following the installation process available at https://keras.io/#installation. After that I set up OpenCV and the JdeRobot tools to work properly together.
To use Marcos' project you also need to download the content of https://github.com/balancap/SSD-Tensorflow, which provides the SSD VGG 300 net and other tools used in the project. Besides, you need a dataset to test the detection project, for example the MOT16-14 dataset (https://motchallenge.net/data/MOT16/#download), and the checkpoint used in the project (https://drive.google.com/file/d/0B0qPCUZ-3YwWT1RCLVZNN3RTVEU/view). I had some problems loading the checkpoint, but I solved them using the information provided in https://github.com/tensorflow/tensorflow/issues/2999.
With the previous configuration, David's project should work, but the Keras model it uses, net_4conv_patience5.h5, was not available in his repository, so I asked him personally and he is going to update the repository soon.
I have not finished the complete task yet, but I hope to do it soon. I also want to thank Marcos and David for their friendly help in this process :)
Week 1: Getting started
In this first week, I installed the JdeRobot environment on my laptop following the steps provided in http://jderobot.org/Installation#From_Debian_packages. After that, I checked that my installation was working by playing with some of the examples provided in the Documentation section, for example the OpenCV demo (http://jderobot.org/index.php/Examples#Cameraserver_.2B_Opencvdemo). To use this example, you have to open two terminals and launch the cameraserver component in one and the OpenCV demo in the other.
I have also read some previous work by colleagues, such as the Final Grade Projects of Nuria Oyaga ("Análisis de Aprendizaje Profundo con la plataforma Caffe") and David Pascual ("Study of Convolutional Neural Networks using Keras Framework") and the Final Master Project of Marcos Pieras ("Visual people tracking with deep learning detection and feature tracking"). These works introduced me to the basic concepts of Deep Learning. Besides that, they present part of the State of the Art in detection, tracking and classification using Deep Learning techniques.