TensorFlow is a platform for machine learning that bundles a variety of modules that provide functions ranging from bitwise operations to implementations of neural networks. It provides a Python API for a C/C++ backend, and has support for hardware acceleration. Some common use cases for TensorFlow are image classification, object detection, text autocomplete suggestions, and optical character recognition. This page looks at how it could be integrated into this project and potential uses for it.
There are several ways to set up TensorFlow. For simple tutorials, Google Colab provides an online Jupyter notebook with Google Drive integration. This is useful for running code and seeing the result on a single nicely formatted page.
TensorFlow also provides Docker images that can be easily downloaded and run. I like the idea of dockerizing everything, and apparently this is the easiest way of enabling GPU acceleration, but it makes the setup and usage of TensorFlow a lot more complicated. Say you want to have TensorFlow in one container and access it from a Python script in another container. After you get your TensorFlow application running, you would create a TensorFlow Servable for it. This could then be served as a web endpoint from a Docker container running a TensorFlow Serving image. The other container can then get the results of the model by making an HTTP request with the data. I'm not sure the overhead in development and processing time is worth doing it this way.
The easiest way to set it up for this project is to install the TensorFlow Python library in the ROS container with `pip3 install tensorflow`. It requires Python 3.5+, so ROS and any packages it uses would need to be updated; I think those updates are almost done. There are two versions of the TensorFlow Python library: the regular one, with all the tools for training models, and TensorFlow Lite, which only has an "interpreter" for running models. Most full TensorFlow models can be converted to Lite models. When deploying to low-power devices, models should be converted so the device only has to load the model and run it.
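As a rough sketch of that conversion step (the model here is a trivial stand-in, not one of the project's models; the file name is arbitrary):

```python
import tensorflow as tf

# A trivial stand-in model; any trained Keras model could go here.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(10),
])

# Convert the full TensorFlow model to a TensorFlow Lite flat buffer.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the .tflite file; a low-power device then only needs this file
# plus the TFLite interpreter to run the model.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```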
Like I mentioned earlier, the easiest way to enable GPU acceleration is by using the Docker container, but that adds issues if the data is needed elsewhere. Another hardware accelerator is the Coral Edge TPU (Tensor Processing Unit), a small $60 device that plugs into a USB port and contains a custom ASIC designed by Google. It consumes only 2 watts of power and can do 4 trillion fixed-point operations per second. That's more operations per second than many dedicated GPUs (although GPUs do floating-point operations). Something to note is that the Edge TPU can only run TensorFlow Lite models and can't be used to accelerate training.
The Edge TPU could be a good way of doing complex image recognition on a low power device like a small driving robot. It is much more efficient for machine learning than doing the calculations on the main CPU. This combined with models optimized for TensorFlow Lite has a lot of potential in my opinion.
Keras is the high-level API TensorFlow uses for building and training neural network models, and TensorFlow ships its own implementation of it. The TensorFlow Basics tutorial covers how to do handwritten digit recognition using Keras.
Here is a summary of the tutorial, covering slightly different material than they do. To create a basic sequential neural network model with Keras, layers must be defined. The model will take images that are 28x28 pixels, so the first Flatten() layer takes that image and turns it into a layer with 28^2 nodes, one input for each pixel. Next, 128 densely connected nodes are added, meaning each node connects to all input nodes. The dropout layer sets any given node's output to zero 20% of the time while training; this helps prevent overfitting by giving things a stir now and then. The model can be trained using the compile() and fit() methods, where x_train contains the images and y_train contains the corresponding labels. See the appendix for an example of how to get data from this. Neural network models return arrays of scores, with a score corresponding to each of the possible outcomes. The numpy.argmax(array) function returns the index of the highest value, which is effectively the model's prediction. Model weights can be saved and loaded with model.save_weights(path) and model.load_weights(path), which is very useful so the model doesn't need to be trained each time. The models can also be converted to TFLite (TensorFlow Lite) and saved in a file.
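A minimal sketch of those steps (random stand-in data is used here instead of the tutorial's handwritten digit dataset, so the snippet runs without downloading anything; the weights file name is arbitrary):

```python
import numpy as np
import tensorflow as tf

# Stand-in data shaped like 28x28 grayscale digit images with labels 0-9.
x_train = np.random.rand(100, 28, 28).astype("float32")
y_train = np.random.randint(0, 10, size=100)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),                      # 28^2 = 784 nodes, one per pixel
    tf.keras.layers.Dense(128, activation="relu"),  # densely connected layer
    tf.keras.layers.Dropout(0.2),                   # zero 20% of outputs during training
    tf.keras.layers.Dense(10),                      # one output score per digit
])

model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, verbose=0)

# The model returns an array of scores; argmax gives its prediction.
scores = model.predict(x_train[:1], verbose=0)
prediction = np.argmax(scores[0])

# Save and reload weights so the model doesn't need retraining each run.
model.save_weights("digits.weights.h5")
model.load_weights("digits.weights.h5")
```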
One of the great things about TensorFlow is that hundreds of models are published on the TensorFlow Hub. Models are sortable by neural network architecture, problem domain, and TensorFlow Lite compatibility among other things. Models are available for image and video classification, image resolution enhancement, speech recognition, and more.
One of the models available creates segmentation maps and is optimized for mobile devices. Segmentation mapping assigns a classification such as chair, background, etc. to each pixel in an image, resulting in a map of classifications. As an example, they show two people walking down a hallway and the model is able to distinguish them from the background despite many confusing patterns on their clothes and the walls.
The application I was interested in using segmentation maps for is assisting in creating depth maps for the robot. Our current camera (the Intel RealSense T265) has stereo fisheye cameras, which work great for some things but are fairly low resolution for creating depth maps. The depth map is currently the primary method of detecting obstacles, so it should be as robust as possible. Depth maps are created using the OpenCV library, which compares blocks of the stereo images and calculates distance based on the disparity between them. This method works, but often has blotchy and inconsistent results. A segmentation map could be combined with the depth map: the segmentation gives a better idea of where objects are, and the depth map gives the distance to them.
Running a pre-built model involves downloading the model, opening it with the TensorFlow Lite interpreter, preparing input data, and interpreting output data. After loading the model, the code starts streaming images from the webcam, cropping and resizing them to the height and width expected by the model. Next it sets that image as the input to the model, runs it, and gets the output. The output is an array of pixels, each with 21 scores, one for each classification the model can assign. NumPy is used to set each pixel value to the number of the classification with the highest score, and a color map is created from these. A downside of the pre-built models is that they don't necessarily provide detailed instructions on how to format inputs, or even what the classifications mean. However, there is an example of using the segmentation model on their GitHub here.
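The interpreter workflow can be sketched like this. In place of a downloaded segmentation model, a tiny stand-in model is built and converted inline so the snippet is self-contained; it only mimics the 21-score-per-pixel output layout, not the real model's behavior:

```python
import numpy as np
import tensorflow as tf

# Stand-in for a downloaded .tflite model: a 1x1 convolution producing
# 21 per-pixel class scores, the same output layout as the segmentation model.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8, 8, 3)),
    tf.keras.layers.Conv2D(21, 1),
])
tflite_bytes = tf.lite.TFLiteConverter.from_keras_model(model).convert()

# Load the model into the TFLite interpreter (model_path=... would be
# used instead for a model downloaded as a file).
interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Prepare a frame with the shape and dtype the model expects; a real
# application would crop and resize the webcam image to fit.
frame = np.zeros(inp["shape"], dtype=inp["dtype"])

# Run the model and take the highest-scoring class for each pixel.
interpreter.set_tensor(inp["index"], frame)
interpreter.invoke()
scores = interpreter.get_tensor(out["index"])
class_map = np.argmax(scores, axis=-1)
```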
More Robotics Applications
There are many options for how TensorFlow can be used in robotics. The obvious ones are segmentation maps, object detection, and voice command recognition, where pre-built models are available. Creating custom models massively expands the possibilities. Using TensorFlow for pathfinding could be another project, but looks fairly difficult: training data would have to be collected and prepared, and a simulation of the robot might be useful for training. Another option would be to take image and sensor data (from laser range finders, for example) and use them to plan which directions are safe to move in. This is often called sensor fusion: taking all the incoming data and creating a useful model from it. One more pre-built model I thought looked interesting is a pose-estimation model that estimates what pose a person is standing in. This could be used to add human interaction, such as having the robot respond to gestures like waves.
I think TensorFlow is a good option for a robot, especially with the Coral Edge TPU for power efficiency. TensorFlow's many pre-trained models, as well as new custom models, have the potential to majorly improve the robot's perception.