Deep Learning for Real-time Human Activity Recognition on Mobile Phones
Cait Crawford & Jorge Ortiz, IBM Research
Mark Nutter, ARM Research
In this talk we present a deep-learning-based technique for human activity classification that runs in real time on mobile devices. Our technique minimizes model size and computational overhead in order to run on the embedded processor and preserve battery life. Prior work shows that inertial measurement unit (IMU) data from waist-mounted mobile phones can be used to develop accurate classification models for various human activities such as walking, running, and stair-climbing. However, these models have largely been based on hand-crafted features derived from temporal and spectral statistics. More recently, deep learning has been applied to IMU sensor data, but the resulting models have not been optimized for resource-constrained devices. We present a detailed study of the traditional hand-crafted features used for shallow/statistical models. These consist of a set of 561 manually chosen dimensions. We show, through principal component analysis (PCA), that this set can be reduced significantly: fewer than 100 features give the same performance. In addition, we show that features derived from frequency-domain transformations do not contribute to the accuracy of these models.
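The PCA-based reduction described above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the data here is synthetic (correlated random features standing in for the 561 hand-crafted dimensions), and the 95% explained-variance threshold is an assumed criterion.

```python
# Sketch: reducing a 561-dimensional hand-crafted feature set with PCA.
# The data and the 95% variance threshold are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-in for 1000 windows x 561 hand-crafted features:
# the features are driven by ~40 latent factors plus a little noise,
# mimicking the redundancy the abstract reports in the real feature set.
latent = rng.normal(size=(1000, 40))
mixing = rng.normal(size=(40, 561))
X = latent @ mixing + 0.01 * rng.normal(size=(1000, 561))

# Keep the fewest components that explain 95% of the variance.
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)  # far fewer than 561 columns remain
```

In practice one would fit PCA on the training split only and check that a classifier trained on the reduced features matches the original accuracy.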
Finally, we present our learning technique, which creates 2D signal images from windowed samples of IMU data. Our pipeline includes a convolutional neural network (CNN) with three layers: one convolutional layer, one averaging layer, and one fully connected layer. We show that by removing steps in the pipeline and layers in the CNN, we can still achieve an F1 score over 0.94, but with a much smaller memory footprint and computational cost. To increase the classification accuracy of our pipeline, we added a hybrid bi-class support vector machine (SVM) trained on the labeled, flattened convolutional-layer output produced as each training image was processed. The learned feature set is half the size of the original hand-crafted feature set, and combining the CNN with the SVM yields an F1 score of 0.98. Finally, we investigate a novel application of transfer learning by using the 2D time-series signal images to re-train two different publicly available networks, Inception/ImageNet and MobileNet. We find that the re-trained networks can be made smaller than 5.5 MB (suitable for mobile phones) while maintaining an F1 score over 0.93, indicating that re-training is a promising direction for quickly building classifiers for continuously evolving activities that still run on mobile devices.
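A minimal sketch of the pipeline structure described above: a CNN with one convolutional layer, one averaging layer, and one fully connected layer over 2D signal images, with an SVM trained on the flattened convolutional features. Image size, channel counts, and the six activity classes are illustrative assumptions (the authors' exact shapes and hyperparameters are not given in the abstract), and the inputs here are random tensors standing in for real signal images.

```python
# Sketch of the CNN + SVM hybrid; shapes and hyperparameters are assumptions.
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import LinearSVC

N_CLASSES = 6  # e.g. walking, running, stair-climbing, ... (illustrative)

class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)  # one conv layer
        self.pool = nn.AvgPool2d(2)                            # one averaging layer
        self.fc = nn.Linear(8 * 16 * 16, N_CLASSES)            # one fully connected layer

    def features(self, x):
        # Flattened conv-layer activations, also fed to the auxiliary SVM.
        return torch.flatten(self.pool(torch.relu(self.conv(x))), 1)

    def forward(self, x):
        return self.fc(self.features(x))

# Random 32x32 "signal images" standing in for windowed IMU samples.
torch.manual_seed(0)
images = torch.randn(64, 1, 32, 32)
labels = np.random.default_rng(0).integers(0, N_CLASSES, size=64)

model = SmallCNN()
with torch.no_grad():
    feats = model.features(images).numpy()

# Hybrid step: an SVM fit on the labeled, flattened convolutional features.
svm = LinearSVC(max_iter=10_000).fit(feats, labels)
preds = svm.predict(feats)
print(preds.shape)
```

In the full system the CNN would first be trained on labeled signal images; the SVM is then fit on the conv-layer features those images produce, trading a small amount of extra training for the higher F1 score the abstract reports.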
Catherine H. Crawford, PhD, Distinguished Engineer, IBM Research, graduated from MIT with an SBME and from Princeton University with an MSE and a PhD in Mechanical and Aerospace Engineering. Her PhD research was in the area of direct numerical simulation of turbulence in complex geometries and dynamical system simulation. Since then, she has spent over 20 years at IBM Research and has published and patented in areas as diverse as numerical simulation on parallel architectures, computer system performance analysis and modeling, high-performance computing systems, embedded systems, and mobile devices. Her work has earned her corporate awards, a selection to the Mass High Tech Women to Watch, and even a reference in the Congressional Record for her foundational software work on hybrid systems, which ultimately led to the world's first petaflop computer, Roadrunner. Her current work focuses on machine learning problems, including computer vision, leveraging mobile and embedded systems (e.g., edge analytics and computing) while addressing problems in dimensionality reduction, transfer learning, and continuous learning.