I. Introduction
Overview:
Autonomous Driving (AD) vehicles, also known as self-driving vehicles, are cars or trucks in which human drivers are never required to take control to safely operate the vehicle. They combine sensors and software to control, navigate, and drive the vehicle. Though still in its infancy, self-driving technology is becoming increasingly common and could radically transform our transportation system, economy, and society. Many thousands of people die in motor vehicle crashes every year in the United States (more than 30,000 in 2015); self-driving vehicles could, hypothetically, reduce that number, since software could prove to be less error-prone than humans. Based on automaker and technology company estimates, level 4 (the car is fully autonomous in some driving scenarios, though not all) self-driving cars could be for sale in the next several years
[1].
Motivation: Currently, safety is an overarching concern in AD technology and is preventing its full deployment in the real world. The question to answer is whether an AD system can handle potentially dangerous and anomalous driving situations. Before AD systems can be fully deployed, they need to know how to handle such scenarios, which in turn calls for heavy training of an AD system on such scenarios. The challenge lies in the fact that these scenarios are very rare. They constitute the 'long tail of rare events' in the distribution of all events and comprise only a tiny fraction of the events in a given driving dataset. Hence, Artificial Intelligence (AI) technology needs to be used both to mine these "gems" in a given dataset and to train the AD system to handle such 'special' situations. Detecting anomalous driving scenarios, which is crucial to building fail-safe AD systems, offers two advantages, one offline and one online. In the offline case, given a dataset, the identified anomalous driving scenarios can be used to train an AD system to better handle such scenarios. This can be achieved, for example, via weighted training: give more weight to learning anomalous scenarios than normal scenarios. In the online case, detecting anomalous driving scenarios ahead of time can help prevent accidents in some cases, by taking a corrective action that steers the system in a safe direction (e.g., applying appropriate control signals or, if possible, handing over control to a human driver). We specifically consider only Controller Area Network (CAN) bus sensor data such as pedal pressure, steer angle, etc. (multi-modal time-series data) due to its simplicity, while it still provides valuable (though not complete) information about the driving profile; augmenting with video data will be our future work. We consider any unusual pattern (such as abruptness, rarity, etc.) across the different modalities a sign of an anomaly. Such a pattern could result from an unusual reaction of the driver on the pedals, steering wheel, etc., which in turn implies the driver has gone through a challenging (anomalous) driving situation. Though model-based (rule-based) approaches can be used to detect anomalies in multi-modal time-series data, they are good only for simple cases such as threshold-based anomaly detection (speed/deceleration greater than a threshold, etc.). It is difficult as well as tedious to compose rules for complex and even (a priori) unknown situations.
On the other hand, data-driven approaches can learn representations directly from the data and use them to detect anomalies. This gives them the ability to detect complex and unknown anomalies directly from data. Though the performance of data-driven approaches is only as good as the data, this limitation can be addressed to some extent when a large amount of data is used for training. Among data-driven approaches, deep-learning based approaches (as opposed to classical machine learning techniques) are especially interesting due to their ability to learn features on their own, without the need for domain expertise. Existing deep learning approaches for anomaly detection in multi-modal time-series data include reconstruction-error based approaches such as Long Short-Term Memory (LSTM) autoencoders. These approaches do not perform well when multiple "normal" situations (multiple positive classes) exist with a class imbalance problem. In driving data, these classes correspond to right-turn, going-straight, U-turn, etc., where the data for U-turns is far scarcer than the data for going-straight. There is a higher chance that the classifier in these approaches overfits the smaller (less-frequent) classes, resulting in poor performance. Since reconstruction error is used as the measure of anomaly by these approaches, they classify the less-frequent normal classes (e.g., U-turn) as anomalous as well, further degrading performance.
Our Approach: We make the observation that, while reconstruction-error based approaches perform poorly with rare but non-anomalous events, their performance can be greatly improved with the help of simple domain knowledge (the availability of maneuver labels in our case). Leveraging these maneuver labels, we add a symbol predictor to the autoencoder system (creating a multi-task learning system) which acts as a regularizer to the autoencoder, thereby achieving better performance than a standalone autoencoder system. The proposed deep multi-task learning based anomaly detection system is shown in Fig. 1. The two tasks in the proposed approach are a convolutional bidirectional LSTM (Bi-LSTM) based autoencoder and a convolutional Bi-LSTM based sequence-to-sequence (seq2seq) symbol predictor (in contrast to a simple LSTM predictor that predicts raw sensor data rather than symbols). In the seq2seq predictor, the predicted symbols/labels correspond to the automobile's next series of maneuvers (e.g., going-straight, left-turn, etc.). These labels are obtained from manually annotated driving data. We show that the proposed multi-task learning approach performs better than existing deep learning based anomaly detection approaches such as the LSTM autoencoder and LSTM predictor, as one task acts as a regularizer for the other. In addition to reconstructing the input data (via the autoencoder), the network is also constrained to predict the next series of maneuvers (via the symbol predictor), and as such the chance of overfitting is reduced. Such a regularizer system also helps solve the problem of overfitting to smaller classes mentioned above. Secondly, the proposed multi-task learning approach leverages these maneuver labels to define a custom anomaly metric (rather than simple reconstruction error) that weighs down the detection of rare but non-anomalous patterns such as U-turns as anomalies.
The approach has been tested on 150 hours of raw driving data [2] collected in and around Mountain View, California, USA and is shown to perform better than the state-of-the-art approach, the LSTM-based autoencoder [3].
Our contributions can be summarized as follows.

We propose a novel multi-task learning (convolutional Bi-LSTM autoencoder and symbol predictor) approach for detecting anomalous driving in the presence of multiple "normal" classes and a class imbalance problem. Our approach leverages simple domain knowledge (maneuver labels) to build a regularizer system that reduces overfitting and enhances overall reconstruction performance.

We propose an anomaly scoring metric that leverages such maneuver labels and reduces the cases where rare but non-anomalous events are classified as anomalies.

We evaluate our approach both quantitatively and qualitatively on 150 hours of real driving data and compare it with the state-of-the-art LSTM autoencoder and multi-class LSTM autoencoder approaches to show its advantages over them.
II. Related Work
In this section, we describe important related work in the domain of anomaly detection for multi-modal/multivariate time-series data. Anomaly detection is generally an unsupervised machine learning (ML) task due to the lack of sufficient examples of the anomalous class. Within unsupervised learning, it can be broadly classified into the following categories: contextual anomaly detection, ensemble based methods, and deep learning approaches. These methods internally use statistical/regression based approaches, dimensionality reduction, and distribution based approaches. In statistical approaches, features such as mean, variance, entropy, energy, etc. are generally hand-crafted from the data. Statistical tests or formal rule checks are then performed on these features to determine whether the data is anomalous. In dimensionality reduction, the data is projected onto a low-dimensional representation (such as the principal components in Principal Component Analysis (PCA)). The idea is that this low-dimensional representation captures the most important features of the input data. Clustering techniques such as k-means or Gaussian Mixture Models (GMMs) are then used to cluster these low-dimensional features to identify anomalies. In distribution-based approaches, the training data is fit to a distribution (such as a multivariate Gaussian distribution or a mixture of them). Given a test point, the distance of this point from the fitted distribution is then calculated (e.g., using the Mahalanobis distance) and serves as the measure of anomaly.
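As a concrete illustration of the dimensionality-reduction idea, the following minimal NumPy sketch (synthetic data; all names and values are illustrative, not from any cited work) projects data onto its top principal components and uses the distance from the center of the projected training cloud as a simple anomaly indicator:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "normal" training data: 500 samples, 6 sensor-like features,
# with most variance concentrated in the first two features.
X = rng.normal(size=(500, 6)) * np.array([5.0, 4.0, 1.0, 1.0, 1.0, 1.0])

# Project onto a low-dimensional representation (top-k principal components).
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
components = Vt[:2]                      # top-2 components capture the dominant structure

def low_dim(x):
    return (x - mu) @ components.T

# Distance from the center of the projected training data serves as a
# simple anomaly score (a stand-in for k-means/GMM cluster distance).
center = low_dim(X).mean(axis=0)

def score(x):
    return float(np.linalg.norm(low_dim(x) - center))

typical_score = float(np.median([score(x) for x in X]))
outlier_score = score(np.full(6, 30.0))  # a point far from the training data
```

In practice the cluster structure would be estimated with k-means or a GMM as described above; a single centroid is used here only to keep the sketch short.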
Contextual anomaly detection: An anomaly may not be considered an anomaly when the context under which it happens is well known. For example, the CAN-bus sensor data of a car may look anomalous when the car is taking a U-turn, which is not considered an anomaly. This is also called seasonal anomaly detection in other domains such as building energy consumption, retail sales, etc. Hayes et al. [4] and Capozzoli et al. [5] present two-step approaches for contextual anomaly detection. In the former, step 1 uses only the sensor's past data to identify anomalous behavior, using a univariate Gaussian function. If the output of step 1 is found to be anomalous, it is passed to step 2 to check whether it is contextually anomalous or not. Twitter [6] published a seasonal anomaly detection framework based on the Seasonal Hybrid Extreme Studentized Deviate test (S-H-ESD). Netflix [7] also released an approach for anomaly detection in big data using Robust PCA (RPCA). Even though Netflix's approach seemed successful, their statistical approach relies on high-dimensionality datasets to compute a low-rank approximation, which limits its applicability. Finally, Toledano et al. [8] propose a filter-bank and fast-autocorrelation based approach to detect anomalies in large-scale time-series data, considering seasonal variations.
Ensemble based methods: In ensemble learning, different models are trained on the same data (or on random sets of samples from the original data) and majority voting (or another fusion technique) is used to decide the final output. Another advantage of ensemble learning is that the member models can be chosen to be complementary in terms of their strengths and weaknesses, i.e., the weaknesses of one are compensated by the strengths of another. For example, Araya et al. [9] proposed an ensemble based collective and contextual anomaly detection framework. The ensemble consisted of pattern recognition algorithms such as an autoencoder and PCA, as well as prediction based anomaly detectors such as Support Vector Regression (SVR) and Random Forest. They showed that the ensemble classifier performs well compared to the base classifiers.
Deep learning methods: In deep learning techniques, the features are generally learned by the classifier itself, so there is no need to hand-engineer them. These techniques can be broadly classified into two categories. (i) Representation learning for reconstruction: here the input data is mapped to a latent space (generally of lower dimension than the input data) using an encoder, and then the latent space is mapped back to the input space using a decoder. The latent space captures a representation of the input data similar to PCA. The reconstruction error at the end of this process is a measure of anomaly. Autoencoders are prime examples in this category. For example, Malhotra et al. [3] present an LSTM based encoder-decoder approach for multi-sensor time-series anomaly detection. The approach has been tested on multiple datasets including power demand, space shuttle valve, medical cardiac data, and proprietary engine data, and showed promising results. (ii) Predictive modeling: here the current/future data is predicted from past data using LSTM modules that capture long-term temporal behavior. The prediction error is a measure of anomaly. LSTM sequence predictors are examples in this category. For example, Taylor et al. [10] proposed an LSTM predictor based anomaly detection framework for automobiles based on Controller Area Network (CAN) bus data, similar to ours. Hallac et al. [11] present an embedding approach for driving data called Drive2Vec which can be used to encode the identity of the driver. However, this approach only complements ours, as our approach can work both with raw data and with embedded data. Malhotra et al. [12] proposed an LSTM based predictor for anomaly detection in time-series data that is shown to perform well on the four kinds of datasets mentioned above.
In contrast to these approaches, we propose a multi-task deep learning based approach that overcomes the shortcomings of (i) and (ii) by incorporating a built-in regularizer (as one task acts as a regularizer for the other) and leveraging domain knowledge (so that rare but non-anomalous maneuvers such as U-turns are not classified as anomalies).
III. Proposed Solution
In this section, we first explain the LSTM autoencoder (reconstruction-error) based approach [3], which is currently the best-performing (unsupervised) anomaly detection framework for multi-modal time-series data. We then present our semi-supervised approach for anomaly detection in driving data, which leverages the maneuver labels to improve performance. Anomaly detection using unsupervised learning consists of two steps. In step 1, the system is trained with several normal examples to learn representations of the input data, e.g., GMM clustering. Because we are dealing with temporal data, a sliding-window approach needs to be adopted to learn these representations. In step 2, given a test data point, we compute an anomaly score defined on the learned representations, e.g., the distance from the mean of the cluster.
III-A. LSTM Autoencoder (Existing Approach)
Fig. 2 shows the high-level architecture of the LSTM autoencoder. Input time-series data x(1), ..., x(L) (corresponding to one window of L samples segmented from the full data) is fed to the encoder, which consists of LSTM cells. Each LSTM cell encodes its input and the cell state from the previous cell into its own cell state, which is passed on to the next LSTM cell. Finally, the cell state of the last LSTM cell holds the encoded representation, which we call the embedding, of all the input data. The size of this embedding is equal to the number of units (also called the hidden size) in the last LSTM cell. The decoder similarly consists of a series of LSTM cells; however, the input to these decoder cells is zero, as the goal is to regenerate the input data. Another approach, feeding the output of the previous cell as input to the next cell, is also possible. The first decoder LSTM cell takes the embedding as one of its inputs (the other input being zero) and passes its cell state on to the next decoder cell. The process is repeated for L time steps. During each step, the LSTM cell generates an output x'(i), finally resulting in x'(1), ..., x'(L) after L steps. The network is trained by minimizing the reconstruction error, sum_i ||x(i) - x'(i)||^2, using stochastic gradient descent and backpropagation. After sufficient training, the network is able to learn good representations of the input data, stored in its embedding, which completes step 1. The network is then able to reconstruct new data very well, i.e., with low reconstruction error, as long as it has seen data with similar patterns during training. However, when the network is fed with data that has a completely different pattern than that used during training, there will be a large reconstruction error.
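To make the encoder recurrence concrete, here is a toy NumPy sketch of a single LSTM cell folding one window of multi-modal data into a fixed-size cell state (the embedding). The weights are random and untrained, so this illustrates only the data flow, not a trained model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell step: encode input x and the previous state (h, c) into a new state."""
    z = W @ x + U @ h + b                       # stacked gate pre-activations
    i, f, o, g = np.split(z, 4)                 # input, forget, output gates and candidate
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # updated cell state
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(1)
n_in, n_hid = 6, 8                              # 6 CAN-bus modalities; hidden size 8 (illustrative)
W = rng.normal(scale=0.1, size=(4 * n_hid, n_in))
U = rng.normal(scale=0.1, size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)

window = rng.normal(size=(20, n_in))            # one window of multi-modal time-series data
h = c = np.zeros(n_hid)
for x in window:
    h, c = lstm_step(x, h, c, W, U, b)

# After the last step, c is the fixed-size embedding of the whole window.
embedding = c
```

The decoder runs the mirror-image recurrence, seeded with this embedding, to regenerate the window.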
Though the reconstruction error can directly be used as the measure of anomaly for step 2, better results can be achieved with further processing. The method currently adopted [3] is shown in Fig. 3. After the network is trained, the training data is fed again to the trained network to capture the reconstruction errors. These errors are then fit to a multivariate Gaussian distribution as shown in Fig. 3. Given a test data point, the reconstruction error is first calculated using the trained model. The Mahalanobis distance of this error with respect to the fitted Gaussian model is then calculated using the formula shown in Fig. 3. These distances, which are treated as anomaly scores, are then sorted in decreasing order and analyzed as per requirements, e.g., analyzing the top 0.01%.
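This scoring step can be sketched in NumPy as follows, with simulated per-window reconstruction errors standing in for the model's output (all shapes and the injected outlier are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Per-window reconstruction errors from a trained model (one row per window,
# one column per modality); simulated here for illustration.
train_err = np.abs(rng.normal(scale=0.01, size=(2000, 6)))
test_err = np.abs(rng.normal(scale=0.01, size=(500, 6)))
test_err[7] += 0.2                              # one window reconstructs badly

# Fit the training errors to a multivariate Gaussian ...
mu = train_err.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(train_err, rowvar=False))

# ... and score each test window by its Mahalanobis distance to that fit.
d = test_err - mu
scores = np.sqrt(np.einsum("ij,jk,ik->i", d, cov_inv, d))

# Sort in decreasing order and inspect the windows with the top scores.
top = np.argsort(scores)[::-1]
```

The badly reconstructed window surfaces at the head of the sorted list, mirroring the "analyze the top fraction" workflow described above.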
III-B. Multi-Task Learning (Proposed Approach)
As mentioned earlier, fully unsupervised reconstruction-error based approaches such as the LSTM autoencoder fare poorly when there are rarely occurring positive (non-anomalous) classes in the data. For such rare cases, the network is unable to learn representations, thereby producing large reconstruction errors. We solve this problem by designing a semi-supervised multi-task learning framework that leverages driving maneuver labels, as shown in Fig. 1. Here task A is the autoencoder, while task B is a symbol/maneuver predictor. Task B acts as a regularizer to the autoencoder, as the overall network is also constrained to predict the next series of maneuvers in addition to reconstructing the input data. For this to be possible, better representations need to be learned by the network that can help in both reconstruction and prediction. This combined (multi-task) system performs better reconstruction than a standalone autoencoder. Similarly, the autoencoder (task A) acts as a regularizer for the symbol predictor (task B). This is because the autoencoder helps in learning good representations of the input data, which can then be used by the symbol predictor to predict the next series of maneuvers. Hence, in a similar way, the combined system produces a better symbol predictor than a standalone symbol predictor. Both improvements are possible because the two tasks mutually help each other. Our approach is semi-supervised as we make use of maneuver labels to design a regularizer in task B, but it is not fully supervised as we do not have anomaly and non-anomaly labels. We will now explain the encoder and the decoders of both tasks in detail.
Convolutional Bi-LSTM Encoder: The basic encoder in an LSTM autoencoder (Fig. 2) does not perform sufficiently well, as it does not take into account (i) inter-channel correlations and (ii) the directionality of the data. We design an encoder that addresses these issues, as shown in Fig. 4. It consists of a series of 1-dimensional (1D) convolutional layers followed by bidirectional LSTM layers. The convolutional layers help in capturing inter-channel spatial correlations, while the LSTM layers help in capturing inter- and intra-channel temporal correlations. Unidirectional LSTM layers capture temporal patterns in only one direction, while the data might exhibit interesting patterns in both directions. Hence, to capture these patterns, we have a second set of LSTM cells to which the data is fed in reverse order. Further, we stack multiple of these bidirectional LSTM (Bi-LSTM) layers to extract more hierarchical information. All the data that has been processed through the convolutional and Bi-LSTM layers is available in the cell states of the final LSTM cells. This is the output of the encoder, which is fed as input to the decoders of the two tasks.
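The role of the 1D convolutional layers can be sketched with a from-scratch NumPy 'valid' convolution over one window; the filter values here are random and purely illustrative:

```python
import numpy as np

def conv1d(x, kernels):
    """1D convolution over time that mixes input channels.
    x: (T, C_in) window;  kernels: (C_out, C_in, K) filters of width K."""
    T, _ = x.shape
    c_out, _, K = kernels.shape
    out = np.zeros((T - K + 1, c_out))
    for t in range(T - K + 1):
        patch = x[t:t + K]                     # (K, C_in) slice of the window
        for o in range(c_out):
            # Each output channel sees *all* input channels, which is what
            # captures inter-channel spatial correlations (e.g., steer angle vs. yaw).
            out[t, o] = np.sum(kernels[o] * patch.T)
    return out

rng = np.random.default_rng(3)
window = rng.normal(size=(20, 6))              # 6 CAN-bus modalities
feats = conv1d(window, rng.normal(size=(8, 6, 3)))  # 8 filters of width 3 (illustrative)
```

The resulting feature sequence would then be fed, in both time orders, to the stacked Bi-LSTM layers described above.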
Decoder (Autoencoder, Task A): The decoder of the autoencoder (task A) performs the encoder operations in reverse order so as to reconstruct the input data (Fig. 5). It first consists of Bi-LSTM layers which take the final cell states from the encoder as one of their inputs (the other input being zero). As mentioned in Sect. III-A, the other input (besides the previous cell state) can be either zero or the output of the previous LSTM cell. The outputs of the LSTM layers are fed to a series of 1D deconvolutional layers which perform the reverse of convolution (transposed convolution) to generate data with the same shape as the input to the encoder.
Decoder (Predictor, Task B): The decoder of the symbol predictor (task B) is shown in Fig. 6. It takes only the forward cell states from the encoder, as it has only unidirectional LSTM layers. It adopts a greedy decoder, where the most probable symbol output of the previous LSTM cell is fed as input to the next LSTM cell. The first LSTM cell takes a special start-of-sequence symbol as input.
Training (Step 1): The loss function for task A is the Mean Squared Error (MSE) between the input to the encoder and the output of the decoder. The loss function for task B is a weighted cross-entropy loss, with the weights being the inverse of the frequencies of the maneuvers in the training data. That is, the weight for symbol i is w_i = (1/f_i)^a, where f_i is the frequency ratio of maneuver i in the training data and the exponent a is determined empirically for best results. The overall network is trained by minimizing the weighted losses of task A, task B and the regularization loss, i.e., the overall loss L = w_A*L_A + w_B*L_B + w_R*L_R, where w_A, w_B, w_R are the weights for L_A, L_B, L_R, the task A, task B and regularization losses respectively.
Inference (Step 2): During inference, given a test data point, an anomaly score is calculated as described in Sect. III-A and Fig. 3. This anomaly score (say s), however, fares poorly with rare positive classes, leading to multiple false positives. In order to address this problem, we define a new anomaly score leveraging the predicted maneuvers from task B, as shown in Fig. 7. Assume m_1, ..., m_k are the maneuvers predicted by task B. We then calculate the negative log-likelihood of this sequence, -sum_j log p(m_j) (assuming independence for simplicity). This value is low for more-frequent maneuvers (e.g., going-straight) and high for rare maneuvers (e.g., U-turns). We divide s by this value to obtain the scaled anomaly score, which is high for more-frequent maneuvers and low for less-frequent maneuvers such as U-turns. In this way, rare but non-anomalous situations are weighed down, leading to fewer false positives.
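The weighting and score scaling can be sketched as follows; the frequency ratios are taken from Table I, while the exponent value and function names are illustrative assumptions:

```python
import numpy as np

# Maneuver frequency ratios (a subset of Table I).
freq = {"background": 0.8715, "left_turn": 0.0258, "u_turn": 0.0023}

# Weighted cross-entropy: weight each symbol by the inverse of its frequency.
# The exponent alpha is tuned empirically; 1.0 here is only a placeholder.
alpha = 1.0
weights = {m: (1.0 / f) ** alpha for m, f in freq.items()}

def scaled_score(s, predicted_maneuvers):
    """Divide the raw anomaly score s by the negative log-likelihood of the
    predicted maneuver sequence (maneuvers assumed independent for simplicity)."""
    nll = -sum(np.log(freq[m]) for m in predicted_maneuvers)
    return s / nll

# A rare-but-normal U-turn window is weighed down relative to a window of
# ordinary going-straight maneuvers carrying the same raw score.
s_uturn = scaled_score(5.0, ["u_turn"] * 4)
s_straight = scaled_score(5.0, ["background"] * 4)
```

Because the U-turn sequence has a much larger negative log-likelihood, its scaled score is much smaller, which is exactly the false-positive suppression described above.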
IV. Performance Evaluation
In this section, we first explain the experimental setup (data and training) and then present quantitative and qualitative results for two scenarios: comparison with the unsupervised LSTM autoencoder (which does not use the maneuver labels) and with the semi-supervised multi-class LSTM autoencoder (which uses the maneuver labels).
Dataset description: We evaluated our approach on the 150-hour HDD driving dataset [2], which was collected from February 2017 to March 2018, predominantly during daytime. The data consists of Controller Area Network (CAN) bus data that carries information about six driving modalities: steer angle, steer speed, speed, yaw, pedal angle and pedal pressure. The data has been downsampled from its original rate, as we observed better results with lower-rate data. Since this is time-series data, we adopted a sliding-window approach as follows. For both the autoencoder and the symbol predictor, we use a fixed input window size and stride length; for the symbol predictor, we additionally fix the size of the prediction window. In order to obtain meaningful results (e.g., anomalous results corresponding to when the car is parked are not useful), we filtered out those windows where the maximum speed of the vehicle falls below a threshold. We then scaled the data to a common range in order to make the network invariant to the scales of the different modalities. Of the resulting windows, 70% are used for training the models and the rest for evaluating the performance. Table I shows the annotated maneuvers/labels present in the HDD dataset ('Background' indicates going-straight) with the corresponding percentage of occurrence.

Label  Percent [%]

Background  87.15 
Intersection Passing  6.00 
Left turn  2.58 
Right turn  2.31 
Left lane change  0.54 
Right lane change  0.50 
Crosswalk passing  0.27 
U-turn  0.23 
Left lane branch  0.20 
Right lane branch  0.08 
Merge  0.14 
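The sliding-window preprocessing described above (segmentation, speed filtering, scaling, 70/30 split) can be sketched as follows; the window size, stride, speed threshold and scaling range here are placeholders, not the actual values used in our experiments:

```python
import numpy as np

def make_windows(series, win, stride, min_speed, speed_col):
    """Segment a multi-modal time series (T x C) into overlapping windows,
    dropping windows where the vehicle is essentially parked."""
    windows = []
    for start in range(0, len(series) - win + 1, stride):
        w = series[start:start + win]
        if w[:, speed_col].max() >= min_speed:   # filter out parked/near-idle windows
            windows.append(w)
    return np.stack(windows)

def min_max_scale(X):
    """Scale each modality to [0, 1] so the network is invariant to sensor scales."""
    lo = X.min(axis=(0, 1), keepdims=True)
    hi = X.max(axis=(0, 1), keepdims=True)
    return (X - lo) / (hi - lo + 1e-12)

rng = np.random.default_rng(4)
drive = rng.normal(size=(1000, 6))               # toy CAN-bus stream, 6 modalities
drive[:, 2] = np.abs(drive[:, 2]) * 10           # column 2 plays the role of speed
W = min_max_scale(make_windows(drive, win=60, stride=30, min_speed=1.0, speed_col=2))
n_train = int(0.7 * len(W))                      # 70/30 train/test split
train, test = W[:n_train], W[n_train:]
```

In the actual pipeline, `train` would feed both tasks of the multi-task network and `test` would be held out for anomaly scoring.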
Training:
We used TensorFlow to build, train and test the models, with a mini-batch size of 512 windows. The weights for the reconstruction loss (task A), the cross-entropy loss (task B) and the regularization loss have been set empirically, and the weights in the cross-entropy loss are scaled as mentioned in Sect. III-B. We used two layers of Bi-LSTMs with a fixed hidden size for each LSTM cell. We trained the overall network for about 300 epochs using the Adam optimizer [13].
Comparison with LSTM autoencoder: We compare our approach with the fully unsupervised LSTM autoencoder. Its network architecture, training method and parameters are similar to those of the LSTM autoencoder part of our multi-task network.
Feature  LSTM Autoencoder [3]  Our Approach 

Steer Angle  0.0005  0.0003 
Steer Speed  0.0004  0.0003 
Speed  0.0004  0.0003 
Yaw  0.0004  0.0003 
Pedal Angle  0.0012  0.0012 
Pedal Pressure  0.0012  0.0003 
Combined  0.3043  0.2082 
Category  LSTM Autoencoder [3]  Our Approach  Our Approach (Scaled Scores) 

Speed  21.7%  21.7%  30.4% 
K-turns  13.0%  8.8%  17.4% 
U-turns  4.4%  —  — 
Lane Change  34.8%  47.8%  39.1% 
Normal  26.1%  21.7%  13.1% 
Total  100% (23)  100% (23)  100% (23) 
Quantitative results: After the network has been trained, we tested it on the evaluation/test data. Fig. 7(a) compares the reconstruction MSE loss of our approach and of the LSTM autoencoder vs. the number of epochs on the test data. We can notice that our approach converges to a lower loss. Table II shows the average normalized reconstruction loss on the test data for the different modalities for our approach (multi-task learning) and the LSTM autoencoder. We can notice that our approach results in a lower reconstruction loss (0.2082) compared to the standalone autoencoder (0.3043) in the 'combined' category. This shows that the combined system does a better job of learning representations than the standalone autoencoder, resulting in lower loss. Fig. 7(b) compares the weighted cross-entropy loss of our approach and a standalone symbol predictor. We can notice that our approach achieves a lower loss than the symbol predictor. Also, we can observe that by coupling an autoencoder to a symbol predictor, the zigzag behavior of the latter has been smoothed out. We can observe similar behavior in Fig. 7(c) for symbol prediction accuracy (as our data is annotated with maneuvers, we are able to calculate the maneuver prediction accuracy with respect to the ground truth). Fig. 9 compares the reconstruction performance on three sample turns in the test data (left, right, U) between our approach and the standalone autoencoder. We can notice in all three cases that our approach does a better job of reconstructing the original data (to get these results, we used scaled steer-angle data).
Feature  Multiclass LSTM Autoencoder  Our Approach 

Steer Angle  0.0007  0.0004 
Steer Speed  0.0005  0.0004 
Speed  0.0006  0.0004 
Yaw  0.0006  0.0003 
Pedal Angle  0.0014  0.0013 
Pedal Pressure  0.0012  0.0004 
Combined  0.4058  0.2456 
Percentile Top Scores  Multiclass LSTM Autoencoder  Our Approach  Our Approach (Scaled Scores) 

0.001  0.39% (3/765)  1.70% (13/765)  7.97% (61/765) 
0.01  1.96% (15/765)  7.97% (61/765)  29.02% (222/765) 
0.1  13.33% (102/765)  17.25% (132/765)  48.63% (372/765) 
0.5  73.46% (562/765)  52.68% (403/765)  84.44% (646/765) 
1  100.00% (765/765)  99.87% (764/765)  100.00% (765/765) 
Qualitative results: After the network is trained, the reconstruction errors for each modality are fit to a Gaussian distribution as explained in Sect. III-B. We also considered another modality which is a combination of all of them; the errors corresponding to this combined modality are fit to a multivariate Gaussian distribution. We then passed the test data (in windows) through the network and calculated the Mahalanobis distances (anomaly scores) for each window of data as per Fig. 3. We also calculated the scaled anomaly scores from the predicted maneuvers, by dividing the anomaly scores by the negative log-likelihood of the predicted maneuvers as per Fig. 7. For both cases (scaled and non-scaled), we analyzed the top scores and their corresponding windows. For this purpose, we extracted the video segment corresponding to each window and manually inspected it to check whether there was any anomalous behavior. By analyzing the video segments corresponding to the top anomaly scores, we could classify them into five categories: 'Speed' anomalies (e.g., abrupt braking), 'K-turns', 'U-turns', 'Unusual lane change' and finally 'Normal' (no anomaly noticed on visual inspection). We have summarized the results of this analysis in Table III. We can notice that, while the autoencoder classifies U-turns as anomalous, our approach (both scaled and unscaled) does not. We can also notice that our scaled approach classifies fewer 'Normal' and more 'Speed' anomalies. By comparing the percentage of 'Normal' cases classified as anomalous, we can tell that the scaled approach performs better than the unscaled one, which in turn performs better than the standalone autoencoder approach. We have included with the submission a video demo showing the different kinds of anomalies (listed in Table III) detected using the above approaches.
Comparison with multi-class/ensemble LSTM autoencoder: While the fully unsupervised LSTM autoencoder above did not make use of the maneuver labels, we also compared our approach with a multi-class LSTM autoencoder that, like our approach, makes use of the maneuver labels. For this purpose, and in order to test the performance of the algorithms, we considered one of the maneuvers, viz. the U-turn, as an anomaly. That is, after we split the entire data into train and test windows, we discarded those windows in the training data where the majority maneuver is a U-turn. The remaining training data, which is mostly devoid of U-turn windows, is fed to our multi-task classifier. For the multi-class LSTM autoencoder, we further divided this training data into 10 parts, each part corresponding to one of the 10 maneuvers in Table I other than the U-turn. We then trained 10 LSTM autoencoder classifiers (i.e., an ensemble) corresponding to these 10 maneuvers by providing each only the data specific to its maneuver. Given a test window, each of the 10 classifiers is used to compute a reconstruction loss value; the lowest of these is taken as the reconstruction loss for that test data point.
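The ensemble's min-loss scoring rule can be sketched as follows, with the per-classifier losses simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)

# Reconstruction losses from 10 maneuver-specific autoencoders: one row per
# test window, one column per classifier (simulated here for illustration).
losses = np.abs(rng.normal(scale=0.1, size=(100, 10)))

# The ensemble's loss for each test window is the lowest loss across the 10
# classifiers: the window is "explained" by its best-matching maneuver model,
# and only windows no model explains well (e.g., U-turns) keep a high loss.
ensemble_loss = losses.min(axis=1)
```

These per-window ensemble losses are then sorted and thresholded exactly like the single-model reconstruction losses.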
Quantitative results. Fig. 10 shows the quantitative results, which compare the MSE reconstruction loss on the evaluation data between our approach (multi-task) and the multi-class LSTM autoencoder approach as the number of training epochs increases. We recall that the reconstruction loss for the multi-class approach is the lowest reconstruction loss across the 10 class (maneuver)-specific autoencoder classifiers. We can observe that the multi-task approach finally achieves a lower loss than the multi-class approach. The final (after 300 epochs of training) reconstruction loss on the evaluation/test data for each feature is summarized in Table IV. We can notice that our approach achieves a lower reconstruction error for all features compared to the multi-class approach.
Qualitative results. In order to evaluate the qualitative performance of the algorithms, we first sorted the reconstruction losses/scores in decreasing order and then found the number of U-turn windows in the test data detected by each approach within the top-percentile anomaly scores. The results are shown in Table V. For our multi-task approach, we have two scenarios: actual reconstruction loss and scaled reconstruction loss. Considering especially the top percentiles, we can notice that our approach with scaled scores performs better than our approach with actual scores, which in turn performs better than the multi-class approach. For example, considering the top 0.1-percentile anomaly scores for each approach, our approach with scaled scores detects 372 (i.e., 48.63%) of the total 765 U-turn windows in the test data, while this number is 132 for our approach with actual scores and only 102 for the multi-class autoencoder approach.
V. Conclusion and Future Work
We have presented a multi-task learning based anomaly detection framework that performs better than existing LSTM autoencoder based approaches. We leverage domain knowledge to reduce false positives. We have validated the proposed approach on 150 hours of driving data and showed the benefits of our approach both quantitatively and qualitatively.
Though we have traced some of the 'Normal' cases classified as anomalous to artifacts in the data, we will investigate why the other 'Normal' cases are classified as anomalous. This, along with improved network architectures and the use of video data (not just the CAN bus), will be the focus of our future work.
References
 [1] Digital Trends, “Volvo to Release Level 4 Autonomous XC90 in 2021,” https://www.digitaltrends.com/cars/volvoxc90level4autonomy/, 2018.

[2] V. Ramanishka, Y.-T. Chen, T. Misu, and K. Saenko, "Toward Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal Reasoning," in Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[3] P. Malhotra, A. Ramakrishnan, G. Anand, L. Vig, P. Agarwal, and G. Shroff, "LSTM-based Encoder-Decoder for Multi-sensor Anomaly Detection," in Anomaly Detection Workshop, International Conference on Machine Learning (ICML), New York, NY, USA, 2016.
[4] M. A. Hayes and M. A. Capretz, "Contextual anomaly detection framework for big sensor data," Journal of Big Data, vol. 2, no. 1, 2015.
[5] A. Capozzoli, F. Lauro, and I. Khan, "Fault detection analysis using data mining techniques for a cluster of smart office buildings," Expert Systems with Applications, vol. 42, no. 9, pp. 4324–4338, 2015.
 [6] Twitter, “Introducing practical and robust anomaly detection in a time series,” 2015. [Online]. Available: https://blog.twitter.com/engineering/en_us/a/2015/introducingpracticalandrobustanomalydetectioninatimeseries.html

[7] Netflix, "RAD—Outlier Detection on Big Data," http://techblog.netflix.com/2015/02/radoutlierdetectiononbigdata.html, 2015.
[8] M. Toledano, I. Cohen, Y. Ben-Simhon, and I. Tadeski, "Real-time anomaly detection system for time series at scale," in Proceedings of the KDD Workshop on Anomaly Detection in Finance, ser. Proceedings of Machine Learning Research, vol. 71, 2018, pp. 56–65.
[9] D. B. Araya, K. Grolinger, H. F. El-Yamany, M. A. Capretz, and G. Bitsuamlak, "An ensemble learning framework for anomaly detection in building energy consumption," Energy and Buildings, vol. 144, pp. 191–206, 2017.

[10] A. Taylor, S. Leblanc, and N. Japkowicz, "Anomaly Detection in Automobile Control Network Data with Long Short-Term Memory Networks," in 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), 2016, pp. 130–139.
[11] D. Hallac, S. Bhooshan, M. Chen, K. Abida, R. Sosic, and J. Leskovec, "Drive2Vec: Multiscale State-Space Embedding of Vehicular Sensor Data," in 2018 21st International Conference on Intelligent Transportation Systems (ITSC), 2018, pp. 3233–3238.

[12] P. Malhotra, L. Vig, G. Shroff, and P. Agarwal, "Long Short Term Memory Networks for Anomaly Detection in Time Series," in European Symposium on Artificial Neural Networks, Bruges, Belgium, 2015.
[13] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," in International Conference on Learning Representations, San Diego, CA, USA, 2015.