Rule-Based Error Detection and Correction to Operationalize Movement Trajectory Classification
Abstract
Classification of movement trajectories has many applications in transportation. Supervised neural models represent the current state-of-the-art. Recent security applications require this task to be rapidly employed in environments that may differ from the data used to train such models and for which there is little training data. We provide a neuro-symbolic rule-based framework to detect and correct the errors of these models to support eventual deployment in security applications. We provide a suite of experiments on several recent and state-of-the-art models and show an accuracy improvement of 1.7% over the SOTA model in the case where all classes are present in training. When 40% of classes are omitted from training, we obtain a 5.2% (zero-shot) and 23.9% (few-shot) improvement over the SOTA model without resorting to retraining of the base model.
1 Introduction
The identification of the mode of travel for a time-stamped sequence of Global Positioning System (GPS) points, known as a “movement trajectory,” has important applications in travel demand analysis (Huang et al. 2019), transport planning (Lin & Hsu 2014), and analysis of sea vessel movement (Fikioris et al. 2023). The current state-of-the-art has relied on supervised neural models (Kim et al. 2022). More recently, this problem has been of interest for security applications, leading to efforts such as the IARPA HAYSTAC program (https://www.iarpa.gov/research-programs/haystac). In this domain, models may be deployed in environments with different geography, transportation infrastructure, and socio-cultural dynamics than those in the training data, and are expected to adapt to such environments with little or no labeled data specific to those circumstances. Further, such deployments may happen rapidly, precluding extensive data engineering or model retraining.
In this paper, we extend the current supervised neural methods with a lightweight error detection and correction rule (EDCR) framework, providing an overall neurosymbolic system. The key intuition is that training and operation data can be used to learn rules that predict and correct errors in the supervised model. Once trained, the rules are employed operationally in two phases: first, detection rules identify potentially misclassified movement trajectories. A second type of rule (“correction rules”) is then used to re-assign each such sample to a new class. Our key contributions are as follows: (1.) We present a strong theoretical framework for EDCR rooted in logic and rule mining and formally prove how quantities related to learned rules (e.g., confidence and support) are related to changes in class-level machine learning metrics such as precision and recall. (2.) We conduct experiments showing that rules trained on the same data as the original model can improve machine learning metrics across various settings and model types, including the SOTA LRCN model. Specifically, the employment of EDCRs leads to a 1.7% improvement in accuracy over the original LRCN model when data leakage between training and testing is minimized. (3.) When 40% of the classes are excluded from training, we obtain a 5.2% (zero-shot) and 23.9% (few-shot) improvement over the SOTA model. This is accomplished without any retraining of the underlying base model. (4.) In addition to domain-knowledge conditions akin to those in other papers, we provide a general condition built from a neural network, which makes EDCR applicable to diverse problem domains. (5.) As a side result, we extend the LRCN SOTA model of (Kim et al. 2022) with attention mechanisms that establish a new SOTA baseline in certain cases without EDCR. This model is also improved with EDCR.
The rest of the paper is outlined as follows. In Section 2, we describe the movement trajectory classification problem (MTCP) and associated classification approaches, including our new “LRCN with attention” (LRCNa) model. Then we introduce our error detecting and correcting rule framework (Section 3), which formalizes our strategy for EDC and provides analytical results that support our algorithm development. This is followed by experimental results in Section 4 and a discussion of related work and future directions in Section 5. Additional details supporting the reproducibility of both formal results (e.g., proofs) and experiments (e.g., data preprocessing and experimental details) along with code can be found in an online appendix available at https://github.com/lab-v2/Error-Detection-and-Correction.
2 Technical Preliminaries
In this section, we introduce MTCP and describe the vector embeddings used for a neural classifier (Dabiri & Heaslip 2018; Kim et al. 2022) as well as the three neural architectures utilized: CNN (Dabiri & Heaslip 2018), Long-term Recurrent Convolutional Network (LRCN) (Kim et al. 2022), and (newly introduced in this work) LRCN with attention (LRCNa).
Movement Trajectory Classification Problem. We define the MTCP problem as follows: given a sequence of GPS points, $\omega$, assign a movement class from a set $\mathcal{C}$. The number of classes in $\mathcal{C}$ is $n$. In this work, as per others (e.g., (Dabiri & Heaslip 2018; Kim et al. 2022)), we define $\mathcal{C} = \{walk, bike, bus, drive, train\}$, though we will typically not refer to specific classes outside of the description of the experiments for purposes of generalizability. The current paradigm for the MTCP problem is to create a neural model $f_\theta$ that maps sequences to movement classes using a set of weights, $\theta$. In this approach, traditional methods (i.e., gradient descent) find a set of parameters such that a loss function is minimized based on some training set $\mathcal{T}$ (where each sample $\omega \in \mathcal{T}$ is associated with a ground truth class $gt(\omega)$). Formally: $\theta^{*} = \arg\min_{\theta} \mathbb{E}_{\omega \in \mathcal{T}}\, Loss(f_\theta(\omega), gt(\omega))$. We also note that with each sample $\omega$, we will associate three predicates for each class $i \in \mathcal{C}$: $pred_i$, $corr_i$, and $error_i$, which we will later use to describe a logic for reasoning about error correction.
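- $pred_i(\omega)$: whether the model predicted class $i$; $pred_i(\omega)$ is true iff $f_\theta(\omega) = i$.
- $corr_i(\omega)$: whether $i$ is the correct movement class for $\omega$; $corr_i(\omega)$ is true iff $gt(\omega) = i$.
- $error_i(\omega)$: whether the model had an error on class $i$; $error_i(\omega)$ is true iff $pred_i(\omega) \wedge \neg corr_i(\omega)$. In other words, the model predicted class $i$ and is wrong.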
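The three predicates can be illustrated with a minimal Python sketch. The function name `predicates` and the class labels in `CLASSES` are illustrative assumptions, not identifiers taken from the paper's code; the sketch only encodes the definitions above (prediction, ground truth, and "predicted this class but wrong").

```python
# Hypothetical class labels, used only for illustration.
CLASSES = ["walk", "bike", "bus", "drive", "train"]

def predicates(predicted, ground_truth, classes=CLASSES):
    """Return the truth values of pred_i, corr_i and error_i for one sample."""
    result = {}
    for cls in classes:
        pred = predicted == cls          # model predicted this class
        corr = ground_truth == cls       # this class is the ground truth
        result[cls] = {"pred": pred, "corr": corr, "error": pred and not corr}
    return result

# A sample predicted as "bus" whose true class is "drive".
p = predicates("bus", "drive")
```

For this sample only the error predicate of the predicted class ("bus") is true; the error predicate of the true class ("drive") stays false because the model never predicted it.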
Vector Embedding. The current SOTA approaches that we examine rely on an embedding of a sequence that consists of a stack of vectors describing the velocity, acceleration, jerk (the time rate of change of acceleration), and bearing rate. In this paper, we base these calculations on prior work (Kim et al. 2022; Dabiri & Heaslip 2018) and include details in the appendix.
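As a rough illustration of how such motion features can be derived from raw GPS points, the following Python sketch computes velocity, acceleration, jerk, and bearing rate with finite differences. It is a sketch under stated assumptions, not the preprocessing used in the paper (which is described in the appendix): the haversine distance, the choice of time step in each difference, and the absolute-difference bearing rate without wrap-around handling are all assumptions, and every function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumed constant)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing in degrees (0 = north, clockwise) from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    y = math.sin(dlam) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlam)
    return math.degrees(math.atan2(y, x)) % 360.0

def motion_features(points):
    """points: list of (lat, lon, t_seconds) with strictly increasing t."""
    pairs = list(zip(points, points[1:]))
    dt = [b[2] - a[2] for a, b in pairs]
    velocity = [haversine_m(a[0], a[1], b[0], b[1]) / d for (a, b), d in zip(pairs, dt)]
    bearing = [bearing_deg(a[0], a[1], b[0], b[1]) for a, b in pairs]
    acceleration = [(v2 - v1) / d for v1, v2, d in zip(velocity, velocity[1:], dt[1:])]
    jerk = [(a2 - a1) / d for a1, a2, d in zip(acceleration, acceleration[1:], dt[2:])]
    # Absolute change between consecutive bearings (no wrap-around handling).
    bearing_rate = [abs(b2 - b1) for b1, b2 in zip(bearing, bearing[1:])]
    return {"velocity": velocity, "acceleration": acceleration,
            "jerk": jerk, "bearing_rate": bearing_rate}

# Four points heading due north at constant speed, sampled every 10 seconds.
feats = motion_features([(0.0, 0.0, 0), (0.001, 0.0, 10), (0.002, 0.0, 20), (0.003, 0.0, 30)])
```

Each feature list is one element shorter than the quantity it is differenced from, so a fixed-length embedding would still need padding or truncation, which this sketch leaves out.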
CNN (Dabiri & Heaslip 2018). Utilizing a convolutional neural network (CNN) presents a viable solution for inferring mobility modes from GPS trajectories, as it can autonomously extract highly efficient features (Dabiri & Heaslip 2018). Here, the CNN incorporates a comprehensive set of layers, including the input layer, convolutional layers, pooling layers, fully-connected layers, and dropout layers.
LRCN (Kim et al. 2022). To further enhance the accuracy of extracting mobility modes from GPS trajectories, the application of a Long-term Recurrent Convolutional Network (LRCN) proves beneficial (Kim et al. 2022). The layers of the LRCN model follow a hierarchical structure with three components, proceeding from bottom to top: the convolutional layers, LSTM layers, and fully connected layers.
LRCN with Attention (new in this paper). Due to the notable performance improvement that the transformer architecture (Vaswani et al. 2017) has provided on related problems, we felt it would be important to include an approach based on its attention mechanism. Hence, we created a simple extension to LRCN that utilizes attention. We shall refer to this architecture as LRCNa. We provide an overview in Figure 1 in the appendix. LRCNa comprises convolutional layers for feature extraction, LSTM layers and an attention layer that together perform sequence learning, and fully connected layers for classification.
3 Error Detection and Correction Rules
A key issue with the deployment of the model $f_\theta$ is that it may encounter sequences whose distribution differs from the data used to train it. Further, in our target application, there may not be sufficient labeled data or time to properly retrain $f_\theta$. We also note that in some cases, $f_\theta$ may be inaccessible for fine-tuning (e.g., behind an API). Additionally, understanding why the results of $f_\theta$ change is also important for our envisioned security application. As such, we employ a rule-based approach to correcting $f_\theta$. The intuition is that, using limited data, we will learn a set of rules that can detect and correct errors of $f_\theta$ by logical reasoning (Aditya et al. 2023). Then, upon deployment, for some new sequence $\omega$, we would first compute the class $f_\theta(\omega)$ and then use the learned rules to conclude whether the result of $f_\theta$ should be accepted and, if not, provide an alternate class in an attempt to correct the mistake. In this section, we formalize the error correcting framework with a simple first order logic (FOL) and provide analytical results, relating properties of learned rules to changes in precision and recall, that inform our approach to learning such error detecting and correcting rules. We complete the section with a discussion on how various potential “failure conditions” are extracted to create the rules to correct errors.
Throughout this section, we shall assume a set of operational sequences $\mathcal{O}$ for which there is ground truth available after model training. The size of set $\mathcal{O}$ is $N$ and, generally, this is expected to be much smaller than the size of $\mathcal{T}$ (the set of training data). Later, in our experiments, we look at cases where $\mathcal{O}$ is the training data itself and where it additionally includes a small number of samples from classes unseen in training. However, these are not requirements, as our results are based on model performance on $\mathcal{O}$, and we envision use-cases where $\mathcal{O}$ is significantly different from $\mathcal{T}$. On these samples, for each class $i$, the model ($f_\theta$) returns class $i$ for $N_i$ of the samples, and for each class we have the number of true positives, false positives, true negatives, and false negatives ($TP_i$, $FP_i$, $TN_i$, $FN_i$). We have precision $P_i = TP_i / N_i$, recall $R_i = TP_i / (TP_i + FN_i)$, and prior of predicting class $i$: $N_i / N$.
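The per-class quantities above can be computed directly from paired predictions and ground-truth labels. The following Python sketch is illustrative only; the function name `class_metrics` and the toy labels are assumptions, and precision and recall are set to zero when their denominators are zero.

```python
def class_metrics(predicted, truth, cls):
    """Counts and metrics for one class over a labeled set of samples."""
    n = len(predicted)
    n_i = sum(p == cls for p in predicted)                        # samples predicted as cls
    tp = sum(p == cls and t == cls for p, t in zip(predicted, truth))
    fp = n_i - tp
    fn = sum(p != cls and t == cls for p, t in zip(predicted, truth))
    tn = n - tp - fp - fn
    return {
        "N_i": n_i, "TP": tp, "FP": fp, "TN": tn, "FN": fn,
        "precision": tp / n_i if n_i else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "prior": n_i / n,                                          # fraction predicted as cls
    }

predicted = ["walk", "walk", "bus", "bus", "walk"]
truth = ["walk", "bus", "bus", "walk", "walk"]
m = class_metrics(predicted, truth, "walk")
```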
Language. We assume a simple first-order language where samples are represented by constant symbols, and we have unary predicates associated with each sample. This language includes a set $C$ of “condition” predicates, each of which can be either true or false for a given sample. Additionally, for each class $i$, the language includes the following:
- “Correct” predicates $corr_i$, which denote the ground truth class for the sample (i.e., for a given sample exactly one will be true and the rest false).
- “Prediction” predicates $pred_i$, which denote the class predicted by the model (i.e., for a given sample exactly one will be true and the rest false).
- “Error” predicates $error_i$, which are true if the prediction of class $i$ for the sample is incorrect. Note that $error_i$ is true iff both $pred_i$ is true and $corr_i$ is false.
Rules. The set of rules will consist of two rules for each class: one “error detecting” and one “error correcting.” Error detecting rules determine if a prediction by $f_\theta$ is invalid. In essence, we can think of such a rule as changing the movement class assigned by $f_\theta$ to some sample from $i$ to “unknown.” For a given class $i$, we will have an associated set of detection conditions $DC_i$ that is a subset of the conditions $C$, the disjunction of which is used to determine if $f_\theta$ gave an incorrect classification.
$$error_i(X) \leftarrow pred_i(X) \wedge \bigvee_{j \in DC_i} cond_j(X) \qquad (1)$$
After the application of the error detection rules for each class, we may consider re-assigning the samples to another class using a second type of rule called the “corrective rule.” For a given class $i$, such a rule is formed based on a set of condition-class pairs $CC_i$.
$$corr_i(X) \leftarrow \bigvee_{(q, r) \in CC_i} \big( cond_q(X) \wedge pred_r(X) \big) \qquad (2)$$
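To make the two-phase use of these rules concrete, here is a minimal Python sketch of applying an error detecting rule and then a corrective rule to a single prediction. The function `apply_edcr`, the condition name `speed_gt_bus_max`, and the tie-breaking choice (the first corrective rule whose body fires wins) are assumptions made for illustration; the paper does not specify them here.

```python
def apply_edcr(pred_class, active_conditions, detection_rules, correction_rules):
    """Apply a detection rule, then a corrective rule, to one prediction.

    detection_rules:  {class: set of condition names}                  (detection conditions)
    correction_rules: {class: set of (condition, predicted_class) pairs} (condition-class pairs)
    active_conditions: set of condition names that are true for the sample.
    """
    # Detection: the prediction is flagged if any detection condition holds.
    flagged = bool(detection_rules.get(pred_class, set()) & active_conditions)
    if not flagged:
        return pred_class
    # Correction: re-assign to the first class whose rule body fires (assumed tie-break).
    for target, pairs in correction_rules.items():
        if any(cond in active_conditions and cls == pred_class for cond, cls in pairs):
            return target
    return "unknown"  # flagged as an error but no corrective rule applies

detection_rules = {"bus": {"speed_gt_bus_max"}}
correction_rules = {"train": {("speed_gt_bus_max", "bus")}}

corrected = apply_edcr("bus", {"speed_gt_bus_max"}, detection_rules, correction_rules)
accepted = apply_edcr("bus", set(), detection_rules, correction_rules)
unknown = apply_edcr("bus", {"speed_gt_bus_max"}, detection_rules, {})
```

Note that this sketch needs only the model's output and the condition values, which is consistent with the setting where the model itself sits behind an API.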
Associated with rules of both types are the following values, all of which are defined as zero if there are no conditions.
Support ($s$): the fraction of samples in $\mathcal{O}$ where the body is true.
Support w.r.t. class $i$ ($s_i$): given the subset of samples where the model predicts class $i$, the fraction of those samples where the body is true (note the denominator is $N_i$).
Confidence ($c$): the number of times the body and head are true together divided by the number of times the body is true.
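The following Python sketch computes these three values for an error detecting rule on a small labeled set. The sample layout (dictionaries with `pred`, `truth`, and `conds` keys), the function name, and the condition name `fast` are illustrative assumptions. Returning zero when no sample satisfies the body is also an assumption; the text above only fixes the value for the case of no conditions.

```python
def detection_rule_stats(samples, cls, conditions):
    """Support, class support and confidence of an error detecting rule for cls.

    samples: list of dicts with keys 'pred', 'truth', 'conds' (set of true conditions).
    conditions: set of detection conditions forming the disjunction in the rule body.
    """
    predicted_cls = [s for s in samples if s["pred"] == cls]
    # Body: model predicted cls AND at least one detection condition is true.
    body = [s for s in predicted_cls if s["conds"] & conditions]
    if not conditions or not body:
        return {"support": 0.0, "class_support": 0.0, "confidence": 0.0}
    # Head: the prediction of cls was an error.
    body_and_head = [s for s in body if s["truth"] != cls]
    return {
        "support": len(body) / len(samples),
        "class_support": len(body) / len(predicted_cls),
        "confidence": len(body_and_head) / len(body),
    }

samples = [
    {"pred": "bus", "truth": "train", "conds": {"fast"}},
    {"pred": "bus", "truth": "bus", "conds": {"fast"}},
    {"pred": "bus", "truth": "bus", "conds": set()},
    {"pred": "bus", "truth": "bus", "conds": set()},
    {"pred": "walk", "truth": "walk", "conds": {"fast"}},
]
stats = detection_rule_stats(samples, "bus", {"fast"})
```

The last sample satisfies the condition but not the prediction predicate, so it does not count toward the body.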
Now we present some analytical results that inform our learning algorithms. Our strategy for learning involves first learning detection rules (which establish conditions for which a given classification decision by is deemed incorrect) and then learning correction rules (which then correct the detected errors by assigning a new movement class to the sample). We formalize these two tasks as follows.
Improvement by error detecting rule. For a given class $i$, find a set of conditions $DC_i$ such that precision is maximized and recall decreases by at most $\epsilon$.
Improvement by error correcting rule. For a given class $i$, find a set $CC_i$ of condition-class pairs, drawn from the candidate pairs, such that either precision or recall is maximized.
Properties of Detection Rules. First, we examine the effect on precision and recall when an error detecting rule is used. Our first result shows a bound on precision improvement. If class support () is less than , which we would expect (as the rule would be designed to detect the portion of results that failed), then we can also show that the quantity gives us a lower bound on the improvement in precision. In the appendix, we also note that precision will always increase under a reasonable condition (specifically when ). The proof of this and all other formally stated results can be found in the appendix.
Theorem 1.
Under the condition , the precision of model for class , with initial precision , after applying an error detecting rule with support and confidence increases by a function of and and is greater than or equal to .
The error detecting rules can cause the recall to stay the same or decrease. Our next result tells us precisely how much recall will decrease.
Theorem 2.
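After applying the error detecting rule for class $i$, with class support $s_i$ and confidence $c$, the recall will decrease by $(1-c)\, s_i\, \frac{R_i}{P_i}$, where $P_i$ and $R_i$ are the precision and recall for class $i$ before the rule is applied.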
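The effect of an error detecting rule on both metrics can be worked out directly from the definitions of class support and confidence. The derivation below is a short sketch written for this purpose, not the appendix proof. It assumes a rule for class $i$ with class support $s_i < 1$ and confidence $c$, so that the rule flags $s_i N_i$ of the $N_i$ samples predicted as class $i$, of which $c\, s_i N_i$ are actual errors and $(1-c)\, s_i N_i$ were correct predictions.

```latex
\begin{align*}
P_i' &= \frac{TP_i - (1-c)\,s_i N_i}{(1-s_i)\,N_i}
      = \frac{P_i - (1-c)\,s_i}{1-s_i}, \\
P_i' - P_i &= \frac{s_i\,\bigl(c - (1-P_i)\bigr)}{1-s_i}, \\
R_i' &= \frac{TP_i - (1-c)\,s_i N_i}{TP_i + FN_i}
      = R_i - (1-c)\,s_i\,\frac{R_i}{P_i},
\end{align*}
% using N_i = TP_i / P_i and R_i = TP_i / (TP_i + FN_i) in the last step.
```

Under these assumptions, precision does not decrease exactly when $c \geq 1 - P_i$, and the recall loss is proportional to the number of correct predictions that the rule flags.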
It turns out that both quantities identified in Theorem 1 and Theorem 2 are submodular and monotonic, a property we can use algorithmically (formal statements and proofs are included in the appendix). Specifically, the selection of a set of conditions to maximize the quantity of Theorem 1, subject to the constraint that the recall reduction of Theorem 2 is at most $\epsilon$, is a special case of the “Submodular Cost Submodular Knapsack” (SCSK) problem and can be approximated with a simple greedy algorithm (Iyer & Bilmes 2013) that has an approximation guarantee and polynomial run time (Theorem 4.7 of (Iyer & Bilmes 2013)). Our algorithm DetRuleLearn is an instantiation of such an approach to creating an error detecting rule for a given class. As this algorithm will only select conditions for the error detecting rule of a given movement class that ensure that recall does not decrease by more than $\epsilon$, we can be assured it meets our requirement for recall. In DetRuleLearn, $POS_{DC}$ and $BOD_{DC}$ are simply the number of samples that satisfy the conditions of some set $DC$ as well as $error_i$ (for $POS_{DC}$) and $pred_i$ (for $BOD_{DC}$), respectively. In other words, $BOD_{DC}$ is the number of samples that satisfy the body of the error detecting rule, and $POS_{DC}$ is the number of samples that satisfy both the body and the head of the rule. $P_i$ and $R_i$ are the precision and recall for class $i$, while $N_i$ is the number of samples that the model classifies as class $i$.
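A greedy selection of detection conditions in the spirit of DetRuleLearn can be sketched in Python as follows. This is an assumption-laden illustration rather than the paper's algorithm listing: the sample layout, the function name `det_rule_learn`, the alphabetical tie-break, and the early stop when no candidate catches additional errors are all choices made here. The recall constraint is expressed as the number of correct predictions flagged by the rule divided by the number of ground-truth members of the class.

```python
def det_rule_learn(samples, cls, candidates, epsilon):
    """Greedily pick detection conditions for cls while recall drops by at most epsilon.

    samples: list of dicts with keys 'pred', 'truth', 'conds' (set of true conditions).
    """
    predicted = [s for s in samples if s["pred"] == cls]
    class_total = sum(s["truth"] == cls for s in samples)   # TP + FN for cls

    def pos(conds):  # body and head true: flagged predictions that are errors
        return sum(1 for s in predicted if s["truth"] != cls and s["conds"] & conds)

    def neg(conds):  # body true, head false: flagged predictions that were correct
        return sum(1 for s in predicted if s["truth"] == cls and s["conds"] & conds)

    max_neg = epsilon * class_total   # recall drop = neg / class_total <= epsilon
    chosen, remaining = set(), set(candidates)
    while remaining:
        feasible = sorted(c for c in remaining if neg(chosen | {c}) <= max_neg)
        if not feasible:
            break
        best = max(feasible, key=lambda c: pos(chosen | {c}))
        if pos(chosen | {best}) == pos(chosen):
            break                     # no remaining condition catches more errors
        chosen.add(best)
        remaining.remove(best)
    return chosen

samples = (
    [{"pred": "bus", "truth": "train", "conds": {"fast"}}] * 3
    + [{"pred": "bus", "truth": "train", "conds": {"noisy"}}]
    + [{"pred": "bus", "truth": "bus", "conds": {"noisy"}}] * 3
    + [{"pred": "bus", "truth": "bus", "conds": set()}] * 3
)
strict = det_rule_learn(samples, "bus", {"fast", "noisy"}, epsilon=0.1)
loose = det_rule_learn(samples, "bus", {"fast", "noisy"}, epsilon=0.5)
```

With the strict budget, only the condition that flags no correct predictions is kept; with the looser budget, the second condition is also admitted because it catches one more error at the cost of three correct predictions.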
Properties of Corrective Rules. In what follows, we shall examine the results for corrective rules. Here, the error correcting rule with predicate $corr_i$ in the head will have a body that is a disjunction over the elements of set $CC_i$. Also, note that here the support ($s$) is used instead of the class support ($s_i$). We find that both precision and recall increase with rule confidence (Theorem 3). We also show a corollary that ensures that recall is always non-decreasing for corrective rules and that precision increases when the rule confidence exceeds a threshold given in the appendix.
Theorem 3.
For the application of error correcting rules, both precision and recall increase if and only if rule confidence ($c$) increases.
It is clear that confidence is the right quantity to optimize for error correcting rules, as it improves both precision and recall. With these results in mind, we can optimize both precision and recall using an error correcting rule (with respect to the class specified in the rule head) by optimizing for confidence. Note that this does not consider the precision and recall for the classes specified in the rule body (however, we shall assume that the impact on precision and recall for those classes was handled with the application of the initial error detection rules). It is noteworthy, however, that confidence is not monotonic as we add condition-class pairs to set $CC_i$, as the precision can decrease. We will consider an initial set of candidate condition-class pairs. For a given class $i$ for which we create an error correcting rule, we select $CC_i$ from this larger set. To do so, we adapt the simple “Deterministic USM” algorithm of (Buchbinder et al. 2012) in an algorithm that we call CorrRuleLearn (Algorithm 2). Note here that $POS_{CC}$ is the number of samples that satisfy the rule body and head ($corr_i$ in this case) given a set of condition-class pairs $CC$, while $BOD_{CC}$ is the number of samples that satisfy the body formed with set $CC$.
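The double-greedy structure of the Deterministic USM algorithm, applied to rule confidence, can be sketched in Python as below. This is a hedged illustration of the general technique, not the paper's CorrRuleLearn listing: the sample layout, function names, and the fixed (sorted) processing order are assumptions. The procedure keeps a growing set and a shrinking set, and for each candidate pair compares the gain of adding it to the former with the gain of removing it from the latter.

```python
def confidence(samples, target, pairs):
    """POS / BOD for a corrective rule with head corr_target and body built from pairs."""
    body = [s for s in samples
            if any(cond in s["conds"] and s["pred"] == cls for cond, cls in pairs)]
    if not body:
        return 0.0   # defined as zero when the body never holds
    return sum(s["truth"] == target for s in body) / len(body)

def corr_rule_learn(samples, target, candidate_pairs):
    """Double-greedy selection of condition-class pairs, scored by confidence."""
    lower = set()                    # starts empty and only grows
    upper = set(candidate_pairs)     # starts full and only shrinks
    for pair in sorted(candidate_pairs):
        gain_add = confidence(samples, target, lower | {pair}) - confidence(samples, target, lower)
        gain_remove = confidence(samples, target, upper - {pair}) - confidence(samples, target, upper)
        if gain_add >= gain_remove:
            lower.add(pair)
        else:
            upper.discard(pair)
    return lower                     # lower == upper once every pair is processed

samples = (
    [{"pred": "bus", "truth": "train", "conds": {"fast"}}] * 3
    + [{"pred": "bus", "truth": "train", "conds": {"noisy"}}]
    + [{"pred": "bus", "truth": "bus", "conds": {"noisy"}}] * 3
)
candidates = {("fast", "bus"), ("noisy", "bus")}
selected = corr_rule_learn(samples, "train", candidates)
```

In this toy data the pair built on `fast` has confidence 1.0 on its own, while adding the `noisy` pair would dilute the rule to 4/7, so only the first pair is kept.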
Learning Detection and Correction Rules Together. Error correcting rules created using CorrRuleLearn will provide optimal improvement to precision and recall for the target class of the rule, but in the case of multi-class problems, they can cause recall to drop for some other classes. However, we can combine both error detecting and correcting rules to overcome this difficulty. The intuition is to first create error detecting rules for each class, which effectively re-assign any flagged sample to an “unknown” class. Then, we create the set of candidate condition-class pairs (used as input for CorrRuleLearn) based on the conditions selected by the error detecting rules. In this way, we will not decrease recall beyond what occurs in the application of error detecting rules.
Conditions for Error Detection and Correction
In this section, we describe the methods we used to create the set of conditions from data. As mentioned in Section 1, in addition to offering domain-specific knowledge, our contribution extends to a condition built from a neural network, referred to as the “model based” condition in our paper. Because this condition is general, it makes EDCR adaptable to a range of problem domains.
Model Based. The field of deep learning witnesses a continuous influx of new and improved models for solving complex problems. The prevailing trend involves the adoption of the latest and supposedly superior models, often leading to the abandonment of previously successful ones. We present a method that challenges this paradigm by using older, proven models to augment the performance of the latest models. We employ a collection of diverse pre-existing neural models as a set of conditions to enhance the efficacy of the current model. In particular, a more coarse-grained model can also provide useful conditions. As such, we utilized a binary classifier for each class. Hence, given a class, we have a binary classifier which returns “true” for a sample if it assigns the sample to that class and “false” otherwise. In this way, for each sample we have a condition for each of the classes. We used the LRCNa architecture for the binary classifiers, and the details are in the appendix.
Domain Knowledge. Harnessing domain expertise in outlier analysis can yield valuable conditions. Specifically, our attention was drawn to the maximum velocity records within our dataset. Consequently, for each class $i$, we formulated a condition linked to the maximum velocity criterion. So, for a given sample $\omega$, the condition for class $i$ is true if the velocity for $\omega$ is greater than the maximum velocity observed for class $i$ in the data set, and false otherwise.
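Both kinds of conditions can be sketched in a few lines of Python. Everything named here is hypothetical: the condition naming scheme, the use of a sample's peak velocity, and the stand-in callables that play the role of the per-class binary classifiers (the paper uses LRCNa models for these).

```python
def max_velocity_by_class(labeled_samples):
    """labeled_samples: iterable of (velocities, label); returns the per-class maximum."""
    limits = {}
    for velocities, label in labeled_samples:
        limits[label] = max(limits.get(label, 0.0), max(velocities))
    return limits

def velocity_conditions(velocities, limits):
    """Domain-knowledge conditions: true when the sample's peak velocity exceeds a class maximum."""
    peak = max(velocities)
    return {f"vel_gt_max_{label}" for label, limit in limits.items() if peak > limit}

def model_conditions(sample, binary_classifiers):
    """Model-based conditions: one per class, true when that class's binary classifier fires."""
    return {f"model_says_{label}" for label, clf in binary_classifiers.items() if clf(sample)}

limits = max_velocity_by_class([
    ([1.0, 2.0], "walk"),
    ([10.0, 25.0], "bus"),
    ([1.5, 3.0], "walk"),
])
sample_velocities = [2.0, 12.0]
domain_conds = velocity_conditions(sample_velocities, limits)

# Stand-in classifiers for illustration only.
classifiers = {"walk": lambda v: max(v) < 3.0, "bus": lambda v: max(v) >= 3.0}
model_conds = model_conditions(sample_velocities, classifiers)
```

The union of the two returned sets would be the set of conditions that are true for the sample and could be fed to the rule-learning and rule-application steps.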
4 Experimental Evaluation
GeoLife Dataset. The proposed methodology is validated and assessed using GPS trajectories obtained from the GeoLife project, which involved data collected from 69 users (Zheng et al. 2008). Details on the preprocessing of the data can be found in the appendix.
Table 1: Accuracy of each model with and without EDCR under each split setting (relative improvement over the base model in parentheses).
Model | No Overlap, Random | No Overlap, Sequential (least leakage) | Segment Overlap, Random (prev. studies) | Segment Overlap, Sequential | Data Point Overlap, Random | Data Point Overlap, Sequential |
---|---|---|---|---|---|---|
LRCNa (ours) | 0.747 | 0.751 | 0.971 | 0.758 | 0.921 | 0.760 |
LRCNa+EDCR (ours) | 0.759 (+1.6%) | 0.763 (+1.6%) | 0.971 (0%) | 0.769 (+1.5%) | 0.921 (0%) | 0.780 (+2.6%) |
LRCN (prev. SOTA) | 0.749 | 0.747 | 0.952 | 0.767 | 0.887 | 0.774 |
LRCN+EDCR (ours) | 0.761 (+1.6%) | 0.760 (+1.7%) | 0.952 (0%) | 0.768 (+0.1%) | 0.889 (+0.2%) | 0.783 (+1.1%) |
CNN | 0.742 | 0.755 | 0.851 | 0.763 | 0.853 | 0.779 |
CNN+EDCR (ours) | 0.743 (+0.1%) | 0.755 (0%) | 0.866 (+1.8%) | 0.763 (0%) | 0.862 (+1.0%) | 0.779 (0%) |
Table 2: F1 score of each model with and without EDCR under each split setting (relative improvement over the base model in parentheses).
Model | No Overlap, Random | No Overlap, Sequential (least leakage) | Segment Overlap, Random (prev. studies) | Segment Overlap, Sequential | Data Point Overlap, Random | Data Point Overlap, Sequential |
---|---|---|---|---|---|---|
LRCNa (ours) | 0.727 | 0.734 | 0.971 | 0.742 | 0.906 | 0.715 |
LRCNa+EDCR (ours) | 0.742 (+2.06%) | 0.751 (+2.32%) | 0.971 (0%) | 0.757 (+2.02%) | 0.906 (0%) | 0.749 (+4.76%) |
LRCN (prev. SOTA) | 0.732 | 0.738 | 0.951 | 0.751 | 0.864 | 0.737 |
LRCN+EDCR (ours) | 0.750 (+2.46%) | 0.741 (+0.41%) | 0.951 (0%) | 0.760 (+1.2%) | 0.864 (0%) | 0.755 (+2.44%) |
CNN | 0.722 | 0.737 | 0.846 | 0.745 | 0.826 | 0.748 |
CNN+EDCR (ours) | 0.723 (+0.14%) | 0.737 (0%) | 0.866 (+2.36%) | 0.745 (0%) | 0.830 (+0.48%) | 0.748 (0%) |
Training and Test Splits. Previous work such as (Kim et al. 2022) is known to have data leakage between the training and test sets, primarily because segments of the same movement sequence end up in both sets as a result of random assignment. To address this data leakage issue, we examine our algorithms under various conditions based on ordering and overlap. For ordering, we examine random (which can allow previous behavior of the same agent in the training set, as in previous work) and sequential (which orders the agents to avoid this issue). For overlap, we examine no overlap between the training and test sets, segment overlap that allows training and test samples to overlap each other (as in previous work), and data point overlap (that allows for data points of a trajectory to span both training and test).
Compute and Implementation. All experiments were performed on a 2000 MHz AMD EPYC 7713 CPU and an NVIDIA GA100 GPU using Python 3.10 with PyTorch.
All Classes Observed. In our first set of experiments, we examined how error detecting and correcting rules (EDCR) can affect the performance of the underlying model. In Table 1 we examine the accuracy of each model, both with and without EDCR. Models enabled with EDCR performed the same or better, with improvement most noticeable when samples are sequential (which has less data leakage between training and test). In terms of overall performance, LRCNa with EDCR performed the best in four of six cases, with LRCN with EDCR performing the best in the other two. Of particular importance, in the “no overlap - sequential” case, which is the least likely to exhibit data leakage, EDCR improves the performance of both LRCNa and LRCN, by 1.6% and 1.7% respectively. Additionally, we examined the F1 scores of all models in Table 2, both with and without EDCR, which show larger improvements than accuracy does.
Hyperparameter Sensitivity. In the “all classes observed” set of experiments, we also examined hyperparameter sensitivity for $\epsilon$. Recall that $\epsilon$ is interpreted as the maximum decrease in recall. We validated the theoretical reduction (TR) in recall empirically: in all cases, recall was no lower than the threshold specified by the hyperparameter, though recall decreases as $\epsilon$ increases. In many cases, the observed reduction in recall was significantly smaller than expected. In Figure 2, as the value of $\epsilon$ (x-axis) ranges from 0 to 0.10, the decline in recall for all classes remains within 0.10. Likewise, precision only increases with $\epsilon$, which is aligned with our theoretical results. We show precision, recall, and F1 by class for the “no overlap - sequential” case of LRCNa in Figure 2. Though the algorithm DetCorrRuleLearn calls for a single hyperparameter, it is possible to set it differently for each class (e.g., lower values for classes where recall is important, higher values for classes where false positives are expensive). This may be beneficial, as F1 for different classes seemed to peak for different values of $\epsilon$. We leave the study of heterogeneous $\epsilon$ settings to future work.
Removal of Movement Classes from Training. Here, our focus was on assessing how the introduction of EDCR impacts model performance in scenarios where certain movement classes are excluded from training. For the results in Figure 3, we trained the CNN, LRCN, and LRCNa models without the walk and drive classes. Employing EDCR without any supplementary data yielded a 5.2% (zero-shot) improvement over the base models, and a 23.9% (few-shot) improvement over the SOTA model without retraining the base model, with even more pronounced results than in the initial experiment set. Utilizing a mere 30% of the data from previously unseen classes, EDCR achieves a 21.3% improvement over the baseline model, all without the need for direct access to the model itself. This outcome suggests that EDCR can support few-shot learning, enabling the adaptation of the model to novel scenarios. Accuracy on unseen classes improves significantly using limited data and without extensive model modifications. This is crucial when direct model access is limited, for example when the model is only available through an API.
5 Related Work and Conclusion
As described earlier, the MTCP problem was previously studied in (Dabiri & Heaslip 2018; Kim et al. 2022), which introduce the CNN and LRCN architectures, respectively. Earlier work has also explored this problem with other machine learning approaches (Zheng et al. 2008; Wang et al. 2017; Simoncini et al. 2018). Note that error detection and correction have not previously been explored in these earlier works. Also note that both this prior work and this paper address trajectory classification, which differs from trajectory generation (Janner et al. 2021; Chen et al. 2021; Itkina & Kochenderfer 2022).
Earlier work on machine learning introspection (Daftry et al. 2016; Ramanagopal et al. 2018) examined error detection on various perceptual models. Unlike this work, these approaches were not applied to the MTCP, only focused on error detection, and did not provide theoretical guarantees of improvement. Another area of related work is machine learning verification (Ivanov et al. 2021; Jothimurugan et al. 2021; Ma et al. 2020), which looks to ensure the output of an ML model meets a logical specification. Like our work, some of these contributions (e.g., (Ma et al. 2020)) adjust the output of a machine learning model to meet a logic-based specification. However, to our knowledge, there has been no work on the use of machine learning verification to correct a machine learning model as this work does. Other related areas include meta-learning and domain generalization (Hospedales et al. 2021; Zhou et al. 2022; Vanschoren 2018; Maes & Nardi 1988), which attempt to account for changes in the distribution of data and/or the selection of a model that was trained on data similar to the current problem. While our approach can use additional data, it does not depend on training data generated by different distributions. To our knowledge, these other methods have not been applied to MTCP. Recent studies on abductive learning (Huang et al. 2023; Dai et al. 2019) and neural symbolic reasoning (Cornelio et al. 2022) incorporate error correction mechanisms rooted in inconsistency with domain knowledge expressed as logical rules. These approaches typically necessitate direct access to the perceptual model for effective implementation. In contrast, our work avoids reliance on predefined rules and eliminates the need for direct access to the perceptual model. We conjecture that these approaches could be complementary to EDCR, and we leave it to future work to explore how they can work together.
Conclusion. A key near-term direction for future work is the employment of these methods in government-administered tests of the IARPA HAYSTAC program, which will provide an assessment of utility more closely related to real-world use cases. Likewise, an extension related to the aforementioned IARPA program would be to identify a sequence of movement classes in the case where an agent’s mode of transit may change. Here, we would look to apply our error detection and correction framework to recently introduced models such as those described in (Zeng et al. 2023). Separately, we framed rule learning as a pair of submodular maximization problems, but there are several options for algorithms beyond those in this paper. Finally, the use of rules for error detection and correction of machine learning models presented here may be useful in other domains such as vision.
6 Acknowledgments
This research is supported by the Intelligence Advanced Research Projects Activity (IARPA) via the Department of Interior/ Interior Business Center (DOI/IBC) contract number 140D0423C0032. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/IBC, or the U.S. Government. Additionally, some of the authors are supported by ONR grant N00014-23-1-2580 as well as internal funding from ASU Fulton Schools of Engineering.
References
- Aditya et al. (2023) Aditya, D., Mukherji, K., Balasubramanian, S., Chaudhary, A., and Shakarian, P. PyReason: Software for open world temporal logic. In AAAI Spring Symposium, 2023.
- Buchbinder et al. (2012) Buchbinder, N., Feldman, M., Naor, J., and Schwartz, R. A tight linear time (1/2)-approximation for unconstrained submodular maximization. In 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science, pp. 649–658, 2012. doi: 10.1109/FOCS.2012.73.
- Chen et al. (2021) Chen, L., Lu, K., Rajeswaran, A., Lee, K., Grover, A., Laskin, M., Abbeel, P., Srinivas, A., and Mordatch, I. Decision transformer: Reinforcement learning via sequence modeling. CoRR, abs/2106.01345, 2021. URL https://arxiv.org/abs/2106.01345.
- Cornelio et al. (2022) Cornelio, C., Stuehmer, J., Hu, S. X., and Hospedales, T. Learning where and when to reason in neuro-symbolic inference. In The Eleventh International Conference on Learning Representations, 2022.
- Dabiri & Heaslip (2018) Dabiri, S. and Heaslip, K. Inferring transportation modes from gps trajectories using a convolutional neural network. Transportation research part C: emerging technologies, 86:360–371, 2018.
- Daftry et al. (2016) Daftry, S., Zeng, S., Bagnell, J. A., and Hebert, M. Introspective perception: Learning to predict failures in vision systems, 2016. URL http://arxiv.org/abs/1607.08665.
- Dai et al. (2019) Dai, W.-Z., Xu, Q., Yu, Y., and Zhou, Z.-H. Bridging machine learning and logical reasoning by abductive learning. Advances in Neural Information Processing Systems, 32, 2019.
- Fikioris et al. (2023) Fikioris, G., Patroumpas, K., Artikis, A., Pitsikalis, M., and Paliouras, G. Optimizing vessel trajectory compression for maritime situational awareness. GeoInformatica, 27(3):565–591, 2023.
- Hospedales et al. (2021) Hospedales, T., Antoniou, A., Micaelli, P., and Storkey, A. Meta-learning in neural networks: A survey. IEEE transactions on pattern analysis and machine intelligence, 44(9):5149–5169, 2021.
- Huang et al. (2019) Huang, H., Cheng, Y., and Weibel, R. Transport mode detection based on mobile phone network data: A systematic review. Transportation Research Part C: Emerging Technologies, 101:297–312, 2019.
- Huang et al. (2023) Huang, Y.-X., Dai, W.-Z., Jiang, Y., and Zhou, Z.-H. Enabling knowledge refinement upon new concepts in abductive learning. 2023.
- Itkina & Kochenderfer (2022) Itkina, M. and Kochenderfer, M. J. Interpretable self-aware neural networks for robust trajectory prediction, 2022.
- Ivanov et al. (2021) Ivanov, R., Carpenter, T., Weimer, J., Alur, R., Pappas, G., and Lee, I. Verisig 2.0: Verification of neural network controllers using taylor model preconditioning. In Computer Aided Verification: 33rd International Conference, CAV 2021, Virtual Event, July 20–23, 2021, Proceedings, Part I, pp. 249–262. Springer-Verlag, 2021. ISBN 978-3-030-81684-1. doi: 10.1007/978-3-030-81685-8_11. URL https://doi.org/10.1007/978-3-030-81685-8_11.
- Iyer & Bilmes (2013) Iyer, R. and Bilmes, J. Submodular optimization with submodular cover and submodular knapsack constraints. In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, NIPS’13, pp. 2436–2444, Red Hook, NY, USA, 2013. Curran Associates Inc.
- Janner et al. (2021) Janner, M., Li, Q., and Levine, S. Offline reinforcement learning as one big sequence modeling problem. In Advances in Neural Information Processing Systems, 2021.
- Jothimurugan et al. (2021) Jothimurugan, K., Bansal, S., Bastani, O., and Alur, R. Compositional reinforcement learning from logical specifications. In Advances in Neural Information Processing Systems, 2021.
- Kim et al. (2022) Kim, J., Kim, J. H., and Lee, G. Gps data-based mobility mode inference model using long-term recurrent convolutional networks. Transportation Research Part C: Emerging Technologies, 135:103523, 2022.
- Lin & Hsu (2014) Lin, M. and Hsu, W.-J. Mining gps data for mobility patterns: A survey. Pervasive and mobile computing, 12:1–16, 2014.
- Ma et al. (2020) Ma, M., Gao, J., Feng, L., and Stankovic, J. Stlnet: Signal temporal logic enforced multivariate recurrent neural networks. Advances in Neural Information Processing Systems, 33:14604–14614, 2020.
- Maes & Nardi (1988) Maes, P. and Nardi, D. Meta-level architectures and reflection. 1988.
- Ramanagopal et al. (2018) Ramanagopal, M. S., Anderson, C., Vasudevan, R., and Johnson-Roberson, M. Failing to learn: Autonomously identifying perception failures for self-driving cars. IEEE Robotics and Automation Letters, 3(4):3860–3867, 2018. ISSN 2377-3766, 2377-3774. doi: 10.1109/LRA.2018.2857402. URL http://arxiv.org/abs/1707.00051.
- Simoncini et al. (2018) Simoncini, M., Taccari, L., Sambo, F., Bravi, L., Salti, S., and Lori, A. Vehicle classification from low-frequency gps data with recurrent neural networks. Transportation Research Part C: Emerging Technologies, 91:176–191, 2018.
- Vanschoren (2018) Vanschoren, J. Meta-learning: A survey. arXiv preprint arXiv:1810.03548, 2018.
- Vaswani et al. (2017) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998–6008, 2017.
- Vincenty (1975) Vincenty, T. Direct and inverse solutions of geodesics on the ellipsoid with application of nested equations. Survey review, 23(176):88–93, 1975.
- Wang et al. (2017) Wang, H., Liu, G., Duan, J., and Zhang, L. Detecting transportation modes using deep neural network. IEICE TRANSACTIONS on Information and Systems, 100(5):1132–1135, 2017.
- Zeng et al. (2023) Zeng, J., Yu, Y., Chen, Y., Yang, D., Zhang, L., and Wang, D. Trajectory-as-a-sequence: A novel travel mode identification framework. Transportation Research Part C: Emerging Technologies, 146:103957, 2023. ISSN 0968-090X. doi: 10.1016/j.trc.2022.103957. URL https://www.sciencedirect.com/science/article/pii/S0968090X22003709.
- Zheng et al. (2008) Zheng, Y., Li, Q., Chen, Y., Xie, X., and Ma, W.-Y. Understanding mobility based on gps data. In Proceedings of the 10th international conference on Ubiquitous computing, pp. 312–321, 2008.
- Zhou et al. (2022) Zhou, K., Liu, Z., Qiao, Y., Xiang, T., and Loy, C. C. Domain generalization: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
Appendix A
Details on Vector Embedding of Sequences
We begin with a set of GPS points where each point is a tuple of timestamp (), latitude (), and longitude (), . Each point also has an associated class label . To embed these tuples as vector embeddings that can be consumed by the neural model , three essential preprocessing steps must be performed. These steps include normalizing the data size to meet the input requirements, extracting movement behaviors from the GPS points, and refining the data. In this section, we draw upon previous approaches (Zheng et al. 2008; Dabiri & Heaslip 2018; Kim et al. 2022) to guide the data preprocessing process.
As part of the data size normalization step we sequentially group chronologically ordered GPS points into uniform lengths of . The class label of every point in this sequence is the same and the entire sequence represents the movement trajectory of that class for time units. The resulting sequence , where is the set of all sequences that are curated.
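The grouping step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `segment_points`, the tuple layout `(timestamp, lat, lon, label)`, and the parameter `seq_len` are assumptions introduced here. Trailing chunks shorter than `seq_len` are left short in this sketch, on the assumption that padding is handled when the time-series vectors are built.

```python
from itertools import groupby


def segment_points(points, seq_len):
    """Split (timestamp, lat, lon, label) tuples into single-label chunks.

    Hypothetical helper: each returned item is (label, [(t, lat, lon), ...])
    with at most seq_len points, in chronological order.
    """
    ordered = sorted(points, key=lambda p: p[0])  # chronological order
    sequences = []
    # groupby yields consecutive runs of points that share the same label
    for label, run in groupby(ordered, key=lambda p: p[3]):
        run = list(run)
        for start in range(0, len(run), seq_len):
            chunk = run[start:start + seq_len]
            sequences.append((label, [(t, lat, lon) for t, lat, lon, _ in chunk]))
    return sequences
```

Grouping by consecutive label runs before chunking is one way to guarantee that every point in a sequence carries the same class label, as the text requires.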
To capture patterns of movement behaviors from GPS points the distance time-series vector is computed as follows. is the distance between two GPS point tuples and , where and , and is computed using the Vincenty Distance formula (Vincenty 1975). Here represents the distance between two points and from the sequence. There could be cases where a distance time-series vector falls short of data points. To maintain a consistent length of sequence we pad the shorter vector with zeros.
Additionally, we extract the velocity (), acceleration (), jerk () and bearing rate () time-series vectors for each sequence as follows:
(1) | |||
(2) | |||
(3) | |||
(4) | |||
(5) | |||
(6) | |||
(7) | |||
We finally stack the vectors , , and for each sequence , which is passed as the input to the neural model as detailed in section 2.
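A minimal sketch of the feature extraction described above, under stated assumptions. The finite-difference definitions of velocity, acceleration, and jerk and the forward-azimuth bearing are the conventional ones used in this line of work and are assumed here rather than recovered from the equations. The haversine formula is substituted for the Vincenty distance to keep the example dependency-free, so distances are spherical approximations. All function names are hypothetical, and which of the returned series are stacked as model input is left to the caller.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical assumption)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (stand-in for the Vincenty formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees within [0, 360)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    return math.degrees(math.atan2(y, x)) % 360.0


def diff_rate(values, dts):
    """Difference of consecutive values divided by the time step (0 if dt <= 0)."""
    return [(b - a) / dt if dt > 0 else 0.0 for a, b, dt in zip(values, values[1:], dts)]


def pad(series, length):
    """Zero-pad (or truncate) a series to a fixed length."""
    return (series + [0.0] * length)[:length]


def motion_features(seq, seq_len):
    """seq: list of (timestamp_seconds, lat, lon). Returns named, padded series."""
    pairs = list(zip(seq, seq[1:]))
    dts = [b[0] - a[0] for a, b in pairs]
    dist = [haversine_m(a[1], a[2], b[1], b[2]) for a, b in pairs]
    vel = [d / dt if dt > 0 else 0.0 for d, dt in zip(dist, dts)]
    acc = diff_rate(vel, dts)
    jerk = diff_rate(acc, dts)
    bearings = [bearing_deg(a[1], a[2], b[1], b[2]) for a, b in pairs]
    # absolute change in bearing between consecutive segments (no wrap-around handling)
    brate = [abs(b2 - b1) for b1, b2 in zip(bearings, bearings[1:])]
    names = ("distance", "velocity", "acceleration", "jerk", "bearing_rate")
    return {n: pad(s, seq_len) for n, s in zip(names, (dist, vel, acc, jerk, brate))}
```

Each derived series is one element shorter than the series it is computed from, which is why zero-padding to a common length is applied before stacking.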
Formal Statements of Additional Theorems and Corollaries for Error Detection Rules
Corollary 1.
If and only if then the rule will cause precision not to decrease.
Corollary 2.
If (the minimum condition for precision improvement from Corollary 1), then recall decreases by at most .
Theorem 4.
For a given error detecting rule, the quantity is a normalized polymatroid function w.r.t. set .
Corollary 3.
The quantity (decrease in recall) is a normalized polymatroid function w.r.t. set .
Corollary 4.
GreedyRuleSelect provides an approximation of that is within of optimal.
Formal Statements of Additional Theorems and Corollaries for Error Correction Rules
Corollary 5.
Precision increases for class with the application of an error correcting rule if and only if .
Corollary 6.
Recall is non-decreasing for class with the application of an error correcting rule.
Theorem 5.
Confidence is submodular with respect to .
Corollary 7.
For an arbitrarily small constant , DetUSMPosRuleSelect provides a approximation of confidence if the returned confidence is greater than the initial precision.
Proof of Theorem 1
Under the condition , the precision of model for class , with initial precision , after applying an error correcting rule with support and confidence increases by a function of and and is greater than or equal to .
Proof.
CLAIM 1: The precision of model for class , with initial precision , after applying an error correcting rule with support and confidence increases by:
(8) |
The total number of items that will attempt to classify as before error correction is . Out of those, will be corrected by the rule. However, a fraction of will be samples that would have been true positives if not corrected. Hence, the new precision can be written as follows:
(9) |
As , we have:
(10) | |||
(11) |
Now we subtract from that quantity the initial precision.
(12) | |||
(13) | |||
(14) | |||
(15) |
CLAIM 2: If then is a lower bound on the improvement in precision.
BWOC, then by Claim 1 we have.
(16) | |||
(17) | |||
(18) | |||
(19) | |||
(20) |
However, as this is a contradiction.
The proof of the theorem then follows directly from claim 2. ∎
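The counting argument of Claim 1 can be checked numerically. The sketch below is an illustration under assumptions: the labels N (samples the model assigns to the class), P (initial precision), s (rule support, the fraction of those samples on which the rule fires), and c (rule confidence, the fraction of fired samples that were genuine errors) are chosen here and are not recovered from the original notation. Under this reading, removing the flagged samples gives a precision gain of s(c + P - 1)/(1 - s), which is non-negative exactly when c >= 1 - P.

```python
def precision_after_rule(n_pred, precision, support, confidence):
    """Precision for a class after removing the samples flagged by a rule.

    n_pred: samples assigned to the class by the model (N, assumed label)
    precision: initial precision (P); support: s; confidence: c
    """
    flagged = n_pred * support                     # samples the rule re-labels
    true_pos = n_pred * precision                  # correct predictions before the rule
    lost_true_pos = flagged * (1.0 - confidence)   # correct predictions wrongly flagged
    return (true_pos - lost_true_pos) / (n_pred - flagged)


def precision_gain_closed_form(precision, support, confidence):
    """Algebraic simplification of the gain under the assumptions above."""
    return support * (confidence + precision - 1.0) / (1.0 - support)
```

For instance, with N = 1000, P = 0.8, s = 0.1 and c = 0.6, the rule removes 100 samples of which 40 were correct, so precision moves from 0.8 to 760/900.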
Proof of Corollary 1
If and only if then the rule will cause precision not to decrease.
Proof.
Suppose, BWOC, the statement is false. By Theorem 1 then the following must be true.
(21) | |||
(22) | |||
(23) | |||
(24) |
However, as this cannot hold.
Likewise, suppose BWOC that and BWOC the statement is false:
(25) | |||
(26) | |||
(27) | |||
(28) |
Again, a contradiction. ∎
Proof of Theorem 4
For a given error detecting rule, the quantity is a normalized polymatroid function w.r.t. set .
Proof.
CLAIM 1: where is the number of samples where both the rule body and head are satisfied.
Let be the number of samples that the body of the rule is true. This gives us which is equivalent to the statement of the claim.
CLAIM 2: The quantity is submodular w.r.t. set .
We show this by the submodularity of , as is a constant, together with the result of Claim 1. BWOC, is not submodular for some set . We use the symbol to denote this and assume the existence of two sets of conditions . Then, the following must be true:
(29) |
Which can be re-written as:
(30) | |||
(31) |
This quantity is less than the following:
(32) |
However, this would imply there is at least one element in not in either or which is a contradiction.
CLAIM 3: monotonically increases with .
By claim 1, as the quantity equals and is a constant, we just need to show monotonicity of . Clearly increases monotonically as additional elements in can only make it increase.
CLAIM 4: When , .
Follows directly from the fact that we define as zero if no conditions are used.
Proof of theorem. Follows directly from claims 2-4. ∎
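The three properties used in this proof (normalization, monotonicity, submodularity) can be verified by brute force on a toy instance. The sketch below assumes that the rule body is a disjunction of the selected conditions, so that the count of interest is the number of samples satisfying at least one selected condition. That reading, the dictionary representation of conditions, and both function names are assumptions made for illustration.

```python
from itertools import combinations


def covered(conditions, chosen):
    """Number of samples satisfying at least one chosen condition."""
    hit = set()
    for name in chosen:
        hit |= conditions[name]
    return len(hit)


def is_normalized_polymatroid(conditions):
    """Exhaustively check f(empty) = 0, monotonicity, and diminishing returns."""
    names = list(conditions)
    subsets = [frozenset(c) for r in range(len(names) + 1) for c in combinations(names, r)]
    if covered(conditions, frozenset()) != 0:
        return False  # not normalized
    for a in subsets:
        for b in subsets:
            if not a <= b:
                continue
            if covered(conditions, a) > covered(conditions, b):
                return False  # not monotone
            for x in names:
                if x in b:
                    continue
                gain_a = covered(conditions, a | {x}) - covered(conditions, a)
                gain_b = covered(conditions, b | {x}) - covered(conditions, b)
                if gain_a < gain_b:
                    return False  # marginal gain grew with a larger set
    return True
```

The check is exponential in the number of conditions and is intended only as a sanity test on small examples, such as `{"c1": {1, 2, 3}, "c2": {3, 4}, "c3": {5}}`, where each condition maps to the sample ids it flags.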
Proof of Theorem 2
After applying the rule to correct errors, the recall will decrease by
(33) |
Proof.
The number of corrections made by the rule is with fraction of these being incorrect (increasing false negatives). Note that the sum does not change after error correction, as any “corrected” false positive becomes a false negative, and false negatives do not otherwise change from error correction. Therefore, the new recall is:
(34) |
When this quantity is subtracted from the original recall (), we obtain:
(35) |
We note that which gives us:
(37) | |||
(38) | |||
(39) |
∎
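The recall argument can be checked in the same way. As before, the labels are assumptions introduced here: N is the number of samples the model assigns to the class, s the rule support, c the rule confidence, and TP and FN the true-positive and false-negative counts before the rule is applied. Under this reading, the rule converts N·s·(1 - c) true positives into false negatives while TP + FN stays fixed, so recall drops by s(1 - c)·R/P, where R and P are the initial recall and precision.

```python
def recall_after_rule(true_pos, false_neg, n_pred, support, confidence):
    """Recall for a class after a rule removes flagged samples from it."""
    lost_true_pos = n_pred * support * (1.0 - confidence)  # correct predictions removed
    return (true_pos - lost_true_pos) / (true_pos + false_neg)


def recall_drop_closed_form(precision, recall, support, confidence):
    """Algebraic simplification of the drop, using N = TP / P and TP + FN = TP / R."""
    return support * (1.0 - confidence) * recall / precision
```

With TP = 800, FN = 800 and N = 1000 (so P = 0.8 and R = 0.5), a rule with s = 0.1 and c = 0.6 removes 40 true positives, and recall falls from 0.5 to 760/1600.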
Proof of Corollary 2
If (the minimum condition for precision improvement from Corollary 1), then recall decreases by at most .
Proof.
Suppose BWOC the statement is false. By Theorem 2, recall decreases by . This gives us:
(40) |
Precision cannot be less than , so recall must then decrease by:
(41) | |||
(42) |
∎
Proof of Corollary 3
The quantity (decrease in recall) is a normalized polymatroid function w.r.t. set .
Proof.
Note that is the number of samples that satisfy the body, while is the number of samples that satisfy the body and head, .
(43) | |||||
(44) | |||||
(45) |
As is a constant, we need to show the submodularity of , which follows the same argument for as per Claim 2 of Theorem 4. Likewise, is monotonic (mirroring the argument of Claim 3 of Theorem 4) and normalized by the definition of in the case where there are no conditions. The statement of the corollary follows. ∎
Proof of Theorem 3
For the application of positive rules, precision increases if and only if rule confidence () increases.
Proof.
CLAIM 1: Precision increases by .
The new precision is equal to the following:
(46) |
The improvement of the precision can be derived as follows.
(47) | |||||
(48) | |||||
(49) | |||||
(50) | |||||
(51) |
CLAIM 2: If count of samples satisfying both rule body and head (the numerator of confidence) increases, then precision increases.
Suppose BWOC the claim is not true. Then for some value of for which the improvement in precision is greater than . Note that, in this case, the number of samples satisfying the body also increases by . First, we know that we can re-write the result of claim 1 as follows.
(52) |
Therefore, using the result from Claim 1, the following relationship must hold.
(53) | |||
(54) | |||
(55) |
This gives us a contradiction, as and by definition.
CLAIM 3: If the difference in precision increases, the number of samples satisfying both rule body and head must increase.
By definition, the only way for this to occur is if increases and does not, as they can both increase or only increase. If neither increases, there is no change, and it is not possible for to increase without . Therefore the following must be true.
(56) |
However, this is clearly a contradiction, as the expression on the right is smaller (the numerator is smaller as is positive, and the denominator is larger).
CLAIM 4: Precision increases if and only if increases.
Follows directly from claims 1-3.
CLAIM 5: When adding more samples that satisfy the body of the rule, confidence increases if and only if increases.
Note that confidence is defined as . Clearly, confidence decreases if increases but not , and it is not possible for to increase alone. Therefore, BWOC, the following must hold true.
(57) | |||
(58) | |||
(59) |
This is a contradiction as .
Going the other way, suppose BWOC confidence increases but POS does not. We get:
(60) | |||
(61) | |||
(62) |
However, by the statement, as we add more samples that satisfy the body of the rule, we must have . Hence a contradiction.
CLAIM 6: Recall increases if and only if increases.
As we can write the new recall in this case simply as the following, the claim immediately follows.
(63) |
CLAIM 7: Recall increases if and only if increases.
Follows directly from claims 5-6.
Proof of theorem.
Follows directly from claims 4 and 7. ∎
Proof of Corollary 4
GreedyRuleSelect provides an approximation of that is within of optimal.
Proof.
Follows directly from Theorem 4.7 of (Iyer & Bilmes 2013). ∎
Proof of Corollary 7
For an arbitrarily small constant , DetUSMPosRuleSelect provides a approximation of confidence if the returned confidence is greater than the initial precision.
Proof.
Follows directly from the fact that confidence is zero when and Theorem 2.3 of (Buchbinder et al. 2012). ∎
Conditions for Error Detection and Correction
This section describes the various methods we used to create conditions (set ) in detail with examples.
Model based. In this study, we employed multiple models, denoted as , each corresponding to a specific class. These models were constructed using our LRCNa architecture, as detailed in this paper. However, during the training process, we adapted the model to perform binary class classification. To illustrate, for the drive class, we divided the training data into two distinct datasets: one exclusively containing samples labeled as drive, and the other encompassing samples labeled as walk, bike, bus, train, collectively forming the non_drive class. We employ this binary class classification approach to establish a set of conditions C.
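The relabeling used to train each per-class binary model can be sketched as follows. This is a minimal illustration: the function name `one_vs_rest_labels` is hypothetical, and the `non_<class>` naming simply mirrors the non_drive example in the text.

```python
def one_vs_rest_labels(labels, target):
    """Map multiclass labels to a binary target / non_target labeling."""
    return [target if y == target else f"non_{target}" for y in labels]


# One relabeled copy of the training labels per class, e.g. for the five modes:
CLASSES = ["walk", "bike", "bus", "drive", "train"]


def binary_label_sets(labels):
    """Return {class_name: relabeled list}, one entry per binary model."""
    return {c: one_vs_rest_labels(labels, c) for c in CLASSES}
```

Each relabeled list would then be paired with the unchanged input sequences to train the binary model for that class.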
In the realm of Deep Learning, the constant evolution of models poses the challenge of choosing the most optimal solution for a given problem. It is a common practice to discard older SOTA models in favor of newer ones. However, this paper introduces a novel approach aimed at leveraging the capabilities of older, proven models to enhance the performance of the latest SOTA models.
In classification problems, the conventional practice is to use a threshold of 0.5 when evaluating final results. As many receiver operating characteristic (ROC) curves illustrate, precision generally rises as the threshold increases. Consequently, a higher threshold is advocated for older state-of-the-art models to enhance their performance.
Examining the ROC curve as an illustrative example, with a threshold of 0.5, the True Positive Rate (TPR) approximates 0.65. Elevating the threshold to 0.9 corresponds to an increased TPR of approximately 0.8. In the event of the introduction of a new state-of-the-art model with a TPR below 0.8 at the 0.5 threshold, adopting the 0.9 threshold from the prior model is recommended. Here, values predicted above 0.9 are considered true positives, while those below 0.9 are designated as unknown predictions. For the latter, the state-of-the-art model can be employed for prediction.
Similar principles are applicable when utilizing the False Positive Rate curve and reducing the threshold. A lowered threshold yields a higher true-false prediction ratio, thereby offering a basis for refining predictions. This methodology, originally designed for binary classification, is adaptable for enhancing predictions in the realm of multiple classifications as well.
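The thresholding scheme described above can be sketched as a simple combination rule: a positive call from the older binary model is accepted only when its score clears the raised threshold, and every other sample is treated as unknown and deferred to the newer model. The function name, argument layout, and the default of 0.9 (taken from the example in the text) are assumptions made for illustration.

```python
def combine_with_threshold(binary_scores, fallback_labels, positive_label, threshold=0.9):
    """Merge a high-threshold binary model with a fallback multiclass model.

    binary_scores: positive-class scores in [0, 1] from the older model
    fallback_labels: labels predicted by the newer model for the same samples
    """
    combined = []
    for score, fallback in zip(binary_scores, fallback_labels):
        if score >= threshold:
            combined.append(positive_label)  # confident positive from the older model
        else:
            combined.append(fallback)        # "unknown": defer to the newer model
    return combined
```

For example, scores of 0.95, 0.60 and 0.91 for the drive class with fallback labels bus, walk and train yield drive, walk and drive.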
Domain knowledge. Leveraging domain knowledge pertaining to outliers, we focused on the maximum velocity values present in our dataset. Notably, the highest speed records were associated with the drive labels. To ensure fair and consistent comparisons across the dataset, we normalized the data by the maximum speed observed in the drive data. After normalization, the highest velocity recorded in our dataset is 1, associated with the label drive. Following closely is the train label, exhibiting a maximum velocity of 0.751.
In our datasets, any sample with a speed exceeding the maximum speed recorded for the train class (0.751 in our dataset) is unambiguously classified as drive. More generally, we restrict the candidate classes to those whose recorded maximum speed is not exceeded by the sample. For instance, if a sample's maximum speed measures 0.73, falling below both the maximum speed of 0.751 attributed to the train class and 1 associated with the drive class, yet surpassing those of the other categories, the sample is likely to be either drive or train. We then assess its multiclass prediction values: the class with the higher prediction value determines our final classification for the sample.
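A minimal sketch of this speed-based condition follows. The caps for drive (1.0) and train (0.751) are the normalized maxima reported above; the caps for walk, bike, and bus are hypothetical placeholders chosen only to make the example run, and the function name is likewise an assumption.

```python
# Normalized maximum speed per class. drive and train come from the text;
# the walk, bike and bus values are hypothetical placeholders.
CLASS_MAX_SPEED = {"walk": 0.10, "bike": 0.30, "bus": 0.60, "train": 0.751, "drive": 1.0}


def speed_constrained_label(max_speed, class_probs, class_max_speed=CLASS_MAX_SPEED):
    """Pick the most probable class among those whose speed cap admits the sample."""
    feasible = [c for c, cap in class_max_speed.items() if max_speed <= cap]
    if not feasible:
        feasible = list(class_probs)  # faster than anything seen: fall back to all classes
    return max(feasible, key=lambda c: class_probs[c])
```

With these caps, a sample whose maximum speed is 0.8 is labeled drive regardless of the predicted probabilities, while a sample at 0.73 is decided between drive and train by whichever has the higher prediction value.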