Rule-Based Error Detection and Correction to Operationalize Movement Trajectory Classification

Bowen Xi, Kevin Scaria, Paulo Shakarian
Abstract

Classification of movement trajectories has many applications in transportation. Supervised neural models represent the current state of the art. Recent security applications require this task to be rapidly deployed in environments that may differ from the data used to train such models and for which little training data is available. We provide a neuro-symbolic, rule-based framework that detects and corrects the errors of these models to support their eventual deployment in security applications. We present a suite of experiments on several recent and state-of-the-art models. When all classes are present in training, our approach improves accuracy by 1.7% over the SOTA model; when 40% of the classes are omitted from training, it yields improvements of 5.2% (zero-shot) and 23.9% (few-shot) over the SOTA model, without retraining the base model.

1 Introduction

The identification of the mode of travel for a time-stamped sequence of global positioning system (GPS) points, known as a “movement trajectory,” has important applications in travel demand analysis (Huang et al. 2019), transport planning (Lin & Hsu 2014), and analysis of sea vessel movement (Fikioris et al. 2023). The current state of the art relies on supervised neural models (Kim et al. 2022). More recently, this problem has attracted interest for security applications, leading to efforts such as the IARPA HAYSTAC program (https://www.iarpa.gov/research-programs/haystac). In this domain, models may be deployed in environments whose geography, transportation infrastructure, and socio-cultural dynamics differ from those in the training data, and they are expected to adapt to such environments with little or no labeled data specific to those circumstances. Further, such deployments may happen rapidly, precluding extensive data engineering or model retraining.

In this paper, we extend current supervised neural methods with a lightweight error detection and correction rule (EDCR) framework, yielding an overall neurosymbolic system. The key intuition is that training and operational data can be used to learn rules that predict and correct errors in the supervised model. Once learned, the rules are employed operationally in two phases: first, detection rules identify potentially misclassified movement trajectories; then a second type of rule (“correction rules”) re-assigns those trajectories to a new class. Our key contributions are as follows. (1) We present a theoretical framework for EDCR rooted in logic and rule mining, and we formally prove how quantities related to learned rules (e.g., confidence and support) relate to changes in class-level machine learning metrics such as precision and recall. (2) We conduct experiments showing that rules trained on the same data as the original model can improve machine learning metrics across various settings and model types, including the SOTA LRCN model. Specifically, EDCR yields a 1.7% improvement in accuracy over the original LRCN model when data leakage between training and testing is minimized. (3) When 40% of the classes are excluded from training, EDCR yields improvements of 5.2% (zero-shot) and 23.9% (few-shot) over the SOTA model, without any retraining of the underlying base model. (4) In addition to conditions based on domain knowledge, as in other work, we introduce a condition that incorporates a neural network and is general enough to make EDCR applicable to diverse problem domains. (5) As a side result, we extend the LRCN SOTA model of (Kim et al. 2022) with an attention mechanism that establishes a new SOTA baseline in certain cases without EDCR; this model is also improved by EDCR.

The rest of the paper is organized as follows. In Section 2, we describe the movement trajectory classification problem (MTCP) and associated classification approaches, including our new “LRCN with attention” (LRCNa) model. Section 3 introduces our error detecting and correcting rule framework, which formalizes our strategy for error detection and correction and provides analytical results that support our algorithm development. Section 4 presents experimental results, followed by a discussion of related work and future directions. Additional details supporting the reproducibility of both the formal results (e.g., proofs) and the experiments (e.g., data preprocessing and experimental details), along with code, can be found in an online appendix at https://github.com/lab-v2/Error-Detection-and-Correction.

2 Technical Preliminaries

In this section, we introduce MTCP and describe the vector embeddings used for neural classifiers (Dabiri & Heaslip 2018; Kim et al. 2022), as well as the three neural architectures we use: CNN (Dabiri & Heaslip 2018), Long-term Recurrent Convolutional Network (LRCN) (Kim et al. 2022), and LRCN with attention (LRCNa), newly introduced in this work.

Movement Trajectory Classification Problem. Given a sequence of GPS points $\omega$, the MTCP problem is to assign it a movement class from $\mathcal{C}$, where $n$ is the number of classes in $\mathcal{C}$. In this work, as in prior work (e.g., Dabiri & Heaslip 2018; Kim et al. 2022), we define $\mathcal{C} = \{\textsf{walk}, \textsf{bike}, \textsf{bus}, \textsf{drive}, \textsf{train}\}$, though for generality we will typically not refer to specific classes outside the description of the experiments. The current paradigm for MTCP is to create a neural model $f_\theta$ that maps sequences to movement classes using a set of weights $\theta$. In this approach, standard methods (e.g., gradient descent) find parameters that minimize a loss function over a training set $\mathcal{T}$, where each sample $\omega \in \mathcal{T}$ is associated with a ground-truth class $gt(\omega)$. Formally: $\arg\min_{\theta} \mathbb{E}_{\omega \in \mathcal{T}}\, \mathit{Loss}(f_\theta(\omega), gt(\omega))$.
We also associate with each sample $\omega$ three predicates for each class $i$: $pred_i$, $corr_i$, and $error_i$, which we will later use to define a logic for reasoning about error correction.

  • $pred_i$: the model predicted class $i$; $pred_i(\omega)$ is true iff $f_\theta(\omega) = i$.

  • $corr_i$: $i$ is the correct movement class for $\omega$; $corr_i(\omega)$ is true iff $gt(\omega) = i$.

  • $error_i$: the model made an error on class $i$; $error_i(\omega)$ is true iff $f_\theta(\omega) = i$ and $f_\theta(\omega) \neq gt(\omega)$. In other words, the model predicted class $i$ and is wrong.
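A minimal sketch of these three predicates, assuming predictions $f_\theta(\omega)$ and ground truth $gt(\omega)$ are given as parallel lists of class names; the function `predicates` and the toy data are hypothetical and only illustrate the definitions.

```python
from typing import List, Sequence, Tuple


def predicates(preds: Sequence[str], truth: Sequence[str],
               i: str) -> Tuple[List[bool], List[bool], List[bool]]:
    """Evaluate pred_i, corr_i and error_i on every sample for class i."""
    pred_i = [p == i for p in preds]   # model predicted class i
    corr_i = [g == i for g in truth]   # ground truth is class i
    # error_i: the model predicted class i AND that prediction is wrong
    error_i = [p == i and p != g for p, g in zip(preds, truth)]
    return pred_i, corr_i, error_i


preds = ["walk", "bus", "bus", "drive"]   # f_theta(omega) for four toy samples
truth = ["walk", "bus", "train", "bus"]   # gt(omega)
pred_bus, corr_bus, error_bus = predicates(preds, truth, "bus")
print(error_bus)  # [False, False, True, False]
```

Note that the last sample (predicted drive, truly bus) makes $error_{drive}$ true, not $error_{bus}$.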

Vector Embedding. The current SOTA approaches that we examine for $f_\theta$ rely on an embedding of a sequence $\omega$ that consists of a stack of vectors describing velocity, acceleration, jerk (the time rate of change of acceleration), and bearing rate. We base these calculations on prior work (Kim et al. 2022; Dabiri & Heaslip 2018) and include details in the appendix.
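The exact feature definitions are given in the appendix; the sketch below is only a hedged illustration under our own assumptions: haversine distance between consecutive GPS fixes, finite differences over timestamps, and bearing rate taken as the absolute heading change per second. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumption)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = phi2 - phi1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Initial compass bearing from point 1 to point 2, in [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return math.degrees(math.atan2(y, x)) % 360.0


def diff_rate(values: Sequence[float], times: Sequence[float]) -> List[float]:
    """Finite-difference rate: (v[k+1] - v[k]) / (t[k+1] - t[k])."""
    return [(values[k + 1] - values[k]) / (times[k + 1] - times[k])
            for k in range(len(values) - 1)]


def motion_features(points: Sequence[Tuple[float, float, float]]):
    """points: (lat_deg, lon_deg, t_seconds), sorted by time."""
    lats, lons, ts = zip(*points)
    n = len(points)
    seg_times = list(ts[1:])  # each segment k -> k+1 is stamped at its end time
    velocity = [haversine_m(lats[k], lons[k], lats[k + 1], lons[k + 1]) / (ts[k + 1] - ts[k])
                for k in range(n - 1)]                      # m/s
    acceleration = diff_rate(velocity, seg_times)           # m/s^2
    jerk = diff_rate(acceleration, seg_times[1:])           # m/s^3
    bearings = [bearing_deg(lats[k], lons[k], lats[k + 1], lons[k + 1]) for k in range(n - 1)]
    # Smallest absolute turn between consecutive headings, per second.
    bearing_rate = [min(abs(b2 - b1), 360.0 - abs(b2 - b1)) / (t2 - t1)
                    for b1, b2, t1, t2 in zip(bearings, bearings[1:], seg_times, seg_times[1:])]
    return velocity, acceleration, jerk, bearing_rate


# Toy trajectory: due north along a meridian, one fix every 10 s.
track = [(0.0, 0.0, 0.0), (0.001, 0.0, 10.0), (0.002, 0.0, 20.0), (0.003, 0.0, 30.0)]
v, a, j, br = motion_features(track)
```

For this constant-speed, straight-line track, velocity is about 11.12 m/s at every segment while acceleration, jerk, and bearing rate are (numerically) zero. In the cited works these per-point series are then stacked into the fixed-size input consumed by $f_\theta$.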

CNN (Dabiri & Heaslip 2018). A convolutional neural network (CNN) is a viable solution for inferring mobility modes from GPS trajectories, as it can automatically extract highly efficient features. Here, the CNN comprises an input layer, convolutional layers, pooling layers, fully connected layers, and dropout layers.

LRCN (Kim et al. 2022). To further enhance the accuracy of extracting mobility modes from GPS trajectories, the application of a Long-term Recurrent Convolutional Network (LRCN) proves beneficial (Kim et al. 2022). The layers of the LRCN model follow a hierarchical structure with three components, proceeding from bottom to top: the convolutional layers, LSTM layers, and fully connected layers.

LRCN with Attention (new in this paper). Given the notable performance improvements the transformer architecture (Vaswani et al. 2017) has provided on related problems, we felt it was important to include an attention-based approach. Hence, we created a simple extension to LRCN that uses attention, which we refer to as LRCNa. We provide an overview in Figure 1 in the appendix. LRCNa comprises convolutional layers for feature extraction, LSTM layers and an attention layer that together perform sequence learning, and fully connected layers for classification.

Figure 1: The LRCNa architecture introduced in this paper.
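The text does not specify which attention variant LRCNa uses. As a hedged illustration only, the sketch below shows one common option: scaled dot-product attention pooling over the LSTM hidden states with a query vector. The functions `softmax` and `attention_pool`, the query vector, and the toy values are hypothetical, and plain Python lists stand in for tensors.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: Sequence[Sequence[float]],
                   query: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Scaled dot-product attention pooling over LSTM outputs h_1..h_T.
    Returns (context vector, attention weights over time steps)."""
    scale = math.sqrt(len(query))
    scores = [sum(h_k * q_k for h_k, q_k in zip(h, query)) / scale for h in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    # Context vector: attention-weighted sum of the hidden states.
    context = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return context, weights


# Toy LSTM outputs for T = 3 time steps with hidden size 2 (hypothetical values).
hidden = [[0.2, 1.0], [1.5, -0.3], [0.1, 0.4]]
query = [1.0, 0.0]  # stands in for a trainable query vector
context, weights = attention_pool(hidden, query)
```

In a full model the query would be a trainable parameter and the context vector would feed the fully connected classification layers; the time step whose hidden state aligns best with the query (here the second) receives the largest weight.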

3 Error Detection and Correction Rules

A key issue with deploying model $f_\theta$ is that it may encounter sequences whose distribution differs from that of the data used to train it. Further, in our target application, there may not be sufficient labeled data or time to properly retrain $f_\theta$. In some cases, $f_\theta$ may even be inaccessible for fine-tuning (e.g., behind an API). Additionally, understanding why the outputs of $f_\theta$ change is important for our envisioned security application. We therefore employ a rule-based approach to correcting $f_\theta$. The intuition is that, using limited data, we learn a set of rules (denoted $\Pi$) that can detect and correct errors of $f_\theta$ by logical reasoning (Aditya et al. 2023). Then, upon deployment, for a new sequence $\omega$ we first compute the class $f_\theta(\omega)$ and then use the rules in $\Pi$ to decide whether the result of $f_\theta$ should be accepted and, if not, to provide an alternate class in an attempt to correct the mistake. In this section, we formalize the error correcting framework with a simple first-order logic (FOL) and provide analytical results relating properties of learned rules to class-level metrics; these results inform our approach to learning error detecting and correcting rules.
We complete the section with a discussion on how various potential “failure conditions” are extracted to create the rules to correct errors.

Throughout this section, we assume a set $\mathcal{O}$ of operational sequences for which ground truth is available after model training. The size of $\mathcal{O}$ is $N$, which is generally expected to be much smaller than the size of the training set $\mathcal{T}$. In our experiments, we look at cases where $\mathcal{O} = \mathcal{T}$ and $\mathcal{T} \subseteq \mathcal{O}$; however, these are not requirements, as our results are based on model performance on $\mathcal{O}$, and we envision use cases where $\mathcal{O}$ is significantly different from $\mathcal{T}$. On these samples, for each class $i$, the model $f_\theta$ returns class $i$ for $N_i$ of the samples, and we denote the numbers of true positives, false positives, true negatives, and false negatives by $TP_i, FP_i, TN_i, FN_i$.
We then have precision $P_i = TP_i / N_i$, recall $R_i = TP_i / (TP_i + FN_i)$, and the prior of predicting class $i$, $\mathcal{P}_i = N_i / N$.
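These per-class quantities can be computed directly from predictions and ground truth on $\mathcal{O}$. A minimal sketch, assuming parallel lists of class names; `class_metrics` and the toy data are hypothetical.

```python
from typing import Dict, Sequence


def class_metrics(preds: Sequence[str], truth: Sequence[str], i: str) -> Dict[str, float]:
    """Per-class counts and metrics on the operational set O."""
    N = len(preds)
    N_i = sum(p == i for p in preds)                            # samples predicted as i
    TP = sum(p == i and g == i for p, g in zip(preds, truth))
    FP = N_i - TP
    FN = sum(p != i and g == i for p, g in zip(preds, truth))
    TN = N - TP - FP - FN
    return {
        "TP": TP, "FP": FP, "TN": TN, "FN": FN,
        "precision": TP / N_i if N_i else 0.0,           # P_i = TP_i / N_i
        "recall": TP / (TP + FN) if TP + FN else 0.0,    # R_i = TP_i / (TP_i + FN_i)
        "prior": N_i / N if N else 0.0,                  # prior of predicting i = N_i / N
    }


preds = ["walk", "bus", "bus", "drive"]
truth = ["walk", "bus", "train", "bus"]
m = class_metrics(preds, truth, "bus")
print(m)
```

For class bus, one of the two bus predictions is right and one true bus sample is missed, so precision, recall, and prior are all 0.5.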

Language. We assume a simple first-order language in which samples are represented by constant symbols and all predicates are unary, applying to a single sample. The language includes a set $C$ of $m$ “condition” predicates $cond_1, \ldots, cond_m$, each of which can be true or false for a given sample. Additionally, the language includes the following:

  • “Correct” predicates $corr_1, \ldots, corr_i, \ldots, corr_n$, which denote the ground-truth class of the sample (i.e., for a given sample, exactly one $corr_i$ is true and the rest are false);

  • “Prediction” predicates $pred_1, \ldots, pred_i, \ldots, pred_n$, which denote the class predicted by the model (i.e., for a given sample, exactly one $pred_i$ is true and the rest are false);

  • “Error” predicates $error_1, \ldots, error_i, \ldots, error_n$, which denote that the model's prediction of class $i$ for the sample is incorrect. Note that $error_i$ is true iff $pred_i$ is true and $corr_i$ is false.

Rules. The set of rules $\Pi$ consists of two rules for each class: one “error detecting” and one “error correcting.” Error detecting rules determine whether a prediction by $f_\theta$ is invalid. In essence, such a rule changes the movement class assigned by $f_\theta$ to some sample $\omega$ from $i$ to “unknown.” For a given class $i$, we have an associated set of detection conditions $DC_i \subseteq C$, the disjunction of which is used to determine whether $f_\theta$ gave an incorrect classification:

$$error_i(\omega) \leftarrow pred_i(\omega) \wedge \bigvee_{j \in DC_i} cond_j(\omega) \qquad (1)$$
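A minimal sketch of applying rule (1) to a single sample, assuming each sample carries the set of names of the condition predicates that are true for it. The function `apply_detection`, the use of `None` to stand for “unknown,” and the condition names are hypothetical illustrations.

```python
from typing import Dict, FrozenSet, Optional


def apply_detection(pred: str, true_conds: FrozenSet[str],
                    DC: Dict[str, FrozenSet[str]]) -> Optional[str]:
    """Rule (1) for the predicted class i = pred:
    error_i <- pred_i AND (cond_j for some j in DC_i).
    Returns None (standing in for "unknown") if the rule fires, else pred."""
    DC_i = DC.get(pred, frozenset())
    if true_conds & DC_i:  # non-empty intersection: the disjunction is true
        return None
    return pred


# Hypothetical condition names, for illustration only.
DC = {"bus": frozenset({"speed_above_30ms", "cnn_predicts_train"})}
print(apply_detection("bus", frozenset({"speed_above_30ms"}), DC))   # None
print(apply_detection("bus", frozenset(), DC))                       # bus
print(apply_detection("walk", frozenset({"speed_above_30ms"}), DC))  # walk
```

Only the rule for the predicted class is consulted, because the body of rule (1) requires $pred_i(\omega)$.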

After applying the error detecting rules for each class, we may consider re-assigning the samples to another class using a second type of rule called a “corrective rule.” Such rules are formed from a subset of condition-class pairs $CC_i \subseteq C \times \mathcal{C}$:

$$corr_i(\omega) \leftarrow \bigvee_{(q,r) \in CC_i} \left(cond_q(\omega) \wedge pred_r(\omega)\right) \qquad (2)$$
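A minimal sketch of evaluating rule (2), under the same representation as before (the set of true condition names per sample). The text does not say how to resolve a sample for which several corrective rules fire, so this hypothetical `fired_corrections` function simply returns every class whose rule body is true; the condition names are also hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def fired_corrections(pred: str, true_conds: FrozenSet[str],
                      CC: Dict[str, Set[Tuple[str, str]]]) -> List[str]:
    """Rule (2): corr_i <- OR over (q, r) in CC_i of (cond_q AND pred_r).
    pred is the original model output f_theta(omega).
    Returns every class i whose corrective rule body is true."""
    return [i for i, pairs in CC.items()
            if any(q in true_conds and pred == r for q, r in pairs)]


CC = {"train": {("speed_above_30ms", "bus"), ("on_rail_line", "drive")},
      "bike": {("speed_below_8ms", "drive")}}
print(fired_corrections("bus", frozenset({"speed_above_30ms"}), CC))   # ['train']
print(fired_corrections("drive", frozenset({"speed_below_8ms"}), CC))  # ['bike']
print(fired_corrections("walk", frozenset({"speed_above_30ms"}), CC))  # []
```

Each body term pairs a condition with an original prediction, so the same condition can support different target classes depending on what $f_\theta$ predicted.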

Associated with rules of both types are the following values, each defined as zero if the rule has no conditions.

Support ($s$): the fraction of samples in $\mathcal{O}$ for which the rule body is true.

Support w.r.t. class $i$ ($s_i$): among the samples for which the model predicts class $i$, the fraction for which the body is true (i.e., the denominator is $N_i$).

Confidence ($c$): the number of samples for which the body and head are both true, divided by the number of samples for which the body is true.
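A minimal sketch of these three values for an error detecting rule of the form (1), assuming each sample is a (predicted class, true class, set of true condition names) triple; `detection_rule_stats` and the toy data are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Sample = Tuple[str, str, FrozenSet[str]]  # (predicted class, true class, true conditions)


def detection_rule_stats(samples: List[Sample], i: str, DC_i: FrozenSet[str]):
    """Support s, class support s_i, and confidence c of rule (1) for class i.
    Each is 0.0 when the body never fires (e.g. DC_i is empty)."""
    N = len(samples)
    N_i = sum(1 for pred, _, _ in samples if pred == i)
    # True classes of the samples where the body pred_i AND (some cond in DC_i) holds.
    body = [truth for pred, truth, conds in samples if pred == i and conds & DC_i]
    body_and_head = sum(1 for truth in body if truth != i)  # head error_i also true
    s = len(body) / N if N else 0.0
    s_i = len(body) / N_i if N_i else 0.0
    c = body_and_head / len(body) if body else 0.0
    return s, s_i, c


samples = [
    ("bus", "bus", frozenset({"fast"})),
    ("bus", "train", frozenset({"fast"})),
    ("bus", "train", frozenset({"fast"})),
    ("bus", "bus", frozenset()),
    ("walk", "walk", frozenset({"fast"})),
]
s, s_i, c = detection_rule_stats(samples, "bus", frozenset({"fast"}))
```

Here the body fires on 3 of the 5 samples ($s = 0.6$), on 3 of the 4 bus predictions ($s_i = 0.75$), and 2 of those 3 are real errors ($c = 2/3$). The walk sample satisfies the condition but not $pred_{bus}$, so it does not count.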

We now present analytical results that inform our learning algorithms. Our learning strategy first learns detection rules (which establish conditions under which a classification decision by $f_\theta$ is deemed incorrect) and then learns correction rules (which correct the detected errors by assigning a new movement class to the sample). We formalize these two tasks as follows.

Improvement by error detecting rule. For a given class $i$, find a set of conditions $DC_i$ such that precision is maximized and recall decreases by at most $\epsilon$.

Improvement by error correcting rule. For a given class $i$, find a subset $CC_i$ of $C \times \mathcal{C}$ such that either precision or recall is maximized.

Properties of Detection Rules. First, we examine the effect on precision and recall of applying an error detecting rule. Our first result bounds the precision improvement. If the class support $s_i$ is at most $1 - P_i$, which we would expect (as the rule is designed to detect the $1 - P_i$ portion of predictions that are wrong), then the quantity $c \cdot s_i$ gives a lower bound on the improvement in precision. In the appendix, we also note that precision always increases under a reasonable condition (specifically, when $c \geq 1 - P_i$). The proofs of this and all other formally stated results can be found in the appendix.

Theorem 1.

Under the condition $s_i \leq 1 - P_i$, the precision of model $f_\theta$ for class $i$, with initial precision $P_i$, after applying an error detecting rule with class support $s_i$ and confidence $c$ increases by a function of $s_i$ and $c$ that is greater than or equal to $c \cdot s_i$.

The error detecting rules can cause the recall to stay the same or decrease. Our next result tells us precisely how much recall will decrease.

Theorem 2.

After applying an error detecting rule, the recall decreases by $(1-c)\, s_i \frac{R_i}{P_i}$.
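The toy computation below, with hypothetical data, applies a detection rule by relabelling the samples it fires on as unknown (represented by `None`) and compares the observed recall drop with the expression in Theorem 2. In this example $c = 0.75 \geq 1 - P_i = 0.4$, so, per the condition noted above, precision should also rise.

```python
from typing import Optional, Sequence, Tuple


def precision_recall(preds: Sequence[Optional[str]], truth: Sequence[str],
                     i: str) -> Tuple[float, float]:
    n_pred = sum(p == i for p in preds)
    tp = sum(p == i and g == i for p, g in zip(preds, truth))
    n_true = sum(g == i for g in truth)
    return (tp / n_pred if n_pred else 0.0, tp / n_true if n_true else 0.0)


# Hypothetical data for class "bus": 10 samples predicted "bus" (6 correct,
# 4 actually "train"), plus 2 true "bus" samples the model called "walk".
preds = ["bus"] * 10 + ["walk"] * 2
truth = ["bus"] * 6 + ["train"] * 4 + ["bus"] * 2
# Truth value of the rule's condition disjunction on each sample:
# true on 1 correct and 3 incorrect "bus" predictions.
body = [True] + [False] * 5 + [True] * 3 + [False] + [False] * 2

P_i, R_i = precision_recall(preds, truth, "bus")
n_i = sum(p == "bus" for p in preds)
fired = [b and p == "bus" for b, p in zip(body, preds)]      # pred_i AND conditions
s_i = sum(fired) / n_i
c = sum(f and g != "bus" for f, g in zip(fired, truth)) / sum(fired)

new_preds = [None if f else p for f, p in zip(fired, preds)]  # detected -> unknown
P_new, R_new = precision_recall(new_preds, truth, "bus")
recall_drop = R_i - R_new
predicted_drop = (1 - c) * s_i * R_i / P_i                    # Theorem 2
print(round(P_new - P_i, 4), round(recall_drop, 4), round(predicted_drop, 4))
```

Only the single correct prediction caught by the rule costs recall, which is exactly the $(1-c)$ fraction of the $s_i N_i$ samples the rule removes.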

Algorithm 1 DetRuleLearn
Input: class $i$, recall reduction threshold $\epsilon$, condition set $C$
Output: subset of conditions $DC_i$
$DC_i := \emptyset$
$DC^* := \{c \in C \text{ s.t. } NEG_{\{c\}} \leq \epsilon \cdot \frac{N_i P_i}{R_i}\}$
while $DC^* \neq \emptyset$ do
     $c_{best} = \arg\max_{c \in DC^*} POS_{DC_i \cup \{c\}}$
     Add $c_{best}$ to $DC_i$
     $DC^* := \{c \in C \setminus DC_i \text{ s.t. } NEG_{DC_i \cup \{c\}} \leq \epsilon \cdot \frac{N_i P_i}{R_i}\}$
end while
return $DC_i$
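A minimal Python sketch of DetRuleLearn, assuming each operational sample is stored as a (predicted class, true class, set of true condition names) triple. `pos_neg`, `det_rule_learn`, and the toy data are hypothetical. $POS$ counts samples where the rule body fires and the prediction of class $i$ is wrong; $NEG$ counts those where it fires but the prediction was right.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Sample = Tuple[str, str, FrozenSet[str]]  # (predicted class, true class, true conditions)


def pos_neg(samples: List[Sample], i: str, DC: Set[str]) -> Tuple[int, int]:
    """POS_DC and NEG_DC for the error detecting rule of class i."""
    pos = neg = 0
    for pred, truth, conds in samples:
        if pred == i and conds & DC:  # body of rule (1) is true
            if truth == i:
                neg += 1              # corr_i AND pred_i: a correct prediction is discarded
            else:
                pos += 1              # error_i: a real error is caught
    return pos, neg


def det_rule_learn(samples: List[Sample], i: str, eps: float,
                   C: Iterable[str]) -> Set[str]:
    C = set(C)
    # eps * N_i * P_i / R_i equals eps * (TP_i + FN_i), i.e. eps times the number
    # of samples whose true class is i; this form avoids dividing by R_i = 0.
    budget = eps * sum(1 for _, truth, _ in samples if truth == i)
    DC_i: Set[str] = set()
    candidates = {c for c in C if pos_neg(samples, i, {c})[1] <= budget}
    while candidates:
        # Greedy step: the condition maximizing POS (ties broken by name order).
        c_best = max(sorted(candidates), key=lambda c: pos_neg(samples, i, DC_i | {c})[0])
        DC_i.add(c_best)
        candidates = {c for c in C - DC_i
                      if pos_neg(samples, i, DC_i | {c})[1] <= budget}
    return DC_i


samples = [
    ("bus", "train", frozenset({"fast"})),
    ("bus", "train", frozenset({"fast"})),
    ("bus", "train", frozenset({"rail"})),
    ("bus", "bus", frozenset({"fast"})),
    ("bus", "bus", frozenset()),
    ("bus", "bus", frozenset({"noisy"})),
    ("bus", "bus", frozenset({"noisy"})),
    ("walk", "bus", frozenset()),
]
print(det_rule_learn(samples, "bus", 0.2, ["fast", "rail", "noisy"]))
```

With $\epsilon = 0.2$ and five true bus samples, at most one correct bus prediction may be discarded, so the sketch selects fast and rail but never noisy (which would discard two). With $\epsilon = 0$ only rail survives.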

It turns out that both quantities identified in Theorems 1 and 2 are submodular and monotonic, a property we can exploit algorithmically (formal statements and proofs are included in the appendix). Specifically, selecting a set of conditions to maximize $c \cdot s_i$ subject to the constraint $(1-c)\, s_i \frac{R_i}{P_i} \leq \epsilon$ is a special case of the “Submodular Cost Submodular Knapsack” (SCSK) problem, which can be approximated by a simple greedy algorithm (Iyer & Bilmes 2013) with an approximation guarantee and polynomial run time (Theorem 4.7 of Iyer & Bilmes 2013). Our algorithm DetRuleLearn instantiates this approach to create an error detecting rule for a given class. As the algorithm only selects conditions for the error detecting rule of movement class $i$ that ensure recall does not decrease by more than $\epsilon$, it meets our requirement on recall.
Here $POS_{DC}$ and $NEG_{DC}$ are the numbers of samples that satisfy the body of the error detecting rule for condition set $DC$ (i.e., $pred_i(\omega)$ and at least one condition in $DC$) and additionally satisfy $error_i(\omega)$ (for $POS_{DC}$) or $corr_i(\omega) \wedge pred_i(\omega)$ (for $NEG_{DC}$), respectively. In other words, $POS_{DC}$ counts samples where both the body and the head of the error detecting rule are true, while $NEG_{DC}$ counts samples where the body is true but the model's prediction of class $i$ was correct. $P_i$ and $R_i$ are the precision and recall for class $i$, and $N_i$ is the number of samples that the model classifies as class $i$.
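To see where the threshold $\epsilon \cdot N_i P_i / R_i$ in DetRuleLearn comes from, note that the rule's body fires on $s_i N_i$ of the samples predicted as class $i$, of which a fraction $c$ are errors. Hence

$$POS_{DC} = c\, s_i N_i, \qquad NEG_{DC} = (1-c)\, s_i N_i.$$

Substituting into Theorem 2 gives

$$(1-c)\, s_i \frac{R_i}{P_i} = NEG_{DC} \cdot \frac{R_i}{N_i P_i} \leq \epsilon \iff NEG_{DC} \leq \epsilon \cdot \frac{N_i P_i}{R_i},$$

and, since $N_i$ is fixed, maximizing $POS_{DC}$ is the same as maximizing $c \cdot s_i$. Because $P_i = TP_i / N_i$ and $R_i = TP_i / (TP_i + FN_i)$, the threshold also simplifies to $\epsilon \cdot (TP_i + FN_i)$, i.e., $\epsilon$ times the number of samples whose true class is $i$.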

Properties of Corrective Rules. We now examine results for corrective rules. Here, the error correcting rule with predicate $corr_i$ in the head has a body that is a disjunction over the elements of a set $CC_i \subseteq C \times \mathcal{C}$. Note also that here the support $s$ is used instead of the class support $s_i$. We find that both precision and recall increase with rule confidence (Theorem 3). We also show a corollary ensuring that recall is always non-decreasing for corrective rules and that precision increases when rule confidence exceeds $P_i$.

Theorem 3.

For the application of error correcting rules, both precision and recall increase if and only if rule confidence ($c$) increases.

Confidence is therefore the right quantity to optimize for error correcting rules, as increasing it improves both precision and recall. With these results in mind, we can optimize both precision and recall for the class specified in the rule head by optimizing the confidence of an error correcting rule. Note that this does not consider precision and recall for the class specified in the rule body (we assume that the impact on precision and recall for that class was handled by the initial application of the error detecting rules). However, confidence is not monotonic as we add conditions to the set $CC_i$, since precision can decrease. We consider an initial set of condition-class pairs $CC_{all} \subseteq C \times \mathcal{C}$; for a given class for which we create an error correcting rule, we select $CC_i$ from this larger set. To do so, we adapt the simple “Deterministic USM” algorithm of (Buchbinder et al. 2012), which we call CorrRuleLearn (Algorithm 2). Here $POS_{CC}$ is the number of samples that satisfy both the rule body formed with the set of condition-class pairs $CC$ and the rule head ($corr_i(\omega)$ in this case), while $BOD_{CC}$ is the number of samples that satisfy the body formed with $CC$.
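A hedged Python sketch of this double-greedy selection, assuming samples are (predicted class, true class, set of true condition names) triples and confidence is $POS_{CC}/BOD_{CC}$ (zero when the body never fires). The sort-and-filter step and the gain $a$ follow Algorithm 2; the gain $b$ for removing a pair from $CC_i'$ and the keep/discard decision are our assumption, taken from the standard deterministic USM (double greedy) procedure of Buchbinder et al. (2012). We also take the filtered, sorted pairs as the ground set for $CC_i'$. `confidence`, `corr_rule_learn`, and the toy data are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Sample = Tuple[str, str, FrozenSet[str]]  # (predicted class, true class, true conditions)
Pair = Tuple[str, str]                    # (condition q, predicted class r)


def confidence(samples: List[Sample], i: str, CC: Set[Pair]) -> float:
    """POS_CC / BOD_CC for rule (2) with head corr_i; 0.0 if the body never fires."""
    pos = bod = 0
    for pred, truth, conds in samples:
        if any(q in conds and pred == r for q, r in CC):
            bod += 1
            pos += truth == i
    return pos / bod if bod else 0.0


def corr_rule_learn(samples: List[Sample], i: str, CC_all: Iterable[Pair]) -> Set[Pair]:
    n_pred = sum(1 for pred, _, _ in samples if pred == i)
    tp = sum(1 for pred, truth, _ in samples if pred == i and truth == i)
    P_i = tp / n_pred if n_pred else 0.0
    # Keep pairs whose single-pair confidence exceeds P_i, sorted high to low.
    ranked = sorted((p for p in CC_all if confidence(samples, i, {p}) > P_i),
                    key=lambda p: confidence(samples, i, {p}), reverse=True)
    X: Set[Pair] = set()        # CC_i: grows from the empty set
    Y: Set[Pair] = set(ranked)  # CC_i': shrinks from the filtered pool
    for pair in ranked:
        a = confidence(samples, i, X | {pair}) - confidence(samples, i, X)
        b = confidence(samples, i, Y - {pair}) - confidence(samples, i, Y)
        if a >= b:
            X.add(pair)
        else:
            Y.discard(pair)
    return X  # X == Y once every pair has been decided


samples = [
    ("bus", "train", frozenset({"fast"})),
    ("bus", "train", frozenset({"fast"})),
    ("bus", "bus", frozenset({"fast"})),
    ("drive", "train", frozenset({"rail"})),
    ("drive", "drive", frozenset({"noisy"})),
    ("drive", "train", frozenset({"noisy"})),
    ("drive", "drive", frozenset({"noisy"})),
    ("train", "train", frozenset()),
    ("train", "bus", frozenset()),
]
CC_all = [("fast", "bus"), ("rail", "drive"), ("noisy", "drive")]
print(corr_rule_learn(samples, "train", CC_all))  # {('rail', 'drive')}
```

In this toy run, ("noisy", "drive") is filtered out (confidence 1/3 is not above $P_{train} = 0.5$); ("rail", "drive") is kept because it raises confidence to 1.0; and ("fast", "bus") is dropped because adding it would lower the rule's confidence from 1.0 to 0.75.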

Algorithm 2 CorrRuleLearn
Input: Class $i$, set of condition-class pairs $CC_{all}$
Output: Subset of condition-class pairs $CC_i$
$CC_i := \emptyset$
$CC_i' := CC_{all}$
Sort each $(c,j) \in CC_{all}$ from greatest to least by $\frac{POS_{\{(c,j)\}}}{BOD_{\{(c,j)\}}}$ and remove each $(c,j)$ with $\frac{POS_{\{(c,j)\}}}{BOD_{\{(c,j)\}}} \leq P_i$
for $(c,j) \in CC_{all}$ selected in order of the sorted list do
     $a := \frac{POS_{CC_i \cup \{(c,j)\}}}{BOD_{CC_i \cup \{(c,j)\}}} - \frac{POS_{CC_i}}{BOD_{CC_i}}$
     $b := \frac{POS_{CC_i' \setminus \{(c,j)\}}}{BOD_{CC_i' \setminus \{(c,j)\}}} - \frac{POS_{CC_i'}}{BOD_{CC_i'}}$
     if $a \geq b$ then
         $CC_i := CC_i \cup \{(c,j)\}$
     else
         $CC_i' := CC_i' \setminus \{(c,j)\}$
     end if
end for
if $\frac{POS_{CC_i}}{BOD_{CC_i}} \leq P_i$ then
     $CC_i := \emptyset$
end if
return $CC_i$
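Algorithm 2 can be sketched in Python as the double-greedy loop below. This is a minimal sketch under stated assumptions: condition-class pairs are any hashable values, the confidence $POS_S/BOD_S$ is supplied as a hypothetical callable `conf`, `conf` returns 0.0 for the empty set (the pseudocode leaves $POS_\emptyset/BOD_\emptyset$ undefined), and $CC_i'$ starts from the filtered, sorted list, which is our reading of the order of the initialization and filtering steps.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, Set

Pair = Hashable  # stands in for a (condition, class) pair


def corr_rule_learn(cc_all: Iterable[Pair],
                    conf: Callable[[FrozenSet[Pair]], float],
                    p_i: float) -> Set[Pair]:
    """Select CC_i with a Deterministic-USM-style double greedy (after Buchbinder et al. 2012).

    conf(S) is assumed to return POS_S / BOD_S, and 0.0 for the empty set.
    """
    def single(pair: Pair) -> float:
        return conf(frozenset([pair]))

    # Sort by singleton confidence (greatest first) and drop pairs no better than P_i.
    ranked = sorted((p for p in cc_all if single(p) > p_i), key=single, reverse=True)
    x: Set[Pair] = set()        # CC_i: grows from the empty set
    y: Set[Pair] = set(ranked)  # CC_i': shrinks from the filtered ground set (assumption)
    for pair in ranked:
        a = conf(frozenset(x | {pair})) - conf(frozenset(x))  # gain from adding pair to CC_i
        b = conf(frozenset(y - {pair})) - conf(frozenset(y))  # gain from removing pair from CC_i'
        if a >= b:
            x.add(pair)
        else:
            y.discard(pair)
    # Discard the rule entirely if it does not beat the class precision P_i.
    return x if conf(frozenset(x)) > p_i else set()
```

Passing confidence as a callable keeps the selection loop separate from how samples are counted; in practice `conf` would recount $POS$ and $BOD$ over the training samples for each candidate set.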

Learning Detection and Correction Rules Together. Error correcting rules created using CorrRuleLearn improve precision and recall for the class in the rule head, but in multi-class problems they can cause recall to drop for some other classes. We can overcome this difficulty by combining error detecting and correcting rules. The intuition is to first create error detecting rules for each class, which effectively re-assign any flagged sample to an “unknown” class. Then, we create the set $CC_{all}$ (used as input for CorrRuleLearn) from the conditions selected by the error detecting rules. In this way, recall does not decrease beyond what occurs when the error detecting rules are applied.

Algorithm 3 DetCorrRuleLearn
Input: Recall reduction threshold $\epsilon$, condition set $C$
Output: Set of rules $\Pi$
$\Pi := \emptyset$
$CC_{all} := \emptyset$
for each class $i$ do
     $DC_i := \textsf{DetRuleLearn}(i, \epsilon, C)$
     if $DC_i \neq \emptyset$ then
         $\Pi := \Pi \cup$
          $\{error_i(\omega) \leftarrow pred_i(\omega) \wedge \bigvee_{j \in DC_i} cond_j(\omega)\}$
     end if
     for $cond \in DC_i$ do
         $CC_{all} := CC_{all} \cup \{(cond, i)\}$
     end for
end for
for each class $i$ do
     $CC_i := \textsf{CorrRuleLearn}(i, CC_{all})$
     if $CC_i \neq \emptyset$ then
         $\Pi := \Pi \cup$
          $\{corr_i(\omega) \leftarrow \bigvee_{(q,r) \in CC_i} \left(cond_q(\omega) \wedge pred_r(\omega)\right)\}$
     end if
end for
return $\Pi$
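The control flow of Algorithm 3 can be sketched as follows. The callables `det_rule_learn` and `corr_rule_learn` are hypothetical stand-ins for DetRuleLearn and CorrRuleLearn with the signatures used in the pseudocode, and the `Rule` record is an illustrative representation of a learned rule, not the paper's data structure.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List, NamedTuple, Set, Tuple


class Rule(NamedTuple):
    kind: str          # "error" (detecting rule) or "corr" (correcting rule)
    head_class: str    # class i in error_i / corr_i
    body: FrozenSet    # DC_i (conditions) or CC_i ((condition, class) pairs)


def det_corr_rule_learn(classes: Iterable[str], eps: float, conditions: Iterable[Hashable],
                        det_rule_learn: Callable, corr_rule_learn: Callable) -> List[Rule]:
    classes = list(classes)
    conditions = list(conditions)
    rules: List[Rule] = []
    cc_all: Set[Tuple[Hashable, str]] = set()
    # Pass 1: one error detecting rule per class; record the conditions it selected.
    for i in classes:
        dc_i = set(det_rule_learn(i, eps, conditions))
        if dc_i:
            rules.append(Rule("error", i, frozenset(dc_i)))
        for cond in dc_i:
            cc_all.add((cond, i))
    # Pass 2: correcting rules may only use the pairs collected in pass 1.
    for i in classes:
        cc_i = set(corr_rule_learn(i, frozenset(cc_all)))
        if cc_i:
            rules.append(Rule("corr", i, frozenset(cc_i)))
    return rules
```

Restricting pass 2 to the pairs gathered in pass 1 is what ties each correcting rule to samples that an error detecting rule has already moved to “unknown”.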

Conditions for Error Detection and Correction

In this section, we describe how we create the conditions (set $C$) from dataset $\mathcal{O}$. As mentioned in the Introduction, in addition to conditions based on domain-specific knowledge, we contribute conditions obtained from neural networks, which we refer to as model-based conditions. Because they do not depend on domain expertise, these conditions make EDCR adaptable to a wide range of problem domains.

Model Based. New and improved deep learning models for complex problems appear continually, and the prevailing trend is to adopt the latest, supposedly superior model, often abandoning previously successful ones. We challenge this paradigm by using older, proven models to augment the performance of the latest and most advanced models: we employ a collection of diverse pre-existing neural models as a set of conditions for the current model. In particular, a coarser-grained model can also provide useful conditions. Accordingly, we use one binary classifier per class. Given class $i$, the binary classifier $g_i$ returns “true” for sample $\omega$ if it assigns $\omega$ to class $i$ and “false” otherwise. In this way, each sample $\omega$ has a condition $g_i(\omega)$ for each class. We used the LRCNa architecture for the binary classifiers; details are in the appendix.

Domain Knowledge. Domain expertise in outlier analysis can yield valuable conditions. Specifically, we focused on the maximum velocity records in our dataset. For each class $i$, we formulated a condition $s_i$ based on a maximum velocity criterion: for a given sample $\omega$, $s_i(\omega)$ is true if the velocity of $\omega$ is greater than the maximum velocity observed for class $i$ in set $\mathcal{O}$, and false otherwise.
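A minimal sketch of these velocity conditions follows. It assumes each sequence comes with a precomputed velocity series and reads "the velocity of $\omega$" as the peak velocity of $\omega$; that choice of statistic is our assumption, since the text does not specify it. The functions `class_max_velocity` and `velocity_conditions` are hypothetical names.

```python
from typing import Dict, Iterable, Sequence, Tuple


def class_max_velocity(train: Iterable[Tuple[Sequence[float], str]]) -> Dict[str, float]:
    """Largest velocity observed for each class among the training sequences in O."""
    vmax: Dict[str, float] = {}
    for velocities, label in train:
        peak = max(velocities, default=0.0)
        vmax[label] = max(vmax.get(label, 0.0), peak)
    return vmax


def velocity_conditions(velocities: Sequence[float], vmax: Dict[str, float]) -> Dict[str, bool]:
    """s_i(omega) for every class i: True iff omega's peak velocity exceeds class i's maximum in O."""
    peak = max(velocities, default=0.0)
    return {label: peak > limit for label, limit in vmax.items()}
```

For example, a sample whose peak velocity is above anything seen for "walk" in training makes $s_{walk}$ true, which an error detecting rule can use to flag a "walk" prediction.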

4 Experimental Evaluation

GeoLife Dataset. We validate the proposed methodology on GPS trajectories from the GeoLife project, using data collected from 69 users (Zheng et al. 2008). Details on the preprocessing of the data can be found in the appendix.

| Model | No Overlap, Random | No Overlap, Sequential (least leakage) | Segment Overlap, Random (prev. studies) | Segment Overlap, Sequential | Data Point Overlap, Random | Data Point Overlap, Sequential |
|---|---|---|---|---|---|---|
| LRCNa (ours) | 0.747 | 0.751 | **0.971** | 0.758 | **0.921** | 0.760 |
| LRCNa+EDCR (ours) | 0.759 (+1.6%) | **0.763** (+1.6%) | **0.971** (±0%) | **0.769** (+1.5%) | **0.921** (±0%) | 0.780 (+2.6%) |
| LRCN (prev. SOTA) | 0.749 | 0.747 | 0.952 | 0.767 | 0.887 | 0.774 |
| LRCN+EDCR (ours) | **0.761** (+1.6%) | 0.760 (+1.7%) | 0.952 (±0%) | 0.768 (+0.1%) | 0.889 (+0.2%) | **0.783** (+1.1%) |
| CNN | 0.742 | 0.755 | 0.851 | 0.763 | 0.853 | 0.779 |
| CNN+EDCR (ours) | 0.743 (+0.1%) | 0.755 (±0%) | 0.866 (+1.8%) | 0.763 (±0%) | 0.862 (+1.0%) | 0.779 (±0%) |

Table 1: Accuracy when all classes are represented in training and test sets under various data leakage cases. EDCR means “error detecting and correcting rules” were used on the model output, and numbers in parentheses show the percent change in accuracy from EDCR over the base model. Bold numbers are the best in each case.
| Model | No Overlap, Random | No Overlap, Sequential (least leakage) | Segment Overlap, Random (prev. studies) | Segment Overlap, Sequential | Data Point Overlap, Random | Data Point Overlap, Sequential |
|---|---|---|---|---|---|---|
| LRCNa (ours) | 0.727 | 0.734 | **0.971** | 0.742 | **0.906** | 0.715 |
| LRCNa+EDCR (ours) | 0.742 (+2.06%) | **0.751** (+2.32%) | **0.971** (±0%) | 0.757 (+2.02%) | **0.906** (±0%) | 0.749 (+4.76%) |
| LRCN (prev. SOTA) | 0.732 | 0.738 | 0.951 | 0.751 | 0.864 | 0.737 |
| LRCN+EDCR (ours) | **0.750** (+2.46%) | 0.741 (+0.41%) | 0.951 (±0%) | **0.760** (+1.20%) | 0.864 (±0%) | **0.755** (+2.44%) |
| CNN | 0.722 | 0.737 | 0.846 | 0.745 | 0.826 | 0.748 |
| CNN+EDCR (ours) | 0.723 (+0.14%) | 0.737 (±0%) | 0.866 (+2.36%) | 0.745 (±0%) | 0.830 (+0.48%) | 0.748 (±0%) |

Table 2: Macro F1 when all classes are represented in training and test sets under various data leakage cases. EDCR means “error detecting and correcting rules” were used on the model output, and numbers in parentheses show the percent change in macro F1 from EDCR over the base model. Bold numbers are the best in each case.

Training and Test Splits. Previous work such as (Kim et al. 2022) suffers from data leakage between the training and test sets, primarily because segments of a movement sequence appear in both sets as a result of random assignment. To address this issue, we evaluate our algorithms under various conditions based on ordering and overlap. For ordering, we examine random ordering (which, as in previous work, can place earlier behavior of the same agent in the training set) and sequential ordering (which orders the agents to avoid this issue). For overlap, we examine no overlap between the training and test sets; segment overlap, which allows training and test samples to overlap each other (as in previous work); and data point overlap, which allows the data points of a trajectory to span both training and test sets.

Compute and Implementation. All experiments were performed on a 2000 MHz AMD EPYC 7713 CPU and an NVIDIA GA100 GPU using Python 3.10 with PyTorch.

All Classes Observed. In our first set of experiments, we examined how error detecting and correcting rules (EDCR) affect the performance of the underlying model. Table 1 reports the accuracy of each model with and without EDCR. Models with EDCR performed the same as or better than their base models, with the improvement most noticeable when samples are sequential (which yields less data leakage between training and test). In terms of overall performance, LRCNa with EDCR performed best (or tied for best) in four of six cases, and LRCN with EDCR performed best in the remaining two. Of particular importance, in the “no overlap, sequential” case, the one least likely to exhibit data leakage, EDCR improves the performance of LRCNa and LRCN by 1.6% and 1.7%, respectively. Table 2 reports macro F1 for all models with and without EDCR; the relative improvements from EDCR are generally larger for macro F1 than for accuracy.

Hyperparameter Sensitivity. In the “all classes observed” set of experiments, we also examined sensitivity to the hyperparameter $\epsilon$. Recall that $\epsilon$ is interpreted as the maximum decrease in recall. We validated the theoretical reduction (TR) in recall empirically: in all cases, the decrease in recall did not exceed the bound specified by $\epsilon$, though recall decreases as $\epsilon$ increases. In many cases, the observed reduction in recall was considerably smaller than the theoretical bound. Figure 2 shows precision, recall, and F1 by class for LRCNa in the “no overlap, sequential” case. As $\epsilon$ (x-axis) ranges from 0 to 0.10, the decline in recall for every class remains within 0.10. Meanwhile, precision never decreases as $\epsilon$ increases, in line with our theoretical results. Though DetCorrRuleLearn takes a single $\epsilon$ hyperparameter, it could be set differently for each class (e.g., lower values for classes where recall is important, higher values for classes where false positives are expensive). This may be beneficial, as F1 for different classes appeared to peak at different values of $\epsilon$. We leave the study of heterogeneous $\epsilon$ settings to future work.

Figure 2: LRCNa results for the application of error detecting and correcting rules as a function of $\epsilon$ (no overlap, sequential selection). TR in Recall is the theoretical reduction in recall based on analytic results.

Removal of Movement Classes from Training. We next assessed how EDCR affects model performance when certain movement classes are excluded from training. For the experiments in Figure 3, we trained the CNN, LRCN, and LRCNa models without the walk and drive classes. Employing EDCR without any supplementary data yielded a 5.2% (zero-shot) improvement over the base models, and EDCR yielded a 23.9% (few-shot) improvement over the SOTA model without retraining the base model; these gains are more pronounced than in the first set of experiments. Using only 30% of the data from previously unseen classes, EDCR improves the performance of the baseline model by 21.3%, without requiring direct access to the model itself. This suggests that EDCR can support few-shot learning, adapting $f_\theta$ to novel scenarios: it substantially boosts accuracy on unseen classes using limited data and without modifying the model. This is crucial when direct model access is limited, for example through an API.

5 Related Work and Conclusion

As described earlier, the MTCP problem was previously studied in (Dabiri & Heaslip 2018; Kim et al. 2022), which introduced the CNN and LRCN architectures, respectively. Earlier work has also explored this problem with other machine learning approaches (Zheng et al. 2008; Wang et al. 2017; Simoncini et al. 2018). Error detection and correction were not explored in these earlier works. Both this prior work and this paper address trajectory classification, which differs from trajectory generation (Janner et al. 2021; Chen et al. 2021; Itkina & Kochenderfer 2022).

Earlier work on machine learning introspection (Daftry et al. 2016; Ramanagopal et al. 2018) examined error detection on various perceptual models. Unlike this work, these approaches were not applied to the MTCP, focused only on error detection, and did not provide theoretical guarantees of improvement. Another area of related work is machine learning verification (Ivanov et al. 2021; Jothimurugan et al. 2021; Ma et al. 2020), which seeks to ensure that the output of an ML model meets a logical specification. Like our work, some of these contributions (e.g., Ma et al. 2020) adjust the output of a machine learning model to meet a logic-based specification. However, to our knowledge, there has been no work on using machine learning verification to correct a machine learning model as this work does. Other related areas include meta-learning and domain generalization (Hospedales et al. 2021; Zhou et al. 2022; Vanschoren 2018; Maes & Nardi 1988), which attempt to account for changes in the data distribution and/or to select a model that was trained on data similar to the current problem. While our approach can use additional data, it does not depend on training data generated by different distributions. To our knowledge, these other methods have not been applied to MTCP. Recent studies on abductive learning (Huang et al. 2023; Dai et al. 2019) and neural symbolic reasoning (Cornelio et al. 2022) incorporate error correction mechanisms rooted in inconsistency with domain knowledge expressed as logical rules. These approaches typically require direct access to the perceptual model. In contrast, our work does not rely on predefined learning rule pairs and does not need direct access to the perceptual model. We conjecture that these approaches could be complementary to EDCR, and we leave it to future work to explore how they can work together.

Conclusion. A key near-term direction for future work is to employ these methods in the government-administered tests of the IARPA HAYSTAC program, which will provide an assessment of utility more closely related to real-world use cases. Likewise, an extension related to this program would be to identify a sequence of movement classes when an agent’s mode of transit may change. Here, we would look to apply our error detection and correction framework to recently introduced models such as those described in (Zeng et al. 2023). Separately, we framed rule learning as a pair of submodular maximization problems, and algorithms beyond the one used in this paper could be applied to them. Finally, the use of rules for error detection and correction of machine learning models presented here may be useful in other domains such as vision.

6 Acknowledgments

This research is supported by the Intelligence Advanced Research Projects Activity (IARPA) via the Department of Interior/Interior Business Center (DOI/IBC) contract number 140D0423C0032. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/IBC, or the U.S. Government. Additionally, some of the authors are supported by ONR grant N00014-23-1-2580 as well as internal funding from ASU Fulton Schools of Engineering.

Figure 3: Results for experiments with two movement classes removed from training.

References

  • Aditya et al. (2023) Aditya, D., Mukherji, K., Balasubramanian, S., Chaudhary, A., and Shakarian, P. PyReason: Software for open world temporal logic. In AAAI Spring Symposium, 2023.
  • Buchbinder et al. (2012) Buchbinder, N., Feldman, M., Naor, J., and Schwartz, R. A tight linear time (1/2)-approximation for unconstrained submodular maximization. In 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science, pp.  649–658, 2012. doi: 10.1109/FOCS.2012.73.
  • Chen et al. (2021) Chen, L., Lu, K., Rajeswaran, A., Lee, K., Grover, A., Laskin, M., Abbeel, P., Srinivas, A., and Mordatch, I. Decision transformer: Reinforcement learning via sequence modeling. CoRR, abs/2106.01345, 2021. URL https://arxiv.org/abs/2106.01345.
  • Cornelio et al. (2022) Cornelio, C., Stuehmer, J., Hu, S. X., and Hospedales, T. Learning where and when to reason in neuro-symbolic inference. In The Eleventh International Conference on Learning Representations, 2022.
  • Dabiri & Heaslip (2018) Dabiri, S. and Heaslip, K. Inferring transportation modes from GPS trajectories using a convolutional neural network. Transportation Research Part C: Emerging Technologies, 86:360–371, 2018.
  • Daftry et al. (2016) Daftry, S., Zeng, S., Bagnell, J. A., and Hebert, M. Introspective perception: Learning to predict failures in vision systems, 2016. URL http://arxiv.org/abs/1607.08665.
  • Dai et al. (2019) Dai, W.-Z., Xu, Q., Yu, Y., and Zhou, Z.-H. Bridging machine learning and logical reasoning by abductive learning. Advances in Neural Information Processing Systems, 32, 2019.
  • Fikioris et al. (2023) Fikioris, G., Patroumpas, K., Artikis, A., Pitsikalis, M., and Paliouras, G. Optimizing vessel trajectory compression for maritime situational awareness. GeoInformatica, 27(3):565–591, 2023.
  • Hospedales et al. (2021) Hospedales, T., Antoniou, A., Micaelli, P., and Storkey, A. Meta-learning in neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9):5149–5169, 2021.
  • Huang et al. (2019) Huang, H., Cheng, Y., and Weibel, R. Transport mode detection based on mobile phone network data: A systematic review. Transportation Research Part C: Emerging Technologies, 101:297–312, 2019.
  • Huang et al. (2023) Huang, Y.-X., Dai, W.-Z., Jiang, Y., and Zhou, Z.-H. Enabling knowledge refinement upon new concepts in abductive learning. 2023.
  • Itkina & Kochenderfer (2022) Itkina, M. and Kochenderfer, M. J. Interpretable self-aware neural networks for robust trajectory prediction, 2022.
  • Ivanov et al. (2021) Ivanov, R., Carpenter, T., Weimer, J., Alur, R., Pappas, G., and Lee, I. Verisig 2.0: Verification of neural network controllers using Taylor model preconditioning. In Computer Aided Verification: 33rd International Conference, CAV 2021, Virtual Event, July 20–23, 2021, Proceedings, Part I, pp. 249–262. Springer-Verlag, 2021. ISBN 978-3-030-81684-1. doi: 10.1007/978-3-030-81685-8_11. URL https://doi.org/10.1007/978-3-030-81685-8_11.
  • Iyer & Bilmes (2013) Iyer, R. and Bilmes, J. Submodular optimization with submodular cover and submodular knapsack constraints. In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, NIPS’13, pp.  2436–2444, Red Hook, NY, USA, 2013. Curran Associates Inc.
  • Janner et al. (2021) Janner, M., Li, Q., and Levine, S. Offline reinforcement learning as one big sequence modeling problem. In Advances in Neural Information Processing Systems, 2021.
  • Jothimurugan et al. (2021) Jothimurugan, K., Bansal, S., Bastani, O., and Alur, R. Compositional reinforcement learning from logical specifications. In Advances in Neural Information Processing Systems, 2021.
  • Kim et al. (2022) Kim, J., Kim, J. H., and Lee, G. GPS data-based mobility mode inference model using long-term recurrent convolutional networks. Transportation Research Part C: Emerging Technologies, 135:103523, 2022.
  • Lin & Hsu (2014) Lin, M. and Hsu, W.-J. Mining GPS data for mobility patterns: A survey. Pervasive and Mobile Computing, 12:1–16, 2014.
  • Ma et al. (2020) Ma, M., Gao, J., Feng, L., and Stankovic, J. STLnet: Signal temporal logic enforced multivariate recurrent neural networks. Advances in Neural Information Processing Systems, 33:14604–14614, 2020.
  • Maes & Nardi (1988) Maes, P. and Nardi, D. Meta-level architectures and reflection. 1988.
  • Ramanagopal et al. (2018) Ramanagopal, M. S., Anderson, C., Vasudevan, R., and Johnson-Roberson, M. Failing to learn: Autonomously identifying perception failures for self-driving cars. IEEE Robotics and Automation Letters, 3(4):3860–3867, 2018. ISSN 2377-3766, 2377-3774. doi: 10.1109/LRA.2018.2857402. URL http://arxiv.org/abs/1707.00051.
  • Simoncini et al. (2018) Simoncini, M., Taccari, L., Sambo, F., Bravi, L., Salti, S., and Lori, A. Vehicle classification from low-frequency gps data with recurrent neural networks. Transportation Research Part C: Emerging Technologies, 91:176–191, 2018.
  • Vanschoren (2018) Vanschoren, J. Meta-learning: A survey. arXiv preprint arXiv:1810.03548, 2018.
  • Vaswani et al. (2017) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems, pp.  5998–6008, 2017.
  • Vincenty (1975) Vincenty, T. Direct and inverse solutions of geodesics on the ellipsoid with application of nested equations. Survey Review, 23(176):88–93, 1975.
  • Wang et al. (2017) Wang, H., Liu, G., Duan, J., and Zhang, L. Detecting transportation modes using deep neural network. IEICE Transactions on Information and Systems, 100(5):1132–1135, 2017.
  • Zeng et al. (2023) Zeng, J., Yu, Y., Chen, Y., Yang, D., Zhang, L., and Wang, D. Trajectory-as-a-sequence: A novel travel mode identification framework. Transportation Research Part C: Emerging Technologies, 146:103957, 2023. ISSN 0968-090X. doi: 10.1016/j.trc.2022.103957. URL https://www.sciencedirect.com/science/article/pii/S0968090X22003709.
  • Zheng et al. (2008) Zheng, Y., Li, Q., Chen, Y., Xie, X., and Ma, W.-Y. Understanding mobility based on gps data. In Proceedings of the 10th international conference on Ubiquitous computing, pp.  312–321, 2008.
  • Zhou et al. (2022) Zhou, K., Liu, Z., Qiao, Y., Xiang, T., and Loy, C. C. Domain generalization: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.

Appendix A Appendix

Details on Vector Embedding of Sequences

We begin with a set of GPS points where each point is a tuple of timestamp ($t$), latitude ($lat$), and longitude ($long$), $P_i = (t_i, P^{lat}_i, P^{long}_i)$. Each point $P_i$ also has an associated class label $c \in \mathcal{C}$. To embed these tuples as vectors that can be consumed by the neural model $f_\theta$, three preprocessing steps are performed: normalizing the data size to meet the input requirements, extracting movement behaviors from the GPS points, and refining the data. We draw upon previous approaches (Zheng et al. 2008a,b; Dabiri & Heaslip 2018; Kim et al. 2022) to guide the data preprocessing.

As part of the data size normalization step, we sequentially group chronologically ordered GPS points into sequences of uniform length 40. Every point in a sequence has the same class label $c$, so the entire sequence represents the movement trajectory of that class over 40 consecutive GPS points. Each resulting sequence is denoted $\omega \in S$, where $S$ is the set of all curated sequences.
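The grouping step might look like the sketch below. It assumes the input is one user's chronologically ordered points with a label per point, and that a run shorter than 40 points (at a label change or at the end of the data) is kept as a shorter sequence, consistent with the zero-padding of shorter distance vectors. `segment_points` is a hypothetical name.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (t, lat, long)


def segment_points(points: Sequence[Point], labels: Sequence[str],
                   seq_len: int = 40) -> List[Tuple[List[Point], str]]:
    """Split chronologically ordered points into label-homogeneous sequences of at most seq_len points."""
    sequences: List[Tuple[List[Point], str]] = []
    # groupby yields maximal runs of consecutive indices that share the same label.
    for label, run in groupby(range(len(points)), key=lambda k: labels[k]):
        idx = list(run)
        for start in range(0, len(idx), seq_len):
            chunk = idx[start:start + seq_len]
            sequences.append(([points[k] for k in chunk], label))
    return sequences
```

Splitting at label changes before chunking is what guarantees that every point in a sequence carries the same class label $c$.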

To capture patterns of movement behavior from GPS points, we compute the distance time-series vector as follows. $D^j_i$ is the distance between two GPS point tuples $P^j_i$ and $P^j_{i-1}$, where $j$ indexes the sequences in $S$ and $i$ indexes the points of $\omega$; it is computed using the Vincenty distance formula (Vincenty 1975). For example, $D^3_{10}$ is the distance between points $P_{10}$ and $P_9$ of the 3rd sequence. A distance time-series vector may fall short of 40 data points; to maintain a consistent length for sequence $\omega$, we pad shorter $D^j$ vectors with zeros.
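The distance series can be sketched with a direct implementation of Vincenty's inverse formula on the WGS-84 ellipsoid; the ellipsoid is our assumption, since the text does not name one. The functions `vincenty_distance` and `distance_series` are hypothetical names; points are (t, lat, long) tuples in degrees, distances are in metres, and nearly antipodal points, for which the iteration may not converge, raise an error.

```python
import math
from typing import List, Sequence, Tuple

# WGS-84 ellipsoid parameters (assumed; the paper does not name an ellipsoid).
A_AXIS = 6378137.0             # semi-major axis a, metres
FLAT = 1 / 298.257223563       # flattening f
B_AXIS = A_AXIS * (1 - FLAT)   # semi-minor axis b, metres


def vincenty_distance(lat1: float, lon1: float, lat2: float, lon2: float,
                      max_iter: int = 200, tol: float = 1e-12) -> float:
    """Geodesic distance in metres between two points in degrees (Vincenty inverse formula)."""
    if lat1 == lat2 and lon1 == lon2:
        return 0.0
    a, b, f = A_AXIS, B_AXIS, FLAT
    L = math.radians(lon2 - lon1)
    u1 = math.atan((1 - f) * math.tan(math.radians(lat1)))  # reduced latitudes
    u2 = math.atan((1 - f) * math.tan(math.radians(lat2)))
    sin_u1, cos_u1 = math.sin(u1), math.cos(u1)
    sin_u2, cos_u2 = math.sin(u2), math.cos(u2)

    lam = L
    for _ in range(max_iter):
        sin_lam, cos_lam = math.sin(lam), math.cos(lam)
        sin_sigma = math.hypot(cos_u2 * sin_lam, cos_u1 * sin_u2 - sin_u1 * cos_u2 * cos_lam)
        if sin_sigma == 0.0:
            return 0.0  # coincident points
        cos_sigma = sin_u1 * sin_u2 + cos_u1 * cos_u2 * cos_lam
        sigma = math.atan2(sin_sigma, cos_sigma)
        sin_alpha = cos_u1 * cos_u2 * sin_lam / sin_sigma
        cos2_alpha = 1 - sin_alpha ** 2
        # On the equator cos^2(alpha) = 0, and cos(2 sigma_m) is taken as 0.
        cos_2sm = cos_sigma - 2 * sin_u1 * sin_u2 / cos2_alpha if cos2_alpha != 0 else 0.0
        c = f / 16 * cos2_alpha * (4 + f * (4 - 3 * cos2_alpha))
        lam_prev = lam
        lam = L + (1 - c) * f * sin_alpha * (
            sigma + c * sin_sigma * (cos_2sm + c * cos_sigma * (-1 + 2 * cos_2sm ** 2)))
        if abs(lam - lam_prev) < tol:
            break
    else:
        raise ValueError("Vincenty iteration did not converge (nearly antipodal points?)")

    u_sq = cos2_alpha * (a ** 2 - b ** 2) / b ** 2
    big_a = 1 + u_sq / 16384 * (4096 + u_sq * (-768 + u_sq * (320 - 175 * u_sq)))
    big_b = u_sq / 1024 * (256 + u_sq * (-128 + u_sq * (74 - 47 * u_sq)))
    delta_sigma = big_b * sin_sigma * (cos_2sm + big_b / 4 * (
        cos_sigma * (-1 + 2 * cos_2sm ** 2)
        - big_b / 6 * cos_2sm * (-3 + 4 * sin_sigma ** 2) * (-3 + 4 * cos_2sm ** 2)))
    return b * big_a * (sigma - delta_sigma)


def distance_series(points: Sequence[Tuple[float, float, float]], seq_len: int = 40) -> List[float]:
    """D_i = Vincenty(P_{i-1}, P_i) for consecutive (t, lat, long) points, zero-padded to seq_len."""
    d = [vincenty_distance(p[1], p[2], q[1], q[2]) for p, q in zip(points, points[1:])]
    return d + [0.0] * (seq_len - len(d))
```

A sequence of 40 points yields 39 consecutive distances, so in this sketch every full sequence receives at least one padding zero.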

Additionally, we extract the velocity ($V$), acceleration ($A$), jerk ($J$), and bearing rate ($BR$) time-series vectors for each sequence as follows:

$$V^j_i = \frac{\operatorname{Vincenty}\left(P^j_{i-1}, P^j_i\right)}{t_i - t_{i-1}} \tag{1}$$
$$A^j_i = \frac{V^j_i - V^j_{i-1}}{t_i - t_{i-1}} \tag{2}$$
$$J^j_i = \frac{A^j_i - A^j_{i-1}}{t_i - t_{i-1}} \tag{3}$$
$$BR^j_i = \left| \text{Bearing}_i - \text{Bearing}_{i-1} \right| \tag{4}$$
$$\text{where } \text{Bearing}_i = \arctan(y, x) \tag{5}$$
$$y = \sin\left(P_i^{long} - P_{i-1}^{long}\right) \cdot \cos\left(P_i^{lat}\right) \tag{6}$$
$$x = \cos\left(P_{i-1}^{lat}\right) \cdot \sin\left(P_i^{lat}\right) - \sin\left(P_{i-1}^{lat}\right) \cdot \cos\left(P_i^{lat}\right) \cdot \cos\left(P_i^{long} - P_{i-1}^{long}\right) \tag{7}$$
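Equations (1)-(7) translate almost line by line into code. In this plain-Python sketch, `distances[i]` is assumed to already hold $\operatorname{Vincenty}(P_{i-1}, P_i)$, bearings are in radians, and any entry whose formula needs a predecessor that does not exist (for example $A_1$, which would need $V_0$) is left at 0.0. These conventions and the name `motion_features` are our assumptions, not details given in the text.

```python
import math


def motion_features(timestamps, lats, lons, distances):
    """Velocity, acceleration, jerk and bearing rate following Eqs. (1)-(7).

    Assumes distances[i] = Vincenty(P_{i-1}, P_i) (distances[0] is ignored) and
    leaves entries that would need a missing predecessor at 0.0.
    """
    n = len(timestamps)
    V, A, J, BR = ([0.0] * n for _ in range(4))
    bearing = [0.0] * n
    for i in range(1, n):
        dt = timestamps[i] - timestamps[i - 1]
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        V[i] = distances[i] / dt                      # Eq. (1)
        if i >= 2:
            A[i] = (V[i] - V[i - 1]) / dt             # Eq. (2)
        if i >= 3:
            J[i] = (A[i] - A[i - 1]) / dt             # Eq. (3)
        phi_prev, phi = math.radians(lats[i - 1]), math.radians(lats[i])
        d_lon = math.radians(lons[i] - lons[i - 1])
        y = math.sin(d_lon) * math.cos(phi)                                    # Eq. (6)
        x = (math.cos(phi_prev) * math.sin(phi)
             - math.sin(phi_prev) * math.cos(phi) * math.cos(d_lon))           # Eq. (7)
        bearing[i] = math.atan2(y, x)                                          # Eq. (5)
        if i >= 2:
            BR[i] = abs(bearing[i] - bearing[i - 1])                           # Eq. (4)
    return V, A, J, BR
```

Following Eq. (4) literally, the bearing rate is a plain absolute difference, so a turn across the $\pm\pi$ boundary produces a large value; wrapping the difference into $[0, \pi]$ would avoid this, but the text does not indicate whether that is done.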

Finally, we stack the vectors $V^j$, $A^j$, $J^j$, and $BR^j$ for each sequence $\omega$; the stacked result is passed as input to the neural model $f_\theta$ detailed in Section 2.
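A minimal sketch of the stacking step, assuming each series is zero-padded to 40 entries and that the model consumes a channels-first $4 \times 40$ layout. Both the layout and the helper name `stack_features` are assumptions; Section 2 defines the actual input format.

```python
def stack_features(V, A, J, BR, length=40):
    """Stack the four per-sequence series into a 4 x `length` channel matrix,
    zero-padding each series at the end (hypothetical channels-first layout)."""
    channels = []
    for series in (V, A, J, BR):
        if len(series) > length:
            raise ValueError(f"series longer than {length} points")
        channels.append([float(v) for v in series] + [0.0] * (length - len(series)))
    return channels
```

If the model instead expects a time-major $40 \times 4$ array, transposing the result (e.g. `list(zip(*channels))`) gives that layout.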

Formal Statements of Additional Theorems and Corollaries for Error Detection Rules

Corollary 1.

The rule does not decrease precision if and only if $c \geq 1 - P_i$.

Corollary 2.

If $P_i \geq 1 - c$ (the condition from Corollary 1 under which precision does not decrease), then recall decreases by at most $s_i R_i$.

Theorem 4.

For a given error detecting rule, the quantity $c \cdot s_i$ is a normalized polymatroid function w.r.t. the set $DC$.

Corollary 3.

The quantity $(1-c)s_i\frac{R_i}{P_i}$ (the decrease in recall) is a normalized polymatroid function w.r.t. the set $DC$.

Corollary 4.

GreedyRuleSelect provides an approximation of $cs$ that is within $1/|C|$ of optimal.

Formal Statements of Additional Theorems and Corollaries for Error Correction Rules

Corollary 5.

Precision increases for class $i$ with the application of an error correcting rule if and only if $c > P_i$.

Corollary 6.

Recall is non-decreasing for class i𝑖iitalic_i with the application of an error correcting rule.

Theorem 5.

Confidence is submodular with respect to $CC_i$.

Corollary 7.

For an arbitrarily small constant $\epsilon$, DetUSMPosRuleSelect provides a $1/3 + \epsilon$ approximation of confidence if the returned confidence is greater than the initial precision.

Proof of Theorem 1

Under the condition $s_i \leq 1 - P_i$, the precision of model $f_\theta$ for class $i$, with initial precision $P_i$, after applying an error correcting rule with support $s_i$ and confidence $c$ increases by an amount that is a function of $s_i$ and $c$ and is greater than or equal to $c \cdot s_i$.

Proof.

CLAIM 1: The precision of model $f_\theta$ for class $i$, with initial precision $P_i$, after applying an error correcting rule with support $s_i$ and confidence $c$ increases by:

$$\frac{s_i}{1-s_i}(c + P_i - 1) \tag{8}$$

The total number of items that $f_\theta$ classifies as $i$ before error correction is $N_i = TP_i + FP_i$. Of those, $s_i \cdot N_i$ are corrected by the rule. However, a fraction $(1-c)$ of the corrected samples would have been true positives had they not been corrected. Hence, the new precision can be written as follows:

$$\frac{TP_i - (1-c)s_i \cdot N_i}{N_i - s_i \cdot N_i} \tag{9}$$

As $P_i \cdot N_i = TP_i$, we have:

$$\frac{P_i \cdot N_i - (1-c)s_i \cdot N_i}{N_i(1-s_i)} \tag{10}$$
$$= \frac{P_i - (1-c)s_i}{1-s_i} \tag{11}$$

Next, we subtract the initial precision from this quantity:

$$\frac{P_i - (1-c)s_i}{1-s_i} - P_i \tag{12}$$
$$= \frac{P_i - (1-c)s_i}{1-s_i} - \frac{(1-s_i)P_i}{1-s_i} \tag{13}$$
$$= \frac{-s_i + s_i c + P_i s_i}{1-s_i} \tag{14}$$
$$= \frac{s_i}{1-s_i}(c + P_i - 1) \tag{15}$$
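As a sanity check on Claim 1, the sketch below compares the precision change obtained by directly recounting true and false positives with the closed form in Eq. (15). The counts are made-up illustrative numbers, not data from the paper, and the function names are hypothetical; exact rational arithmetic (`fractions.Fraction`) avoids floating-point noise.

```python
from fractions import Fraction


def precision_gain_by_counting(tp, fp, flagged_tp, flagged_fp):
    """Precision change for class i when the rule removes flagged_tp true
    positives and flagged_fp false positives from the class-i predictions."""
    n = tp + fp
    before = Fraction(tp, n)
    after = Fraction(tp - flagged_tp, n - flagged_tp - flagged_fp)
    return after - before


def precision_gain_closed_form(p, s, c):
    """Eq. (15): s / (1 - s) * (c + P - 1)."""
    return s / (1 - s) * (c + p - 1)


tp, fp = 80, 20
flagged_tp, flagged_fp = 2, 8          # samples satisfying the rule body
n = tp + fp
p = Fraction(tp, n)                    # initial precision P_i
s = Fraction(flagged_tp + flagged_fp, n)        # support s_i
c = Fraction(flagged_fp, flagged_tp + flagged_fp)  # confidence c
print(precision_gain_by_counting(tp, fp, flagged_tp, flagged_fp))  # 1/15
print(precision_gain_closed_form(p, s, c))                         # 1/15
```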

CLAIM 2: If $s_i \leq 1 - P_i$, then $c \cdot s_i$ is a lower bound on the improvement in precision.

Suppose, by way of contradiction (BWOC), that the claim is false. Then, by Claim 1, we have:

$$\frac{s_i}{1-s_i}(c + P_i - 1) < c \cdot s_i \tag{16}$$
$$c + P_i - 1 < c(1-s_i) \tag{17}$$
$$c + P_i - 1 < c - c \cdot s_i \tag{18}$$
$$c \cdot s_i < 1 - P_i \tag{19}$$
$$c \cdot s_i < s_i \tag{20}$$

However, as $c \leq 1$, this is a contradiction.

The proof of the theorem then follows directly from claim 2. ∎

Proof of Corollary 1

The rule does not decrease precision if and only if $c \geq 1 - P_i$.

Proof.

Suppose, BWOC, that $c \geq 1 - P_i$ but precision decreases. Then, by Claim 1 of Theorem 1, the following must be true:

$$\frac{P_i - s_i(1-c)}{1-s_i} - P_i < 0 \tag{21}$$
$$P_i - s_i(1-c) < P_i(1-s_i) \tag{22}$$
$$s_i c - s_i < -P_i s_i \tag{23}$$
$$P_i < 1 - c \tag{24}$$

However, as $P_i \geq 1 - c$, this cannot hold.

Conversely, suppose, BWOC, that $c < 1 - P_i$ but precision does not decrease:

$$\frac{P_i - s_i(1-c)}{1-s_i} - P_i \geq 0 \tag{25}$$
$$P_i - s_i(1-c) \geq P_i(1-s_i) \tag{26}$$
$$s_i c - s_i \geq -P_i s_i \tag{27}$$
$$P_i \geq 1 - c \tag{28}$$

Again, a contradiction. ∎
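Corollary 1 can also be checked exhaustively on a grid. The sketch evaluates the closed-form change from Eq. (8) for every $P_i$ and $c$ on a grid of step $1/20$ and every interior support $0 < s_i < 1$, and confirms that the change is non-negative exactly when $c \geq 1 - P_i$. The function names are hypothetical, and a grid check only illustrates the statement; it does not replace the proof.

```python
from fractions import Fraction


def precision_change(p, s, c):
    """Closed-form change in precision from Claim 1 of Theorem 1 (Eq. 8)."""
    return s / (1 - s) * (c + p - 1)


def corollary1_holds(steps=20):
    grid = [Fraction(k, steps) for k in range(steps + 1)]
    for p in grid:
        for c in grid:
            for s in grid[1:-1]:  # 0 < s < 1: the rule fires and 1 - s > 0
                if (precision_change(p, s, c) >= 0) != (c >= 1 - p):
                    return False
    return True


print(corollary1_holds())  # True
```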

Proof of Theorem 4

For a given error detecting rule, the quantity $c \cdot s_i$ is a normalized polymatroid function w.r.t. the set $DC$.

Proof.

CLAIM 1: $c \cdot s_i = POS/N_i$, where $POS$ is the number of samples for which both the rule body and head are satisfied.
Let $BOD$ be the number of samples for which the body of the rule is true. This gives us $c \cdot s_i = \frac{POS}{BOD}\frac{BOD}{N_i}$, which is equivalent to the statement of the claim.

CLAIM 2: The quantity $c \cdot s_i$ is submodular w.r.t. the set $DC$.
Since $N_i$ is a constant, by Claim 1 it suffices to show that $POS$ is submodular. Suppose, BWOC, that $POS$ is not submodular for some set $DC$. We write $POS(DC)$ for the value of $POS$ under the set of conditions $DC$ and assume the existence of two sets of conditions $DC_1, DC_2$. Then the following must be true:

$$POS(DC_1) + POS(DC_2) < POS(DC_1 \cup DC_2) \tag{29}$$

This can be rewritten as:

$$\left|\bigcup_{cond \in DC_1} \{\omega \mid cond(\omega) \wedge pred(\omega)\}\right| + \tag{30}$$
$$\left|\bigcup_{cond \in DC_2} \{\omega \mid cond(\omega) \wedge pred(\omega)\}\right| \tag{31}$$

This quantity is less than the following:

$$\left|\bigcup_{cond \in DC_1 \cup DC_2} \{\omega \mid cond(\omega) \wedge pred(\omega)\}\right| \tag{32}$$

However, this would imply that there is at least one element of $DC_1 \cup DC_2$ that is in neither $DC_1$ nor $DC_2$, which is a contradiction.

CLAIM 3: $c \cdot s_i$ increases monotonically with $DC$.
By Claim 1, as the quantity equals $POS/N_i$ and $N_i$ is a constant, we only need to show the monotonicity of $POS$. Clearly, $POS$ is monotonically non-decreasing, as adding elements to $DC$ can only increase it.

CLAIM 4: When $DC = \emptyset$, $c \cdot s_i = 0$.
This follows directly from the fact that we define $s_i$ to be zero if no conditions are used.

Proof of theorem. Follows directly from claims 2-4. ∎
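To make Theorem 4 concrete, the sketch below treats $POS(DC)$ as the number of samples that satisfy the head and at least one condition in $DC$, matching the unions in Eqs. (30)-(32), and brute-forces the three polymatroid properties (normalized, monotone, and submodular in the standard diminishing-returns sense) on a toy example. The features (`speed`, `br`), thresholds, head (`is_error`, read here as "the prediction is wrong"), and function names are all invented for illustration.

```python
from itertools import combinations


def pos(dc, conditions, samples, head):
    """POS(DC): samples satisfying the head and at least one condition in DC."""
    return sum(1 for w in samples if head(w) and any(conditions[name](w) for name in dc))


def all_subsets(names):
    for r in range(len(names) + 1):
        for combo in combinations(names, r):
            yield frozenset(combo)


def is_polymatroid(conditions, samples, head):
    """Brute force: f(empty) = 0, monotone, and diminishing returns."""
    names = list(conditions)
    f = lambda dc: pos(dc, conditions, samples, head)
    if f(frozenset()) != 0:
        return False
    subsets = list(all_subsets(names))
    for a in subsets:
        for b in subsets:
            if not a <= b:
                continue
            if f(a) > f(b):  # monotonicity violated
                return False
            for x in names:
                if x not in b and f(a | {x}) - f(a) < f(b | {x}) - f(b):
                    return False  # diminishing returns violated
    return True


# Toy data: hypothetical samples predicted as one class.
samples = [
    {"speed": 25, "br": 0.1, "is_error": True},
    {"speed": 1, "br": 1.5, "is_error": True},
    {"speed": 10, "br": 1.2, "is_error": False},
    {"speed": 30, "br": 1.4, "is_error": True},
    {"speed": 5, "br": 0.2, "is_error": False},
]
conditions = {
    "fast": lambda w: w["speed"] > 20,
    "slow": lambda w: w["speed"] < 2,
    "erratic": lambda w: w["br"] > 1.0,
}
head = lambda w: w["is_error"]

print(pos({"fast", "erratic"}, conditions, samples, head))  # 3
print(is_polymatroid(conditions, samples, head))            # True
```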

Proof of Theorem 2

After applying the rule to correct errors, the recall will decrease by

$$(1-c)s_i\frac{R_i}{P_i} \tag{33}$$
Proof.

The number of corrections made by the rule is $s_i(TP_i + FP_i)$, with a fraction $(1-c)$ of these being incorrect (increasing false negatives). Note that the sum $TP_i + FN_i$ does not change after error correction, as any incorrectly “corrected” true positive becomes a false negative, and false negatives do not otherwise change from error correction. Therefore, the new recall is:

$$\frac{TP_i - s_i(1-c)(TP_i + FP_i)}{TP_i + FN_i} \tag{34}$$

When this quantity is subtracted from the original recall ($R_i$), we obtain:

$$s_i(1-c)\left(R_i + \frac{FP_i}{TP_i + FN_i}\right) \tag{35}$$

We note that $FP_i = \frac{TP_i}{P_i} - TP_i = \frac{TP_i - P_i \cdot TP_i}{P_i}$, which gives us:

$$s_i(1-c)\left(R_i + \frac{TP_i}{P_i(TP_i + FN_i)} - \frac{TP_i \cdot P_i}{P_i(TP_i + FN_i)}\right) \tag{37}$$
$$= s_i(1-c)\left(R_i + \frac{R_i}{P_i} - R_i\right) \tag{38}$$
$$= (1-c)s_i\frac{R_i}{P_i} \tag{39}$$
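Theorem 2 can be checked the same way on made-up counts: recount recall after the rule fires, using the fact that wrongly flagged true positives become false negatives while $TP_i + FN_i$ stays fixed, and compare against Eq. (39). The counts are illustrative only and the function names are hypothetical.

```python
from fractions import Fraction


def recall_drop_by_counting(tp, fn, flagged_tp):
    """Drop in recall when `flagged_tp` true positives are wrongly flagged;
    they become false negatives, so the denominator TP + FN is unchanged."""
    return Fraction(tp, tp + fn) - Fraction(tp - flagged_tp, tp + fn)


def recall_drop_closed_form(p, r, s, c):
    """Eq. (39): (1 - c) * s * R / P."""
    return (1 - c) * s * r / p


tp, fp, fn = 60, 15, 25
flagged_tp, flagged_fp = 3, 12                   # samples satisfying the rule body
n = tp + fp
p, r = Fraction(tp, n), Fraction(tp, tp + fn)    # P_i, R_i
s = Fraction(flagged_tp + flagged_fp, n)         # support s_i
c = Fraction(flagged_fp, flagged_tp + flagged_fp)  # confidence c
print(recall_drop_by_counting(tp, fn, flagged_tp))  # 3/85
print(recall_drop_closed_form(p, r, s, c))          # 3/85
```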

Proof of Corollary 2

If $P_i \geq 1 - c$ (the condition from Corollary 1 under which precision does not decrease), then recall decreases by at most $s_i R_i$.

Proof.

Suppose, BWOC, that the statement is false. By Theorem 2, recall decreases by $(1-c)s_i\frac{R_i}{P_i}$. This gives us:

$$(1-c)s_i\frac{R_i}{P_i} > s_i R_i \tag{40}$$

Since precision cannot be less than $1 - c$ by assumption, replacing $P_i$ with $1 - c$ can only increase the left-hand side, so we must have:

$$(1-c)s_i\frac{R_i}{1-c} > s_i R_i \tag{41}$$
$$s_i R_i > s_i R_i \tag{42}$$

which is a contradiction. ∎

Proof of Corollary 3

The quantity $(1-c)s_i\frac{R_i}{P_i}$ (the decrease in recall) is a normalized polymatroid function w.r.t. the set $DC$.

Proof.

Note that $BOD$ is the number of samples that satisfy the body, while $POS$ is the number of samples that satisfy both the body and the head; let $NEG = BOD - POS$ be the number of samples that satisfy the body but not the head.

$$(1-c)s_i\frac{R_i}{P_i} = \left(1 - \frac{POS}{BOD}\right)\frac{BOD}{N_i}\frac{R_i}{P_i} \tag{43}$$
$$= \frac{NEG}{BOD}\frac{BOD}{N_i}\frac{R_i}{P_i} \tag{44}$$
$$= NEG\,\frac{1}{N_i}\frac{R_i}{P_i} \tag{45}$$

As $\frac{1}{N_i}\frac{R_i}{P_i}$ is a constant, we need only show that $NEG$ is submodular, which follows by the same argument used for $POS$ in Claim 2 of Theorem 4. Likewise, $NEG$ is monotonic (mirroring the argument of Claim 3 of Theorem 4) and is normalized by the definition of $s_i$ in the case where there are no conditions. The statement of the corollary follows. ∎
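The submodularity and monotonicity of $NEG$ can be checked by brute force on a small example. The sketch below assumes, for illustration only, that a sample satisfies the rule body when it satisfies at least one selected condition in $DC$. Under that assumption, $NEG(S)$ counts the covered samples for which the head does not hold, which makes it a coverage-type function. The condition names, sample sets, and helper functions are hypothetical.

```python
from itertools import chain, combinations

# Hypothetical toy data: each condition maps to the sample indices it fires on.
CONDITIONS = {
    "c1": {0, 1, 2},
    "c2": {2, 3},
    "c3": {3, 4, 5},
}
# Hypothetical indices of samples for which the rule head holds.
HEAD_TRUE = {1, 3}


def neg(selected):
    """NEG(S): samples satisfying the body (any condition in S) but not the head."""
    body = set().union(*(CONDITIONS[c] for c in selected))
    return len(body - HEAD_TRUE)


def subsets(items):
    """All subsets of `items`, as tuples, from the empty set upward."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


def is_normalized_monotone_submodular(f, ground):
    """Brute-force check of f(empty) = 0, monotonicity, and diminishing returns."""
    if f(()) != 0:
        return False
    for a in subsets(ground):
        for b in subsets(ground):
            A, B = set(a), set(b)
            if not A <= B:
                continue
            if f(tuple(A)) > f(tuple(B)):  # monotonicity: A subset of B => f(A) <= f(B)
                return False
            for x in set(ground) - B:  # marginal gain at A must be >= gain at B
                if f(tuple(A | {x})) - f(tuple(A)) < f(tuple(B | {x})) - f(tuple(B)):
                    return False
    return True


print(is_normalized_monotone_submodular(neg, CONDITIONS))  # True
```

A brute-force check like this is only practical for a handful of conditions. It is meant to make the proof's claim tangible, not to replace it.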

Proof of Theorem 3

For the application of positive rules, precision increases if and only if rule confidence ($c$) increases.

Proof.

CLAIM 1: Precision increases by $\frac{cs - P_i s}{\mathcal{P}_i + s}$.

The new precision is equal to the following:

$$\frac{TP_i + csN}{M_i + sN} \qquad (46)$$

The improvement in precision is derived as follows.

$$
\begin{aligned}
&\frac{TP_i + csN}{M_i + sN} - P_i && (47)\\
&= \frac{TP_i + csN - P_i M_i - P_i sN}{M_i + sN} && (48)\\
&= \frac{TP_i + csN - TP_i - P_i sN}{M_i + sN} && (49)\\
&= \frac{csN - P_i sN}{M_i + sN} && (50)\\
&= \frac{cs - P_i s}{\mathcal{P}_i + s} && (51)
\end{aligned}
$$

Step (49) uses $P_i M_i = TP_i$, and step (51) divides the numerator and denominator by $N$, with $\mathcal{P}_i = M_i/N$.
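Claim 1 can be sanity-checked numerically by comparing the direct change in precision from (46) with the closed form (51). The sketch below uses exact fractions and takes $\mathcal{P}_i = M_i/N$, which is the substitution that turns (50) into (51). The counts are hypothetical.

```python
from fractions import Fraction as F


def precision_gain(tp, m, n, c, s):
    """Direct change in class-i precision (Eqs. 46-47) after a positive rule
    adds s*N predictions of class i, c*s*N of which are correct."""
    p = tp / m
    return (tp + c * s * n) / (m + s * n) - p


def precision_gain_closed_form(tp, m, n, c, s):
    """Closed form of Eq. (51), taking script-P_i = M_i / N."""
    p = tp / m
    return (c * s - p * s) / (m / n + s)


# Hypothetical counts: N = 1000 samples, M_i = 200 predicted as class i,
# TP_i = 150 of them correct; the rule covers s = 1/20 of samples with c = 9/10.
args = (F(150), F(200), F(1000), F(9, 10), F(1, 20))
print(precision_gain(*args), precision_gain_closed_form(*args))  # 3/100 3/100
```

Because the denominator $\mathcal{P}_i + s$ is positive, the gain has the sign of $c - P_i$. In this example the rule helps because its confidence (0.9) exceeds the base precision (0.75).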

CLAIM 2: If the count of samples satisfying both the rule body and head (the numerator of confidence) increases, then precision increases.

Suppose, by way of contradiction (BWOC), that the claim is false. Then there is some value of $POS$ for which the improvement in precision is greater than the improvement at $POS' = POS + 1$. Note that, in this case, the number of samples satisfying the body also increases by 1. First, since $csN = POS$ and $sN = BOD$, we can rewrite the result of Claim 1 as follows.

$$\frac{POS - P_i\,BOD}{M_i + BOD} \qquad (52)$$

Under this hypothesis, the following relationship must therefore hold. Cross-multiplying (both denominators are positive) and simplifying yields (54) and (55).

$$
\begin{aligned}
\frac{POS - P_i\,BOD}{M_i + BOD} &> \frac{POS + 1 - P_i\,BOD - P_i}{M_i + BOD + 1} && (53)\\
POS - P_i\,BOD &> M_i(1-P_i) + BOD(1-P_i) && (54)\\
POS &> M_i(1-P_i) + BOD && (55)
\end{aligned}
$$

This gives a contradiction, as $M_i(1-P_i) \geq 0$ and $POS \leq BOD$ by definition.

CLAIM 3: If the improvement in precision increases, the number of samples satisfying both the rule body and head must increase.

Suppose BWOC that the improvement in precision increases while $POS$ does not. Every sample that satisfies both the body and the head also satisfies the body, so $POS$ cannot increase without $BOD$ increasing. If neither changes, the improvement is unchanged. The only remaining case is that $BOD$ increases (by 1) while $POS$ stays fixed, so the following must be true.

$$\frac{POS - P_i\,BOD}{M_i + BOD} < \frac{POS - P_i\,BOD - P_i}{M_i + BOD + 1} \qquad (56)$$

However, this is a contradiction. Both denominators are positive, so cross-multiplying (56) gives $POS - P_i\,BOD < -P_i(M_i + BOD)$, that is, $POS < -P_i M_i$. Since $P_i M_i = TP_i$ and both $POS$ and $TP_i$ are nonnegative, this is impossible.
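Claims 2 and 3 can also be checked over a grid of counts using the rewritten gain (52). Adding a sample that satisfies both the body and the head should raise the gain, while adding a body-only sample should lower it. The base-model counts below are hypothetical.

```python
from fractions import Fraction


def gain(pos, bod, tp, m):
    """Improvement in precision, Eq. (52): (POS - P_i*BOD) / (M_i + BOD)."""
    p = Fraction(tp, m)
    return (pos - p * bod) / (m + bod)


# Hypothetical base model: M_i = 200 predictions of class i, TP_i = 150 correct.
TP, M = 150, 200

# Claim 2: a new body-and-head sample (POS and BOD both +1) raises the gain.
claim2 = all(
    gain(pos + 1, bod + 1, TP, M) > gain(pos, bod, TP, M)
    for bod in range(0, 50)
    for pos in range(0, bod + 1)
)
# Claim 3 (contrapositive view): a body-only sample (BOD +1, POS fixed) lowers it.
claim3 = all(
    gain(pos, bod + 1, TP, M) < gain(pos, bod, TP, M)
    for bod in range(0, 50)
    for pos in range(0, bod + 1)
)
print(claim2, claim3)  # True True
```

The strict increase in Claim 2 relies on $M_i(1-P_i) + BOD - POS > 0$. When $P_i = 1$ and $POS = BOD$, the gain stays unchanged rather than increasing, so the grid above uses $0 < P_i < 1$.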

CLAIM 4: Precision increases if and only if $c$ increases.

Follows directly from Claims 1-3, together with Claim 5, which relates $c$ to $POS$.

CLAIM 5: When adding more samples that satisfy the body of the rule, confidence increases if and only if $POS$ increases.

Note that confidence is defined as $POS/BOD$. Clearly, confidence decreases if $BOD$ increases but $POS$ does not, and it is not possible for $POS$ to increase alone. Now suppose, BWOC, that $POS$ and $BOD$ both increase by some $k > 0$ but confidence decreases. Then the following must hold.

$$
\begin{aligned}
\frac{POS + k}{BOD + k} &< \frac{POS}{BOD} && (57)\\
BOD\cdot k &< POS\cdot k && (58)\\
BOD &< POS && (59)
\end{aligned}
$$

This is a contradiction, as $BOD \geq POS$.

Going the other way, suppose BWOC that confidence increases but $POS$ does not. We get:

$$
\begin{aligned}
c_2 &> c_1 && (60)\\
\frac{POS}{BOD_2} &> \frac{POS}{BOD_1} && (61)\\
BOD_1 &> BOD_2 && (62)
\end{aligned}
$$

However, since we only add samples that satisfy the body of the rule, we must have $BOD_1 \leq BOD_2$. Hence a contradiction.

CLAIM 6: Recall increases if and only if $POS$ increases.

The new recall can be written as follows. Since the denominator $TP_i + FN_i$ does not depend on the rule, the claim follows immediately.

$$\frac{TP_i + POS}{TP_i + FN_i} \qquad (63)$$

CLAIM 7: Recall increases if and only if $c$ increases.

Follows directly from claims 5-6.

Proof of theorem.

Follows directly from claims 4 and 7. ∎

Proof of Corollary 4

GreedyRuleSelect provides an approximation of $cs$ that is within $1/|C|$ of optimal.

Proof.

Follows directly from Theorem 4.7 of (Iyer & Bilmes 2013). ∎

Proof of Corollary 7

For an arbitrarily small constant $\epsilon$, DetUSMPosRuleSelect provides a $1/3+\epsilon$ approximation of confidence if the returned confidence is greater than the initial precision.

Proof.

Follows directly from the fact that confidence is zero when $CC_i = \emptyset$ and Theorem 2.3 of (Buchbinder et al. 2012). ∎
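Corollary 7 relies on the deterministic algorithm for unconstrained submodular maximization of Buchbinder et al. (2012). The sketch below is a generic version of that deterministic double-greedy procedure, not the paper's DetUSMPosRuleSelect. The toy graph-cut objective is a hypothetical stand-in for the rule-selection objective, chosen because cut functions are a standard example of nonnegative submodular functions.

```python
def deterministic_double_greedy(ground, f):
    """Deterministic double greedy for unconstrained submodular maximization.

    Maintains a lower set X (initially empty) and an upper set Y (initially the
    whole ground set). Each element is either added to X or removed from Y,
    whichever has the larger marginal gain; after the pass, X == Y.
    """
    x, y = set(), set(ground)
    for u in ground:
        gain_add = f(x | {u}) - f(x)  # benefit of adding u to the lower set
        gain_remove = f(y - {u}) - f(y)  # benefit of dropping u from the upper set
        if gain_add >= gain_remove:
            x.add(u)
        else:
            y.discard(u)
    return x


# Hypothetical objective: cut value of a small undirected graph.
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]


def cut(s):
    """Number of edges with exactly one endpoint in s."""
    return sum((u in s) != (v in s) for u, v in EDGES)


chosen = deterministic_double_greedy(["a", "b", "c", "d"], cut)
print(sorted(chosen), cut(chosen))  # ['a', 'c'] 4
```

For a nonnegative submodular objective, this deterministic variant is guaranteed to return a set whose value is at least one third of the optimum. On this small graph it happens to find the maximum cut of 4.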

Conditions for Error Detection and Correction

This section describes the various methods we used to create conditions (set $C$) in detail, with examples.

Model-based. In this study, we employed multiple models, denoted $M$, each corresponding to a specific class. These models were built with our LRCNa architecture, as detailed in this paper, but each model $M$ was trained to perform binary classification. For example, for the drive class we divided the training data $\mathcal{T}$ into two datasets. The first contains only samples labeled drive. The second contains samples labeled walk, bike, bus, or train, which together form the non_drive class. We use these binary classifiers to establish a set of conditions $C$.
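The one-vs-rest relabeling described above can be sketched as follows. The helper name and toy labels are hypothetical, and the binary LRCNa models themselves are not shown. Only the class names come from the text.

```python
from typing import List

CLASSES = ["walk", "bike", "bus", "drive", "train"]


def one_vs_rest_labels(labels: List[str], target: str) -> List[str]:
    """Relabel multiclass labels for a binary `target` vs. `non_<target>` model."""
    return [target if y == target else f"non_{target}" for y in labels]


# Hypothetical training labels.
train_labels = ["drive", "walk", "bus", "drive", "train", "bike"]
binary = {cls: one_vs_rest_labels(train_labels, cls) for cls in CLASSES}
print(binary["drive"])
# ['drive', 'non_drive', 'non_drive', 'drive', 'non_drive', 'non_drive']
```

Each relabeled dataset would train one binary model. That model's output can then serve as a condition in $C$.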

In deep learning, the rapid evolution of models makes it challenging to choose the best solution for a given problem. It is common practice to discard older SOTA models in favor of newer ones. This paper instead introduces an approach that leverages older, proven models to enhance the performance of the latest SOTA models.

In classification problems, a threshold of 0.5 is conventionally used to produce final predictions. As threshold curves commonly show, precision generally increases as the threshold is raised. Consequently, a higher threshold can be applied to older state-of-the-art models to make their predictions more reliable.

As an illustrative example, suppose that at a threshold of 0.5 the precision of an older model's positive predictions is approximately 0.65, and that raising the threshold to 0.9 increases it to approximately 0.8. If a new state-of-the-art model has precision below 0.8 at the 0.5 threshold, we recommend adopting the 0.9 threshold from the prior model. Samples scored above 0.9 are then accepted as positives, while those below 0.9 are treated as unknown predictions. For the latter, the state-of-the-art model is used for prediction.
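The high-threshold combination described above can be sketched as a simple deferral rule. The function name and argument layout are hypothetical; only the 0.9 threshold and the "accept above, defer below" behavior come from the text.

```python
def combine_with_threshold(binary_prob: float, target: str, sota_prediction: str,
                           threshold: float = 0.9) -> str:
    """Accept the older binary model's positive call only above a high threshold.

    Scores above `threshold` are taken as positives for `target`. Any other score
    is treated as an unknown prediction and deferred to the state-of-the-art model.
    """
    if binary_prob > threshold:
        return target
    return sota_prediction


print(combine_with_threshold(0.95, "drive", "bus"))  # drive
print(combine_with_threshold(0.70, "drive", "bus"))  # bus
```

The strict `>` mirrors "values predicted above 0.9". Whether a score of exactly 0.9 should be accepted is not specified in the text.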

Similar principles apply in the opposite direction, using the false positive rate curve and lowering the threshold. Samples scored below a sufficiently low threshold are reliably negative predictions, which offers a basis for refining predictions. Although this methodology was designed for binary classification, it can also be adapted to enhance multiclass predictions.

Domain knowledge. Leveraging domain knowledge about outliers, we focused on the maximum velocity values in our dataset. Notably, the highest speeds were associated with the drive label. To ensure consistent comparisons across the dataset, we normalized speeds by the maximum speed observed in the drive data. After normalization, the highest velocity in our dataset is 1, associated with the drive label. The train label follows closely, with a maximum velocity of 0.751.

In our datasets, any sample whose speed exceeds the maximum speed recorded for train (0.751) is unambiguously classified as drive. More generally, we apply the following condition. Suppose a sample's maximum speed is 0.73. This falls below the maximum speeds of both the train class (0.751) and the drive class (1), but exceeds those of all other classes, so the sample is likely either drive or train. We then compare its multiclass prediction values, and the class with the higher prediction value determines the final classification.
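The maximum-speed condition can be sketched as follows. The function name and score dictionary are hypothetical. The largest maximum speed among the remaining classes (walk, bike, bus) is not reported in the text, so `other_classes_max` is a hypothetical input, set to 0.5 purely for illustration.

```python
from typing import Dict, Optional

TRAIN_MAX_SPEED = 0.751  # normalized maximum speed of the train class (from the text)


def speed_condition(max_speed: float, scores: Dict[str, float],
                    other_classes_max: float) -> Optional[str]:
    """Apply the maximum-speed condition to one sample.

    `other_classes_max` is the largest normalized maximum speed among the
    remaining classes; its value is a hypothetical input. Returns None when
    the condition does not fire.
    """
    if max_speed > TRAIN_MAX_SPEED:
        return "drive"  # faster than any train sample: only drive is plausible
    if max_speed > other_classes_max:
        # Only drive or train are plausible: pick the higher multiclass score.
        return max(("drive", "train"), key=lambda cls: scores[cls])
    return None  # condition does not apply; keep the base model's prediction


scores = {"walk": 0.05, "bike": 0.05, "bus": 0.25, "drive": 0.25, "train": 0.40}
print(speed_condition(0.80, scores, other_classes_max=0.5))  # drive
print(speed_condition(0.73, scores, other_classes_max=0.5))  # train
print(speed_condition(0.30, scores, other_classes_max=0.5))  # None
```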
