Efficient use of geographical information systems for improving transport mode classification


Transport mode classifiers are usually compared without accounting for class imbalance in the dataset. Under such imbalance, performance rates such as accuracy and precision are not sufficient to report the performance of a classifier, because each represents only a single cut-off point on the classifier's performance curve. We propose a rule-based method that combines the network elements associated with the transport mode to be identified with the elements associated with other means of transport. We compared the proposed method with another GPS/GIS-based method, using a representative real-world dataset with an imbalanced target class. We evaluated the performance of both methods in five experiments, using the area under the Receiver Operating Characteristic (ROC) curve as the metric. The results show that the tested methods achieve the same false positive rate. However, our method correctly identifies 84% of the true positive samples, which is the highest performance on our test data (collected in Belgium). The proposed method can be used as part of the post-processing chain for transport data to support transport and traffic analytics in smart cities.
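
To illustrate why a single cut-off rate can mislead on imbalanced data, the following minimal Python sketch contrasts accuracy at a fixed threshold with the area under the ROC curve. It is not the evaluation code used in the study: the function names `roc_auc` and `accuracy`, the threshold of 0.5, and the toy labels and scores are all hypothetical and chosen only for illustration. The AUC is computed through its rank interpretation, i.e., the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive sample is scored
    higher than a random negative sample (ties count as 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold):
    """Fraction of correct predictions at one fixed cut-off point."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: 2 positive samples among 10 (imbalanced target class).
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# Hypothetical classifier A: scores separate the two classes.
scores_a = [0.9, 0.8, 0.1, 0.2, 0.3, 0.1, 0.2, 0.3, 0.4, 0.1]

# Hypothetical classifier B: same low score for every sample,
# so it always predicts the majority (negative) class.
scores_b = [0.1] * 10

acc_a = accuracy(labels, scores_a, threshold=0.5)
acc_b = accuracy(labels, scores_b, threshold=0.5)
auc_a = roc_auc(labels, scores_a)
auc_b = roc_auc(labels, scores_b)

print(f"A: accuracy={acc_a:.2f}, AUC={auc_a:.2f}")
print(f"B: accuracy={acc_b:.2f}, AUC={auc_b:.2f}")
```

In this toy setting, classifier B reaches 80% accuracy without identifying a single positive sample, while its AUC of 0.5 shows that it ranks samples no better than chance. The pairwise computation is quadratic in the number of samples, so it is meant only as an illustration, not for large datasets.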

Data Analytics 2018: the seventh international conference on data analytics