Rural Environmental Registry, Data Mining, Unbalanced Data, Data Classification, SMOTE
Abstract
The Rural Environmental Registry (CAR) is a mandatory public electronic registry for all rural properties in Brazil. It integrates environmental information about each property and supports both the monitoring of properties and the fight against deforestation. However, a large number of registrations are filed incorrectly and produce inconsistent data, so they must be cancelled or a correction must be requested. Carrying out these checks manually is very expensive, because it requires a specialized workforce and Brazil has an immense number of rural properties. To address this problem, this work proposes an intelligent system based on machine learning that verifies and classifies CAR data quickly and effectively through classification models. For this purpose, several learning models were trained on real registration data. In addition, the SMOTE (Synthetic Minority Over-sampling Technique) method was used to handle the imbalance between classes. The classifiers were evaluated with classifier performance measures, and the methods were compared with one another. The results indicate that the approach has potential for future automated predictions.
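The abstract does not say which SMOTE implementation or which registration attributes were used, so the following is only a minimal sketch of the SMOTE idea. It assumes a binary problem, numeric feature vectors, and hypothetical toy data in which label 1 marks problematic registrations. Each synthetic sample is placed at a random point on the segment between a minority sample and one of its `k` nearest minority neighbours. The helper names `smote` and `balance_binary` are illustrative and are not part of the paper.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Create n_synthetic new minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base` by Euclidean distance
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance_binary(X, y, minority_label, k=5, seed=None):
    """Oversample the minority class until both classes have equal counts.

    Assumes exactly two classes (hypothetical simplification).
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    deficit = (len(y) - len(minority)) - len(minority)
    extra = smote(minority, max(0, deficit), k, seed)
    return X + extra, y + [minority_label] * len(extra)


# Hypothetical toy data: two numeric features per registration,
# label 1 = problematic registration (minority), 0 = consistent.
X = [[0.9, 0.8], [1.0, 1.1], [1.2, 0.9],
     [5.0, 5.1], [4.8, 5.3], [5.2, 4.9], [5.1, 5.0],
     [4.9, 4.7], [5.3, 5.2], [5.0, 4.8]]
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

X_bal, y_bal = balance_binary(X, y, minority_label=1, k=2, seed=42)
print(y_bal.count(0), y_bal.count(1))  # 7 7
```

Oversampling of this kind is normally applied only to the training split, after the train/test split, so that synthetic points derived from test samples cannot leak into the evaluation. In practice a maintained implementation such as `imblearn.over_sampling.SMOTE` (with its `fit_resample` method) would typically replace a hand-written version like this one.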