On outlier detection with the Chebyshev type inequalities
Abstract
This work considers outlier detection algorithms based on the Chebyshev inequality and compares them with classical methods: Tukey’s boxplot, the N-sigma rule, and its robust modifications based on the MAD (median absolute deviation) and FQ scale estimates. For the case of complete knowledge of the data distribution model, a procedure for selecting the parameters of the algorithms is proposed; for the case of incomplete knowledge, regions of suboptimal parameter values are determined. It is concluded that the direct use of the Chebyshev inequality leads to the classical N-sigma rule, whereas the non-classical Chebyshev inequality yields a robust outlier detection method that slightly outperforms the other algorithms considered.
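Why the direct use of the Chebyshev inequality leads to the N-sigma rule can be seen in one step: P(|X − μ| ≥ kσ) ≤ 1/k², so requiring a false-alarm probability of at most α gives k = 1/√α, and replacing μ and σ by the sample mean and standard deviation turns this into the rule |x − x̄| > N·s with N = 1/√α (for example, α = 0.05 gives N ≈ 4.47). The sketch below illustrates this together with the classical detectors named above. It is an illustration under our own assumptions, not the authors' implementation: the function names are hypothetical, and the MAD consistency constant 1.4826, Tukey's constant 1.5, and Python's default `statistics.quantiles` quartile convention are our choices. The FQ estimate and the non-classical Chebyshev inequality from the paper are not reproduced here.

```python
import math
import statistics

# Hypothetical helper names: a sketch of the classical detectors the abstract
# compares against, not the method proposed in the paper.


def chebyshev_k(alpha: float) -> float:
    """Return k with 1 / k**2 == alpha, so Chebyshev gives P(|X - mu| >= k*sigma) <= alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return 1.0 / math.sqrt(alpha)


def n_sigma_outliers(xs: list[float], n: float) -> list[float]:
    """Classical N-sigma rule: flag x with |x - mean| > n * sample SD."""
    m = statistics.mean(xs)
    s = statistics.stdev(xs)
    return [x for x in xs if abs(x - m) > n * s]


def mad_outliers(xs: list[float], n: float) -> list[float]:
    """Robust variant: median for location, normalized MAD for scale."""
    med = statistics.median(xs)
    mad = statistics.median(abs(x - med) for x in xs)
    scale = 1.4826 * mad  # assumed consistency factor for the normal model
    return [x for x in xs if abs(x - med) > n * scale]


def tukey_outliers(xs: list[float], c: float = 1.5) -> list[float]:
    """Tukey's boxplot rule: flag x outside [Q1 - c*IQR, Q3 + c*IQR]."""
    q1, _, q3 = statistics.quantiles(xs, n=4)  # default 'exclusive' quartiles
    iqr = q3 - q1
    lo, hi = q1 - c * iqr, q3 + c * iqr
    return [x for x in xs if x < lo or x > hi]


# Toy sample (made up for illustration) with one gross outlier.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 25.0]
k = chebyshev_k(0.05)  # about 4.47
print(n_sigma_outliers(data, k))  # -> []
print(mad_outliers(data, 3.0))    # -> [25.0]
print(tukey_outliers(data))       # -> [25.0]
```

In this toy sample the classical rule with N ≈ 4.47 flags nothing, because the outlier itself inflates the sample mean and standard deviation (the masking effect), while the median/MAD and boxplot rules flag 25.0. This is the usual motivation for the robust scale estimates the abstract mentions.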
Copyright (c) 2020 Journal of the Belarusian State University. Mathematics and Informatics
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.