On outlier detection with the Chebyshev type inequalities

  • Michael A. Chepulis, Peter the Great St. Petersburg Polytechnic University, 29 Polytechnicheskaya Street, Saint Petersburg 195251, Russia https://orcid.org/0000-0001-7340-9323
  • Georgy L. Shevlyakov, Peter the Great St. Petersburg Polytechnic University, 29 Polytechnicheskaya Street, Saint Petersburg 195251, Russia

Abstract

This work considers algorithms for outlier detection based on the Chebyshev inequality and compares them with classical methods such as Tukey's boxplot, the N-sigma rule, and its robust modifications based on the MAD (median absolute deviation) and FQ scale estimates. To tune the parameters of the algorithms, a selection procedure based on complete knowledge of the data distribution model is proposed. Regions of suboptimal parameter values are also determined for the case of incomplete knowledge of the distribution model. It is concluded that direct use of the Chebyshev inequality yields the classical N-sigma rule, whereas a non-classical Chebyshev-type inequality leads to a robust outlier detection method that slightly outperforms the other algorithms considered.
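The classical detectors named in the abstract can be sketched in a few lines. This is a minimal illustration in Python using only the standard library `statistics` module; the function names (`sigma_rule`, `chebyshev_n`, `chebyshev_rule`, `mad_rule`, `tukey_boxplot`) are hypothetical and not taken from the article. The Chebyshev step assumes a plug-in reading in which the bound P(|X - μ| ≥ kσ) ≤ 1/k² is set equal to a chosen level α, giving k = 1/√α and therefore an ordinary N-sigma rule with estimated mean and standard deviation. The FQ scale estimate and the article's non-classical Chebyshev-type inequality are not reproduced, since their definitions are not given here.

```python
import statistics


def sigma_rule(xs, n_sigma=3.0):
    """Classical N-sigma rule: flag x if |x - mean| > n_sigma * (sample SD)."""
    m = statistics.fmean(xs)
    s = statistics.stdev(xs)
    return [abs(x - m) > n_sigma * s for x in xs]


def chebyshev_n(alpha):
    """Chebyshev: P(|X - mu| >= k*sigma) <= 1/k**2.
    Setting the bound equal to alpha gives k = 1/sqrt(alpha)."""
    return 1.0 / alpha ** 0.5


def chebyshev_rule(xs, alpha=0.05):
    """Hypothetical plug-in Chebyshev detector: it reduces to the
    N-sigma rule with N = 1/sqrt(alpha) and estimated mean and SD."""
    return sigma_rule(xs, chebyshev_n(alpha))


def mad_rule(xs, n_sigma=3.0):
    """Robust N-sigma variant: median for location, MAD for scale.
    The factor 1.4826 makes MAD consistent for sigma under normality."""
    med = statistics.median(xs)
    mad = 1.4826 * statistics.median([abs(x - med) for x in xs])
    return [abs(x - med) > n_sigma * mad for x in xs]


def tukey_boxplot(xs, k=1.5):
    """Tukey-style fences [Q1 - k*IQR, Q3 + k*IQR]. Quartiles come from
    statistics.quantiles (default 'exclusive' method), which may differ
    slightly from Tukey's original hinges."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [x < lo or x > hi for x in xs]


if __name__ == "__main__":
    xs = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 25.0]
    print("3-sigma:  ", sigma_rule(xs))
    print("Chebyshev:", chebyshev_rule(xs, alpha=0.05))
    print("MAD:      ", mad_rule(xs))
    print("boxplot:  ", tukey_boxplot(xs))
```

On this small sample the MAD and boxplot rules flag only 25.0, while the 3-sigma and plug-in Chebyshev rules flag nothing. This masking is unavoidable here: by Samuelson's inequality, no point of a sample of size n lies more than (n - 1)/sqrt(n) sample standard deviations from the sample mean, which is about 2.67 for n = 9.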

Author Biographies

Michael A. Chepulis, Peter the Great St. Petersburg Polytechnic University, 29 Polytechnicheskaya Street, Saint Petersburg 195251, Russia

master's degree student at the Department of Applied Mathematics and Mechanics, Higher School of Applied Mathematics and Computational Physics

Georgy L. Shevlyakov, Peter the Great St. Petersburg Polytechnic University, 29 Polytechnicheskaya Street, Saint Petersburg 195251, Russia

doctor of science (physics and mathematics), full professor; professor at the Department of Applied Mathematics and Mechanics, Higher School of Applied Mathematics and Computational Physics

References

  1. Tchebichef P. Des valeurs moyennes. Journal de Mathématiques Pures et Appliquées. 1867;12:177–184.
  2. Shevlyakov G, Kan M. Stream data preprocessing: outlier detection based on the Chebyshev inequality with applications. In: Proceedings of the 26th Conference of Open Innovations Association (FRUCT); 2020 April 20–24; Yaroslavl, Russia. [S. l.]: IEEE; 2020. p. 402–407. DOI: 10.23919/FRUCT48808.2020.9087459.
  3. Shevlyakov GL, Oja H. Robust correlation: theory and applications. [S. l.]: Wiley; 2016. 352 p. (Wiley series in probability and statistics). DOI: 10.1002/9781119264507.
  4. Andrea K. Metody i algoritmy razvedochnogo analiza dannykh, osnovannye na robastnykh modifikatsiyah boksplotov [Methods and algorithms for exploratory data analysis based on robust boxplot modification] [dissertation]. Saint Petersburg: Peter the Great St. Petersburg Polytechnic University; 2013. 164 p. Russian.
  5. Tukey JW. Exploratory data analysis. Reading, MA: Addison-Wesley; 1977. 711 p.
Published
2020-12-08
Keywords: anomaly, outlier detection, Chebyshev inequality, robustness
Supporting Agencies: This research is partially supported by the Russian Foundation for Basic Research (grant No. 18-29-03250).
How to Cite
Chepulis, M. A., & Shevlyakov, G. L. (2020). On outlier detection with the Chebyshev type inequalities. Journal of the Belarusian State University. Mathematics and Informatics, 3, 28-35. https://doi.org/10.33581/2520-6508-2020-3-28-35
Section
Probability Theory and Mathematical Statistics