Space semantic aware loss function for embedding creation in case of transaction data

  • Maksim Vatkin, Sber Bank, 6 Muliavina Boulevard, Minsk 220005, Belarus, https://orcid.org/0000-0002-6923-9998
  • Dmitry A. Vorobey, Sber Bank, 6 Muliavina Boulevard, Minsk 220005, Belarus

Abstract

Transaction data are the most common data type in the banking domain; they are often represented as sparse vectors with a large number of features. Using sparse vectors in deep learning tasks is computationally inefficient and may lead to overfitting. Autoencoders are widely applied to extract new useful features in a lower-dimensional space. In this paper we propose a novel loss function based on a metric that estimates how well the semantic structure of the original tabular data is mapped to the embedded space. The proposed loss function preserves the item relation structure of the original space during the dimensionality reduction transformation. The obtained results show an improvement in the resulting embedding properties when the new loss function is combined with the traditional mean squared error loss.
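The abstract does not reproduce the exact form of the proposed loss, so the following is only a minimal sketch of the general idea: a reconstruction (MSE) term combined with a structure-preservation term that penalises distortion of pairwise relations between items when they are mapped from the sparse original space into the embedding space. The autoencoder architecture, the functions `structure_loss` and `combined_loss`, and the weight `alpha` are illustrative assumptions, not the authors' formulation.

```python
# Illustrative sketch (PyTorch), assuming the semantic term compares
# normalised pairwise distance matrices in the original and embedded spaces.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyAutoencoder(nn.Module):
    def __init__(self, in_dim: int, emb_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, emb_dim))
        self.decoder = nn.Sequential(nn.Linear(emb_dim, 128), nn.ReLU(),
                                     nn.Linear(128, in_dim))

    def forward(self, x):
        z = self.encoder(x)          # embedding in the lower-dimensional space
        return self.decoder(z), z    # reconstruction and embedding

def structure_loss(x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    """Assumed stand-in for the space-semantic metric: penalise differences
    between scaled pairwise distance matrices of the batch in the original
    space (x) and in the embedding space (z)."""
    d_x = torch.cdist(x, x)
    d_z = torch.cdist(z, z)
    d_x = d_x / (d_x.max() + 1e-8)   # scale-invariant comparison
    d_z = d_z / (d_z.max() + 1e-8)
    return F.mse_loss(d_z, d_x)

def combined_loss(x, x_rec, z, alpha: float = 0.5) -> torch.Tensor:
    """Weighted sum of reconstruction MSE and the structure term."""
    return F.mse_loss(x_rec, x) + alpha * structure_loss(x, z)

# Usage on a random sparse-like batch (illustrative only).
model = ToyAutoencoder(in_dim=500, emb_dim=32)
x = (torch.rand(64, 500) < 0.05).float()          # sparse 0/1 features
x_rec, z = model(x)
loss = combined_loss(x, x_rec, z)
loss.backward()
```

In this sketch `alpha` trades off reconstruction quality against preservation of the item relation structure; the paper's actual metric and weighting scheme may differ.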

Author Biographies

Maksim Vatkin, Sber Bank, 6 Muliavina Boulevard, Minsk 220005, Belarus

chief data scientist

Dmitry A. Vorobey, Sber Bank, 6 Muliavina Boulevard, Minsk 220005, Belarus

data scientist

Published
2022-04-14
Keywords: data, embedding, vector, loss function, autoencoder
How to Cite
Vatkin, M., & Vorobey, D. A. (2022). Space semantic aware loss function for embedding creation in case of transaction data. Journal of the Belarusian State University. Mathematics and Informatics, 1, 97-102. https://doi.org/10.33581/2520-6508-2022-1-97-102