To the question of optimal division of the full sample during machine learning
Abstract
The article is devoted to modern approaches to dividing a data set into training, control and test samples, as used in machine learning for forecasting purposes. It considers the topical issue of choosing the optimal division of the entire available data set into these samples. The author analyses the results produced by a software algorithm developed to find the optimal division of a data set into isolated samples for training predictive models. The author recommends separating 80 % of the total sample to minimise the forecast error of the developed models.
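The three-way division described in the abstract can be illustrated with a minimal sketch. It is not the author's algorithm: the function name `split_sample` and its parameters are hypothetical, the observations are assumed to be ordered in time, and assigning the 80 % share to the training sample (with the remainder split evenly between control and test) is an assumption made only for illustration.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_sample(
    data: Sequence[T],
    train_share: float = 0.8,    # assumption: the 80 % share goes to training
    control_share: float = 0.1,  # assumption: the rest is split evenly
) -> Tuple[List[T], List[T], List[T]]:
    """Split ordered observations into training, control and test samples.

    The split is chronological (no shuffling), so the test sample always
    contains the most recent observations.
    """
    if not 0 < train_share < 1 or not 0 <= control_share < 1:
        raise ValueError("shares must lie between 0 and 1")
    if train_share + control_share >= 1:
        raise ValueError("no observations would be left for the test sample")

    n = len(data)
    train_end = round(n * train_share)
    control_end = train_end + round(n * control_share)

    items = list(data)
    return items[:train_end], items[train_end:control_end], items[control_end:]


# Hypothetical series of 100 observations.
series = list(range(100))
train, control, test = split_sample(series)
print(len(train), len(control), len(test))
```

Keeping the three samples contiguous and non-overlapping matches the "isolated samples" wording of the abstract; a search for the optimal division could call such a function repeatedly with different values of `train_share` and compare the resulting forecast errors.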
Copyright (c) 2021 Journal of the Belarusian State University. Economics

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.