Data imputation by statistical modeling methods
Abstract
Article received: 15.04.2023
Eliminating gaps in data, i.e. imputation, is one of the tasks of data preprocessing. This paper proposes gap-filling algorithms based on the method of statistical simulation. The algorithms comprise four stages: clustering the data by a set of features; assigning the object that has a gap to one of the clusters; constructing, for each cluster, the distribution function of the feature that has gaps; and recovering the missing values by the inverse function method. Computational experiments were carried out on statistical data on socio-economic indicators for the constituent entities of the Russian Federation for 2022. The properties of the proposed imputation algorithms are analyzed in comparison with known methods, and the efficiency of the proposed algorithms is demonstrated.
Keywords: imputation algorithm, data gaps, statistical modeling, inverse function method, data simulation
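The four stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which clustering or classification methods are used, so k-means with farthest-first initialization and nearest-centroid assignment are assumptions made here for illustration. The distribution function is approximated by the empirical distribution of the observed values in each cluster, and its inverse by `np.quantile`. The names `kmeans` and `impute_inverse_cdf`, and all parameters, are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50):
    """Plain Lloyd's k-means (assumed clustering method, not from the paper)."""
    # Farthest-first initialization: deterministic, spreads centroids apart.
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids, dtype=float)

    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old centroid if a cluster happens to be empty.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new

    # Final assignment consistent with the returned centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)


def impute_inverse_cdf(data, target_col, k=2, seed=0):
    """Fill NaNs in one column by sampling from per-cluster empirical CDFs."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    out = data.copy()
    feature_cols = [c for c in range(data.shape[1]) if c != target_col]
    missing = np.isnan(data[:, target_col])
    complete = data[~missing]

    # Stage 1: cluster the complete rows on the other (fully observed) features.
    centroids, labels = kmeans(complete[:, feature_cols], k)

    for i in np.flatnonzero(missing):
        # Stage 2: classify the row with a gap by its nearest centroid.
        j = np.linalg.norm(centroids - data[i, feature_cols], axis=1).argmin()
        # Stage 3: empirical distribution of the target feature in that cluster.
        observed = complete[labels == j, target_col]
        if observed.size == 0:  # fall back to all observed values
            observed = complete[:, target_col]
        # Stage 4: inverse function method, F^{-1}(u) with u ~ U(0, 1).
        u = rng.uniform()
        out[i, target_col] = np.quantile(observed, u)
    return out
```

The sketch assumes that only the target column contains gaps and that the remaining features are on comparable scales; in practice the features would likely need standardizing before clustering. Because the imputed value is drawn at random rather than set to a cluster mean, repeated runs with different seeds give different fills whose spread reflects the within-cluster distribution of the feature.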