
Numerical Data Imputation with Multimodal Datasets – A nearest-neighbour density approach
Florian LALANDE (Okinawa)
Numerical data imputation algorithms replace missing values with estimates so that incomplete datasets can be used more widely. Current imputation methods seek to minimize the error between the unobserved ground truth and the imputed values. We will see how this strategy can create artifacts that lead to poor imputation in the presence of multimodal distributions. To tackle this problem, we introduce the kNNxKDE algorithm: a hybrid method tailored for numerical data imputation that uses nearest neighbours (kNN) for conditional density estimation with Gaussian kernels (KDE). We qualitatively and quantitatively show that this method preserves the original data structu
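
To illustrate the idea, here is a minimal sketch and not the kNNxKDE implementation presented in the talk. It assumes a simplified variant: the k nearest complete rows are found on the observed columns, and imputation candidates are sampled from a Gaussian kernel mixture centred on those neighbours' values. The function name knn_kde_sample, its parameters (k, bandwidth, n_samples) and the toy dataset are hypothetical choices made for this example. The toy data is bimodal (y is close to -1 or +1 whatever the value of x), so an estimate that minimizes squared error would sit near the conditional mean, between the two modes, whereas samples from the estimated density stay on the modes.

```python
import numpy as np

def knn_kde_sample(X, row, col, k=10, bandwidth=0.05, n_samples=1, rng=None):
    """Draw imputation candidates for the missing cell X[row, col].

    Simplified sketch: hard k-nearest-neighbour selection followed by
    sampling from a Gaussian kernel mixture (all names are illustrative).
    """
    rng = np.random.default_rng() if rng is None else rng
    obs = ~np.isnan(X[row])  # columns observed in the target row
    # Donor rows: target column observed and no NaN in the columns used for distances.
    donors = ~np.isnan(X[:, col]) & ~np.isnan(X[:, obs]).any(axis=1)
    donor_idx = np.flatnonzero(donors)
    # Euclidean distance to every donor, computed on the observed columns only.
    dist = np.linalg.norm(X[donor_idx][:, obs] - X[row, obs], axis=1)
    nearest = donor_idx[np.argsort(dist)[:k]]
    # Sampling from a Gaussian KDE: pick a neighbour value, then add Gaussian noise.
    centres = rng.choice(X[nearest, col], size=n_samples)
    return centres + bandwidth * rng.standard_normal(n_samples)

# Toy bimodal dataset: y is close to -1 or +1, independently of x.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0.0, 1.0, size=n)
y = rng.choice([-1.0, 1.0], size=n) + 0.05 * rng.standard_normal(n)
X = np.column_stack([x, y])
X[0, 1] = np.nan  # hide one value of y

samples = knn_kde_sample(X, row=0, col=1, k=10, n_samples=200, rng=rng)
neighbour_mean = np.nanmean(X[:, 1])  # a mean-style point estimate, for comparison
print("mean-style estimate:", neighbour_mean)
print("smallest |sample|:", np.abs(samples).min())
```

Each draw lands near one of the two modes, while the mean-style estimate falls in the low-density region between them. Returning samples rather than a single point estimate is the design choice that lets the imputed values follow a multimodal conditional distribution.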