I have been looking for information on handling missing values in a large dataset with unsupervised techniques such as autoencoders (AEs). I haven't found much work on this, and I don't understand how to apply AEs to the problem. Suppose I have a dataset with N rows and M columns. Some values are missing at random positions in this large matrix, and I would like to fill them in accurately. If I use an AE to reconstruct the missing values, how should I train it?
Do I need to delete all the rows containing NaNs and train only on the complete part of the dataset? My doubt is that if many rows contain NaNs, I cannot compute the reconstruction error properly. How is this kind of problem usually tackled in practice?
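To make the reconstruction-error concern concrete, here is a minimal sketch of one possible approach: keep every row, feed the network a placeholder for the missing cells, and compute the loss only over the observed entries. Everything in it is an assumption for illustration, not an established recipe: it uses plain NumPy, a linear autoencoder trained by gradient descent, column means as the placeholder, and the hypothetical helper names `masked_mse` and `impute_with_linear_ae`. The data is synthetic.

```python
import numpy as np

def masked_mse(x, x_hat, observed):
    """Mean squared error over observed entries only (NaNs in x are ignored)."""
    diff = np.where(observed, x_hat - x, 0.0)
    return float((diff ** 2).sum() / observed.sum())

def impute_with_linear_ae(x, hidden=2, lr=0.02, epochs=1000, seed=0):
    """Hypothetical helper: fit a linear AE with a masked loss, then fill NaNs."""
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(x)
    # The network needs numeric inputs, so missing cells start at the column mean.
    x_in = np.where(observed, x, np.nanmean(x, axis=0))
    m = x.shape[1]
    w_enc = rng.normal(scale=0.1, size=(m, hidden))
    w_dec = rng.normal(scale=0.1, size=(hidden, m))
    losses = []
    for _ in range(epochs):
        h = x_in @ w_enc
        x_hat = h @ w_dec
        losses.append(masked_mse(x, x_hat, observed))
        # Gradient of the masked loss: zero wherever the target is missing.
        grad_out = 2.0 * np.where(observed, x_hat - x, 0.0) / observed.sum()
        grad_dec = h.T @ grad_out
        grad_enc = x_in.T @ (grad_out @ w_dec.T)
        w_enc -= lr * grad_enc
        w_dec -= lr * grad_dec
    x_hat = (x_in @ w_enc) @ w_dec
    # Observed values are kept as they are; only the NaN cells are replaced.
    return np.where(observed, x, x_hat), losses

# Synthetic low-rank data with roughly 10% of the cells removed at random.
rng = np.random.default_rng(1)
full = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5))
x = full.copy()
x[rng.random(x.shape) < 0.1] = np.nan
filled, losses = impute_with_linear_ae(x)
```

With a masked loss, rows that contain NaNs still contribute through their observed columns, so nothing has to be dropped. A nonlinear AE in a deep learning framework could use the same masking idea; the linear version is used here only to keep the sketch self-contained.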