Missing data is a common challenge in air quality research. A recent study compared five methods (PMF, Random Forest, Denoising Autoencoder, MICE, and kNN) for imputing hourly PM2.5 concentrations across 25 Seoul districts, relying only on the PM2.5 records themselves and no external data.
Y. Kim et al. found that PMF achieved the highest imputation accuracy, outperforming both the machine learning and the traditional statistical approaches. Its strength lies in resolving latent factors that capture the spatial patterns of PM2.5, grouping districts that are influenced by similar pollution sources. This lets missing values be imputed reliably while also revealing the underlying spatial structure, an interpretive benefit the other four methods do not provide. PMF is therefore a powerful tool both for filling gaps and for understanding urban air pollution dynamics.
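To make the latent-factor idea concrete, here is a minimal sketch, assuming PMF denotes positive matrix factorization as it usually does in air quality work. It is a simplified stand-in, not the study's implementation. The data are arranged as an hours × districts matrix with NaN for gaps. A non-negative factorization `X ≈ G @ F` is fitted on the observed entries only, and the missing cells are read off the reconstruction. Real PMF weights every entry by its measurement uncertainty. This sketch instead gives observed entries weight 1 and missing entries weight 0, and it fits with Lee-Seung style multiplicative updates. The function name `nmf_impute` and its parameters are hypothetical.

```python
import numpy as np


def nmf_impute(X, n_factors=3, n_iter=500, seed=0, eps=1e-9):
    """Fill NaNs in a non-negative (hours x districts) matrix via masked NMF.

    Hypothetical, simplified stand-in for PMF: observed entries get weight 1,
    missing entries weight 0 (real PMF weights entries by uncertainty).
    Returns the filled matrix plus the factor contributions G and profiles F.
    """
    X = np.asarray(X, dtype=float)
    M = ~np.isnan(X)                   # True where a value was observed
    Xf = np.where(M, X, 0.0)           # zero out gaps; the mask ignores them
    rng = np.random.default_rng(seed)
    n_hours, n_districts = X.shape
    scale = np.sqrt(np.nanmean(X) / n_factors)
    G = rng.random((n_hours, n_factors)) * scale      # hourly factor strengths
    F = rng.random((n_factors, n_districts)) * scale  # district loadings

    for _ in range(n_iter):
        # Multiplicative updates for masked least squares; ratios of
        # non-negative terms keep G and F non-negative throughout.
        G *= ((M * Xf) @ F.T) / ((M * (G @ F)) @ F.T + eps)
        F *= (G.T @ (M * Xf)) / (G.T @ (M * (G @ F)) + eps)

    X_hat = G @ F
    # Keep observed values as-is; use the reconstruction only for gaps.
    return np.where(M, X, X_hat), G, F
```

Each column of `F` gives one district's loadings on the factors. A rough grouping of districts by dominant factor is therefore `F.argmax(axis=0)`, which mirrors the spatial interpretation described above. The number of factors is a modeling choice in this sketch and is not taken from the study.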
