Document Type: Review Article
Authors
Civil Engineering Department, School of Engineering, Urmia University, Urmia, Iran
Abstract
Exposure to fine particulate matter (PM₂.₅) has a significant impact on public health, particularly in regions where annual average PM₂.₅ levels exceed the World Health Organization (WHO) guidelines. Previous studies indicate that, in Iran, elevated PM₂.₅ levels contribute substantially to adult mortality. However, ground-based PM₂.₅ monitoring stations have limited spatial coverage and intermittent data gaps, which hinders effective air quality management.
Remote sensing products, such as Aerosol Optical Depth (AOD) retrieved from the Moderate Resolution Imaging Spectroradiometer (MODIS) sensors, offer a promising alternative for estimating fine particulate matter. This study reviews previous research that uses machine learning algorithms to predict ground-level PM₂.₅ concentrations from AOD data. A structured analysis of 127 selected studies shows that the relationship between AOD and PM₂.₅ varies widely. The coefficient of determination (R²) between ground PM₂.₅ concentrations and AOD ranges from 0.48 to 0.99 (48 to 99%). This relationship is influenced by auxiliary variables such as meteorological conditions and other environmental factors.
Integrating these variables improves prediction accuracy, although it can increase model complexity and the risk of errors such as overfitting. Hybrid machine learning models outperform individual algorithms because of their adaptability, capacity for parallel processing, and ability to handle missing data. Despite these advances, challenges persist because of data uncertainty and changing meteorological conditions.
In conclusion, machine learning offers robust tools for estimating PM₂.₅ from AOD data. Even so, ongoing research is essential to address the remaining limitations and to optimize model performance under variable environmental conditions.
Extended Abstract
Introduction
Air pollution, a consequence of industrialization, climate change, and increased fossil fuel use, has emerged as a critical environmental concern, especially in urban areas. Globally, air pollution is recognized as one of the leading environmental health risks and contributes significantly to premature mortality and morbidity. Iran ranks particularly high in annual deaths linked to air pollution. Recent data rank air pollution as the eighth leading risk factor for mortality worldwide and the seventh in Iran, underlining its severe public health implications. Its health effects range from respiratory and cardiovascular diseases to a heightened risk of cancer. Emerging evidence also links air pollution to psychological and cognitive effects, including anxiety, depression, reduced cognitive performance, and even increased criminal tendencies.
Among air pollutants, fine particulate matter (PM₂.₅) is especially concerning because it can penetrate deep into the respiratory system and enter the bloodstream, causing widespread systemic harm. PM₂.₅ consists of particles with aerodynamic diameters of 2.5 micrometres (μm) or less. It is associated with increased risks of heart attacks, strokes, and chronic lung diseases. In Iran, approximately 75,000 deaths annually are attributed to PM₂.₅ exposure. The country's average population-weighted PM₂.₅ concentration is 48 μg/m³. This is nearly five times the World Health Organization's 2005 annual guideline of 10 μg/m³ and almost ten times the 5 μg/m³ guideline adopted in its 2021 update. These figures underscore the need for effective air quality management strategies, including accurate and consistent monitoring of PM₂.₅ concentrations.
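A population-weighted concentration such as the national figure above weights each area's PM₂.₅ by the number of people living there, so densely populated, polluted areas count more. The short Python sketch below assumes the standard definition C_pw = Σ Pᵢ Cᵢ / Σ Pᵢ over grid cells or districts i. The function name and the three-district numbers are hypothetical illustrations, not the data behind the Iranian estimate.

```python
def population_weighted_mean(concentrations, populations):
    """Return sum(P_i * C_i) / sum(P_i) for paired areas (grid cells or districts)."""
    if len(concentrations) != len(populations):
        raise ValueError("concentrations and populations must have the same length")
    total_pop = sum(populations)
    if total_pop <= 0:
        raise ValueError("total population must be positive")
    return sum(c * p for c, p in zip(concentrations, populations)) / total_pop


# Hypothetical example: three districts (illustrative values, not measured data)
pm25 = [60.0, 40.0, 20.0]               # annual mean PM2.5 in each district, ug/m3
people = [2_000_000, 1_000_000, 1_000_000]
pw = population_weighted_mean(pm25, people)
print(pw)  # 45.0
```

The unweighted mean of these three districts would be 40 μg/m³. Weighting by population raises it to 45 μg/m³ because the most polluted district is also the most populous.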
Conventional ground-based air quality monitoring stations are valuable for tracking PM₂.₅ levels, but their spatial coverage is sparse, especially in smaller cities and rural regions. To complement them, remote sensing technologies have gained prominence because they provide much broader spatial coverage. Aerosol Optical Depth (AOD), derived from satellite observations, is a key parameter for assessing atmospheric aerosols and estimating PM₂.₅ concentrations. Sensors such as the Moderate Resolution Imaging Spectroradiometer (MODIS) have been instrumental in providing AOD data through retrieval algorithms such as Dark Target (DT) and Deep Blue (DB). DT is designed for dark, vegetated surfaces and DB for bright surfaces such as deserts. However, traditional approaches for converting AOD into PM₂.₅ concentrations, such as simple linear regression, often lack accuracy and adaptability. This motivates the use of more advanced methods such as machine learning.
The present paper followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines to select and evaluate relevant studies and to summarize the findings of previous research. Compared with less structured literature reviews, this approach offers transparent and reproducible study selection, reduced selection bias, and a comprehensive, well-documented account of previous findings. It is particularly well suited to reviews of studies based on quantitative data. The analysis focused on three main axes:
Identifying the factors that significantly influence whether a statistically meaningful correlation can be established between satellite-derived AOD and ground-based PM₂.₅ measurements;
Assessing how successfully previous studies established such correlations, based on statistical indicators; and
Examining the machine learning algorithms employed in these studies.
Material and Methods
This study systematically reviews machine learning algorithms used for estimating PM₂.₅ concentrations based on AOD data, focusing on their performance, scalability, and adaptability across diverse environmental settings. The primary objectives are to identify the strengths and limitations of existing methodologies, highlight gaps in current research, and propose avenues for improvement.
To ensure a comprehensive analysis, the study employed a systematic review and meta-analytical approach based on the PRISMA guidelines. Reputable scientific databases, including PubMed, Google Scholar, and ScienceDirect, were queried using keywords (Fig. 1) such as "PM₂.₅ estimation," "Aerosol Optical Depth," and "machine learning." An initial pool of 977 documents was narrowed down to 127 highly relevant articles through a rigorous screening process (Fig. 2). The extracted parameters included the correlation between AOD and PM₂.₅, the machine learning models employed, and performance metrics such as R², root mean square error (RMSE), and mean absolute error (MAE).
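The three performance metrics extracted from the studies can be computed as in the following minimal Python sketch. The function names and sample values are illustrative assumptions. R² is implemented here as 1 − SS_res/SS_tot. This coincides with the squared Pearson correlation only for an in-sample least-squares fit with an intercept. R² values reported from cross-validation, or from a simple correlation, may therefore not be directly comparable across studies.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the units of the data (e.g. ug/m3)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def mae(observed, predicted):
    """Mean absolute error, in the units of the data (e.g. ug/m3)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical ground PM2.5 observations and model estimates (ug/m3)
obs = [20.0, 35.0, 50.0, 65.0]
pred = [22.0, 33.0, 52.0, 63.0]
print(f"R2={r_squared(obs, pred):.3f} RMSE={rmse(obs, pred):.1f} MAE={mae(obs, pred):.1f}")
# R2=0.986 RMSE=2.0 MAE=2.0
```

With a constant error of ±2 μg/m³, RMSE and MAE are both 2.0 μg/m³. The two metrics diverge when a few large errors dominate, because RMSE squares them.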
Results and Discussion
Correlation between AOD and PM₂.₅
Numerous studies report a positive, often strong correlation between AOD values and ground-level PM₂.₅ concentrations. The strength of this relationship varies with geographical, meteorological, and land-use factors (Fig. 3 and Table 1). Part of this variability arises because AOD describes light extinction by aerosols in the entire atmospheric column, whereas PM₂.₅ is the mass concentration of particles near the ground. Meteorological conditions such as temperature, relative humidity, and wind speed significantly influence the correlation by affecting aerosol properties and atmospheric dispersion. For example, particles grow by absorbing water at high humidity. Additionally, land-use characteristics such as urban density, vegetation cover, and proximity to industrial zones modulate local PM₂.₅ levels.
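One widely used semi-empirical correction illustrates why humidity and mixing height matter. It is presented here as an assumption, not necessarily the formulation used in the reviewed studies. It relates column AOD to surface PM₂.₅ as shown below. Here k is a site-specific empirical constant, H_PBL is the planetary boundary layer height, RH is relative humidity expressed as a fraction, and γ is an empirical hygroscopic growth exponent:

```latex
\mathrm{PM}_{2.5} \approx k \, \frac{\mathrm{AOD}}{H_{\mathrm{PBL}} \, f(\mathrm{RH})},
\qquad
f(\mathrm{RH}) = \left(1 - \mathrm{RH}\right)^{-\gamma}
```

Dividing AOD by H_PBL approximates the extinction near the surface, assuming aerosols are well mixed within the boundary layer. Dividing by f(RH) then removes the humidity-driven enhancement of extinction. Because f(RH) increases steeply as RH approaches 1, the same AOD corresponds to less dry particle mass on humid days. Machine learning models can learn such dependencies from data instead of fixing their functional form in advance.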
Incorporating these auxiliary variables (see Fig. 4) into predictive models has proven effective in improving the accuracy of PM₂.₅ estimates. For example, regression models that integrate meteorological and traffic data perform markedly better than models relying solely on AOD. However, including too many auxiliary variables can lead to overfitting, which reduces a model's ability to generalize to new locations and periods. Balancing model complexity against predictive accuracy therefore remains a key challenge in this domain.
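The trade-off described above can be illustrated with ordinary least squares on synthetic data. The Python sketch below is a hypothetical illustration, not a reproduction of any reviewed study. It assumes PM₂.₅ rises with AOD and, for a given AOD, falls with relative humidity (RH), since humid air inflates AOD through water uptake. It fits three models: AOD only, AOD plus RH, and AOD plus RH plus 35 irrelevant random "auxiliary" variables. It then compares in-sample R² with 5-fold cross-validated R². To stay dependency-free it solves the normal equations directly; real analyses would normally use a numerical library.

```python
import random


def solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) from the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(beta, rows):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in rows]


def r2(obs, pred):
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def cv_r2(rows, y, k=5):
    """R2 of out-of-fold predictions from k-fold cross-validation (fold = index mod k)."""
    preds = [0.0] * len(y)
    for fold in range(k):
        train = [i for i in range(len(y)) if i % k != fold]
        held_out = [i for i in range(len(y)) if i % k == fold]
        beta = fit_ols([rows[i] for i in train], [y[i] for i in train])
        for i, p in zip(held_out, predict(beta, [rows[i] for i in held_out])):
            preds[i] = p
    return r2(y, preds)


# Hypothetical synthetic data (not real measurements): PM2.5 from AOD and RH plus noise.
rng = random.Random(42)
n = 60
aod = [rng.uniform(0.1, 1.0) for _ in range(n)]
rh = [rng.uniform(0.2, 0.9) for _ in range(n)]
pm = [40 + 50 * a - 30 * h + rng.gauss(0, 3) for a, h in zip(aod, rh)]
noise = [[rng.gauss(0, 1) for _ in range(35)] for _ in range(n)]  # irrelevant predictors

feature_sets = {
    "AOD only": [[a] for a in aod],
    "AOD + RH": [[a, h] for a, h in zip(aod, rh)],
    "AOD + RH + 35 noise": [[a, h] + z for a, h, z in zip(aod, rh, noise)],
}
results = {}
for name, rows in feature_sets.items():
    in_sample = r2(pm, predict(fit_ols(rows, pm), rows))
    results[name] = (in_sample, cv_r2(rows, pm))
    print(f"{name:20s} in-sample R2 = {in_sample:.3f}  5-fold CV R2 = {results[name][1]:.3f}")
```

Adding RH should raise both scores. The 35 noise variables can only raise the in-sample R², since nested least-squares models never fit worse in-sample. Their cross-validated R², however, falls, which is the overfitting symptom described in the text.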
Machine Learning Algorithms for PM₂.₅ Estimation
Machine learning (ML) methods have substantially improved the estimation of PM₂.₅ concentrations from AOD data by capturing the complex, non-linear relationships between variables. Among these, ensemble learning models such as Random Forest (RF) and XGBoost have consistently outperformed traditional linear regression models. They handle high-dimensional data well and combine many decision trees in one of two ways. Bagging, as in RF, averages trees trained on bootstrap samples to reduce variance. Gradient boosting, as in XGBoost, adds trees sequentially to correct the residuals of earlier ones and so reduces bias. Deep learning techniques, including Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, have also shown promise, particularly for time-series prediction of PM₂.₅ levels.
Studies comparing ML algorithms highlight the superior performance of ensemble models, especially when these are combined with feature selection techniques that reduce redundancy in the input data. For instance, hybrid approaches integrating RF and XGBoost have achieved R² values above 0.9 in several case studies, indicating high predictive accuracy. Furthermore, advanced optimization techniques such as Bayesian optimization have improved the performance of these models by tuning their hyperparameters.
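The following toy sketch makes the bagging idea concrete. It grows shallow regression trees on bootstrap samples, lets each split consider only a random subset of the predictors, and averages the trees: the random-forest recipe in miniature. It is written in plain Python for self-containment. The tree depth, number of trees, synthetic AOD/RH/wind data, and response function are all hypothetical choices. It is not the implementation used in the reviewed studies; practical work would typically use a library class such as scikit-learn's RandomForestRegressor. Boosting methods such as XGBoost are not shown.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, y, depth, max_depth, n_feats, rng):
    """Grow a regression tree; each split considers a random subset of n_feats features."""
    leaf = sum(y) / len(y)
    if depth == max_depth or len(y) < 5:
        return leaf
    best = None  # (squared error, feature index, threshold)
    for f in rng.sample(range(len(rows[0])), n_feats):
        for t in sorted({r[f] for r in rows})[:-1]:  # excluding the max keeps both sides non-empty
            left = [v for r, v in zip(rows, y) if r[f] <= t]
            right = [v for r, v in zip(rows, y) if r[f] > t]
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, f, t)
    if best is None:  # all sampled features were constant in this node
        return leaf
    _, f, t = best
    left_idx = [i for i, r in enumerate(rows) if r[f] <= t]
    right_idx = [i for i, r in enumerate(rows) if r[f] > t]
    left_node = build_tree([rows[i] for i in left_idx], [y[i] for i in left_idx],
                           depth + 1, max_depth, n_feats, rng)
    right_node = build_tree([rows[i] for i in right_idx], [y[i] for i in right_idx],
                            depth + 1, max_depth, n_feats, rng)
    return (f, t, left_node, right_node)


def predict_tree(node, row):
    while isinstance(node, tuple):  # internal nodes are (feature, threshold, left, right)
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(rows, y, n_trees=20, max_depth=4, n_feats=2, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]  # bootstrap sample
        trees.append(build_tree([rows[i] for i in idx], [y[i] for i in idx],
                                0, max_depth, n_feats, rng))
    return trees


def predict_forest(trees, row):
    return sum(predict_tree(t, row) for t in trees) / len(trees)  # average over trees


# Hypothetical synthetic data: features are [AOD, RH, wind speed (m/s)].
data_rng = random.Random(7)


def make_sample():
    aod = data_rng.uniform(0.1, 1.0)
    rh = data_rng.uniform(0.2, 0.9)
    wind = data_rng.uniform(0.5, 6.0)
    # Non-linear toy response: humidity lowers PM2.5 for a given AOD, wind dilutes it.
    pm = 15 + 70 * aod * (1.5 - rh) / (1 + 0.2 * wind) + data_rng.gauss(0, 3)
    return [aod, rh, wind], pm


train_set = [make_sample() for _ in range(100)]
test_set = [make_sample() for _ in range(40)]
X_tr, y_tr = [x for x, _ in train_set], [v for _, v in train_set]
X_te, y_te = [x for x, _ in test_set], [v for _, v in test_set]

forest = fit_forest(X_tr, y_tr)
pred = [predict_forest(forest, x) for x in X_te]
rmse_forest = (sum((o - p) ** 2 for o, p in zip(y_te, pred)) / len(y_te)) ** 0.5
mean_tr = sum(y_tr) / len(y_tr)
rmse_mean = (sum((o - mean_tr) ** 2 for o in y_te) / len(y_te)) ** 0.5
print(f"test RMSE: forest {rmse_forest:.1f}, training-mean baseline {rmse_mean:.1f}")
```

Averaging many decorrelated trees smooths out the high variance of any single tree. This is the property credited to RF in the text, and one reason the reviewed studies favour ensembles over a single model.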
Limitations and Challenges
Despite these advances in ML-based PM₂.₅ estimation, several challenges persist. Model accuracy depends heavily on the quality of the input data. This quality can be compromised by cloud cover (satellite aerosol retrievals are not available under clouds), by sensor limitations, and by temporal mismatches between satellite overpasses and ground measurements. Variability in meteorological conditions also introduces uncertainties that are difficult to account for, particularly in regions with complex topography. Another limitation is the reliance on high-resolution satellite imagery, which is often expensive and not readily available for all regions. Addressing these challenges requires integrating data from multiple sources, including ground-based sensors, satellite datasets, and meteorological models. Robust data fusion techniques are essential for improving the scalability and reliability of PM₂.₅ estimation models.
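Temporal mismatch and cloud gaps are usually handled at the data-preparation stage. Each satellite retrieval is collocated with ground measurements taken close to the overpass time. The Python sketch below assumes a ±1 hour window and hourly station data in the same time zone as the overpass timestamps. It also assumes that cloud-masked retrievals are stored as None. These are illustrative choices, not a standard prescribed by the reviewed studies. The example timestamps loosely mimic the approximate local overpass times of the Terra (about 10:30) and Aqua (about 13:30) satellites that carry MODIS.

```python
from datetime import datetime, timedelta
from statistics import mean


def collocate(overpasses, hourly_pm, window_hours=1.0):
    """Pair each valid AOD retrieval with the mean ground PM2.5 within +/- window_hours.

    overpasses: list of (timestamp, aod); aod is None when no retrieval exists (e.g. cloud).
    hourly_pm: dict {timestamp: PM2.5 in ug/m3} for one station; missing hours are absent.
    """
    window = timedelta(hours=window_hours)
    pairs = []
    for t_sat, aod in overpasses:
        if aod is None:  # cloud-masked or failed retrieval: nothing to pair
            continue
        nearby = [v for t, v in hourly_pm.items() if abs(t - t_sat) <= window]
        if nearby:  # skip overpasses with no ground data inside the window
            pairs.append((t_sat, aod, mean(nearby)))
    return pairs


# Hypothetical overpasses and station records
overpasses = [
    (datetime(2023, 1, 10, 10, 30), 0.42),  # morning overpass, valid AOD
    (datetime(2023, 1, 10, 13, 30), None),  # afternoon overpass, cloud-masked
    (datetime(2023, 1, 11, 10, 30), 0.55),  # valid AOD but no ground data nearby
]
hourly_pm = {
    datetime(2023, 1, 10, 10, 0): 48.0,
    datetime(2023, 1, 10, 11, 0): 52.0,
    datetime(2023, 1, 10, 12, 0): 60.0,  # 1.5 h from the 10:30 overpass, outside the window
    datetime(2023, 1, 11, 8, 0): 30.0,
}
pairs = collocate(overpasses, hourly_pm)
print(pairs)  # one pair: the 10:30 overpass, AOD 0.42, mean PM2.5 50.0
```

The window width is a trade-off. A wider window keeps more pairs but admits more temporal mismatch, so it is worth reporting alongside model accuracy.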
Potential Applications and Future Directions
Future research should focus on developing hybrid algorithms that combine the strengths of multiple ML techniques, such as deep learning and ensemble learning. These algorithms should be capable of handling diverse datasets, enhancing temporal and spatial resolution, and addressing data uncertainty. Additionally, integrating long-term climate change scenarios into these models could provide more comprehensive insights into the dynamics of air pollution.
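One simple way to combine two models, for example an ensemble model and a deep learning model, is weighted averaging (blending). The weight is chosen on a validation set. The Python sketch below is a hypothetical illustration of that idea only, and the prediction values are invented to show the mechanics. Published hybrid models are often more elaborate, for instance stacking with a trained meta-model.

```python
def blend_weight(y_val, pred_a, pred_b, steps=100):
    """Grid-search w in [0, 1] minimising validation MSE of w*pred_a + (1 - w)*pred_b."""
    best_w, best_mse = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        mse = sum((y - (w * a + (1 - w) * b)) ** 2
                  for y, a, b in zip(y_val, pred_a, pred_b)) / len(y_val)
        if mse < best_mse:
            best_w, best_mse = w, mse
    return best_w, best_mse


# Invented validation targets and predictions (ug/m3), for illustration only.
y_val = [30.0, 40.0, 50.0, 60.0]
pred_ensemble = [32.0, 42.0, 52.0, 62.0]  # hypothetical tree-ensemble output, biased +2
pred_deep = [28.0, 38.0, 48.0, 58.0]      # hypothetical deep-model output, biased -2
w, mse = blend_weight(y_val, pred_ensemble, pred_deep)
print(w, mse)  # 0.5 0.0
blended = [w * a + (1 - w) * b for a, b in zip(pred_ensemble, pred_deep)]
```

The two invented predictions err by +2 and −2 μg/m³, so an equal-weight blend cancels both biases and reaches zero validation error. With real models, the optimal weight depends on the size of each model's errors and on how strongly those errors are correlated.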
Conclusion
This study underscores the utility of AOD as a proxy for estimating PM₂.₅ concentrations and highlights the potential of machine learning to improve the accuracy and scalability of air quality monitoring (Figs. 5 and 6). Ensemble learning models, particularly hybrid approaches, offer significant advantages in capturing the complex interactions between AOD and PM₂.₅. However, challenges related to data quality, meteorological variability, and scalability must be addressed to realize the full potential of these methods.