Development of Machine Learning Algorithms to Predict Urban Air Quality Index (Study Area: Tehran City)

Document Type : Research Article


1 MA, Faculty of Surveying and Geomatics Engineering, University of Tehran, Tehran, Iran

2 MA, Faculty of Surveying and Geomatics Engineering, University of Tehran, Tehran, Iran

3 Researcher, Young Researchers and Elite Club, Mashhad Branch of Islamic Azad University, Mashhad, Iran

4 MA, Faculty of Surveying Engineering, K. N. Toosi University of Technology, Tehran, Iran

5 Research Group of Drought and Climate Change, University of Birjand


Air pollution harms human health and the environment, so reducing it requires accurate knowledge of the pollutants, the criteria that affect them, and the location of polluted areas. Mathematical models in the form of machine learning offer an effective and cost-efficient approach to air pollution modeling. This research is applied in purpose and descriptive-analytical in method. Its novelty is a new combined approach for determining the effective criteria for predicting the amount of air pollution. Accordingly, the study evaluated and compared two machine learning models, Support Vector Machine (SVM) and Random Forest (RF), each in combination with a Genetic Algorithm (GA), for predicting air pollution in Tehran. The data comprise particulate matter and gaseous pollutant measurements for Tehran in 2020, obtained from the Tehran Traffic Control Company. MATLAB and ArcMap were used to analyze the data. The combined RF-GA method yielded a coefficient of determination (R2) of 0.997, indicating that the model fits the study data closely, and a Root Mean Square Error (RMSE) of 0.153, indicating high accuracy. For the data obtained from the Tehran Traffic Control Company, these results indicate that the RF method is an appropriate choice for estimating the amount of air pollution in Tehran.
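For readers unfamiliar with the two reported metrics, the following minimal Python sketch shows how R2 and RMSE are commonly computed from observed and predicted values. It is not the authors' MATLAB code. The function names and sample values are hypothetical, and R2 is assumed here to be defined as one minus the ratio of the residual to the total sum of squares; the study may have used a different definition.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical index values for illustration only; not data from the study.
observed = [50.0, 80.0, 120.0, 95.0]
predicted = [52.0, 79.0, 118.0, 96.0]

print("R2:", r_squared(observed, predicted))
print("RMSE:", rmse(observed, predicted))
```

An R2 close to 1 together with an RMSE that is small relative to the range of the target variable is what the reported values of 0.997 and 0.153 describe.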


