Assessing the Efficacy of Air Pollution Mitigation in Beijing: Insights from GIS and Machine Learning Analyses



Figure 1. My region of interest (ROI) consists of Beijing Municipality, Hebei Province, and Tianjin
Municipality. The Winter Olympics of 2022 took place in the circled area.

Conceptual Framework
Based on this question, I propose two hypotheses.
H0: If Beijing effectively treats pollution, we should observe a decrease in pollutant density in the surrounding areas across years.
H1: If Beijing shifts its pollution, we should observe a greater pollutant density in the surrounding areas.
I attempt to test these hypotheses by processing remote sensing data in Google Earth Engine (GEE) and visualizing the results in QGIS.

Data
The main pollutants in Beijing are sulfur dioxide (SO2), nitrogen dioxide (NO2), and particulate matter (e.g., PM2.5). To assess changes in pollutant density, I used Moderate Resolution Imaging Spectroradiometer (MODIS) and Sentinel-5P products. MODIS is a multifunctional NASA satellite instrument that has been in operation since 1999. One of its products is the aerosol optical depth (AOD), which measures how much direct sunlight aerosol particles prevent from reaching the ground. Many researchers have used the AOD to predict the PM2.5 concentration because the two are correlated (Filonchyk 2019). Using the AOD and several other indices, Xu et al. (2020) found that Beijing has a "southeast high and northwest low" PM2.5 distribution. Sentinel-5P is a European satellite that was launched in October 2017. It provides air quality-related remote sensing data, e.g., sulfur dioxide and nitrogen dioxide data.
In our machine learning analysis, we used satellite-derived PM2.5 data provided by the Atmospheric Composition Analysis Group at Washington University in St. Louis (Satellite-derived PM2.5 | Atmospheric Composition Analysis Group | Washington University in St. Louis, n.d.).

Methods
Because Sentinel-5P data are available only for recent years, I used "MODIS Terra & Aqua MAIAC Land Aerosol Optical Depth Daily 1 km" for my main analysis. I then used "Sentinel-5P OFFL NO2: Offline Nitrogen Dioxide" and "Sentinel-5P OFFL SO2: Offline Sulphur Dioxide" for a complementary analysis.

Main analysis methods
To assess the PM2.5 change over a long period, I chose 2013 as my starting year and 2019 as the ending year for a sharp contrast. To obtain this dataset, I created an image collection variable by importing the MODIS AOD data and selecting the "OD_055" band (green band (0.55 μm) aerosol optical depth over land). Then, I used date filters to create the 2013 and 2019 image collection variables. To reduce the noise caused by clouds, I applied a mean reducer to each collection and obtained two images. Finally, I subtracted the 2013 image from the 2019 image and exported the difference as a raster layer for my ROI geometry.¹
In QGIS, I first imported and selected the administrative boundary shapefile of my ROI. Then, I imported the 2019-2013 difference raster layer from GEE. Finally, I used a zonal statistics tool to obtain the mean change at the city and area levels.
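The compositing and differencing steps can be illustrated outside GEE with a minimal NumPy sketch. It assumes that cloud-masked pixels are stored as NaN and that a mean reducer skips them; the function name `mean_composite` and the tiny 2x2 arrays are hypothetical and synthetic, not the MODIS data used in this study.

```python
import numpy as np

def mean_composite(daily_stack):
    """Per-pixel mean over the time axis, ignoring masked (NaN) observations."""
    return np.nanmean(daily_stack, axis=0)

# Synthetic stacks of shape (days, rows, cols); NaN marks a cloud-masked pixel.
aod_2013 = np.array([
    [[0.8, 0.6], [np.nan, 0.4]],
    [[0.6, np.nan], [0.5, 0.4]],
    [[0.7, 0.6], [0.5, np.nan]],
])
aod_2019 = np.array([
    [[0.5, 0.4], [0.3, np.nan]],
    [[0.5, np.nan], [0.5, 0.3]],
    [[np.nan, 0.4], [0.4, 0.3]],
])

mean_2013 = mean_composite(aod_2013)
mean_2019 = mean_composite(aod_2019)

# Negative values mean that the AOD decreased between the two years.
change = mean_2019 - mean_2013
print(change)
```

Averaging over a full year smooths out days on which a pixel is missing, but it does not correct for seasonal differences in how often each pixel is observed.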
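What a zonal mean computes can be sketched in a few lines of NumPy. This is only an illustration under simplifying assumptions: the administrative boundaries are assumed to be already rasterized into an integer zone grid aligned with the difference raster, and `zonal_mean`, the zone ids, and the values are hypothetical. It is not the QGIS implementation.

```python
import numpy as np

def zonal_mean(values, zones):
    """Mean of `values` for each integer zone id, skipping NaN pixels."""
    result = {}
    for zone_id in np.unique(zones):
        in_zone = values[zones == zone_id]
        valid = in_zone[~np.isnan(in_zone)]
        result[int(zone_id)] = float(valid.mean()) if valid.size else float("nan")
    return result

# Synthetic AOD change raster and a matching grid of zone ids (e.g., 1 and 2
# standing in for two administrative areas).
change = np.array([[-0.2, -0.2, -0.1],
                   [-0.3, np.nan, -0.1]])
zones = np.array([[1, 1, 2],
                  [1, 2, 2]])

stats = zonal_mean(change, zones)
print(stats)
```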

Complementary analysis method
Using the Sentinel-5P NO2 and SO2 data, I first created separate annual column density image collections for 2018, 2019, and 2020. Using these image collections, I plotted a day-of-year series chart by year with the GEE UI charting tools. Finally, I computed one image for each year using a mean reducer and exported the images for zonal statistical analysis in QGIS.
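The structure behind a day-of-year chart with one series per year can be sketched in plain Python. The sketch assumes that the regional mean column density has already been reduced to one value per date; the helper names and the numbers are hypothetical and unitless.

```python
from collections import defaultdict
from datetime import date

def doy_series_by_year(records):
    """Group (date, value) pairs into {year: {day_of_year: value}}."""
    series = defaultdict(dict)
    for day, value in records:
        series[day.year][day.timetuple().tm_yday] = value
    return {year: dict(sorted(s.items())) for year, s in series.items()}

def annual_mean(series):
    """Average each year's values, analogous to a mean reducer over the year."""
    return {year: sum(s.values()) / len(s) for year, s in series.items()}

# Synthetic daily regional means.
records = [
    (date(2019, 1, 1), 4.0),
    (date(2019, 1, 2), 6.0),
    (date(2020, 1, 1), 3.0),
    (date(2020, 1, 2), 5.0),
]

series = doy_series_by_year(records)
means = annual_mean(series)
print(series)
print(means)
```

Plotting each year's dictionary against the day of year gives one line per year, which makes it easy to compare the same months across years.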

Machine learning analysis method
In addition to the remote sensing analysis, a machine learning approach was applied to predict and understand the dynamics of PM2.5 levels in Beijing. A logistic model was chosen because the PM2.5 concentrations in Beijing from 2013 to 2022 follow a clear S-shaped trend. Specifically, Beijing experienced severe air pollution during 2013 and 2014, and the PM2.5 concentration then decreased significantly from 2015 to 2020. After 2020, the level of PM2.5 in Beijing appears to have leveled off. The logistic model estimates the probability of the dependent variable based on influencing independent variables (Cui & Zhao, 2019). The formula for the simple logistic model is as follows:

$$f(x) = \frac{1}{1 + e^{-(x - \mu)/s}}$$

where μ is a location parameter and s is a scale parameter. Its visualization is shown below (L. Liu et al., 2010).
First, we defined a custom sigmoid function based on the logistic function. The location and scale parameters must be roughly estimated before the function is passed to machine learning libraries; we obtained this rough estimate through visualization. Second, we normalized the data by dividing both the Year and the adjusted PM2.5 by their respective maximum values, ensuring that they fall between 0 and 1. Normalization is a widely acknowledged practice for enhancing the performance and stability of machine learning models (Cabello-Solorzano et al., 2023). Third, we used scipy.optimize.curve_fit to fit our sigmoid function by nonlinear least squares and obtain the best location and scale parameters.²
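The fitting procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this study: the two-parameter `sigmoid` form, the "true" parameter values, and the starting guess `p0` are hypothetical, and the PM2.5 values are synthetic, noise-free, and generated directly on the 0 to 1 scale.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, mu, s):
    """Logistic curve with location mu and scale s (s < 0 gives a falling S)."""
    return 1.0 / (1.0 + np.exp(-(x - mu) / s))

# Normalize the year by its maximum so that it lies in (0, 1].
years = np.arange(2013, 2023, dtype=float)
x = years / years.max()

# Synthetic normalized PM2.5 generated from assumed parameters.
true_mu, true_s = 0.9975, -0.0005
y = sigmoid(x, true_mu, true_s)

# Rough starting guess, as one might read it off a plot of the data.
p0 = (np.median(x), -0.0004)

# Nonlinear least squares fit of the location and scale parameters.
(mu_hat, s_hat), _ = curve_fit(sigmoid, x, y, p0=p0)

# Coefficient of determination of the fitted curve.
residuals = y - sigmoid(x, mu_hat, s_hat)
r2 = 1.0 - np.sum(residuals**2) / np.sum((y - y.mean())**2)
print(mu_hat, s_hat, r2)
```

Because the normalized years span a very narrow interval, the scale parameter is tiny in absolute terms, which is one reason a sensible starting guess matters for convergence.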

Main analysis results
In general, the graph shows that the AOD decreased across the region: all changes are negative. The most significant decreases occurred in the Xingtai, Hengshui, and Handan areas, which belong to Hebei Province and are far from Beijing city. The decrease in the AOD of Beijing is consistent with the decrease in the AOD of its surrounding areas, which is the opposite of what H1 predicts. Instead, there could be a positive spillover from the south to the north.

Complementary analysis results³
Based on the UI chart, the NO2 and SO2 densities did not change significantly from 2019 to 2020. In addition, the 2018 data are missing. Hence, I used the 2020 data to visualize the current pollutant density level in QGIS. Although the year-over-year change was not significant, it is worth mentioning that the NO2 density in the first six months of 2020 was lower than that in the same period of 2019. This could be due to the COVID-19 lockdown, when people used less transportation. Based on the SO2 graphs, I believe that the average SO2 density in 2020 was slightly greater than that in 2019. This could be because people used more electricity at home during the lockdown: SO2 originates from burning fossil fuels, which are the main source of electricity generation.
Beijing and several towns in Hebei Province have relatively high SO2 and NO2 densities. For example, the Handan area in Hebei Province has relatively high SO2 and NO2 densities, although it experienced a large decrease in AOD from 2013 to 2019. These distributions are possibly related to local industries and urbanization levels.

Machine learning analysis results
The R² values obtained for Beijing and Henan were 0.99 and 0.96, respectively, indicating that the model accounts for 99% and 96% of the variability in the PM2.5 concentration in the two regions.
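For reference, the coefficient of determination compares the residual sum of squares with the total sum of squares. The sketch below assumes this standard definition; the function name `r_squared` and the four data points are hypothetical and unrelated to the reported results.

```python
import numpy as np

def r_squared(observed, predicted):
    """R^2 = 1 - SS_res / SS_tot for observed values and model predictions."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic example: small prediction errors relative to the spread of the data.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score = r_squared(observed, predicted)
print(score)
```

A value close to 1 means that the fitted curve leaves little of the variance unexplained; with only ten annual observations, however, a high R² alone says little about out-of-sample predictive accuracy.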

Discussion and Conclusions
In conclusion, the integration of remote sensing data and machine learning models provides a comprehensive understanding of Beijing's air pollution dynamics. The combined evidence suggests that Beijing has not shifted its pollution to surrounding areas, but challenges persist with relatively high SO2 and NO2 density. The machine learning model offers valuable predictions for future PM2.5 levels and underscores the ongoing need for effective environmental policies.

Remote sensing conclusion:
Although the results suggest that Beijing did not transfer its pollution to surrounding areas, they do not sufficiently support the effectiveness of the pollution treatment policy, as evidenced by the persistently high SO2 and NO2 densities in Beijing and certain regions. Additionally, the study has several limitations, such as the absence of an extensive data processing regimen. Applying only a mean reducer for cloud removal to raw AOD and Sentinel-5P SO2 and NO2 data lacks the sophistication required for accurate analysis; for example, it does not account for factors such as seasonality. Despite these limitations, the study offers valuable insights and suggests avenues for future research. Examining pollution shifting at the provincial level, including more regions for counterfactual analysis, and investigating the drivers behind the observed decrease in AOD in the southern part of the region are promising areas for further investigation.

Machine Learning Analysis
The analysis of PM2.5 data from 2013 to 2022 suggests that Beijing did not transfer pollution to Hebei, as air pollution in Hebei has not intensified since 2013. Moreover, the logistic model fitted to Beijing predicts that PM2.5 levels will stabilize at approximately 40 µg/m³ in the absence of new external influences, such as environmental policies. Notably, this projected level exceeds the annual PM2.5 standard of 9.0 µg/m³ set by the US EPA, indicating potential long-term respiratory health risks (National Ambient Air Quality Standards (NAAQS) for PM | US EPA, 2024).

Future Research Directions
The considerable impact of transportation on air pollution calls for a comprehensive statistical exploration of transportation-related variables (X. Ma et al., 2020). Traffic volume should be incorporated as a covariate; it can be measured by estimating traffic queues with on-ramp metering techniques, as described by Liu et al. (2022). Capturing temporal variations in traffic volume is also important, as advocated by X. Ma et al. (2023).
In environmental analysis, machine learning, particularly transformers, attention-related technologies (Lyu, Zheng, et al., 2022b), and graph convolutional networks (Wang et al., 2023), has proven instrumental in revealing intricate relationships within satellite imagery. Such methods, including deep neural networks, offer a sophisticated computational means of discerning nuanced patterns and associations in satellite data (Wang, Jin, et al., 2023).
Future research should integrate machine learning techniques to improve the accuracy and predictive capabilities of air pollution models. This work can be extended to investigate the interplay between environmental policies and pollution levels using advanced analytical tools such as graph convolutional networks and deep learning (Wei et al., 2023). Additionally, deploying deep learning would become feasible with a more granular and extensive dataset. Deep learning, which is well established in applications such as recommendation (Wu & Chi, 2024), excels at discerning intricate patterns. A larger volume of data would permit a more nuanced exploration and enhance the model's ability to extract meaningful insights.
Another factor that warrants examination is the perceptions of residents. Sentiment analysis can help discern how netizens perceive fluctuations in air quality over time (Wu et al., 2024). Such sentiment analysis methods mainly rely on sophisticated computational models, notably large language models (Wu, Xiang, et al., 2024), GenAI models (Xiang et al., 2024), BERT models (Pang et al., 2019), and multimodal transformers (Lyu, Dong, et al., 2022b).