Air pollution is a growing global concern, and with increasing industrialization and urbanization it is becoming crucial to monitor and predict air quality in real time. A widely used way to quantify air pollution is the Air Quality Index (AQI). In this article, we will explore how to predict AQI using Python, leveraging data science tools and machine learning algorithms.
What is AQI?
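The Air Quality Index (AQI) is a standardized indicator used to communicate how polluted the air currently is or how polluted it is forecast to become. The AQI is calculated from pollutants such as carbon monoxide (CO), ozone, nitrogen dioxide (NO2), and fine particulate matter (PM2.5), which are the four pollutants covered by the dataset used in this article.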
Each pollutant has a sub-index, and the highest sub-index among them becomes the AQI.
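The sub-index for a single pollutant is obtained by linear interpolation between two breakpoints:
$$I = \frac{I_{HI} - I_{LO}}{BP_{HI} - BP_{LO}} \times (C - BP_{LO}) + I_{LO}$$
Where:
- $I$ is the AQI sub-index for the pollutant
- $C$ is the concentration of the pollutant
- $BP_{HI}$, $BP_{LO}$ are the breakpoint concentrations just above and just below $C$
- $I_{HI}$, $I_{LO}$ are the AQI values corresponding to those breakpoints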
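The interpolation formula and the "highest sub-index wins" rule can be sketched in a few lines of plain Python. The breakpoint row and the other pollutants' sub-indices below are illustrative assumptions, not values taken from this article or its dataset, and `sub_index` is a hypothetical helper name.

```python
def sub_index(c, bp_lo, bp_hi, i_lo, i_hi):
    """Linearly interpolate a pollutant concentration onto the AQI scale."""
    return (i_hi - i_lo) / (bp_hi - bp_lo) * (c - bp_lo) + i_lo


# Illustrative breakpoint row (an assumption for this example):
# concentrations 12.1 to 35.4 map to AQI values 51 to 100.
pm25_index = sub_index(c=35.0, bp_lo=12.1, bp_hi=35.4, i_lo=51, i_hi=100)

# Hypothetical sub-indices for the other pollutants.
sub_indices = {"pm2.5": pm25_index, "co": 12.0, "ozone": 40.0, "no2": 8.0}

# The overall AQI is the highest sub-index among the pollutants.
aqi = max(sub_indices.values())
print(round(aqi))  # prints 99
```

A concentration exactly at the lower breakpoint returns the lower AQI value, which is a quick sanity check when plugging in a real breakpoint table.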
The AQI value maps to a health category that shows how polluted the air is. On the US EPA scale, the categories are:
AQI Level | AQI Range |
---|---|
Good | 0 - 50 |
Moderate | 51 - 100 |
Unhealthy for Sensitive Groups | 101 - 150 |
Unhealthy | 151 - 200 |
Very Unhealthy | 201 - 300 |
Hazardous | 301+ |
Let's predict the AQI from chemical pollutant readings using machine learning.
Data Set Description
The dataset contains 7 attributes: AQI Value, CO AQI Value, Ozone AQI Value, NO2 AQI Value, PM2.5 AQI Value, lat, and lng. The four pollutant columns (CO, Ozone, NO2, and PM2.5) are the independent attributes used for modeling. AQI Value is the dependent attribute, since the overall AQI is derived from the pollutant sub-indices. The lat and lng columns give the location of each reading and are not used as features in this walkthrough.
The data is numeric and has no missing values, so only minimal preprocessing is required. Our goal is to predict the AQI, which could be framed as either classification or regression. Because the target is continuous, a regression technique is required.
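The claim that the data is numeric and complete can be checked rather than assumed. The sketch below uses a tiny made-up frame in place of the real CSV, and `needs_cleaning` is a hypothetical helper, not part of pandas.

```python
import pandas as pd


def needs_cleaning(df):
    """Return True if df has missing values or non-numeric columns."""
    has_missing = df.isna().any().any()
    non_numeric = df.select_dtypes(exclude="number").columns
    return bool(has_missing or len(non_numeric) > 0)


# Made-up rows standing in for the real dataset (an assumption).
sample = pd.DataFrame({
    "AQI Value": [51, 41, 66],
    "CO AQI Value": [1, 1, 1],
    "Ozone AQI Value": [36, 5, 39],
    "NO2 AQI Value": [0, 1, 2],
    "PM2.5 AQI Value": [51, 41, 66],
})

print(needs_cleaning(sample))  # prints False
```

If the check returns True on the real file, the preprocessing step is where missing rows or text columns would need to be handled.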
Step-by-Step Process to Predict AQI
1. Importing Libraries
```python
# Data handling and plotting
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Modeling and evaluation
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
```
2. Loading the Dataset
We’ll use a dataset with per-pollutant AQI values and the corresponding overall AQI values.
```python
data = pd.read_csv('air_quality_data.csv')
print(data.head())
```
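3. Data Preprocessing
Handle missing values, rename columns, and check data types.
```python
data = data.dropna()  # drop rows with missing values
data.columns = [col.strip().lower() for col in data.columns]  # trim and lowercase names
print(data.dtypes)  # check the data type of each column
```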
4. Exploratory Data Analysis (EDA)
Visualizing relationships between variables.
```python
# Pairwise scatter plots of all columns
sns.pairplot(data)
plt.show()
# Correlation heatmap
corr = data.corr()
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.show()
```
5. Feature Selection
Choose relevant features for training.
```python
X = data[['co aqi value', 'ozone aqi value', 'no2 aqi value', 'pm2.5 aqi value']]
y = data['aqi value']
```
6. Train-Test Split
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
7. Model Training (Random Forest)
```python
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
```
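Step 1 imports `LinearRegression`, but the walkthrough never fits it. One possible use is as a baseline next to the random forest. The sketch below runs on synthetic data (an assumption, standing in for the real CSV) in which the target is the row-wise maximum of four features, mirroring the rule that the highest sub-index becomes the AQI. The names `X_demo`, `y_demo`, and `scores` are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: four pollutant sub-indices per row,
# with the target defined as their maximum.
rng = np.random.default_rng(42)
X_demo = rng.integers(0, 200, size=(500, 4)).astype(float)
y_demo = X_demo.max(axis=1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Fit both models on the same split and record their R2 on the test set.
scores = {}
for name, estimator in {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
}.items():
    estimator.fit(X_tr, y_tr)
    scores[name] = r2_score(y_te, estimator.predict(X_te))

print(scores)
```

Because a maximum is not a linear function of its inputs, a tree ensemble would be expected to fit this kind of target more closely than a straight linear model. Whether the same gap appears on the real dataset is something to verify, not assume.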
8. Model Evaluation
```python
y_pred = model.predict(X_test)
print("Mean Absolute Error:", mean_absolute_error(y_test, y_pred))
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R2 Score:", r2_score(y_test, y_pred))
```
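Two small additions can make these metrics easier to interpret. The first is RMSE, the square root of the MSE, which is expressed in AQI points (an MSE of 2.59 corresponds to an RMSE of about 1.61). The second is a rule-based baseline: since the highest sub-index becomes the AQI, the row-wise maximum of the pollutant columns is a natural reference prediction. The rows below are made up for illustration, and only the column names follow the feature-selection step.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration (an assumption, not real data).
demo = pd.DataFrame({
    "co aqi value": [1, 2, 1, 3],
    "ozone aqi value": [36, 5, 39, 80],
    "no2 aqi value": [0, 1, 2, 10],
    "pm2.5 aqi value": [51, 41, 66, 75],
    "aqi value": [51, 41, 66, 80],
})

features = ["co aqi value", "ozone aqi value", "no2 aqi value", "pm2.5 aqi value"]

# Rule-based baseline: overall AQI as the highest pollutant sub-index.
baseline_pred = demo[features].max(axis=1)

# RMSE is the square root of the mean squared error, in AQI units.
mse = np.mean((demo["aqi value"] - baseline_pred) ** 2)
rmse = float(np.sqrt(mse))
print(rmse)  # prints 0.0 for these made-up rows
```

If the real dataset's AQI Value is indeed the maximum of these four columns, a near-perfect R2 from the trained model would be expected, and this baseline is a useful yardstick for judging whether the model adds anything beyond the rule.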
9. Plotting Results
```python
plt.figure(figsize=(10, 6))
plt.plot(y_test.values, label='Actual AQI')
plt.plot(y_pred, label='Predicted AQI', alpha=0.7)
plt.title('Actual vs Predicted AQI')
plt.legend()
plt.show()
```
Output:
[Figure: Feature Correlation Map]
Model Evaluation Metrics:
Mean Absolute Error: 0.09
Mean Squared Error: 2.59
R2 Score: 1.00
[Figure: Predicted AQI]
Real-world Applications
- Smart cities to monitor pollution in real-time.
- Healthcare apps to warn sensitive populations.
- Environmental agencies for policy formulation.
Dataset Link: click here.