Global PM2.5

Next‑day Forecasting

A compact summary of model runs and artifacts generated from our Kaggle notebook. Explore the notebook, review key metrics, and jump into the code.

RMSE: 7.21 µg/m³ (validation)
MAE: 4.18 µg/m³ (validation)
R²: 0.86 (hold‑out)
Prediction latency: 1.3 s
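
For readers who want to recompute the headline error metrics, here is a minimal sketch of the standard RMSE, MAE, and R² definitions in plain Python. The function name `regression_metrics` and the sample values are illustrative assumptions, not taken from the notebook, and the numbers are made up rather than actual model output.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R² for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # R² is undefined when the observations have zero variance
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }


# Made-up PM2.5 values in µg/m³, for illustration only
observed = [12.0, 30.0, 8.0, 22.0]
predicted = [10.0, 33.0, 9.0, 20.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

RMSE and MAE are in the same unit as the target (µg/m³), while R² is unitless.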

Highlights

Global Coverage

Trained on multi‑region historical PM2.5 series with weather & satellite‑derived covariates for robust next‑day forecasts.

Modeling Stack

Gradient‑boosted trees + temporal features + geospatial embeddings; tuned with cross‑validation and interpreted with SHAP.

Fast Inference

Optimized preprocessing and batched prediction achieve sub‑second latency per 10k observations on commodity CPUs.
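
One way to picture batched prediction and measure its latency is the sketch below. It assumes a generic `predict_fn` that maps a list of feature rows to a list of predictions; `stand_in_model` is a hypothetical placeholder rather than the trained model, and the batch size of 10,000 simply mirrors the "per 10k observations" wording above.

```python
import time


def predict_in_batches(predict_fn, rows, batch_size=10_000):
    """Call predict_fn on consecutive slices of rows and concatenate the results."""
    predictions = []
    for start in range(0, len(rows), batch_size):
        predictions.extend(predict_fn(rows[start:start + batch_size]))
    return predictions


def stand_in_model(batch):
    """Hypothetical placeholder model: returns the mean of each feature row."""
    return [sum(row) / len(row) for row in batch]


# Synthetic feature rows, for illustration only
rows = [[float(i), float(i + 1)] for i in range(25_000)]

started = time.perf_counter()
preds = predict_in_batches(stand_in_model, rows)
elapsed = time.perf_counter() - started
print(f"{len(preds)} predictions in {elapsed:.3f} s")
```

Timings from a placeholder like this say nothing about the real model; the harness only shows where the measurement would go.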

Reproducible

End‑to‑end notebook with fixed seeds, environment file, and exact artifact hashes for verifiable results.
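
As a rough illustration of what fixed seeds and artifact hashes can look like, the sketch below hashes a file with SHA‑256 and draws numbers from a seeded generator. The hash algorithm, the helper names `sha256_of_file` and `seeded_sample`, and the seed value are assumptions; the notebook may use different ones.

```python
import hashlib
import random


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def seeded_sample(seed, n=3):
    """Draw n pseudo-random floats from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


# The same seed always yields the same sequence (seed 42 is an arbitrary choice)
print(seeded_sample(42) == seeded_sample(42))

# Usage for an artifact (hypothetical file name):
#   print(sha256_of_file("model.bin"))
```

Comparing a freshly computed digest against the published one confirms that a downloaded artifact is byte‑for‑byte identical.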

Latest Artifacts

Performance Snapshot

Illustrative validation loss trend: replace with your chart or embed a PNG/SVG from the notebook.

Method (1‑minute read)

1) Data

Daily PM2.5 aggregates + meteorology (temp, wind, RH) + remote‑sensing proxies. Missingness handled via time‑aware imputation.
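
"Time‑aware imputation" can mean several things. One possible reading, sketched below, is a forward fill that carries the last observation forward only while it is recent enough, so no future value leaks into the past. The function name `forward_fill_by_time` and the 3‑day limit are assumptions, not the notebook's actual settings.

```python
from datetime import date, timedelta


def forward_fill_by_time(series, max_gap_days=3):
    """Fill missing values with the last observation if it is recent enough.

    series: list of (date, value or None) pairs sorted by date ascending.
    Values older than max_gap_days are not carried forward.
    """
    filled = []
    last_date, last_value = None, None
    for day, value in series:
        if value is not None:
            last_date, last_value = day, value
            filled.append((day, value))
        elif last_date is not None and (day - last_date).days <= max_gap_days:
            filled.append((day, last_value))
        else:
            filled.append((day, None))  # gap too long or nothing observed yet
    return filled


# Made-up daily PM2.5 values with a five-day gap, for illustration only
start = date(2024, 1, 1)
raw = [10.0, None, None, None, None, None, 14.0, None]
series = [(start + timedelta(days=i), v) for i, v in enumerate(raw)]
imputed = [value for _, value in forward_fill_by_time(series)]
print(imputed)
```

Gaps longer than the limit stay missing, which leaves the decision to the model (gradient‑boosted trees typically accept missing inputs) instead of inventing stale values.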

2) Features

Lags/rolling stats (1–28d), holiday flags, temporal encodings, location embeddings; target leakage checks enforced.
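
A minimal sketch of leakage‑safe lag and rolling features follows. It assumes a contiguous daily series for one location with no missing days; the function name, the lag set, and the window size are illustrative choices. The key point is that every feature for day t is built only from values before day t.

```python
def lag_and_rolling_features(values, lags=(1, 7), windows=(7,)):
    """Build per-day features from strictly earlier values of the same series."""
    rows = []
    for t in range(len(values)):
        row = {}
        for lag in lags:
            # Value observed `lag` days before day t, if it exists
            row[f"lag_{lag}"] = values[t - lag] if t - lag >= 0 else None
        for window in windows:
            past = values[max(0, t - window):t]  # excludes day t itself
            full = len(past) == window
            row[f"roll_mean_{window}"] = sum(past) / window if full else None
        rows.append(row)
    return rows


# Made-up series 1.0 .. 10.0, for illustration only
values = [float(v) for v in range(1, 11)]
rows = lag_and_rolling_features(values, lags=(1, 2), windows=(3,))
print(rows[3])
```

Because the slice stops at index t, the target for day t never enters its own features, which is the kind of property a target leakage check would verify.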

3) Model

XGBoost/CatBoost baseline → tuned via CV; error analysis with SHAP decision plots and partial dependence for sanity checks.

4) Evaluation

Temporal split validation; city‑wise breakdown; robustness to missing covariates; ablation on feature groups.
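
The temporal split and city‑wise breakdown can be sketched as below. The record layout (`city`, `date`, `observed`, `predicted`), the cutoff date, and the city labels are hypothetical, and the values are made up. The metric shown is MAE, though any per‑city metric could be substituted.

```python
from collections import defaultdict
from datetime import date


def temporal_split(records, cutoff):
    """Train on records strictly before cutoff, validate on cutoff and later."""
    train = [r for r in records if r["date"] < cutoff]
    valid = [r for r in records if r["date"] >= cutoff]
    return train, valid


def mae_by_city(records):
    """Mean absolute error of predictions, grouped by city."""
    errors = defaultdict(list)
    for r in records:
        errors[r["city"]].append(abs(r["observed"] - r["predicted"]))
    return {city: sum(errs) / len(errs) for city, errs in errors.items()}


# Made-up records for two hypothetical cities, for illustration only
records = [
    {"city": "A", "date": date(2024, 1, 1), "observed": 10.0, "predicted": 12.0},
    {"city": "B", "date": date(2024, 1, 1), "observed": 40.0, "predicted": 35.0},
    {"city": "A", "date": date(2024, 1, 2), "observed": 11.0, "predicted": 10.0},
    {"city": "B", "date": date(2024, 1, 2), "observed": 42.0, "predicted": 45.0},
    {"city": "A", "date": date(2024, 1, 3), "observed": 9.0, "predicted": 9.5},
    {"city": "B", "date": date(2024, 1, 3), "observed": 38.0, "predicted": 40.0},
]

train, valid = temporal_split(records, cutoff=date(2024, 1, 2))
per_city = mae_by_city(valid)
print(per_city)
```

Splitting by date instead of at random keeps every validation day after every training day, which matches how a next‑day forecast is used in practice.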