Overcoming MSE: How We Built an Ultra-Reliable Lahore Smog Forecaster Using PyTorch Transformers and Asymmetric Loss

Haider Ali — Fri, 12 Jun 2026 14:56:55 +0000

Lahore, Pakistan, is home to over 13 million people and is frequently ranked as the most polluted city in the world. During the winter months, a combination of agricultural crop burning, vehicle emissions, and cold weather creates a toxic layer of PM2.5 smog that blankets the region.

To help the public prepare and take preventive actions, we built Saans (meaning "breath" in Urdu), a machine learning system that forecasts PM2.5 levels and US EPA Air Quality Index (AQI) values 24 hours in advance.

This article details the core ML challenge we faced: why standard Mean Squared Error (MSE) loss fails to predict dangerous air quality spikes, and how we solved it using a customized PyTorch Transformer model combined with a Weighted Asymmetric Huber Loss.

The Pitfall: Why Standard MSE Fails for Smog Forecasting

When training neural networks for regression tasks, the default loss function is almost always Mean Squared Error (MSE):

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

While mathematically convenient, MSE has two fundamental flaws when applied to life-or-death environmental forecasting:

1. The "Regression to the Mean" Trap

In any annual cycle, extreme smog spikes (where PM2.5 shoots past 300 µg/m³) are rare compared to moderate or clean days. The prediction that minimizes MSE is the average outcome given the inputs, so whenever the model cannot tell a spike hour from an ordinary one, its best move is to hedge toward the middle: it underpredicts extreme peaks and overpredicts low valleys, smoothing out the forecast line. The model "regresses to the mean," producing flat, useless predictions during the exact hours when public warnings are most critical.
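A toy illustration of this effect, using made-up readings rather than Saans data: if a model can only output a single value for a group of hours that look alike to it, the value that minimizes MSE is their mean, which falls far short of the rare spike.

```python
# Hypothetical PM2.5 readings (µg/m³): nine moderate hours and one spike.
readings = [40.0] * 9 + [340.0]

def mse(constant, values):
    """Mean squared error of predicting the same constant for every value."""
    return sum((v - constant) ** 2 for v in values) / len(values)

# Brute-force search over integer candidates for the MSE-minimizing constant.
candidates = range(0, 401)
best = min(candidates, key=lambda c: mse(c, readings))

mean_value = sum(readings) / len(readings)
spike_shortfall = max(readings) - best

print(best, mean_value, spike_shortfall)
```

The best constant coincides with the mean of the readings, so the single spike hour is missed by a wide margin even though the average error is as small as it can be.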

2. Symmetrical Bias is Symmetrically Dangerous

MSE treats over-prediction and under-prediction of the same magnitude identically.

If the actual PM2.5 is 150 µg/m³ and the model predicts 50 µg/m³, the error is -100.
If the actual PM2.5 is 50 µg/m³ and the model predicts 150 µg/m³, the error is +100.

Under MSE, both predictions incur the exact same penalty (10,000). In terms of human health, however, they are not equivalent. Under-prediction tells the public the air is far cleaner than it is, leading parents to send children outdoors without masks during a toxic smog spike. Over-prediction, by contrast, errs on the side of caution and prompts precautionary behavior. We needed a loss function that was asymmetric and risk-averse.

The Solution Part 1: Weighted Asymmetric Huber Loss

To enforce risk-aversion, we implemented a custom loss function in PyTorch, the Weighted Asymmetric Huber Loss (AsymmetricWeightedHuberLoss in the code):

import torch
import torch.nn as nn


class AsymmetricWeightedHuberLoss(nn.Module):
    """Huber loss with extra weight on underpredicted high-pollution targets."""

    def __init__(self, target_min, target_scale, threshold=100.0, asymmetry_factor=5.0, delta=0.1):
        super().__init__()
        self.target_min = target_min        # offset used to map normalized targets back to raw
        self.target_scale = target_scale    # scale used to map normalized targets back to raw
        self.threshold = threshold          # raw PM2.5 (µg/m³) above which asymmetry applies
        self.asymmetry_factor = asymmetry_factor
        self.delta = delta                  # Huber transition point, in normalized units

    def forward(self, pred, target):
        # Reconstruct raw target concentrations to compute weights
        raw_target = target * self.target_scale + self.target_min
        error = pred - target

        # 1. Huber loss component: quadratic up to delta, linear beyond it
        abs_error = torch.abs(error)
        quadratic = torch.clamp(abs_error, max=self.delta)
        linear = abs_error - quadratic
        huber_loss = 0.5 * (quadratic ** 2) + self.delta * linear

        # 2. Asymmetry: underpredicting spikes (actual > threshold and pred < actual)
        underprediction_mask = (error < 0) & (raw_target > self.threshold)

        # Apply the asymmetry penalty to underpredictions on high-pollution days
        weights = torch.where(
            underprediction_mask,
            torch.full_like(error, self.asymmetry_factor),
            torch.ones_like(error),
        )

        # 3. Scale weight: penalize errors more heavily as pollution levels rise
        scale_weight = 1.0 + (raw_target / 150.0)

        loss = weights * scale_weight * huber_loss
        return loss.mean()

How this mathematical formulation solves the problem:

Huber Loss Foundation: For small errors (below delta = 0.1, measured in normalized target units), the loss behaves quadratically. For larger errors, it transitions to a linear penalty, so the gradient of the base loss never exceeds delta in magnitude. This keeps the optimization robust against extreme data noise on clean days and prevents random outliers from producing exploding gradients.
Asymmetric Penalty (5x Multiplier): We define a high-pollution threshold (100 µg/m³). If the actual pollution is above this threshold and the model underpredicts (error < 0), we multiply the loss by 5.0. This pushes the gradients to steer the network away from false negatives on hazardous days.
Continuous Target-Dependent Scaling: The term (1.0 + PM2.5_raw / 150.0) scales the loss with how bad the pollution actually is. An error at 300 µg/m³ carries a weight of 3.0, while the same error at 30 µg/m³ carries a weight of 1.2, so the former is penalized 2.5 times as heavily.
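To make the three components concrete, here is a scalar walk-through of the same formula in plain Python. The scaling constants (target_min = 0, target_scale = 500) are assumptions chosen for illustration only; the article does not state the values used in training.

```python
def asymmetric_weighted_huber(pred, target, target_min, target_scale,
                              threshold=100.0, asymmetry_factor=5.0, delta=0.1):
    """Scalar version of the loss for one (prediction, target) pair.

    pred and target are in normalized units; threshold is in raw µg/m³.
    """
    raw_target = target * target_scale + target_min
    error = pred - target

    # Huber component: quadratic up to delta, linear beyond it.
    abs_error = abs(error)
    quadratic = min(abs_error, delta)
    linear = abs_error - quadratic
    huber = 0.5 * quadratic ** 2 + delta * linear

    # Asymmetry applies only to underpredictions above the raw threshold.
    underpredicted_spike = error < 0 and raw_target > threshold
    weight = asymmetry_factor if underpredicted_spike else 1.0

    # Target-dependent scaling grows with the actual concentration.
    scale_weight = 1.0 + raw_target / 150.0
    return weight * scale_weight * huber

# Assumed min-max scaling for illustration only: raw = normalized * 500 + 0.
scaling = dict(target_min=0.0, target_scale=500.0)

# Actual 300 µg/m³ (0.6 normalized); forecasts 100 µg/m³ too low and too high.
under = asymmetric_weighted_huber(pred=0.4, target=0.6, **scaling)
over = asymmetric_weighted_huber(pred=0.8, target=0.6, **scaling)
print(under, over, under / over)
```

Both forecasts miss by the same 100 µg/m³, yet under these assumptions the low forecast costs about 0.225 and the high forecast about 0.045, a ratio equal to the asymmetry factor.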

The Solution Part 2: PyTorch Transformer Encoder Architecture

Standard recurrent networks (LSTMs/GRUs) struggle with multi-step-ahead forecasting because they compress the entire historical timeline into a single hidden state vector. Over a 72-hour lookback, this creates an information bottleneck.

Instead, Saans uses a custom Transformer Encoder + Multi-Layer Perceptron (MLP) Decoder that directly projects to the 24-hour forecasting horizon.

                  +--------------------------------+
                  |    Output: 24-Hour Forecast    |
                  +--------------------------------+
                                   ^
                                   | [Linear Decoder Layer]
                  +--------------------------------+
                  |  Flatten (seq_len * d_model)   |
                  +--------------------------------+
                                   ^
                                   |
                  +--------------------------------+
                  |   Transformer Encoder Stack    | <--- Extraction of Self-Attention
                  |   (Multi-Head Self-Attention)  |      Weights for Explainability
                  +--------------------------------+
                                   ^
                                   |
                  +--------------------------------+
                  | Positional Sin/Cos Embeddings  |
                  +--------------------------------+
                                   ^
                                   |
                  +--------------------------------+
                  | Input Projection (46 features) |
                  +--------------------------------+
                                   ^
                                   |
              [ 72-Hour Historical Weather + Air Quality Input ]
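The diagram above could be sketched in PyTorch roughly as follows. This is not the Saans source code: the class name SmogTransformer, the hidden sizes (d_model = 32, 4 heads, 2 layers, a 128-unit decoder layer) and the two-layer decoder head are all assumptions made for illustration. Only the 46 input features, the 72-hour lookback and the 24-hour horizon come from the article.

```python
import math
import torch
import torch.nn as nn

class SmogTransformer(nn.Module):
    """Encoder-only Transformer mapping a 72-hour window to a 24-hour forecast."""

    def __init__(self, n_features=46, seq_len=72, horizon=24,
                 d_model=32, n_heads=4, n_layers=2):
        super().__init__()
        self.input_proj = nn.Linear(n_features, d_model)

        # Fixed sinusoidal positional encodings, one row per historical hour.
        position = torch.arange(seq_len).unsqueeze(1).float()
        div_term = torch.exp(
            torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model)
        )
        pe = torch.zeros(seq_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pos_encoding", pe)

        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

        # Direct multi-horizon head: flatten all time steps, then project.
        self.decoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(seq_len * d_model, 128),
            nn.ReLU(),
            nn.Linear(128, horizon),
        )

    def forward(self, x):
        # x: (batch, seq_len, n_features)
        h = self.input_proj(x) + self.pos_encoding
        h = self.encoder(h)
        return self.decoder(h)  # (batch, horizon)

model = SmogTransformer()
model.eval()
with torch.no_grad():
    forecast = model(torch.randn(2, 72, 46))
print(forecast.shape)
```

Each input window yields all 24 forecast hours in one forward pass, with no predicted value fed back in as input.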

Architectural Key Features:

Feature Space (46 Variables): Rather than just looking at past PM2.5, the model consumes 46 features, including boundary layer height (which captures atmospheric thermal inversion), wind speed, wind vectors (U and V components to track how smoke drifts from industrial choke points like Sheikhupura and Sundar), cyclical sin/cos encodings for time/dates, and rolling stats.
Sinusoidal Positional Encoding: Self-attention on its own has no notion of sequence order (it is permutation-equivariant), so positional encodings are added to the inputs to preserve the chronological order of the historical hours.
Direct MLP Decoder: Instead of forecasting autoregressively (predicting hour 1, then feeding it back to predict hour 2, which compounds errors), our model flattens the encoder representations and directly projects them to the 24-hour horizon.
Extractable Self-Attention for Explainability: By preserving and averaging the final layer's multi-head attention weights, the dashboard can show which historical hours the network attended to most when producing the forecast.
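Two of the feature types mentioned above, cyclical time encodings and wind U/V components, can be sketched in a few lines. The helper names are hypothetical, and the wind function assumes the meteorological convention (direction the wind blows from, clockwise from north); the actual Saans preprocessing may differ.

```python
import math

def cyclical_hour(hour):
    """Encode hour of day (0-23) as a point on the unit circle."""
    angle = 2.0 * math.pi * hour / 24.0
    return math.sin(angle), math.cos(angle)

def wind_components(speed, direction_deg):
    """Split wind into U (eastward) and V (northward) components.

    Assumes the meteorological convention: direction_deg is where the
    wind blows FROM, measured clockwise from north.
    """
    theta = math.radians(direction_deg)
    return -speed * math.sin(theta), -speed * math.cos(theta)

def distance(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# 23:00 and 00:00 are adjacent on the circle, unlike the raw values 23 and 0.
gap_midnight = distance(cyclical_hour(23), cyclical_hour(0))
gap_one_hour = distance(cyclical_hour(10), cyclical_hour(11))

# A 5 m/s wind from the west (270°) blows toward the east: U ≈ +5, V ≈ 0.
u, v = wind_components(5.0, 270.0)
print(gap_midnight, gap_one_hour, u, v)
```

With this encoding, the step across midnight is the same size as any other one-hour step, which a raw hour column cannot express.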

Performance: The Proof is in the Smog Spikes

We trained and evaluated the model on over 33,000 hourly observations spanning four winter smog seasons (2022–2026). Here is how the Asymmetric Transformer stacks up against a standard MSE Bi-LSTM and an asymmetric-loss Bi-LSTM on 90th percentile smog spikes (actual PM2.5 > 171.8 µg/m³):

Metric / Horizon	Standard MSE Bi-LSTM	Asymmetric Bi-LSTM	Asymmetric Transformer (Ours)
90th %ile Spike RMSE	50.47 µg/m³	39.41 µg/m³	40.84 µg/m³
t+12h P90 Spike RMSE	47.91 µg/m³	37.53 µg/m³	40.67 µg/m³
t+24h P90 Spike RMSE	63.64 µg/m³	44.32 µg/m³	45.34 µg/m³
Overall Test R²	N/A	N/A	0.6507 (Solid Fit)
AQI Category Match	N/A	N/A	48.02% (Exact Match)

Figure 1: Evaluation timeline comparison, illustrating how closely the model tracks diurnal cycles and smog onset spikes.

The Essential Trade-Off

A close reading of the results shows that the overall test RMSE is slightly higher for the asymmetric models than for standard MSE models.

This is an intentional design choice. Because the asymmetric loss applies a 5x penalty for underpredicting spikes, it biases the model's point predictions slightly upwards during high-pollution seasons. This introduces a "safety margin" that cuts dangerous false negatives on hazardous days, protecting health at the cost of a slightly larger average error on normal days.

For a safety-critical public advisory dashboard like Saans, reducing 24-hour lead-time spike RMSE by nearly 30% relative to the MSE Bi-LSTM (from 63.64 to 45.34 µg/m³, a drop of 18.3 µg/m³) is the strongest validation of this approach. The asymmetric Bi-LSTM lands within about 1 µg/m³ of the Transformer at the same horizon (44.32 µg/m³), which suggests that most of the gain comes from the loss function rather than the architecture.
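The headline numbers can be checked directly against the t+24h row of the results table. The short script below only redoes that arithmetic; the dictionary keys are shorthand labels for the three table columns.

```python
# t+24h P90 spike RMSE values (µg/m³) copied from the results table.
rmse_t24 = {
    "Standard MSE Bi-LSTM": 63.64,
    "Asymmetric Bi-LSTM": 44.32,
    "Transformer (Ours)": 45.34,
}

baseline = rmse_t24["Standard MSE Bi-LSTM"]

# Absolute and relative improvement of each model over the MSE baseline.
reductions = {name: baseline - rmse for name, rmse in rmse_t24.items()}
percents = {name: 100.0 * drop / baseline for name, drop in reductions.items()}

for name in rmse_t24:
    print(f"{name}: {reductions[name]:.2f} µg/m³ lower ({percents[name]:.1f}%)")
```

The Transformer's reduction works out to 18.30 µg/m³, or roughly 28.8% of the baseline error, and the asymmetric Bi-LSTM's to 19.32 µg/m³, or roughly 30.4%.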

We can see this design choice reflected clearly in both the scatter distribution (where prediction density is tightly aligned with the y=x perfect forecast line) and the residual analysis:

Figure 2: Predicted vs. Actual PM2.5 concentration, showing high density fit.

Figure 3: Residual Plot (Predicted - Actual). The slight positive residual bias on clean days represents the intentional safety margin.

Interactive Dashboard Implementation

We wrapped this trained PyTorch architecture in a premium, real-time Streamlit dashboard utilizing glassmorphism styling.

The pipeline fetches live CAMS air quality and ERA5 weather parameters for Lahore (or any other specified coordinate).
It preprocesses the features, runs them through the model on GPU or CPU, and displays a 24-hour forecast timeline.
It displays a clear Public Health Advisory based on the estimated US EPA AQI.
It highlights the Top 3 Historical Hours that the Transformer attended to most heavily to construct the forecast.
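The "Top 3 Historical Hours" step could look something like the sketch below. It assumes the attention weights are available as a nested list indexed by head, query position and input hour, and that they are pooled by averaging over heads and query positions. The article only says the final layer's weights are averaged, so the exact pooling is a guess, and top_attended_hours is a hypothetical helper.

```python
def top_attended_hours(attention, k=3):
    """Return indices of the k most-attended input hours and all scores.

    attention[h][q][j] is the weight head h puts on input hour j when
    processing query position q. Averaging over heads and query positions
    is an assumption about how the scores are pooled.
    """
    n_heads = len(attention)
    n_queries = len(attention[0])
    n_keys = len(attention[0][0])

    scores = [0.0] * n_keys
    for head in attention:
        for row in head:
            for j, weight in enumerate(row):
                scores[j] += weight / (n_heads * n_queries)

    ranked = sorted(range(n_keys), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores

# Tiny made-up example: 2 heads, 2 query positions, 4 historical hours.
attention = [
    [[0.10, 0.20, 0.30, 0.40], [0.05, 0.15, 0.20, 0.60]],
    [[0.30, 0.10, 0.20, 0.40], [0.10, 0.05, 0.35, 0.50]],
]
top_hours, scores = top_attended_hours(attention, k=3)
print(top_hours)
```

In the real dashboard the indices would be positions within the 72-hour lookback window, which can then be mapped back to timestamps.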

Try it Yourself

Live App Dashboard: Visit the live tracker at saansai.streamlit.app.
Open Source Code: Explore the full pipeline, training scripts, and preprocessing code in our GitHub Repository.

Saans bridges the gap between complex deep learning architectures and actionable civic utilities, proving that targeted mathematical design is key to addressing real-world environmental crises.

5 Boring Tasks You Can Automate With Python (And How Much Time You'll Save)

Haider Ali — Sat, 30 May 2026 23:23:01 +0000

If you've ever caught yourself doing the same thing on your computer over and over again — copying data between spreadsheets, downloading files one by one, sending the same email to a list of people — there's a good chance Python can do it for you in seconds.
You don't need to be a developer to benefit from automation. You just need to know what's possible.
Here are 5 tasks that are incredibly common, incredibly tedious, and incredibly easy to automate.
1. Scraping Data From Websites
The manual way: Opening 50 product pages, copying the name, price, and description into a spreadsheet. One by one. For hours.
The automated way: A Python script using BeautifulSoup or Selenium visits every page, extracts exactly what you need, and dumps it into a clean CSV — in minutes.
Real use cases:
Monitoring competitor prices
Collecting leads from directories
Pulling property listings for analysis
Tracking stock availability
Time saved: What takes a human 4-6 hours takes a script about 3 minutes.
2. Processing Excel and CSV Files
The manual way: Opening a spreadsheet, filtering rows, copy-pasting into another sheet, formatting columns, running calculations. Repeat every Monday morning forever.
The automated way: A Python script using pandas reads your file, cleans the data, runs every calculation, and spits out a formatted report — automatically, on a schedule if you want.
Real use cases:
Weekly sales reports
Cleaning messy exported data
Merging multiple CSVs into one
Flagging anomalies or missing values
Time saved: A 2-hour weekly task becomes a script you run once and forget.
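To give a feel for how small such a script can be, here is a rough sketch that merges two exports and flags rows with missing values. It uses Python's built-in csv module instead of pandas so it runs without installing anything, and the file contents are made up.

```python
import csv
import io

def merge_csv(sources):
    """Concatenate CSV sources that share a header; flag rows with blanks."""
    merged, flagged = [], []
    for source in sources:
        for row in csv.DictReader(source):
            merged.append(row)
            if any(value.strip() == "" for value in row.values()):
                flagged.append(row)
    return merged, flagged

# Two made-up weekly exports, held in memory instead of on disk.
week1 = io.StringIO("order_id,amount\n1001,25.00\n1002,\n")
week2 = io.StringIO("order_id,amount\n1003,40.50\n")

merged, flagged = merge_csv([week1, week2])
total = sum(float(r["amount"]) for r in merged if r["amount"].strip())
print(len(merged), [r["order_id"] for r in flagged], total)
```

With real files you would pass open file handles instead of the in-memory strings.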
3. Sending Automated Emails
The manual way: Writing the same email 200 times with slightly different names and details. Or worse, a mail merge that breaks every time.
The automated way: Python reads a list of contacts, personalizes each email with their name, order number, or whatever you need, and sends them all in one go.
Real use cases:
Follow-up emails after purchases
Appointment reminders
Weekly digests or newsletters
Notifying a team when something changes
Time saved: Hours of copy-pasting replaced by a script that runs in under a minute.
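As a rough idea of what the personalization step looks like, the sketch below builds one message per contact using only the standard library. The contacts, addresses and wording are invented, and nothing is actually sent.

```python
from email.message import EmailMessage
from string import Template

TEMPLATE = Template("Hi $name,\n\nYour order $order_id has shipped.\n")

def build_messages(contacts, sender):
    """Create one personalized EmailMessage per contact (no sending)."""
    messages = []
    for contact in contacts:
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = contact["email"]
        msg["Subject"] = f"Order {contact['order_id']} update"
        msg.set_content(TEMPLATE.substitute(contact))
        messages.append(msg)
    return messages

contacts = [
    {"name": "Ayesha", "email": "ayesha@example.com", "order_id": "A-17"},
    {"name": "Bilal", "email": "bilal@example.com", "order_id": "B-42"},
]
messages = build_messages(contacts, sender="shop@example.com")
print(messages[0]["To"], messages[1]["Subject"])
```

Delivery is left out on purpose: with a real mail server and credentials, each message could be handed to smtplib's send_message.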
4. Renaming, Moving, and Organizing Files
The manual way: Going through hundreds of downloaded files, renaming them to a consistent format, moving them into the right folders. Soul-crushing work.
The automated way: A Python script watches a folder, detects new files, renames them based on rules you define, and sorts them automatically.
Real use cases:
Organizing downloaded invoices by date and client
Renaming photos from a camera in bulk
Sorting exports from different tools into project folders
Archiving old files automatically
Time saved: A task you dread every week disappears entirely.
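Here is a minimal sketch of the renaming and sorting part. It makes a single pass over a folder rather than watching it continuously, and the rule it applies (lowercase names, underscores for spaces, one subfolder per file extension) is just an example. It runs inside a temporary directory so it cannot touch real files.

```python
import tempfile
from pathlib import Path

def organize(folder):
    """Move each file into a subfolder named after its extension,
    normalizing the filename to lowercase with underscores."""
    moved = []
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        target_dir = folder / (path.suffix.lstrip(".").lower() or "other")
        target_dir.mkdir(exist_ok=True)
        new_name = path.name.lower().replace(" ", "_")
        moved.append(path.rename(target_dir / new_name))
    return moved

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Create a few empty, made-up files to organize.
    for name in ["Invoice March.PDF", "Photo 001.JPG", "notes"]:
        (folder / name).touch()
    moved = organize(folder)
    result = sorted(p.relative_to(folder).as_posix() for p in moved)

print(result)
```

Pointing the same function at a real downloads folder, or running it on a schedule, is the step that turns the sketch into the automation described above.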
5. Filling Out Forms and Clicking Through Websites
The manual way: Logging into a portal, navigating to the same page, entering the same data, downloading a report. Every single day.
The automated way: Selenium or Playwright controls a browser like a human would — logging in, clicking buttons, filling fields, downloading files — without you touching anything.
Real use cases:
Downloading reports from portals that don't have an API
Filling out repetitive government or internal forms
Checking prices or availability across multiple sites
Logging into systems and extracting data on a schedule
Time saved: A 20-minute daily routine becomes something that happens while you make coffee.
So What Does This Actually Cost?
A custom Python script for any of the above typically costs between $30 and $150 depending on complexity — and it pays for itself the first week you use it.
If you're spending 3 hours a week on a task and your time is worth anything, automation is a no-brainer.
Want One Built For You?
I'm a Python developer specializing in automation, web scraping, and AI integration. If any of the above sounds like something you need, feel free to check out my Fiverr gig or drop me a message — I'm happy to discuss your specific use case before you commit to anything.
Hire me on Fiverr
www.fiverr.com/mlworker
Have a task you think could be automated? Drop it in the comments — I'll tell you if Python can handle it.

DEV Community: Haider Ali
