The Hacker News front page recently featured a fascinating story about an astrophotographer whose deep-sky images ended up in the movie Project Hail Mary. It got me thinking about the technical side of astrophotography — specifically, the image stacking problem that every astrophotographer (and increasingly, every developer working with scientific imaging) eventually runs into.
Here's the frustrating scenario: you've captured 200 frames of the Orion Nebula, each one noisy and underwhelming on its own. You need to combine them into a single clean image. And if you just naively average the pixel values, you get garbage.
Let me walk you through why, and how to fix it.
Why Naive Averaging Fails
The core issue is that astronomical images have extremely low signal-to-noise ratios. Each individual frame contains the faint signal you want (nebula, galaxy, whatever) buried under read noise, thermal noise, and light pollution gradients. A simple mean of all frames will reduce random noise in proportion to the square root of the frame count — that's basic statistics — but it does nothing about outliers.
Satellite trails, hot pixels, cosmic ray hits, and airplane lights all contaminate individual frames. A single bright satellite trail across one frame will leave a faint ghost line in your averaged result.
import numpy as np
from astropy.io import fits
import glob
# Load all FITS frames into a 3D array
files = sorted(glob.glob('lights/*.fits'))
frames = np.array([fits.getdata(f).astype(np.float32) for f in files])
# Naive mean — don't do this for real work
naive_stack = np.mean(frames, axis=0)
# Problem: satellite trails and hot pixels bleed through
That np.mean call weights every frame equally at each pixel. A satellite trail registering at 65000 ADU in one frame still contributes significantly to the final value, even averaged across 200 frames.
The Real Fix: Sigma-Clipped Stacking
The standard solution is sigma clipping — you reject pixel values that fall outside a certain number of standard deviations from the median before averaging. This is a well-understood statistical technique, but implementing it efficiently for large image stacks takes some care.
from astropy.stats import sigma_clip
def sigma_clipped_stack(frames, sigma=2.5):
    """
    Stack frames using sigma-clipped mean.
    Each pixel position is evaluated independently across all frames.
    """
    # sigma_clip returns a masked array — rejected values are masked out
    clipped = sigma_clip(frames, sigma=sigma, axis=0, maxiters=3)
    # Mean of non-rejected values at each pixel position
    stacked = np.ma.mean(clipped, axis=0).data
    return stacked
result = sigma_clipped_stack(frames, sigma=2.5)
fits.writeto('stacked_result.fits', result, overwrite=True)
The sigma=2.5 parameter means any pixel value more than 2.5 standard deviations from the median gets thrown out before computing the mean. Satellite trail? Gone. Random hot pixel spike? Gone. Your actual nebula signal, which is consistent across frames, survives.
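To sanity-check the idea, here's a hand-rolled, single-pass version of the clipping logic on one pixel's worth of made-up numbers (sigma_clip itself iterates and recomputes the statistics each pass, so this is only an approximation):

```python
import numpy as np

rng = np.random.default_rng(42)
# 200 noisy samples of one pixel whose true value is ~500 ADU
samples = rng.normal(500, 20, size=200)
samples[0] = 65000.0  # one frame caught a satellite trail

naive = samples.mean()  # biased upward by roughly (65000 - 500) / 200 ≈ 322 ADU

# One clipping pass: reject anything far from the median, then average the rest
med = np.median(samples)
keep = np.abs(samples - med) < 2.5 * samples.std()
robust = samples[keep].mean()
print(f'naive: {naive:.0f} ADU, clipped: {robust:.0f} ADU')
```

The naive mean lands over 300 ADU high; the clipped mean stays within the noise of the true value, because the trail sample is far outside the clipping band while real samples never are.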
Astropy's sigma_clip documentation covers the full API if you want to tune the behavior.
The Memory Problem (And How to Solve It)
Here's where it gets tricky in practice. If you're stacking 200 frames at 4096x4096 pixels in float32, that's roughly 13 GB in memory — and sigma_clip allocates a masked copy on top of that. Most laptops will choke on it.
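The arithmetic, for the stack sizes above:

```python
n_frames, h, w = 200, 4096, 4096
bytes_per_px = 4  # float32
stack_bytes = n_frames * h * w * bytes_per_px
print(f'{stack_bytes / 1e9:.1f} GB')  # before sigma_clip's working copies
```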
I ran into this exact issue last year while processing data from a friend's telescope setup. The fix is to process the stack in chunks — you don't need all frames in memory simultaneously for every pixel position.
def chunked_sigma_stack(files, sigma=2.5, chunk_height=256):
    """
    Process the stack in horizontal strips to keep memory usage sane.
    Only loads the rows we need from each frame at a time.
    """
    # Read the first frame's header to get dimensions
    header = fits.getheader(files[0])
    h, w = header['NAXIS2'], header['NAXIS1']
    result = np.zeros((h, w), dtype=np.float32)
    for y_start in range(0, h, chunk_height):
        y_end = min(y_start + chunk_height, h)
        # Load only the rows we need from each frame
        chunk_stack = np.array([
            fits.getdata(f)[y_start:y_end, :].astype(np.float32)
            for f in files
        ])
        clipped = sigma_clip(chunk_stack, sigma=sigma, axis=0, maxiters=3)
        result[y_start:y_end, :] = np.ma.mean(clipped, axis=0).data
        # Optional: track progress
        print(f'Processed rows {y_start}-{y_end} of {h}')
    return result
files = sorted(glob.glob('lights/*.fits'))
stacked = chunked_sigma_stack(files, chunk_height=512)
With chunk_height=512 and 200 frames at 4096 pixels wide, each strip stack is about 1.7 GB instead of 13 GB. The tradeoff is more disk I/O, since you re-read each file once per strip, but for most setups the memory savings are worth it.
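If the repeated re-reads bother you, memory mapping can soften the cost: with fits.open(..., memmap=True), slicing the data array only pulls the touched rows off disk. Astropy often enables memmap by default, but making it explicit documents the intent. A sketch, assuming the image lives in the primary HDU:

```python
import numpy as np
from astropy.io import fits

def read_rows(path, y_start, y_end):
    # memmap defers the read; slicing pulls in only the rows we touch.
    # astype() copies, so the result stays valid after the file closes.
    with fits.open(path, memmap=True) as hdul:
        return hdul[0].data[y_start:y_end, :].astype(np.float32)
```

Swapping this in for the fits.getdata call inside the strip loop avoids materializing each full frame just to slice it.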
Don't Forget Calibration Frames
One thing that trips people up — and I've seen this in several open-source stacking pipelines — is skipping calibration before stacking. If you stack uncalibrated frames, you're baking systematic errors into your result.
The standard calibration pipeline looks like this:
- Dark frames — exposures with the lens cap on, same duration and temperature as your light frames. These capture thermal noise patterns specific to your sensor.
- Flat frames — evenly illuminated exposures that map vignetting and dust shadows across your optical path.
- Bias frames — zero-length exposures capturing the read noise floor.
The calibration formula for each light frame:
calibrated = (light - master_dark) / master_flat
where master_flat has already been bias-subtracted and normalized to a mean of 1.0, so the division corrects vignetting without rescaling the image.
Apply this to every frame before stacking. The master dark, flat, and bias are themselves median-stacked from their respective calibration sets. Median stacking (rather than mean) works well here because calibration frames tend to have fewer outlier issues.
# Create master calibration frames using median
biases = np.array([fits.getdata(f).astype(np.float32) for f in bias_files])
master_bias = np.median(biases, axis=0)
darks = np.array([fits.getdata(f).astype(np.float32) for f in dark_files])
master_dark = np.median(darks, axis=0)
flats = np.array([fits.getdata(f).astype(np.float32) for f in flat_files])
# Bias-subtract the flat, then normalize so its mean value is 1.0
master_flat = np.median(flats, axis=0) - master_bias
master_flat /= np.mean(master_flat)

# Calibrate a single light frame
def calibrate(light_data, master_dark, master_flat):
    corrected = (light_data - master_dark) / master_flat
    return np.clip(corrected, 0, None)  # no negative pixel values
Frame Alignment: The Other Half of the Problem
I glossed over something critical — your frames need to be aligned before stacking. The stars drift between exposures due to imperfect tracking, and even a few pixels of drift will blur your final image.
The standard approach is to detect stars in each frame, match star patterns across frames, and compute an affine transformation to register them. The astroalign library handles this well:
import astroalign as aa

reference = fits.getdata(files[0]).astype(np.float32)
aligned_frames = [reference]
for f in files[1:]:
    source = fits.getdata(f).astype(np.float32)
    # Returns the aligned image and a footprint mask of valid pixels
    aligned, footprint = aa.register(source, reference)
    aligned_frames.append(aligned)
This uses triangle matching between detected star patterns — it's invariant to rotation and scale, which covers the drift and field rotation you get from imperfect tracking.
Prevention Tips
A few things I've learned the hard way:
- Always shoot more frames than you think you need. Sigma clipping works better with more data points per pixel. 50 frames is okay, 200 is better.
- Check your frames before stacking. A quick visual scan or automated FWHM (star sharpness) measurement will help you toss frames where the wind shook the telescope or clouds rolled through.
- Save intermediate results. Write your calibrated, aligned frames to disk before stacking. Debugging a stacking issue is much easier when you're not re-running calibration every time.
- Use FITS format, not JPEG or PNG. FITS preserves the full dynamic range of your sensor. The moment you save to an 8-bit format, you've thrown away the data that makes stacking worthwhile.
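The "check your frames" tip is easy to automate crudely. Variance of the image Laplacian is a standard focus metric: not a true FWHM fit, but it reliably ranks blurred, wind-shaken frames below sharp ones. The 80% keep fraction below is an illustrative default, not a recommendation:

```python
import numpy as np

def sharpness(frame):
    # Variance of the discrete Laplacian: higher means sharper stars
    lap = (-4 * frame[1:-1, 1:-1]
           + frame[:-2, 1:-1] + frame[2:, 1:-1]
           + frame[1:-1, :-2] + frame[1:-1, 2:])
    return float(lap.var())

def filter_frames(frames, keep_fraction=0.8):
    # Keep the sharpest fraction of frames; drop the wind-shaken or cloudy rest
    scores = np.array([sharpness(f) for f in frames])
    cutoff = np.quantile(scores, 1 - keep_fraction)
    return [f for f, s in zip(frames, scores) if s >= cutoff]
```

Run this after calibration but before alignment, so you never spend time registering a frame you're going to throw away.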
The combination of proper calibration, alignment, and sigma-clipped stacking is what turns 200 mediocre exposures into a single stunning deep-sky image. It's the same pipeline whether you're a hobbyist with a DSLR on a star tracker or processing data from a professional observatory. The math doesn't care about your budget — it just needs enough frames and clean statistics.