Kofi Adu Agyekum

Posted on Jun 4

Calculating NDVI in Python with Rasterio and GeoPandas

#ndvi #gis #remotesensing #vegetationmapping

After learning a lot from my first attempt at NDVI processing (where just about everything went wrong), I built a clean, working NDVI pipeline using Python. This post walks through how I calculated and visualized NDVI over the Pohjois-Savo region in Finland using Landsat 8 data.

What is NDVI?

NDVI stands for Normalized Difference Vegetation Index. It’s a widely used metric in remote sensing that helps identify healthy vegetation using the difference between red and near-infrared (NIR) light reflected by surfaces.

The formula:

NDVI = (NIR - Red) / (NIR + Red)

Higher NDVI values generally indicate denser, healthier vegetation.

Tools & Data

Landsat 8 Surface Reflectance data from USGS EarthExplorer
- Band 4 = Red
- Band 5 = Near Infrared (NIR)
Python Libraries:
- rasterio (for raster data)
- geopandas (for shapefiles/geojson)
- numpy (for array math)
- matplotlib (for visualization)
Boundary: A GeoJSON file covering part of the Pohjois-Savo region in Finland

Step-by-Step Workflow

1. Import libraries

import rasterio
import geopandas as gpd
import numpy as  np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
from rasterio.mask import mask
from rasterio.crs import CRS
from rasterio.features import shapes
from shapely.geometry import shape

2. Load the Red, NIR Bands, and Reproject the GeoJSON Boundary

red_path = '../data/raw/band4.TIF'
nir_path = '../data/raw/band5.TIF'
shape_path = '../data/raw/workarea.geojson'

3. Load, Clip the Bands to the Boundary and Calculate NDVI

#open bands
with rasterio.open(red_path) as red_src, rasterio.open(nir_path) as nir_src:
    red = red_src.read(1).astype('float32')
    nir = nir_src.read(1).astype('float32')

    #check for invalid crs (EPSG:9001 or None)
    def fix_crs(src):
        if not src.crs or "9001" in str(src.crs):
            return CRS.from_epsg(32635)  
        return src.crs

    red_crs = fix_crs(red_src)
    nir_crs = fix_crs(nir_src)

    #use red's crs as base
    working_crs = red_crs

    #load and reproject farm boundary
    #if None you can set to 3857 also   
    farm_shape = gpd.read_file(shape_path)
    if farm_shape.crs is None:
        farm_shape.set_crs("EPSG:4326", inplace=True)
    farm_shape = farm_shape.to_crs(working_crs)

    #clip raster bands
    clipped_red, _ = mask(red_src, farm_shape.geometry, crop=True)
    clipped_nir, _ = mask(nir_src, farm_shape.geometry, crop=True)

    #recalculate NDVI
    ndvi_clipped = (clipped_nir[0] - clipped_red[0]) / (clipped_nir[0] + clipped_red[0])

4. Plot your graph

plt.figure(figsize=(8, 6))
plt.imshow((nir - red) / (nir + red), cmap='RdYlGn', vmin=-1, vmax=1)
plt.colorbar()
plt.title("Full NDVI")
plt.axis("off")
plt.show()

You will see something like this:

5. Classify NDVI (Optional)

def classify_ndvi(ndvi_array):
    classified = np.full(ndvi_array.shape, -1, dtype='int8') #-1 = no data
    classified[(ndvi_array >= -1) & (ndvi_array < 0.2)] = 0  #barren/water
    classified[(ndvi_array >= 0.2) & (ndvi_array <= 0.5)] = 1 #moderate vegetation
    classified[(ndvi_array > 0.5) & (ndvi_array <= 1)] = 2 #healthy/high vegetation
    return classified

# appling to ndvi array
classified_ndvi = classify_ndvi(ndvi_clipped)

6. Plot Classified Graph

colors = ['skyblue', 'yellowgreen', 'darkgreen']
cmap = mcolors.ListedColormap(colors)
labels = ['Barren/Water', 'Moderate', 'Healthy']

plt.figure(figsize=(8, 6))
img = plt.imshow(classified_ndvi, cmap=cmap, vmin=0, vmax=3)
cbar = plt.colorbar(img, ticks=[0.5, 1.5, 2.5])
cbar.ax.set_yticklabels(labels)
plt.title("NDVI Vegetation Classification")
plt.axis("off")
plt.tight_layout()
plt.show()

The result will look something like this:

7. Polygonise your data

with rasterio.open(red_path) as src:
    transform = src.transform
    crs = src.crs

#generate dictionary (geometry, value) from classified ndvi
polygons = []
for geom, value in shapes(classified_ndvi, transform=transform):
    if value != -1:#skips no data
        polygons.append({"geometry": shape(geom), "value": int(value)})


#convert to geodataframe
gdf = gpd.GeoDataFrame(polygons, crs=crs)

#adding labels
gdf["label"] = gdf["value"].map({0: "Barren", 1: "Moderate", 2: "Healthy"})

#save as geojson
gdf.to_file("classified_ndvi_polygons.geojson", driver="GeoJSON")

#preview
gdf.head()

8. GeoJson, Shapefiles and GeoPackage

#save as geojson file
gdf.to_file("classified_ndvi_polygons.geojson", driver="GeoJSON")

#save as a .shp file
gdf.to_file("classified_ndvi_polygons.shp")

#save as a geopackage
gdf.to_file("classified_ndvi.gpkg", driver="GPKG")

While working on this project, I also ran into a few format-related issues when exporting the classified results. At first, I saved the vectorized NDVI zones as a GeoJSON, but the file ended up being around 352MB, which was too large for QGIS to handle smoothly. It either failed to load or froze entirely. I tried again using a Shapefile, which worked better for loading large datasets, but came with limitations like short field names and no support for special characters. Finally, I exported the same data as a GeoPackage (.gpkg), and it worked perfectly. It handled the large file size, preserved full attribute names, and opened cleanly in QGIS. Based on this, I’d recommend GeoJSON for lightweight data, Shapefiles for compatibility, and GeoPackage for larger or more complex outputs.

9. Loading the Result in QGIS

Once I had the classified NDVI zones vectorized and saved as a GeoPackage, I brought the file into QGIS to generate the final map layout.

QGIS handled the GeoPackage smoothly — no freezing or lag, even with thousands of polygons. I applied a simple categorized style based on NDVI class (barren, moderate, healthy) for easy interpretation.

From there, I added:

A base map for reference (using XYZ Tiles)
A legend, scale bar, and labels
A title and layout design using the Print Layout tool

Then I exported the result as a high-resolution PDF map.

This step wasn’t just about visualization — it was about validating that the Python-generated output aligned with the real-world landscape and could be presented clearly.

Result

The final NDVI map showed strong vegetation density in forested areas and lower NDVI in more open or cleared zones. I processed everything in Python and confirmed that the output matched what I saw later in QGIS when visualizing the same area.

Takeaways

Always check CRS and bit depth before processing
NDVI outputs may need classification to be more interpretable
Vectorizing large rasters can produce heavy files — use GeoPackage or Shapefile for big outputs

If you want to see the mistakes that led to this working version, check out:

Before I Got the Map, I Got It All Wrong

Top comments (0)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.

DEV Community