DEV Community

Kofi Adu Agyekum
Kofi Adu Agyekum

Posted on

Calculating NDVI in Python with Rasterio and GeoPandas

After learning a lot from my first attempt at NDVI processing (where just about everything went wrong), I built a clean, working NDVI pipeline using Python. This post walks through how I calculated and visualized NDVI over the Pohjois-Savo region in Finland using Landsat 8 data.


What is NDVI?

NDVI stands for Normalized Difference Vegetation Index. It’s a widely used metric in remote sensing that helps identify healthy vegetation using the difference between red and near-infrared (NIR) light reflected by surfaces.

The formula:

NDVI = (NIR - Red) / (NIR + Red)
Enter fullscreen mode Exit fullscreen mode

Higher NDVI values generally indicate denser, healthier vegetation.


Tools & Data

  • Landsat 8 Surface Reflectance data from USGS EarthExplorer
    • Band 4 = Red
    • Band 5 = Near Infrared (NIR)
  • Python Libraries:
    • rasterio (for raster data)
    • geopandas (for shapefiles/geojson)
    • numpy (for array math)
    • matplotlib (for visualization)
  • Boundary: A GeoJSON file covering part of the Pohjois-Savo region in Finland

Step-by-Step Workflow

1. Import libraries

import rasterio
import geopandas as gpd
import numpy as  np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
from rasterio.mask import mask
from rasterio.crs import CRS
from rasterio.features import shapes
from shapely.geometry import shape
Enter fullscreen mode Exit fullscreen mode

2. Load the Red, NIR Bands, and Reproject the GeoJSON Boundary

red_path = '../data/raw/band4.TIF'
nir_path = '../data/raw/band5.TIF'
shape_path = '../data/raw/workarea.geojson'
Enter fullscreen mode Exit fullscreen mode

3. Load, Clip the Bands to the Boundary and Calculate NDVI

#open bands
with rasterio.open(red_path) as red_src, rasterio.open(nir_path) as nir_src:
    red = red_src.read(1).astype('float32')
    nir = nir_src.read(1).astype('float32')

    #check for invalid crs (EPSG:9001 or None)
    def fix_crs(src):
        if not src.crs or "9001" in str(src.crs):
            return CRS.from_epsg(32635)  
        return src.crs

    red_crs = fix_crs(red_src)
    nir_crs = fix_crs(nir_src)

    #use red's crs as base
    working_crs = red_crs

    #load and reproject farm boundary
    #if None you can set to 3857 also   
    farm_shape = gpd.read_file(shape_path)
    if farm_shape.crs is None:
        farm_shape.set_crs("EPSG:4326", inplace=True)
    farm_shape = farm_shape.to_crs(working_crs)

    #clip raster bands
    clipped_red, _ = mask(red_src, farm_shape.geometry, crop=True)
    clipped_nir, _ = mask(nir_src, farm_shape.geometry, crop=True)

    #recalculate NDVI
    ndvi_clipped = (clipped_nir[0] - clipped_red[0]) / (clipped_nir[0] + clipped_red[0])
Enter fullscreen mode Exit fullscreen mode

4. Plot your graph

plt.figure(figsize=(8, 6))
plt.imshow((nir - red) / (nir + red), cmap='RdYlGn', vmin=-1, vmax=1)
plt.colorbar()
plt.title("Full NDVI")
plt.axis("off")
plt.show()
Enter fullscreen mode Exit fullscreen mode

You will see something like this:

Graph Plotted

5. Classify NDVI (Optional)

def classify_ndvi(ndvi_array):
    classified = np.full(ndvi_array.shape, -1, dtype='int8') #-1 = no data
    classified[(ndvi_array >= -1) & (ndvi_array < 0.2)] = 0  #barren/water
    classified[(ndvi_array >= 0.2) & (ndvi_array <= 0.5)] = 1 #moderate vegetation
    classified[(ndvi_array > 0.5) & (ndvi_array <= 1)] = 2 #healthy/high vegetation
    return classified

# appling to ndvi array
classified_ndvi = classify_ndvi(ndvi_clipped)
Enter fullscreen mode Exit fullscreen mode

6. Plot Classified Graph

colors = ['skyblue', 'yellowgreen', 'darkgreen']
cmap = mcolors.ListedColormap(colors)
labels = ['Barren/Water', 'Moderate', 'Healthy']

plt.figure(figsize=(8, 6))
img = plt.imshow(classified_ndvi, cmap=cmap, vmin=0, vmax=3)
cbar = plt.colorbar(img, ticks=[0.5, 1.5, 2.5])
cbar.ax.set_yticklabels(labels)
plt.title("NDVI Vegetation Classification")
plt.axis("off")
plt.tight_layout()
plt.show()
Enter fullscreen mode Exit fullscreen mode

The result will look something like this:

Classified Graph

7. Polygonise your data

with rasterio.open(red_path) as src:
    transform = src.transform
    crs = src.crs

#generate dictionary (geometry, value) from classified ndvi
polygons = []
for geom, value in shapes(classified_ndvi, transform=transform):
    if value != -1:#skips no data
        polygons.append({"geometry": shape(geom), "value": int(value)})


#convert to geodataframe
gdf = gpd.GeoDataFrame(polygons, crs=crs)

#adding labels
gdf["label"] = gdf["value"].map({0: "Barren", 1: "Moderate", 2: "Healthy"})

#save as geojson
gdf.to_file("classified_ndvi_polygons.geojson", driver="GeoJSON")

#preview
gdf.head()
Enter fullscreen mode Exit fullscreen mode

8. GeoJson, Shapefiles and GeoPackage

#save as geojson file
gdf.to_file("classified_ndvi_polygons.geojson", driver="GeoJSON")

#save as a .shp file
gdf.to_file("classified_ndvi_polygons.shp")

#save as a geopackage
gdf.to_file("classified_ndvi.gpkg", driver="GPKG")
Enter fullscreen mode Exit fullscreen mode

While working on this project, I also ran into a few format-related issues when exporting the classified results. At first, I saved the vectorized NDVI zones as a GeoJSON, but the file ended up being around 352MB, which was too large for QGIS to handle smoothly. It either failed to load or froze entirely. I tried again using a Shapefile, which worked better for loading large datasets, but came with limitations like short field names and no support for special characters. Finally, I exported the same data as a GeoPackage (.gpkg), and it worked perfectly. It handled the large file size, preserved full attribute names, and opened cleanly in QGIS. Based on this, I’d recommend GeoJSON for lightweight data, Shapefiles for compatibility, and GeoPackage for larger or more complex outputs.


9. Loading the Result in QGIS

Once I had the classified NDVI zones vectorized and saved as a GeoPackage, I brought the file into QGIS to generate the final map layout.

QGIS handled the GeoPackage smoothly — no freezing or lag, even with thousands of polygons. I applied a simple categorized style based on NDVI class (barren, moderate, healthy) for easy interpretation.

From there, I added:

  • A base map for reference (using XYZ Tiles)
  • A legend, scale bar, and labels
  • A title and layout design using the Print Layout tool

Then I exported the result as a high-resolution PDF map.

This step wasn’t just about visualization — it was about validating that the Python-generated output aligned with the real-world landscape and could be presented clearly.

Final Map

Result

The final NDVI map showed strong vegetation density in forested areas and lower NDVI in more open or cleared zones. I processed everything in Python and confirmed that the output matched what I saw later in QGIS when visualizing the same area.


Takeaways

  • Always check CRS and bit depth before processing
  • NDVI outputs may need classification to be more interpretable
  • Vectorizing large rasters can produce heavy files — use GeoPackage or Shapefile for big outputs

Related Post

If you want to see the mistakes that led to this working version, check out:

Before I Got the Map, I Got It All Wrong


Top comments (0)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.