After learning a lot from my first attempt at NDVI processing (where just about everything went wrong), I built a clean, working NDVI pipeline using Python. This post walks through how I calculated and visualized NDVI over the Pohjois-Savo region in Finland using Landsat 8 data.
What is NDVI?
NDVI stands for Normalized Difference Vegetation Index. It’s a widely used metric in remote sensing that helps identify healthy vegetation using the difference between red and near-infrared (NIR) light reflected by surfaces.
The formula:
NDVI = (NIR - Red) / (NIR + Red)
Higher NDVI values generally indicate denser, healthier vegetation.
Tools & Data
-
Landsat 8 Surface Reflectance data from USGS EarthExplorer
- Band 4 = Red
- Band 5 = Near Infrared (NIR)
-
Python Libraries:
-
rasterio
(for raster data) -
geopandas
(for shapefiles/geojson) -
numpy
(for array math) -
matplotlib
(for visualization)
-
- Boundary: A GeoJSON file covering part of the Pohjois-Savo region in Finland
Step-by-Step Workflow
1. Import libraries
import rasterio
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
from rasterio.mask import mask
from rasterio.crs import CRS
from rasterio.features import shapes
from shapely.geometry import shape
2. Load the Red, NIR Bands, and Reproject the GeoJSON Boundary
red_path = '../data/raw/band4.TIF'
nir_path = '../data/raw/band5.TIF'
shape_path = '../data/raw/workarea.geojson'
3. Load, Clip the Bands to the Boundary and Calculate NDVI
#open bands
with rasterio.open(red_path) as red_src, rasterio.open(nir_path) as nir_src:
red = red_src.read(1).astype('float32')
nir = nir_src.read(1).astype('float32')
#check for invalid crs (EPSG:9001 or None)
def fix_crs(src):
if not src.crs or "9001" in str(src.crs):
return CRS.from_epsg(32635)
return src.crs
red_crs = fix_crs(red_src)
nir_crs = fix_crs(nir_src)
#use red's crs as base
working_crs = red_crs
#load and reproject farm boundary
#if None you can set to 3857 also
farm_shape = gpd.read_file(shape_path)
if farm_shape.crs is None:
farm_shape.set_crs("EPSG:4326", inplace=True)
farm_shape = farm_shape.to_crs(working_crs)
#clip raster bands
clipped_red, _ = mask(red_src, farm_shape.geometry, crop=True)
clipped_nir, _ = mask(nir_src, farm_shape.geometry, crop=True)
#recalculate NDVI
ndvi_clipped = (clipped_nir[0] - clipped_red[0]) / (clipped_nir[0] + clipped_red[0])
4. Plot your graph
plt.figure(figsize=(8, 6))
plt.imshow((nir - red) / (nir + red), cmap='RdYlGn', vmin=-1, vmax=1)
plt.colorbar()
plt.title("Full NDVI")
plt.axis("off")
plt.show()
You will see something like this:
5. Classify NDVI (Optional)
def classify_ndvi(ndvi_array):
classified = np.full(ndvi_array.shape, -1, dtype='int8') #-1 = no data
classified[(ndvi_array >= -1) & (ndvi_array < 0.2)] = 0 #barren/water
classified[(ndvi_array >= 0.2) & (ndvi_array <= 0.5)] = 1 #moderate vegetation
classified[(ndvi_array > 0.5) & (ndvi_array <= 1)] = 2 #healthy/high vegetation
return classified
# appling to ndvi array
classified_ndvi = classify_ndvi(ndvi_clipped)
6. Plot Classified Graph
colors = ['skyblue', 'yellowgreen', 'darkgreen']
cmap = mcolors.ListedColormap(colors)
labels = ['Barren/Water', 'Moderate', 'Healthy']
plt.figure(figsize=(8, 6))
img = plt.imshow(classified_ndvi, cmap=cmap, vmin=0, vmax=3)
cbar = plt.colorbar(img, ticks=[0.5, 1.5, 2.5])
cbar.ax.set_yticklabels(labels)
plt.title("NDVI Vegetation Classification")
plt.axis("off")
plt.tight_layout()
plt.show()
The result will look something like this:
7. Polygonise your data
with rasterio.open(red_path) as src:
transform = src.transform
crs = src.crs
#generate dictionary (geometry, value) from classified ndvi
polygons = []
for geom, value in shapes(classified_ndvi, transform=transform):
if value != -1:#skips no data
polygons.append({"geometry": shape(geom), "value": int(value)})
#convert to geodataframe
gdf = gpd.GeoDataFrame(polygons, crs=crs)
#adding labels
gdf["label"] = gdf["value"].map({0: "Barren", 1: "Moderate", 2: "Healthy"})
#save as geojson
gdf.to_file("classified_ndvi_polygons.geojson", driver="GeoJSON")
#preview
gdf.head()
8. GeoJson, Shapefiles and GeoPackage
#save as geojson file
gdf.to_file("classified_ndvi_polygons.geojson", driver="GeoJSON")
#save as a .shp file
gdf.to_file("classified_ndvi_polygons.shp")
#save as a geopackage
gdf.to_file("classified_ndvi.gpkg", driver="GPKG")
While working on this project, I also ran into a few format-related issues when exporting the classified results. At first, I saved the vectorized NDVI zones as a GeoJSON, but the file ended up being around 352MB, which was too large for QGIS to handle smoothly. It either failed to load or froze entirely. I tried again using a Shapefile, which worked better for loading large datasets, but came with limitations like short field names and no support for special characters. Finally, I exported the same data as a GeoPackage (.gpkg), and it worked perfectly. It handled the large file size, preserved full attribute names, and opened cleanly in QGIS. Based on this, I’d recommend GeoJSON for lightweight data, Shapefiles for compatibility, and GeoPackage for larger or more complex outputs.
9. Loading the Result in QGIS
Once I had the classified NDVI zones vectorized and saved as a GeoPackage, I brought the file into QGIS to generate the final map layout.
QGIS handled the GeoPackage smoothly — no freezing or lag, even with thousands of polygons. I applied a simple categorized style based on NDVI class (barren, moderate, healthy) for easy interpretation.
From there, I added:
- A base map for reference (using XYZ Tiles)
- A legend, scale bar, and labels
- A title and layout design using the Print Layout tool
Then I exported the result as a high-resolution PDF map.
This step wasn’t just about visualization — it was about validating that the Python-generated output aligned with the real-world landscape and could be presented clearly.
Result
The final NDVI map showed strong vegetation density in forested areas and lower NDVI in more open or cleared zones. I processed everything in Python and confirmed that the output matched what I saw later in QGIS when visualizing the same area.
Takeaways
- Always check CRS and bit depth before processing
- NDVI outputs may need classification to be more interpretable
- Vectorizing large rasters can produce heavy files — use GeoPackage or Shapefile for big outputs
Related Post
If you want to see the mistakes that led to this working version, check out:
Before I Got the Map, I Got It All Wrong
Top comments (0)
Some comments may only be visible to logged-in visitors. Sign in to view all comments.