DEV Community

agenthustler

Scraping Scientific Data: Climate Datasets, NASA APIs, and NOAA

Scientific data is increasingly open and accessible. NASA, NOAA, and other agencies provide APIs and datasets that you can programmatically access for research, visualization, or building climate-aware applications.

NASA APIs

NASA offers over 30 free APIs. No scraping needed — just an API key:

```bash
pip install requests pandas matplotlib
```

Astronomy Picture of the Day

```python
import requests

NASA_KEY = "DEMO_KEY"  # Get your own at api.nasa.gov

def get_apod(date=None):
    url = "https://api.nasa.gov/planetary/apod"
    params = {"api_key": NASA_KEY}
    if date:
        params["date"] = date

    response = requests.get(url, params=params)
    data = response.json()
    return {
        "title": data["title"],
        "date": data["date"],
        "explanation": data["explanation"],
        "image_url": data.get("hdurl", data.get("url"))
    }

apod = get_apod()
print(f"{apod['title']}\n{apod['explanation'][:200]}...")
```
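If you want the image itself rather than just its URL, a small follow-up sketch could download it to disk. The `slugify` and `save_apod_image` helpers here are my own names, not part of the NASA API, and the `.jpg` extension is an assumption (APOD images are usually JPEGs, but not always):

```python
import re
import requests

def slugify(title):
    # Turn an APOD title into a filesystem-safe filename stem
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def save_apod_image(apod, directory="."):
    # Download the image referenced by a get_apod() result and save it locally.
    # Assumes a JPEG; check the URL's real extension for robustness.
    url = apod["image_url"]
    path = f"{directory}/{apod['date']}-{slugify(apod['title'])}.jpg"
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    with open(path, "wb") as f:
        f.write(response.content)
    return path
```

On some dates APOD is a video, in which case `image_url` points at an embed rather than a downloadable image, so treat this as a sketch rather than production code.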

Near Earth Objects (Asteroids)

```python
def get_near_earth_objects(start_date, end_date):
    url = "https://api.nasa.gov/neo/rest/v1/feed"
    params = {
        "start_date": start_date,
        "end_date": end_date,
        "api_key": NASA_KEY
    }
    response = requests.get(url, params=params)
    data = response.json()

    asteroids = []
    for date, objects in data["near_earth_objects"].items():
        for obj in objects:
            asteroids.append({
                "name": obj["name"],
                "date": date,
                "diameter_min_m": obj["estimated_diameter"]["meters"]["estimated_diameter_min"],
                "diameter_max_m": obj["estimated_diameter"]["meters"]["estimated_diameter_max"],
                "velocity_kmh": float(obj["close_approach_data"][0]["relative_velocity"]["kilometers_per_hour"]),
                "miss_distance_km": float(obj["close_approach_data"][0]["miss_distance"]["kilometers"]),
                "is_hazardous": obj["is_potentially_hazardous_asteroid"]
            })

    return asteroids

neos = get_near_earth_objects("2026-03-01", "2026-03-07")
print(f"Found {len(neos)} near-Earth objects")
hazardous = [a for a in neos if a["is_hazardous"]]
print(f"Potentially hazardous: {len(hazardous)}")
```
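A natural next step is ranking the results. As a sketch working on the dicts built above (these helper names are mine, not part of the NASA API; 384,400 km is the average Earth-Moon distance):

```python
LUNAR_DISTANCE_KM = 384_400  # average Earth-Moon distance

def closest_approaches(asteroids, top_n=5):
    # Sort the dicts produced by get_near_earth_objects(), nearest first
    return sorted(asteroids, key=lambda a: a["miss_distance_km"])[:top_n]

def in_lunar_distances(miss_distance_km):
    # Express a miss distance as multiples of the Earth-Moon distance,
    # which is easier to reason about than raw kilometers
    return miss_distance_km / LUNAR_DISTANCE_KM
```

Lunar distances make headlines like "passed within 2 LD of Earth" easy to reproduce from the raw feed.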

NOAA Climate Data

```python
def get_noaa_data(dataset_id, location_id, start_date, end_date, token):
    url = "https://www.ncdc.noaa.gov/cdo-web/api/v2/data"
    headers = {"token": token}
    params = {
        "datasetid": dataset_id,
        "locationid": location_id,
        "startdate": start_date,
        "enddate": end_date,
        "limit": 1000,
        "units": "metric"
    }

    response = requests.get(url, headers=headers, params=params)
    if response.status_code == 200:
        return response.json().get("results", [])
    return []

data = get_noaa_data(
    dataset_id="GHCND",
    location_id="CITY:US360019",
    start_date="2025-01-01",
    end_date="2025-12-31",
    token="YOUR_NOAA_TOKEN"
)
```
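The CDO API caps each response at 1,000 records, so a full year of daily data for a city usually needs pagination via the `offset` parameter. A sketch (the `cdo_params` helper is my own factoring, and the 0.25 s sleep is a conservative nod to NOAA's documented request-rate limits):

```python
import time
import requests

def cdo_params(dataset_id, location_id, start_date, end_date, offset=1, limit=1000):
    # Build the query parameters for one page of the CDO /data endpoint
    return {
        "datasetid": dataset_id,
        "locationid": location_id,
        "startdate": start_date,
        "enddate": end_date,
        "units": "metric",
        "limit": limit,
        "offset": offset,
    }

def get_noaa_data_paginated(dataset_id, location_id, start_date, end_date, token):
    url = "https://www.ncdc.noaa.gov/cdo-web/api/v2/data"
    headers = {"token": token}
    results, offset = [], 1
    while True:
        response = requests.get(
            url, headers=headers,
            params=cdo_params(dataset_id, location_id, start_date, end_date, offset),
        )
        response.raise_for_status()
        page = response.json().get("results", [])
        results.extend(page)
        if len(page) < 1000:  # a short page means we've reached the end
            break
        offset += 1000
        time.sleep(0.25)  # stay well under NOAA's rate limit
    return results
```

The same loop works for any CDO dataset, not just GHCND.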

EPA Air Quality Data

```python
def get_air_quality(zip_code):
    # The "current" endpoint returns the latest observations and does not
    # take a date parameter; use /aq/observation/zipCode/historical/ for
    # past dates.
    url = "https://www.airnowapi.org/aq/observation/zipCode/current/"
    params = {
        "format": "application/json",
        "zipCode": zip_code,
        "distance": 25,
        "API_KEY": "YOUR_AIRNOW_KEY"
    }
    response = requests.get(url, params=params)
    return response.json()

aqi_data = get_air_quality("10001")
for reading in aqi_data:
    print(f"{reading['ParameterName']}: AQI {reading['AQI']} ({reading['Category']['Name']})")
```
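AirNow reports one observation per pollutant, so a dashboard usually wants the worst of them. A small sketch over the response shape shown above (the helper names are mine; the threshold of 101 is where the EPA's AQI scale enters "Unhealthy for Sensitive Groups"):

```python
def worst_reading(aqi_data):
    # The observation with the highest AQI, or None if the list is empty
    return max(aqi_data, key=lambda r: r["AQI"], default=None)

def needs_alert(aqi_data, threshold=101):
    # AQI 101+ means at least "Unhealthy for Sensitive Groups" on the EPA scale
    worst = worst_reading(aqi_data)
    return worst is not None and worst["AQI"] >= threshold
```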

Building a Climate Dashboard

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_temperature_trend(data):
    df = pd.DataFrame(data)
    # .copy() avoids pandas' SettingWithCopyWarning when we mutate below
    temp_data = df[df["datatype"] == "TAVG"].copy()

    if temp_data.empty:
        print("No temperature data found")
        return

    temp_data["date"] = pd.to_datetime(temp_data["date"])
    # With units="metric" the CDO API already returns degrees Celsius;
    # divide by 10 only if you fetch raw GHCND values (tenths of a degree).

    fig, ax = plt.subplots(figsize=(12, 6))
    ax.plot(temp_data["date"], temp_data["value"], linewidth=0.8)
    ax.set_xlabel("Date")
    ax.set_ylabel("Temperature (°C)")
    ax.set_title("Daily Average Temperature")
    ax.grid(True, alpha=0.3)
    plt.savefig("climate_trend.png", dpi=150, bbox_inches="tight")
    print("Chart saved to climate_trend.png")
```

Scraping Non-API Sources

Some scientific data is only published as HTML pages rather than through an API. For those, a rendering service like ScraperAPI can handle JavaScript-heavy data portals, ThorData proxies help with geo-restricted datasets, and ScrapeOps can monitor the health of your collection pipeline.
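However the page is fetched, you still have to parse it. Here is a stdlib-only sketch that pulls a simple HTML data table into a list of dicts; in practice BeautifulSoup or lxml is more robust against messy markup, but this shows the shape of the task without extra dependencies:

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    # Collects the text of <td>/<th> cells into rows, one list per <tr>
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

def parse_data_table(html):
    # Treat the first row as headers and zip the remaining rows into dicts
    p = TableParser()
    p.feed(html)
    if not p.rows:
        return []
    headers, *body = p.rows
    return [dict(zip(headers, r)) for r in body if len(r) == len(headers)]
```

This assumes the first table row holds the column headers, which is common on government data portals but worth verifying per site.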

Data Sources Reference

| Source | Type | Auth | URL |
| --- | --- | --- | --- |
| NASA APIs | REST API | Free key | api.nasa.gov |
| NOAA CDO | REST API | Free token | ncdc.noaa.gov |
| EPA AirNow | REST API | Free key | airnowapi.org |
| USGS Earthquakes | GeoJSON | None | earthquake.usgs.gov |
| OpenWeather | REST API | Free tier | openweathermap.org |
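The USGS feed in the table is a good first stop because it needs no key at all. A sketch against its real FDSN event endpoint (the `quake_time` helper is my own; USGS timestamps are milliseconds since the Unix epoch):

```python
from datetime import datetime, timezone
import requests

def quake_time(ms):
    # USGS "time" fields are milliseconds since the Unix epoch, UTC
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

def get_earthquakes(start, end, min_magnitude=4.5):
    # USGS FDSN event service: public, no API key required
    url = "https://earthquake.usgs.gov/fdsnws/event/1/query"
    params = {
        "format": "geojson",
        "starttime": start,
        "endtime": end,
        "minmagnitude": min_magnitude,
    }
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    return [
        {
            "magnitude": f["properties"]["mag"],
            "place": f["properties"]["place"],
            "time": quake_time(f["properties"]["time"]),
        }
        for f in response.json()["features"]
    ]
```

Dates go in as ISO strings, e.g. `get_earthquakes("2026-03-01", "2026-03-07")`.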

Tips

  1. Cache aggressively — scientific data rarely changes retroactively
  2. Use bulk downloads for historical data instead of API calls
  3. Check data licenses — most government data is public domain
  4. Handle missing values — sensor outages create gaps in climate data
  5. Use netCDF/HDF5 for large gridded datasets, not CSV
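Tip 1 is easy to act on with a small disk cache. This is a sketch, not a library function: `cached_get_json` is a hypothetical helper that wraps any `(url, params) -> dict` fetcher, so it works for NASA, NOAA, or AirNow alike:

```python
import hashlib
import json
import os

CACHE_DIR = ".api_cache"

def cached_get_json(url, params, fetch):
    # Cache JSON responses on disk, keyed by a hash of url + sorted params.
    # `fetch` is any callable (url, params) -> dict, e.g. a requests wrapper.
    os.makedirs(CACHE_DIR, exist_ok=True)
    key = hashlib.sha256(
        (url + json.dumps(params, sort_keys=True)).encode()
    ).hexdigest()
    path = os.path.join(CACHE_DIR, key + ".json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)  # cache hit: no network call
    data = fetch(url, params)
    with open(path, "w") as f:
        json.dump(data, f)
    return data
```

Because historical climate records rarely change, there is usually no need for cache expiry; delete the directory if you ever want fresh data.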

Follow for more Python data science tutorials!
