DEV Community

Minexa.ai

Scraping environmental data from OpenEI with Minexa.ai

OpenEI (Open Energy Information) is a platform maintained by the U.S. Department of Energy that hosts a large catalog of energy-related datasets. The search page at data.openei.org/search lists datasets across topics like solar resources, utility rates, building energy use, and grid data. Each listing includes a title, organization, tags, license type, and a link to the full dataset record.

If you need to collect this data at scale — to build a dataset index, track what gets published over time, or feed an internal research tool — copying it manually is not realistic. This is where the Minexa.ai Chrome extension comes in.


Watch the full walkthrough first

The video tutorial covers the entire process end to end, so it is worth watching before you go through the screenshots below. It is the fastest way to understand how the extraction works on OpenEI.

Watch full video demo

How the extraction works, stage by stage

Rather than listing steps in isolation, here is what each stage of the process actually does and what you see on screen.

Starting point: the Minexa extension

Once the extension is installed, opening it from any page brings up the Minexa home screen. This is where all your scrapers and jobs are managed.

Minexa home page

The extension works directly in your browser — no separate app, no dashboard to log into from another tab.


Navigating to the target page

Browse to data.openei.org/search. This is the dataset search listing page. You can see all the dataset cards that Minexa will detect and extract from.

OpenEI search page loaded

Minexa works on the page currently open in your browser, so there is no URL to paste into a separate interface. You are already on the right page.


Confirming the page

After opening the extension popup, you click 'I'm on the right page'. This tells Minexa to begin analyzing the current page structure.

Extension popup with confirmation button

From this point, Minexa takes over the detection process automatically.


Pagination detection

Minexa scans the page and identifies how it paginates. For OpenEI, it detects the next-page mechanism and lists the pagination controls it found. You review the list and click Continue.

Pagination detected

You do not configure this manually. Minexa reads the page structure and figures out the pagination pattern on its own.


Choosing your scraping depth

After pagination is confirmed, Minexa asks whether you want to scrape just the list page or also follow each dataset link and extract detail page data. For most research use cases, list-only is sufficient. For deeper extraction, the detail mode pulls additional fields from each individual dataset record page.

List or detail scraping option

This two-layer extraction capability means a single job can produce both the summary data from the list and the full metadata from each dataset page.


Simple or advanced mode

Before the job starts, you choose between simple mode (Minexa picks the most relevant fields automatically) and advanced mode (you can review and adjust the field selection). For most users, simple mode produces a clean, complete output without any additional configuration.

Simple or advanced scraping options


Container detection

Minexa automatically highlights the repeating container on the page — the element that wraps each dataset listing. This is the structural anchor it uses to identify where one result ends and the next begins.

Container highlighted automatically


Field discovery

After detecting the container, Minexa surfaces all the data points it found within each listing. These appear as labeled columns — title, organization, tags, license, and more. You do not need to specify these upfront.

All data points extracted

This is one of the more useful aspects of the tool: if you are not sure what fields are available on a page, Minexa shows you rather than asking you to define them first.


API and code samples

At the configuration stage, Minexa also surfaces ready-to-use code samples in JSON and Python, along with an API request view. This is useful if you want to integrate the scraper into an existing pipeline.

Code samples and API request


Job summary with scheduling and Google Sheets options

Before running, you see a summary screen. From here you can connect a Google Sheet for live output or set up a recurring schedule so the job runs automatically without manual triggering.

Job summary screen

Scheduling is particularly relevant for OpenEI since new datasets are added regularly. A weekly scheduled run keeps your local dataset index current without any manual work.
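To make use of scheduled runs, you can compare two exports and list what is new. The following is a minimal sketch, not part of Minexa itself. It assumes each export is a JSON array of objects that has been loaded into a Python list, and that the `title` field is unique enough to identify a dataset. The function name `new_datasets` and the sample records are hypothetical.

```python
def new_datasets(previous, current, key="title"):
    """Return records in `current` whose `key` value was not in `previous`."""
    seen = {row.get(key) for row in previous}
    return [row for row in current if row.get(key) not in seen]


# Hypothetical records standing in for last week's and this week's exports.
last_week = [{"title": "Utility Rate Database"}]
this_week = [
    {"title": "Utility Rate Database"},
    {"title": "U.S. Solar Resource Data"},
]

for row in new_datasets(last_week, this_week):
    print("New dataset:", row["title"])
```

If your export includes the link to each dataset record, passing that field as `key` is likely a safer identifier than the title, since titles can be edited.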


Running the job

The scraper appears in your jobs list with a Run button. Once triggered, extraction begins across all detected pages.

Jobs list with run button


Results during and after the run

As the job runs, data populates in a table view in real time. Once complete, you can export to Excel or JSON.

Scraped data table after job finishes


What the extracted data looks like

Here is a sample of what the JSON output contains after a completed run on the OpenEI search page:

[
  {
    "title": "U.S. Solar Resource Data",
    "organization": "National Renewable Energy Laboratory",
    "tags": "solar, irradiance, GHI, DNI",
    "license": "Public Domain"
  },
  {
    "title": "Utility Rate Database",
    "organization": "NREL",
    "tags": "electricity rates, tariffs, utilities",
    "license": "Creative Commons"
  }
]

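Each row corresponds to one dataset listing. Field names are consistent across all pages; values are captured as they appear on the site, so the same organization can show up both spelled out and abbreviated, as in the sample above.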
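If you want to confirm that every row in an export actually carries the fields you expect, a small check like the one below can help. It is a sketch under assumptions: the four field names are taken from the sample output above, and the helper name `find_incomplete` is hypothetical.

```python
# Field names taken from the sample output; adjust to match your own export.
EXPECTED_FIELDS = {"title", "organization", "tags", "license"}


def find_incomplete(rows, expected=EXPECTED_FIELDS):
    """Return (index, missing_fields) for each row that lacks an expected field.

    A field counts as missing if the key is absent or its value is empty.
    """
    problems = []
    for index, row in enumerate(rows):
        missing = sorted(field for field in expected if not row.get(field))
        if missing:
            problems.append((index, missing))
    return problems


rows = [
    {
        "title": "Utility Rate Database",
        "organization": "NREL",
        "tags": "electricity rates, tariffs, utilities",
        "license": "Creative Commons",
    },
    {"title": "Record with missing fields"},  # hypothetical incomplete row
]

for index, missing in find_incomplete(rows):
    print(f"Row {index} is missing: {', '.join(missing)}")
```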


Working with the exported data in Python

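import json

# Load the JSON file exported from Minexa (use whatever filename you saved it under)
with open('openei_datasets.json', 'r', encoding='utf-8') as f:
    datasets = json.load(f)

# Print one line per dataset; .get() returns None if a field is missing
for dataset in datasets:
    print(dataset.get('title'), '|', dataset.get('organization'))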

This gives you a quick way to scan titles and organizations, or pipe the data into a pandas DataFrame for further filtering and analysis.
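For filtering without any extra dependencies, the sketch below splits the comma-separated `tags` string into individual tags, counts them, and selects datasets by tag. It assumes the `tags` field is a single comma-separated string, as in the sample output; the helper names are hypothetical, and the two records are copied from that sample.

```python
from collections import Counter


def split_tags(row):
    """Split the comma-separated 'tags' string into a list of trimmed tags."""
    return [tag.strip() for tag in (row.get("tags") or "").split(",") if tag.strip()]


def tag_counts(rows):
    """Count how many datasets carry each tag (case-insensitive)."""
    return Counter(tag.lower() for row in rows for tag in split_tags(row))


def with_tag(rows, tag):
    """Return the datasets that carry the given tag (case-insensitive)."""
    wanted = tag.lower()
    return [row for row in rows if wanted in (t.lower() for t in split_tags(row))]


sample = [
    {"title": "U.S. Solar Resource Data", "tags": "solar, irradiance, GHI, DNI"},
    {"title": "Utility Rate Database", "tags": "electricity rates, tariffs, utilities"},
]

for row in with_tag(sample, "solar"):
    print(row["title"])
```

Lowercasing the tags is a design choice that treats "GHI" and "ghi" as the same tag; drop it if case matters for your index.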


The scraper configuration is saved after the first run. The next time you trigger it, Minexa skips the detection phase entirely and goes straight to extraction. If you want to get started, the extension is available at minexa.ai.
