You found data on a website. You need it in your database.
What's the fastest path from point A to point B?
This guide covers the practical workflows for getting HTML tables into SQLite, PostgreSQL, MySQL, and other databases—with minimal friction.
The General Workflow
Every browser-to-database pipeline has the same steps:
- Extract — Get the table out of the webpage
- Clean — Fix formatting, types, and structure
- Load — Insert into database
The question is where to do each step and with what tools.
Path 1: CSV as Intermediary
The most common approach. Works with any database.
Step 1: Export to CSV
Use a browser extension like HTML Table Exporter or copy-paste into a spreadsheet and save as CSV.
Key consideration: Delimiter choice. Commas work unless your data contains commas. Semicolons or tabs are safer for messy data.
Step 2: Load into Database
SQLite:
sqlite3 mydb.db
.mode csv
.import data.csv tablename
Or with headers:
sqlite3 mydb.db <<EOF
.mode csv
.headers on
.import data.csv tablename
EOF
PostgreSQL:
COPY tablename FROM '/path/to/data.csv'
WITH (FORMAT csv, HEADER true);
Or using psql:
\copy tablename FROM 'data.csv' WITH (FORMAT csv, HEADER true);
MySQL:
LOAD DATA INFILE '/path/to/data.csv'
INTO TABLE tablename
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
Pros and Cons
✅ Universal compatibility
✅ Easy to inspect intermediate file
✅ Can manually fix issues before loading
❌ Extra step
❌ Type inference is basic
Path 2: Direct to SQL Statements
Export the table directly as INSERT statements.
Export as SQL
Some export tools generate SQL directly. The output looks like:
INSERT INTO table_name (col1, col2, col3) VALUES
('value1', 'value2', 123),
('value4', 'value5', 456);
Load
Just execute the SQL file:
# SQLite
sqlite3 mydb.db < data.sql
# PostgreSQL
psql -d mydb -f data.sql
# MySQL
mysql mydb < data.sql
Pros and Cons
✅ One step from export to database
✅ SQL is portable across databases (mostly)
❌ Large datasets = huge SQL files
❌ Need to define table schema first
Path 3: Python + Pandas Pipeline
Maximum flexibility. Best for repeated imports or complex cleaning.
Complete Workflow
import pandas as pd
from sqlalchemy import create_engine
# Step 1: Load CSV (exported from browser)
df = pd.read_csv('data.csv')
# Step 2: Clean
df.columns = df.columns.str.lower().str.replace(' ', '_') # Normalize column names
df['date'] = pd.to_datetime(df['date']) # Parse dates
df['amount'] = df['amount'].str.replace('[$,]', '', regex=True).astype(float) # Clean currency
# Step 3: Load to database
engine = create_engine('sqlite:///mydb.db')
# Or: create_engine('postgresql://user:pass@localhost/mydb')
# Or: create_engine('mysql+pymysql://user:pass@localhost/mydb')
df.to_sql('tablename', engine, if_exists='replace', index=False)
Type Mapping
Pandas infers types, but you can be explicit:
df.to_sql('tablename', engine,
if_exists='replace',
index=False,
dtype={
'id': Integer(),
'name': String(100),
'created_at': DateTime(),
'amount': Float()
})
Pros and Cons
✅ Full control over cleaning and types
✅ Handles complex transformations
✅ Repeatable and scriptable
❌ Requires Python environment
❌ Overkill for one-time imports
Path 4: DuckDB (The Modern Shortcut)
DuckDB reads CSV directly and handles most cleaning automatically.
Direct Query
-- Query CSV without importing
SELECT * FROM 'data.csv';
-- With type inference
SELECT * FROM read_csv_auto('data.csv');
-- Create table from CSV
CREATE TABLE mytable AS SELECT * FROM 'data.csv';
From Browser to Query in Seconds
# Export table as CSV from browser
# Then immediately query:
duckdb -c "SELECT column1, SUM(column2) FROM 'data.csv' GROUP BY column1;"
No schema definition. No import step. Just query.
Pros and Cons
✅ Fastest for ad-hoc analysis
✅ Excellent type inference
✅ SQL interface
❌ Not a traditional database (for persistence, export results)
Handling Common Issues
Column Names with Spaces
Web tables often have headers like "Total Sales" or "Year to Date".
Fix at export: Some tools normalize headers automatically.
Fix at import:
df.columns = df.columns.str.replace(' ', '_').str.lower()
Or in SQL after import:
ALTER TABLE mytable RENAME COLUMN "Total Sales" TO total_sales;
Mixed Number Formats
European (1.234,56) vs US (1,234.56) formats break imports.
Fix before import: Use a tool with cleaning presets, or:
# European to standard
df['value'] = df['value'].str.replace('.', '').str.replace(',', '.').astype(float)
Null Representations
Web tables show blanks, "N/A", "—", "-" for missing data.
df = df.replace(['N/A', '—', '-', ''], pd.NA)
Or in DuckDB:
SELECT NULLIF(column, 'N/A') AS column FROM 'data.csv';
Dates in Various Formats
df['date'] = pd.to_datetime(df['date'], format='mixed', dayfirst=True)
The format='mixed' handles inconsistent date formats in the same column.
Choosing the Right Path
| Scenario | Recommended Path |
|---|---|
| One-time import, small table | CSV + direct database import |
| One-time import, needs cleaning | Python + Pandas |
| Quick analysis, no persistence needed | DuckDB |
| Regular imports from same source | Python script (automate) |
| Need SQL format for sharing/versioning | Direct SQL export |
Example: Full Workflow
Let's say you have a table of product data on a website.
Step 1: Export with HTML Table Exporter → CSV with cleaned numbers
Step 2: Quick validation in DuckDB:
SELECT COUNT(*), COUNT(DISTINCT product_id) FROM 'products.csv';
-- Check for duplicates
Step 3: Load to PostgreSQL:
import pandas as pd
from sqlalchemy import create_engine
df = pd.read_csv('products.csv')
engine = create_engine('postgresql://user:pass@localhost/inventory')
df.to_sql('products', engine, if_exists='replace', index=False)
Total time: 2 minutes.
The Meta-Lesson
The shortest path isn't always the most direct.
Spending 30 seconds on cleaning at export saves 10 minutes debugging import errors. Using the right intermediate format (CSV vs SQL vs JSON) depends on your target database and data complexity.
For most web table → database workflows:
- Export to CSV with number/date cleaning
- Load with your database's native CSV import
- Only bring in Python if you need complex transformations
Simple pipelines are maintainable pipelines.
For Python workflows specifically, see our detailed guide on exporting web tables to JSON for Python & Pandas.
Need clean exports that are database-ready? Learn more at gauchogrid.com/html-table-exporter or try HTML Table Exporter free on the Chrome Web Store.
Top comments (1)
Some comments may only be visible to logged-in visitors. Sign in to view all comments.