I wrote SQL queries to save my scraped data. INSERT statements. UPDATE statements. Handling duplicates. Escaping special characters. It was messy and confusing.
cursor.execute("INSERT INTO products (name, price) VALUES (%s, %s)", (name, price))
What if the product name has a quote in it? What if I want to update instead of insert? What about relationships?
Then I discovered ORM. No more SQL. Just Python code:
product = Product(name=name, price=price)
product.save()
Clean. Simple. Python.
Let me show you what ORM is and why it makes databases much easier.
What is ORM? (Super Simple Explanation)
ORM stands for Object-Relational Mapping.
Big fancy term. But the idea is simple:
Without ORM (what we did in last blog):
# Write SQL
cursor.execute("""
INSERT INTO products (name, price, url)
VALUES (%s, %s, %s)
""", (product_name, product_price, product_url))
# Remember table names
# Remember column names
# Write SQL syntax
# Handle database connections
With ORM (the easy way):
# Write Python
product = Product(
name=product_name,
price=product_price,
url=product_url
)
product.save()
# No SQL!
# Just Python objects
# ORM handles everything
Think of ORM like a translator:
You speak: Python
Database speaks: SQL
ORM = Translator between Python and SQL
You write Python code. ORM converts it to SQL automatically.
The Problem ORM Solves
Problem 1: Writing SQL is Hard
Manual SQL:
# Complex query
cursor.execute("""
SELECT products.name, categories.name, AVG(reviews.rating)
FROM products
JOIN categories ON products.category_id = categories.id
JOIN reviews ON products.id = reviews.product_id
WHERE products.price < 100
GROUP BY products.id, categories.id
HAVING AVG(reviews.rating) > 4
""")
Confusing! Easy to make mistakes!
With ORM:
# Same thing in Python
products = Product.objects.filter(
price__lt=100,
reviews__rating__avg__gt=4
).select_related('category')
Much clearer!
Problem 2: SQL Injection Attacks
Dangerous (SQL injection):
# User input: "'; DROP TABLE products; --"
query = f"SELECT * FROM products WHERE name = '{user_input}'"
cursor.execute(query) # DANGEROUS!
This could delete your entire database!
Safe with ORM:
Product.objects.filter(name=user_input) # Automatically safe
ORM handles escaping automatically. You can't make this mistake.
Problem 3: Database Changes
Manual SQL:
# You rename 'price' column to 'product_price'
# Now you must find and update EVERY query:
cursor.execute("SELECT price FROM products") # BROKEN!
cursor.execute("UPDATE products SET price = 50") # BROKEN!
# 50 places to change!
With ORM:
# Define once
class Product:
product_price = models.DecimalField()
# Use everywhere
product.product_price # Works everywhere automatically
Change in one place, works everywhere.
Problem 4: Relationships
Manual SQL (relationships are hard):
# Get product and its reviews
cursor.execute("SELECT * FROM products WHERE id = 1")
product = cursor.fetchone()
cursor.execute("SELECT * FROM reviews WHERE product_id = 1")
reviews = cursor.fetchall()
# Manual work to connect them!
With ORM (relationships are easy):
product = Product.objects.get(id=1)
reviews = product.reviews.all() # Automatic!
ORM handles relationships for you.
How ORM Works
Step 1: Define Models (Python Classes)
from sqlalchemy import Column, Integer, String, Decimal
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class Product(Base):
__tablename__ = 'products'
id = Column(Integer, primary_key=True)
name = Column(String(200))
price = Column(Decimal(10, 2))
url = Column(String(500))
This Python class represents a database table.
Step 2: ORM Creates the Table
# ORM automatically creates this SQL:
"""
CREATE TABLE products (
id INTEGER PRIMARY KEY,
name VARCHAR(200),
price DECIMAL(10, 2),
url VARCHAR(500)
);
"""
You don't write the SQL. ORM does!
Step 3: Use Python Objects
# Create product (like creating an object)
product = Product(
name="Laptop",
price=999.99,
url="https://example.com/laptop"
)
# Save to database
product.save()
Behind the scenes, ORM creates:
INSERT INTO products (name, price, url)
VALUES ('Laptop', 999.99, 'https://example.com/laptop');
Step 4: Query with Python
# Find products
cheap_products = Product.objects.filter(price__lt=50)
# ORM converts to:
# SELECT * FROM products WHERE price < 50;
All SQL is automatic!
Popular Python ORMs
1. SQLAlchemy (Most Popular)
What it is: Powerful, flexible ORM
Pros:
- Very powerful
- Works with any database
- Large community
- Excellent documentation
Cons:
- Learning curve
- Can be complex
When to use:
- Serious projects
- Need flexibility
- Complex queries
This is what we'll use!
2. Django ORM
What it is: ORM built into Django framework
Pros:
- Very easy
- Great for beginners
- Built-in admin interface
Cons:
- Only with Django
- Less flexible
When to use:
- Using Django framework
- Want simplicity
3. Peewee
What it is: Small, simple ORM
Pros:
- Simple
- Lightweight
- Easy to learn
Cons:
- Less features
- Smaller community
When to use:
- Small projects
- Want simplicity
- Don't need advanced features
Installing SQLAlchemy
We'll use SQLAlchemy (most popular with Scrapy).
pip install sqlalchemy
That's it!
Your First ORM Example
Let's create a simple example to understand ORM.
Step 1: Define Model
# models.py
from sqlalchemy import create_engine, Column, Integer, String, Decimal
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
# Base class for models
Base = declarative_base()
# Define Product model
class Product(Base):
__tablename__ = 'products'
id = Column(Integer, primary_key=True)
name = Column(String(200))
price = Column(Decimal(10, 2))
url = Column(String(500))
def __repr__(self):
return f"<Product(name='{self.name}', price={self.price})>"
Step 2: Create Database Connection
# Create database (SQLite for simplicity)
engine = create_engine('sqlite:///products.db')
# Create all tables
Base.metadata.create_all(engine)
# Create session (like a connection)
Session = sessionmaker(bind=engine)
session = Session()
Step 3: Create Products
# Create products
product1 = Product(
name="Laptop",
price=999.99,
url="https://example.com/laptop"
)
product2 = Product(
name="Mouse",
price=29.99,
url="https://example.com/mouse"
)
# Add to session
session.add(product1)
session.add(product2)
# Save to database
session.commit()
print("Products saved!")
Step 4: Query Products
# Get all products
all_products = session.query(Product).all()
for product in all_products:
print(product.name, product.price)
# Find products under $50
cheap = session.query(Product).filter(Product.price < 50).all()
for product in cheap:
print(f"Cheap: {product.name} - ${product.price}")
# Get one product by ID
product = session.query(Product).get(1)
print(f"Product 1: {product.name}")
Step 5: Update Products
# Get product
product = session.query(Product).filter(Product.name == "Laptop").first()
# Update price
product.price = 899.99
# Save changes
session.commit()
print("Price updated!")
Step 6: Delete Products
# Get product
product = session.query(Product).filter(Product.name == "Mouse").first()
# Delete
session.delete(product)
session.commit()
print("Product deleted!")
Complete Working Example
Let's put everything together:
# complete_example.py
from sqlalchemy import create_engine, Column, Integer, String, Decimal
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
# Setup
Base = declarative_base()
class Product(Base):
__tablename__ = 'products'
id = Column(Integer, primary_key=True)
name = Column(String(200), nullable=False)
price = Column(Decimal(10, 2))
url = Column(String(500), unique=True)
def __repr__(self):
return f"<Product('{self.name}', ${self.price})>"
# Create database
engine = create_engine('sqlite:///products.db', echo=True) # echo=True shows SQL
Base.metadata.create_all(engine)
# Create session
Session = sessionmaker(bind=engine)
session = Session()
# Create products
products_data = [
{"name": "Laptop", "price": 999.99, "url": "https://example.com/laptop"},
{"name": "Mouse", "price": 29.99, "url": "https://example.com/mouse"},
{"name": "Keyboard", "price": 79.99, "url": "https://example.com/keyboard"},
]
for data in products_data:
product = Product(**data)
session.add(product)
session.commit()
print("✓ Products created")
# Query products
print("\n--- All Products ---")
for product in session.query(Product).all():
print(f"{product.name}: ${product.price}")
print("\n--- Cheap Products (< $50) ---")
cheap = session.query(Product).filter(Product.price < 50).all()
for product in cheap:
print(f"{product.name}: ${product.price}")
print("\n--- Most Expensive ---")
expensive = session.query(Product).order_by(Product.price.desc()).first()
print(f"{expensive.name}: ${expensive.price}")
# Update
print("\n--- Updating Laptop Price ---")
laptop = session.query(Product).filter(Product.name == "Laptop").first()
laptop.price = 899.99
session.commit()
print(f"New price: ${laptop.price}")
# Count
total = session.query(Product).count()
print(f"\nTotal products: {total}")
# Close session
session.close()
Run this:
python complete_example.py
You'll see all the SQL that ORM generates automatically!
ORM Key Concepts
Concept 1: Models = Tables
class Product(Base):
__tablename__ = 'products'
One Python class = One database table
Concept 2: Columns = Attributes
class Product(Base):
name = Column(String(200))
price = Column(Decimal(10, 2))
Class attributes = Table columns
Concept 3: Objects = Rows
product = Product(name="Laptop", price=999)
One object = One row in table
Concept 4: Session = Connection
session = Session()
session.add(product)
session.commit()
Session manages database operations
Common ORM Operations
Creating Records
# Create object
product = Product(name="Laptop", price=999)
# Add to session
session.add(product)
# Save to database
session.commit()
Reading Records
# Get all
products = session.query(Product).all()
# Get one by ID
product = session.query(Product).get(1)
# Filter
cheap = session.query(Product).filter(Product.price < 50).all()
# Get first match
product = session.query(Product).filter(Product.name == "Laptop").first()
Updating Records
# Get product
product = session.query(Product).get(1)
# Change attribute
product.price = 899.99
# Save
session.commit()
Deleting Records
# Get product
product = session.query(Product).get(1)
# Delete
session.delete(product)
# Save
session.commit()
Filtering (Queries)
Basic Filters
# Exact match
session.query(Product).filter(Product.name == "Laptop")
# Greater than
session.query(Product).filter(Product.price > 100)
# Less than
session.query(Product).filter(Product.price < 50)
# Like (pattern matching)
session.query(Product).filter(Product.name.like("%top%"))
Multiple Filters
# AND (multiple filters)
session.query(Product).filter(
Product.price > 50,
Product.price < 200
)
# OR
from sqlalchemy import or_
session.query(Product).filter(
or_(Product.price < 50, Product.price > 500)
)
Ordering
# Order by price (ascending)
session.query(Product).order_by(Product.price)
# Order by price (descending)
session.query(Product).order_by(Product.price.desc())
Limit
# Get first 10
session.query(Product).limit(10)
# Get first one
session.query(Product).first()
Relationships (Preview)
One of ORM's best features: handling relationships.
Example: Products and Reviews
from sqlalchemy import ForeignKey
from sqlalchemy.orm import relationship
class Product(Base):
__tablename__ = 'products'
id = Column(Integer, primary_key=True)
name = Column(String(200))
# Relationship
reviews = relationship("Review", back_populates="product")
class Review(Base):
__tablename__ = 'reviews'
id = Column(Integer, primary_key=True)
product_id = Column(Integer, ForeignKey('products.id'))
rating = Column(Integer)
text = Column(String)
# Relationship
product = relationship("Product", back_populates="reviews")
Using Relationships
# Get product
product = session.query(Product).get(1)
# Get all reviews (automatic!)
for review in product.reviews:
print(f"Rating: {review.rating}")
# Go backwards
review = session.query(Review).get(1)
print(f"Product: {review.product.name}")
No manual JOINs needed! ORM handles it.
We'll use this in the next blog when scraping products with reviews!
Why ORM is Perfect for Scrapy
Reason 1: Clean Pipelines
Without ORM:
def process_item(self, item, spider):
cursor.execute("""
INSERT INTO products (name, price, url)
VALUES (%s, %s, %s)
ON CONFLICT (url) DO UPDATE
SET name = EXCLUDED.name
""", (item['name'], item['price'], item['url']))
connection.commit()
Messy!
With ORM:
def process_item(self, item, spider):
product = Product(**item)
session.add(product)
session.commit()
Clean!
Reason 2: Automatic Duplicate Handling
# ORM can handle duplicates elegantly
existing = session.query(Product).filter(Product.url == item['url']).first()
if existing:
# Update existing
existing.price = item['price']
else:
# Create new
product = Product(**item)
session.add(product)
session.commit()
Reason 3: Easy Relationships
# Product with reviews
product = Product(name="Laptop", price=999)
session.add(product)
review = Review(product=product, rating=5, text="Great!")
session.add(review)
session.commit()
Relationships handled automatically!
Preparing for Next Blog
In the next blog, we'll integrate ORM with Scrapy. Here's a preview:
What We'll Build
A Scrapy spider that:
- Scrapes products
- Scrapes reviews for each product
- Saves to database using ORM
- Handles relationships automatically
- Updates existing products
- No manual SQL!
The Models We'll Use
class Product(Base):
id = Column(Integer, primary_key=True)
name = Column(String(200))
price = Column(Decimal(10, 2))
url = Column(String(500), unique=True)
reviews = relationship("Review")
class Review(Base):
id = Column(Integer, primary_key=True)
product_id = Column(Integer, ForeignKey('products.id'))
rating = Column(Integer)
text = Column(String)
product = relationship("Product")
The Pipeline We'll Create
class ORMPipeline:
def process_item(self, item, spider):
# Save product with ORM
product = Product(**item)
session.add(product)
session.commit()
return item
Simple and clean!
Summary
What is ORM?
Object-Relational Mapping. Converts Python objects to SQL automatically.
Why use ORM?
- No SQL needed
- Safer (no SQL injection)
- Cleaner code
- Handles relationships
- Easy to maintain
SQLAlchemy basics:
# Define model
class Product(Base):
__tablename__ = 'products'
id = Column(Integer, primary_key=True)
name = Column(String(200))
# Create
product = Product(name="Laptop")
session.add(product)
session.commit()
# Read
products = session.query(Product).all()
# Update
product.name = "New Name"
session.commit()
# Delete
session.delete(product)
session.commit()
Key concepts:
- Models = Tables
- Columns = Attributes
- Objects = Rows
- Session = Connection
Next blog:
We'll use ORM in Scrapy pipelines. No more manual SQL in our spiders!
Remember:
- ORM makes databases easier
- Write Python, not SQL
- Perfect for Scrapy
- We'll use SQLAlchemy
Practice the examples above. In the next blog, we'll integrate ORM with Scrapy and build a real scraper with database storage!
Happy scraping! 🕷️
Top comments (0)