Happy New Year! As we kick off 2026, many of us are looking at our codebases and thinking about optimization.
When we first start with Ruby on Rails, ActiveRecord feels like magic. User.find(id) or Post.all are simple and intuitive. But as your application grows and your data becomes more complex, basic CRUD isn't enough. You need to perform aggregations, filter based on calculated values, and sometimes, drop down into raw SQL.
In this article, we’ll dive deep into the "heavy lifting" side of ActiveRecord: Select vs. Pluck, Group & Having, and Raw SQL.
1. Select vs. Pluck: The Memory Battle
Many developers use these interchangeably, but they behave very differently under the hood.
-
.select: Fetches only the columns you specify but still returns an ActiveRecord Relation object. This means Rails instantiates a Ruby object for every row returned. -
.pluck: Queries the database and returns a raw Ruby Array of values. It skips the object instantiation entirely.
When to use which?
If you just need a list of IDs or names for a dropdown, use pluck. If you need to call model methods on the results, use select.
# ❌ High memory usage (instantiates 10,000 User objects just to get emails)
emails = User.select(:email).map(&:email)
# ✅ Low memory usage (returns a simple array of strings)
emails = User.pluck(:email)
2. Group and Having: Filtering the Aggregates
Grouping is essential for reporting. However, the biggest point of confusion is usually the difference between where and having.
-
.where: Filters individual rows before they are grouped. -
.having: Filters the groups after the aggregation is calculated.
The Scenario
You want to find users who have spent more than $500 in total across all their orders. You can't use where for the sum, because the sum hasn't happened yet when where runs.
# This query groups orders by user_id and filters for groups where the sum is > 500
User.joins(:orders)
.group("users.id")
.having("SUM(orders.amount) > ?", 500)
.select("users.*, SUM(orders.amount) as total_spent")
3. Complex Ordering with Logic
Sometimes, order(created_at: :desc) isn't enough. You might need to sort by a specific logic, like "Urgent" items first, then "Pending," then "Completed."
For this, we can use Arel.sql to pass fragments of SQL logic safely.
# Ordering by a custom CASE statement
tasks = Task.order(Arel.sql("
CASE
WHEN status = 'urgent' THEN 1
WHEN status = 'pending' THEN 2
ELSE 3
END ASC"
))
4. When ActiveRecord Isn't Enough: Raw SQL
ActiveRecord is powerful, but it’s not perfect. For highly complex reports or database-specific features (like Postgres window functions), raw SQL is the way to go.
There are two main ways to execute raw SQL:
A. find_by_sql
Use this when you want the result to be returned as ActiveRecord objects.
users = User.find_by_sql("SELECT * FROM users WHERE some_complex_condition = true")
B. ActiveRecord::Base.connection.execute
Use this for pure data manipulation or when you don't need Model objects back. It returns an ActiveRecord::Result object which behaves like an array of hashes.
⚠️ Security Warning: Never interpolate strings directly into raw SQL (e.g., "#{params[:id]}"). Always use placeholders (?) or sanitized inputs to prevent SQL Injection.
Conclusion
Mastering these complex queries is what moves you from a "Rails Developer" to a "Database Expert."
As I move further into AI Engineering in 2026, I’m finding that these data manipulation skills are the exact foundation needed for handling large datasets in Python and Pandas. If you can think in SQL sets, you are halfway to thinking in Data Frames!
What’s the most complex query you’ve ever had to write in Rails? Let me know in the comments!
Top comments (0)