Introduction
Amazon Athena is a serverless, interactive query service that allows you to analyze data in Amazon S3 using standard SQL. While it's simple to run queries by passing SQL strings, this approach can be error-prone, hard to maintain, and vulnerable to SQL injection when dealing with dynamic values.
When working with Amazon Athena, most developers start by writing raw SQL queries directly in their code.
For example, here's how a query might look without prepared statements:
Without Prepared Statement (Old Way)
String query = "SELECT * FROM orders WHERE order_date >= DATE '" + startDate + "' " +
"AND order_date <= DATE '" + endDate + "'";
While this works, notice how the query is built by concatenating strings. If startDate or endDate are not properly formatted, it will lead to runtime errors. Also, this pattern doesn't scale well as queries become more complex.
To address this, prepared statements provide a cleaner and safer way to work with queries in Athena. Instead of building queries with string concatenation, developers can use ? placeholders and bind values at runtime. Athena supports prepared statements and execution parameters, making queries more readable, less error-prone, and secure against common issues like type mismatches or SQL injection.
With Prepared Statement (Preferred Way)
Using ? placeholders makes queries cleaner and reusable:
String query = """
SELECT * FROM orders
WHERE order_date >= ?
AND order_date <= ?
""";
List<Object> parameters = List.of(startDate, endDate);
List<Row> rows = athenaExecutor.execute(query, parameters);
This approach improves readability, reduces bugs, and ensures type safety.
However, real-world usage is not always straightforward. In particular, DATE columns often cause type mismatch errors because Athena doesn't automatically infer parameter types.
In the next section, we'll demonstrate a safe approach to using parameterized queries in Athena with Java, focusing on proper handling of LocalDate values.
Problem Statement
When querying Athena using a prepared-statement-like approach, date fields often cause type mismatch errors. For example, suppose we need to retrieve the most recent order for a customer up to a given date.
public Order fetchLatestOrder(Long customerId, String orderType, LocalDate targetDate) {
String query = """
SELECT * FROM %s.vw_orders
WHERE customer_id = ?
AND order_date <= ?
ORDER BY order_date DESC
LIMIT 1
""".formatted(DatabaseSelector.getReportingDatabase());
List<Object> parameters = List.of(customerId, targetDate);
List<Row> rows = athenaExecutor.execute(query, parameters);
if (rows.isEmpty()) {
throw new DataNotFoundException("No order found for given parameters");
}
return mapRowToOrder(rows.get(0));
}
Generated SQL (incorrect at runtime)
SELECT *
FROM mydb.vw_orders
WHERE customer_id = 12345
AND order_date <= 2025-09-05 -- Athena sees this as integer/string, not DATE
ORDER BY order_date DESC
LIMIT 1;
At runtime, Athena interprets targetDate incorrectly, leading to an error such as:
Type mismatch: cannot apply operator date <= integer
Root cause: Athena interprets the LocalDate parameter as a string or number rather than a DATE literal. The comparison order_date <= ? therefore fails.
Solution
The fix is to teach the query execution service how to format parameters correctly, especially for LocalDate. By extending the parameter formatter, we can ensure dates are wrapped in Athena's DATE 'yyyy-mm-dd' literal format.
For example, in the query executor:
private String formatParameter(Object param) {
if (param instanceof LocalDate date) {
return String.format("DATE '%s'", date);
} else if (param instanceof Number) {
return param.toString();
} else if (param != null) {
return "'" + escapeQuotes(param.toString()) + "'";
} else {
return "NULL";
}
}
With this modification, when targetDate is a LocalDate, Athena will receive:
SELECT *
FROM mydb.vw_orders
WHERE customer_id = 12345
AND order_date <= DATE '2025-09-05'
ORDER BY order_date DESC
LIMIT 1;
Athena now interprets the date correctly and the query executes successfully.
Key Takeaways:
- Prepared statements in Athena do not work like traditional database prepared statements.
- They do not inherently prevent SQL injection.
- Using parameterized queries is still recommended for clarity and maintainability, but extra caution is needed for user input.
- Athena requires explicit typing for DATE comparisons.
- Always format LocalDate parameters into DATE 'YYYY-MM-DD' literals.
- Numbers and strings should also be safely formatted and escaped.
- Using a dedicated executor for parameter substitution keeps queries reusable and prevents errors.
- For production, consider using native Athena prepared statements or execution parameters via AWS SDK to avoid manual string substitution entirel y.
Top comments (0)