Drools & Databases, Part 1: Don't Call the DB from the DRL

#drools

One of the most commonly asked Drools questions that I see on StackOverflow is something along the lines of "How do I do a database query from Drools?"

The answer is simply: you don't.

OK, to be entirely truthful, you can but you shouldn't. This is the first in a series of posts in which I'm going to explain why.

Today we're going to talk about the problems on the database side of things; namely:

Database Access is Hard (to do right)

It's actually pretty easy to interact with a database in Java. Even if you're using the built-in JDBC APIs (eg. java.sql.Statement, etc.), it's just not that hard to open a connection, execute a statement, and get results. Libraries like Spring Data and JPA abstract even more away, sometimes going as far as automatically generating your queries for you.

But like many things, just because it's easy to do doesn't necessarily mean you're doing it right. For a toy application, you can get away with a lot of stuff. But as soon as you scale to production, now you need to think about things like connection pooling, transaction management & rollbacks, distributed transactions, backups, audits, and all the other fun stuff that comes with exposing an application to Actual Users.

What this means is that doing a naive JDBC query like this in your Drools ...

// in a DRL file ...
// DO NOT DO THIS. PLEASE.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.sql.ResultSet;

function String getCustomerNameFromDatabase(UUID customerId) {
  Class.forName("com.mysql.jdbc.Driver");
  Connection connection = DriverManager.getConnection(
      "jdbc:mysql://my-db.example.com:33000/something",
      "admin",
      "Secr3t P@ssWd!"
  );
  Statement statement = connection.createStatement();
  ResultSet results = statement.executeQuery("select * from Customer where customer_id='" + customerId + "'");
  results.first();
  return results.getString("customer_name");
}

rule "Example Rule"
when
  $invoice: Invoice( $customerId: custId != null)
  $customerName: String() from getCustomerNameFromDatabase($customerId)
  Receipt( purchaser != $customerName)
then
  // simple use case: the customer name for the customer on the invoice
  // does not match the customer name on the receipt.
end

... is an absolute no-no.

Will it work? Yes, technically. In a toy app or a proof-of-concept that will be run once and never again, then maybe it is "ok". But is it OK for production? Absolutely not.

Ignoring the fact that we're not doing any connection pooling or transaction management -- where's your error handling? What happens if your SQL statement is malformed? What happens if your connection is closed from under you? Heck, what happens if one of these statements throws an exception?

Where do you make sure your connection actually gets closed?

And don't even get me started about the hard-coded database credentials saved in plaintext in your DRL file. Or the fact that we're not sanitizing our inputs.

A slightly more realistic example

... But ok, yes, I admit that's a contrived example. Hopefully by now you know that you need to handle things like Exceptions and do things like close your connections, statements, etc. after you use them, even if an exception occurs.

If we were doing this in Java, I would absolutely suggest using try-with-resources for these elements, since the major classes do properly implement AutoCloseable. However DRL files don't support the full Java SDK, primarily because they're interpreted files rather than compiled -- the Drools framework parses these files as strings and then compiles the bits as needed, with a little DSL magic. (That's a topic for another post, though!)

The point is that while try-with-resources is the modern Java 8+ way of doing our connection management to make sure we properly close our connections, statements, and other sockets, we can't use that in our DRL declared function. As a result, we're going to have to go use something that looks more "old school" ... and I'm meaning Java 5-style "old school".

function String getCustomerNameFromDatabase(UUID customerId) {
  Class.forName("com.mysql.jdbc.Driver");

  Connection connection = null;
  Statement statement = null;
  ResultSet results = null;
  try {
    connection = DriverManager.getConnection(
        "jdbc:mysql://my-db.example.com:33000/something",
        "admin",
        "Secr3t P@ssWd!"
    );
    statement = connection.createStatement();
    results = statement.executeQuery("select * from Customer where customer_id='" + customerId + "'");
    results.first();
    return results.getString("customer_name");
  } catch (Exception e) {
    if (results != nill) {
      try { results.close(); } catch (Exception ignored) {}
    }
    if (statement != null) {
      try { statement.close(); } catch (Exception ignored{}
    }
    if (connection != null) {
      try { connection.close(); } catch (Exception ignored) {}
    }
  }

  return null;
}

And there we go, that's our dangling Connection, Statement, and ResultSet taken care of. Nasty, eh?

But wait! There's more! After all, we're still vulnerable to SQL injection -- look at those unsanitized SQL queries!

Some basic cleanup

The simplest way to go about this would be to replace our naive string-concatenation with a PreparedStatement. This isn't actually that big of a deal to change, all things considered.

function String getCustomerNameFromDatabase(UUID customerId) {  
  Connection connection = null;
  PreparedStatement statement = null;
  ResultSet results = null;
  try {
    Class.forName("com.mysql.jdbc.Driver");
    connection = DriverManager.getConnection(
        "jdbc:mysql://my-db.example.com:33000/something",
        "admin",
        "Secr3t P@ssWd!"
    );
    String query = "select * from Customer where customer_id=?";
    statement = connection.prepareStatement(query);
    statement.setString(1, customerId.toString());
    results = statement.executeQuery();
    results.first();
    return results.getString("customer_name");
  } catch (Exception e) {
    if (results != nill) {
      try { results.close(); } catch (Exception ignored) {}
    }
    if (statement != null) {
      try { statement.close(); } catch (Exception ignored{}
    }
    if (connection != null) {
      try { connection.close(); } catch (Exception ignored) {}
    }
  }

  return null;
}

Honestly this is getting tedious, so fixing the ResultSet is left as an exercise for the reader. Consider: what do we do if the query returns no rows? more than one row?

The last part we'll address is the exposed database credentials. Here we're unfortunately trying to work around a limitation of the subset of languge features available in DRL functions: we can't use the DriverManager in its entirety because it relies on properties being automatically loaded from the classpath. This isn't available natively in DRL functions, so we're using the old-school Class.forName method which went out of vogue with Java 7.

A better design would be, admittedly, to pass the credentials into working memory. That would, at least, keep them from being hard-coded in plain-text in the DRL file itself.

So we'll create a simple POJO to pass into the rules:

class DbConnectionInfo {
  private String url;
  private String username;
  private String password;
  // getters, setters omitted for brevity
}

... and then pass that from the rules to the function:

function String getCustomerNameFromDatabase(DbConnectionInfo connectionInfo, UUID customerId) {  
  Connection connection = null;
  PreparedStatement statement = null;
  ResultSet results = null;
  try {
    Class.forName("com.mysql.jdbc.Driver");
    connection = DriverManager.getConnection(
        connectionInfo.getUrl(),
        connectionInfo.getUsername(),
        connectionInfo.getPassword()
    );
    String query = "select * from Customer where customer_id=?";
    statement = connection.prepareStatement(query);
    statement.setString(1, customerId.toString());
    results = statement.executeQuery();
    results.first();
    return results.getString("customer_name");
  } catch (Exception e) {
    if (results != nill) {
      try { results.close(); } catch (Exception ignored) {}
    }
    if (statement != null) {
      try { statement.close(); } catch (Exception ignored{}
    }
    if (connection != null) {
      try { connection.close(); } catch (Exception ignored) {}
    }
  }

  return null;
}

rule "Example Rule"
when
  $cxnInfo: DbConnectionInfo()

  $invoice: Invoice( $customerId: custId != null)
  $customerName: String() from getCustomerNameFromDatabase($cxnInfo, $customerId)
  Receipt( purchaser != $customerName)
then
  // simple use case: the customer name for the customer on the invoice
  // does not match the customer name on the receipt.
end

And there we go. Still awful, but marginally less so.

But that doesn't mean it's any good

Even with all that work, and 31 lines of code (not counting the not-yet-implemented ResultSet cleanup), this is just one database query, and it's spectacularly not production-ready.

For any production-grade database access, there's a certain minimum of hardening we need to do to properly prepare for the transactional volumes inherent in real-world applications.

Looking back at our example, you'll notice that we've got a pretty basic implementation, courtesy of the JDBC API. This is fine for when you're learning, but nobody that I know of actually uses pure JDBC like this in production. Our implementation doesn't support connection pooling or transaction management. And, honestly, the only way I know how to implement either of those is with libraries that aren't compatible with DRL functions.

I mentioned this previously, but the Java language features available for use in the DRL are a very restricted subset of the full Java language. Database interactions with modern libraries rely on features that are simply not available in DRL functions. We can't just slap a @Transactional annotation on the DRL function (no annotations in DRL! Or, well, not like this.) I suppose it might be possible to implement connection pooling via something like c3p0, but it would be likely a hundred or more lines of code and the results would be sub-optimal.

Moving beyond database hardening, the other major issue here is that this approach simply does not scale. In our example, all we're trying to do is get the Customer's name from the database. That's one very simple query, for one very distinct use case. No application ever stays that small for long. As the application grows and additional rules are implemented, each subsequent query will need to go through this same process and face these same issues.

Just one database "getter" function and its dependent rule was already a 31-to-5 line ratio. Can you just imagine how huge your DRL files will be as your application and number of queries scale?

Not only that, but imagine more complex queries. Our example was a trivial SELECT * to retrieve one row per certain conditions. Now imagine what update calls would look like, and how you'd have to implement (manually) roll backs. Or consider a delete call that cascades across multiple tables.

At the end of the day, DRL functions have a place and a purpose -- and this is not it.

The End

This isn't actually the end, we're just wrapping up this first section of my extended rant about why you shouldn't be trying to access the database from your rules. In this post we covered why you shouldn't try to do this by implementing DRL functions directly. Next time I'll talk about why even a properly implemented data access layer shouldn't be invoked from the rules.

Happy coding!

Cover image attribution: trondmyhre4 @ Pixabay