Forb Yuan

Posted on Dec 19

What‘s the Real Problem for ORMs

The designers and developers of ORMs have never truly understood the real problems they need to address. Once this problem is solved, the ORM will no longer be an ORM.

The Dynamic Query Problem and Combinatorial Mathematics

Let's revisit the dynamic query problem first.

In an information system, developers provide users with a query interface containing n query parameters. The user fills out k of these parameters to make a request $(k \in [0, n])$. The system constructs a query statement based on these k parameters. Developers need to combine the corresponding query conditions for every possible combination of k query parameters, which is known as the dynamic query problem.

The core of this problem is the introduction of an uncertain variable k by the end-user. SQL, in its original design, was only intended for developers and did not account for user input. Therefore, each SQL statement is static, corresponding to only one set of query conditions. Since the ORM framework does not address this issue, developers need to combine the corresponding query conditions based on the user's input.

Assuming the n query parameters are independent and the order of the query conditions is fixed, this problem can be described using combinatorial mathematics as a subset selection problem: choosing k elements from a set of n elements provided by the query interface to form a subset.

$$
\sum_{k=0}^{n} \binom{n}{k} = 2^n
$$

From the above formula, we can see that for a query interface with n query parameters, there are $2^n$ possible query clauses. For example, if there are 3 query parameters, 8 query clauses need to be written; if there are 10 query parameters, 1024 query clauses are needed. Both writing and maintaining this are extremely difficult.

ORM frameworks have not paid attention to this problem, nor have they provided a solution. A natural way is to use if statements to check the query parameters and determine whether the corresponding query condition should be concatenated. Since each if statement produces two possible outcomes (TRUE or FALSE), corresponding to whether or not to execute the query condition concatenation within the if block, the n if statements result in $2^n$ possible concatenation outcomes.

At first glance, this problem seems to be solved, but the issue with this approach is that developers must write an if statement for each query parameter. As the number of query parameters increases, the number of if statements grows, and code maintenance still remains difficult.

This is the real problem that needs to be solved in database access, instead of just mapping object models to relational models.

Query Object Mapping Method

When we use the Introduce Parameter Object refactoring method to group all the query parameters into a single object, we get an object with n fields, and the construction of query clauses depends solely on this object.

For an object with n fields, each field can either be assigned or unassigned, resulting in $2^n$ possible combinations of assignments. If each field maps to a query condition, we can use the $2^n$ combinations of assignments to construct $2^n$ query clauses.

Thus, we shift our approach to map query clauses through an object instead of mapping objects to relations. We call this object used to construct query clauses a Query Object, and the method of combining query conditions into query clauses based on the field assignments of the Query Object is called the Query Object Mapping Method.

Object Mapping

For programming languages that support reflection, we can use reflection to read the values of the fields and determine whether each field's assignment should be mapped to a query condition. The algorithm is as follows:

Traverse the fields of the Query Object instance.
Use reflection to obtain the value of each field and map the assigned fields to query conditions.
Use logical operators such as AND to combine the query conditions into query clauses.

One simple way to map fields to query conditions is by using annotations to declare query conditions along with the fields.

Reflection and annotation declarations help us encapsulate the query clause construction code into a framework that is available for all developers to use, greatly simplifying their work.

For example, for a UserQuery object defined by developers, the framework can use the algorithm described above to construct the corresponding query clause based on the UserQuery assignment:

public class UserQuery {
    @Condition("name LIKE CONCAT('%', ?, '%')")
    private String nameLike;
    @Condition("age > ?")
    private Integer ageGt;
    @Condition("valid = ?")
    private Boolean valid;
}
// a possible query clause: WHERE age > ? and valid = ?

After confirming the advantages of object mapping over the if statement concatenation approach, we further discuss how to construct different query conditions based on field characteristics to avoid maintaining string conditions.

Field Mapping

In SQL, query conditions mainly include the following three types:

Here is the translation:

Comparison Query Conditions: For example, age > ?, this condition is represented using Predicate Logic. Predicate logic is used to express basic comparison operations, such as equality (=), greater than (>), less than (<), greater than or equal to (≥), less than or equal to (≤), and not equal to (≠).
Logical Query Conditions: These are formed using Boolean Algebra, combining multiple conditions with logical operators such as AND, OR, and NOT. These conditions express the logical relationships between multiple query conditions. For example, age > 30 AND valid = true is a logical query condition where two conditions are combined using the "AND" operator.
Subquery Conditions: These conditions involve a nested query, typically based on Relational Algebra. Relational algebra provides the mathematical foundation for database queries, used to express relationships between tables and nested queries. For example, age > (SELECT avg(age) FROM t_user) is a subquery condition, where avg(age) is a subquery that returns a result and is compared with a field in the outer query.

Based on these mathematical principles, we design three types of fields to construct query conditions.

Mapping Comparison Query Conditions via Predicate Suffix Fields: Comparison query conditions are usually composed of column names, comparison operators, and parameters. In DSLs (domain-specific languages), predicate phrases are often used to represent comparison operators. For example, eq represents equality (=), gt represents greater than (>), and so on. condition.gt("age", 30) represents the query condition age > 30.

We append the predicate phrase to the column name to represent the query condition. For example, the field ageGt represents the query condition age > ?. Similar suffixes include Eq, Ne, Ge, Lt, Le, In, NotIn, Null, Like, etc., which allows us to map various comparison query conditions through field suffixes.

Constructing Logical Query Conditions via Logic Suffix Fields: Logical query conditions are a group of query conditions connected by logical operators such as AND or OR.

The type of logic suffix fields is either a collection or a query object used to construct multiple query conditions, where each element or field corresponds to one query condition.

The name of the logical suffix field includes the logical suffix And or Or, indicating the logical operator that connects multiple query conditions.

Constructing Subquery Conditions via Subquery Fields: Subquery fields should be query objects.

For example, for a subquery condition like age > (SELECT avg(age) FROM t_user), we can break it down into three parts:

For the condition part age >, we can reuse the mapping method for predicate suffix fields. However, to avoid naming conflicts with the existing predicate suffix field ageGt, we need to add distinguishing characters, such as ageGtAvg.
For the main subquery part SELECT avg(age) FROM t_user, we can declare the column name and table name using annotations like @Subquery(select = "avg(age)", from = "t_user"), or define it in the field name like ageGtAvgAgeOfUser.
For the WHERE clause of the subquery, we can reuse the query object mapping method.

With these three types of fields, we can automatically construct most query conditions. For other query conditions, we can continue developing new methods to support them.

Achievement

Through object and field mapping, the query object now has the following four key characteristics:

Constructs comparison query conditions;
Constructs logical query conditions;
Constructs subquery conditions;
Dynamically combines query conditions based on query parameters.

From a mathematical theory perspective, the query object mapping method effectively addresses the uncertainty introduced by end users and constructs various query conditions in SQL, forming an object-based dynamic query language.

Compared to the static query clauses in SQL, this dynamic query language provides the additional feature of dynamically combining query conditions. On this basis, we can continue constructing other parts of the SQL statement.

Implementations

The query object mapping method constructs corresponding query conditions solely through the field metadata, making it applicable to any object-oriented programming language. Here are examples in Java and Go.

Java Example:

public class UserQuery {// WHERE
  String nameLike       // AND name LIKE ?
  Integer ageGt;        // AND age > ?
  Integer ageLe;        // AND age <= ?
  Boolean valid;        // AND valid = ?
  UserQuery userOr;     // AND (age > ? OR age <= ? OR valid = ?)
  @Subquery(select = "avg(age)", from = "t_user")
  UserQuery ageGtAvg;   // AND age > (SELECT avg(age) FROM t_user [WHERE])
}

GitHub: http://github.com/doytowin/doyto-query

Go Example:

type UserQuery struct {   // WHERE
    NameLike *string       // AND name LIKE ?
    AgeGt    *int          // AND age > ?
    AgeLe    *int          // AND age <= ?
    Valid    *bool         // AND valid = ?
    UserOr   *[]UserQuery  // AND

 (age > ? OR age <= ? OR valid = ?)
                           // AND age > (SELECT avg(age) FROM t_user [WHERE])
    ScoreGtAvg *UserQuery `subquery:"select:avg(age),from:t_user"`
}

GitHub: http://github.com/doytowin/goooqo

Each field corresponds to one or more query conditions, and the conditions are combined based on field assignments to form the final query clause. Logical and subquery conditions can also be constructed by reusing query objects. These advantages are not available in SQL, which is a static language.

By defining query objects in this way, developers no longer need to explicitly write if statements to concatenate query conditions. The framework can use reflection to read the assignment of each field and implicitly include the assignment checks and query condition concatenation in the framework code, greatly simplifying the code writing and maintenance for dynamic queries.

Discussion

For pagination and sorting, we can still define the relevant parameters in the query object and construct the pagination and sorting clauses based on these parameters when generating the SQL statement. We simply need to declare that these parameters are not used to construct query conditions.

We can also generate the previously manually written if statement concatenation code for the query object through code generation. This would eliminate the performance impact of reflection and support programming languages that do not provide reflection.

Furthermore, the query object can also be used to construct MongoDB queries:

{
  "$and": [
    {"age": {"$gt": {}}},
    {"age": {"$lte": {}}},
    {"memo": null},
    {"memo": {"$regex": {}}},
    {"valid": {"$eq": {}}},
    {"$or": [{}, {}, {}]}
  ]
}

(The empty objects are placeholders similar to SQL.)

Since all query languages are designed based on the same mathematical principles, databases like MongoDB, Redis, and ElasticSearch have compared and converted their query languages to SQL. Some databases even directly support portions of the SQL standard. Therefore, the query object mapping method we designed based on these mathematical principles is not only applicable to all object-oriented programming languages, but also to all database query languages. This goes beyond the theoretical scope of ORM.

As for complex queries involving aggregation and joins, these can be constructed using a view object mapping method, which is beyond the scope of this paper.

Conclusion

This paper categorizes the dynamic query problem as a subset selection problem in combinatorial mathematics, proposes directly using query object assignments to handle the assignment combinations of query conditions, and fully utilizes the characteristics of objects and fields to construct various query conditions in SQL. This leads to the development of an object-based dynamic query language, becoming a more efficient database access solution than ORM.

DEV Community

What‘s the Real Problem for ORMs

The Dynamic Query Problem and Combinatorial Mathematics

Query Object Mapping Method

Object Mapping

Field Mapping

Achievement

Implementations

Discussion

Conclusion

Top comments (0)

Read next

Announcing Local Development Support!

New API Playground in the Zuplo Developer Portal

Optimize Time-To-First-Call with API key auth

Creating a Simple CRUD API with Zuplo and Firestore