This blog post is about how to add a groupBy type to your GraphQL API, built with Ruby on Rails and the graphql gem. Since I won't cover basic usage of the graphql gem, some prior knowledge is necessary to follow along. If you're totally new to this, you should read this excellent tutorial first: https://www.howtographql.com/graphql-ruby/0-introduction/.
So let's start grouping our records. First, we have to think about the schema as a whole. My first approaches were based on arguments only, but they quickly led me into a dead end. GraphQL is strictly typed, and a solution based on arguments only would need to infer types, which is not possible with GraphQL. A field that you use for grouping could be of any scalar type (like String, Int, Float, Boolean, ...), but an argument can't have multiple types, and scalar unions aren't covered by the specification. So I looked around and found an interesting proposal that someone wrote for Prisma: https://github.com/prisma/prisma/issues/1312. The schema is very straightforward, and I don't have to deal with the typing issues of my previous approach.
Now let's create our desired query that contains grouping. Wouldn't it be nice to write one query that returns a list of users and their timetrackings between two dates? Imagine how many REST endpoints you would have to query for this (unless you create a special route for it 😬):
query {
  users {
    edges {
      node {
        id
        firstname
        lastname
        timetrackings(filter: { date: [">= 2018-08-27", "<= 2018-09-21"] }) {
          groupBy {
            date {
              key
              connection {
                edges {
                  node {
                    id
                    duration
                    date
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
So let's start building this!
Creating the basic schema
We have two models: User and Timetracking, where users can have many timetrackings. Since we're following the Relay specification and using the class-based syntax of the graphql gem, we're creating a BaseObject, which includes a special setup method for our connections, and a BaseNode:
# /app/graphql/types/base_object.rb
# frozen_string_literal: true
module Types
  class BaseObject < GraphQL::Schema::Object
    field_class Types::BaseField

    def self.setup_connection_filter_field(name, type, is_null, override_options = {})
      # Prepare options
      default_field_options = { type: type, null: is_null, connection: true }
      field_options = default_field_options.merge(override_options)

      # Create the field
      field(name, field_options) do
        argument :order_by, String, required: false
        argument :page, Int, required: false
        argument :per_page, Int, required: false

        # Allow an override block to add more arguments
        yield self if block_given?
      end
    end
  end
end

# /app/graphql/types/base_node.rb
# frozen_string_literal: true
module Types
  class BaseNode < BaseObject
    implements GraphQL::Types::Relay::Node

    global_id_field :id

    field :created_at, Scalars::DateTime, null: false
    field :updated_at, Scalars::DateTime, null: false
  end
end
The setup_connection_filter_field method is useful for creating connections that share the same arguments across the whole schema. Let's use it for our User and Timetracking types:
# /app/graphql/types/user_type.rb
# frozen_string_literal: true
module Types
  class UserType < BaseNode
    graphql_name 'User'

    field :firstname, String, null: false
    field :lastname, String, null: false
    field :email, String, null: false

    setup_connection_filter_field(:timetrackings, Connections::TimetrackingConnection, true) do |field|
      field.argument :filter, InputObjects::TimetrackingFilter, required: false
    end

    def timetrackings(**args)
      Functions::Query.new(records: object.timetrackings, args: args).resolve
    end
  end
end
You may wonder about Functions::Query.new(records: object.timetrackings, args: args).resolve, but I won't go into too much detail here, since I'm going to cover it in a separate blog post. To cut a long story short: this is basically a class that fetches records and applies the filter argument (which is added as an extra argument in setup_connection_filter_field) as well as the arguments that are shared across all connections (order_by, page and per_page, for those who don't use Relay's pagination). Looking at InputObjects::TimetrackingFilter, it's an InputObject that defines the arguments you can use to filter records (like date in the example query above).
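The concrete filter implementation isn't part of this post, but to give you an idea, a minimal version of such an input object could look roughly like this (treat it as an illustrative sketch, not the actual implementation; the date argument takes comparison strings, matching the example query at the top of the post):

# /app/graphql/input_objects/timetracking_filter.rb (illustrative sketch)
# frozen_string_literal: true
module InputObjects
  class TimetrackingFilter < GraphQL::Schema::InputObject
    graphql_name 'TimetrackingFilter'

    # Comparison strings like ">= 2018-08-27" that Functions::Query applies to the date column
    argument :date, [String], required: false
  end
end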
Now let's create the other types:
# /app/graphql/types/timetracking_type.rb
# frozen_string_literal: true
module Types
  class TimetrackingType < BaseNode
    graphql_name 'Timetracking'

    field :duration, Float, null: false
    field :date, Scalars::Date, null: false
    field :description, String, null: true
    field :user, UserType, null: false
    field :daily_rate, Float, null: false
    field :revenue, Float, null: false
  end
end

# /app/graphql/edges/timetracking_edge.rb
# frozen_string_literal: true
module Edges
  class TimetrackingEdge < Base
    graphql_name 'TimetrackingEdge'
    node_type Types::TimetrackingType
  end
end

# /app/graphql/connections/timetracking_connection.rb
# frozen_string_literal: true
module Connections
  class TimetrackingConnection < Base
    graphql_name 'TimetrackingConnection'
    edge_type Edges::TimetrackingEdge
  end
end
We also need /app/graphql/edges/user_edge.rb and /app/graphql/connections/user_connection.rb, but these can be inferred from the previous example 😊
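For completeness, mirroring the timetracking classes, they could look roughly like this:

# /app/graphql/edges/user_edge.rb
# frozen_string_literal: true
module Edges
  class UserEdge < Base
    graphql_name 'UserEdge'
    node_type Types::UserType
  end
end

# /app/graphql/connections/user_connection.rb
# frozen_string_literal: true
module Connections
  class UserConnection < Base
    graphql_name 'UserConnection'
    edge_type Edges::UserEdge
  end
end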
This may look like a lot of boilerplate, but the examples are reduced to the basics. If we're going to add authorization to some fields or types, the extra classes will help a lot to achieve this.
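To sketch that idea (assuming you use graphql-ruby's built-in authorized? hook and have a current_user in the query context; adjust this to your own auth setup), a type class could restrict access like this:

# Illustrative only: hooking authorization into the type class
module Types
  class TimetrackingType < BaseNode
    # Only expose timetrackings that belong to the signed-in user
    def self.authorized?(object, context)
      super && object.user == context[:current_user]
    end
  end
end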
And finally our basic query type:
# /app/graphql/types/query.rb
# frozen_string_literal: true
module Types
  class Query < Types::BaseObject
    graphql_name 'Query'

    setup_connection_filter_field(:timetrackings, Connections::TimetrackingConnection, true) do |field|
      field.argument :filter, InputObjects::TimetrackingFilter, required: false
    end

    def timetrackings(**args)
      Functions::Query.new(records: Timetracking.all, args: args).resolve
    end

    setup_connection_filter_field(:users, Connections::UserConnection, true) do |field|
      field.argument :filter, InputObjects::UserFilter, required: false
    end

    def users(**args)
      Functions::Query.new(records: User.all, args: args).resolve
    end
  end
end
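The schema class itself isn't shown in this post; if you're wiring things up from scratch, mounting the query type typically looks like this (the class name is just a placeholder):

# /app/graphql/my_app_schema.rb (name is illustrative)
# frozen_string_literal: true
class MyAppSchema < GraphQL::Schema
  query Types::Query
end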
Now we're able to fetch users, timetrackings and their connections to each other. But what about grouping? Let's extend our schema.
Adding the groupBy field to the schema
Since we may want to add grouping to each of our connections later (or at least to some of them), we need a solution that won't add too much boilerplate.
First, we want to add a groupBy field to the timetracking connection, so let's do this:
# /app/graphql/connections/timetracking_connection.rb
# frozen_string_literal: true
module Connections
  class TimetrackingConnection < Base
    graphql_name 'TimetrackingConnection'

    field :group_by, Types::TimetrackingGroupType, null: false

    def group_by
      object
    end

    edge_type Edges::TimetrackingEdge
  end
end
As you can see, we've added our own TimetrackingGroupType, so let's have a look at it:
# /app/graphql/types/timetracking_group_type.rb
# frozen_string_literal: true
module Types
  class TimetrackingGroupType < Types::BaseObject
    graphql_name 'TimetrackingGroup'

    setup_group_field(
      key: :date,
      key_type: String,
      graphql_name: 'TimetrackingGroupDateResult',
      connection_type: Connections::TimetrackingConnection
    )

    def date
      Functions::Group.new(object.nodes, group_by: :date, arguments: object.arguments).resolve
    end
  end
end
Reducing boilerplate means that we don't want to add a separate type for each field that can be grouped. Imagine dozens of columns in large tables that could be used to generate reports – we would have to create an extra class for each of these fields, which would then contain the same fields again and again: key and connection (which you may remember from our query at the beginning of the post). So there must be a way to generate this programmatically. The solution is simple: creating another field generator. It works the same way as our setup_connection_filter_field method and is also implemented in our BaseObject type:
# /app/graphql/types/base_object.rb
# frozen_string_literal: true
module Types
  class BaseObject < GraphQL::Schema::Object
    field_class Types::BaseField

    def self.setup_group_field(key:, key_type:, graphql_name:, connection_type:)
      # We don't want to add a new class (and therefore a new file) for every field that
      # can be grouped, so we're generating the type dynamically.
      type =
        Class.new(Types::BaseObject) do
          graphql_name graphql_name

          field :key, key_type, null: false
          setup_connection_filter_field :connection, connection_type, true

          def key
            object[:key]
          end

          def connection
            object[:values]
          end
        end

      field_options = { type: [type], null: false }
      field(key, field_options)
    end

    def self.setup_connection_filter_field(name, type, is_null, override_options = {})
      # Prepare options
      default_field_options = { type: type, null: is_null, connection: true }
      field_options = default_field_options.merge(override_options)

      # Create the field
      field(name, field_options) do
        argument :order_by, String, required: false
        argument :page, Int, required: false
        argument :per_page, Int, required: false

        # Allow an override block to add more arguments
        yield self if block_given?
      end
    end
  end
end
Now graphql is going to create this class dynamically while the schema gets initialized, and we don't have to worry too much about boilerplate code.
Let's dive into the last part: where does the grouping happen? Looking at the date function in our /app/graphql/types/timetracking_group_type.rb, you may have noticed this one: Functions::Group.new(object.nodes, group_by: :date, arguments: object.arguments).resolve.
It's another little helper we wrote, like the Functions::Query class that we used above. This helper provides the key (for example the date) and the grouped values (the connection field of the grouped query), which are used in our dynamically created class inside the setup_group_field method:
# /app/graphql/functions/group.rb
# frozen_string_literal: true
module Functions
  class Group
    def initialize(records, group_by:, arguments:)
      @records = records
      @group_by = group_by
      @arguments = arguments
    end

    def resolve
      # This method returns data in a grouped way:
      #
      # [
      #   {
      #     key: '2018-05-02',
      #     values: [
      #       Record1,
      #       Record2,
      #       Record3
      #     ]
      #   }
      # ]
      #
      # But performance would be very bad if we would iterate through the whole array
      # each time we want to add a value to this key. That's why we start like this:
      #
      # {
      #   '2018-05-02': [
      #     Record1,
      #     Record2,
      #     Record3
      #   ]
      # }
      #
      # This is way more performant and we can change this structure of the array,
      # or rather the hash, afterwards.
      tmp = {}

      records = Functions::Query.new(records: @records, args: @arguments).resolve
      records.each do |record|
        tmp[record[@group_by]] = [] unless tmp[record[@group_by]]
        tmp[record[@group_by]] << record
      end

      result = []
      tmp.each { |key, value| result << { key: key, values: value } }
      result
    end
  end
end
As you can see, we won't group the records with SQL – but why? Looking back at our initial query, where we wanted to get grouped timetrackings of all users: the timetrackings are complete nodes of the Timetracking type, but nested in a parent object which acts as a group. If we wanted to group our records with SQL, we would have to aggregate the results. Imagine this query: Timetracking.select('date, duration').group('date'). This cannot work, because we have multiple records for one date, and therefore multiple durations – but which duration should SQL pick to show in the resulting table of the query?
date | duration |
---|---|
2016-05-28 | n records |
So we have to aggregate the duration field to make this query work: Timetracking.select('date, max(duration) as max_duration').group('date'):
date | max_duration |
---|---|
2016-05-28 | 200 |
But that's not what we want – we want all records, grouped by a specific column. In SQL terms, we could get the distinct values of date and then fire a query for each of the keys. But this would lead to one additional query per key. Assuming that you want to query timetrackings for a year, you would end up with 1 + 365 queries. And this is something that we definitely don't want to have ☝🏻 Additionally, since we want to query all records in a period and just group them by a specific column, we should do this programmatically.
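Just to illustrate the rejected per-key approach in ActiveRecord terms (a sketch only; this is exactly the N+1 pattern we want to avoid):

# One query for the distinct dates, plus one additional query per date
Timetracking.distinct.pluck(:date).map do |date|
  { key: date, values: Timetracking.where(date: date) }
end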
Now let's have a look at our Functions::Group class again. As you can see, we're fetching all records of the given model, adding them to a hash of keys and values, and transforming this hash into an array afterwards, because our dynamically generated TimetrackingGroupDateResult expects an array as its type (👀 type: [type]).
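As a side note, the manual hash building could also be expressed with Ruby's Enumerable#group_by; an equivalent sketch of the resolve method:

# Same result as above, using Enumerable#group_by
def resolve
  records = Functions::Query.new(records: @records, args: @arguments).resolve
  records
    .group_by { |record| record[@group_by] }
    .map { |key, values| { key: key, values: values } }
end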
And that's it! Now we can run our initial query and get the data that we want.