DEV Community

Peter Oesteritz
Peter Oesteritz

Posted on • Updated on

How to add a `groupBy` field to your GraphQL API

This blog post is about how to add a groupBy type to your GraphQL API, built with Ruby on Rails and the graphql gem. Since I won't cover basic usage of the graphql gem, some prior knowledge is necessary to follow up. If you're totally new to this, you may should read this excellent tutorial at first: https://www.howtographql.com/graphql-ruby/0-introduction/.

So let's start grouping our records. At first, we have to think about the schema at all. My first approaches were based on arguments only, but it quickly got me into a dead end. GraphQL is strictly typed, and a solution that's based on arguments only may need to infer types, which is not possible with GraphQL. A field that you use for grouping could by any scalar type (like String, Int, Float, Boolean, ...), but an argument can't have multiple types and scalar unions aren't covered by the specification. So I've looked around and found an interesting proposal that someone wrote for Prisma: https://github.com/prisma/prisma/issues/1312. The schema is very straightforward and I don't have to deal with the typing issues of my previous approach.

Now let's create our desired query that contains grouping. Wouldn't it be nice to write one query that returns a list of users and their timetrackings between two dates? Imagine how much REST-endpoints you would have to query for this (unless you create a special route for it 😬):

query {
  users {
    edges {
      node {
        id
        firstname
        lastname
        timetrackings(filter: { date: [">= 2018-08-27", "<= 2018-09-21"] }) {
          groupBy {
            date {
              key
              connection {
                edges {
                  node {
                    id
                    duration
                    date
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

So let's start building this!

Creating the basic schema

We have two models: User and Timetracking, where users can have many timetrackings. Since we're following the Relay specification and using the class-based syntax of the graphql-gem, we're creating a BaseObject, which includes a special setup method for our connections, and a BaseNode:

# /app/graphql/types/base_object.rb
# frozen_string_literal: true

module Types
  class BaseObject < GraphQL::Schema::Object
    field_class Types::BaseField

    def self.setup_connection_filter_field(name, type, is_null, override_options = {})
      # Prepare options
      default_field_options = { type: type, null: is_null, connection: true }
      field_options = default_field_options.merge(override_options)

      # Create the field
      field(name, field_options) do
        argument :order_by, String, required: false
        argument :page, Int, required: false
        argument :per_page, Int, required: false

        # Allow an override block to add more arguments
        yield self if block_given?
      end
    end
  end
end
# /app/graphql/types/base_node.rb
# frozen_string_literal: true

module Types
  class BaseNode < BaseObject
    implements GraphQL::Types::Relay::Node

    global_id_field :id

    field :created_at, Scalars::DateTime, null: false
    field :updated_at, Scalars::DateTime, null: false
  end
end

The setup_connection_filter_field method is useful to create connections that have the same arguments across the whole schema. Let's use this for our User and Timetracking types:

# /app/graphql/types/user_type.rb
# frozen_string_literal: true

module Types
  class UserType < BaseNode
    graphql_name 'User'

    field :firstname, String, null: false
    field :lastname, String, null: false
    field :email, String, null: false

    setup_connection_filter_field(:timetrackings, Connections::TimetrackingConnection, true) do |field|
      field.argument :filter, InputObjects::TimetrackingFilter, required: false
    end

    def timetrackings(**args)
      Functions::Query.new(records: object.timetrackings, args: args).resolve
    end
  end
end

You may wonder about Functions::Query.new(records: object.timetrackings, args: args).resolve, but I won't go too much into details here, since I'm going to cover this in an extra blog post. To cut a long story short: this is basically a class that fetches records and applies the filter argument (which is set as an extra argument in the setup_connection_filter_field) and the arguments that are shared across all connections (order_by, page and per_page – for those who doesn't use Relay's pagination). Looking at InputObjects::TimetrackingFilter, it's an InputObject which defines argument that you can use to filter records (like date in the example query above).

Now let's create the other types:

# /app/graphql/types/timetracking_type.rb
# frozen_string_literal: true

module Types
  class TimetrackingType < BaseNode
    graphql_name 'Timetracking'

    field :duration, Float, null: false
    field :date, Scalars::Date, null: false
    field :description, String, null: true
    field :user, UserType, null: false
    field :daily_rate, Float, null: false
    field :revenue, Float, null: false
  end
end
# /app/graphql/edges/timetracking_edge.rb
# frozen_string_literal: true

module Edges
  class TimetrackingEdge < Base
    graphql_name 'TimetrackingEdge'

    node_type Types::TimetrackingType
  end
end
# /app/graphql/connections/timetracking_connection.rb
# frozen_string_literal: true

module Connections
  class TimetrackingConnection < Base
    graphql_name 'TimetrackingConnection'

    edge_type Edges::TimetrackingEdge
  end
end

We also need /app/graphql/edges/user_edge.rb and /app/graphql/connections/user_connection.rb, but this can be inferred from the previous example 😊

This may look like a lot of boilerplate, but the examples are very reduced to the basics. If we're going to add authorization to some fields or types, the extra classes will help much to achieve this.

And finally our basic query type:

# /app/graphql/types/query.rb
# frozen_string_literal: true

module Types
  class Query < Types::BaseObject
    graphql_name 'Query'

    setup_connection_filter_field(:timetrackings, Connections::TimetrackingConnection, true) do |field|
      field.argument :filter, InputObjects::TimetrackingFilter, required: false
    end

    def timetrackings(**args)
      Functions::Query.new(records: Timetracking.all, args: args).resolve
    end

    setup_connection_filter_field(:users, Connections::UserConnection, true) do |field|
      field.argument :filter, InputObjects::UserFilter, required: false
    end

    def users(**args)
      Functions::Query.new(records: User.all, args: args).resolve
    end
  end
end

Now we're able to fetch users, timetrackings and their connections to each other. But what's about grouping? Let's extend our schema.

Adding the groupBy field to the schema

Since we may want to add grouping to each of our connections later (or at least to some of them), we need a solution that won't add too much boilerplate.

At first, we want to add a groupBy field to the timetracking connection, so let's do this:

# /app/graphql/connections/timetracking_connection.rb
# frozen_string_literal: true

module Connections
  class TimetrackingConnection < Base
    graphql_name 'TimetrackingConnection'

    field :group_by, Types::TimetrackingGroupType, null: false

    def group_by
      object
    end

    edge_type Edges::TimetrackingEdge
  end
end

As you can see, we've added an own TimetrackingGroupType type, so let's have a look at this:

# /app/graphql/types/timetracking_group_type.rb
# frozen_string_literal: true

module Types
  class TimetrackingGroupType < Types::BaseObject
    graphql_name 'TimetrackingGroup'

    setup_group_field(
      key: :date,
      key_type: String,
      graphql_name: 'TimetrackingGroupDateResult',
      connection_type: Connections::TimetrackingConnection
    )

    def date
      Functions::Group.new(object.nodes, group_by: :date, arguments: object.arguments).resolve
    end
  end
end

Reducing boilerplate means, that we don't want to add an own type for each field that can be grouped. Imaging dozens of columns in large tables that could be used to generate reports – we would have to create an extra class for each of the fields, which then would have the same fields again and again: key and connection (which you may remember from our query at the beginning of the post). So there must be a way to generate this programmatically. The solution is simple: creating another field generator. It work's the same way like our setup_connection_filter_field method and it's implemented in our BaseObject type also:

# /app/graphql/types/base_object.rb
# frozen_string_literal: true

module Types
  class BaseObject < GraphQL::Schema::Object
    field_class Types::BaseField

    def self.setup_group_field(key:, key_type:, graphql_name:, connection_type:)
      # We don't want to add a new class (and therefore a new file) for every field that
      # can be grouped, so we're generating the type dynamically.
      type =
        Class.new(Types::BaseObject) do
          graphql_name graphql_name

          field :key, key_type, null: false
          setup_connection_filter_field :connection, connection_type, true

          def key
            object[:key]
          end

          def connection
            object[:values]
          end
        end

      field_options = { type: [type], null: false }
      field(key, field_options)
    end

    def self.setup_connection_filter_field(name, type, is_null, override_options = {})
      # Prepare options
      default_field_options = { type: type, null: is_null, connection: true }
      field_options = default_field_options.merge(override_options)

      # Create the field
      field(name, field_options) do
        argument :order_by, String, required: false
        argument :page, Int, required: false
        argument :per_page, Int, required: false

        # Allow an override block to add more arguments
        yield self if block_given?
      end
    end
  end
end

Now graphql is going to create this class dynamically while the schema get's initialized and we don't have to worry too much about boilerplate code.

Let's dive into the last part: where does the grouping happens? Looking at the date function in our /app/graphql/types/timetracking_group_type.rb, you may have noticed this one: Functions::Group.new(::Timetracking, group_by: :date, arguments: object.arguments).resolve.

It's another little helper we wrote, like the Functions::Query class that we used above. This helper provides the key (for example the date) and the grouped values (the connection field of the grouped query), which are used in our dynamically created class inside the setup_group_field method:

# /app/graphql/functions/group.rb
# frozen_string_literal: true

module Functions
  class Group
    def initialize(records, group_by:, arguments:)
      @records = records
      @group_by = group_by
      @arguments = arguments
    end

    def resolve
      # This method returns data in a grouped way:
      #
      # [
      #   {
      #     key: '2018-05-02',
      #     values: [
      #       Record1,
      #       Record2,
      #       Record3
      #     ]
      #   }
      # ]
      #
      # But performance would be very bad if we would iterate through the whole array
      # each time we want to add a value to this key. That's why we start like this:
      #
      # {
      #   '2018-05-02': [
      #     Record1,
      #     Record2,
      #     Record3
      #   ]
      # }
      #
      # This is way more performant and we can change this structure of the array,
      # or rather the hash, afterwards

      tmp = {}
      records = Functions::Query.new(records: @records, args: @arguments).resolve

      records.each do |record|
        tmp[record[@group_by]] = [] unless tmp[record[@group_by]]
        tmp[record[@group_by]] << record
      end

      result = []
      tmp.each { |key, value| result << { key: key, values: value } }

      result
    end
  end
end

As you can see, we won't group the records with SQL – but why? Looking back at our initial query, where we wanted to get grouped timetrackings of all users: the timetrackings are complete nodes of the Timetracking type, but nested in a parent object which acts as a group. If we would want to group our records with SQL, we would have to aggregate the results. Imagine this query: Timetracking.select('date, duration').group('date'). This cannot work, because we have multiple records for one date, and therefore multiple durations – but which duration should SQL pick to show in the resulted table of the query?

date duration
2016-05-28 n records

So we have to aggregate the duration field to make this query work: Timetracking.select('date, max(duration) as max_duration').group('date'):

date duration
2016-05-28 200

But that's not what we want – we want all records, grouped by a specific column. Speaking in SQL, we could just get the distinct values of date and then fire a query for each of the keys. But this would lead into one additional query per key. Assuming that you want to query timetrackings for a year, you would have 1 + 365 queries. And this is something that we definitely don't want to have ☝🏻Additionally, since we want to query all records in a period and just group them by a specific column, we should do this programmatically.

Now let's have a look at our Functions::Group class again. As you can see, we're fetching all records of the given model, adding them to an object of keys and values, and transforming this object to an array afterwards, because our dynamically generated TimetrackingGroupDateResult expects an array as type (πŸ‘€ type: [type]).

And that's it! Now we can run our initial query and get the data that we want:

Screenshot of Postman, showing the query and the result

Top comments (0)