DEV Community: Ritikesh

Dynamic JWT authentication and secrets rotation in Rails Applications

Ritikesh — Sun, 27 Feb 2022 13:17:03 +0000

Introduction

JWT is one of the most popular authentication & authorization techniques employed in modern applications. There are several articles and guides available on how to get started with JWT in any application or framework. In this post however, we will talk about some lesser talked about tricks of JWT, specifically in the context of large applications.

Generally speaking, the larger the application, the more internal and external services it has to talk to. External services usually have their own way of authenticating and authorizing third-party API calls. For internal systems, however, organisations often prefer JWTs because of their flexibility and versatility. A sample JWT-based handshake between two Rails applications using ruby-jwt would look like this -

# caller
payload = {
  "iss": "auth_service",
  "exp": 1.minute.from_now.to_i, # plain Integer timestamp
  "aud": "main_application",
  "resources": ["update_user"]
} # most common attributes, but not limited to these.

#defaults to HMAC
token = JWT.encode(payload,
             Rails.application.credentials.jwt_secrets.main_application)
# make API call to main_application with token

# callee
# common auth service
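# works for a bare token or a "Bearer <token>" header value
token = request.headers['Authorization'].to_s.split(' ').last
# decode returns [payload, header]; pin the expected algorithm
payload, _header = JWT.decode(token, Rails.application.credentials.jwt_secrets.auth_service, true, algorithm: 'HS256')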
# use payload to authorize the request resources being accessed

Dynamic JWT authentication

The above is a simplified take of how one would authorize API requests. However, as mentioned earlier, larger applications generally talk to a lot of services. To avoid service-to-service dependencies and to maintain a healthy security posture, it is advisable to have unique secrets for each service the application talks to. To support authorizing the ever growing list of services, the authentication logic needs to be implicitly generic.

This is where the flexibility of JWTs comes into the picture. The JWT specification defines an optional issuer (iss) claim, which identifies the original issuer of the token: in this case, the calling service. We can use this claim from the payload to identify which service issued the token used in the API request.

Coming from a traditional authentication system like encryption or hashing, one might argue - To identify the issuer from the payload, I would first need to decode the payload for which I need the secret. But to get the secret, I need to know the issuer from the payload. DEADLOCK.

This is where the versatility of JWT comes to the fore. One does not need the secret to decode a JWT payload, because the payload is only Base64url-encoded, not encrypted. In fact, you can decode any JWT on jwt.io without ever needing the secret. However, an unverified payload must never be trusted: ALWAYS verify the signature and the claims before acting on it.
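To make this concrete, here is a minimal sketch in plain Ruby (standard library only, no ruby-jwt) that builds an HS256 token by hand and then reads its claims without the secret. The helper names build_token and unverified_payload are ours, purely for illustration, and the secret is a placeholder.

```ruby
require 'base64'
require 'json'
require 'openssl'

# Base64url-encode without padding, the way JWT segments are encoded.
def b64url(data)
  Base64.urlsafe_encode64(data, padding: false)
end

# Build an HS256 token by hand so the example needs no gems.
def build_token(payload, secret)
  header = b64url(JSON.generate({ alg: 'HS256', typ: 'JWT' }))
  body = b64url(JSON.generate(payload))
  signature = OpenSSL::HMAC.digest(OpenSSL::Digest.new('SHA256'), secret, "#{header}.#{body}")
  "#{header}.#{body}.#{b64url(signature)}"
end

# Read the claims WITHOUT verifying anything: no secret is involved.
def unverified_payload(token)
  _header, body, _signature = token.split('.')
  JSON.parse(Base64.urlsafe_decode64(body))
end

token = build_token({ iss: 'auth_service', aud: 'main_application' }, 'placeholder-secret')
puts unverified_payload(token)['iss'] # prints "auth_service"
```

Anyone holding the token can do the same, which is exactly why the issuer read this way should only be used to pick a key, never to authorize anything by itself.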

Coming back to our requirement of identifying the service using the JWT payload, the ruby-jwt gem allows a block to be passed to the decode method. The block receives the token's header and, when it takes two arguments, the unverified payload. The return value of the block is then used as the key to verify the signature. Our earlier example can now be tweaked slightly to make it generic -

# caller logic does not change
payload = {
  "iss": "auth_service",
  "exp": 1.minute.from_now.to_i, # plain Integer timestamp
  "aud": "main_application",
  "resources": ["update_user"]
} # most common attributes, but not limited to these.

#defaults to HMAC
token = JWT.encode(payload,
             Rails.application.credentials.jwt_secrets.main_application)
# make API call to main_application with token

# callee
# common auth service
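# works for a bare token or a "Bearer <token>" header value
token = request.headers['Authorization'].to_s.split(' ').last
# the key finder block receives the header and the unverified payload
payload, _header = JWT.decode(token, nil, true, algorithm: 'HS256') do |_headers, unverified|
  Rails.application.credentials.jwt_secrets[unverified['iss']]
end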
# use payload to authorize the request resources being accessed

Rotating JWT secrets

Securing applications is of the utmost importance in today's digital-first world. In spite of all the preventive measures organisations take, there is never a zero-vulnerability guarantee. In such an environment, teams must always be prepared to respond to security incidents.

In the event of security incidents involving secret/data leaks, the leaked tokens/secrets are first rotated to ensure bad actors are not able to fully leverage the exploit. A common problem with rotating secrets in large applications is that it is very hard to coordinate the changes across multiple systems at the same time.

This can also be solved natively with ruby-jwt (support was recently added to the library), which verifies the signature against multiple secrets if the key finder block returns an Array. For the first deployment, the application would need to keep both the old and the new secrets in the secrets hash against the issuer. Sample below -

# multiple secrets are supported for each issuer
secrets = { 'auth_service' => ['old_secret', 'new_secret'] }

JWT.decode(token, nil, true, algorithm: 'HS256') do |_headers, payload|
  secrets[payload['iss']]
end

Once the downstream services completely switch to the new secret, the application can remove support for the older secret.

# clean up the older secret
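# string key, so that the lookup by payload['iss'] still matches
secrets = { 'auth_service' => 'new_secret' }

JWT.decode(token, nil, true, algorithm: 'HS256') do |_headers, payload|
  secrets[payload['iss']]
end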

This allows rotating secrets easily in production without impacting live systems.
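For intuition, the idea behind accepting several secrets can be sketched in plain Ruby. This is not ruby-jwt's implementation, only an illustration of the technique for HS256: compute the signature with each candidate secret and accept the token if any of them matches. The helper names are ours.

```ruby
require 'base64'
require 'openssl'

# HS256 signature of "<header>.<payload>", Base64url-encoded without padding.
def hs256_signature(signing_input, secret)
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new('SHA256'), secret, signing_input)
  Base64.urlsafe_encode64(digest, padding: false)
end

# True when any of the candidate secrets produces the token's signature.
# Accepts a single secret or an Array of secrets.
def signed_by_any?(token, secrets)
  header, body, signature = token.split('.')
  return false if signature.nil?

  Array(secrets).any? do |secret|
    expected = hs256_signature("#{header}.#{body}", secret)
    OpenSSL.secure_compare(expected, signature) # constant-time comparison
  end
end

signing_input = 'aGVhZGVy.cGF5bG9hZA' # arbitrary header and payload segments
token = "#{signing_input}.#{hs256_signature(signing_input, 'old_secret')}"

puts signed_by_any?(token, %w[old_secret new_secret]) # true during the rotation window
puts signed_by_any?(token, 'new_secret')              # false once the old secret is removed
```

The second call shows why the cleanup step should wait until every caller has switched: tokens signed with the old secret stop verifying the moment it is dropped.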

Design a multitenant application on Rails 6 with horizontal sharding

Ritikesh — Fri, 25 Dec 2020 08:44:47 +0000

One of the most common design patterns for multitenant architectures is to associate every tenant with a unique subdomain on your root domain. For example, if your application runs on example.com, marvel as a tenant would access the system using marvel.example.com, and so on.

This pattern has its own advantages (easy/faster DNS resolution when running on a multi-pod setup) and disadvantages (DNS updates for every tenant creation). Instead of debating that, we will delve into how to implement this architecture in a Rails application using the multi-database and horizontal sharding support provided by Rails 6.0/6.1.

To begin with, we will need a Tenant model. Since your tenants will be identified by subdomains, it makes sense to have a subdomain column in the table along with other application required attributes. Each tenant belongs to a Shard and all data of that tenant would reside on that shard. So we will need a shard model as well.

We can begin by setting up the required database configurations first:

# config/database.yml

default: &default
  adapter: sqlite3
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  timeout: 5000

development:
  default:
    <<: *default
    database: primary_db
  default_replica:
    <<: *default
    database: primary_db_replica
    replica: true
  shard1:
    <<: *default
    database: shard1_db
  shard1_replica:
    <<: *default
    database: shard1_db_replica
    replica: true

We will define the required models as well accordingly.

# app/models/application_record.rb

# frozen_string_literal: true
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  db_keys = Rails.application.config.database_configuration[Rails.env].keys

  db_configs = db_keys.each_with_object({}) do |key, configs|
    # key = default, db_key = default
    # key = default_replica, db_key = default
    db_key = key.gsub('_replica', '')
    role = key.eql?(db_key) ? :writing : :reading

    db_key = db_key.to_sym
    configs[db_key] ||= {}

    configs[db_key][role] = key.to_sym
  end

  # connects_to shards: {
  #   default: { writing: :default, reading: :default_replica },
  #   shard1: { writing: :shard1, reading: :shard1_replica }
  # }
  connects_to shards: db_configs
end

# app/models/global_record.rb

# frozen_string_literal: true
class GlobalRecord < ActiveRecord::Base
  self.abstract_class = true

  connects_to database: { writing: :default, reading: :default_replica }
end

# app/models/tenant.rb

# frozen_string_literal: true
class Tenant < ApplicationRecord
  include ActsAsCurrent

  validates :subdomain, format: { with: DOMAIN_REGEX }
  # other DSL

  after_commit :set_shard, on: :create

  private

  def set_shard
    Shard.create!(tenant_id: self.id, domain: subdomain)
  end
end

# app/models/shard.rb

# frozen_string_literal: true
class Shard < GlobalRecord
  include ActsAsCurrent

  validates :domain, format: { with: DOMAIN_REGEX }
  validates :tenant_id, presence: true

  before_create :set_current_shard

  private

  def set_current_shard
    self.shard = APP_CONFIGS[:current_shard] #shard1
  end
end

With multitenant architectures, there will always be a global context and a tenant specific context. We isolate such models through abstract classes ApplicationRecord and GlobalRecord. They also take care of abstracting database connections and setting up the required isolations.
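The key-grouping step inside ApplicationRecord can be tried in isolation. The sketch below is plain Ruby with a hard-coded key list standing in for the keys read from database.yml; the method name group_by_shard is ours, for illustration only.

```ruby
# Groups database.yml keys into the hash shape expected by `connects_to shards:`.
# A key ending in "_replica" becomes the :reading role of its base key.
def group_by_shard(keys)
  keys.each_with_object({}) do |key, configs|
    db_key = key.delete_suffix('_replica')
    role = key == db_key ? :writing : :reading
    (configs[db_key.to_sym] ||= {})[role] = key.to_sym
  end
end

shards = group_by_shard(%w[default default_replica shard1 shard1_replica])

expected = {
  default: { writing: :default, reading: :default_replica },
  shard1: { writing: :shard1, reading: :shard1_replica }
}
puts shards == expected # true
```

The convention carries all the weight here: a replica is recognised only by its "_replica" suffix, so a differently named replica key would be treated as a separate shard.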

We can also leverage the BelongsToTenant pattern for all models that belong to a tenant and inherit from ApplicationRecord.

By default, all ActiveRecord models connect to the default shard with the writing role unless they are switched to another connection through connected_to. Hence, when querying GlobalRecord-inherited models, we will not require any explicit connection handling.
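The Tenant and Shard models above include ActsAsCurrent, which this post does not show. A minimal sketch of what such a module could look like, assuming it simply keeps the "current" record in Thread.current, follows. Every name in it besides make_current is our assumption, and DemoShard is a plain Ruby stand-in for an ActiveRecord model.

```ruby
# Hypothetical stand-in for the ActsAsCurrent concern used by Tenant and Shard.
module ActsAsCurrent
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Each including class gets its own slot, e.g. :current_demoshard.
    def current_key
      :"current_#{name.downcase}"
    end

    def current
      Thread.current[current_key]
    end

    # Call at the end of a request or job so state does not leak
    # into the next unit of work served by the same thread.
    def reset_current
      Thread.current[current_key] = nil
    end
  end

  def make_current
    Thread.current[self.class.current_key] = self
  end
end

# Plain Ruby class standing in for an ActiveRecord model.
class DemoShard
  include ActsAsCurrent
  attr_reader :shard

  def initialize(shard)
    @shard = shard
  end
end

DemoShard.new(:shard1).make_current
puts DemoShard.current.shard # prints "shard1"
DemoShard.reset_current
```

Note that Thread.current[] storage is fiber-local in Ruby, so code that hops fibers within a request would not see the same value.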

We can also define a proxy class to abstract out all application specific connection handling logic:

# app/proxies/database_proxy.rb

# frozen_string_literal: true
class DatabaseProxy
  class << self
    def on_shard(shard: , &block)
      _connect_to_(role: :writing, shard: shard, &block)
    end

    def on_replica(shard: , &block)
      _connect_to_(role: :reading, shard: shard, &block)
    end

    def on_global_replica(&block)
      _connect_to_(klass: GlobalRecord, role: :reading, &block)
    end

    # for regular executions, since Global only connects to default shard,
    # no explicit connection switching is required.
    # def on_global(&block)
    #   _connect_to_(klass: GlobalRecord, role: :writing, &block)
    # end

    private

    def _connect_to_(klass: ApplicationRecord, role: :writing, shard: :default, &block)
      klass.connected_to(role: role, shard: shard) do
        block.call
      end
    end
  end
end

With this setup in place, we can now write both application and background middlewares that handle shard selection and tenant isolation on a per request or job basis.

# lib/middlewares/multitenancy.rb

# frozen_string_literal: true
module Middlewares
  # selecting account based on subdomain
  class Multitenancy
    def initialize(app)
      @app = app
    end

    def call(env)
      domain = env['HTTP_HOST']

      shard = Shard.find_by(domain: domain)
      return @app.call(env) unless shard

      shard.make_current
      DatabaseProxy.on_shard(shard: shard.shard.to_sym) do
        tenant = Tenant.find_by(subdomain: domain)

        tenant&.make_current
        @app.call(env)
      end
    end
  end
end

# config/application.rb
require_relative '../lib/middlewares/multitenancy'

config.middleware.insert_after Rails::Rack::Logger, Middlewares::Multitenancy

For anybody building new products on the web, Ruby on Rails has never been a better choice to kickstart your next big unicorn.

Multitenant Architecture on Rails 6.1

Ritikesh — Thu, 24 Dec 2020 04:01:28 +0000

Rails, the framework built on top of Ruby, just got its latest version (6.1) released. A lot of features and enhancements have gone into the latest version of Rails. You can read the official announcement for more details.

I will be focusing particularly on the Multi-DB improvements section, what changed and how we can leverage Rails' native multi DB handling techniques for building scalable multitenant applications.

Rails 6.0 was the first official Rails version to support multiple databases. From the release notes:

The new multiple database support makes it easy for a single application to connect to, well, multiple databases at the same time! You can either do this because you want to segment certain records into their own databases for scaling or isolation, or because you’re doing read/write splitting with replica databases for performance. Either way, there’s a new, simple API for making that happen without reaching inside the bowels of Active Record. The foundational work for multiple-database support was done by Eileen Uchitelle and Aaron Patterson.

This allowed application developers to be able to define multiple database connections for a single application. Before this, developers had to use one of the many third party gems for any kind of multi DB support in Rails. Even though the ruby/rails community is very vibrant, third party gems often come with maintenance overheads with respect to upgrades, breaking changes, bugs, performance issues, etc.

With Rails 6.0, you could define your database.yml in such a way:

# config/database.yml

default: &default
  adapter: sqlite3
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  timeout: 5000

development:
  primary:
    <<: *default
    database: primary_db
  primary_replica:
    <<: *default
    database: primary_db_replica
    replica: true
  animals:
    <<: *default
    database: animals_db
  animals_replica:
    <<: *default
    database: animals_db_replica
    replica: true

Then define ActiveRecord Abstract classes that could connect to these databases.

# app/models/application_record.rb

# frozen_string_literal: true
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  connects_to database: { writing: :primary, reading: :primary_replica }
end

# app/models/animals_base.rb

# frozen_string_literal: true
class AnimalsBase < ApplicationRecord
  # abstract, so Rails does not look for an animals_bases table
  self.abstract_class = true

  connects_to database: { writing: :animals, reading: :animals_replica }
end

# app/models/user.rb

# frozen_string_literal: true
class User < ApplicationRecord
end

The abstract classes and models inheriting from them would both now have access to the connected_to method which can be used to establish connection to the configured database connections.

# some_controller.rb

# frozen_string_literal: true
ApplicationRecord.connected_to(role: :reading) do
  User.do_something_thats_slow
end

This approach worked great for a primary-replica setup or for setups where models had clear separation, i.e. a model always queried from a single database. However, with modern multi-tenant SaaS applications, horizontal sharding is almost a basic necessity. Depending on the tenant that's accessing the application, the application should be able to select which database it wants to query the data from. While how the application shards horizontally is domain-specific logic and can vary from case to case, how it connects to the underlying databases is something the framework should be able to handle. And so they did.

With Multi-DB improvements released in 6.1, you can now define shard connections for your abstract classes as well. The example from above changes as:

# config/database.yml

default: &default
  adapter: sqlite3
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  timeout: 5000

development:
  primary:
    <<: *default
    database: primary_db
  primary_replica:
    <<: *default
    database: primary_db_replica
    replica: true
  animals:
    <<: *default
    database: animals_db
  animals_replica:
    <<: *default
    database: animals_db_replica
    replica: true
  animals_shard1:
    <<: *default
    database: animals_db1
  animals_shard1_replica:
    <<: *default
    database: animals_db1_replica
    replica: true

# app/models/application_record.rb

# frozen_string_literal: true
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  connects_to database: { writing: :primary, reading: :primary_replica }
end

# app/models/animals_base.rb

# frozen_string_literal: true
class AnimalsBase < ApplicationRecord
  # abstract, so Rails does not look for an animals_bases table
  self.abstract_class = true

  connects_to shards: {
    default: { writing: :animals, reading: :animals_replica },
    shard1: { writing: :animals_shard1, reading: :animals_shard1_replica }
  }
end

# app/models/cat.rb

# frozen_string_literal: true
class Cat < AnimalsBase 
end

Similar to 6.0, we can then leverage the connected_to method for switching(/establishing) connections to the configured databases.

# some_controller.rb

# frozen_string_literal: true
AnimalsBase.connected_to(shard: :shard1, role: :reading) do
  Cat.all.to_a # loads all cats from animals_shard1_replica before the block exits
end

Native multi DB connection switching and handling would go a long way in helping developers move away from a lot of complex 3rd party gems in favor of out-of-the-box tools. I have already started leveraging this in one of my applications. I will be sharing more on how to build an effective sharding / connection switching strategy on top of what's natively available with Rails in my next post. Thanks for reading and happy holidays!

Distributed request tracing in Rails

Ritikesh — Mon, 07 Sep 2020 17:31:23 +0000

The microservices pattern is a highly debated topic. The pros and cons are heatedly discussed over forums, blogs, podcasts, social media, and literally everywhere else. We'll skip that argument for another day. Let's dive into how we can enable better request tracing in a microservices architecture in a pure Ruby on Rails world. Distributed tracing / debugging is one of the biggest challenges in a microservice architecture.

X-Request-ID is a widely used, though not formally standardised, HTTP header. The header, as defined in the blog post, is:

A unique request ID, represented by a UUID, is generated at each HTTP request received by the platform routing servers. It is added to the request which is passed to your application containers.
If the X-Request-ID header is already defined by the client, it won’t be overridden except if it doesn’t respect the following format:
20-128 alphanumerical characters and the symbols +, =, / and -.

The key point to focus on here is:

If the X-Request-ID header is already defined by the client, it won’t be overridden

We will use the same header to our advantage when making calls to all our external microservices.

The ActionDispatch::Request class in Rails makes the uuid method available on the request object. We can use this in our controllers:

class ApplicationController < ActionController::Base
  before_action :set_thread_data

  private
  def set_thread_data
    Thread.current[:uuid] = request.uuid
  end
end

We can then leverage this Thread context from the Proxy classes making requests to our microservices.

class ServiceProxy
  attr_reader :params, :method, :url, :handler

  def initialize(headers:, params:, method:, url:, handler:)
    @headers = headers
    @params = params
    @method = method
    @url = url
    @handler = handler
  end

  def make_request
    circuit.run do
      RestClient::Request.execute(
        method: method, url: url, payload: params, 
        headers: headers, read_timeout: CircuitConstants[handler][:read_timeout],
        open_timeout: CircuitConstants[handler][:open_timeout]
      )
    end
  end

  private
  def circuit
    Circuitbox.circuit(handler, CircuitConstants[handler])
  end

  # Returns the caller's headers plus the current request ID,
  # without mutating the hash that was passed in.
  def headers
    return @headers unless @headers.is_a?(Hash)

    @headers.merge('X-Request-Id' => Thread.current[:uuid])
  end
end

Many web frameworks and platforms respect this header and use it to set the request-level UUID. In Rails, this is handled by the ActionDispatch::RequestId middleware.
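As a rough illustration of what "respecting the header" means on the receiving side, here is a plain Ruby sketch. It approximates the behaviour of a request ID middleware rather than reproducing Rails' code: reuse a sanitised copy of the incoming ID when there is one, otherwise generate a fresh UUID. The method name resolve_request_id and the exact character whitelist are our assumptions.

```ruby
require 'securerandom'

# Returns the ID to use for this request: a sanitised copy of the incoming
# X-Request-Id header when present, otherwise a freshly generated UUID.
def resolve_request_id(incoming)
  # Keep only word characters, "-" and "@", and cap the length at 255.
  cleaned = incoming.to_s.gsub(/[^\w\-@]/, '')[0, 255]
  cleaned.empty? ? SecureRandom.uuid : cleaned
end

puts resolve_request_id('3f2c9a7e-upstream-id') # the upstream ID is kept as-is
puts resolve_request_id(nil)                    # a new UUID is generated
```

Sanitising matters because the value ends up in log lines, and an unchecked header would let a caller inject arbitrary text into them.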

We should also set the application level tagged logging to make use of these request uuids:

# config/application.rb
config.log_tags = [ :uuid ]

After implementing the above, every log line will be tagged with the request UUID.

With the above setup, all requests flowing through all the microservices will carry the same request ID, enabling easy request tracing and, in turn, making application issues easier to debug.

Serving private content from S3 using CloudFront

Ritikesh — Thu, 03 Sep 2020 12:52:13 +0000

Freshworks’ IT service management tool, Freshservice, enables organizations to simplify their IT operations. Freshservice provides ITIL-ready components that help administrators manage incidents, problems, changes and releases, and the asset management component helps organizations exercise control over their IT assets.

What happens when your application’s core S3 bucket is marked as dangerous by Google’s safe browsing feature and your web application is displayed as suspicious when accessed on Chrome/Firefox? We will talk about how we reacted to a similar incident that happened with us, and how we reduced the impact it had on our customers. We will also delve into design changes made to further reduce the impact of such events in the future.

Default Attachment Processing

Freshservice is powered by Ruby on Rails. Nginx passenger-backed servers are hosted on the EC2 instances of AWS. The servers are hosted in four AWS data centers — US East (US), Europe Central (EUC), India (IND), and Australia (AU).

Freshservice uses AWS’s S3 (Simple Storage Service) for file storage. S3 + paperclip forms the crux of our attachment storage and processing engine. Customers use this service for various purposes like uploading their logo or favicon for branding, enabling their end users to upload avatars, attaching supporting files or images on tickets, uploading the signed contract in the contracts module, etc. Upon upload, the files are scanned for viruses and uploaded to a unique path in our S3 bucket. To ensure our customers’ data is never lost, we also have a DR (Disaster Recovery) setup in place. The attachments bucket is replicated in near real-time to another bucket in a different region within the same geography.

Attachments are protected by a set of standard security measures. The first check, for example, is tenancy — customers can only access their own attachments. Attachments are always fetched using pre-signed URLs that expire within a short interval, typically 5 minutes, but this interval may vary on a per use-case basis. The URLs generated are of the format:

https://s3.amazonaws.com/bucket_name/path_to_file.extension?signature.

AU and IND regions currently do not have a secondary region for us to have a DR setup. US and EUC have a DR setup in place.

The Unexpected service outage

To ensure secure browsing for its users on the internet, Google Safe Browsing scans websites for dangerous and deceptive content. When a site is found to be in breach of its guidelines, it is marked unsafe by Google Safe Browsing. A security warning is then displayed on browsers when a user tries to visit the site in question.

Sometime in the middle of January 2020, Google Safe Browsing decided to mark our attachments bucket’s root URL ( https://s3.amazonaws.com/bucket_name ) as unsafe. This rendered the web application unusable as most pages included some form of attachment on the page. This affected all our customers hosted in that region.

First Response

As a first line of defense, to ensure customers were able to log in to the system and use it, we temporarily disabled attachment-backed branding like logo and favicon. We also disabled loading user avatars to ensure customers were able to view and process at least the tickets that did not carry any attachments. Unfortunately, a vast majority of tickets had attachments and there was no easy way to support them.

We considered creating a new bucket where all new attachments would be uploaded while syncing older attachment data into this new bucket using S3 sync. This turned out to be a non-starter due to the size of our bucket, as the sync would end up taking days if not weeks. This would be unacceptable for us as our customers would be unable to support their end users without essential information like screen shares or contract documents while the sync completes. We therefore decided to use our DR bucket which was already in sync with the original bucket. The challenge was that the DR bucket was hosted in a different region and accessing it across two AWS regions would introduce an additional network latency overhead to both read and write operations. We were, therefore, uncomfortable going all in with this as our new bucket. Instead, we decided to move the reads alone to the DR bucket, temporarily, until Google addressed the blocking of our bucket caused by false positives in their system.

We quickly prepared code changes to start reading from the DR bucket and deployed them. We also reverted all the previously made temporary changes related to customer branding and user avatars. The application was now fully in use.

After multiple attempts at reaching out to Google to sort out the problem from their end, we finally had some luck and were able to get them to mark the bucket URLs as safe. They even guaranteed that this wouldn’t happen in the future.

But we were skeptical enough to decide we did not want to rely on this guarantee. We decided to keep the fallback available in production so we could easily switch back if we were blocked inadvertently again. To achieve this, we moved the "reads-from-dr" logic under a flag stored in Redis. To avoid hitting Redis for each request, we wrapped this around MemoizeUntil with 1-minute refreshes. This would make us future-proof if such an event reoccurred. All we would need to do is to flip the flag in Redis to fall back to reading from the DR bucket.
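The "check Redis at most once a minute" idea can be sketched in plain Ruby. This is not MemoizeUntil's API, only an illustration of the technique under our own names: cache the flag for the current one-minute window and reload it when the window changes. The Redis key in the comment is hypothetical.

```ruby
# Caches the result of a block for the current time window (60s by default).
class WindowedFlag
  def initialize(window_seconds: 60, clock: -> { Time.now.to_i }, &loader)
    @window_seconds = window_seconds
    @clock = clock
    @loader = loader
    @mutex = Mutex.new
    @window = nil
    @value = nil
  end

  # Reloads the value only when a new window has started.
  def value
    @mutex.synchronize do
      current_window = @clock.call / @window_seconds
      if current_window != @window
        @value = @loader.call
        @window = current_window
      end
      @value
    end
  end
end

# In the application the loader would read the flag from Redis, for example
# (hypothetical key name):
#   READS_FROM_DR = WindowedFlag.new { redis.get('reads_from_dr') == 'true' }
loads = 0
flag = WindowedFlag.new { loads += 1; true }
3.times { flag.value }
puts loads # the loader ran at most twice, however many times value was read
```

The trade-off is propagation delay: after flipping the flag, each process can take up to one window to notice, which is acceptable for an emergency switch but worth remembering when verifying the change.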

Second incident

Fast forward to the middle of April 2020, and Google marked the bucket as insecure once again. This time though, we could control the impact duration by toggling the flag in Redis, and customers were able to access their portal shortly after the problem was detected. We also realised that although this solution worked, there were a few issues, notably:

We support both inline and regular attachments. Regular attachments can afford to have delays in loading as they are not expected to appear immediately. But inline attachments have to appear realtime. The "read-from-dr" solution that we had was "near-real time", not real time. Customers on faster networks would have noticed broken image uploads when, actually, the upload was successful in the backend. This is because the time taken to process and respond to the upload request would be shorter than the S3 sync. Hence the image would not be found when read from the DR bucket right after it was uploaded.
If there happened to be a delay in S3 sync during this period, we would be caught completely off guard and wouldn’t have an alternative solution in place. This would make us more dependent on AWS than we would like to be.

The Final Solution

At first, we considered building a reverse proxy solution as a gateway barrier for our S3 accesses. This proxy would be responsible for maintaining all our secure access processes while also reducing the blast radius in case of future incidents. We wanted to host our proxy under attachments.freshservice.com. With our sharded multi-tenant architecture, which serves each tenant under a unique subdomain on our root freshservice.com domain, this isolation would be rather straightforward. Each tenant would have its own subdomain for attachment processing under the root proxy URL attachments.freshservice.com, similar to their existing Freshservice subdomain.

This approach would solve all our problems but add another piece of infrastructure requiring additional provisioning, maintenance and monitoring. We were reluctant to add more infrastructure overhead for this problem and were looking for a readily available service that would solve this for us. That’s when we came across AWS CloudFront’s signed private access feature. Secure access of attachments through pre-signed auto-expiring URLs was a core design principle in our current setup, giving security the highest priority. The signed private access feature from AWS CloudFront allowed us to retain that design principle while giving us the luxury of not having to maintain another infrastructure component. We were immediately convinced and decided to go ahead with CloudFront + S3 as our attachments service provider. We used trusted signers as our default signing strategy for generating signed private URLs. The new URL format for CDN-enabled accesses would look like:

https://subdomain.attachments.freshservice.com/path_to_file.extension?signature.

However, with data localisation clauses from different governments, we could not leverage the default features of a CDN like CloudFront. Hence, we had to disable both default and edge caching on the distribution so that the customer’s data would reside in the same region as that of the origin. We can do this by setting the minimum, maximum and default TTL to 0.

This would, however, lead to exorbitant CDN access costs. Moreover, we also noticed that during both the aforementioned incidents, there were some suspicious sign-ups misusing Freshservice for spamming. Google never gave us a valid reason for marking the site as unsafe in either case. They just mentioned that this would not happen in the future and that their algorithms were learning and getting better.

Hence, as a caution and to keep costs in check, we came up with a unique strategy to enable the "CDN+S3" approach only for suspicious-looking accounts while keeping the remaining accounts on the default S3 accesses. An account is deemed suspicious depending on various anti-spam measures that we have. Depending on the spam score of an account, its attachments will be served from either S3 directly or through CDN + S3. This is internally controlled via feature flags. We also kept default switches to either disable or enable CDN accesses at the application level.

This would enable us to control the blast radius to each tenant and ensure we are able to serve our customers without any service interruptions.
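The routing decision described above can be sketched as a small pure function. The threshold value, the assumption that a higher score means a more suspicious account, and all names here are ours for illustration; the actual scoring and flag mechanics are not covered in this post.

```ruby
# Hypothetical cut-off; assumes a higher score means a more suspicious account.
SPAM_SCORE_THRESHOLD = 0.7

# Decides where an account's attachments are served from.
# global_override models the application-level switch: :cdn, :s3 or nil.
def attachment_source(spam_score:, global_override: nil)
  return global_override if %i[cdn s3].include?(global_override)

  spam_score >= SPAM_SCORE_THRESHOLD ? :cdn : :s3
end

puts attachment_source(spam_score: 0.9)                       # cdn
puts attachment_source(spam_score: 0.1)                       # s3
puts attachment_source(spam_score: 0.1, global_override: :cdn) # cdn
```

Keeping the application-level override ahead of the per-account check means a single switch can move every tenant to one path during an incident, without touching individual accounts.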

Incidents like these, unavoidable to a certain extent, help us display our truly customer-first values that are ingrained in every individual within the organisation. After ensuring that the customers could resume their operations, we did what we do best — make informed engineering decisions to ensure that our customers do not get impacted again in future.

(This post was co-authored by Valarpiraichandran A)

Four Action Mailer features you should know about

Ritikesh — Thu, 13 Aug 2020 08:30:43 +0000

ActionMailer is the default email library that comes with Rails. It has a ton of hidden features that aren’t spoken about or discussed as much as some of its counterparts like ActiveRecord or ActiveSupport. In this article, we will cover some of those features, and understand how to scale and debug email sending better with ActionMailer.

Interceptors

According to the official Rails documentation, "Interceptors allow you to make modifications to emails before they are handed off to the delivery agents. An interceptor class must implement the :delivering_email(message) method, which will be called before the email is sent."

This can be a very powerful hook to help you extract some generic processes or rules out of your mailer or notifier classes. We leverage interceptors in Freshservice to set default mailboxes and the From address that our customers intended to use for sending out emails to their recipients. Our interceptor does this through two methods, set_smtp_settings and fix_encodings.

The set_smtp_settings method retrieves the current tenant's configured mailbox from the thread and assigns it to the mail’s smtp_settings attribute. The mail class internally uses the ‘smtp_settings’ attribute to connect to the mailbox for delivering the email.

This method also takes care of setting all product-specific custom headers required for product functionalities and for the platform scaling. To avoid IP reputation abuse through spamming and a potential service disruption to our premium customers, depending on the state of the current tenant, we also append headers to ensure that the right IP is selected when delivering the corresponding emails.

The fix_encodings method, as the name suggests, fixes any encoding issues that arise out of unsupported user entered data.
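A minimal sketch of an interceptor along these lines is shown below. It is not the Freshservice implementation: the Thread.current[:mailbox] lookup, its :from and :smtp keys, and the choice to scrub only the subject are all assumptions made for illustration. Only delivering_email and register_interceptor come from the ActionMailer API.

```ruby
# Hypothetical interceptor; the mailbox lookup and its keys are assumptions.
class TenantMailInterceptor
  # Called by ActionMailer just before a message is handed to delivery.
  def self.delivering_email(message)
    set_smtp_settings(message)
    fix_encodings(message)
  end

  # Points the message at the current tenant's mailbox, if one is set.
  def self.set_smtp_settings(message)
    mailbox = Thread.current[:mailbox]
    return unless mailbox

    message.from = mailbox[:from]
    message.delivery_method.settings.merge!(mailbox[:smtp])
  end

  # Replaces invalid byte sequences in the subject with "?".
  def self.fix_encodings(message)
    message.subject = message.subject.to_s.scrub('?')
  end
end

# In a Rails initializer:
#   ActionMailer::Base.register_interceptor(TenantMailInterceptor)
```

Because the interceptor is a plain class with a single entry point, it can be exercised in tests with a stub message object, without delivering any email.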

Observers

Again from the official Rails documentation, “Observers give you access to the email message after it has been sent. An observer class must implement the :delivered_email(message) method, which will be called after the email is sent.” Like interceptors, observers offer hooks that can be leveraged for generic processes.

For email debugging purposes, we wanted to print an email’s message-id header tagged along with the default ActionMailer log that prints Sent email to #{recipients_list}. This log is printed by the default LogSubscriber from ActionMailer. But this LogSubscriber did not have access to the mail object. Hence we couldn’t leverage the LogSubscriber for this.

We then decided to use Observers to achieve this. We also had to override the default LogSubscriber’s deliver method to avoid duplicate Sent email to #{recipients_list} log messages.

We can also log other headers from the mail object for analytics or logging purposes.
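An observer that logs the message-id next to the recipients might look like the sketch below. SentMail is a stand-in for Mail::Message (which exposes message_id and to) so the snippet runs on its own; the class name and the log format are assumptions, not the code we run in production.

```ruby
require "logger"
require "stringio"

# Stand-in for Mail::Message (hypothetical); the real object exposes the same readers.
SentMail = Struct.new(:message_id, :to)

class MailLogObserver
  class << self
    attr_writer :logger

    def logger
      @logger ||= Logger.new($stdout)
    end

    # ActionMailer calls this hook after the mail has been delivered.
    def delivered_email(mail)
      logger.info("Sent email to #{Array(mail.to).join(', ')} (message-id: #{mail.message_id})")
    end
  end
end

# In a Rails app: ActionMailer::Base.register_observer(MailLogObserver)

log_output = StringIO.new
MailLogObserver.logger = Logger.new(log_output)
MailLogObserver.delivered_email(SentMail.new("abc123@mail.example.com", ["user@example.com"]))
```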

Alternatively, this can be achieved through ActiveSupport::Notifications as well, by subscribing to the deliver.action_mailer event emitted by ActionMailer.

Perform Deliveries (or not)

According to the official Rails documentation, perform_deliveries determines "whether deliveries are actually carried out when the deliver method is invoked on the Mail message. By default they are, but this can be turned off to help functional testing. If this value is false, deliveries array will not be populated even if the delivery_method is :test."

This option is useful for test environments or CI pipelines to avoid consuming the email quotas provided by your email service provider. It can also be leveraged in development mode to avoid flooding your mailbox with emails when developing or testing a feature. The mail content is printed out by default on STDOUT. You can even make this configurable using a thread variable or a file in the tmp directory. Another use case for setting this to false dynamically is to avoid sending spam emails, depending on the tenant that's sending the email or the content of the email.

You can do this from the interceptor described above, by setting perform_deliveries to false on the individual mail object.

Setting this at mail level ensures that other emails are not impacted. This is achievable through ActionMailer callbacks as well.
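A minimal sketch of the per-mail switch, assuming the current tenant is available in a thread-local and carries a spam flag (both hypothetical). OutgoingMail stands in for Mail::Message, which has a perform_deliveries= writer of its own.

```ruby
# Stand-in for Mail::Message (hypothetical); the real class also has perform_deliveries=.
OutgoingMail = Struct.new(:to, :perform_deliveries)

class SpamGuardInterceptor
  def self.delivering_email(mail)
    tenant = Thread.current[:current_tenant] # assumed thread-local key
    # Only this message is affected; the global ActionMailer setting stays untouched.
    mail.perform_deliveries = false if tenant && tenant[:spam]
    mail
  end
end

Thread.current[:current_tenant] = { id: 7, spam: true }
blocked = SpamGuardInterceptor.delivering_email(OutgoingMail.new(["a@example.com"], true))

Thread.current[:current_tenant] = { id: 8, spam: false }
allowed = SpamGuardInterceptor.delivering_email(OutgoingMail.new(["b@example.com"], true))
```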

Callbacks — Have better control over your emails

Rails 4.0 introduced ActionController-style callbacks such as before_action to ActionMailer. According to the changelog: "Allow callbacks to be defined in mailers similar to ActionController::Base. You can configure default settings, headers, attachments, delivery settings or change delivery using before_filter, after_filter, etc." (Justin S. Leitgeb)

This was a great addition to the framework as it allowed for great possibilities. Some of the items discussed in the previous sections are easily achievable via callbacks.

Freshservice is a cloud-based SaaS product with a multitenant architecture at its core. As a product, we take pride in making it easy to sign up and get started from the very first day. But this ease comes with the pain of handling excessive spam. The multitenant architecture becomes the victim here, as one bad fish can dirty the entire pond.

To ensure our customers aren't impacted by spam signups, we have several spam filters and blockers at different levels within the system. For example, upon signup, we do a spam lookup for the tenant based on historical patterns and data and set a spam score for it. Depending on the score, access to certain features or channels is blocked.

Email being one of the primary channels, we wanted to safeguard our email reputation and avoid emails delivered from Freshservice being marked as spam. This essentially meant that we had to block email sending from the application for spammy tenants. While not enqueuing jobs for these tenants was the easy way to do this, it was getting incredibly hard to ensure that these checks were followed every time we enqueued a background job to deliver an email. We missed a few, and that's when we decided to add another layer of protection, this time at the ActionMailer level.

The simplest solution was to set perform_deliveries to false from the interceptor or through a before_action defined in a base ApplicationMailer.

But we were still processing the mail templates and running the entire job just to not deliver the emails. We weren't satisfied and wanted something better. That's when we uncovered a hidden gem that's rarely talked about: setting response_body in a before_action callback aborts the mailer processing right away.

Rails comes preloaded with a bunch of really powerful frameworks loaded with tons of features. We are constantly evolving our codebase and in turn our product to leverage whatever Rails has to offer to better serve our customers.

Load/Eager Paths in Rails and why you should Care

Ritikesh — Mon, 06 Jul 2020 11:28:43 +0000

Ruby has a way of being told where to find the files that define classes, modules and other constants: $LOAD_PATH.

Rails adds some of its default directories like app/controllers/*, app/models/*, etc. to this path. It also allows you to append to this path by adding the directory or pattern to the magical configuration autoload_paths in application.rb or any of the environment files like test.rb or production.rb. This setting is often leveraged by gems to add files pertaining to the gem to the $LOAD_PATH for rails/ruby to be able to load them when required or instructed to - autoload vs eager load. There are tons of articles covering the autoload vs eager load part.

Instead, I would like to talk about a small issue I noticed when abusing the load path, sort of, and why following Rails' conventions becomes necessary.

Say you have declared a model in app/models/solution/article.rb and kept app/models in autoload_paths. Rails, through ActiveSupport#safe_constantize, will be able to load the Solution::Article class. This is because app/models/solution/, like all other subdirectories, is part of the $LOAD_PATH.

When trying to load a top level class called Article, Rails will not be able to find it, because it simply does not exist.

Now, say you've added app/models/**/* to autoload_paths instead. This flattens the paths so that everything is available globally. When Rails tries to load Solution::Article, it is available like before. But when it tries to load Article, that appears to be available too, since article.rb is now directly in the $LOAD_PATH, and Rails throws an error saying expected app/models/solution/article.rb to define the class Article. 🤦‍♂️
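The mismatch is easier to see by spelling out which constant a file is expected to define under each autoload root. The helper below is a simplified illustration of the convention (it is not how Rails implements the lookup, and it camelizes names in the most naive way possible).

```ruby
# Derive the constant name an autoloader would expect a file to define,
# given the autoload root the file was found under.
def expected_constant(autoload_root, file_path)
  relative = file_path.delete_prefix("#{autoload_root}/").delete_suffix(".rb")
  relative.split("/").map { |segment|
    segment.split("_").map(&:capitalize).join
  }.join("::")
end

file = "app/models/solution/article.rb"

namespaced = expected_constant("app/models", file)          # root as per convention
flattened  = expected_constant("app/models/solution", file) # root added via **/*
```

With app/models as the root the file is expected to define Solution::Article. Once app/models/solution is also a root, the very same file is expected to define a top-level Article, which it does not.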

Hence, it is always advisable to avoid using patterns like **/* for flattening load paths to make class definitions available. They tend to mess up how Rails resolves classes and constants and with Rails moving towards the faster and cleaner Zeitwerk library for auto/eager/reloading of classes and modules, it becomes imperative to follow clean class/namespace definition practises.

Quick hacks to lighten your database loads with minimal code

Ritikesh — Thu, 02 Jul 2020 07:42:09 +0000

[Graduating from startup to scaleup means having to constantly evolve your applications to keep pace with your customers’ growth. In this Freshworks Engineering series, Rails@Scale, we talk about some of the techniques we employ to scale up our products.]

Databases are a core component in most applications. But database loads can be a silent performance killer, causing sluggish application response times and leading to unsatisfied customers. While most database accesses are necessary, some can be avoided. The most obvious example would be ensuring that your application effectively leverages cached data. However, there can be oversights. We identified such missed opportunities and made minimal code changes to reduce a large number of queries fired to the database.

Leveraging the delegation pattern for cached accesses

Delegation is a common software development pattern from the object-oriented programming world. It uses object composition to achieve code reuse, much as inheritance does.

There are some very detailed articles on delegation patterns and how to apply those principles in Ruby/Rails applications. In this blog, we talk about how you can optimize performance with little changes to your existing delegation code.

Freshservice, like other Freshworks products, is a multi-tenant SaaS web application. Each tenant is unique and can have configurations that are specific to them. We store some of these configurations in Redis and some of them in the good old RDBMS. The choice of storage is based on factors such as the requirement of ACID properties or constraints or relationships.

The RDBMS-based configuration table is backed by an ActiveRecord model called TenantConfigs and is directly related to our tenant model through a has_one relationship. To reduce the load on the database and for faster access to frequently read data, we use memcached as an LRU cache. The TenantConfigs objects are also cached per tenant for faster access.

The tenant model has a handful of attributes delegated to the TenantConfigs association. We noticed that at least one of these attributes was accessed as part of every web request made to the application. The delegation was a good old Ruby delegation from the tenant model to its tenant_configs association.

Every time we did tenant.locale, Rails would fire a DB query to fetch the locale information from the tenant_configs table. Since we were already caching this information, we wanted to leverage it for faster access and reduce the load on the database. To achieve this, we simply replaced the delegatee from tenant_configs to tenant_configs_from_cache.

The delegate method comes from ActiveSupport and requires the delegatee to be a valid method within the scope of the defining class. Hence, there is no change in how the drop-in replacement works.
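The swap can be sketched in plain Ruby as follows. To keep the snippet runnable without Rails it uses Forwardable's def_delegator as a stand-in for ActiveSupport's delegate, a Hash as a stand-in for memcached, and a counter in place of real database queries; all of these are assumptions for illustration.

```ruby
require "forwardable"

TenantConfigs = Struct.new(:locale)

class Tenant
  extend Forwardable

  CACHE = {} # stand-in for memcached
  attr_reader :id, :db_queries

  def initialize(id)
    @id = id
    @db_queries = 0
  end

  # Simulates the has_one association: every call hits the database.
  def tenant_configs
    @db_queries += 1
    TenantConfigs.new("en")
  end

  # Cached accessor: only a cache miss falls through to the database.
  def tenant_configs_from_cache
    CACHE[id] ||= tenant_configs
  end

  # Before: def_delegator :tenant_configs, :locale
  # With ActiveSupport this would read: delegate :locale, to: :tenant_configs_from_cache
  def_delegator :tenant_configs_from_cache, :locale
end

tenant = Tenant.new(1)
3.times { tenant.locale }
```

Three reads of tenant.locale now cost a single simulated query instead of three.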

As expected, the changes gave us a significant drop in the number of queries fired for the tenant_configs model, from almost 1,500 queries per minute (QPM) to near zero. This was achieved by merely appending two words of code.

A reactive approach to Rails’ uniqueness validations

Ruby on Rails allows you to enforce attribute or functional uniqueness at the ORM level through the magical validates_uniqueness_of validation in ActiveRecord. This makes ActiveRecord run a query to check whether the database already contains a record for the said attribute(s).

If it does, the validation fails and Rails doesn’t save the record. This is in line with Rails’ way of doing validations and works seamlessly. However, if you google validates_uniqueness_of, you’ll find numerous articles explaining why this validation is not entirely reliable. Briefly speaking, the main pitfalls are:

They are not reliable. Even with the uniqueness checks, race conditions can still occur and an attempt can be made to insert duplicate values into unique columns;
The above scenario is not handled cleanly when relying on validates_uniqueness_of alone and results in a 500, even if you use the safer .save;
Depending on the scale, they generate far too many SELECT 1 queries.

We were lucky enough to not face the race condition issue in production. However, with growing scale, the number of exists/select 1 queries to our databases was increasing. Hence we decided to move away from uniqueness validations (validates_uniqueness_of) for some of our core models.

Since our database already had unique indexes, all we had to do was remove the validates_uniqueness_of calls in the models. But this approach has a problem. When trying to save duplicates, the database would issue a rollback and Rails would raise an ActiveRecord::RecordNotUnique exception.

Normally, you would expect only methods ending with a ! like save! to raise exceptions. However, this exception gets raised even when using the regular save. If we were to proceed with the removal of validates_uniqueness_of from our models, our controllers would have to handle this exception specifically.

To solve this problem and to maintain consistency with our current code flows, we ended up writing a gem (record_not_unique) to capture ActiveRecord::RecordNotUnique exceptions at the model level and add a validation error on the associated attribute(s).

The exception would be captured only for exception-safe methods such as save and update_columns. save! would continue to throw exceptions. After replacing validates_uniqueness_of with handle_record_not_unique, we measured the results using our log aggregator.
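The idea of turning the database error into a validation error can be modelled in plain Ruby as below. This is not the gem's implementation or API: RecordNotUnique, UniqueEmailTable and User are hypothetical stand-ins for ActiveRecord::RecordNotUnique, a table with a unique index, and a model.

```ruby
# Stand-ins so the sketch runs without ActiveRecord (all hypothetical).
class RecordNotUnique < StandardError; end

class UniqueEmailTable
  def initialize
    @emails = {}
  end

  # Mimics a unique index: a duplicate insert raises instead of returning false.
  def insert(email)
    raise RecordNotUnique, "Duplicate entry '#{email}'" if @emails.key?(email)

    @emails[email] = true
  end
end

class User
  TABLE = UniqueEmailTable.new
  attr_reader :email, :errors

  def initialize(email)
    @email = email
    @errors = {}
  end

  # Exception-raising variant: the database error propagates to the caller.
  def save!
    TABLE.insert(email)
    true
  end

  # Exception-safe variant: the database error becomes a validation error.
  def save
    save!
  rescue RecordNotUnique
    errors[:email] = "has already been taken"
    false
  end
end

first = User.new("a@example.com")
second = User.new("a@example.com")
first_saved = first.save
second_saved = second.save
```

No SELECT is issued before the insert; the unique index does the checking and the model only reacts when it fails.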

https://gist.github.com/ritikesh/465b69314bce378a9bedaa17d2ffa8e5

Some of the noticeable results were as follows:

Downtrend in SELECT 1 queries on the tickets table after the changes were deployed to production: from 200,000+ queries over the week to zero.

Downtrend in SELECT 1 queries on the users table after the changes were deployed to production: from 57,000+ queries over the week to zero.

For other less frequently written-to modules such as groups, SELECT 1 queries per week per table reduced by about 20,000.

This was a significant load reduction on our databases without making any changes to our existing code flows.

Bill Gates rightly said, "Measuring programming progress by lines of code is like measuring aircraft building progress by weight". The gains we were able to achieve with the above two techniques, with minimal lines (words) of code, are a testament to that.

On demand debug logging with memoize_until

Ritikesh — Fri, 26 Jun 2020 09:36:00 +0000

Caching

Caching is a commonly used software optimization technique and is employed in all forms of software development, be it web, or mobile, or even desktop. A cache stores the results of an operation for later use. For example, your web browser would use a cache to load this blog faster should you visit again in the future, enabled by the storage of static resources such as .js, .css, and images in your browser’s memory.

The most common reasons for using cache are:

TTL (time to live): cached data automatically expires after a specified time interval.
Consistency: the data is always the same when read from different processes. Multiple app servers or background processes are the norm in today’s cloud-first architectures.

This allows the cache to be fresh (frequently invalidated and refreshed because of the TTL) and consistent (because it’s a single source of truth). Though caching is a powerful tool, the cache usually runs as a separate process on another server and is accessed through network calls. Cache systems are invariably fast, but network calls add bottlenecks to the overall response time. With multiple processes making simultaneous calls over the same network, in a closed VPC setup, the cache needs to scale along with your components to keep up.

Memoization

Memoization is a specific type of caching pattern that is used as a software optimization technique. It involves remembering, or in other words, caching a complex operation’s output in-memory on the machine that’s executing the code. Memoization finds its root word in “memorandum”, which means “to be remembered.”

Memoization has an advantage in having the data cached in-memory on your machine, thereby avoiding network latencies. But you would rarely find memoization, multi-process consistency, and expiration used together.

We wanted a library that could memoize expensive operations like network calls to the database or file stores like S3. The library should support expiring the memoized values from time to time and also be consistent across our multiprocess stateless architecture. Looking up “memoize” on rubygems gave us multiple options like memoize_ttl, persistent_memoize, memoize_method, etc. They lacked either on the consistency front, or the expiration front, or both. Some added the bottleneck of writing to disk, which might increase your IOPS and latencies.

MemoizeUntil

Introducing memoize_until. A powerful yet simple memoization library that focuses on the dynamic nature and consistency of all caching systems in a multi-process environment and brings them to the memoization world.
MemoizeUntil memoizes (remembers) values until the beginning of a predetermined time metric. This can be a minute, an hour, a day or even a week.
To begin with, install the gem with gem install memoize_until, or simply add gem 'memoize_until' to your Gemfile.

The public API defines methods for each time interval that is supported by the library. In a Rails environment, MemoizeUntil checks for a YML file defined in config/memoize_until.yml and initialises the application with this set. This becomes a single place to track all your predefined memoization keys (also referred to as a “purpose”) for each interval. Runtime purposes are also supported and can be extended through the add_to API.

The store class is responsible for the core memoization logic. The MemoizeUntil class creates and maintains a factory constant of store objects, one for each interval, during initialization. These objects are initialized with a nested hash with each purpose as a unique first-level key. Upon invocation, MemoizeUntil looks up the store for the interval through the public method of the same name and calls fetch on that store object. The fetch method computes the current moment (if the interval is day, the moment is simply today’s date) and uses that moment as a subkey in the nested hash. If the moment already has a value assigned to it, that value is returned. If the key does not exist, the store clears all previously memoized data to avoid memory bloat, calls the original block passed to the method, and stores the result for the given moment. MemoizeUntil also handles nils, i.e. nils are memoized too.
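The fetch logic described above can be modelled with a small class. This is a simplified sketch for the day interval only and not the gem's actual code: DayStore, the injectable clock and the NOT_SET sentinel are assumptions made for illustration, and the sketch clears only the data of the purpose being fetched.

```ruby
require "date"

# Simplified model of a per-interval store (not the gem's actual code).
class DayStore
  NOT_SET = Object.new.freeze

  def initialize(clock: -> { Date.today })
    @clock = clock
    @data = Hash.new { |hash, purpose| hash[purpose] = {} }
  end

  def fetch(purpose)
    moment = @clock.call
    bucket = @data[purpose]
    cached = bucket.fetch(moment, NOT_SET)
    return cached unless cached.equal?(NOT_SET) # nil results count as memoized

    bucket.clear # drop older moments to avoid memory bloat
    bucket[moment] = yield
  end
end

today = Date.new(2020, 6, 26)
store = DayStore.new(clock: -> { today })
calls = 0

store.fetch(:spam_threshold) { calls += 1; 10 }
store.fetch(:spam_threshold) { calls += 1; 10 } # served from memory
today += 1                                      # the next day begins
latest = store.fetch(:spam_threshold) { calls += 1; 20 } # block runs again
```

Because every process computes the same moment from the wall clock, all of them refresh at the same boundary without talking to each other.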

Auto fetching of data at the beginning of the pre-specified time metric guarantees consistency across processes.

How we use it

We built MemoizeUntil as a general purpose library and are using it as a general optimisation pattern across our products and platform services. Most common applications of it involve memoizing configurations such as spam thresholds, API limits for services, etc. There are some specific use-cases as well, some of which we will cover in future blog posts.

One such use-case is the need for switching to debug logging on the fly. We default to info-level logging in production to save on log sizes and costs associated with serving those logs to third party and in-house services for debugging and metric purposes. Like with all software, there are issues reported by our customers that are reproducible only in their environments and accounts. To debug such issues, we often need debug logs, which are not available because of the default “info” log level set in our production app. Typically, this would require a production deployment to change the log level and then another deployment to revert back to the older level.

To avoid this friction with deployment dependencies, we wrote middleware for both our app and background workers that checks for a Redis key and changes the log level if the key is set.
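A Rack-style middleware along these lines might look like the sketch below. It is an illustration under stated assumptions rather than our production code: the class name, the flag_lookup callable (which would wrap the Redis read) and the key name are all hypothetical.

```ruby
require "logger"

# Rack-style middleware: switches to debug logging while a flag is set.
class DynamicLogLevel
  # flag_lookup is any callable returning truthy when debug logging is requested;
  # in production it would read a Redis key (the lookup itself is assumed here).
  def initialize(app, logger:, flag_lookup:)
    @app = app
    @logger = logger
    @flag_lookup = flag_lookup
  end

  def call(env)
    previous_level = @logger.level
    @logger.level = Logger::DEBUG if @flag_lookup.call
    @app.call(env)
  ensure
    # Restore the level so the change never outlives the request.
    @logger.level = previous_level
  end
end

logger = Logger.new(nil)
logger.level = Logger::INFO
flags = { "debug_logging" => true } # stand-in for Redis
seen_levels = []
app = ->(_env) { seen_levels << logger.level; [200, {}, ["ok"]] }

middleware = DynamicLogLevel.new(app, logger: logger, flag_lookup: -> { flags["debug_logging"] })
middleware.call({})
flags["debug_logging"] = false
middleware.call({})
```

Note that a logger shared between threads would switch level for all of them while the flag is set, which is acceptable for a temporary debugging window but worth keeping in mind.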

To avoid cache calls for every web request, we wanted to memoize this setting in local memory but also frequently check the cache store if it has been updated. This seemed like a perfect use case for using MemoizeUntil. The cached data required refreshing, but not instantly.
The readme covers additional use cases like how to extend MemoizeUntil for runtime keys and values — and more.

Not another cache store

MemoizeUntil is not a replacement for a cache store. It is merely an optimization technique that reduces network calls to your cache store or database through memoization while guaranteeing consistency. Since everything is stored in-memory, memory constraints on the remote servers also need to be considered, although, thanks to the cloud, this isn’t as big a concern as it once was. Also, unlike standard memoization libraries that memoize method calls for each unique set of parameters, we have taken a slightly modified approach with purposes. These purposes can be application level (predefined purposes) or tenant level (runtime purposes).

Optimizing string interpolations in Ruby

Ritikesh — Mon, 15 Jun 2020 04:52:27 +0000

[Ruby on Rails is a great web application framework for startups to see their ideas evolve to products quickly. It’s for this reason that most products at Freshworks are built using Ruby on Rails. Moving from startup to scale-up means having to constantly evolve your applications so they can scale up to keep pace with your customers’ growth. In this new Freshworks Engineering series, Rails@Scale, we will talk about some of the techniques and patterns that we employ in our products to tune their performance and help them scale up.]

Freshservice is a cloud-based IT help desk and service management solution that enables organizations to simplify IT operations. Freshservice provides ITIL-ready components that help administrators manage Assets, Incidents, Problems, Change, and Releases. The Asset Management component helps organizations exercise control over their IT assets.

Freshservice is a SaaS product powered by Ruby on Rails. It is backed by a MySQL database for persistent storage and uses Memcached and Redis extensively for caching and config storage purposes. Multi-tenancy is at the core of a SaaS product and tenant isolation is an important factor when dealing with data. To ensure that, the keys representing data in the cache stores are always suffixed with each tenant’s unique ID.

We benchmarked the idea of changing the way we build cache keys for getting and setting data on both Memcached and Redis. We noticed that for every cached object that belonged to some entity (or entities) like a tenant or a user, we were creating hash objects to build a unique key representing that entity in the cache store.

A fairly common practice when building such a string in Ruby (the cache key in our case) is to format a template with a hash of values.

Doing so creates a new hash object unnecessarily. Since the end goal is to have an interpolated string, we can skip the hash creation part entirely. With the alternate approach, we could just interpolate the required constant with the dynamic value. This value could be anything that responds to to_s.

We introduced a #key method on the module, which takes a symbol and a value as params and returns an interpolated string. The resulting key was the same with both approaches, hence the existing cached objects wouldn’t be affected by this change.
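The two styles can be compared side by side in a small sketch. The TICKET template and the exact key layout are hypothetical; only the method names key and multi_key come from this post.

```ruby
module MemcacheKeys
  TICKET_TEMPLATE = "TICKET:%{tenant_id}" # hypothetical key format

  # Earlier style: formatting a template allocates a hash on every call.
  def self.key_with_hash(tenant_id)
    TICKET_TEMPLATE % { tenant_id: tenant_id }
  end

  # Alternate style: interpolate the constant part with the dynamic value.
  def self.key(name, value)
    "#{name}:#{value}"
  end

  # Two dynamic values, still without building an array or hash.
  def self.multi_key(name, first, second)
    "#{name}:#{first}:#{second}"
  end
end

old_key = MemcacheKeys.key_with_hash(42)
new_key = MemcacheKeys.key(:TICKET, 42)
note_key = MemcacheKeys.multi_key(:NOTE, 42, 7)
```

Both styles yield the same string for the same tenant, which is what keeps previously cached entries reachable after the switch.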

We ran a benchmarking exercise to check if there was any performance impact of this activity. The benchmarking results were as follows:

Benchmarks we ran

Note: The above benchmarks were run on our current production versions of Ruby/Rails (2.3.7 / 4.2.11.1)

Currently, we’re supporting up to 2 dynamic values via 2 different methods (MemcacheKeys#key, MemcacheKeys#multi_key). Making the method accept a dynamic number of values would create an array on each invocation via the Ruby splat operation and would defeat the whole purpose.

After shipping out the key generation changes over the span of a month and a half, we have noticed significant improvements in object allocations and GC frequencies.

The above approach also reduces class constant footprints. We had previously included the module in numerous classes and modules with an include MemcacheKeys just to access a constant for building the required key. Even though constants are included via reference and not duplicated, the class still has to maintain a reference to it. You can check this with the Module#constants method. After the changes, key constants no longer need to be referenced in multiple places.

How to support utf8 characters in a utf8 mysql table

Ritikesh — Wed, 20 Nov 2019 07:08:38 +0000

Originally published on Freshworks’ official Blog on November 15, 2019

Freshworks’ IT service management tool Freshservice enables organizations to simplify IT operations. Freshservice provides ITIL-ready components that help administrators manage assets, incidents, problems, change, and releases, and the asset management component helps organizations exercise control over their IT assets.

Freshservice is powered by Ruby on Rails.

Nginx passenger-backed servers are hosted on the EC2 instances of AWS. The servers are hosted on four data centers — US East (US), Europe Central (EUC), India (IND), and Australia (AU).

UTF8 in MySQL

Our databases and the underlying tables were all created with utf8 encoding to ensure that a majority of characters are supported at the database level as well. However, MySQL’s “utf8” encoding only supports three bytes per character. The real UTF-8 encoding needs up to four bytes per character. This bug was never fixed. A workaround was released in 2010: a new character set called “utf8mb4”. For more on the MySQL and UTF8 story, you can read “In MySQL never use utf8” blog post.

Emojis in Freshservice — the Rails 3 way

Emojis are valid UTF8 characters and require 4 bytes to be stored and retrieved. However, with the above-specified limitation from MySQL’s native UTF8 encoding, we could not natively support all emojis. To process them, we were relying on an open-source gem called gemoji-parser. The gem allowed serialization of the emoji content by tokenizing emojis into an associated textual representation and de-tokenizing them later. To safely create tickets that contained emoji content, we parsed the email contents to detect emojis and tokenized every ticket’s description and notes’ content before saving. We did not de-tokenize emojis when displaying ticket information (for performance reasons and because customers didn’t mind it); therefore, the tickets contained the emojis in their textual representation. For example, “😀” was rendered as “:grinning:”.

While things worked with our set up on Rails 3, there were some known challenges:

The gem was severely outdated. A lot of emojis didn’t work well with the gem and were saved in an unreadable format (such as, ‘\xF0\x9F\x98\x81…’), causing the ticket’s layout to be broken. This required us to fix it manually by going over the ticket’s description and cleaning up unreadable text/HTML parts. Patching or updating the gem regularly with new emojis wasn’t a feasible long term solution.
There was a performance overhead to tokenize email content, ticket descriptions, and note content.
There was a possible future performance overhead if we rendered emojis as-is by de-tokenizing the saved subject and description for every ticket in the ticket list and details page views.

Emojis in Rails 4

The Rails 4 framework had internally updated the MySQL session variables and enforced Rails-MySQL to run in strict mode. For information on why Rails enforced this, see “Use strict mode in mysql”. The tables were created with a utf8 charset and Unicode collation, and MySQL raised errors (Incorrect string value: ‘\xF0\x9F\x98\x81…’ for column description at row 1) when any invalid content (in this case, an emoji) was inserted into the tables. This was a critical concern for the Freshservice team because any content containing emojis would be discarded by the system. We decided to explore the possible solutions to this problem.

There were a couple of ways to address this. The first option was to convert all the rich-text fields into “serializable fields” in Rails. Rails internally uses YAML for serializing data and the default YAML parsers were capable of serializing emojis as well. However, this had a few concerns. Changing the fields to “serialized” would require us to run a data correction script for all our existing records as they would not be parseable as YAML. Secondly, we were simply moving the serialisation-deserialisation overhead from database writes to database reads, which was much worse.

The other option was to make Rails and MySQL run completely with the utf8mb4 encoding set. However, we have hundreds of tables across shards on multiple data centers and migrating all the tables wasn’t feasible considering the scale at which Freshservice operates. Also, utf8mb4 (4 bytes) columns occupy more space than the traditional utf8 (3 bytes) columns, and database indexes have a length limit of 767 bytes. This meant that any table having indexes with varchar(255) columns would cause problems because the index that previously occupied 255*3 = 765 bytes would now occupy 255*4 = 1020 bytes. This implied that we would have to change the columns’ length in the index (not a feasible solution for all our use cases) or update the innodb large_prefix value and reindex all our data. Updating indexes, either way, would have taken far too much time and wasn’t worth the effort.
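The index arithmetic above can be checked with a few lines of Ruby. The 767-byte limit and the byte widths are the ones quoted in the paragraph; the helper name is only for illustration.

```ruby
INDEX_LIMIT_BYTES = 767

# Worst-case size of an index entry for a varchar column of the given length.
def index_bytes(chars, bytes_per_char)
  chars * bytes_per_char
end

utf8_bytes = index_bytes(255, 3)          # fits within the limit
utf8mb4_bytes = index_bytes(255, 4)       # exceeds the limit
max_utf8mb4_chars = INDEX_LIMIT_BYTES / 4 # longest fully indexable utf8mb4 varchar
```

The last value is why indexed utf8mb4 columns are often capped at varchar(191) when the prefix limit cannot be raised.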

We decided to test a more reactive approach. We evaluated running Rails in utf8mb4 mode without migrating any database columns. This meant that we could set Rails level encoding to utf8mb4 and migrate only specific columns of some of our tables to accept utf8mb4 content and extend the behavior to other modules or tables in the future.

To set Rails to run in utf8mb4, we must update the encoding key in database.yml to utf8mb4.
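As a rough sketch, the relevant part of database.yml might look like the YAML embedded below. The environment name and the adapter are assumptions, and the surrounding Ruby only parses the fragment to show which key is being changed.

```ruby
require "yaml"

# What the relevant part of config/database.yml might contain
# (environment name and adapter are assumptions).
DATABASE_YML = <<~YAML
  production:
    adapter: mysql2
    encoding: utf8mb4
YAML

config = YAML.safe_load(DATABASE_YML)
encoding = config["production"]["encoding"]
```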

Most of our apps’ emoji content comes from two sources: email and mobile. While mobile as a channel is still under utilised, the same cannot be said for email. The email channel is primarily used to create tickets and add notes to existing tickets. Emails are also used to create solution articles (as a draft) and add notes to other modules such as Change, Problem, and Release. For the first cut, we decided to support emojis for tickets and solution articles (higher chances of adding rich content). We ran a benchmarking exercise to check if there was any performance impact of this activity. The benchmarking results were as follows:

As one can see, there was very little difference in terms of performance (~ +0.5ms/ticket for a ~20KB string as the ticket’s description) between the various scenarios. Even though benchmarks are just a means of approximation, this approximation was good enough for us to continue.

The migration

We knew that the migration would take a while because it involved tickets and notes (some of our biggest tables in MySQL), and chose to do it during the holiday season.

On Christmas eve, we began altering the tables in the EUC data center first. We used LHM (Large Hadron Migrator, a tool that allows us to perform database migrations online while the system is live, without locking the tables) because the production environment had significantly large data sizes. We also ensured that all our databases had significant space available, as LHM duplicates entire tables during the migration and we were dealing with large tables. We finished the EUC, IND, and AU data centers around Christmas eve and started planning for the US data center migration, which was the most complex one. Before beginning with the US data center migration, we enabled utf8mb4 on Rails in all the previously migrated data centers and tested to ensure that there were no production-only surprises (😉) in store for us. After this, we began running migrations in the US data center on the weekend before the new year (as traffic was at its lowest) and finished the entire process around midnight of Jan 02.

Considering the load and size of the data involved, we had to continuously monitor the migrations and handle scenarios that were expected to happen only in production. For instance, LHM usually does a full scan of data from min(PKEY) — max(PKEY). Our database is based on a sharded multitenant architecture where a tenant and all its data reside in a single database shard. Each tenant has unique IDs for all entities across all shards. Some of the new shards had some stale database entries with min(PKEY) as 1 and max(PKEY) in billions. When we began noticing that some of our shards were taking longer to process, we investigated the issue and found the stale entries causing unnecessary scans for millions of records that weren’t even present in the system. We did a full clean up of such stale entries and this helped to reduce the LHM scan durations significantly.

After a 10-day effort, emoji support was successfully rolled out on all data centers for tickets (subject and description), notes (body), and solution articles (title and description).

Post deployment, there was a visible impact on garbage collection (GC) and object allocations, as a result of removing the costly tokenizations for every save or update.

GC analysis

Object allocation analysis

Key takeaways

Understanding benchmarks: treat benchmarks as a reference only. We should not rely on them too heavily, especially for large-scale migrations. Our benchmarking exercise indicated an increase in processing time, but that did not hold in production: performance improved significantly thanks to fewer GC cycles and object allocations.
Understanding production migrations: they are complex and should be done with great care, especially when dealing with customer data. There is always that one surprise waiting to show up only in production.
Solving problems the hard (read clean) way comes with its own advantages.

PS:

Spread the love; start creating tickets on Freshservice with emojis.😎

Why just cache when you can memoize(with expiration and guaranteed consistency)

Ritikesh — Sat, 09 Mar 2019 21:23:36 +0000

Memoization is a specific type of caching that is used as a software optimisation technique.

Caching is a commonly used software optimisation technique and is employed in all forms of software development, be it web or mobile or even desktop. A cache stores the results of an operation for later use. For example, your web browser will most likely use a cache to load this blog faster if you visit it again in the future.

So, when I talk about memoization, I am talking about remembering or caching a complex operation’s output in-memory. Memoization finds its root word in “memorandum”, which means “to be remembered.”

While caching is powerful, the cache usually runs as a separate process on another server, reachable only over the network. Cache systems are invariably fast, but network calls add to the overall response time. Add multiple processes making simultaneous calls over the same network (in a closed VPC setup) and the cache has to scale along with your components to keep up. Memoization has an advantage here: the data is held in the process’s own memory, which avoids the network latency.

The strongest reasons to prefer a cache are:

TTL (time to live): cached data expires automatically after a pre-specified time interval.
Consistency: the data is always the same when read from different processes, and multiple app servers or background processes are the norm in today’s cloud-first architectures.

This allows the cache to be fresh (frequently invalidated and refreshed because of the TTL) and consistent (because it is a single source of truth). However, the same is not true for memoization, and you will rarely find memoization, multi-process consistency and expiration used together.

In this blog however, you’ll see how and when to wield these simple but powerful techniques together, to optimise your own programs and make them run much faster in some cases.

Introducing memoize_until: a simple yet powerful memoization technique that brings the dynamic nature and consistency of caching systems in a multi-process environment to the memoization world.

MemoizeUntil memoizes (remembers) values until the beginning of the next predetermined time metric: this can be a minute, an hour, a day or even a week. On expiry, the store automatically purges the previous data, to avoid memory bloat, and refreshes it by requesting the origin. Since every process re-fetches the data at the beginning of the same pre-defined time metric, it is guaranteed to be consistent across processes.
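The boundary-aligned expiry described above can be sketched in a few lines. This is a minimal illustration of the idea, not the memoize_until source: createMemoizer, memoFetch and INTERVAL_MS are hypothetical names, the clock is injected so the boundary is easy to see, and the day bucket here simply follows UTC.

```javascript
// A minimal sketch of boundary-aligned memoization (not the library's code).
const INTERVAL_MS = {
  min: 60 * 1000,
  hour: 60 * 60 * 1000,
  day: 24 * 60 * 60 * 1000,
};

function createMemoizer(now = Date.now) {
  const store = new Map(); // "interval:key" -> { bucket, value }

  return function memoFetch(interval, key, compute) {
    // Every process derives the same bucket number from the wall clock,
    // so all of them refresh at the same boundary, unlike a TTL that
    // starts counting from whenever each process first populated its copy.
    const bucket = Math.floor(now() / INTERVAL_MS[interval]);
    const id = `${interval}:${key}`;
    const entry = store.get(id);
    if (entry && entry.bucket === bucket) {
      return entry.value;
    }
    const value = compute();
    store.set(id, { bucket, value }); // overwriting drops the previous bucket's value
    return value;
  };
}

// Usage with a fake clock (milliseconds).
let clock = 0;
let calls = 0;
const memoFetch = createMemoizer(() => clock);
const compute = () => {
  calls += 1;
  return `value-${calls}`;
};

clock = 10 * 1000;
const first = memoFetch('min', 'default', compute); // computed
clock = 50 * 1000;
const second = memoFetch('min', 'default', compute); // same minute, memoized
clock = 61 * 1000;
const third = memoFetch('min', 'default', compute); // next minute, recomputed
```

Keying the entry by the bucket number rather than by "time since last write" is the design choice that gives the cross-process consistency: two processes that start at different moments still agree on when the minute ends.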

To begin, install the package via npm:

npm install memoize_until

Then require the module, initialise it with your use-cases, and use it where required.

const MemoizeUntil = require('memoize_until').MemoizeUntil

MemoizeUntil.init({ 
 day: ['custom1', 'custom2']
})

MemoizeUntil.fetch('min', 'default', () => { 
 return 'SomeComplexOperation'; 
})

For a simple example, let’s say your production-ready app has a public-facing API and you want to implement a FUP (fair usage policy), and hence set appropriate rate limits. But you can almost foresee some of your customers complaining and asking for an increased API limit every now and then. This requires your API limit to be dynamic.

Traditionally, developers would save this as a configuration in the configuration database and load it once per request. Over time, such configurations have moved to cache stores like Redis, which are very fast, but the network latencies remain. To avoid a cache call for every web request, you would want to memoize the API limit locally and use it for every request, while still checking the cache store regularly in case it has been updated. This is a perfect use-case for memoize_until: the cached data needs refreshing, but not instantly. Sample usage can be found in this gist:
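Independent of that gist, the rate-limit use-case can be sketched without the library at all. Everything below is a hypothetical stand-in: remoteStore plays the role of a cache store such as Redis, and apiLimit is a tiny per-minute memo written inline for illustration, not the memoize_until API.

```javascript
// Hypothetical stand-in for a remote cache store; `reads` counts network calls.
const remoteStore = { api_limit: 100, reads: 0 };

function readLimitFromStore() {
  remoteStore.reads += 1;
  return remoteStore.api_limit;
}

// Tiny per-minute memo: re-reads the store only when the minute changes.
let memo = { minute: -1, value: null };

function apiLimit(nowMs) {
  const minute = Math.floor(nowMs / 60000);
  if (memo.minute !== minute) {
    memo = { minute, value: readLimitFromStore() };
  }
  return memo.value;
}

function isAllowed(requestsSoFar, nowMs) {
  return requestsSoFar < apiLimit(nowMs);
}

// 1,000 requests inside one minute cost a single read from the store.
const start = 5 * 60000;
for (let i = 0; i < 1000; i += 1) {
  isAllowed(i, start + i);
}
const readsInFirstMinute = remoteStore.reads;

// The limit is raised in the store; this process keeps the old value
// until the next minute boundary, then picks up the new one.
remoteStore.api_limit = 500;
const stillOldLimit = apiLimit(start + 30000);
const newLimit = apiLimit(start + 60000);
```

The trade-off is the one described above: a changed limit takes up to a minute to reach every process, in exchange for roughly one store read per process per minute instead of one per request.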

The readme covers extra documentation like how to extend memoize_until for truly dynamic behaviours — dynamic keys and values — and more.

Note: memoize_until is not a replacement for a cache store. It is merely an optimisation technique that uses memoization to reduce network calls to your cache store or database while keeping the data consistent. Since everything is stored in-memory, memory constraints on the remote servers also need to be considered, although, thanks to the cloud, this isn’t as big a concern as it once was.