DEV Community

David Tchekachev

Why We Moved API Authentication from the Gateway to Our Microservices

Moving our API authentication and authorization from gateway to microservice layer

TL;DR

There are various system designs for checking whether an API call may be processed. We recently moved that logic from our API gateway (APIGW) to our microservices.
The main reason is that our APIGW (Kong) was suffering from performance issues during auth validation, which created a bottleneck at the entry point of our infrastructure.

Since moving that logic into our microservices, our APIGW focuses only on routing requests (which it does very well), while our APIs scale independently based on the load they receive.

IVAO API Architecture Change

A bit of context

Preamble

At IVAO, we build and operate a set of 15 microservice APIs written in NestJS, running on Kubernetes, with Kong as our API Gateway.

These services expose over 200 endpoints under api.ivao.aero that require some form of authentication (API Key or JWT Bearer token). Authentication and generic authorization checks were handled by Kong through a custom plugin written in Go, which passed headers with the needed information to upstream services. The plugin was configured through Kong's KongPlugin custom resource definition (CRD).

Here are the most common access rules we have on our API endpoints:

  • Is the request coming from a user's browser or from an autonomous application/bot?
  • Is the request coming from an official IVAO website or from a 3rd-party?
  • Is the request coming from a staff user?
  • Is the request coming from a non-suspended user or application?

One important point to keep in mind about how our APIs are used: we encourage all IVAO users to build on top of them! Our goal is to expose as much as is technically and legally possible so that users can build the tools they want. We even have a website (developers.ivao.aero) where users can generate API Keys and OAuth2 credentials and start querying our APIs within minutes!

IVAO Developers Website

The "issues" we were facing

The main issue was the plugin’s ability to handle high load. During peak times (typically weekends at IVAO), Kong was processing over 500 HTTP requests per second and often crashed or dropped requests without any clear indication of why. Our main suspect was the custom plugin, based on similar GitHub issues and the unusual CPU and memory usage reported by its process. We tried scaling Kong from 7 pods up to 15 pods, but some of them were still failing.

Our second issue mostly affected our web development team: the auth layer wasn't part of the code they were working on, it wasn't written in the same language (Go instead of Node.js), and it couldn't support very specific per-endpoint rules.

Possible solutions?

Our first and easiest attempt was to tune the options passed to Kong for better performance, and to review the plugin’s code in depth to fix some memory leaks caused by database sockets left open. That didn’t help much.

Then we looked into Kong alternatives and found several options, such as Envoy and Traefik. But migrating our Kong plugin to any of them would have been painful, with no guarantee of a smooth transition, because we serve everything from a single domain, which can point to only one of those Ingress controllers.

Our third option was a simple NestJS middleware performing the same checks as the Kong plugin. This would make the auth process more maintainable and flexible, while integrating directly with the NestJS request lifecycle and our internal libraries (error messages, logging, caching, database models, etc.). We went with that solution.

Progressive migration

As mentioned in the preamble, we have around 15 separate API codebases, each dedicated to different business objectives. Copy-pasting the same code into each of them would make the solution very hard to maintain over time.

Luckily, we already had a shared internal library (common-api) that provides utilities, functions, configurations, and plugins reusable across all services.

For example, here is our common NestJS boilerplate code:
```typescript
import { Type, ValidationPipe } from '@nestjs/common';
import { HttpAdapterHost, NestFactory, Reflector } from '@nestjs/core';
import { FastifyAdapter, NestFastifyApplication } from '@nestjs/platform-fastify';
import { DocumentBuilder, SwaggerModule } from '@nestjs/swagger';
import fastifyStaticPlugin from '@fastify/static';
// Assumption: Logger comes from nestjs-pino, which pairs with `bufferLogs: true`
import { Logger } from 'nestjs-pino';
// Internal common-api helpers; this import path is illustrative
import {
  AuroraInterceptor,
  AuthGuard,
  ExceptionsFilter,
  TransformationOptions,
  enableHealthChecks,
} from './internal';

// Per-API settings, inferred from how `options` is used below
export interface BootstrapOptions {
  name: string;
  prefix: string;
  title: string;
  description: string;
  version: string;
}

export async function bootstrap(
  TopLevelModule: Type<unknown>,
  options: BootstrapOptions,
): Promise<NestFastifyApplication> {
  const app = await NestFactory.create<NestFastifyApplication>(
    TopLevelModule,
    new FastifyAdapter({ ignoreTrailingSlash: true }),
    {
      cors: {
        origin: '*', // Allow all origins for CORS
        methods: ['GET', 'POST', 'PUT', 'DELETE', 'PATCH', 'OPTIONS'], // Allow these HTTP methods
        preflightContinue: false, // Do not pass the preflight response to the next handler
        optionsSuccessStatus: 204, // Use 204 for successful OPTIONS requests
      },
      bufferLogs: true,
    },
  );
  const logger = app.get(Logger);

  app.setGlobalPrefix(options.prefix);
  app.useGlobalPipes(new ValidationPipe(TransformationOptions));
  app.useGlobalFilters(new ExceptionsFilter(app.get(HttpAdapterHost)));
  app.useGlobalInterceptors(new AuroraInterceptor());
  app.useLogger(logger);

  // UPDATE: Disabled compression due to RAM usage and CPU load it creates
  // Compression should be registered before static plugin
  // https://dev.to/ivao/why-we-switched-from-nestjs-to-rust-over-one-hidden-and-costly-setting-d9f
  // await app.register<FastifyCompressOptions>(compression, {
  //   // Disable `br` encoding as it takes ages to compress whazzup and other huge response payloads
  //   encodings: ['deflate', 'gzip', 'identity'],
  // });

  await app.register(fastifyStaticPlugin, {
    root: process.env.STATIC_FILES_ROOT_PATH || '/',
  });

  // Backwards compatibility with express for @nestjs/passport https://github.com/nestjs/passport/issues/60
  // Source: https://github.com/nestjs/nest/issues/5702#issuecomment-979893525
  app
    .getHttpAdapter()
    .getInstance()
    // Fastify's types don't declare these Express-style members, hence the `any` annotations
    .addHook('onRequest', (request: any, reply: any, done: (err?: Error) => void) => {
      reply.setHeader = function (this: any, key: string, value: string | number | string[]) {
        return this.raw.setHeader(key, value);
      };
      reply.end = function (this: any) {
        this.raw.end();
      };
      request.res = reply;
      request.raw.query = request.query || {};
      done();
    });

  let swaggerOptions = new DocumentBuilder()
    .setTitle(options.title)
    .setDescription(options.description)
    .setVersion(options.version)
    .setContact(
      'IVAO Web Services',
      'https://wiki.ivao.aero/en/home/devops/api/documentation-v2',
      'web@ivao.aero',
    );

  app.useGlobalGuards(new AuthGuard(app.get(Reflector)));

  swaggerOptions = swaggerOptions
    .addBearerAuth(
      {
        type: 'openIdConnect',
        openIdConnectUrl: 'https://api.ivao.aero/.well-known/openid-configuration',
      },
      'oauth2',
    )
    .addSecurity('consumer', {
      type: 'apiKey',
      scheme: 'https',
      in: 'header',
      name: 'apiKey',
    });

  const document = SwaggerModule.createDocument(app, swaggerOptions.build());
  SwaggerModule.setup(`docs/${options.name}`, app, document);

  await enableHealthChecks(app);

  process.on('uncaughtException', (error: Error) => {
    logger.error(error);
  });

  process.on('unhandledRejection', (reason, promise) => {
    promise.catch((err: Error) => {
      logger.error(err);
    });
  });

  process.on('rejectionHandled', (promise) => {
    promise.catch((err: Error) => {
      logger.error(err);
    });
  });

  await app.listen(3000, '0.0.0.0');
  return app;
}
```

The implementation was pretty straightforward thanks to the NestJS framework.
We added two NestJS middlewares, enabled by default in non-blocking mode: they never reject a request and only attach metadata to the request object passed along:

  • API Key middleware: checks for an API Key passed as a header or query parameter, ensures it is valid, and loads the associated application's details.
  • JWT middleware: checks for a JWT Bearer token in the Authorization header, verifies that its signature was issued by IVAO's OAuth service, and decodes its content before passing it down the NestJS request lifecycle.
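Both middlewares follow the same pattern: look for a credential, validate it, attach what was found to the request, and always call next() so that rejecting the request is left to the guards. The sketch below illustrates that pattern in plain TypeScript, mirroring the use(req, res, next) shape of a NestJS middleware without importing the framework so that it runs standalone. Several details are assumptions rather than our actual code: the apiKey query parameter name, the hypothetical LookupApplication callback standing in for our database and cache, and the use of HS256 with a shared secret, chosen only for brevity (an OpenID Connect provider would more typically sign tokens with asymmetric keys published as a JWKS).

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

export interface JwtClaims {
  sub: string;
  exp: number;
  [claim: string]: unknown;
}

// Minimal request shape; Node lowercases incoming header names.
export interface AuthRequest {
  headers: Record<string, string | undefined>;
  query: Record<string, string | undefined>;
  application?: { id: string; official: boolean };
  user?: JwtClaims;
}

// Hypothetical lookup; a real service would query its database/cache here.
export type LookupApplication = (apiKey: string) => Promise<AuthRequest['application']>;

export class ApiKeyMiddleware {
  private readonly lookup: LookupApplication;

  constructor(lookup: LookupApplication) {
    this.lookup = lookup;
  }

  // Non-blocking: attaches the application if the key is valid, never rejects.
  async use(req: AuthRequest, _res: unknown, next: () => void): Promise<void> {
    const apiKey = req.headers['apikey'] ?? req.query['apiKey'];
    if (apiKey) {
      const application = await this.lookup(apiKey);
      if (application) req.application = application;
    }
    next();
  }
}

// Decodes one base64url JWT segment as JSON, or returns undefined.
function decodeSegment<T>(segment: string): T | undefined {
  try {
    return JSON.parse(Buffer.from(segment, 'base64url').toString('utf8')) as T;
  } catch {
    return undefined;
  }
}

// Returns the claims of a valid, unexpired HS256 token, otherwise undefined.
export function verifyHs256(
  token: string,
  secret: string,
  nowSec: number = Math.floor(Date.now() / 1000),
): JwtClaims | undefined {
  const parts = token.split('.');
  if (parts.length !== 3) return undefined;
  const [header, payload, signature] = parts as [string, string, string];
  // Pin the algorithm instead of trusting whatever the token header claims.
  if (decodeSegment<{ alg?: string }>(header)?.alg !== 'HS256') return undefined;
  const expected = createHmac('sha256', secret).update(`${header}.${payload}`).digest();
  const given = Buffer.from(signature, 'base64url');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return undefined;
  const claims = decodeSegment<JwtClaims>(payload);
  if (!claims || typeof claims.exp !== 'number' || claims.exp <= nowSec) return undefined;
  return claims;
}

export class JwtMiddleware {
  private readonly secret: string;

  constructor(secret: string) {
    this.secret = secret;
  }

  // Non-blocking: attaches decoded claims for a valid Bearer token, never rejects.
  use(req: AuthRequest, _res: unknown, next: () => void): void {
    const authorization = req.headers['authorization'];
    if (authorization?.startsWith('Bearer ')) {
      const claims = verifyHs256(authorization.slice('Bearer '.length), this.secret);
      if (claims) req.user = claims;
    }
    next();
  }
}
```

Keeping both middlewares non-blocking means an endpoint without any guard stays public, and a request carrying several credentials ends up with all the matching metadata for the guards to inspect.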

Once a request is authenticated, we still need to authorize it by blocking forbidden requests. For this, we added a series of NestJS guards:

  • User Authentication: only JWT tokens issued to browser users by our SSO are allowed. Ensures the request isn't made by a bot or application.
  • Application Authentication: only API Keys or OAuth2 Client Credentials authentications are allowed. Typically used by bots, autonomous applications, and internal tools.
  • Official applications: only IVAO-issued applications are allowed. Used especially for sensitive actions and data that we don't want to expose to third parties.
  • Required OAuth2 Permission: requires the user to have a specific permission based on their staff position within IVAO.
  • Required OAuth2 Scope: ensures the user has granted consent to the application before it makes the given request, for example to perform actions on their behalf or to retrieve their personal data.
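Each of these guards only reads the metadata left by the middlewares and decides whether the request may proceed. In NestJS a guard is a class implementing CanActivate, attached to an endpoint with @UseGuards(...); the sketch below keeps the same idea in plain TypeScript so it runs standalone. The Caller shape, its official, permissions and scopes fields, and the example permission and scope names are hypothetical. The sketch also makes one design choice that the article does not prescribe: answering 401 when no credential was accepted at all, and 403 when the caller is known but not allowed.

```typescript
// Hypothetical metadata left on the request by the authentication middlewares.
export interface Caller {
  user?: { permissions: string[]; scopes: string[] }; // from a user's JWT
  application?: { official: boolean }; // API key, client credentials, or the OAuth client
}

export type Decision =
  | { allowed: true }
  | { allowed: false; status: 401 | 403; reason: string };

export type Guard = (caller: Caller) => Decision;

const allow: Decision = { allowed: true };

// Wraps a predicate: 401 for anonymous callers, 403 when the predicate fails.
function makeGuard(predicate: (caller: Caller) => boolean, reason: string): Guard {
  return (caller) => {
    if (!caller.user && !caller.application) {
      return { allowed: false, status: 401, reason: 'authentication required' };
    }
    return predicate(caller) ? allow : { allowed: false, status: 403, reason };
  };
}

// Only tokens issued to users; rejects bots and client-credentials apps.
export const userOnly = makeGuard((c) => c.user !== undefined, 'user token required');

// Only API keys or OAuth2 client credentials, with no end user involved.
export const applicationOnly = makeGuard(
  (c) => c.application !== undefined && c.user === undefined,
  'application credentials required',
);

// Only IVAO-issued applications.
export const officialOnly = makeGuard((c) => c.application?.official === true, 'official application required');

// The user must hold a given staff permission.
export const requirePermission = (permission: string): Guard =>
  makeGuard((c) => c.user?.permissions.includes(permission) ?? false, `missing permission ${permission}`);

// The user must have granted a given scope to the calling application.
export const requireScope = (scope: string): Guard =>
  makeGuard((c) => c.user?.scopes.includes(scope) ?? false, `missing scope ${scope}`);

// Evaluates an endpoint's guards in order and returns the first refusal.
export function check(caller: Caller, guards: Guard[]): Decision {
  for (const guard of guards) {
    const decision = guard(caller);
    if (!decision.allowed) return decision;
  }
  return allow;
}

// Hypothetical endpoint policy: an official app acting for a staff member who granted the scope.
export const updateEventGuards: Guard[] = [officialOnly, requirePermission('events:write'), requireScope('events')];
```

Returning the first refusal keeps error messages specific, and because each endpoint lists its own guards, the access policy lives next to the handler rather than in a separate Kubernetes manifest.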

Because these middlewares are enabled across all our APIs, we can now decorate each endpoint with the guards it needs directly in the code, instead of maintaining a separate Kubernetes configuration file.

Did it work?

Code-wise, the migration was pretty smooth: we quickly decommissioned our custom plugin and let each API perform the auth checks with the most up-to-date code.

However, we did have to increase the resources of some API pods, since this additional logic now runs on them instead of on the central Kong deployment.

Since that migration, we haven't experienced any issues with Kong, which confirms that moving the auth logic away was the right call.
