TL;DR
- We hit sporadic network errors in a high-throughput Lambda that made HTTP calls (Axios) and AWS SDK calls.
- Root cause: creating new HTTP clients/agents per invocation ballooned the number of open sockets (file descriptors).
- Fix: initialize clients and their https.Agent once at module scope with keep-alive and reuse them across warm invocations. For AWS SDK v2, also set AWS_NODEJS_CONNECTION_REUSE_ENABLED=1.
The scenario
We had a Lambda that was invoked asynchronously to process a large dataset (thousands of events). Inside the handler, we created an Axios client and AWS SDK client(s) for each invocation. Under sustained concurrency, we started seeing intermittent network failures.
Symptoms we saw
These popped up in CloudWatch logs while the Lambda was busy:
- “too many open files” errors:
Error: EMFILE: too many open files, open
NodeError: getaddrinfo ENFILE
- Connection instability:
AxiosError: socket hang up
Error: read ECONNRESET
Error: connect ECONNRESET
- Occasional timeouts and throttling-like behavior despite healthy downstream services
These were worse during bursts when many async invocations overlapped.
What’s really happening (FDs and sockets in Lambda)
- Every TCP connection (HTTP/HTTPS) consumes a file descriptor (FD).
- Lambda execution environments have a relatively low per-process FD limit (commonly around 1024).
- If you create a new HTTP client (and thus a new https.Agent) per invocation, each agent can open many sockets. Under high concurrency, you exhaust FDs, leading to the errors above (a quick way to measure this is sketched below).
- Lambda reuses the same execution environment for multiple “warm” invocations. Objects created at module scope are kept alive and reused, which is exactly what we want for clients and connection pools.
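If you want to see the pressure directly, the Lambda runtime is Linux, so procfs exposes both the currently open descriptors and the per-process limit. A small diagnostic sketch (the helper is mine, just for measurement, not part of the fix):
import fs from 'fs';
// Linux-only: every open FD of the current process appears in /proc/self/fd,
// and the per-process limit is listed in /proc/self/limits.
export const fdStats = () => ({
  openFds: fs.readdirSync('/proc/self/fd').length,
  limitLine: fs
    .readFileSync('/proc/self/limits', 'utf8')
    .split('\n')
    .find((line) => line.startsWith('Max open files')),
});
// console.log(fdStats()); // log at the start of the handler while reproducing the issue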
Why Node’s https.Agent matters
- The agent controls connection pooling and keep-alive.
- Creating a new agent per invocation increases the number of socket pools and the total sockets in use.
- Reusing a single agent keeps the number of open sockets bounded and allows connection reuse across requests, reducing FD pressure and latency.
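One way to confirm the pooling is actually working: Node's Agent exposes its pools, so you can log how many sockets are active, idle, or queued per origin. A minimal sketch (the helper name is mine):
import https from 'https';
const agent = new https.Agent({ keepAlive: true, maxSockets: 60 });
// agent.sockets, agent.freeSockets, and agent.requests are keyed by origin;
// each value is an array, so summing lengths gives a live count.
const agentStatus = () => ({
  active: Object.values(agent.sockets).reduce((n, list) => n + list.length, 0),
  idle: Object.values(agent.freeSockets).reduce((n, list) => n + list.length, 0),
  queued: Object.values(agent.requests).reduce((n, list) => n + list.length, 0),
});
// console.log(agentStatus()); // `active` stays bounded by maxSockets when the agent is shared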
The anti-pattern (what we had)
Creating new clients and agents inside the handler:
import axios from 'axios';
import https from 'https';

// Anti-pattern: runs on every invocation
export const handler = async () => {
  const ax = axios.create({
    httpsAgent: new https.Agent(), // new agent each time
  });
  const resp = await ax.get('https://api.example.com/data');
  return resp.data;
};
The AWS SDK has the same issue if you construct a new client per invocation, especially if you also create a dedicated agent for it.
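For illustration, the SDK flavor of the same anti-pattern looks roughly like this (a sketch using the v3 S3 client; the specifics are mine, not the original code):
import { S3Client, ListBucketsCommand } from '@aws-sdk/client-s3';
// Anti-pattern: a fresh client (and its own connection pool) on every invocation
export const handler = async () => {
  const s3 = new S3Client({ region: process.env.AWS_REGION });
  return s3.send(new ListBucketsCommand({}));
};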
The fix (module-level reuse with keep-alive)
Move client and agent creation to module scope so they’re created once per warm environment and then reused.
Axios
import axios from 'axios';
import https from 'https';

const httpsAgent = new https.Agent({
  keepAlive: true,
  maxSockets: 60, // tune based on expected concurrency per environment
  maxFreeSockets: 10,
  timeout: 30_000, // socket idle timeout
});

const ax = axios.create({
  headers: { 'Content-Type': 'application/json' },
  httpsAgent,
});

export const handler = async () => {
  const resp = await ax.get('https://api.example.com/data');
  return resp.data;
};
AWS SDK v3
import https from 'https';
import { NodeHttpHandler } from '@aws-sdk/node-http-handler'; // @smithy/node-http-handler in newer SDK versions
import { S3Client, ListBucketsCommand } from '@aws-sdk/client-s3';

const httpsAgent = new https.Agent({
  keepAlive: true,
  maxSockets: 60,
  maxFreeSockets: 10,
  timeout: 30_000,
});

const s3 = new S3Client({
  region: process.env.AWS_REGION,
  requestHandler: new NodeHttpHandler({
    httpsAgent,
    connectionTimeout: 3_000,
    socketTimeout: 30_000,
  }),
});

export const handler = async () => {
  const out = await s3.send(new ListBucketsCommand({}));
  return out;
};
AWS SDK v2
- Reuse clients, and enable connection reuse via env var.
import https from 'https';
import AWS from 'aws-sdk';

// Also set in Lambda env: AWS_NODEJS_CONNECTION_REUSE_ENABLED=1
AWS.config.update({
  region: process.env.AWS_REGION,
  httpOptions: { agent: new https.Agent({ keepAlive: true, maxSockets: 60 }) },
});

const s3 = new AWS.S3();

export const handler = async () => {
  const out = await s3.listBuckets().promise();
  return out;
};
Results after the change
- FD-related errors (EMFILE, ENFILE, socket hang ups) disappeared under the same workload.
- Lower p95 latency due to connection reuse.
- Fewer outbound connection spikes visible on NAT Gateway/ENI metrics (for VPC Lambdas).
- More predictable behavior during bursts.
Bonus mitigations
- Concurrency control: use SQS with a sane maxConcurrency/batchSize, reserved concurrency, or step-wise throttling to prevent bursts from scaling FD usage across many environments at once.
- Timeouts and retries: set realistic timeouts; add backoff with jitter to avoid synchronized retries (a minimal sketch follows this list).
- context.callbackWaitsForEmptyEventLoop = false: can help the handler return even if the agent keeps idle sockets open (don’t overuse it).
- Consider undici for HTTP in Node 18+; it provides efficient HTTP/1.1 keep-alive by default.
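For the retry bullet above, here is a minimal full-jitter sketch around the shared ax client from the Axios example (helper names and numbers are illustrative, not from the original code):
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
// Full jitter: wait a random delay in [0, base * 2^attempt] so overlapping
// invocations don't retry in lockstep against the same downstream service.
async function getWithRetry(url, attempts = 3, baseDelayMs = 200) {
  for (let attempt = 0; attempt < attempts; attempt += 1) {
    try {
      return await ax.get(url, { timeout: 5_000 });
    } catch (err) {
      if (attempt === attempts - 1) throw err;
      await sleep(Math.random() * baseDelayMs * 2 ** attempt);
    }
  }
}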
Quick checklist
- Initialize HTTP clients and SDK clients at module scope.
- Use a shared https.Agent with keepAlive: true; set maxSockets, maxFreeSockets, and timeouts.
- For AWS SDK v2, set AWS_NODEJS_CONNECTION_REUSE_ENABLED=1.
- Avoid creating clients/agents inside loops or inside the handler.
- Monitor and tune under realistic concurrency.
Closing thoughts
FD exhaustion is easy to miss until traffic scales. In serverless, the simplest lever is to reuse resources across warm invocations. One shared agent + one shared client per execution environment eliminates a whole class of flaky, intermittent network issues.