I wish I knew how to use MongoDB connection in AWS Lambda

#serverless #mongodb #node #aws

A few weeks ago something strange began to happen. Random Lambda functions from ALL environments started throwing errors. My face turned white.

MongoError: no connection available at nextFunction (/var/task/node_modules/mongodb-core/lib/cursor.js:540:25) at Cursor.next

No clear pattern. The same code for the mongo connection worked fine for 11 months and broke in a single day. 0.0004% of invocations generated by 270 ƛ functions in node.js on us-east-1 randomly lost their db connection.

To give you some context, the difference between Docker and Lambda here is in the way the latter handles environment state.

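You’ve been taught to establish a db connection on app startup. But a serverless function seemingly runs fresh every time. In practice Lambda keeps the container around between invocations, so a db object stored outside the handler can survive.

When the function returns, all background processes are frozen.

Later, when the function starts again, they’re resumed and the db object is waiting for reuse.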
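The freeze-and-resume behaviour is what makes connection reuse possible. Here is a minimal sketch of that pattern, not the code from my project: `connect` is a hypothetical function returning a promise of a db handle, and `getDb`, `createHandler` and the `items` collection are illustrative names.

```javascript
// Module scope survives between invocations while the container stays warm,
// so the connection promise lives here rather than inside the handler.
let cachedDb = null;

// `connect` is a hypothetical function that returns a Promise of a db handle.
function getDb(connect) {
  if (!cachedDb) {
    // Cache the promise itself so concurrent callers share one connection attempt.
    cachedDb = connect().catch((err) => {
      cachedDb = null; // do not keep a failed attempt around
      throw err;
    });
  }
  return cachedDb;
}

// Builds a Lambda-style async handler that reuses the cached connection.
function createHandler(connect) {
  return async function handler(event) {
    const db = await getDb(connect);
    return db.collection('items').findOne({ id: event.id });
  };
}
```

With the 3.x driver, `connect` could be something like `() => MongoClient.connect(uri).then((client) => client.db(name))`, where `uri` and `name` are your own settings. A cold start pays for the connection once, and warm invocations pick up the cached promise.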

This worked for a while, until it stopped. Even functions deployed months ago were infected. Meanwhile a fleet of microservices inside Docker containers was sailing along without any issues whatsoever. Looks like a debug hell, huh?

I ended up with three suspects: me the developer, MongoDB and AWS. Of course it’s not me, I write perfect code with zero bugs. Sometimes. Never.

Mongo

The most obvious. Why do you fail? Maybe you can’t handle the load? Are you web scale* at all??

Joking aside, load chart looked ok. Moderate system usage. Still that didn’t stop me from bothering my hosting provider.

Even Dave confirms Mongo load chart is flat as the Earth. Moving on.

Node

Since you cannot debug ƛ functions, the only way to get insight into the system is AWS X-Ray. Sort of Zipkin for Lambdas.

It is incredibly useful, but it traces only calls to the AWS SDK and outgoing HTTP endpoints. For mongo calls you need to write some custom code, and the result is visible only inside the trace details, not on the service map.
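The custom code can be as small as a wrapper that opens a subsegment around each mongo call. This is a rough sketch, not what I actually shipped: `traceMongoCall` is a made-up helper, and it assumes `segment` is the current X-Ray segment (for example from `AWSXRay.getSegment()` in `aws-xray-sdk-core`) exposing `addNewSubsegment` and a subsegment `close` method.

```javascript
// Wraps an async operation in an X-Ray subsegment so it shows up in trace details.
// Assumption: `segment` comes from the X-Ray SDK, e.g. AWSXRay.getSegment().
async function traceMongoCall(segment, name, operation) {
  const subsegment = segment.addNewSubsegment(name);
  try {
    const result = await operation();
    subsegment.close(); // mark the call as finished successfully
    return result;
  } catch (err) {
    subsegment.close(err); // attach the error to the subsegment, then rethrow
    throw err;
  }
}
```

A call would look like `traceMongoCall(AWSXRay.getSegment(), 'mongo.findOne', () => collection.findOne(query))`, with the subsegment name being whatever helps you read the trace.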

The stack trace at mongodb-core/lib/cursor.js:540:25 led me to some very recent code in the mongodb-core driver package. Committed on Sep 9, released on Oct 10. Add a week or two for me to update npm dependencies, and bingo: the exact time when the error started to pop up in logs.

Turns out, at the same time the commit author wrote an article Managing Connections with the MongoDB Node.js Driver. Insightful dive into nitty-gritty of reconnection mechanism 👍🏻

It gave me an idea to listen for the reconnectFailed event to print logs and fail loudly. While waiting for a random error to occur again I went to RTFM.
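A sketch of what such a listener could look like. It assumes the driver's db object is an event emitter that fires `reconnectFailed` once it gives up reconnecting, so check that against your driver version; `watchReconnectFailed` and `onFail` are hypothetical names.

```javascript
// Logs and escalates when the driver gives up on reconnecting.
// Assumption: `db` emits 'reconnectFailed' with an error argument.
function watchReconnectFailed(db, onFail) {
  db.on('reconnectFailed', (err) => {
    console.error('MongoDB reconnectFailed:', err && err.message);
    onFail(err); // e.g. drop the cached connection or crash the container
  });
  return db;
}
```

What `onFail` does is up to you. The point is only that the failure ends up in the logs instead of surfacing later as a mysterious "no connection available".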