Debugging is not just a tedious, frustrating activity; it can also be an intriguing and creative (though still a bit frustrating) hunt!
This is a summary of what happened last week, which I want to document for historical reasons (and because it might help someone else struggling with a similar situation).
The issue
A few days ago, some of the dependencies in one of our projects were updated: they were just patch or minor versions with no breaking changes; our code compiled, tests passed, and QA on the deployed application was also successful.
Still, we started experiencing a very confusing bug.
When the app was running locally (React frontend on port 3000 and Serverless backend on port 3001), some endpoints started returning an error we had never seen before, coming straight from the database ORM we are using:
```
Sequelize: QueryGenerator - Unknown structure passed to order / group: Literal { val: ' ASC' } (data: null)
```
We immediately checked the dependency updates, and indeed Sequelize had been updated; we also found an issue on their GitHub and some questions on StackOverflow mentioning a similar error.
Nevertheless, reverting to the previous working version had no effect at all. The error was still happening, and we had no clue which other dependency could have anything to do with an error so deep in Sequelize's codebase.
The most confusing part was that everything was working everywhere except in the combination of localhost frontend and localhost backend.
As I said, the deployed frontend and backend were working perfectly.
And the integration tests too.
And so were the localhost endpoints I hit with Postman or curl.
Only the requests coming from the application were failing, and not even all the time (more like 80% of the time), and if I clicked on a failing request to open it in another browser window... it worked!
Gradually, after inspecting and comparing the requests in Postman, redirecting local to deployed URLs and vice versa, and checking environment variables everywhere, the only possible candidate left was Serverless Offline, which had also been updated, from 5.x to 6. (We had overlooked that, because we had already manually fixed all the breaking changes listed in its changelog, and everything was fine when testing via Postman.)
Something else I noticed in the Network tab of the dev tools was that some requests to our localhost were cached and some were not. Then I realized that some of the minor changes in Serverless Offline could have involved caching...
I ran

```
sls offline --help
```

and noticed a CLI option:

```
--allowCache: Allows the code of lambda functions to cache if supported.
```
I quickly tried adding that flag to the command that starts the localhost server... and everything worked!!!
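For context, this is roughly what the start command looked like with the flag added (the port is from our setup; the exact command shape here is an assumption, using the --httpPort option Serverless Offline v6 introduced for the HTTP port):

```
sls offline start --httpPort 3001 --allowCache
```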
Problem solved?
I started googling about Serverless Offline, allowCache, and weird errors, and I found many seemingly unrelated issues, from the AWS SDK to Apollo Server; they were all related to this flag, which had recently been changed to allow hot module reloading.
And in fact, by adding --allowCache, the lambda code was cached and the problem was solved... BUT changes to the backend code were no longer reloaded, and in order to test our changes locally we would have to kill and restart the server every time. No way!
After reading some more, I found another interesting CLI option:

```
--useChildProcesses: Run handlers in a child process
```
Starting Serverless Offline with --useChildProcesses instead of --allowCache also fixed the problem, while still allowing hot module reloading!!!
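Concretely, swapping the flag in the same (hypothetical) start command:

```
sls offline start --httpPort 3001 --useChildProcesses
```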
There is still a caveat, though.
useChildProcesses slightly changes the behavior of the emulation: since a new process is started for every request, whatever you store outside your lambda handler is NOT reused (that means variables like DB connections, or flags and markers). It is therefore slightly slower and, most importantly, potentially different from real production scenarios (where some variables are deliberately persisted between invocations), as the sketch below shows.
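A minimal sketch of the difference (the handler and variable names are illustrative, not our actual code):

```js
// handler.js - module scope runs once per process
let invocations = 0; // a module-level marker

module.exports.handler = async () => {
  invocations += 1;
  // With --allowCache (and on warm starts in real AWS Lambda), this counter
  // keeps growing across requests; with --useChildProcesses every request
  // runs in a fresh child process, so it always logs 1.
  console.log('invocations in this process:', invocations);
  return { statusCode: 200, body: JSON.stringify({ invocations }) };
};
```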
But what did cause the problem?
My guess is that Serverless Offline reloads some modules, but not all of them, due to the caching mechanism it has in place, and this causes issues in the case of singletons or global variables.
How that translated into the specific error I don't know, because debugging the data arriving in the handler didn't help, and once the data was passed to Sequelize to interact with the database, all I could see was that some of its instances/variables were undefined. The way modules are required, cached, and hot reloaded messes with whatever has to persist between runs.
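To illustrate the guess (a toy reproduction of the suspected mechanism, not Serverless Offline's actual reload code): if a reloader evicts only some entries from Node's require cache, two copies of a "singleton" module end up coexisting with diverging state.

```js
// counter.js - a module-level "singleton"
let count = 0;
module.exports = {
  increment: () => ++count,
  get: () => count,
};
```

```js
// demo.js - simulate a partial hot reload (run with: node demo.js)
const counterA = require('./counter');
counterA.increment(); // state lives in the first copy: count === 1

// evict only counter.js from the require cache, as a partial reload might
delete require.cache[require.resolve('./counter')];

const counterB = require('./counter'); // a brand-new copy with count === 0
console.log(counterA.get()); // 1
console.log(counterB.get()); // 0 - modules still holding counterA now
                             // disagree with modules that require it fresh
```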
A solution with a compromise
So, after being misled for a while into debugging our code and our data, reverting changes related to Sequelize, and investigating its docs and codebase, we finally found out that a simple CLI option in Serverless Offline would solve the issue...
Still, it does not solve it entirely, so if you happen to have a similar problem, keep in mind that:
- allowCache solves the error with the singletons/global variables, but hot reloading stops working. On the other hand, the behavior of your endpoints will be much closer to the real-world scenario: in AWS Lambda containers, whatever we store/instantiate outside the handler is persisted (for a while, until the container itself is shut down). This behavior is handy but also tricky; quoting the docs:
Take advantage of execution environment reuse to improve the performance of your function. Initialize SDK clients and database connections outside of the function handler, and cache static assets locally in the /tmp directory. Subsequent invocations processed by the same instance of your function can reuse these resources. This saves cost by reducing function run time.
To avoid potential data leaks across invocations, don't use the execution environment to store user data, events, or other information with security implications. If your function relies on a mutable state that can't be stored in memory within the handler, consider creating a separate function or separate versions of a function for each user.
Check here for more information about it (and see the sketch after this list for what initializing outside the handler looks like).
- useChildProcesses fixes the error and keeps module reloading working, but the behavior of the emulation is not as close to reality, because every invocation starts a new child process, essentially a brand-new container, similar to what happens when you have multiple concurrent requests hitting your Lambda.
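Here is the sketch of that initialize-outside-the-handler pattern (illustrative names; not our actual handler):

```js
// handler.js
const { Sequelize } = require('sequelize');

// Created outside the handler, as the AWS docs recommend: reused on warm
// starts (and locally with --allowCache), but re-created on every request
// when running with --useChildProcesses.
const sequelize = new Sequelize(process.env.DATABASE_URL);

module.exports.listUsers = async () => {
  const [rows] = await sequelize.query('SELECT id, name FROM users');
  return { statusCode: 200, body: JSON.stringify(rows) };
};
```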
In our package.json script that starts local development and testing (npm run start:offline), we decided to keep --useChildProcesses, because hot reloading is more important: it eases the development process and shortens the coding-testing feedback loop.
But we will keep it in mind, document it, and likely create a separate script that starts offline with --allowCache, in case we need to test the real behavior (or when we are coding/testing the frontend and don't need hot module reloading for the backend code). Something like the sketch below.
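A sketch of how the two scripts could look in package.json (the script names, the start command shape, and the port are assumptions based on our setup; only the two flags are the actual point):

```json
{
  "scripts": {
    "start:offline": "sls offline start --httpPort 3001 --useChildProcesses",
    "start:offline:cache": "sls offline start --httpPort 3001 --allowCache"
  }
}
```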
I hope it helps!
Photo by Agence Olloweb on Unsplash