Unpacking my head

#php #database #devops

In our Cordova application, the error reporting system complains about an HTTP Error 0 pretty often. In my experience, HTTP 0 errors are usually related to some sort of CORS problem. Or perhaps the system couldn't connect. That is, maybe the application tried to issue a request while the network was disconnected and, once it came back up, reported the error it had cached.

But then I came across this answer on SE: https://stackoverflow.com/a/26451773/77209

That answer goes into much more depth on why CORS-related errors technically show up as status = 0. Essentially, the failure happens before the script ever gets a real HTTP response to read a status code from.

Recently I was still seeing these HTTP 0 errors, so I decided to check the logs while some other stuff was running. That's when I noticed something.

Side note: I check New Relic APM regularly. It's open in a tab, I look through it, and I have alerts set up in case something wild happens. Nothing was ever showing up in this tab. At least nothing too wild. A couple of DB issues and a handful of errors with a remote web service.

But back to the logs: I matched the timestamp of the request in my reporting system against the server log... and saw that the OPTIONS request for the resource had responded with a 502 status code.
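
Here's a rough sketch of how that kind of timestamp matching could be scripted instead of eyeballed. It assumes the access log is in nginx's default "combined" format, which I haven't confirmed for this server, and the function name, the sample lines, and the report time are all made up for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumes nginx's default "combined" access log format;
# a custom log_format would need a different pattern.
ACCESS_RE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def server_errors_near(lines, reported_at, window_seconds=5):
    """Return (time, method, path, status) for 5xx responses logged close to reported_at."""
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in lines:
        match = ACCESS_RE.match(line)
        if match is None:
            continue  # line is not in the assumed format
        logged_at = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        status = int(match["status"])
        if status >= 500 and abs(logged_at - reported_at) <= window:
            hits.append((logged_at, match["method"], match["path"], status))
    return hits


# Made-up sample lines and client report time, for illustration only.
sample_lines = [
    '203.0.113.7 - - [01/Jan/2020:12:00:00 +0000] "OPTIONS /api/example HTTP/1.1" 502 157 "-" "-"',
    '203.0.113.7 - - [01/Jan/2020:12:00:30 +0000] "GET /api/example HTTP/1.1" 200 512 "-" "-"',
]
reported_at = datetime(2020, 1, 1, 12, 0, 1, tzinfo=timezone.utc)

hits = server_errors_near(sample_lines, reported_at)
for hit in hits:
    print(hit)
```

The report time has to be timezone-aware, since the log timestamps carry an offset. The window is a guess too: client clocks drift, so it may need to be wider in practice.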

Hmmm. Why? Open up the error logs...

... [crit] 30081#30081: *4243516 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: #.#.#.#, server: redacted.server.name, request: "POST /api/v2/request-endpoint HTTP/1.1", upstream: "fastcgi://unix:/var/run/php/php7.3-fpm.sock:", host: "redacted.server.name", referrer: "http://url/"

Whoa. So the webserver lost connection to FPM. Is FPM crashing?

[24-Jan-2020 13:48:57] WARNING: [pool www] child 13296 exited on signal 15 (SIGTERM) after 259.005807 seconds from start
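
One line doesn't say much on its own, so a quick way to see whether this is a pattern is to count these exits per pool and signal. The sketch below is only that: the pattern is written against the one log line above, other php-fpm versions may word things differently, and the function name is my own.

```python
import re
from collections import Counter

# Pattern written against the single php-fpm log line shown above;
# treat it as an assumption about the format, not a full parser.
EXIT_RE = re.compile(
    r'\[(?P<time>[^\]]+)\] WARNING: \[pool (?P<pool>[^\]]+)\] child (?P<pid>\d+) '
    r'exited on signal (?P<signum>\d+) \((?P<signame>[^)]+)\) '
    r'after (?P<lifetime>[\d.]+) seconds from start'
)


def count_signal_exits(lines):
    """Count child exits caused by a signal, keyed by (pool, signal name)."""
    counts = Counter()
    for line in lines:
        match = EXIT_RE.search(line)
        if match:
            counts[(match["pool"], match["signame"])] += 1
    return counts


log_lines = [
    "[24-Jan-2020 13:48:57] WARNING: [pool www] child 13296 exited on signal 15 "
    "(SIGTERM) after 259.005807 seconds from start",
]
exit_counts = count_signal_exits(log_lines)
print(exit_counts)
```

In real use the lines would come from the php-fpm log file rather than a list, and a steady stream of signal exits for one pool would be the thing to look for.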

Well, I'm not entirely sure why at this point, but given that the application infrastructure has changed recently, I decided to go back over the config files.

php-fpm's pool limits got bumped, since some stuff had been taken off this server. It also appears that I had been hitting those limits, so given the size of the box, the bump was pretty sizeable.

So after deploying that change, I started getting errors again.

MySQL connection limit reached. Whoops. More php-fpm processes means more potential database connections, so MySQL's connection limit needs to go up too.
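
The arithmetic behind that is simple enough to check before deploying. Here's a minimal sketch, assuming each php-fpm worker holds at most one MySQL connection at a time (persistent connections or multiple databases would change that). The function name and all the numbers are hypothetical, not my actual settings; the inputs correspond to `pm.max_children` per pool on the php-fpm side and `max_connections` on the MySQL side.

```python
def mysql_connection_headroom(pool_max_children, mysql_max_connections,
                              connections_per_worker=1, reserved_for_other_clients=0):
    """Return how many MySQL connections are left if every php-fpm worker is busy.

    A negative result means the pools can exhaust the connection limit.
    Assumes each worker opens at most connections_per_worker connections.
    """
    worst_case = sum(pool_max_children.values()) * connections_per_worker
    return mysql_max_connections - worst_case - reserved_for_other_clients


# Hypothetical values: one pool with pm.max_children = 200,
# MySQL max_connections = 150, and 10 connections kept for cron jobs and admin use.
pools = {"www": 200}
headroom = mysql_connection_headroom(
    pools, mysql_max_connections=150, reserved_for_other_clients=10
)

if headroom < 0:
    print(f"php-fpm can exceed the MySQL limit by {-headroom} connections")
else:
    print(f"{headroom} connections to spare")
```

Running a check like this whenever either config changes would have flagged the mismatch before the errors did.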

After deploying that change too, I really got to thinking about how easy it is to miss this stuff.

Even with something like New Relic and error reporting, I was getting these weird errors from the client which told me nothing.

Monitoring stuff is hard, especially if you're not even sure what you're looking for.

Anyway, things have died down considerably. The number of random status = 0 errors is practically 0 now.

I'm not sure if New Relic's infrastructure system would have caught this, but hey, they give you a 30-day free trial, so why not.
