Ahsan Nabi Dar

Is it DNS? Yes it is

This isn't another post about the Facebook DNS outage. It is about my own service, how DNS added significant latency to the request cycle, and how I went about figuring out whether it really was DNS.

I run a poor man's personal ad-blocking DoH (DNS over HTTPS) service, and it has been working well for my day-to-day use. To share it with people outside my location it needed better routing, since it is hosted in a location that is optimal for me, and a poor man can't afford a global network. Hence I decided to front my service with a CDN. Isn't this post about DNS? Yes, but nothing works without DNS, not even a CDN. CDNs don't only provide optimised routing; they also help with the SSL handshake and have a list of other useful features.

From here on you might feel like this post is slamming Cloudflare, but rest assured I haven't used a better Enterprise product than theirs. I also don't know of any other CDN that currently supports WebSocket, gRPC and HTTP/3, to name a few, at the CDN level. It is an amazing product to use at scale. That said, I was not impressed with their personal plans, and the reason was DNS. Here is the story. (At my workplace we don't use their DNS; we have a custom hostname setup with them.)

I have experience working with Cloudflare at my employer, and I naively expected the quality of the Enterprise plan on their free plan. Yes, Cloudflare has a free plan with seemingly unlimited traffic, without Argo (I will get back to what Argo is). The only catch is that you need to use their DNS nameservers, and that is how DNS comes into play: custom hostnames are only available on the Business and Enterprise plans. To use the free plan and route traffic via Cloudflare, I bought a new domain, pointed it at their nameservers and literally surrendered my domain to them. On the free plan you get 2 IPs, which are pretty bad when it comes to routing traffic; I lost connectivity multiple times, as it seems to be a bandwidth-sharing plan.

So I upgraded to their Pro plan, which costs $20 and comes with some goodness such as an increased page rule limit, enhanced access to analytics and so on, along with some connection stability. At this point connectivity improved, but latency was still higher than what I was getting on a direct connection to my DoH. My first guess was that it must be because I wasn't using Argo, their optimised network routing solution. So I paid another $5 and bought that as well; it gives 1GB of data on activation and is charged at $0.10/GB (damn, that is expensive!). Suddenly a new IP appeared in my domain's resolution. Hurray! Surely this should solve my latency woes? That turned out to be a premature assumption. I saw no change in latency; it was still almost 3x that of the direct connection. So I opened a support ticket and started digging.
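The Argo pricing above is easier to judge with a quick calculation. This is a minimal sketch, assuming the $20 Pro plan and the $5 Argo fee are both monthly charges and that the 1GB allowance applies in every billing period (the post does not say either way). `monthly_cost_usd` and its parameter names are hypothetical.

```python
def monthly_cost_usd(traffic_gb, pro_plan=20.0, argo_base=5.0,
                     included_gb=1.0, per_gb=0.10):
    """Estimate the bill for Pro + Argo at a given traffic volume.

    ASSUMPTION: both fees are monthly and the included 1GB resets each
    period. Only the dollar figures themselves come from the post.
    """
    # Traffic beyond the included allowance is billed per GB.
    billable_gb = max(0.0, traffic_gb - included_gb)
    return pro_plan + argo_base + billable_gb * per_gb
```

Under these assumptions, 101GB of traffic in a period would come to $35: $25 in fixed fees plus 100 billable GB at $0.10 each.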

To give you an idea of the difference: I am using the Nebulo app on Android as my DoH client. These are the latencies right after connecting, and they are as good as it gets; the numbers only go up from here.

Direct connection - 394ms

[Image: Nebulo, direct connection]

via Cloudflare - 1074ms

[Image: Nebulo, via Cloudflare]

First I ran a traceroute to my server directly, and it maxed out at 30 hops. Not very impressive.
[Image: traceroute, direct connection]

Then I ran a traceroute via Cloudflare, and it was impressive: a far shorter route.
[Image: traceroute via Cloudflare]

So how come the latency is so high? That is when I decided to run httpstat, a nice wrapper on top of curl that outputs lesser-known timing variables of immense value for debugging.
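To illustrate the kind of breakdown httpstat gives, here is a minimal Python sketch that times the DNS lookup, TCP connect and TLS handshake separately using only the standard library. It is not how httpstat works internally (httpstat reads curl's own timing variables), and the names `time_phases` and `slowest_phase` are made up for this example.

```python
import socket
import ssl
import time


def time_phases(host, port=443, timeout=10.0):
    """Return per-phase timings in milliseconds for one HTTPS connection."""
    timings = {}

    # Phase 1: DNS lookup. This goes through the OS resolver, so a cached
    # answer will make it look much faster than a fresh lookup.
    start = time.perf_counter()
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    timings["dns_lookup_ms"] = (time.perf_counter() - start) * 1000

    family, socktype, proto, _canonname, sockaddr = infos[0]
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        # Phase 2: TCP connection to the first resolved address.
        start = time.perf_counter()
        sock.connect(sockaddr)
        timings["tcp_connect_ms"] = (time.perf_counter() - start) * 1000

        # Phase 3: TLS handshake, performed inside wrap_socket().
        context = ssl.create_default_context()
        start = time.perf_counter()
        with context.wrap_socket(sock, server_hostname=host):
            timings["tls_handshake_ms"] = (time.perf_counter() - start) * 1000
    finally:
        sock.close()
    return timings


def slowest_phase(timings):
    """Return the name of the phase that took the longest."""
    return max(timings, key=timings.get)


# Example with a hypothetical hostname (needs network access):
#   print(time_phases("doh.example.com"))
```

Splitting the request this way is what makes it possible to tell a slow route apart from a slow lookup: a short traceroute says nothing about how long name resolution takes.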

httpstat for the direct connection matched the latency I was seeing on average in the Nebulo app. DNS lookup at 7ms; can't complain about that.

[Image: httpstat, direct connection]

Next up was httpstat via Cloudflare. DNS lookup at 1599ms. How come everything is so bad?

[Image: httpstat via Cloudflare]

So I ran the request a few more times, and when the DNS lookup was cached, the SSL handshake was slow.

[Image: httpstat via Cloudflare, DNS lookup cached]

To eliminate variables that were not consistently bad, I ran fresh requests where nothing was cached, and the only thing that came out consistently bad was the DNS lookup.

[Image: httpstat via Cloudflare, fresh request]
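The "consistently bad" check described above can be sketched in a few lines: collect per-phase timings from several fresh runs and keep only the phases whose median stays above a threshold. This is an illustration under assumptions, not the procedure actually used in the post; the function name, the threshold and the sample numbers are all hypothetical.

```python
import statistics


def consistently_slow(runs, threshold_ms):
    """Return {phase: median_ms} for phases whose median exceeds threshold_ms.

    `runs` is a list of dicts mapping phase name to milliseconds, one dict
    per fresh (uncached) request. Using the median means a single slow
    handshake does not get flagged, while a phase that is slow on every
    run does.
    """
    phases = runs[0].keys()
    medians = {p: statistics.median(r[p] for r in runs) for p in phases}
    return {p: m for p, m in medians.items() if m > threshold_ms}


# Illustrative numbers only: DNS is slow every time, TLS only once.
example_runs = [
    {"dns": 1500, "tls": 900},
    {"dns": 1600, "tls": 150},
    {"dns": 1400, "tls": 200},
]
flagged = consistently_slow(example_runs, threshold_ms=500)
```

With these made-up samples only `dns` is flagged, because the one slow TLS handshake does not move the median.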

Is it DNS? It sure looks like it. The route is optimised; the handshake is flaky, possibly because I am using the universal certificate and not a dedicated one. A dedicated certificate costs another $5, and I didn't want to pay any more: Cloudflare has a no-refund policy, so it felt like throwing money at the problem and hoping it would improve, and what started as free was suddenly getting expensive.

When the direct connection was bad,
[Image: httpstat, direct connection on a bad run]

via Cloudflare became worse, with DNS being the culprit.
[Image: httpstat via Cloudflare on a bad run]

According to Cloudflare, Argo is improving connectivity by over 53%, and I don't doubt that.

[Image: Cloudflare Argo statistics]

Even after all this I still didn't want to pin this on Cloudflare, so I decided to try another CDN: one that offers at least a free trial, doesn't require using its DNS but works with a custom hostname so the DNS stays with me, and is cheap enough not to break the bank in case I end up using it. After checking out different CDNs on CDNPerf, I settled on Bunny CDN. It was the cheapest, includes the majority of its features as part of usage, and has a 14-day trial. The setup was simple and I had it running in no time. Yes, it lacks some advanced features compared to Cloudflare, and every time you set up a new pull zone, a burst of malicious requests suddenly hits your zone looking for exploits; you need to set up edge rules to block them or blacklist those IP ranges.

[Image: malicious traffic hitting the Bunny CDN pull zone]

I do like their real-time logging dashboard, though. Once I got past all this, it was time to test the latency.

First up was Nebulo, and it just beat the direct connection. My first hunch was that they must have a more optimised route than Cloudflare, as "Network Prioritization" is something only Enterprise customers get at Cloudflare, and I definitely didn't have it even with Argo.

[Image: Nebulo, via Bunny CDN]

So I did a traceroute and, surprisingly, it didn't have an optimised route. It looks as bad as the direct connection, but it must have better connectivity between hops.
[Image: traceroute via Bunny CDN]

Now it was time to check httpstat and nail it down: the DNS lookup via Bunny CDN was impressive. It sure pointed towards poor DNS from Cloudflare.
[Image: httpstat via Bunny CDN]

It might be that Nebulo isn't caching the DNS lookup and that this causes the latency, but I tried Intra as well and the results were the same. Cloudflare came last, the direct connection was in the middle, and Bunny took the lead.
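To show why client-side caching matters here, this is a minimal sketch of a resolver cache: the first lookup for a hostname pays the full DNS cost, and later lookups within the TTL are served from memory. It says nothing about how Nebulo or Intra actually behave. `CachingResolver` is a hypothetical class, and the fixed TTL is a simplification, since real resolvers honour the TTL carried by each DNS record.

```python
import socket
import time


class CachingResolver:
    """Cache lookup results per (host, port) for a fixed number of seconds."""

    def __init__(self, ttl_seconds=60.0, lookup=socket.getaddrinfo,
                 clock=time.monotonic):
        self._ttl = ttl_seconds
        self._lookup = lookup    # injectable so it can be swapped in tests
        self._clock = clock
        self._cache = {}         # (host, port) -> (expiry_time, result)

    def resolve(self, host, port=443):
        key = (host, port)
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # cache hit: no DNS round trip
        # Cache miss or expired entry: pay for a real lookup and store it.
        result = self._lookup(host, port)
        self._cache[key] = (now + self._ttl, result)
        return result
```

A client that resolves the CDN hostname on every connection would pay the slow lookup each time, which would match the numbers above; a client with a cache like this would only pay it once per TTL.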

The expectation held: CDNs do optimise your delivery, and bad DNS can ruin it. I still love Cloudflare; I am just sad that their cheaper plans don't live up to the Enterprise option. Their forum is full of such reviews and complaints, and I wish I had gone through them before buying into their non-refundable paid options. If I ever get traffic at a scale where I need an Enterprise option on any CDN, I would definitely go for them. Along the way I found Bunny, a nice CDN that is cheap enough for personal projects.

Is it DNS? Yes it is