re: How do you manage deployment configs? (Especially large scale cloud agnostic ones)

 

"...terraform was suppose to be a single solution for everything,..." Nothing is a 'fix all the things' solution. Anyone who tries to sell you on that concept (for anything in life) is either lying or ignorant. Terraform does provisioning, Ansible does configuration management, and Kubernetes (K8s) does container orchestration and management. Each tool has a job it is good at.

"Have university degree, write YAML for a living" - DJE, 2019

"...Is it normal to always throw in the towel at the end and code up a custom configuration management script to handle all this chaos?" - No two systems are alike; every system requires some level of customization. That being said, if your application follows established patterns, the vast majority of the processes can be implemented with little customization.

"...It feels like I am reinventing the wheel on these things..." You probably are. :D Feelings like this are my cue to start searching. 99.9% of everything you want to do has been done before, organized, and turned into a design pattern. You just have to find the right pattern and apply it.

"...Sidetrack: a large part of me just feels like redoing terraform in nodejs out of frustration, to support my use cases..." You might want to look at pulumi.com/.

"...does not exist, or does poorly performance wise..." Ah, the real meat of the situation. The need for multi-provider / multi-region is performance. Does your organization currently have acceptable performance metrics codified? SLA, SLO, ROI allowances? I #feel# like unless you MUST have 100% real-time, low-latency communication (tele-conference surgery, air traffic control, et al.), a second or two of latency for a 25% reduction in complexity might be worth looking at.

My mantra when developing anything:
Make it work, make it right, make it fast. In that order.

I'd love to sit down and discuss your situation in detail and provide some outside feedback. Sometimes it is hard to see the forest when you are stuck in the weeds.

 

"Have university degree, write YAML for a living" - DJE, 2019

Laughing out loud at this. Yes, that's what I feel at times now.

"...does not exist, or does poorly performance wise..."

It's not so much about performance on the user side, as the line may imply. That was a huge miswording on my part; it's more akin to not fitting the requirements.

I might have left out context on this one. After a night's sleep and tackling the problem again with a fresher mind, it might be better to phrase it this way.

In general, we have 3 major layers in our infrastructure (at least in the context of this discussion):

  • Proxy layer
  • Testing Browsers
  • Everything Else

The layer that causes the most pain, configuration-wise, is the proxy layer. On our "pro plus" testing plan, we allow our users to run UI browser test scripts in a country of their choice, so that they can test IP-based geo restrictions/behavior of their servers (we call it our "region selection" feature).

As for the size of these servers: since they are just custom-configured secure proxies, they are typically the equivalent of AWS micro to small instances (depending on the workload for a region).

But that's where configuration hell starts. For example, Alibaba Cloud (alicloud) is effectively the only major provider for Indonesia, and is not supported in the current version of Terraform.

GCP (our main cloud provider) is out of the picture, amusingly in part because their network is too optimized. No matter where your servers physically are, the receiving server thinks the traffic is either from the USA or from the same data center it is in, throwing geo detection out of control.

However, as we slowly scale up the number of regions / countries we support on this layer, from 12 to N, it multiplies the configurations needed for the lower layer.

Moving down to the testing browser layer, this generally runs in 1 of our 2 main GCP clusters. Due to the limitations of Selenium servers, this ends up as Kubernetes YAML that deploys a group of containers per proxy above. We used to update this configuration by hand, until misconfiguration became an increasingly common mistake caught in testing (we test ourselves!).

So now we are transitioning to generating the configuration based on output from the "proxy layer", given by either Terraform or the cloud provider API (eliminating any possible typo in IP addresses).
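
A minimal sketch of what that generation step could look like, assuming the proxy IPs are exposed as a Terraform output and read via `terraform output -json`. The output name `proxy_ips`, the `PROXY_HOST` environment variable, the container image, and the manifest layout are all hypothetical, not our actual setup. It emits JSON rather than YAML to stay within the standard library; kubectl should accept either.

```python
import json


def proxy_ips_from_terraform(output_json: str, output_name: str = "proxy_ips") -> dict[str, str]:
    """Extract a {region: ip} map from the text of `terraform output -json`.

    `proxy_ips` is a hypothetical output name used for illustration.
    """
    outputs = json.loads(output_json)
    # Each Terraform output is wrapped as {"sensitive": ..., "type": ..., "value": ...}.
    return dict(outputs[output_name]["value"])


def browser_deployment(region: str, proxy_ip: str, replicas: int = 2) -> dict:
    """Build one Kubernetes Deployment manifest for the browsers behind one proxy."""
    region = region.lower()  # Kubernetes object names must be lowercase
    labels = {"app": "testing-browser", "region": region}
    container = {
        "name": "browser",
        "image": "example.invalid/selenium-browser:latest",  # placeholder image
        "env": [{"name": "PROXY_HOST", "value": proxy_ip}],  # hypothetical variable
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"browsers-{region}", "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


def render_manifests(output_json: str) -> str:
    """Render one Deployment per proxy region, wrapped in a single List object."""
    ips = proxy_ips_from_terraform(output_json)
    items = [browser_deployment(region, ip) for region, ip in sorted(ips.items())]
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2)


if __name__ == "__main__":
    # Sample input shaped like `terraform output -json`, using documentation IPs.
    sample = json.dumps({
        "proxy_ips": {
            "sensitive": False,
            "type": ["map", "string"],
            "value": {"id": "203.0.113.10", "sg": "203.0.113.20"},
        }
    })
    print(render_manifests(sample))
```

The point of the design is that IP addresses only ever flow from the provisioning output into the manifests, so nobody retypes them by hand.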


"...It feels like I am reinventing the wheel on these things..." You probably are. :D

Doing a shout-out here on dev.to, because it seems like everywhere I looked it's either Ansible or Terraform (or DIY).

You might want to look at pulumi.com/.

Definitely will look into this (thank you!)


I'd love to sit down and discuss your situation in detail and provide some outside feedback. Sometimes it is hard to see the forest when you are stuck in the weeds.

Feel free to DM me directly on Twitter - twitter.com/picocreator or on dev.to

 

"...Throwing geo detection out of control..." Can you use the location of the requesting browser rather than GeoIP?

"...common mistake caught in testing (we test ourselves!)..." That is awesome to hear! This practice is often called 'dogfooding'. It is where you 'eat' the thing you 'provide'.

dev.to DM coming your way. :) I look forward to talking with you soon.

"...Throwing geo-detection out of control..." Can you use the location of the requesting browser rather than GeoIP?

Unfortunately, that is subject to the "testing website" implementation >=(

It seems that the majority of websites we are helping test use "IP-based" detection as opposed to GPS (probably because the browser will prompt for permission).

It also really gave me lots of insight into how heavily optimized GCP networking is at the lowest level possible, including even BGP, when I deep-dived into why this is happening (but that's a huge sidetrack).
