How to waste half a million dollars

Nočnica Fee ・4 min read

Every startup I’ve worked for had the same mission statement. It might not have been the mission statement etched into a plaque on the wall, but it existed just the same. Here it was:

Make a product so good that people who don’t work for you are excited to tell their friends. Charge people for the product. Make money.

The ‘how’ is missing from this statement, but it’s not the point of this article. I want to talk about how teams get distracted. How some of your best, brightest, most effective people can spend all day working on things that won’t help this mission at all.

How we go from ‘make money’ to ‘run servers’

No one starts a company with a goal of getting really good at running servers. They start with a product or service that they want to deliver to customers.

Hey, for this whole article I’m pretending that platform/hosting startups don’t exist. Please let me live. Also, this example is centered on a startup but applies equally to orgs of all sizes.

Let’s create a company called GoCo. They have a fantastic product that will improve the lives of other software companies and want to take it to the world. The first version of the product will be hosted on something like Heroku. Heroku handles all the updating, patching, and general concerns of making servers serve. Once the product is built GoCo delivers demos, gets some first users, and generates interest at meetups and conferences.

And for a time it is good.

Then as the product grows, and the user base grows, the CFO of GoCo says:

“We’re spending hundreds of dollars on Heroku. We need to set up our own servers to reduce that bill.”

One of the most capable engineers at GoCo, let’s call her Grace, spends a couple of weeks getting the application running on low-cost virtual machines. This is cheaper than a platform and, sure enough, the bill goes down.

Money saved, right?

Even right here our theoretical startup has made an error. They saw a reduction of a bill and said they saved money. What everyone forgot was that they paid Grace the engineer for several weeks of work to make this changeover happen. Did the cash saved even equal what they paid, in cash, for Grace to do the work? Probably not, but it gets worse because in that same time Grace could have built something for the product.
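
To put rough numbers on that question, here is a minimal sketch of the break-even arithmetic. Every figure in it is a made-up assumption for illustration (the article only says "hundreds of dollars" and "several weeks"), and `breakeven_months` is a hypothetical helper, so plug in your own numbers.

```python
def breakeven_months(migration_weeks, weekly_engineer_cost, monthly_savings):
    """Months of reduced bills needed to pay back the engineer time spent migrating."""
    if monthly_savings <= 0:
        raise ValueError("monthly_savings must be positive")
    migration_cost = migration_weeks * weekly_engineer_cost
    return migration_cost / monthly_savings


# Hypothetical figures, not from the article: 3 weeks of work at a fully
# loaded cost of $3,000 per week, saving $300 per month on hosting.
months = breakeven_months(migration_weeks=3, weekly_engineer_cost=3000, monthly_savings=300)
print(f"Break-even after {months:.0f} months")  # prints "Break-even after 30 months"
```

Note that this only counts cash. It leaves out the features Grace did not build in those weeks, which is the bigger cost.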

What could your best engineers build if they weren’t trying to save you a few hundred bucks on servers?

By tasking Grace with saving on operational costs, GoCo deprived itself of the new features and optimizations she could have delivered in that time. Missing features don’t show up on a balance sheet, but if we don’t deliver cool features and a better product, there is no way the startup is going to make money.

The #1 best way for a company to save money is to fire everyone and go out of business. And that’s not in anyone’s mission statement.

It gets worse: eventually servers become a full-time job

Sooner or later GoCo’s servers go down. That discount server farm they use only guarantees that it won’t catch on fire or be physically penetrated, but patching, updating, and configuring servers is GoCo’s job. And Grace, who is just as smart and capable as can be, isn’t even a full-time operations engineer and is probably evaluated on how many features she ships and not how well she patches servers.

GoCo’s executives sit down for a ‘post-mortem’: the servers went down during a critical conference and everyone is upset. The solution at the time seems clear: GoCo needs an operations specialist or maybe a whole team.

There may be some discussion about what the salary for this full-time Ops specialist will cost, but there probably won’t be a discussion of how bringing on a new engineer can cost $50,000 in engineer time, bonuses, and start-up time. And there definitely won’t be a discussion of how suddenly one of our highest-paid employees is someone whose job has absolutely nothing to do with the mission.

I’ve seen executives complain about the cost of customer support, about the cost of team-building events, and of course about the cost of software services. I’ve never seen one complain that some portion of the engineering team no longer delivers great features but instead delivers great servers.

No one ever says ‘what if we got rid of all these servers?’

A year later and GoCo is thriving. Customers are excited, and marketing is talking about a GoCoCon in 2021. The new operations team has great reviews, and in all-staff meetings they celebrate delivering “Five Nines” or 99.999% uptime.

And again, and I’m sorry if I’m hammering the same point, no one ever sits down and says, ‘What would Heroku or another Platform-as-a-Service cost us compared to what our entire operations team costs us?’ Because I would submit that a team of 3-4 operations engineers is costing GoCo half a million dollars a year.

  • Salary
  • Benefits
  • Stock options
  • Hiring and EOY bonuses
  • Communications overhead
  • Office overhead

I think $500,000 is a conservative guess for these costs.
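
As a rough illustration of how those line items add up, here is a small sketch. The per-engineer amounts are placeholders I made up for the example, not real salary data, and `EngineerCost` and `team_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class EngineerCost:
    """Yearly cost of one operations engineer, split by the items listed above."""
    salary: float
    benefits: float
    stock_options: float
    bonuses: float    # hiring and end-of-year bonuses
    overhead: float   # communications plus office overhead

    def total(self) -> float:
        return (self.salary + self.benefits + self.stock_options
                + self.bonuses + self.overhead)


def team_cost(per_engineer: EngineerCost, headcount: int) -> float:
    """Yearly cost of a team where everyone costs roughly the same."""
    return per_engineer.total() * headcount


# Placeholder amounts, purely illustrative.
ops = EngineerCost(salary=110_000, benefits=20_000, stock_options=10_000,
                   bonuses=10_000, overhead=15_000)

for headcount in (3, 4):
    print(f"{headcount} engineers: ${team_cost(ops, headcount):,.0f} per year")
```

With these placeholder amounts, three engineers come to $495,000 and four to $660,000 a year, which is the number to hold up against a Platform-as-a-Service bill.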

And I think most managers need to go back and take a look at Platform-as-a-Service as a way to host their products. A comparison with a low-cost EC2-style service should not be based on the relative service bill, but on what you’ll spend on the engineers you’ll need and how your entire team will be affected by losing sight of the mission.

Posted by:

Nočnica Fee

@nocnica

Actually the pug from Dune (1984)

Heroku

Heroku is a container-based cloud Platform as a Service (PaaS). Developers use Heroku to deploy, manage, and scale modern apps. Our platform is elegant, flexible, and easy to use, offering developers the simplest path to getting their apps to market.

Discussion

GoCo has an awesome TODO application. Their initial features included:

  1. Adding ToDo's
  2. Marking them complete

At some point, someone decided that it's important to spend time (and thus money) talking to customers, and getting their perspective on what they really want. The CEO of course didn't like that, but did it anyway. And they discovered a slew of new features that could be added to the application:

  1. Deleting ToDo's without marking them complete
  2. Sharing ToDo's
  3. Keeping a ToDo history
  4. Autocompletion of ToDos
  5. Sharing ToDos with friends and family
  6. Real-Time Collaborative ToDo management

Thousands of customers later, the application went from a snappy 50ms response time when viewing the website to taking 4 seconds to load!!! Everyone was so focused on writing features that they didn't consider everything else that is involved in how computers work. In the months following, they encountered the following issues, all of which could have been solved by engineers focused not on delivering new features, but on the "operational health" of the application:

  1. They ran out of disk space and caused an outage
  2. The MySQL server storing all the ToDo's ran into the PrimaryKey Integer limit and crashed, causing a huge outage as someone struggled to migrate the tables to use a new primary key of Int64 (or BigInt) m.signalvnoise.com/update-on-basec...
  3. After completion of the autocomplete feature, the entire application ground to a halt. Why? Every key press would send a packet to the server, asking the server for options that could be used to complete what the user was writing. The network & CPU load on the server caused huge latency issues.
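
To make the first two issues concrete, here is a minimal sketch of the kind of check an operations-minded engineer might schedule. It assumes the primary key is a signed 32-bit `INT` column (maximum 2,147,483,647); the thresholds and function names are hypothetical, and fetching the current maximum id from the database is left out.

```python
import shutil

# Largest value a signed 32-bit INT primary key can hold.
INT32_MAX = 2**31 - 1


def disk_used_fraction(path="/"):
    """Fraction of the filesystem at `path` that is already in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def key_headroom(current_max_id, limit=INT32_MAX):
    """Fraction of the primary key space that is still unused."""
    return 1 - current_max_id / limit


def health_warnings(disk_fraction, current_max_id,
                    disk_threshold=0.8, key_threshold=0.2):
    """Return human-readable warnings; thresholds are arbitrary example values."""
    found = []
    if disk_fraction >= disk_threshold:
        found.append(f"disk is {disk_fraction:.0%} full")
    headroom = key_headroom(current_max_id)
    if headroom <= key_threshold:
        found.append(f"only {headroom:.0%} of primary key space left")
    return found


if __name__ == "__main__":
    # The max id here is a made-up value standing in for a real query result.
    for warning in health_warnings(disk_used_fraction("/"), current_max_id=2_000_000_000):
        print("WARNING:", warning)
```

Neither check is clever. The point is that someone has to own running it before the outage rather than after.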

I'm actually going to stop here... I could write more, but instead I'll provide some links:
github.com/hjacobs/kubernetes-fail...
devopsgroup.com/blog/devops-does-n...
devops.com/can-we-do-without-devops/

The point I'm trying to make is that if you write software and think that "the features" that make up your application are the bread and butter, you are in for a surprise. Your application is software. Software runs on hardware. Your application has both virtual and physical dependencies. All these things are not "features". But they are reality.

Yes, AWS or Heroku can save you money in the beginning. They can even save you money over a long duration. But if you aren't actively working to keep the machine running smoothly, then you will eventually be spending it on either outsourced PaaS costs (which can be high) or lost revenue due to poor customer experience, which can sometimes be impossible to recover.

 

First of all, Wes, I really appreciate anyone taking the time to write such a thorough reply. I’ll sit down and take a closer look later today and probably write a follow-up essay. You make good points!

 

I'd like to also add that Infrastructure Engineering (or Infrastructure Automation) is more than just managing the "operational health" of an application, or being the "experts" on Heroku, AWS, Kubernetes, etc. A team that's focused on automation can also be incredible "enablers" for feature teams, letting them focus on writing features by making sure the application runs smoothly and by paying attention to and driving forward on Continuous Integration, Continuous Deployment, and Continuous Delivery automation.

Automation Engineers can also do things like:

  1. Create dashboards and collect metrics to support business decisions
  2. Create auto-healing scripts, services, and applications so that when something goes wrong (which it will), there's automation in place to either self-heal and/or collect detailed information and context of the problem in order to help developers resolve the situation quickly.
  3. Create integrations such as SlackBots that enable developers to execute certain tasks quickly and with confidence.
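
As an illustration of item 2, here is a minimal auto-healing sketch. `heal`, `check`, and `restart` are hypothetical names: `check` is any callable that raises when the service is unhealthy, and `restart` is whatever brings it back in your environment. This is one possible shape under those assumptions, not a description of any particular tool.

```python
import time


def heal(check, restart, max_attempts=3, wait_seconds=1.0, sleep=time.sleep):
    """Check a service and restart it on failure, up to `max_attempts` restarts.

    Returns a report with the context developers need if healing fails.
    """
    report = {"healthy": False, "restarts": 0, "errors": []}
    for attempt in range(max_attempts + 1):
        try:
            check()
            report["healthy"] = True
            return report
        except Exception as exc:
            # Keep the failure details so a human can debug later.
            report["errors"].append(f"attempt {attempt}: {exc!r}")
        if attempt < max_attempts:
            restart()
            report["restarts"] += 1
            sleep(wait_seconds)  # give the service time to come back up
    return report


if __name__ == "__main__":
    # Fake service that fails twice and then recovers, for demonstration only.
    state = {"checks": 0}

    def flaky_check():
        state["checks"] += 1
        if state["checks"] < 3:
            raise RuntimeError("service not responding")

    print(heal(flaky_check, restart=lambda: None, wait_seconds=0))
```

If the report comes back with `healthy` still false, that is the moment to page someone and hand them the collected errors.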

Hell, without Automation Engineers and Infrastructure Engineers, things like AWS, and Kubernetes and Heroku wouldn't even exist. These services provide common "abstractions" for running applications. But each company, and each application, and each team is different. They all have different needs, wants, desires. So it makes sense to encourage high velocity development by your feature teams, by investing in the foundational work of the infrastructure and processes involved in running and releasing new versions.

This, this this THIS is great stuff, do you mind if, in a follow up, I quote most of the text in your first comment? I really like it and it points to something I really didn't talk about in this piece: as we move toward cloud tools, engineers who understand those platforms and can build automations within them become incredibly valuable.

I don't mind. I'm just happy to provide some additional perspective. As you can tell, I feel very strongly about this, not only because this is my current job position, but because I chose this position. I've seen the incredible amount of value that automation engineering can provide to feature developers.

Kubernetes is my current passion. In a way, it's similar to Heroku in that the application teams don't have to worry about much. In fact, I'm currently working on an "application interface" model in which every application (and thus application developer) should do the following:

  1. have a Dockerfile in their repo (or some way to generate it)
  2. follow the "12 factors" 12factor.net/
  3. expose metrics at /metrics on a separate port from your main application (if possible)
  4. allow for common OpenTracing configuration to be set/injected at deploy time

The rest will be handled for you - the "environment" can run your immutable images and ship logs, metrics, tracing and health information to relevant locations without being explicitly defined.
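
Here is a rough sketch of what items 2 and 3 could look like from the application side, using only the Python standard library. The `METRICS_PORT` variable name, its default, and the counter name are assumptions made up for the example; the output is simple `name value` text lines of the kind a metrics scraper can read.

```python
import os
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# 12-factor style: configuration comes from the environment.
# METRICS_PORT and its default are hypothetical, chosen for this example.
METRICS_PORT = int(os.environ.get("METRICS_PORT", "9100"))

COUNTERS = {"todos_created_total": 0}
_lock = threading.Lock()


def increment(name, amount=1):
    """Bump a counter from anywhere in the application."""
    with _lock:
        COUNTERS[name] = COUNTERS.get(name, 0) + amount


def render_metrics():
    """Render all counters as 'name value' lines, one per line."""
    with _lock:
        lines = [f"{name} {value}" for name, value in sorted(COUNTERS.items())]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def start_metrics_server(port=METRICS_PORT):
    """Serve /metrics in a background thread, apart from the main app's port."""
    server = ThreadingHTTPServer(("0.0.0.0", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    start_metrics_server()
    increment("todos_created_total")
    input(f"Metrics on port {METRICS_PORT} at /metrics; press Enter to exit\n")
```

The application only counts things and exposes them. Where the numbers get shipped is the environment's job, which is the whole idea of the interface.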

This allows for an application not only to work natively in say docker-compose but on any number of Kubernetes clusters trivially.

You should really write up this comment as its own article, it’s really a good standard, and I like the merging of a 12 factor app with a predictable interface.

I love seeing what a good platform engineer can do for their team, making it easier for them to do what they’re good at!

 

I'm an operational engineer, and basically what you are telling me is:

  • I'm a money burner
  • I don't contribute anything useful because I'm not developing new features
  • My job would be better outsourced

Wow, thanks.... really...

And guess what, even when you choose a PaaS solution, you will still have something like an operational engineer. People who focus on stability and integrating different systems. Someone who's in charge of your DRP and backup/restore strategy. You know, all the stuff developers don't need to worry about so they can focus on building the cool stuff and fixing bugs.

 

It reminds me of my first ever bankruptcy back in the day.
Our company had a fantastic product, which other companies absolutely wanted. It was a revolutionary way to let people model their own part of processes from their perspective, dock those parts together, and deploy them on some kind of runtime server I never fully understood.

My task was to automate the installation routine, because setup, testing, and integration into the client's infrastructure took most of our engineers about a day. Free of charge, of course.

I was done after about 3 weeks, and the automation worked (and is used, unchanged, to this day in the new parent company, as I've been told), but our CEO didn't "want to waste time with 50k € clients. We have multi-million clients in the pipeline!"

Those multi-million jobs never came in and the company slowly crept towards death until, about 6 months later, they filed for bankruptcy.

The automation, once kicked off on the target server, was done after 5 minutes, with additional 10 minutes of manual post-testing for our conscience.

I tried to explain that "In a day, my automation makes at least a million too, if you get enough of the small clients in". He didn't understand how the "Junior Integration Developer" had the gall to tell him how to run his company. So naturally, I was sacked first, on my birthday as well.

That was fun, so yeah, I can sign your post out of experience. It's sad really, how so many great companies with fantastic ideas die, due to a simple "let's save money/let's focus on the big money".

 

I love this post. It reminds me of something that I learnt from one of the Unsplash developers, who stated that they'd prefer to throw money at problems not related to their core offering rather than build in house. They've seen a lot of success with that methodology, which allows the team to stay small and focus on delivering a great service.

 

I disagree. EC2 or any other kind of unmanaged cloud instance can drive your bill up exponentially, and even with a managed service you still need a cloud engineer or some sort of support, so you need to hire someone to keep it up and orchestrate the entire ecosystem.
On top of that, let developers run without a dedicated cloud architect monitoring them and you will get poorly performing code, because the cloud "exponentially" grows and can keep up with the demand.
So now you have more than two problems: a high bill for cloud services, an unmanaged ecosystem, and the need for a cloud engineer and performance experts. Add to that the fact that developers get used to the cloud of choice, which leaves you completely dependent on that cloud provider, which on some occasions performs poorly in the development lifecycle (without even entering the realm of security, because in complex systems you need a dedicated, cloud-specific security engineer). By the time you need to change providers (think Slack and Zoom and AWS migration), you need to spend hundreds of thousands on experts.

 

That's presuming Heroku can ENTIRELY replace the need for someone to think about the infrastructure, which is definitely not the case. They just end up doing it through admin panels and bills instead of the command line.

 

The #1 best way for a company to save money is to fire everyone and go out of business.

This is a fantastic quote. And an excellent article too, well done!