Engineering leaders have little to no visibility into the surplus costs of their internal technology infrastructure and tooling. This lack of awareness around bloated infrastructure and its associated costs is staggering, and it underscores how much of the ‘technology’ under the hood is a source of avoidable expenditure.
Rolling this back, identifying blind spots, and slashing costs is a herculean task, albeit an inevitable one given the climate we find ourselves in. But where does one start? Or rather, how much does an engineering leader know about their costs, and about how to revisit them? That’s harder to gauge than one might assume.
Arctic Temperatures – Engineering Tooling
A typical unicorn startup spends a minimum of $5 million every year on tooling. (In conversations I’ve had, that’s a conservative number.) From cloud storage to monitoring and project management tools, the cost of running a startup at that scale grows disproportionately over time. A large chunk of this is because of what I think of as ‘Arctic’ tooling.
Arctic tooling is the software your business essentials are built on. But getting visibility into its inherent costs, and understanding them, is painful and complex. It’s not as simple as your Google Suite costs, or internal Slack messaging costs. This is about AWS storage costs: the bread and butter of engineering. The thermal insulation to your arctic winter. Costs you simply can’t do away with. (But can definitely reduce with engineering effort.)
Understanding these costs and bringing them down takes painstaking effort. Couple this with a lack of awareness, and we have a pandemic of poor infrastructure, and massive bills at the end of every month. You’ll see this chatter online – CTOs bemoaning their large AWS bills, and how they simply have no clue what’s biting them. There’s a reason for this, and that’s poor ‘Observability’ practices.
What is Observability?
The measurement and attribution of performance in a complex software environment is called Observability. ‘Performance’ of software comes in various shapes and sizes. Take latency, for example. In simple words, it’s the time a system takes to respond, such as the time taken for a mobile app to load.
The immediate example that comes to mind: one-time passwords. Those agonizing 30+ seconds you have to wait for an OTP to arrive in your messages so you can complete a transaction – that’s a form of latency. The two seconds it takes for a food delivery app to populate available dishes in a restaurant – latency.
I know what you’re thinking. Surely, a two-second latency is not bad! Well, think of gamers. A two-second latency is an eternity when you’re playing a shooting game. In fact, even 100 milliseconds of lag can make a game feel unplayable. How does one fix this? Well, one has to first ‘observe’ that this phenomenon happens to users. Then you have to dissect it: is it for all users? Maybe only users in a certain region? Maybe users with a certain kind of phone? And the list goes on.
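To make that dissection concrete, here is a minimal sketch in Python. It assumes latency samples are already collected as records with hypothetical `region`, `device`, and `latency_ms` fields (these names and numbers are made up for illustration, not taken from any particular tool), and compares the 95th-percentile latency across segments.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct% of all samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_by_segment(samples, key):
    """Group samples by one attribute (e.g. 'region') and return
    the p95 latency in milliseconds for each group."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[key]].append(sample["latency_ms"])
    return {segment: percentile(vals, 95) for segment, vals in groups.items()}


# Hypothetical samples, invented for this example.
SAMPLES = [
    {"region": "south", "device": "android", "latency_ms": 180},
    {"region": "south", "device": "ios", "latency_ms": 210},
    {"region": "north", "device": "android", "latency_ms": 2100},
    {"region": "north", "device": "android", "latency_ms": 1900},
    {"region": "north", "device": "ios", "latency_ms": 240},
]

print(latency_by_segment(SAMPLES, "region"))
print(latency_by_segment(SAMPLES, "device"))
```

Neither cut is conclusive on its own. In this made-up data the slow requests all come from Android users in one region, and it takes slicing by both attributes to see that.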
The practice of understanding the performance of software is complicated. Besides latency, there are concurrent users (think 30+ million people wanting to watch an India vs Pakistan cricket match), 5xx errors (when a server fails to fulfil a request), and so on. Again, the list is endless.
If we categorise these into RED metrics (Rate, Errors, Duration), we can gauge the key ones that impact a business. And mind you, these have a direct impact on revenues. A frustrated user might choose to book a cab, or order food, from another provider. In fact, I'd contend that latencies are the new downtimes. If it’s slow, a consumer will bounce. They will simply choose other alternatives.
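As a rough illustration of what those three numbers look like in practice, the sketch below computes request rate, error ratio, and p95 request duration for one time window. The record fields (`status`, `duration_ms`), the sample data, and the choice of p95 are assumptions made for this example; real monitoring tools expose these metrics in their own ways.

```python
import math


def red_metrics(requests, window_seconds):
    """Summarise one time window of requests as Rate, Errors, Duration."""
    total = len(requests)
    if total == 0:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "p95_ms": None}

    # Errors: share of requests that ended in a 5xx server error.
    errors = sum(1 for r in requests if r["status"] >= 500)

    # Duration: nearest-rank 95th percentile of request durations.
    durations = sorted(r["duration_ms"] for r in requests)
    p95_rank = max(1, math.ceil(0.95 * total))

    return {
        "rate_per_s": total / window_seconds,  # Rate: requests per second
        "error_ratio": errors / total,
        "p95_ms": durations[p95_rank - 1],
    }


# Hypothetical two-second window of traffic, invented for this example.
REQUESTS = [
    {"status": 200, "duration_ms": 120},
    {"status": 200, "duration_ms": 80},
    {"status": 503, "duration_ms": 2000},
    {"status": 200, "duration_ms": 100},
]

print(red_metrics(REQUESTS, window_seconds=2))
```

Tracked per service and per window, these three numbers are usually enough to tell whether users are being turned away (errors) or kept waiting (duration) before it shows up in revenue.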
Fun fact: more than 10 years ago, Amazon found that every 100ms of latency cost them one percent in sales. Oh, and you know why Google is so fast? Because they found that an extra 500ms to show search results cost them a 20 percent drop in traffic. Latencies are… hard problems.
The Hidden C.O.S.T. Of Instrumentation
Tech companies are losing staggering amounts of money through poor instrumentation. It’s an untold story, because no one knows how deep the rabbit hole goes. Leaders are grappling with this phenomenon now, as funding dries up and pressure mounts to control expenses.
All tech comes under 4 key instrumentations:
The long winter is here, and if you don’t want the inevitable frostbite, audit your entire internal software as a service (SaaS) environment.
Nishant Modak is Founder, Last9. Views are personal and do not represent the stand of this publication.