DEV Community: Nikolay Chirkov

$100K/day cloud bill isn't a Bug – it's by Design

Nikolay Chirkov — Thu, 08 May 2025 19:57:33 +0000

Cloud platforms are built to scale. That’s their core feature — and their hidden risk. Every request to a cloud function, database, or storage API has a cost. If enough requests arrive, even legitimate-looking ones, the backend will scale automatically and incur that cost — and the account owner will receive the bill.

This is not an exception. It is the intended behavior.

Real Incidents of Cost-Based Abuse

Several public cases illustrate how cloud billing can be exploited or spiral out of control:

$100K in 24 hours via Firebase – A WebGL hosting app saw a sudden traffic spike and was billed over $100,000. The cloud service scaled perfectly. No failure occurred — other than financial.
One public file in Firebase = $98K – A single shared file led to massive egress usage and a near six-figure bill.
GCP DDoS → $100K+ projected bill – Valid-looking requests during a DDoS ran up charges with no way to stop them quickly.

These examples — and many others — follow the same pattern: no security breach, just usage that scaled and billed exactly as designed.

Why Protections Often Fail

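Rate limits are global and imprecise. Most limits apply per service, not per client. For example, a database may be capped at 100 queries per second. Suppose 100 legitimate clients and 1,000,000 automated attackers each send one query per second, and the limiter admits requests regardless of who sent them. Legitimate traffic then gets only about 100 / 1,000,100 of the capacity, which works out to roughly one legitimate query served every 100 seconds. In practice, legitimate users may not be served at all.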

Limits are hard to balance across services. Every backend (DB, API, cache) needs separate tuning. Too tight = outages. Too loose = runaway costs. In distributed systems, this balance is nearly impossible.

Budget alerts are too late. Billing data can lag by 15 minutes to several hours. By the time alerts arrive, thousands of dollars may already be spent.

Attackers look like users. Tokens can be pulled from apps or frontends. Even time-limited tokens, like AWS pre-signed S3 URLs, can be refreshed by any client the attacker controls.

Becoming a “legitimate client” is often as simple as making an HTTPS request.

What Could Help?

To protect against cost-based abuse, three mechanisms can be combined:

1. Per-client real-time quota enforcement. Each client gets a monetary quota. Every request (log, DB op, message) deducts from it. Clients near their limit are automatically slowed or paused, without affecting others.
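A minimal sketch of per-client quota enforcement follows. It assumes each request's cost has already been estimated in micro-dollars and that the ledger lives in a single process. A real deployment would need shared, durable storage. `QuotaLedger`, `Decision`, and the slow-down threshold are hypothetical names and policy choices, not taken from any specific product.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Outcome of charging one request against its client's quota.
enum class Decision { kAllow, kSlowDown, kReject };

// Hypothetical per-client ledger: every client has the same monetary budget
// (in micro-dollars) and each request deducts its estimated cost.
class QuotaLedger {
public:
    // slow_down_ratio: fraction of the budget after which a client is throttled.
    QuotaLedger(std::uint64_t budget_micros, double slow_down_ratio)
        : budget_(budget_micros), slow_down_ratio_(slow_down_ratio) {
        assert(slow_down_ratio > 0.0 && slow_down_ratio <= 1.0);
    }

    Decision Charge(const std::string& client_id, std::uint64_t cost_micros) {
        std::uint64_t& spent = spent_[client_id];  // new clients start at 0
        if (cost_micros > budget_ - spent) {       // spent never exceeds budget_
            return Decision::kReject;              // over the cap: nothing is charged
        }
        spent += cost_micros;
        if (static_cast<double>(spent) >= slow_down_ratio_ * static_cast<double>(budget_)) {
            return Decision::kSlowDown;            // near the cap: delay this client only
        }
        return Decision::kAllow;
    }

    std::uint64_t Spent(const std::string& client_id) const {
        auto it = spent_.find(client_id);
        return it == spent_.end() ? 0 : it->second;
    }

private:
    std::uint64_t budget_;
    double slow_down_ratio_;
    std::unordered_map<std::string, std::uint64_t> spent_;
};
```

A caller would map `kSlowDown` to a delay, for example an HTTP 429 with `Retry-After`, and `kReject` to a hard stop. Because the ledger is keyed by client, a client that exhausts its own budget does not change the answer for anyone else.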

2. Proof-of-work before provisioning. New clients must solve a computational puzzle before access. This cost is:

Negligible (milliseconds) under normal use — for both real users and attackers
Increased during abuse — e.g., if mass registrations occur

The mechanism uses a pool of bcrypt hashes with a dynamic seed, difficulty, and verification target. More details here
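The article's mechanism is bcrypt-based, and its details are linked rather than shown. The sketch below illustrates only the general idea, using a hashcash-style puzzle over 64-bit FNV-1a in place of bcrypt. The server issues a seed and a difficulty (the number of required leading zero bits). The client searches for a nonce, and the server verifies the answer with a single hash. FNV-1a is not cryptographic and appears here only to keep the example dependency-free. `Fnv1a64`, `SolvePuzzle`, and `VerifyPuzzle` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// 64-bit FNV-1a. NOT a cryptographic hash; a stand-in for bcrypt in this sketch.
std::uint64_t Fnv1a64(const std::string& data) {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : data) {
        hash ^= c;
        hash *= 1099511628211ULL;  // FNV prime
    }
    return hash;
}

// True if the top `difficulty_bits` bits of `hash` are all zero.
bool HasLeadingZeroBits(std::uint64_t hash, unsigned difficulty_bits) {
    assert(difficulty_bits < 64);
    return difficulty_bits == 0 || (hash >> (64 - difficulty_bits)) == 0;
}

// Client side: brute-force a nonce. Expected work doubles with each extra bit.
std::uint64_t SolvePuzzle(const std::string& seed, unsigned difficulty_bits) {
    for (std::uint64_t nonce = 0;; ++nonce) {
        if (HasLeadingZeroBits(Fnv1a64(seed + ":" + std::to_string(nonce)), difficulty_bits)) {
            return nonce;
        }
    }
}

// Server side: one hash checks the client's answer.
bool VerifyPuzzle(const std::string& seed, unsigned difficulty_bits, std::uint64_t nonce) {
    return HasLeadingZeroBits(Fnv1a64(seed + ":" + std::to_string(nonce)), difficulty_bits);
}
```

On the server, the seed would be random and single-use, so answers cannot be replayed. `difficulty_bits` is the knob to raise when mass registrations are detected: each extra bit roughly doubles the expected client work, while verification stays at one hash.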

3. Optional cleanup and usage-aware control. Inactive clients can be dropped. Clients nearing their quota can trigger backend checks (how fast the quota was used, whether the usage looks organic, etc.). Note: this is app-specific and may require custom business logic.
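One way to sketch this step makes assumptions the article deliberately leaves app-specific. Clients idle for longer than a TTL are dropped. Clients that used a large share of their quota within a short time of being provisioned are flagged for a backend review. `ClientUsage`, `SweepClients`, and all thresholds are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <unordered_map>
#include <vector>

using Clock = std::chrono::steady_clock;

struct ClientUsage {
    Clock::time_point created;    // when the client was provisioned
    Clock::time_point last_seen;  // time of its most recent request
    double quota_fraction_used;   // 0.0 .. 1.0 of its monetary quota
};

// Drops idle clients and returns the IDs whose quota burn looks suspicious:
// more than `review_fraction` used within `fast_window` of creation.
std::vector<std::string> SweepClients(std::unordered_map<std::string, ClientUsage>& clients,
                                      Clock::time_point now,
                                      Clock::duration idle_ttl,
                                      Clock::duration fast_window,
                                      double review_fraction) {
    assert(review_fraction >= 0.0 && review_fraction <= 1.0);
    std::vector<std::string> to_review;
    for (auto it = clients.begin(); it != clients.end();) {
        const ClientUsage& usage = it->second;
        if (now - usage.last_seen > idle_ttl) {
            it = clients.erase(it);  // inactive: release its state
            continue;
        }
        if (usage.quota_fraction_used > review_fraction &&
            usage.last_seen - usage.created < fast_window) {
            to_review.push_back(it->first);  // burned quota quickly: ask the backend to check
        }
        ++it;
    }
    return to_review;
}
```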

Outcome: Cost-Limited Scalability

When every client has a cap and must do work to onboard:

Abuse becomes expensive
Real users aren't throttled globally
Backend resources scale safely
Alerts aren’t needed to stop financial loss — enforcement is automatic

The attack surface shifts: instead of “can I make this API fail?”, it becomes “can I afford to keep sending requests?”

Final Thought

Clouds scale. And they bill. What they don’t do — by default — is distinguish between a valuable client and a costly one.

Security doesn’t end at authentication. When requests generate cost, economic boundaries matter.

Systems need a way to say “no” before the invoice says “too late.”

Interviewing Software Developers: From Junior to Architect in a Single Programming Task

Nikolay Chirkov — Tue, 06 May 2025 05:55:53 +0000

Over the years, I’ve interviewed around 100 software developers at Google and roughly the same number across my own companies. One thing has become very clear:

Resumes don’t work.

They’re too noisy. You get flooded with titles, buzzwords, and irrelevant project summaries. So I distilled everything down to one single task. One prompt I can give to anyone — junior or architect — and instantly get a signal.

The task?

Write a library that calculates the sum of a vector of values.

That’s it. No extra requirements. The beauty is that it looks trivial — but the depth reveals itself as the candidate explores edge cases, generalization, scalability, performance, and design.

🪜 Level 1: The Junior Developer

Most junior candidates start like this:

int Sum(int* data, size_t num_elements) {
    int result = 0;
    for (size_t i = 0; i < num_elements; ++i)
        result += data[i];
    return result;
}

It compiles. It runs. But you immediately see:

No const
No null check
Indexing instead of pointer-based iteration
No header splitting or inline consideration
No thoughts about reusability or API quality

Already, you’re learning a lot.

🪜 Level 2: The Mid-Level Developer

The next tier generalizes the code:

template<typename T>
T Sum(const T* data, size_t num_elements);

Then comes overflow protection — separate input/output types:

template<typename O, typename I>
O Sum(const I* data, size_t num_elements) {
    O result{0};
    if (data) {
        for (size_t i = 0; i < num_elements; ++i)
            result += static_cast<O>(data[i]);
    }
    return result;
}

They start thinking in terms of the STL:

template<typename InputIt>
int Sum(InputIt begin, InputIt end);

And even bring in constexpr:

template<typename InputIt>
constexpr int Sum(InputIt begin, InputIt end);

Eventually someone realizes this is already in the standard library (std::accumulate) — and more advanced candidates point out std::reduce, which is reorderable and SIMD/multithread-friendly (and constexpr in C++20).
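For reference, here are the standard-library versions. `std::accumulate` folds left to right and accumulates in the type of its initial value. Passing `std::int64_t{0}` therefore gives the same overflow protection as the mid-level "separate output type" template, whereas a plain `0` would sum in `int`. `std::reduce` may regroup additions and combine two elements directly in their own type, so a wide initial value alone does not prevent overflow there. `std::transform_reduce` widens each element before combining. The function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Left-to-right fold; the accumulator type is the type of the init value.
std::int64_t SumAccumulate(const std::vector<std::int32_t>& values) {
    return std::accumulate(values.begin(), values.end(), std::int64_t{0});
}

// Reorderable reduction: each element is widened to int64 first, so any
// grouping the library chooses adds int64 values only.
std::int64_t SumTransformReduce(const std::vector<std::int32_t>& values) {
    return std::transform_reduce(values.begin(), values.end(), std::int64_t{0},
                                 std::plus<>{},
                                 [](std::int32_t x) { return static_cast<std::int64_t>(x); });
}
```

Passing an execution policy such as `std::execution::par` (header `<execution>`) as the first argument of `std::transform_reduce` allows a parallel or vectorized run. On GCC's libstdc++, that typically requires linking Intel TBB.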

At this point, we’re talking fluency in STL, value categories, compile-time evaluation, and API design.

🧠 Level 3: The Senior Engineer

Now the conversation shifts.

They start asking:

What’s the maximum number of elements?
Will the data fit in memory?
Is it a single-machine process or distributed?
Is the data streamed from disk?
Is disk the bottleneck?

They consider chunked reads, asynchronous prefetching, thread pool handoff, and single-threaded summing when disk I/O dominates.
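The sketch below shows the chunked-read idea. It assumes the input is a binary stream of native-endian `std::int32_t` values, and the format, the chunk size, and the name `SumStream` are all assumptions. Because it reads one fixed-size chunk at a time, memory use stays bounded regardless of input size. Prefetching and thread-pool handoff are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Sums a stream of raw int32 values using a bounded buffer of `chunk_elems` values.
std::int64_t SumStream(std::istream& in, std::size_t chunk_elems = 4096) {
    assert(chunk_elems > 0);
    std::vector<std::int32_t> buffer(chunk_elems);
    std::int64_t total = 0;
    while (in) {
        in.read(reinterpret_cast<char*>(buffer.data()),
                static_cast<std::streamsize>(chunk_elems * sizeof(std::int32_t)));
        // gcount() is the number of bytes the last read delivered; the final
        // chunk may be short, and a trailing partial value is ignored.
        const std::size_t got = static_cast<std::size_t>(in.gcount()) / sizeof(std::int32_t);
        for (std::size_t i = 0; i < got; ++i) total += buffer[i];
    }
    return total;
}
```

To cover the disk case, pass a `std::ifstream` opened with `std::ios::binary`. A prefetching variant would read the next chunk on another thread while the current chunk is being summed.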

Then comes UX: can the operation be paused or aborted?

Now we need a serializable processing state:

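#include <cstddef>
#include <fstream>
#include <vector>

struct State;  // serializable processing state (contents left open)
struct Node;   // a worker in a distributed run

template<typename T, typename InputIt>  // T: element type, InputIt: iterator over T
class Summarizer {
public:
    Summarizer(InputIt begin, InputIt end);
    explicit Summarizer(std::ifstream&);
    explicit Summarizer(std::vector<Node> distributed_nodes);

    void Start(size_t max_memory_to_use = 0);
    float GetProgress() const;
    State Pause();
    void Resume(const State&);
};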

Now they’re designing:

Persistent resumability
State encoding
Granular progress tracking
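A much smaller sketch of serializable state follows, assuming in-memory input. The whole state is the next index plus the partial sum. Pausing means returning those two numbers, and resuming means continuing from them. `SumState`, `Serialize`, `Deserialize`, and `SumSome` are hypothetical names and are not part of the interface above. A production version would also version the encoding and fix its byte order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Everything needed to resume: where we stopped and what we had so far.
struct SumState {
    std::size_t next_index = 0;
    std::int64_t partial_sum = 0;
};

// Text encoding keeps the sketch simple.
std::string Serialize(const SumState& s) {
    return std::to_string(s.next_index) + " " + std::to_string(s.partial_sum);
}

SumState Deserialize(const std::string& text) {
    SumState s;
    std::istringstream in(text);
    in >> s.next_index >> s.partial_sum;
    assert(in && "malformed state");
    return s;
}

// Processes at most `budget` elements, then returns so the caller can pause.
// Returns true once every element has been summed.
bool SumSome(const std::vector<std::int32_t>& data, SumState& state, std::size_t budget) {
    assert(state.next_index <= data.size());
    const std::size_t end = state.next_index + std::min(data.size() - state.next_index, budget);
    for (; state.next_index < end; ++state.next_index) {
        state.partial_sum += data[state.next_index];
    }
    return state.next_index == data.size();
}
```

In this sketch, progress is simply `next_index / data.size()`, and the saved string can be written to disk and resumed in a later process.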

They add:

Asynchronous error callbacks (e.g., if input files are missing)
Logging and performance tracing
Memory usage accounting
Numeric precision improvements (e.g., summing values in order of increasing magnitude, or using Kahan summation)
Support for partial sort/save for huge datasets
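Kahan (compensated) summation, one of the precision improvements listed above, can be sketched as follows. A running compensation term recovers the low-order bits lost when a small value is added to a large running total. `KahanSum` and `NaiveSum` are illustrative names. Compiler flags that permit algebraic reassociation, such as GCC/Clang `-ffast-math`, may optimize the compensation away.

```cpp
#include <cassert>
#include <vector>

// Compensated (Kahan) summation over doubles.
double KahanSum(const std::vector<double>& values) {
    double sum = 0.0;
    double compensation = 0.0;  // the part of earlier additions that was rounded off
    for (double x : values) {
        const double y = x - compensation;  // re-inject what was lost last time
        const double t = sum + y;           // low-order bits of y may be dropped here
        compensation = (t - sum) - y;       // (t - sum) is what was really added; minus y = error
        sum = t;
    }
    return sum;
}

// Naive left-to-right summation, for comparison.
double NaiveSum(const std::vector<double>& values) {
    double sum = 0.0;
    for (double x : values) sum += x;
    return sum;
}
```

Consider 1.0 followed by ten values of 1e-16. Each small value is below half an ulp of 1.0, so `NaiveSum` returns exactly 1.0, while `KahanSum` lands close to the true 1.000000000000001.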

They’ve moved beyond code — this is system architecture.

⚙️ Level 4: The Architect

They start asking questions few others do:

Is this running on CPU or GPU?
Is the data already in GPU memory?
Should the GPU be used for batch summing?
Should the CPU be used first while shaders compile?
Can shaders be precompiled, versioned, and cached?

They propose:

Abstract device interface (CPU/GPU/DSP)
Cross-platform development trade-offs
Execution policy selection at runtime
Binary shader storage, deployed per version
On-device code caching and validation
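One possible shape for an abstract device interface with runtime selection is sketched below. Everything here is hypothetical: `Device`, `CpuDevice`, `GpuDevice`, and `SelectDevice` are illustrative names. The GPU backend is only a stub that reports itself unavailable, standing in for a real driver or compute-API integration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical backend interface; real code would add buffers, queues, etc.
class Device {
public:
    virtual ~Device() = default;
    virtual std::string Name() const = 0;
    virtual bool Available() const = 0;  // e.g. driver present, kernels compiled
    virtual std::int64_t Sum(const std::int32_t* data, std::size_t n) const = 0;
};

class CpuDevice : public Device {
public:
    std::string Name() const override { return "cpu"; }
    bool Available() const override { return true; }  // always usable as a fallback
    std::int64_t Sum(const std::int32_t* data, std::size_t n) const override {
        std::int64_t total = 0;
        for (std::size_t i = 0; i < n; ++i) total += data[i];
        return total;
    }
};

// Stub: behaves as if its shaders were still compiling, so it is not selectable yet.
class GpuDevice : public Device {
public:
    std::string Name() const override { return "gpu"; }
    bool Available() const override { return false; }
    std::int64_t Sum(const std::int32_t*, std::size_t) const override {
        assert(false && "GPU backend not implemented in this sketch");
        return 0;
    }
};

// Runtime policy: pick the first available device in preference order.
const Device* SelectDevice(const std::vector<std::unique_ptr<Device>>& preference) {
    for (const auto& device : preference) {
        if (device->Available()) return device.get();
    }
    return nullptr;
}
```

Listing the GPU first with a CPU fallback yields the "use the CPU while shaders compile" behavior from the questions above. Once the GPU backend reports itself available, the next selection picks it.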

And memory gets serious:

Does the library allocate memory, or use externally managed buffers?
Support for map/unmap, pinned memory, DMA

Now we need:

Detailed profiling: cold vs. warm latencies
Per-device throughput models
Smart batching
First-run performance vs. steady-state

Then come platform constraints:

Compile-time configuration to shrink binary size
Support for heapless environments
Support for platform-specific allocators
Encryption of in-flight and at-rest data
Memory zeroing post-use
Compliance with SOC 2 and similar standards
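Here is a portable sketch for the "memory zeroing post-use" item. Writing through a `volatile` pointer keeps the compiler from treating the stores as dead and removing them, which a plain `std::memset` just before a buffer is released does not guarantee. This is a common best-effort technique rather than an absolute guarantee. Platform functions such as `explicit_bzero` (glibc, BSDs), `SecureZeroMemory` (Windows), or `memset_s` (C11 Annex K, where available) are alternatives. `SecureZero` and `ScrubbedBuffer` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Overwrites `size` bytes with zeros through a volatile pointer so the stores
// are not optimized away even if the memory is never read again.
void SecureZero(void* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    volatile unsigned char* p = static_cast<volatile unsigned char*>(data);
    for (std::size_t i = 0; i < size; ++i) p[i] = 0;
}

// Hypothetical RAII buffer that scrubs its contents when it goes out of scope.
class ScrubbedBuffer {
public:
    explicit ScrubbedBuffer(std::size_t size) : bytes_(size) {}
    ~ScrubbedBuffer() { Wipe(); }
    ScrubbedBuffer(const ScrubbedBuffer&) = delete;             // copies would escape scrubbing
    ScrubbedBuffer& operator=(const ScrubbedBuffer&) = delete;

    std::uint8_t* data() { return bytes_.data(); }
    std::size_t size() const { return bytes_.size(); }
    void Wipe() { if (!bytes_.empty()) SecureZero(bytes_.data(), bytes_.size()); }

private:
    std::vector<std::uint8_t> bytes_;
};
```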

💥 Bonus Level: The “Startuper”

There should probably be one more level of seniority: the “startuper”, someone who recently failed because they tried to build the perfect, highly extensible system right away…

Instead of just sticking to the “junior-level” sum function — until they had at least one actual customer. 😅

☁️ Real-World Parallel: Æthernet

This progression is exactly what we saw while building the Æthernet client library.

We started with a minimal concept: adapters that wrap transport methods like Ethernet, Wi-Fi, GSM, and satellite.

But the design questions came fast:

What if a client has multiple adapters?
What if one fails? Add a backup policy
What if latency is critical? Add a redundant policy: duplicate each message across all adapters
What if we want backup within groups, and parallel send across groups? Introduce adapter groups
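The backup, redundant, and group policies can be sketched in a few lines. This is not the Æthernet API. `Adapter`, `SendBackup`, `SendRedundant`, and `SendGroups` are hypothetical, and an adapter is modeled as a callable that reports whether its send succeeded.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical transport: returns true if the message was sent.
using Adapter = std::function<bool(const std::string& message)>;

// Backup policy: try adapters in priority order, stop at the first success.
bool SendBackup(const std::vector<Adapter>& adapters, const std::string& message) {
    for (const Adapter& adapter : adapters) {
        if (adapter(message)) return true;
    }
    return false;
}

// Redundant policy: send a copy through every adapter.
// Returns how many copies went out (at least one means delivered).
std::size_t SendRedundant(const std::vector<Adapter>& adapters, const std::string& message) {
    std::size_t sent = 0;
    for (const Adapter& adapter : adapters) {
        if (adapter(message)) ++sent;
    }
    return sent;
}

// Adapter groups: backup inside each group, redundant across groups.
std::size_t SendGroups(const std::vector<std::vector<Adapter>>& groups, const std::string& message) {
    std::size_t delivered_groups = 0;
    for (const auto& group : groups) {
        if (SendBackup(group, message)) ++delivered_groups;
    }
    return delivered_groups;
}
```

For simplicity, the redundant policy here sends sequentially. A latency-oriented version would dispatch to all adapters concurrently.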

Then came the “infinite design moment”:

What if a client wants to:

Send small messages through LTE (cheap)
Send large messages through fiber (fast)
Route messages differently based on user-defined metadata
Swap policies based on live network metrics

At some point, you realize: this never ends.

So we stopped.

We open-sourced the client libraries.
We let users define their own policies.
Because the most scalable design is knowing where to stop.
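Letting users define policies can be as simple as accepting a callback. The sketch below is hypothetical and is not the Æthernet interface. `Router` takes a user-supplied function that picks an adapter index for each message. The example policy sends small messages over a cheap link and large ones over a fast link, mirroring the LTE/fiber case above. The 1 KiB threshold is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Adapter = std::function<bool(const std::string& message)>;
// User-defined policy: given a message, return the index of the adapter to use.
using Policy = std::function<std::size_t(const std::string& message)>;

class Router {
public:
    Router(std::vector<Adapter> adapters, Policy policy)
        : adapters_(std::move(adapters)), policy_(std::move(policy)) {}

    bool Send(const std::string& message) const {
        const std::size_t index = policy_(message);
        assert(index < adapters_.size() && "policy returned an invalid adapter index");
        return index < adapters_.size() && adapters_[index](message);
    }

private:
    std::vector<Adapter> adapters_;
    Policy policy_;
};

// Example policy: index 0 = cheap link (e.g. LTE), index 1 = fast link (e.g. fiber).
std::size_t SizeBasedPolicy(const std::string& message) {
    constexpr std::size_t kLargeMessageBytes = 1024;  // arbitrary threshold
    return message.size() < kLargeMessageBytes ? 0 : 1;
}
```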

🧠 Final Thought

This one task — sum() — exposes almost everything:

Technical depth
Communication style
Architectural insight
Prioritization
Practical vs. ideal tradeoffs

It reveals if someone knows how to build things that work, how to make them better, and — most importantly — how to recognize when to stop.

Cross-Platform Software Development – Part 1: Yes, Bytes Can Be 9 Bits

Nikolay Chirkov — Fri, 02 May 2025 21:28:26 +0000

When we say cross-platform, we often underestimate just how diverse platforms really are. Did you know the last commercial computer using 9-bit bytes was shut down only 30 years ago? That was the PDP-10—still running when C was dominant, C++ was just emerging (but not yet standardized), Java hadn’t launched (just one year before its release), and Python was still in development (two years before version 1.0).

That kind of diversity hasn’t gone away—it’s just shifted. Today:

There are 35+ active CPU architecture families: x86/64, Arm, MIPS, RISC-V, Xtensa, TriCore, SPARC, PIC, AVR, and many more
Some use unusual instruction widths (e.g., 13-bit for Padauk’s $0.03 MCU)
Not all CPUs support floating-point—or even 8-bit operations

And beyond the hardware:

15+ actively used IDEs
10+ build systems (CMake, Bazel, Make, etc.)
10+ CI/CD tools
Multiple documentation systems (e.g., Doxygen)
Dozens of compliance and certification standards (MISRA C++, aerospace, safety, security, etc.)

Even if your library is just int sum(int a, int b), complexity sneaks in. You have to think about integration, testing, versioning, documentation—and possibly even certification or safety compliance.

Over time, we’ve solved many problems that turned out to be avoidable. Why? Because cross-platform development forces you to explore the strange corners of computing. This article series is our way of sharing those lessons.

Why C++?

We’re focusing on C++ because:

It compiles to native code and runs without a virtual machine (unlike Java)
It’s a descendant of C, with a wealth of low-level, highly optimized libraries
It builds for almost any architecture—except the most constrained devices, where pure C, mini-C (Padauk), or assembly is preferred

That makes it the language of choice for serious cross-platform development—at least on CPUs. We’re skipping GPUs, FPGAs, and low-level peripherals (e.g., GPIO, DMA) for now, as they come with their own portability challenges.

Why Not C?

C is still a valid choice for embedded and systems development—but modern C++ offers major advantages. C++17 is supported by all major toolchains and improves development by providing:

Templates that dramatically reduce boilerplate and source code size
Compile-time programming (metaprogramming), simplifying toolchains and shifting logic from runtime to compile time
A stronger type system
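As a small illustration of shifting logic from runtime to compile time, the code below builds a CRC-8 lookup table with a C++17 `constexpr` function. It uses polynomial 0x07 with init 0 and no reflection, the variant often catalogued as CRC-8/SMBUS. The table is constant-initialized by the compiler, so the build needs no separate table-generation script and no startup initialization code. `MakeCrc8Table` and `Crc8` are illustrative names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Builds the 256-entry table for CRC-8, polynomial 0x07, MSB-first, init 0.
constexpr std::array<std::uint8_t, 256> MakeCrc8Table() {
    std::array<std::uint8_t, 256> table{};
    for (unsigned byte = 0; byte < 256; ++byte) {
        unsigned crc = byte;
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 0x80u) ? ((crc << 1) ^ 0x07u) : (crc << 1);
            crc &= 0xFFu;
        }
        table[byte] = static_cast<std::uint8_t>(crc);
    }
    return table;
}

// Computed at compile time and constant-initialized.
constexpr std::array<std::uint8_t, 256> kCrc8Table = MakeCrc8Table();

// A wrong table fails the build instead of a test run.
static_assert(kCrc8Table[1] == 0x07, "entry for 0x01 must equal the polynomial");

constexpr std::uint8_t Crc8(const std::uint8_t* data, std::size_t size) {
    std::uint8_t crc = 0;
    for (std::size_t i = 0; i < size; ++i) crc = kCrc8Table[crc ^ data[i]];
    return crc;
}
```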

Yes, binary size can increase—but with proper design, it’s manageable. Features like exceptions, RTTI, and STL containers can be selectively disabled or replaced. The productivity and maintainability gains often outweigh the cost, especially when building reusable cross-platform libraries.

How to Think About Requirements

You can’t build a library that runs everywhere—but you can plan wisely:

List all platforms you want to support
Choose the smallest subset of toolchains (IDE, build system, CI) that covers most of them
Stick with standard ecosystems (e.g., Git + GitHub) for sharing and integration

Example: Big-endian support

If your library needs to support communication between systems with different endianness (e.g., a little-endian C++ app and a big-endian Java app), it’s better to handle byte order explicitly from the start.

Adding byte-swapping now might increase complexity by, say, 3%. But retrofitting it later—especially after deployment—could cost, say, 30% more in refactoring, debugging, and testing.

Still, ask: Does this broaden our potential market? Supporting cross-endian interaction makes your library usable in more environments—especially where Java (which uses big-endian formats) is involved. It’s often safer and easier to normalize data on the C++ side than to change byte handling in Java.
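Here is a minimal sketch of handling byte order explicitly. It assumes a big-endian wire format, the order Java's `DataOutputStream` and a default `ByteBuffer` use. Building values with shifts, rather than copying raw memory and conditionally swapping, makes the code independent of the host's endianness. As a result, the same source is correct on both little- and big-endian CPUs. `WriteU32BE` and `ReadU32BE` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// Writes `value` as 4 bytes, most significant byte first (big-endian / network order).
void WriteU32BE(std::uint32_t value, std::uint8_t* out) {
    assert(out != nullptr);
    out[0] = static_cast<std::uint8_t>(value >> 24);
    out[1] = static_cast<std::uint8_t>(value >> 16);
    out[2] = static_cast<std::uint8_t>(value >> 8);
    out[3] = static_cast<std::uint8_t>(value);
}

// Reads 4 big-endian bytes into a host integer, whatever the host's byte order.
std::uint32_t ReadU32BE(const std::uint8_t* in) {
    assert(in != nullptr);
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8) |
           static_cast<std::uint32_t>(in[3]);
}
```

`std::uint8_t` is optional in the standard and cannot exist where `CHAR_BIT` is not 8, as on the 9-bit-byte machines mentioned at the start. Fully portable code would use `unsigned char` and mask each byte with `0xFF`. When a swap-based approach is preferred, C++20's `std::endian` and C++23's `std::byteswap` can help.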

Requirements Are Multidimensional

Even a single feature—like big-endian support—adds complexity to your CI/CD matrix. Cross-platform code must be tested across combinations of:

CPU architectures
Compilers
Toolchains

But that’s just the beginning. A typical project spans many other dimensions:

Build configurations (debug, release, minimal binary size)
Optional modules (e.g., pluggable hash algorithms)
Hardware features (e.g., FPU availability)
Compile-time flags (e.g., log verbosity, filtering, platform constraints)
Business logic flags—often hundreds of #defines

Each dimension multiplies the test matrix. The challenge isn’t just making code portable—it’s keeping it maintainable.

Supporting a new CPU architecture means expanding your CI/CD infrastructure—especially if using GitHub Actions. Many architectures require local runners, which are harder to manage. Pre-submit tests for such configurations can take tens of minutes per run (see our multi-platform CI config).

Compile-time customization increases complexity further. Our config.h in the Aethernet C++ client toggles options like floating-point support, logging verbosity, and platform-specific constraints. Multiply that by every build configuration and platform, and you get an idea of how quickly things grow.
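The actual config.h is not reproduced here. The following hypothetical sketch shows only the general pattern. Every option has a default that the build can override with `-D`. Invalid values fail the build through `static_assert`, and code branches on the options with `if constexpr`. In a non-template function, the discarded branch must still compile, but in practice no code is generated for it. All `MYLIB_*` macros and `mylib::` names are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// ---- hypothetical config.h (names invented for this sketch) ----
// Each option can be overridden from the build, e.g. -DMYLIB_ENABLE_FLOAT=0.
#ifndef MYLIB_ENABLE_FLOAT
#define MYLIB_ENABLE_FLOAT 1  // set to 0 on targets without an FPU
#endif
#ifndef MYLIB_LOG_LEVEL
#define MYLIB_LOG_LEVEL 2     // 0 = off, 1 = errors, 2 = info, 3 = debug
#endif
static_assert(MYLIB_LOG_LEVEL >= 0 && MYLIB_LOG_LEVEL <= 3, "MYLIB_LOG_LEVEL must be 0..3");

namespace mylib {
namespace config {
inline constexpr bool kEnableFloat = MYLIB_ENABLE_FLOAT != 0;
inline constexpr int kLogLevel = MYLIB_LOG_LEVEL;
}  // namespace config

enum class LogLevel { kError = 1, kInfo = 2, kDebug = 3 };

// Messages above the configured level are discarded at compile time.
template <LogLevel level>
void Log(std::string& sink, const char* message) {
    if constexpr (static_cast<int>(level) <= config::kLogLevel) {
        sink += message;
        sink += '\n';
    }
}

// Average of two readings in hundredths; FPU-less builds use integer math.
// Both paths truncate toward zero, so their results match.
inline std::int32_t AverageHundredths(std::int32_t a, std::int32_t b) {
    if constexpr (config::kEnableFloat) {
        return static_cast<std::int32_t>((a + b) / 2.0);
    } else {
        return (a + b) / 2;
    }
}
}  // namespace mylib
```

Each such option doubles the number of distinct builds in principle, which is exactly the multiplication of the test matrix described above.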