Over the years, I’ve interviewed around 100 software developers at Google and roughly the same number across my own companies. One thing has become very clear:
Resumes don’t work.
They’re too noisy. You get flooded with titles, buzzwords, and irrelevant project summaries. So I distilled everything down to one single task. One prompt I can give to anyone — junior or architect — and instantly get a signal.
The task?
Write a library that calculates the sum of a vector of values.
That’s it. No extra requirements. The beauty is that it looks trivial — but the depth reveals itself as the candidate explores edge cases, generalization, scalability, performance, and design.
🪜 Level 1: The Junior Developer
Most junior candidates start like this:
int Sum(int* data, size_t num_elements) {
    int result = 0;
    for (size_t i = 0; i < num_elements; ++i)
        result += data[i];
    return result;
}
It compiles. It runs. But you immediately see:
- No const
- No null check
- Indexing instead of pointer-based iteration
- No header splitting or inline consideration
- No thoughts about reusability or API quality

Already, you’re learning a lot.
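For contrast, here is a minimal sketch of what addressing just those first few points might look like (still deliberately simple, and the names are illustrative):
#include <cstddef>

// Const-correct input, explicit null check, pointer-based iteration.
int Sum(const int* data, std::size_t num_elements) {
    int result = 0;
    if (data == nullptr) return result;
    for (const int* it = data; it != data + num_elements; ++it)
        result += *it;
    return result;
}
It's still nowhere near a "library", which is exactly what the next levels start to address.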
🪜 Level 2: The Mid-Level Developer
The next tier generalizes the code:
template<typename T>
T Sum(const T* data, size_t num_elements);
Then comes overflow protection — separate input/output types:
template<typename O, typename I>
O Sum(const I* data, size_t num_elements) {
    O result{0};
    if (data) {
        for (size_t i = 0; i < num_elements; ++i)
            result += static_cast<O>(data[i]);
    }
    return result;
}
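A quick call site (with hypothetical values) shows why the two-type split matters: the accumulator is wider than the element type, so the intermediate sums don't overflow.
#include <cstdint>
#include <vector>

std::vector<int32_t> values = {1'000'000'000, 1'000'000'000, 1'000'000'000};
// Explicit output type: accumulate into int64_t so the total doesn't overflow int32_t.
int64_t total = Sum<int64_t>(values.data(), values.size());  // 3'000'000'000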
They start thinking in terms of the STL:
template<typename InputIt>
int Sum(InputIt begin, InputIt end);
And even bring in constexpr:
template<typename InputIt>
constexpr int Sum(InputIt begin, InputIt end);
Eventually someone realizes this is already in the standard library (std::accumulate) — and more advanced candidates point out std::reduce, which is reorderable and SIMD/multithread-friendly (and constexpr in C++20).
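For reference, a minimal sketch of both. Note that the type of the init value drives the accumulator type, so 0LL sidesteps the classic int-overflow trap; the execution policy is only a hint to the implementation.
#include <execution>
#include <numeric>
#include <vector>

std::vector<int> v = {1, 2, 3, 4};

// Sequential left-to-right fold; the accumulator type is taken from the init value.
long long a = std::accumulate(v.begin(), v.end(), 0LL);

// Reorderable fold; an execution policy lets the library vectorize or parallelize it.
long long r = std::reduce(std::execution::par_unseq, v.begin(), v.end(), 0LL);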
At this point, we’re talking fluency in STL, value categories, compile-time evaluation, and API design.
🧠 Level 3: The Senior Engineer
Now the conversation shifts.
They start asking:
- What’s the maximum number of elements?
 - Will the data fit in memory?
 - Is it a single-machine process or distributed?
 - Is the data streamed from disk?
 - Is disk the bottleneck?
 
They consider chunked reads, asynchronous prefetching, thread pool handoff, and single-threaded summing when disk I/O dominates.
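As a rough sketch of the chunked-read idea (single-threaded on purpose, under the assumption that disk I/O dominates; the raw int32 file format and buffer size are made up for illustration):
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <vector>

// Streams a file of raw binary int32 values (native endianness assumed) and sums
// them chunk by chunk, so the whole dataset never has to fit in memory at once.
int64_t SumFileChunked(std::ifstream& in, std::size_t chunk_elements = 1 << 20) {
    std::vector<int32_t> buffer(chunk_elements);
    int64_t total = 0;
    while (in) {
        in.read(reinterpret_cast<char*>(buffer.data()),
                buffer.size() * sizeof(int32_t));
        const std::size_t read_count =
            static_cast<std::size_t>(in.gcount()) / sizeof(int32_t);
        for (std::size_t i = 0; i < read_count; ++i)
            total += buffer[i];
    }
    return total;
}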
Then comes UX: can the operation be paused or aborted?
Now we need a serializable processing state:
template<typename T>
class Summarizer {
public:
    template<typename InputIt>
    Summarizer(InputIt begin, InputIt end);
    explicit Summarizer(std::ifstream& input);
    explicit Summarizer(std::vector<Node> distributed_nodes);

    void Start(size_t max_memory_to_use = 0);
    float GetProgress() const;
    State Pause();
    void Resume(const State& state);
};
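Hypothetical usage (State, and how the snapshot is persisted, are assumptions rather than part of any real API):
Summarizer<int64_t> summarizer(values.begin(), values.end());
summarizer.Start(/*max_memory_to_use=*/256 * 1024 * 1024);

// ... the user hits "pause", or the process needs to shut down:
State snapshot = summarizer.Pause();   // persist `snapshot` somewhere durable

// ... on the next run, pick up where we left off:
summarizer.Resume(snapshot);
float progress = summarizer.GetProgress();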
Now they’re designing:
- Persistent resumability
 - State encoding
 - Granular progress tracking
 
They add:
- Asynchronous error callbacks (e.g., if input files are missing)
 - Logging and performance tracing
 - Memory usage accounting
 - Numeric precision improvements (e.g., sorting values or using Kahan summation; see the sketch after this list)
 - Support for partial sort/save for huge datasets
 
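On the numeric-precision point, compensated (Kahan) summation is a small, self-contained win; a minimal sketch:
#include <vector>

// Kahan (compensated) summation: carries a running error term so that
// small values aren't lost when added to a large running total.
double KahanSum(const std::vector<double>& values) {
    double sum = 0.0;
    double compensation = 0.0;  // error carried over from previous additions
    for (double v : values) {
        const double y = v - compensation;
        const double t = sum + y;      // big + small: low-order bits of y are lost...
        compensation = (t - sum) - y;  // ...and recovered here for the next iteration
        sum = t;
    }
    return sum;
}
Worth remembering that aggressive floating-point flags like -ffast-math can legally optimize the compensation away.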
They’ve moved beyond code — this is system architecture.
⚙️ Level 4: The Architect
They start asking questions few others do:
- Is this running on CPU or GPU?
 - Is the data already in GPU memory?
 - Should the GPU be used for batch summing?
 - Should the CPU be used first while shaders compile?
 - Can shaders be precompiled, versioned, and cached?
 
They propose:
- Abstract device interface (CPU/GPU/DSP), sketched after this list
 - Cross-platform development trade-offs
 - Execution policy selection at runtime
 - Binary shader storage, deployed per version
 - On-device code caching and validation
 
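A minimal sketch of what such a device abstraction could look like (the names, the std::span-based signature, and the selection function are assumptions for illustration):
#include <cstdint>
#include <memory>
#include <span>

// Hypothetical device abstraction: each backend sums a buffer it can reach.
class SumDevice {
public:
    virtual ~SumDevice() = default;
    virtual int64_t Sum(std::span<const int32_t> data) = 0;
};

class CpuSumDevice : public SumDevice {
public:
    int64_t Sum(std::span<const int32_t> data) override;  // plain loop or std::reduce
};

class GpuSumDevice : public SumDevice {
public:
    int64_t Sum(std::span<const int32_t> data) override;  // dispatch a (pre)compiled shader, read back
};

// Chosen at runtime based on data location, size, and what the platform offers.
std::unique_ptr<SumDevice> MakeSumDevice(bool data_already_on_gpu);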
And memory gets serious:
- Does the library allocate memory, or use externally-managed buffers?
 - Support for map/unmap, pinned memory, DMA
 
Now we need:
- Detailed profiling: cold vs. warm latencies
 - Per-device throughput models
 - Smart batching
 - First-run performance vs. steady-state
 
Then come platform constraints:
- Compile-time configuration to shrink binary size (see the sketch after this list)
 - Support for heapless environments
 - Support for platform-specific allocators
 - Encryption of in-flight and at-rest data
 - Memory zeroing post-use
 - Compliance with SOC 2 and similar standards
 
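One common way (among many) to get there is plain compile-time configuration; a hypothetical sketch, with made-up SUMLIB_* switches:
#include <cstddef>

// Made-up build-time switches; real projects often generate these from CMake/Bazel options.
#ifndef SUMLIB_ENABLE_GPU
#define SUMLIB_ENABLE_GPU 0      // strip every GPU code path from the binary
#endif
#ifndef SUMLIB_ENABLE_LOGGING
#define SUMLIB_ENABLE_LOGGING 0  // no logging strings in flash-constrained builds
#endif

// Heapless variant: the caller owns the buffer, the library never allocates.
template<typename T, std::size_t N>
T SumInPlace(const T (&buffer)[N]);

// Pluggable allocation for platforms that do have a (custom) heap.
template<typename T, typename Allocator>
T SumWithAllocator(const T* data, std::size_t n, Allocator& alloc);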
💥 Bonus Level: The “Startuper”
There should probably be one more level of seniority: the “startuper” — someone who recently failed because they tried to build the perfect, highly extensible system right away…
…instead of just sticking to the “junior-level” sum function until they had at least one actual customer. 😅
☁️ Real-World Parallel: Æthernet
This progression is exactly what we saw while building the Æthernet client library.
We started with a minimal concept: adapters that wrap transport methods like Ethernet, Wi-Fi, GSM, satellite.
But the design questions came fast:
- What if a client has multiple adapters?
 - What if one fails? Add a backup policy
 - What if latency is critical? Add a redundant policy: duplicate each message across all adapters
 - What if we want backup within groups, and parallel send across groups? Introduce adapter groups (sketched below)
 
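Purely as an illustration of that adapter/policy/group shape (a hypothetical sketch, not the actual Æthernet client API):
#include <cstdint>
#include <memory>
#include <span>
#include <vector>

// Hypothetical types, not the real Æthernet client library.
class Adapter {  // wraps one transport: Ethernet, Wi-Fi, GSM, satellite, ...
public:
    virtual ~Adapter() = default;
    virtual bool Send(std::span<const std::uint8_t> message) = 0;
};

enum class Policy {
    Backup,     // try adapters in order, falling back on failure
    Redundant,  // duplicate every message across all adapters
};

struct AdapterGroup {  // backup inside a group, parallel send across groups
    Policy policy;
    std::vector<std::unique_ptr<Adapter>> adapters;
};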
Then came the “infinite design moment”:
What if a client wants to:
- Send small messages through LTE (cheap)
 - Send large messages through fiber (fast)
 - Route messages differently based on user-defined metadata
 - Swap policies based on live network metrics
 
At some point, you realize: this never ends.
So we stopped.
We open-sourced the client libraries.
We let users define their own policies.
Because the most scalable design is knowing where to stop.
🧠 Final Thought
This one task — sum() — exposes almost everything:
- Technical depth
 - Communication style
 - Architectural insight
 - Prioritization
 - Practical vs. ideal tradeoffs
 
It reveals if someone knows how to build things that work, how to make them better, and — most importantly — how to recognize when to stop.