Over the years, I’ve interviewed around 100 software developers at Google and roughly the same number across my own companies. One thing has become very clear:
Resumes don’t work.
They’re too noisy. You get flooded with titles, buzzwords, and irrelevant project summaries. So I distilled everything down to one single task. One prompt I can give to anyone — junior or architect — and instantly get a signal.
The task?
Write a library that calculates the sum of a vector of values.
That’s it. No extra requirements. The beauty is that it looks trivial — but the depth reveals itself as the candidate explores edge cases, generalization, scalability, performance, and design.
🪜 Level 1: The Junior Developer
Most junior candidates start like this:
int Sum(int* data, size_t num_elements) {
    int result = 0;
    for (size_t i = 0; i < num_elements; ++i)
        result += data[i];
    return result;
}
It compiles. It runs. But you immediately see:
- No const
- No null check
- Indexing instead of pointer-based iteration
- No header splitting or inline consideration
- No thoughts about reusability or API quality

Already, you’re learning a lot.
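For contrast, here is a minimal sketch of what addressing just those first few points might look like (still deliberately simple, and the names are illustrative):
#include <cstddef>

// Const-correct input, explicit null check, pointer-based iteration.
int Sum(const int* data, std::size_t num_elements) {
    int result = 0;
    if (data == nullptr) return result;
    for (const int* it = data; it != data + num_elements; ++it)
        result += *it;
    return result;
}
It's still nowhere near a "library", which is exactly what the next levels start to address.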
🪜 Level 2: The Mid-Level Developer
The next tier generalizes the code:
template<typename T>
T Sum(const T* data, size_t num_elements);
Then comes overflow protection — separate input/output types:
template<typename O, typename I>
O Sum(const I* data, size_t num_elements) {
    O result{0};
    if (data) {
        for (size_t i = 0; i < num_elements; ++i)
            result += static_cast<O>(data[i]);
    }
    return result;
}
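A quick call site (with hypothetical values) shows why the two-type split matters: the accumulator is wider than the element type, so the intermediate sums don't overflow.
#include <cstdint>
#include <vector>

std::vector<int32_t> values = {1'000'000'000, 1'000'000'000, 1'000'000'000};
// Explicit output type: accumulate into int64_t so the total doesn't overflow int32_t.
int64_t total = Sum<int64_t>(values.data(), values.size());  // 3'000'000'000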
They start thinking in terms of the STL:
template<typename InputIt>
int Sum(InputIt begin, InputIt end);
And even bring in constexpr:
template<typename InputIt>
constexpr int Sum(InputIt begin, InputIt end);
Eventually someone realizes this is already in the standard library (std::accumulate) — and more advanced candidates point out std::reduce, which is reorderable and SIMD/multithread-friendly (and constexpr in C++20).
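For reference, a minimal sketch of both. Note that the type of the init value drives the accumulator type, so 0LL sidesteps the classic int-overflow trap; the execution policy is only a hint to the implementation.
#include <execution>
#include <numeric>
#include <vector>

std::vector<int> v = {1, 2, 3, 4};

// Sequential left-to-right fold; the accumulator type is taken from the init value.
long long a = std::accumulate(v.begin(), v.end(), 0LL);

// Reorderable fold; an execution policy lets the library vectorize or parallelize it.
long long r = std::reduce(std::execution::par_unseq, v.begin(), v.end(), 0LL);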
At this point, we’re talking fluency in STL, value categories, compile-time evaluation, and API design.
🧠 Level 3: The Senior Engineer
Now the conversation shifts.
They start asking:
- What’s the maximum number of elements?
 - Will the data fit in memory?
 - Is it a single-machine process or distributed?
 - Is the data streamed from disk?
 - Is disk the bottleneck?
 
They consider chunked reads, asynchronous prefetching, thread pool handoff, and single-threaded summing when disk I/O dominates.
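As a rough sketch of the chunked-read idea (single-threaded on purpose, under the assumption that disk I/O dominates; the raw int32 file format and buffer size are made up for illustration):
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <vector>

// Streams a file of raw binary int32 values (native endianness assumed) and sums
// them chunk by chunk, so the whole dataset never has to fit in memory at once.
int64_t SumFileChunked(std::ifstream& in, std::size_t chunk_elements = 1 << 20) {
    std::vector<int32_t> buffer(chunk_elements);
    int64_t total = 0;
    while (in) {
        in.read(reinterpret_cast<char*>(buffer.data()),
                buffer.size() * sizeof(int32_t));
        const std::size_t read_count =
            static_cast<std::size_t>(in.gcount()) / sizeof(int32_t);
        for (std::size_t i = 0; i < read_count; ++i)
            total += buffer[i];
    }
    return total;
}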
Then comes UX: can the operation be paused or aborted?
Now we need a serializable processing state:
template<typename T>
class Summarizer {
public:
    template<typename InputIt>
    Summarizer(InputIt begin, InputIt end);
    explicit Summarizer(std::ifstream& input);
    explicit Summarizer(std::vector<Node> distributed_nodes);

    void Start(size_t max_memory_to_use = 0);
    float GetProgress() const;
    State Pause();
    void Resume(const State& state);
};
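Hypothetical usage (State, and how the snapshot is persisted, are assumptions rather than part of any real API):
Summarizer<int64_t> summarizer(values.begin(), values.end());
summarizer.Start(/*max_memory_to_use=*/256 * 1024 * 1024);

// ... the user hits "pause", or the process needs to shut down:
State snapshot = summarizer.Pause();   // persist `snapshot` somewhere durable

// ... on the next run, pick up where we left off:
summarizer.Resume(snapshot);
float progress = summarizer.GetProgress();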
Now they’re designing:
- Persistent resumability
 - State encoding
 - Granular progress tracking
 
They add:
- Asynchronous error callbacks (e.g., if input files are missing)
 - Logging and performance tracing
 - Memory usage accounting
 - Numeric precision improvements (e.g., sorting values or using Kahan summation; see the sketch after this list)
 - Support for partial sort/save for huge datasets
 
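On the numeric-precision point, compensated (Kahan) summation is a small, self-contained win; a minimal sketch:
#include <vector>

// Kahan (compensated) summation: carries a running error term so that
// small values aren't lost when added to a large running total.
double KahanSum(const std::vector<double>& values) {
    double sum = 0.0;
    double compensation = 0.0;  // error carried over from previous additions
    for (double v : values) {
        const double y = v - compensation;
        const double t = sum + y;      // big + small: low-order bits of y are lost...
        compensation = (t - sum) - y;  // ...and recovered here for the next iteration
        sum = t;
    }
    return sum;
}
Worth remembering that aggressive floating-point flags like -ffast-math can legally optimize the compensation away.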
They’ve moved beyond code — this is system architecture.
⚙️ Level 4: The Architect
They start asking questions few others do:
- Is this running on CPU or GPU?
 - Is the data already in GPU memory?
 - Should the GPU be used for batch summing?
 - Should the CPU be used first while shaders compile?
 - Can shaders be precompiled, versioned, and cached?
 
They propose:
- Abstract device interface (CPU/GPU/DSP), sketched after this list
 - Cross-platform development trade-offs
 - Execution policy selection at runtime
 - Binary shader storage, deployed per version
 - On-device code caching and validation
 
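A minimal sketch of what such a device abstraction could look like (the names, the std::span-based signature, and the selection function are assumptions for illustration):
#include <cstdint>
#include <memory>
#include <span>

// Hypothetical device abstraction: each backend sums a buffer it can reach.
class SumDevice {
public:
    virtual ~SumDevice() = default;
    virtual int64_t Sum(std::span<const int32_t> data) = 0;
};

class CpuSumDevice : public SumDevice {
public:
    int64_t Sum(std::span<const int32_t> data) override;  // plain loop or std::reduce
};

class GpuSumDevice : public SumDevice {
public:
    int64_t Sum(std::span<const int32_t> data) override;  // dispatch a (pre)compiled shader, read back
};

// Chosen at runtime based on data location, size, and what the platform offers.
std::unique_ptr<SumDevice> MakeSumDevice(bool data_already_on_gpu);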
And memory gets serious:
- Does the library allocate memory, or use externally-managed buffers?
 - Support for map/unmap, pinned memory, DMA
 
Now we need:
- Detailed profiling: cold vs. warm latencies
 - Per-device throughput models
 - Smart batching
 - First-run performance vs. steady-state
 
Then come platform constraints:
- Compile-time configuration to shrink binary size (see the sketch after this list)
 - Support for heapless environments
 - Support for platform-specific allocators
 - Encryption of in-flight and at-rest data
 - Memory zeroing post-use
 - Compliance with SOC 2 and similar standards
 
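One common way (among many) to get there is plain compile-time configuration; a hypothetical sketch, with made-up SUMLIB_* switches:
#include <cstddef>

// Made-up build-time switches; real projects often generate these from CMake/Bazel options.
#ifndef SUMLIB_ENABLE_GPU
#define SUMLIB_ENABLE_GPU 0      // strip every GPU code path from the binary
#endif
#ifndef SUMLIB_ENABLE_LOGGING
#define SUMLIB_ENABLE_LOGGING 0  // no logging strings in flash-constrained builds
#endif

// Heapless variant: the caller owns the buffer, the library never allocates.
template<typename T, std::size_t N>
T SumInPlace(const T (&buffer)[N]);

// Pluggable allocation for platforms that do have a (custom) heap.
template<typename T, typename Allocator>
T SumWithAllocator(const T* data, std::size_t n, Allocator& alloc);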
💥 Bonus Level: The “Startuper”
There should probably be one more level of seniority: the “startuper” — someone who recently failed because they tried to build the perfect, highly extensible system right away…
…instead of just sticking to the “junior-level” sum function until they had at least one actual customer. 😅
☁️ Real-World Parallel: Æthernet
This progression is exactly what we saw while building the Æthernet client library.
We started with a minimal concept: adapters that wrap transport methods like Ethernet, Wi-Fi, GSM, satellite.
But the design questions came fast:
- What if a client has multiple adapters?
 - What if one fails? Add a backup policy
 - What if latency is critical? Add a redundant policy: duplicate each message across all adapters
 - What if we want backup within groups, and parallel send across groups? Introduce adapter groups (sketched below)
 
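Purely as an illustration of that adapter/policy/group shape (a hypothetical sketch, not the actual Æthernet client API):
#include <cstdint>
#include <memory>
#include <span>
#include <vector>

// Hypothetical types, not the real Æthernet client library.
class Adapter {  // wraps one transport: Ethernet, Wi-Fi, GSM, satellite, ...
public:
    virtual ~Adapter() = default;
    virtual bool Send(std::span<const std::uint8_t> message) = 0;
};

enum class Policy {
    Backup,     // try adapters in order, falling back on failure
    Redundant,  // duplicate every message across all adapters
};

struct AdapterGroup {  // backup inside a group, parallel send across groups
    Policy policy;
    std::vector<std::unique_ptr<Adapter>> adapters;
};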
Then came the “infinite design moment”:
What if a client wants to:
- Send small messages through LTE (cheap)
 - Send large messages through fiber (fast)
 - Route messages differently based on user-defined metadata
 - Swap policies based on live network metrics
 
At some point, you realize: this never ends.
So we stopped.
We open-sourced the client libraries.
We let users define their own policies.
Because the most scalable design is knowing where to stop.
🧠 Final Thought
This one task — sum() — exposes almost everything:
- Technical depth
 - Communication style
 - Architectural insight
 - Prioritization
 - Practical vs. ideal tradeoffs
 
It reveals if someone knows how to build things that work, how to make them better, and — most importantly — how to recognize when to stop.