interview-aid-Etesis Elay

Posted on May 26

Telstra Melbourne SDE Interview Experience — Full 3-Round Breakdown

I recently completed all three interview rounds for the Telstra Melbourne Software Engineer role.

Telstra is Australia’s largest telecom company, and their interview style felt very different from typical big tech interviews. They care much more about reliability, fault tolerance, and real-world scalability than about algorithm tricks.

The entire process was extremely practical and heavily focused on “what happens when systems fail.”

Round 1 — Coding: Network Topology Graph

The coding round was based on implementing a network topology system with functions like:

add_node / remove_node
shortest path query
detecting Single Point of Failure (SPF)

The first two were straightforward graph problems.

The real core was identifying Single Point of Failure nodes.

Key Concept: Articulation Point (Tarjan’s Algorithm)

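If removing a node (together with its links) disconnects the graph, that node is an articulation point, which is exactly what a Single Point of Failure means in a network topology.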

The expected solution was essentially Tarjan’s Algorithm using DFS with:

discovery time (disc)
lowest discovery time reachable from the node's DFS subtree using at most one back edge (low)
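
A minimal recursive sketch of the disc/low idea, not the exact code from the interview. The function name `articulation_points` and the edge-list input format are assumptions made for illustration, and node labels are assumed never to be `None`.

```python
from collections import defaultdict

def articulation_points(edges):
    """Return the set of articulation points of an undirected graph."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)

    disc = {}   # discovery time of each node
    low = {}    # lowest discovery time reachable from the node's subtree
    result = set()
    timer = 0

    def dfs(node, parent):
        nonlocal timer
        disc[node] = low[node] = timer
        timer += 1
        children = 0
        for nxt in graph[node]:
            if nxt == parent:
                continue
            if nxt in disc:
                # Back edge to an already visited node.
                low[node] = min(low[node], disc[nxt])
            else:
                children += 1
                dfs(nxt, node)
                low[node] = min(low[node], low[nxt])
                # Non-root case: the child's subtree cannot climb above node.
                if parent is not None and low[nxt] >= disc[node]:
                    result.add(node)
        # The DFS root is an articulation point only with 2+ DFS children.
        if parent is None and children > 1:
            result.add(node)

    for node in list(graph):
        if node not in disc:
            dfs(node, None)
    return result
```

For a triangle a-b-c with a tail c-d-e, this reports c and d: removing either one cuts part of the tail off from the rest.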

The interviewer then pushed further:

“Would this still work for millions of nodes?”

That follow-up was actually more important than the implementation itself.

My discussion points:

Time complexity is still O(V+E)
Recursive DFS could hit stack limits → iterative DFS is safer
Real systems should use incremental recomputation instead of rebuilding the entire graph
Large topologies could be partitioned into subgraphs for distributed processing
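
To make the stack-limit point concrete, here is a hedged sketch of the same disc/low computation with an explicit stack instead of recursion. The name `articulation_points_iterative` and the edge-list input are assumptions for illustration; incremental recomputation and partitioning are not covered here.

```python
from collections import defaultdict

def articulation_points_iterative(edges):
    """Articulation points via DFS with an explicit stack (no recursion)."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)

    disc, low = {}, {}
    result = set()
    timer = 0

    for root in list(graph):
        if root in disc:
            continue
        disc[root] = low[root] = timer
        timer += 1
        root_children = 0
        # Each frame holds (node, parent, iterator over its neighbours).
        stack = [(root, None, iter(graph[root]))]
        while stack:
            node, parent, neighbours = stack[-1]
            descended = False
            for nxt in neighbours:  # resumes where this frame left off
                if nxt == parent:
                    continue
                if nxt in disc:
                    low[node] = min(low[node], disc[nxt])
                else:
                    disc[nxt] = low[nxt] = timer
                    timer += 1
                    if node == root:
                        root_children += 1
                    stack.append((nxt, node, iter(graph[nxt])))
                    descended = True
                    break
            if descended:
                continue
            # All neighbours handled: emulate returning to the parent frame.
            stack.pop()
            if parent is not None:
                low[parent] = min(low[parent], low[node])
                if parent != root and low[node] >= disc[parent]:
                    result.add(parent)
        if root_children > 1:
            result.add(root)
    return result
```

On a simple chain of tens of thousands of nodes, a recursive DFS in Python would exceed the default recursion limit, while this version only grows a list.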

This round felt less like LeetCode and more like infrastructure engineering.

Round 2 — System Design: IoT Monitoring Platform

The prompt:

Design an IoT device monitoring system for millions of devices.

At first it sounds like a normal system design problem, but the telecom-specific constraints made it much harder.

The 3 Main Challenges

1. Huge Frequency Differences

Different devices report data at completely different intervals:

Smart meter → every 15 minutes
Pump sensor → every second

The same system has to handle a 900x difference in reporting frequency (one report every 900 seconds vs. one every second).

2. Unstable Networks

In remote areas, disconnections are normal.

The system must distinguish between:

actual device failure
temporary network instability

3. Alert Fatigue

Too many alerts → operators ignore them
Too few alerts → real incidents get missed

My Design Approach

Edge Layer

local anomaly detection
data aggregation/compression
offline caching during disconnections
threshold-based local alerts
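
As an illustration of the offline caching item, here is a small store-and-forward sketch. The class `EdgeBuffer`, its parameters, and the `send` callback are hypothetical names, and a real gateway would persist the buffer to disk rather than keep it in memory.

```python
from collections import deque

class EdgeBuffer:
    """Bounded in-memory store-and-forward buffer (illustrative only)."""

    def __init__(self, max_readings=10_000):
        # A deque with maxlen silently drops the oldest reading when full.
        self._pending = deque(maxlen=max_readings)

    def record(self, reading):
        self._pending.append(reading)

    def flush(self, send):
        """Deliver buffered readings in order, stopping at the first failure.

        `send` is a caller-supplied callable that returns True on success.
        Returns the number of readings delivered.
        """
        delivered = 0
        while self._pending:
            if not send(self._pending[0]):
                break  # uplink still down, keep the rest for later
            self._pending.popleft()
            delivered += 1
        return delivered

    def __len__(self):
        return len(self._pending)
```

Dropping the oldest readings when the buffer fills is only one possible policy; downsampling or aggregating old data would be another.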

Cloud Layer

global anomaly analysis
cross-device correlation
historical trend analysis
centralized alerting

Architecture

Device → Edge Gateway → Kafka → Stream Processor → TSDB
                                                 → Alert Engine
                                                 → Dashboard

Alert Strategy

P1 / P2 / P3 severity levels
alert deduplication
delayed alerting for temporary disconnects
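
A rough sketch of how delayed alerting and deduplication could fit together. The class `AlertEngine`, the grace and dedup windows, and the fixed "P2" severity are all assumptions for illustration, not details from the interview.

```python
class AlertEngine:
    """Toy policy: grace period for disconnects plus alert deduplication."""

    def __init__(self, grace_seconds=300, dedup_seconds=3600):
        self.grace_seconds = grace_seconds
        self.dedup_seconds = dedup_seconds
        self._offline_since = {}  # device_id -> time first seen offline
        self._last_alert = {}     # device_id -> time the last alert fired

    def observe(self, device_id, online, now):
        """Return an alert dict if one should fire at time `now`, else None."""
        if online:
            # Device is back: forget the outage so short blips never alert.
            self._offline_since.pop(device_id, None)
            return None
        started = self._offline_since.setdefault(device_id, now)
        if now - started < self.grace_seconds:
            return None  # still inside the grace period
        last = self._last_alert.get(device_id)
        if last is not None and now - last < self.dedup_seconds:
            return None  # duplicate of a recent alert
        self._last_alert[device_id] = now
        # Severity is hard-coded here purely as a placeholder.
        return {"device": device_id, "severity": "P2",
                "offline_for": now - started}
```

A short disconnect that recovers inside the grace window produces no alert at all, which is one way to address the alert fatigue problem described earlier.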

The interviewer then asked:

“What happens if the edge layer itself fails?”

So we discussed:

edge HA
persistent local storage
heartbeat monitoring
cloud-side edge health detection
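
For the heartbeat and cloud-side health detection items, a minimal sketch might look like the following. `HeartbeatMonitor`, the 30-second interval, and the "three missed beats" rule are hypothetical choices, not values discussed in the round.

```python
class HeartbeatMonitor:
    """Cloud-side view of which edge gateways have gone quiet."""

    def __init__(self, interval_seconds=30, missed_before_unhealthy=3):
        # A gateway is unhealthy after this many seconds of silence.
        self.timeout = interval_seconds * missed_before_unhealthy
        self._last_seen = {}  # gateway_id -> time of last heartbeat

    def beat(self, gateway_id, now):
        self._last_seen[gateway_id] = now

    def unhealthy(self, now):
        """Return the sorted ids of gateways silent for longer than the timeout."""
        return sorted(
            gateway_id
            for gateway_id, seen in self._last_seen.items()
            if now - seen > self.timeout
        )
```

Requiring several missed beats before flagging a gateway trades detection speed for fewer false alarms on flaky links.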

Very operations-heavy discussion overall.

Round 3 — Reliability Engineering (99.999% Uptime)

This was easily the hardest round.

The opening question:

“What’s the difference between four nines and five nines?”

Availability Math

99.99% uptime → ~52.6 minutes downtime/year
99.999% uptime → ~5.3 minutes downtime/year
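
The arithmetic behind these numbers is easy to reproduce. This small sketch assumes a 365-day year; the helper name `downtime_minutes_per_year` is made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years

def downtime_minutes_per_year(availability_percent):
    """Allowed downtime per year for a given availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)

for target in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% -> {minutes:.2f} min/year")
```

Each extra nine divides the yearly downtime budget by ten, so five nines leaves a little over five minutes for everything that goes wrong in a year.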

The key point:

Going from four nines to five nines is NOT a linear difficulty increase.

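The downtime budget shrinks by 10x, from roughly 52 minutes to roughly 5 minutes a year, so the engineering effort grows far faster than the extra nine suggests.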

Main Topics Discussed

Active-Active vs Active-Passive

For true five nines:

Active-Active is basically required
Active-Passive still introduces failover downtime

Fault Isolation

Circuit Breakers
Bulkheads
preventing cascading failures
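
A stripped-down circuit breaker, to show the fail-fast idea behind preventing cascading failures. The class names, thresholds, and the explicit `now` argument (used instead of a real clock so the behaviour is easy to follow) are assumptions for this sketch.

```python
class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""

class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_seconds=30.0):
        self.failure_threshold = failure_threshold
        self.reset_seconds = reset_seconds
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, now):
        if self._opened_at is not None:
            if now - self._opened_at < self.reset_seconds:
                # Fail fast instead of piling load onto a sick dependency.
                raise CircuitOpenError("dependency presumed down")
            # Reset window elapsed: half-open, let one trial call through.
        try:
            result = func()
        except Exception:
            self._failures += 1
            half_open = self._opened_at is not None
            if half_open or self._failures >= self.failure_threshold:
                self._opened_at = now  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Bulkheads complement this by capping how many resources (threads, connections) any one dependency can consume, which this sketch does not attempt to show.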

Deployment Strategies

Blue-Green deployments
Canary releases

Monitoring

RED Metrics:

Rate
Errors
Duration
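
As a rough illustration, the three RED numbers can be derived from a window of request records. The function `red_metrics`, the `(duration_ms, succeeded)` tuple format, and the choice of a nearest-rank 95th percentile for Duration are assumptions made for this sketch.

```python
import math

def red_metrics(requests, window_seconds):
    """Summarise a window of requests as Rate, Errors, Duration.

    `requests` is a list of (duration_ms, succeeded) tuples.
    """
    total = len(requests)
    if total == 0:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "p95_ms": None}
    failed = sum(1 for _, succeeded in requests if not succeeded)
    durations = sorted(duration for duration, _ in requests)
    # Nearest-rank 95th percentile of request duration.
    p95_index = math.ceil(0.95 * total) - 1
    return {
        "rate_per_s": total / window_seconds,
        "error_ratio": failed / total,
        "p95_ms": durations[p95_index],
    }
```

Reporting a high percentile rather than the mean keeps slow outliers visible, which matters when the concern is tail latency.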

Then the interviewer asked:

“Is RED alone enough?”

I said no.

You also need:

USE Metrics (Utilization / Saturation / Errors)
Business Metrics
Synthetic Monitoring

Most Important Follow-Up Question

“Can systems realistically achieve five nines?”

This question was testing whether you actually understand reliability engineering in practice.

My answer:

True five nines is extremely difficult.

To realistically approach it, you need:

multi-region active-active
multi-active databases
elimination of all single points of failure
fully automated failover

More importantly:

Not every feature actually needs five nines.

Critical operations (payments, telecom core systems, etc.) may require it, while other services can target four nines.

That tradeoff discussion mattered a lot.

Overall Impression

Telstra interviewers think very differently from typical internet-company engineers.

They care less about:

fancy architecture diagrams
theoretical scalability

And much more about:

failure scenarios
degraded network conditions
operational resilience
real-world reliability

This interview process felt very close to actual infrastructure engineering work.

Advice for Future Candidates

Always discuss failure scenarios in system design
Don’t only explain the happy path
Be comfortable talking deeply about tradeoffs
Prepare reliability topics beyond surface-level definitions
Telecom-scale systems are genuinely massive — millions of devices and unstable networks are normal assumptions

Interview Preparation Experience

Honestly, the reliability round was the one I felt least prepared for.

Questions like:

exact four-nines vs five-nines downtime numbers
RED vs USE metrics
failure isolation tradeoffs

can easily expose weak preparation if you only studied system design superficially.

Before the final round, I used Interview Aid for VO (virtual onsite) interview assistance.

What helped most was that they followed the interview flow in real time and provided guidance during the deeper reliability follow-up questions. It wasn’t generic AI-generated advice — the mentors actually understood what the interviewer was trying to evaluate and what kind of answers telecom/system reliability interviewers expect.

For interviews with heavy follow-up depth like this, having experienced engineers help you structure your thinking makes a surprisingly big difference.

DEV Community