DEV Community

interview-aid-Etesis Elay


Telstra Melbourne SDE Interview Experience — Full 3-Round Breakdown

I recently completed all three interview rounds for the Software Engineer role at Telstra in Melbourne.

Telstra is Australia’s largest telecom company, and their interview style felt very different from typical big tech interviews. They care much more about reliability, fault tolerance, and real-world scalability than about algorithm tricks.

The entire process was extremely practical and heavily focused on “what happens when systems fail.”


Round 1 — Coding: Network Topology Graph

The coding round was based on implementing a network topology system with functions like:

  • add_node / remove_node
  • shortest path query
  • detecting Single Point of Failure (SPF)

The first two were straightforward graph problems.
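A minimal sketch of those first two parts, assuming an undirected, unweighted topology where "shortest" means fewest hops. The class name `Topology` and the `add_link` helper are my own illustration; the interview only named add_node / remove_node and a shortest path query.

```python
from collections import deque


class Topology:
    """Undirected network topology stored as adjacency sets."""

    def __init__(self):
        self.adj = {}

    def add_node(self, node):
        self.adj.setdefault(node, set())

    def add_link(self, a, b):
        # Hypothetical helper, not named in the interview prompt.
        self.add_node(a)
        self.add_node(b)
        self.adj[a].add(b)
        self.adj[b].add(a)

    def remove_node(self, node):
        # Drop the node and every link that touches it.
        for neighbor in self.adj.pop(node, set()):
            if neighbor in self.adj:
                self.adj[neighbor].discard(node)

    def shortest_path(self, src, dst):
        """Return a fewest-hops path as a list of nodes, or None."""
        if src not in self.adj or dst not in self.adj:
            return None
        parent = {src: None}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                path = []
                while node is not None:
                    path.append(node)
                    node = parent[node]
                return path[::-1]
            for neighbor in self.adj[node]:
                if neighbor not in parent:
                    parent[neighbor] = node
                    queue.append(neighbor)
        return None
```

With weighted links (latency, cost), the BFS would be swapped for Dijkstra's algorithm; the structure of the class stays the same.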

The real core was identifying Single Point of Failure nodes.

Key Concept: Articulation Point (Tarjan’s Algorithm)

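If removing a node (together with its links) splits the graph into more connected components, that node is an articulation point. In a network topology, that is exactly a Single Point of Failure.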

The expected solution was essentially Tarjan’s Algorithm using DFS with:

  • discovery time (disc)
  • lowest discovery time reachable from the node's DFS subtree using at most one back edge (low)
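Roughly what I mean by that, as a sketch rather than my exact interview code. It assumes the topology is an undirected graph given as a dict from node to neighbors; the function name is my own.

```python
def articulation_points(adj):
    """adj: dict mapping node -> iterable of neighbors (undirected graph)."""
    disc, low = {}, {}
    points = set()
    timer = 0

    def dfs(node, parent):
        nonlocal timer
        disc[node] = low[node] = timer
        timer += 1
        children = 0
        for neighbor in adj[node]:
            if neighbor == parent:
                continue
            if neighbor in disc:
                # Back edge: neighbor was discovered earlier.
                low[node] = min(low[node], disc[neighbor])
            else:
                children += 1
                dfs(neighbor, node)
                low[node] = min(low[node], low[neighbor])
                # The subtree under neighbor cannot climb above node,
                # so removing node would cut that subtree off.
                if parent is not None and low[neighbor] >= disc[node]:
                    points.add(node)
        # The DFS root is a special case: it is an articulation point
        # only if it has more than one DFS child.
        if parent is None and children > 1:
            points.add(node)

    for node in adj:
        if node not in disc:
            dfs(node, None)
    return points
```

For a triangle a-b-c with a tail c-d-e hanging off it, this returns c and d: removing either one strands part of the tail.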

The interviewer then pushed further:

“Would this still work for millions of nodes?”

That follow-up was actually more important than the implementation itself.

My discussion points:

  • Time complexity is still O(V+E)
  • Recursive DFS could hit stack limits → iterative DFS is safer
  • Real systems should use incremental recomputation instead of rebuilding the entire graph
  • Large topologies could be partitioned into subgraphs for distributed processing
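To make the stack-limit point concrete, here is a sketch of the same articulation point search with an explicit stack instead of recursion. It is one possible way to write it, under the same assumption of an undirected graph given as a dict of neighbors; the function name is hypothetical.

```python
def articulation_points_iterative(adj):
    """Articulation points without recursion, safe for very deep graphs."""
    disc, low = {}, {}
    points = set()
    timer = 0
    for root in adj:
        if root in disc:
            continue
        disc[root] = low[root] = timer
        timer += 1
        root_children = 0
        # Each frame: (node, parent, iterator over the node's neighbors).
        stack = [(root, None, iter(adj[root]))]
        while stack:
            node, parent, neighbors = stack[-1]
            advanced = False
            for neighbor in neighbors:  # resumes where the iterator stopped
                if neighbor == parent:
                    continue
                if neighbor in disc:
                    low[node] = min(low[node], disc[neighbor])
                else:
                    disc[neighbor] = low[neighbor] = timer
                    timer += 1
                    if node == root:
                        root_children += 1
                    stack.append((neighbor, node, iter(adj[neighbor])))
                    advanced = True
                    break
            if advanced:
                continue
            # All neighbors handled: pop and propagate low to the parent.
            stack.pop()
            if parent is not None:
                low[parent] = min(low[parent], low[node])
                if parent != root and low[node] >= disc[parent]:
                    points.add(parent)
        if root_children > 1:
            points.add(root)
    return points
```

A long chain of nodes is the worst case for recursion depth, and this version handles it because the depth lives in a Python list rather than on the call stack. Incremental recomputation and partitioning are separate problems that this sketch does not attempt.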

This round felt less like LeetCode and more like infrastructure engineering.


Round 2 — System Design: IoT Monitoring Platform

The prompt:

Design an IoT device monitoring system for millions of devices.

At first it sounds like a normal system design problem, but the telecom-specific constraints made it much harder.

The 3 Main Challenges

1. Huge Frequency Differences

Different devices report data at completely different intervals:

  • Smart meter → every 15 minutes
  • Pump sensor → every second

Same system, 900x difference in reporting frequency (15 minutes is 900 seconds).

2. Unstable Networks

In remote areas, disconnections are normal.

The system must distinguish between:

  • actual device failure
  • temporary network instability
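One way to sketch that distinction is to measure silence in units of each device's own reporting interval and to check whether the gateway for that site is reachable. Everything here (the `DeviceState` fields, the state names, the thresholds of 3 and 10 missed intervals) is an assumption for illustration, not something the interviewer specified.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    report_interval_s: float  # expected gap between reports
    last_seen_s: float        # timestamp of the last report received


def classify(device, now_s, gateway_reachable,
             grace_intervals=3, failure_intervals=10):
    """Return 'ok', 'network_suspected', 'degraded', or 'device_suspected'."""
    missed = (now_s - device.last_seen_s) / device.report_interval_s
    if missed < grace_intervals:
        return "ok"
    if not gateway_reachable:
        # The whole site is dark, so suspect the link before the device.
        return "network_suspected"
    if missed < failure_intervals:
        return "degraded"
    return "device_suspected"
```

Scaling by the reporting interval means a pump sensor that is silent for 20 seconds and a smart meter that is silent for 5 hours are treated the same way, which a single fixed timeout cannot do.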

3. Alert Fatigue

Too many alerts → operators ignore them
Too few alerts → real incidents get missed


My Design Approach

Edge Layer

  • local anomaly detection
  • data aggregation/compression
  • offline caching during disconnections
  • threshold-based local alerts

Cloud Layer

  • global anomaly analysis
  • cross-device correlation
  • historical trend analysis
  • centralized alerting

Architecture

Device → Edge Gateway → Kafka → Stream Processor → TSDB
                                         → Alert Engine
                                         → Dashboard

Alert Strategy

  • P1 / P2 / P3 severity levels
  • alert deduplication
  • delayed alerting for temporary disconnects
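A small sketch of the deduplication and delayed-alerting ideas. The class name `AlertGate`, its methods, and the 300 second and 120 second defaults are all hypothetical; a real alert engine would also carry the P1 / P2 / P3 severity through.

```python
class AlertGate:
    """Suppress duplicate alerts and hold disconnect alerts for a grace period."""

    def __init__(self, dedup_window_s=300.0, disconnect_delay_s=120.0):
        self.dedup_window_s = dedup_window_s
        self.disconnect_delay_s = disconnect_delay_s
        self._last_sent = {}     # (device_id, kind) -> time last emitted
        self._disconnected = {}  # device_id -> time the disconnect began

    def raise_alert(self, device_id, kind, now_s):
        """Return True if the alert should reach an operator."""
        key = (device_id, kind)
        last = self._last_sent.get(key)
        if last is not None and now_s - last < self.dedup_window_s:
            return False  # same alert was sent recently
        self._last_sent[key] = now_s
        return True

    def on_disconnect(self, device_id, now_s):
        # Keep the earliest disconnect time if events repeat.
        self._disconnected.setdefault(device_id, now_s)

    def on_reconnect(self, device_id):
        # Device came back within the delay: nothing is ever sent.
        self._disconnected.pop(device_id, None)

    def due_disconnect_alerts(self, now_s):
        """Devices that stayed disconnected past the delay and are not duplicates."""
        due = [d for d, since in self._disconnected.items()
               if now_s - since >= self.disconnect_delay_s]
        return [d for d in due if self.raise_alert(d, "disconnect", now_s)]
```

A device that drops and reconnects inside the delay never produces an alert, which is the behaviour you want for flaky links in remote areas.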

The interviewer then asked:

“What happens if the edge layer itself fails?”

So we discussed:

  • edge HA
  • persistent local storage
  • heartbeat monitoring
  • cloud-side edge health detection

Very operations-heavy discussion overall.


Round 3 — Reliability Engineering (99.999% Uptime)

This was easily the hardest round.

The opening question:

“What’s the difference between four nines and five nines?”

Availability Math

  • 99.99% uptime → about 52.6 minutes downtime/year
  • 99.999% uptime → about 5.3 minutes downtime/year
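The arithmetic behind those numbers is simple enough to derive on the spot. A quick sketch, assuming a 365-day year (the function name is my own):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(availability_percent):
    """Allowed downtime per year for a given availability target."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_minutes_per_year(target):.2f} min/year")
```

Three nines comes out at roughly 525.6 minutes (under 9 hours), four nines at roughly 52.6 minutes, and five nines at roughly 5.3 minutes.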

The key point:

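Going from four nines to five nines is NOT a linear difficulty increase.

Each extra nine cuts the allowed downtime by 10x. At roughly five minutes per year there is little room for manual recovery, so detection and failover have to be automated.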


Main Topics Discussed

Active-Active vs Active-Passive

For true five nines:

  • Active-Active is basically required
  • Active-Passive still introduces failover downtime
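To see why failover downtime matters so much, it helps to count how many failovers fit inside a five-nines budget. This is a back-of-the-envelope sketch; the 60 second and 30 second failover times are made-up inputs, not figures from the interview.

```python
# Five-nines downtime budget in seconds, assuming a 365-day year.
FIVE_NINES_BUDGET_S = (1 - 0.99999) * 365 * 24 * 3600  # about 315 seconds


def failovers_within_budget(failover_seconds, budget_s=FIVE_NINES_BUDGET_S):
    """How many failovers of a given length fit in the yearly budget."""
    return int(budget_s // failover_seconds)


print(failovers_within_budget(60))  # hypothetical 60 s active-passive failover
print(failovers_within_budget(30))  # hypothetical 30 s failover
```

Under those assumptions, a 60 second failover leaves room for only about five incidents a year, with nothing left over for any other kind of outage.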

Fault Isolation

  • Circuit Breakers
  • Bulkheads
  • preventing cascading failures
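As an illustration of the circuit breaker idea, here is a minimal sketch. The class, its parameters, and the default thresholds are my own assumptions; production libraries add things like per-error-type rules and metrics.

```python
import time


class CircuitBreaker:
    """Fail fast after repeated errors, then allow a trial call after a timeout."""

    def __init__(self, failure_threshold=3, reset_timeout_s=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout_s:
                # Open: do not touch the struggling dependency at all.
                raise RuntimeError("circuit open: failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Failing fast is what stops the cascade: callers get an immediate error instead of piling up threads and connections behind a dependency that is already down.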

Deployment Strategies

  • Blue-Green deployments
  • Canary releases

Monitoring

RED Metrics:

  • Rate
  • Errors
  • Duration
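For reference, the three RED numbers can be computed from raw request records along these lines. The input shape (a list of duration and error-flag pairs) and the choice of a nearest-rank 95th percentile for Duration are assumptions on my part.

```python
def red_metrics(requests, window_s):
    """requests: list of (duration_ms, is_error) tuples seen in the window."""
    total = len(requests)
    if total == 0:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "p95_ms": None}
    errors = sum(1 for _, is_error in requests if is_error)
    durations = sorted(duration for duration, _ in requests)
    # Nearest-rank 95th percentile, using integer math to avoid float rounding.
    rank = max(1, (95 * total + 99) // 100)
    return {
        "rate_per_s": total / window_s,    # Rate
        "error_ratio": errors / total,     # Errors
        "p95_ms": durations[rank - 1],     # Duration
    }
```

Duration is usually tracked as a percentile rather than an average, because a few very slow requests disappear inside a mean.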

Then the interviewer asked:

“Is RED alone enough?”

I said no.

You also need:

  • USE Metrics (Utilization / Saturation / Errors)
  • Business Metrics
  • Synthetic Monitoring

Most Important Follow-Up Question

“Can systems realistically achieve five nines?”

This question was testing whether you actually understand reliability engineering in practice.

My answer:

True five nines is extremely difficult.

To realistically approach it, you need:

  • multi-region active-active
  • multi-active databases
  • elimination of all single points of failure
  • fully automated failover

More importantly:

Not every feature actually needs five nines.

Critical operations (payments, telecom core systems, etc.) may require it, while other services can target four nines.

That tradeoff discussion mattered a lot.


Overall Impression

Telstra interviewers think very differently from typical internet-company engineers.

They care less about:

  • fancy architecture diagrams
  • theoretical scalability

And much more about:

  • failure scenarios
  • degraded network conditions
  • operational resilience
  • real-world reliability

This interview process felt very close to actual infrastructure engineering work.


Advice for Future Candidates

  • Always discuss failure scenarios in system design
  • Don’t only explain the happy path
  • Be comfortable talking deeply about tradeoffs
  • Prepare reliability topics beyond surface-level definitions
  • Telecom-scale systems are genuinely massive — millions of devices and unstable networks are normal assumptions

Interview Preparation Experience

Honestly, the reliability round was the one I felt least prepared for.

Questions like:

  • exact four-nines vs five-nines downtime numbers
  • RED vs USE metrics
  • failure isolation tradeoffs

can easily expose weak preparation if you only studied system design superficially.

Before the final round, I used Interview Aid for VO interview assistance.

What helped most was that they followed the interview flow in real time and provided guidance during the deeper reliability follow-up questions. It wasn’t generic AI-generated advice — the mentors actually understood what the interviewer was trying to evaluate and what kind of answers telecom/system reliability interviewers expect.

For interviews with heavy follow-up depth like this, having experienced engineers help you structure your thinking makes a surprisingly big difference.
