You walk into a hospital building. Full bars on your phone. You try to place a call. Nothing. You try again. Failure. A third time. It finally connects.
The network planning dashboard says this site has 99.8% coverage. The field data tells a completely different story: 40% of Random Access Channel (RACH) attempts are failing.
Welcome to the RACH trap. And if you are an RF engineer, you have probably seen this more than once.
What Is RACH and Why Should You Care?
Before your phone does anything on a cellular network (call, data session, handover), it needs to perform a Random Access procedure. Think of it as a handshake. Your device sends a preamble on the RACH to the base station, saying: "I am here, let me in."
If that handshake fails, nothing else works. No call setup. No data bearer. No handover. The user sees full bars but experiences dead air.
The critical insight: coverage (RSRP/RSRQ) and access (RACH success rate) are two separate metrics, and they can diverge sharply. You can have excellent coverage and badly broken access.
The Hospital Case: Debugging Step by Step
Here is the scenario. A 4G site near a major hospital. The OSS dashboard shows:
- RSRP coverage: 99.8% above -110 dBm
- Call setup success rate: reported at 96%
- No alarms, no outages
But user complaints keep coming. Dropped calls. Failed connections. Especially inside the building.
Step 1: Capture at the UE Side
The first thing we did was stop looking at the dashboard and start capturing Layer 3 messages directly from the device. The difference between what the network thinks is happening and what the device actually experiences is where every real diagnosis starts.
We captured:
- RRCConnectionRequest attempts and responses
- RACH preamble transmissions and retransmissions
- Timing Advance values at connection setup
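One way to summarize a capture like this, as a minimal sketch: assume each access attempt has been reduced to a record holding how many preambles the device sent and whether access finally succeeded. The RachAttempt record, its field names, and the sample data are hypothetical, not the export format of any particular diagnostic tool.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RachAttempt:
    """One random access attempt as seen by the UE (hypothetical record layout)."""
    preambles_sent: int   # 1 means the first preamble was answered
    succeeded: bool       # True if the procedure eventually completed


def summarize(attempts):
    """Return per-attempt access statistics for a list of RachAttempt records."""
    if not attempts:
        raise ValueError("no attempts captured")
    total = len(attempts)
    # Attempts that completed on the very first preamble
    first_try = sum(1 for a in attempts if a.succeeded and a.preambles_sent == 1)
    # Attempts that needed three or more retransmissions
    heavy = sum(1 for a in attempts if a.preambles_sent - 1 >= 3)
    return {
        "first_attempt_success_rate": first_try / total,
        "share_with_3plus_retransmissions": heavy / total,
        "mean_retransmissions": mean(a.preambles_sent - 1 for a in attempts),
        "overall_success_rate": sum(a.succeeded for a in attempts) / total,
    }


# Made-up sample, for illustration only
sample = [
    RachAttempt(1, True),
    RachAttempt(1, True),
    RachAttempt(4, True),
    RachAttempt(5, True),
    RachAttempt(2, False),
]
stats = summarize(sample)
```

Keeping the overall success rate next to the retransmission figures makes it easy to see how an attempt that eventually succeeds can still hide a poor user experience.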
Step 2: Identify the Pattern
The data revealed something the counters never showed:
- 40% of RACH attempts needed 3+ retransmissions before succeeding
- Timing Advance values were abnormally high (12-15 instead of the 0-3 expected for an indoor user served by a nearby cell)
- The serving cell was not the nearest cell. The device was accessing an overshooting cell 2.3 km away
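To turn a Timing Advance reading into a rough distance, the sketch below assumes the value is the LTE timing advance index with a granularity of 16 basic time units (Ts), which works out to roughly 78 m of one-way path per step. Diagnostic tools may log timing advance in other units, and the result is a path length (including reflections), so treat it as indicative only. The function name is illustrative.

```python
SPEED_OF_LIGHT_M_S = 299_792_458
TS_SECONDS = 1 / (15_000 * 2048)      # LTE basic time unit Ts
TA_STEP_SECONDS = 16 * TS_SECONDS     # one TA step, about 0.52 microseconds round trip


def ta_to_distance_m(ta_index: int) -> float:
    """Approximate one-way path length for an LTE timing advance index."""
    if ta_index < 0:
        raise ValueError("timing advance index cannot be negative")
    # Timing advance compensates the round trip, so halve it for one-way distance
    return ta_index * TA_STEP_SECONDS * SPEED_OF_LIGHT_M_S / 2


for ta in (0, 3, 5, 12, 15):
    print(f"TA {ta:2d} -> about {ta_to_distance_m(ta):6.0f} m")
```

Under this assumption, even a handful of steps already means several hundred metres of path, which is hard to reconcile with an indoor user next to a local cell.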
Step 3: Correlate with RF Conditions
This is where most debugging stops too early. Good RSRP does not mean good access. We measured:
- RSRP: -85 dBm (excellent on paper)
- SINR: 3 dB (terrible, indicating interference)
- Number of detected cells: 7 (massive pilot pollution)
The device had strong signal from a distant cell but could not reliably access it because of interference from six other cells at similar power levels.
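A simple way to quantify "similar power levels" is to count the neighbors whose RSRP falls within a few dB of the serving cell. The sketch below does that; the 6 dB window, the threshold of three cells, and the neighbor values are assumptions for illustration, not standardized limits or measurements from this site.

```python
def count_polluting_cells(serving_rsrp_dbm, neighbor_rsrp_dbm, window_db=6.0):
    """Count neighbors whose RSRP is within window_db of the serving cell, or stronger."""
    floor_dbm = serving_rsrp_dbm - window_db
    return sum(1 for rsrp in neighbor_rsrp_dbm if rsrp >= floor_dbm)


# Illustrative values only: serving cell at -85 dBm plus six neighbors
neighbors = [-86.0, -87.5, -88.0, -89.0, -90.0, -90.5]
polluters = count_polluting_cells(-85.0, neighbors)

# Assumed rule of thumb, not a standardized threshold
is_polluted = polluters >= 3
```

With no dominant server, every one of those neighbors contributes interference, which is consistent with a strong RSRP sitting next to a poor SINR.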
Step 4: Root Cause
The overshooting cell had:
- Antenna downtilt 2 degrees too shallow (the beam was pointing too high)
- No neighbor relation with the actual closest cell
- RACH power ramping configured with default parameters (not adapted to a high-interference environment)
Three configuration errors. Zero alarms triggered in the OSS.
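To see what the power ramping step actually changes, here is a minimal sketch of the LTE preamble power rule as described in 3GPP TS 36.321 and TS 36.213: the target received power grows by one ramping step per retransmission, and the transmit power is capped at the UE maximum. The initial target, path loss, and the 2 dB comparison step are assumed values for illustration; the article does not state the site's actual defaults.

```python
def prach_tx_power_dbm(attempt, initial_target_dbm, ramping_step_db,
                       path_loss_db, delta_preamble_db=0.0, p_cmax_dbm=23.0):
    """PRACH transmit power for the given preamble attempt (1-based)."""
    if attempt < 1:
        raise ValueError("attempt numbering starts at 1")
    # Target received power rises by one ramping step per retransmission
    target = initial_target_dbm + delta_preamble_db + (attempt - 1) * ramping_step_db
    # The UE compensates path loss but cannot exceed its maximum output power
    return min(p_cmax_dbm, target + path_loss_db)


# Assumed values for illustration: -104 dBm initial target, 110 dB path loss
for step in (2.0, 4.0):
    powers = [prach_tx_power_dbm(n, -104.0, step, 110.0) for n in range(1, 6)]
    print(f"step {step:.0f} dB: {powers}")
```

A larger step lets the device climb above the interference in fewer attempts, at the cost of more uplink interference toward other cells, which is why it is tuned rather than simply maximized.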
Why Dashboards Miss This
OSS platforms aggregate. They show you averages over 15-minute or 1-hour windows, across all users on a cell. A 40% RACH failure rate for users in one specific building gets diluted into a 96% cell-wide success metric.
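The dilution is simple weighted averaging. As a sketch, assume (hypothetically) that users in the building generate 10% of the cell's access attempts and succeed 60% of the time, while everyone else succeeds essentially every time:

```python
def cell_wide_success_rate(building_share, building_success, elsewhere_success):
    """Blend two user groups into the single rate a cell-level counter reports."""
    return building_share * building_success + (1 - building_share) * elsewhere_success


# Hypothetical split: 10% of attempts come from the building
blended = cell_wide_success_rate(0.10, 0.60, 1.00)
print(f"cell-wide success rate: {blended:.0%}")
```

Under these assumed shares the counter lands at 96%, even though four in ten attempts inside the building fail.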
Field debugging with UE-side diagnostic tools captures what actually happens at the device level. Every preamble, every retransmission, every timing advance value. That is where you find the root cause.
The gap between "network says fine" and "user says broken" is almost always an access or signaling problem that only shows up in per-device, per-attempt analysis.
The Fix and Validation
After adjusting the antenna tilt (2 degrees more downtilt), adding the missing neighbor relation, and tuning the RACH power ramping step to 4 dB, we re-ran the field test:
- RACH first-attempt success rate went from 58% to 97%
- Average preamble retransmissions dropped from 3.2 to 0.4
- User complaints from that area: zero in the following two weeks
Total time from diagnosis to fix: 4 hours. Total time the problem existed before field debugging: 7 months.
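Why do 2 degrees of tilt matter so much? A flat-terrain sketch shows how far from the mast the centre of the main beam reaches the ground. The antenna height and tilt values below are hypothetical, not taken from this site, and the model ignores vertical beamwidth, terrain, and clutter.

```python
import math


def boresight_ground_distance_m(antenna_height_m, downtilt_deg, ue_height_m=1.5):
    """Distance at which the main beam axis reaches UE height on flat terrain."""
    if downtilt_deg >= 90:
        raise ValueError("downtilt must be below 90 degrees")
    if downtilt_deg <= 0:
        return math.inf   # a beam at or above the horizon never reaches the ground
    return (antenna_height_m - ue_height_m) / math.tan(math.radians(downtilt_deg))


# Hypothetical site: 30 m antenna height, tilt changed from 4 to 6 degrees
before_m = boresight_ground_distance_m(30.0, 4.0)
after_m = boresight_ground_distance_m(30.0, 6.0)
print(f"4 deg: about {before_m:.0f} m, 6 deg: about {after_m:.0f} m")
```

Because the distance scales with 1/tan(tilt), the effect of each degree is largest at shallow tilts, which is exactly where overshooting cells tend to sit.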
The Takeaway for Engineers
If your coverage KPI says everything is fine but users complain, stop looking at coverage. Start looking at access. Specifically:
- Capture Layer 3 at the device instead of relying on network counters
- Check RACH retransmission rates, not just success rates
- Measure Timing Advance to verify the device is accessing the right cell
- Count detected cells to identify pilot pollution
- Correlate RSRP with SINR. Good signal with poor quality points to an interference problem
This is the methodology we use on every field campaign. The tools matter, but the approach matters more.
I write about RF field debugging, Layer 3 analysis, and the gap between what dashboards show and what actually happens on the air interface. Follow the Signal Hunters newsletter for weekly field cases like this one.
Takwa Sebai, Co-founder & CEO at HiCellTek
One thing I left out of the article: we initially tried tuning RACH parameters (powerRampingStep, preambleTransMax). It only moved the success rate from 58% to 72%. The real fix came from eliminating the overshooting cell.
Quick rule of thumb for field engineers: if Timing Advance is above 5 for an indoor user who should be served by a nearby cell, stop tuning parameters. You have an overshoot problem.
Have you seen similar RACH traps in campus or industrial environments? Curious about your debugging approach.