Imagine working on several programs that together must route messages across 5 devices in less than 50ms!
That's what I've been working on at my workplace for the last couple of months.
A bit of context: we work on backend data transfer technology that routes data across hundreds of kilometres over optical fibre.
Inevitably, those links fail, and we must route that data through an alternative path. Because we are dealing with terabytes of data every second, this must happen quickly: within milliseconds.
This is how the flow works (a rough sketch in code follows the list):
1) The framing device raises an alarm when it stops receiving data.
2) The switching device processes this alarm and sends relevant information about the connection to the switching device of the alternate path.
3) The alternate switching device starts processing data.
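To make step 2 concrete, here is a minimal sketch of what the failover notification between switching devices might look like. The struct fields, the send_to_switch() function, and the device address are all assumptions for illustration, not our actual wire format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper that hands a payload to the Ethernet driver. */
extern void send_to_switch(int dest, const void *buf, size_t len);
#define ALTERNATE_SWITCH_ADDR 2   /* hypothetical device address */

/* Hypothetical failover notification carried as an Ethernet payload
 * from the primary switching device to the alternate one. */
typedef struct {
    uint32_t connection_id;   /* which circuit lost its signal        */
    uint8_t  alarm_type;      /* e.g. loss-of-signal raised by framer */
    uint64_t timestamp_ns;    /* when the framing device raised it    */
} failover_msg_t;

/* Step 2: on receiving the framer's alarm, notify the switching
 * device on the alternate path so it can take over (step 3). */
void handle_framer_alarm(uint32_t connection_id, uint64_t ts_ns)
{
    failover_msg_t msg = {
        .connection_id = connection_id,
        .alarm_type    = 1,   /* loss of signal */
        .timestamp_ns  = ts_ns,
    };
    send_to_switch(ALTERNATE_SWITCH_ADDR, &msg, sizeof msg);
}
```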
All of these steps have to be completed within 50ms. Needless to say, our solution worked, but not within the intended timeframe, and that's when we got down to debugging.
Since our switching devices communicate using Ethernet packets, we first checked whether the number of packets transmitted made any difference. Surprisingly, it did: we had expected the message to be broadcast across the system rather than sent to a single specific device, but that wasn't the case.
Debugging the system
In my opinion, the best way to debug any system (especially an embedded one) is to break it into the smallest possible units and thoroughly examine each one.
We started by adding timers around the operations essential to the switching mechanism on all the devices. Since we were working in C, we used the standard library's timing facilities.
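The post doesn't pin down the exact timer, so here is a minimal sketch of the idea using POSIX clock_gettime() from time.h: bracket the critical section and report the elapsed time in milliseconds. The process_alarm() stub stands in for whichever operation is under test.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

/* Placeholder for the operation under test. */
static void process_alarm(void) { /* ... */ }

int main(void)
{
    struct timespec start, end;

    /* CLOCK_MONOTONIC is unaffected by wall-clock adjustments. */
    clock_gettime(CLOCK_MONOTONIC, &start);
    process_alarm();
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_ms = (end.tv_sec - start.tv_sec) * 1e3
                      + (end.tv_nsec - start.tv_nsec) / 1e6;
    printf("process_alarm took %.3f ms\n", elapsed_ms);
    return 0;
}
```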
This method highlighted two issues: the processor was juggling too many operations at once, and there was an avoidable delay between the switch receiving the data and processing it.
How did we resolve them? Well, the first was fairly straightforward. Since the switching mechanism ran on its own threads, our first approach was to take the process out of the general scheduler and dedicate a single core to it. However, our processor had only two cores, which meant all the other processes hogged the remaining core and less urgent but still important tasks weren't completed on time.
The next best solution was to increase the thread's priority, which worked reasonably well.
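For reference, dedicating a core on Linux looks roughly like this (pthread_setaffinity_np is a GNU extension). This is a sketch of the approach we tried and abandoned, not our production code.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin the calling thread to the given core so the switching logic
 * is never migrated by the general scheduler. Returns 0 on success. */
static int pin_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof set, &set);
}
```

On a two-core part this leaves only one core for everything else, which is exactly the contention we ran into.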
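Raising the thread's priority can be done with the POSIX scheduling API. A minimal sketch, assuming a Linux target where SCHED_FIFO is available and the process has the required privileges:

```c
#include <pthread.h>
#include <sched.h>

/* Give the switching thread real-time FIFO priority so it preempts
 * ordinary time-shared work on either core. Returns 0 on success. */
static int boost_thread_priority(pthread_t tid)
{
    struct sched_param sp;
    sp.sched_priority = sched_get_priority_max(SCHED_FIFO);
    return pthread_setschedparam(tid, SCHED_FIFO, &sp);
}
```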
The second issue was more challenging because we had to dive into the switch manufacturer's code and figure out how they sent Ethernet packets.
After a week of going through the code, we found that messages received by the switch were stored in a queue (an elegant solution; the realization should have struck earlier, IMO).
Now, there are two ways to retrieve messages from a queue: either the processor polls it, or the queue pushes to the processor. By default, ours was set to polling, which explained the delay between receiving and processing messages. We switched to an interrupt/push-based mechanism and voila! It worked.
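The difference between the two retrieval modes, in sketch form. The queue API names here (rx_queue_try_pop, register_rx_callback) are hypothetical stand-ins for the vendor's driver interface.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical vendor driver interface (names assumed). */
extern bool rx_queue_try_pop(void *buf, size_t len);                /* polling  */
extern void register_rx_callback(void (*cb)(const void *, size_t)); /* push/IRQ */
extern void handle_message(const void *msg);

/* Polling: the processor repeatedly asks "anything yet?". A message
 * can sit in the queue until the next iteration comes around. */
void poll_loop(void)
{
    unsigned char buf[1500];
    for (;;) {
        if (rx_queue_try_pop(buf, sizeof buf))
            handle_message(buf);
    }
}

/* Push: the queue interrupts us the moment a message lands,
 * removing the polling-interval delay entirely. */
static void on_rx(const void *msg, size_t len)
{
    (void)len;
    handle_message(msg);
}

void use_interrupts(void)
{
    register_rx_callback(on_rx);
}
```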
Although the above optimizations did help, what really made execution faster was switching to highly optimized algorithms and data structures like hash maps. Their effect was remarkable, with times dropping as low as 8ms!
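The post doesn't detail which structures were swapped in beyond hash maps, so here is a minimal sketch of the kind of change that helps: replacing a linear scan over connection records with an O(1) average-case hash lookup. The table size, key choice, and fields are purely illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 1024              /* power of two, illustrative */

typedef struct {
    uint32_t connection_id;          /* 0 marks an empty slot, so id 0 is reserved */
    uint8_t  alternate_port;         /* where to reroute this circuit */
} conn_entry_t;

static conn_entry_t table[TABLE_SIZE];   /* zero-initialized: all slots empty */

/* Knuth multiplicative hash, masked to the table size. */
static uint32_t slot_of(uint32_t id)
{
    return (id * 2654435761u) & (TABLE_SIZE - 1);
}

/* O(1) average lookup instead of scanning every connection.
 * Assumes the load factor stays well below 1. */
static conn_entry_t *conn_lookup(uint32_t id)
{
    uint32_t i = slot_of(id);
    while (table[i].connection_id != 0) {
        if (table[i].connection_id == id)
            return &table[i];
        i = (i + 1) & (TABLE_SIZE - 1);   /* linear probing */
    }
    return NULL;                          /* not found */
}

static void conn_insert(uint32_t id, uint8_t port)
{
    uint32_t i = slot_of(id);
    while (table[i].connection_id != 0 && table[i].connection_id != id)
        i = (i + 1) & (TABLE_SIZE - 1);
    table[i].connection_id  = id;
    table[i].alternate_port = port;
}
```

On the hot path that decides where to reroute a failed circuit, turning every lookup from O(n) into O(1) is exactly the kind of win that moves total latency from "over budget" toward single-digit milliseconds.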