First, if you are interested in doing real time process control on Linux, go watch this awesome presentation. This is way more important than reading my post! Do it now!
Where was I when all this happened?
From the casual way Sandra speaks of SCHED_FIFO, I feel like this is something just about everyone in the Linux world has known and used for the last 10 years except for me! Sadly, I am just hearing about it now, but happily I am hearing about it! Here is a quick report on what I have learned.
First, what do I want to do?
The goal of this entire exercise is to run my linux-based AuraUAS autopilot controller application at a solid 100 Hz. This gives me a time budget of 10 milliseconds (ms) per frame. Note: AuraUAS is not related to the ardupilot or px4 flight stacks; it has a completely independent development history and code base.
Baseline (Naive) Performance Results
I start by running my autopilot code on a beaglebone (single core Arm Cortex @ 1000 MHz) with a relatively slow micro SD card. The slow SD card has a significant impact on the portion of my app that writes the log file. Note: the AuraUAS autopilot code is single-threaded for logical simplicity, easier code validation, avoidance of tricky thread-related bugs, etc.
By default linux uses something called CFS (the Completely Fair Scheduler), which shows up as the SCHED_OTHER policy in tools like chrt. This scheduler tries to maximize cpu utilization and interactive response times, and to give every process an equal opportunity to run. It is a good default scheduler, but obviously a bit more fair than I would like.
Launching my app with default priorities and scheduling options, I get an average frame time of about 7 milliseconds (ms). I get an occasional big miss in the logging module, where it can hang for 0.2, 0.3, or 0.4 seconds (200-400 ms), presumably stuck writing to the SD card. Even many of my sub-modules have surprisingly variable time consumption, and some of them occasionally get stuck longer than the 10 ms time budget. This is about as lousy as you can get. No one wants random long delays mixed with a wide spread of timing jitter in the average case. This is the whole reason you don’t use linux (or python) for real time process control.
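To make "average frame time", "worst case", and "overrun" concrete, here is a minimal sketch of how per-frame timing can be measured. It is not the AuraUAS code: the function names are made up for illustration, and it paces itself with a simple sleep, whereas the real app waits on a sync packet from the sensor head.

```python
import time

def measure_frames(work, n_frames=50, rate_hz=100.0):
    """Run work() once per frame at a fixed rate; return (budget, busy times) in seconds."""
    budget = 1.0 / rate_hz              # 100 Hz -> 0.010 s (10 ms) per frame
    busy_times = []
    next_deadline = time.monotonic() + budget
    for _ in range(n_frames):
        start = time.monotonic()
        work()                          # hypothetical stand-in for one main loop pass
        busy_times.append(time.monotonic() - start)
        # Sleep off whatever is left of this frame; skip the sleep after an overrun.
        remaining = next_deadline - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
        next_deadline += budget
    return budget, busy_times

def summarize(budget, busy_times):
    """Average and worst-case busy time in ms, plus the count of frames over budget."""
    return {
        "avg_ms": 1000.0 * sum(busy_times) / len(busy_times),
        "worst_ms": 1000.0 * max(busy_times),
        "overruns": sum(1 for t in busy_times if t > budget),
    }

if __name__ == "__main__":
    budget, times = measure_frames(lambda: None, n_frames=20)
    print(summarize(budget, times))
```

The average hides the problem; the worst case and the overrun count are the numbers to watch when comparing schedulers.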
As a side note, the PX4 flight stack is an example that shows how naive use of a pretty good real-time system (Nuttx) can also lead to similarly non-deterministic timing intervals (possibly even worse than stock linux). In the end it pays to know your system and focus on timing if you are doing real time process control. Threads and real-time operating systems gain you nothing if you don’t use them strategically. (I don’t mean to be critical here. PX4 also does some amazing and beautiful stuff; they just didn’t care much about timing intervals, which leads to much worse results than you would expect if you actually measure their performance.)
For extra credit, go watch this awesome video presentation by Andrew Tridgell that explains some of the amazing work they are doing with ChibiOS to achieve extremely tight interval timing results. Tridge also includes some information on why Nuttx couldn’t meet their real-time objectives (really interesting stuff for those that care about these things). Go!
The Linux “chrt” Command
While I was watching Sandra’s presentation, she referenced something called SCHED_FIFO. I paused the video and googled it to determine that it was ‘a thing’ and how to spell it. After the video finished, I sat down and figured out how to invoke processes with FIFO / real time priority. Doing this gives me much better results than I could achieve with the default linux scheduler. The command is chrt (CHange Real-Time attributes):
sudo chrt -f 99 <my command and options>
The -f says run this with FIFO scheduling, and 99 is the highest possible system priority. If I run top in another window, my process now shows up with a special ‘rt’ code in the priority column. Now my process is running at a higher priority than those pesky dumb kernel worker threads. Essentially as long as it has something to do, my app will run ahead of just about everything else on the system. (There are still a couple things with similar top rt status, so I don’t quite ever have exclusive control of the CPU.)
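For a python program, the same request can be made from inside the process instead of through chrt. This is only a sketch of the idea, not something AuraUAS is known to do: try_fifo is a name invented here, and the call needs root (or CAP_SYS_NICE), just as chrt needs sudo.

```python
import os

def try_fifo(priority=99):
    """Try to switch the calling process to SCHED_FIFO; return True on success."""
    # Clamp to the range the kernel reports for SCHED_FIFO (1..99 on Linux).
    lo = os.sched_get_priority_min(os.SCHED_FIFO)
    hi = os.sched_get_priority_max(os.SCHED_FIFO)
    priority = max(lo, min(hi, priority))
    try:
        # pid 0 means "the calling process".
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:
        # Typically PermissionError (EPERM) when not running with enough privilege.
        return False
    return True

if __name__ == "__main__":
    if try_fifo(99):
        print("running with SCHED_FIFO")
    else:
        print("could not switch to SCHED_FIFO; try running with sudo")
```

Doing it with chrt on the command line keeps the application code unchanged, which is why the post uses that route.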
When I run my app this way, all the timings become rock solid, much more like I would expect/hope to see them. All my sub-module timings show exactly what I want: no overruns and no unexplained extra-long intervals. The only exception is the logging module, which periodically gets stuck for 0.6-0.7 seconds (600-700 ms). So even though the overall timing of the system is much, much better, the worst case has actually become worse. It is also very clear this is happening exclusively in the logging section when writing to a relatively slow SD card. All the other random/unexplained sub-module delays have gone away.
Writing the log file in a separate process
Most people would simply push their logging work out into a separate thread, but I have a philosophical aversion to threaded application architectures as part of my every day program design.
Conveniently, a year ago I had set up my code to optionally spew log messages out a UDP/socket port and then wrote a 10-line python script to suck in those UDP packets and write the data to a file. (I have been thinking about the occasional performance hit I take during logging for quite some time!)
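A receiver script of that kind can be very small. The following is a sketch of the idea rather than the actual AuraUAS script: the function names, the default loopback address, and the raw append-to-file format are all assumptions made for illustration.

```python
import socket

def open_log_socket(host="127.0.0.1", port=0):
    """Bind a UDP socket; port 0 asks the OS to pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock

def log_udp_to_file(sock, path, max_packets=None):
    """Append each received datagram to the file at path; return the packet count."""
    count = 0
    with open(path, "ab") as f:
        # max_packets=None means run until the process is killed.
        while max_packets is None or count < max_packets:
            data, _addr = sock.recvfrom(65535)   # blocks until a packet arrives
            f.write(data)                        # a slow SD card stalls only this process
            count += 1
    return count
```

The point of the design is that any blocking on the slow SD card happens in this process, so the sender only pays for a UDP send.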
Next I activated this remote logging feature, and tested my main
autopilot app with the same chrt invocation. I start up the logging
script with the normal CFS scheduler but at a slightly lower than
default priority using the
Now, without ever needing to write to a log file, my autopilot app is pegged at exactly 100.0 frames per second. The average main loop time is 5.56 ms, and the average wait for the sync packet from the sensor head (FMU) is 4.43 ms. The worst case main loop time interval is 9.79 ms, still just inside my time interval budget.
Once in a while the kernel still figures out something to do that interrupts my high priority process, but the overall performance and interval timing has now met my original goal. This is really exciting (at least for me!) The separate log writing process seems to be keeping up while running in the slack time. The kernel worker threads that handle uart communication and file IO seem to be happy enough.
So all in all, chrt -f 99 (SCHED_FIFO @ highest real-time priority) seems to be a really good thing. When I push the log writing work out to a separate process, it can eat the blocking/wait time of a slow SD card. Overall, app timing really tightens up and I’m able to meet my real-time process control goals without ever missing a frame.
I can’t end this post without mentioning that AuraUAS is heavily invested in running python code inside the core flight controller main loop. The flight controller strategically mixes C++ and python code modules into a single hybrid application.
I have intentionally chosen to use some non-traditional approaches to make my code simpler, easier to read, and more flexible and extensible. This project has evolved with education and research as one of the core priorities. Even with a significant amount of python code executing every frame, I am still able to achieve solid 100 Hz performance with consistent frame intervals.
My overall goal with this project is to combine the power of linux, the ease and simplicity of python, and inexpensive hardware to create an open-source UAV autopilot that performs to the highest standards of accuracy and timing. The linux chrt command is a huge step forward toward achieving precise timing intervals and never missing a frame.
2018-03-18 10:48:24 -0500 - Written by curt