Low Latency Scheduler (RoboComRT)


Introduction

RoboComRT implements a near-real-time, low-latency scheduler for low-cost commodity embedded or desktop Linux operating systems and hardware. While developing multi-threaded applications of moderate computational complexity, we developed a software architecture that runs on a general-purpose Linux distribution and provides near-real-time deterministic behavior and low latencies. Reliable 24×7 operation has been tested for days on end using epochs as short as 10 milliseconds, with only minimal platform tuning.

We have been able to adapt this generalized software architecture for use in different applications ranging from Linux daemons that execute on hardware server platforms with dual processors with multiple cores, to low end multicore embedded systems. This software architecture is significantly more cost effective than a real time operating system (RTOS) based implementation in terms of operating system costs, software development tool costs, hardware costs, and software development costs. Software development for embedded systems is moving towards Linux over RTOS implementations as embedded Linux has some key advantages in cost and scale and can provide significant advantages in many situations.

Low Latency Scheduler Core

The Linux operating system supports various scheduling policies. The kernel decides which is the next runnable thread to be run by the CPU. The kernel maintains a list of runnable threads. It looks for the thread with the highest priority and selects that thread as the next thread to be run.
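As a rough illustration of the priority ranges that the Linux scheduler exposes, the following Python sketch queries the limits for the real-time SCHED_FIFO and SCHED_RR policies and for the default SCHED_OTHER policy. It uses only the standard os module on Linux; the helper name priority_ranges is ours for illustration and is not part of RoboComRT.

```python
import os

def priority_ranges():
    """Return {policy_name: (min_priority, max_priority)} for common Linux policies."""
    policies = {
        "SCHED_OTHER": os.SCHED_OTHER,  # default time-sharing policy
        "SCHED_FIFO": os.SCHED_FIFO,    # real-time, runs until it blocks or yields
        "SCHED_RR": os.SCHED_RR,        # real-time, round-robin within a priority
    }
    return {
        name: (os.sched_get_priority_min(policy), os.sched_get_priority_max(policy))
        for name, policy in policies.items()
    }

if __name__ == "__main__":
    for name, (low, high) in priority_ranges().items():
        print(f"{name}: {low}..{high}")
```

A runnable thread under a real-time policy is always chosen ahead of SCHED_OTHER threads, which is why the most time-critical thread is the one given a real-time priority.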

The following definition of real time sets the stage for discussing real-time Linux architectures.

A real-time system is one in which the correctness of the computations not only depends upon the logical correctness of the computation but also upon the time at which the result is produced. If the timing constraints of the system are not met, system failure is said to have occurred.

Near real time behavior is accomplished using the RoboComRT software architecture without having to use a custom Linux kernel, or a stripped down embedded Linux distribution. The software architecture decomposes the application into a number of worker threads or tasks. Our goal as developers is to use the minimum number of separate threads required to reduce the application complexity. The separate threads allow for concurrent performance of functions or tasks, and for prioritizing real time threads over others which do not have equal real time priorities.

In our designs, we typically have one specific thread that above all others has the most stringent real time requirements to meet, and this thread is given the highest thread priority from the perspective of the Linux scheduler.

Example usage of this software framework includes a networked TDMA controller scheduling RF spectrum access to a network of radios at 50 ms epoch rates, or smart bridging for an embedded networked radio which occurs at data rates limited to the bandwidth of the radio.

To achieve near real time performance, not only is the software decomposed into separate threads, but each thread can be pinned to one or more specific CPU cores. CPU core pinning, also referred to as CPU or processor affinity, and CPU core isolation are the two mechanisms behind CPU shielding. CPU shielding is a practice where, on a multiprocessor system or on a CPU with multiple cores, real-time tasks run on one CPU or core while non-real-time tasks run on another. In our design, thread priorities and CPU core affinity values are soft coded in a local configuration file that is read at program startup. This allows the software architecture to be tuned for the specific hardware platform upon which it executes.
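The pinning and priority steps described above can be sketched in Python on Linux as follows. This is a minimal sketch under our own assumptions: the core list syntax ("0,2-3") and the function names are hypothetical, and RoboComRT's actual configuration file format is not described here.

```python
import os

def parse_core_list(spec):
    """Parse a core list such as "0,2-3" into a set of core numbers."""
    cores = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cores.update(range(int(first), int(last) + 1))
        else:
            cores.add(int(part))
    return cores

def pin_current_thread(cores):
    """Pin the caller to the requested cores that it is allowed to use."""
    allowed = os.sched_getaffinity(0)  # 0 selects the caller
    usable = set(cores) & allowed
    if not usable:
        raise ValueError(f"none of {sorted(cores)} is in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, usable)
    return usable

def set_realtime_priority(priority):
    """Request SCHED_FIFO at the given priority; return False if not permitted."""
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except PermissionError:
        # Real-time policies normally need root or the CAP_SYS_NICE capability.
        return False
```

A worker thread would call pin_current_thread(parse_core_list(value_from_config)) and then set_realtime_priority(priority_from_config) as its first actions, so that moving to different hardware only requires a configuration change.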

The software architecture is deterministic and can guarantee timing behavior measured in units of milliseconds in the face of varying loads. While real time is about predictability, from the performance perspective, RoboComRT when executing on a multicore platform with sufficient resources has proven to be fast with low message latencies.
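One way to check timing behavior of this kind is to run a periodic loop against absolute deadlines and record how late each wake-up is. The Python sketch below illustrates the measurement only; the function name run_epochs is hypothetical, and an interpreted language will show more jitter than a native real-time thread would.

```python
import time

def run_epochs(epoch_ms, count, work=lambda index: None):
    """Run `work` once per epoch; return each epoch's wake-up lateness in microseconds."""
    epoch_ns = epoch_ms * 1_000_000
    next_deadline = time.monotonic_ns() + epoch_ns
    lateness_us = []
    for index in range(count):
        # Sleep toward an absolute deadline so timing errors do not accumulate.
        while True:
            remaining = next_deadline - time.monotonic_ns()
            if remaining <= 0:
                break
            time.sleep(remaining / 1e9)
        lateness_us.append((time.monotonic_ns() - next_deadline) / 1000)
        work(index)
        next_deadline += epoch_ns
    return lateness_us

if __name__ == "__main__":
    samples = run_epochs(epoch_ms=10, count=100)
    print(f"worst wake-up lateness: {max(samples):.0f} us")
```

Comparing the worst-case lateness with and without CPU pinning and real-time priorities, and under varying load, gives a simple view of how deterministic a given platform is.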

Base Operating Systems

RoboComRT is developed cross-platform, so its base operating system can be either Windows or Linux, in either desktop or embedded implementations. Due to the superior underlying time precision and accuracy of the Linux platform relative to Windows, we prefer the Linux operating system for our implementations. Linux also provides a broader range of real time thread priorities and supports CPU shielding – both important technologies for achieving near real time behavior on a non-custom generalized operating system rather than an embedded RTOS. This does not preclude operating under Windows, and Windows can be supplemented with additional software/hardware to provide higher levels of time synchronization, but the timing fidelity achievable on the Windows platform will likely be lower than what is achievable under Linux. Supported Linux variations include both Red Hat and Debian based generalized and embedded Linux distributions.

System Reliability: Multithreaded Issues & Resource Locking

In computer science and real time systems, priority inversion is a problematic scenario in scheduling in which a high priority task is indirectly preempted by a lower priority task effectively “inverting” the relative priorities of the two tasks. This violates the priority model that high priority tasks can only be prevented from running by higher priority tasks and briefly by low priority tasks which will quickly complete their use of a resource shared by the high and low priority tasks. The trouble experienced by the Mars lander “Mars Pathfinder” is a classic example of problems caused by priority inversion in realtime systems.

In computer science, non-blocking synchronization ensures that threads competing for a shared resource do not have their execution indefinitely postponed by mutual exclusion. A non-blocking algorithm is lock-free if there is guaranteed system-wide progress; wait-free if there is also guaranteed per-thread progress.

Multithreaded applications typically use some form of resource lock around shared data accessed by multiple threads, or for thread synchronization. This resource locking is the source of many real time system reliability issues. To avoid these issues, RoboComRT uses concurrency collections in place of resource locks. Concurrency collections provide thread safety using lock-free synchronization mechanisms. Refer to the post Lock-Free Programming for more information on this topic.
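To give a feel for how a collection can be shared without a lock, the sketch below shows a single-producer, single-consumer ring buffer in Python. It is only an illustration of the index discipline: each index is written by exactly one thread. Python cannot express the atomic CPU instructions and memory ordering that a native implementation relies on, so this version depends on CPython's interpreter behavior, and it is not RoboComRT's actual collection, which is not described in this article.

```python
class SpscRing:
    """Fixed-size single-producer, single-consumer ring buffer.

    Only the producer thread writes _head; only the consumer thread writes _tail.
    """

    def __init__(self, capacity):
        # One slot always stays empty so that "full" and "empty" look different.
        self._slots = [None] * (capacity + 1)
        self._head = 0  # next slot to write (producer only)
        self._tail = 0  # next slot to read (consumer only)

    def try_push(self, item):
        """Producer side: store item and return True, or return False if full."""
        nxt = (self._head + 1) % len(self._slots)
        if nxt == self._tail:
            return False  # full: the caller decides whether to drop or retry
        self._slots[self._head] = item
        self._head = nxt  # publish the slot only after it has been written
        return True

    def try_pop(self):
        """Consumer side: return the oldest item, or None if the ring is empty."""
        if self._tail == self._head:
            return None
        item = self._slots[self._tail]
        self._slots[self._tail] = None
        self._tail = (self._tail + 1) % len(self._slots)
        return item
```

Because neither side ever waits on the other, a low priority consumer cannot hold up a high priority producer, which is the property that makes this style of structure attractive for avoiding priority inversion. Note that None is used as the "empty" marker, so None itself cannot be stored as an item.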

For platforms where the CPU does not provide the atomic operations needed for lock-free implementations, we use priority inheritance mutexes to minimize the effects of priority inversion.

Architecture Implementation

RoboComRT does not live in isolation. As a software server that is part of a broader network, it provides support for upstream control and status applications (e.g. SNMP Management, web status and control, and a custom binary messaging interface). Similarly, the software application provides support for downstream interfaces to controlled software components on the network.

The following paragraphs describe the various interfaces and features supported by the RoboComRT software framework.

Bootup & Runtime Configuration: Initialization & Dynamic Configuration via XML files

Applications will typically need initial boot up configuration and then runtime dynamic configuration. We favor the use of Extensible Markup Language (XML) documents for configuration because a schema can define validation ranges and allowed settings for parameters. Our software framework validates each configuration file that it reads against a World Wide Web Consortium (W3C) XML Schema Definition (XSD) created for the configuration file content. An XSD file formally describes the elements allowed in an XML document.

RoboComRT supports initial boot up configuration using a local configuration file typical of Linux based daemons. The local configuration file settings are viewable from the web interface, and may be changed through the web interface or by manually editing the local configuration file and then restarting the application.

To support dynamic configuration, the application supports importing configurations in XML format using an internal FTPS/FTP client. The SNMP interface or web interface can be used to specify the URL on the FTP file server from which the application will retrieve a dynamic configuration file. The application will retrieve the dynamic configuration, verify that the contents of the file are valid against the schema, and then implement the dynamic configuration specified in the loaded file.
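The verify-then-apply step can be sketched as follows. Python's standard library has no XSD validator, so this sketch substitutes hand-written range checks for schema validation; it is a stand-in for the idea, not the W3C validation described above. The element names (epoch_ms, log_level) and their limits are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical parameters with (minimum, maximum) allowed values.
LIMITS = {"epoch_ms": (10, 1000), "log_level": (0, 7)}

def load_dynamic_config(xml_text):
    """Parse and range-check a configuration document; raise ValueError if invalid."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ValueError(f"not well-formed XML: {exc}") from exc
    config = {}
    for name, (low, high) in LIMITS.items():
        element = root.find(name)
        if element is None or element.text is None:
            raise ValueError(f"missing element <{name}>")
        try:
            value = int(element.text)
        except ValueError:
            raise ValueError(f"<{name}> is not an integer") from None
        if not low <= value <= high:
            raise ValueError(f"<{name}>={value} is outside {low}..{high}")
        config[name] = value
    # Nothing is returned, and so nothing is applied, unless every check passed.
    return config
```

The point of the design is that a rejected file leaves the running configuration untouched, since the new values are only handed over once the whole document has been checked.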

Network Discovery: ICMP Interface

RoboComRT provides the means for it to be discovered by a manager in the network. This is accomplished by supporting IPv4 ICMP Echo Requests and Replies.
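For readers unfamiliar with the message a manager sends during discovery, the sketch below builds an ICMP Echo Request (type 8, code 0) in Python and computes the Internet checksum defined in RFC 1071. The function names are ours. Sending the packet needs a raw socket and elevated privileges, so the sketch stops at constructing the bytes.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement Internet checksum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Build an ICMP Echo Request: type, code, checksum, identifier, sequence, payload."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload
```

A receiver can verify a packet by running internet_checksum over the whole message: a correct packet yields zero. The matching Echo Reply uses type 0 and returns the same identifier, sequence number, and payload.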

Upstream Management: SNMP Agent

Simple Network Management Protocol (SNMP) is an Internet-standard protocol for collecting and organizing information about managed devices on IP networks and for modifying that information to change device behavior. SNMP is a component of the Internet Protocol Suite as defined by the Internet Engineering Task Force (IETF). It consists of a set of standards for network management, including an application layer protocol, a database schema, and a set of data objects. It is used for collecting information from, and configuring, network devices such as servers, printers, hubs, switches, and routers.

RoboComRT provides an SNMP agent interface to allow for network management from an upstream SNMP manager. The SNMP agent interface supports SNMP scalars, SNMP read-only and read-write tables, and notifications from the agent to the SNMP manager via SNMP traps. Security for the SNMP communications utilizes the security built into the SNMP v3 protocol.

SNMP agents expose management data on the managed systems as variables. The variables accessible via SNMP are organized in hierarchies. SNMP uses an extensible design which allows applications to define their own hierarchies and metadata. These hierarchies, and other metadata (such as type and description of the variable), are described by management information bases (MIBs).
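The hierarchy of variables can be pictured as a sorted set of object identifiers (OIDs). The Python sketch below models GET and GETNEXT lookups over such a set; it is a toy model of the ordering rules, not an SNMP agent, and the class name is ours. Any OIDs passed to it in an example, such as ones under the enterprise number 99999, should be treated as hypothetical.

```python
import bisect

class MibView:
    """Holds values keyed by OID and answers GET and GETNEXT style lookups."""

    def __init__(self, variables):
        # OIDs are stored as integer tuples so that they sort lexicographically
        # by number (1.2 sorts before 1.10), as SNMP requires.
        self._values = {self._parse(oid): value for oid, value in variables.items()}
        self._order = sorted(self._values)

    @staticmethod
    def _parse(oid):
        return tuple(int(part) for part in oid.strip(".").split("."))

    def get(self, oid):
        """Return the value stored at exactly this OID, or None."""
        return self._values.get(self._parse(oid))

    def get_next(self, oid):
        """Return (oid, value) of the first variable after oid, or None at the end."""
        index = bisect.bisect_right(self._order, self._parse(oid))
        if index == len(self._order):
            return None
        found = self._order[index]
        return ".".join(map(str, found)), self._values[found]
```

Repeated GETNEXT calls, each starting from the OID returned by the previous one, walk the whole tree in order, which is how a manager discovers what an agent exposes.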

We have developed custom MIBs to allow our SNMP agent interface to implement custom application specific functionality. We have also created SNMP custom interfaces based on legacy MIBs.

RoboComRT can be customized to provide support for any MIB interface, and we can help create a custom MIB.

Upstream Management: RESTful APIs

To provide command and control of a networked device, RESTful web services or RESTful APIs are becoming a popular alternative to SNMP management. RESTful APIs are application programming interfaces that adhere to the style of the REST architectural pattern.

RESTful APIs define a set of operations that developers can use to send requests and receive responses over HTTP, using methods such as GET and POST. Because RESTful APIs use HTTP, they can be used from practically any programming language and are easy to test.

RoboComRT supports custom RESTful APIs for command and control as an alternative or to supplement an SNMP command interface.

The REST API interface uses the HTTP protocol secured using Basic Authentication over TLS/SSL for upstream command, status, and control. Both REST API GET and PUT operations are supported, with read-only and read-write content based on the user's assigned role. User roles for REST API access fall into two broad categories: users that have read-only access to REST API content (GET operation) and those with read-write access (GET and PUT operations).
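The role check described above can be sketched as a small Python function that decodes a Basic Authentication header and decides which status code a request should receive. The user names, passwords, and role names are hypothetical, and a real deployment would store password hashes rather than plain text.

```python
import base64
import hmac

# Hypothetical user table: name -> (password, role).
USERS = {"viewer": ("view-pass", "read-only"), "operator": ("op-pass", "read-write")}
ALLOWED_METHODS = {"read-only": {"GET"}, "read-write": {"GET", "PUT"}}

def authorize(method, authorization_header):
    """Return the HTTP status code that an API request should receive."""
    scheme, _, encoded = (authorization_header or "").partition(" ")
    if scheme.lower() != "basic":
        return 401
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return 401
    user, separator, password = decoded.partition(":")
    record = USERS.get(user)
    if not separator or record is None:
        return 401
    # Constant-time comparison avoids leaking how much of the password matched.
    if not hmac.compare_digest(password.encode(), record[0].encode()):
        return 401
    if method not in ALLOWED_METHODS[record[1]]:
        return 403  # authenticated, but the role does not permit this method
    return 200
```

Basic Authentication sends credentials that are only base64 encoded, which is why the text pairs it with TLS/SSL.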

External File Exchange: FTPS/FTP Client

Our application provides an integral FTPS/FTP client that permits importing and exporting dynamic configuration content, and exporting the current log file to an external FTP server. Commands issued from either the SNMP interface or the web interface are used to initiate FTP file transfers. Protocols supported are FTPS for secured transfers, based on the Transport Layer Security (TLS) and Secure Sockets Layer (SSL) cryptographic protocols, and non-secured FTP. The FTP client can be configured to permit the secured FTPS protocol only.

FTP client support provides the facility for dynamic configuration of the application using content staged on a file server. Dynamic configuration retrieval facilitates viewing the current executing configuration. Export of the current log file provides remote diagnostics support.

Upstream Control: Near Real-time Messaging

The SNMP protocol is based on a request-response model, and therefore is not the optimum protocol to use for near real time command and control. SNMP also has a fair amount of messaging overhead, making it less efficient than other protocols.

RoboComRT supports an efficient message based interface to provide more timely messaging as an alternative to SNMP command, control, and status. This interface supports custom messaging using either the UDP or TCP protocols. The content of the binary messages is custom for each use case, but our current messaging provides a template for the types of messaging and can accelerate future developments of this interface. This interface supports responding to incoming messages with responses. In addition, unsolicited messages can be generated without requests, such as when generating asynchronous reports (e.g. at 10 times per second). The use of this interface provides an alternative to SNMP when more timely messaging is required without the protocol overhead of SNMP.
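As an illustration of what a compact binary message can look like, the Python sketch below packs and unpacks a small fixed header followed by a payload. The layout (type, flags, sequence number, payload length) is entirely hypothetical; the actual RoboComRT message formats are custom to each use case and are not shown in this article.

```python
import struct

# Hypothetical layout in network byte order: type (1 byte), flags (1 byte),
# sequence number (2 bytes), payload length (2 bytes), then the payload.
HEADER = struct.Struct("!BBHH")

def pack_message(msg_type, sequence, payload=b"", flags=0):
    """Serialize one message; the sequence number wraps at 16 bits."""
    return HEADER.pack(msg_type, flags, sequence & 0xFFFF, len(payload)) + payload

def unpack_message(datagram):
    """Return (msg_type, flags, sequence, payload); raise ValueError if malformed."""
    if len(datagram) < HEADER.size:
        raise ValueError("datagram is shorter than the header")
    msg_type, flags, sequence, length = HEADER.unpack_from(datagram)
    payload = datagram[HEADER.size:]
    if len(payload) != length:
        raise ValueError("payload length does not match the header")
    return msg_type, flags, sequence, payload
```

A six byte header like this one adds very little to each datagram, which is the kind of saving over SNMP encoding that the text refers to. The sequence number lets a receiver match responses to requests and notice lost or reordered unsolicited reports.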

Downstream Control: Custom Control Device Interface

RoboComRT supports controlling downstream network elements using network messaging. This messaging typically consists of custom binary messages in packed format that are delivered to the controlled devices over either UDP or TCP. When using TCP, the messaging can be secured using TLS/SSL encryption. The choice of protocol depends on the use case, and on whether the smaller overhead and timeliness of UDP are preferred over the reliability of TCP. UDP is an excellent protocol for control applications and status reporting, where late-arriving status is considered old status. Typically, control of the downstream network devices is the most real time critical aspect of the application. Using this interface, downstream controlled elements can be given commands to execute, or status information can be requested from them. This interface also supports messages generated autonomously by the downstream devices, such as device status.

HTTPS/HTTP Web Server

RoboComRT features a secure HTTP(S) 1.1 web server for external control, status, and monitoring.

The web server provides web content designed using the responsive web design (RWD) approach (using Bootstrap) which supports a wide range of devices from desktop computer monitors to mobile phones.

The web interface provides basic browser authentication over SSL and role based access to content. User roles currently supported are base user, administrator, and super-user. However, role based access can easily be extended to include additional user roles, and per page access security per role.

The web interface provides a full range of functionality including viewing current configuration, importing and exporting dynamic configuration via an integral FTPS/FTP client, viewing the current log file and setting the log verbosity at runtime, loading SSL PEM files, viewing runtime statistics, creating and managing both web and SNMP users, etc.

The web framework architecture is inspired by Java scriptlets, with the application code handlers written in C++ separate from the web HTML content, separating web presentation from web application programming. This framework is modular and easy to extend for additional functionality.

The web interface provides the means for a user to interact with the software via a web-based Graphical User Interface (GUI). The software will receive inputs from a user for basic direct configuration, control, status, and debug.

This interface adheres to Transport Layer Security requirements by implementing HTTPS. The software provides a built in SSL certificate and private key file. This SSL key material may be replaced using the web interface with a replacement certificate and private key file. The web interface supports key replacement using a separate certificate and key file, or a standard SSL Privacy Enhanced Mail (PEM) file, which is the concatenation of the certificate and the private key as one file.

The web interface provides the mechanism to control access to the web interface by having the ability to enable and disable the HTTP and HTTPS protocols in any combination. The ports used for the HTTPS/HTTP protocols are likewise soft configurable.

The HTTPS/HTTP web server is integrated with the application itself in a one-tier architecture. A one-tier architecture puts all of the required components for a software application or technology on a single server. This embedded web server lets us avoid using a database or inter-process communication (IPC) as an intermediary between the web server and the application. It also integrates logging and the setting of web related thread priorities, simplifies deployment, reduces interface complexity, decreases latency, and improves response timing.

Time Synchronization

Multi-node time synchronization can be an important requirement for a network based on individual software/hardware time synchronization to a clock master, such as GPS time. RoboComRT supports time synchronization using the Precision Time Protocol (PTP), and can act either as a PTP slave to an external timing master, or as the timing master to other components on the network. When operating as a PTP slave, our software will process IEEE 1588-2008 messages and update the local system clock in an incremental fashion to slowly move toward the correct time as specified by the IEEE 1588-2008 messages.
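The arithmetic behind a PTP slave's clock correction can be sketched in a few lines of Python. The offset and delay formulas are the standard delay request-response calculation, which assumes the network delay is the same in both directions. The bounded step in slew_step is our own simplified stand-in for moving incrementally toward the correct time; the function names and the max_step parameter are hypothetical.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Compute the slave clock offset and mean path delay from one PTP exchange.

    t1: master sends Sync (master clock)      t2: slave receives Sync (slave clock)
    t3: slave sends Delay_Req (slave clock)   t4: master receives Delay_Req (master clock)
    A positive offset means the slave clock is ahead of the master.
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2
    delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset, delay

def slew_step(offset, max_step):
    """Return the correction to add to the slave clock, limited to +/- max_step."""
    return -max(-max_step, min(max_step, offset))
```

For example, with a true path delay of 100 and a slave clock that is 500 ahead, the timestamps t1=1000, t2=1600, t3=2000, t4=1600 give an offset of 500 and a delay of 100. With max_step=100 the slave would then apply a correction of -100 per update rather than jumping by the full 500, so the local clock never steps abruptly.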

The Precision Time Protocol (PTP) is a protocol used to synchronize clocks in a network. When used in conjunction with hardware support, PTP is capable of sub-microsecond accuracy, which is far better than is normally obtainable with NTP. RoboComRT supports PTPv2 according to the IEEE standard 1588-2008.

The Time Synchronization Software module indicates through the web interface whether or not the system is currently synchronized to the external time reference, the current jitter relative to the master, and the largest offset from the clock master. The web interface allows clearing out accumulated values so that analysis can begin from that time forward. The application provides configurable support for notifying an SNMP manager of the status and state of clock synchronization by way of SNMP traps.

Built In Test (BIT)

Built-in-Test (BIT) is an invaluable component of modular, embedded systems that are used for critical applications such as avionics mission systems, sensors, and weapons. BIT provides a level of confidence in the correct operation of each software component and module at both power-up and during normal operation.

RoboComRT performs a power up BIT, a periodic BIT, and an on demand BIT. Continuous or periodic BIT (CBIT) is performed while the application is active. BIT is a comprehensive set of tests of software functionality extending as close to the edge of our software as possible. On demand BIT can be invoked from commands from the SNMP Manager, from the web interface, or configured to occur periodically.

BIT consists of verification that all dependent daemons on the platform are running and single instanced. In addition, BIT will verify the status of error accumulators that provide running counts of issues that the running application has detected during execution, categorized into fatal, critical, error, and informational categories. Local configuration is provided specifying error thresholds for the various error categories and specifying the application’s behavior upon detecting BIT failure.

BIT results are available from the web interface and through SNMP tables for the SNMP Manager. The web interface displays the history of BIT errors and allows the web user to clear out the logged entries so that new content can be viewed. RoboComRT can be configured to report BIT failure to the SNMP Manager asynchronously by way of SNMP traps.
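The error accumulator check can be sketched as follows in Python. The four categories come from the description above; the class name, the threshold semantics (a category fails when its count exceeds its threshold), and the shape of the result are our assumptions.

```python
from collections import Counter

class ErrorAccumulators:
    """Running counts of detected issues, checked against configured thresholds."""

    CATEGORIES = ("fatal", "critical", "error", "informational")

    def __init__(self, thresholds):
        # thresholds maps a category to the highest count that still passes.
        self.thresholds = dict(thresholds)
        self.counts = Counter()

    def record(self, category):
        if category not in self.CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.counts[category] += 1

    def run_bit(self):
        """Return (passed, failures); failures maps category -> (count, threshold)."""
        failures = {
            category: (self.counts[category], limit)
            for category, limit in self.thresholds.items()
            if self.counts[category] > limit
        }
        return (not failures, failures)
```

A periodic BIT would call run_bit() on a timer and, on failure, take whatever action the local configuration specifies, such as raising an SNMP trap.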

Diagnostics

Among the most important tools that application developers need are those which provide application debugging information. RoboComRT was written to provide a high level of diagnostics with the express purpose of locating problems with the software, hardware, or any combination thereof in a system, or a network of systems. The following paragraphs describe the diagnostics built into the software architecture.

Standard Linux Logging

RoboComRT supports standard Linux logging to a log file that is viewable using standard Linux command line and GUI based log viewer tools. A log file provides a timeline of events for the application. The log file produced is plain text to make it easy to read and includes a timestamp field and a category field. The verbosity of the logging is settable from the web interface, the SNMP interface, and the boot-up and dynamic XML configuration.

Built into the log file module is internal log rotation based on the size of the log file, rather than the time-based rotation that is typical of the Linux logrotate utility. Log rotation refers to the practice of archiving an application’s current log, starting a fresh log, and deleting older logs. The Linux operating system usually runs logrotate once a day, and when it runs it checks rules that can be customized on a per-directory or per-log basis. Internal log rotation sets a maximum size on an individual log file, and administers the maximum number of log files that are preserved. This supports the software running unattended without requiring any form of external administration or maintenance.

With internal log rotation, the beginning of each log file contains a header which contains a description of the test conditions that resulted in the log file, such as the software version, a platform hardware summary, the platform software environment settings, program configuration settings such as logging level, etc. This allows each individual log file to contain a description of the conditions associated with the log file making the log file self descriptive. Providing a descriptive header to each log file is not possible using standard Linux based log rotation.
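The combination of size-based rotation and a per-file header can be sketched with Python's standard logging module. This is an analogy to the behavior described above, not RoboComRT's C++ implementation. The class name and header text are hypothetical, and the sketch overrides the underscore-prefixed _open hook of the standard handler, which is an implementation detail of CPython's logging package.

```python
import logging
import logging.handlers

class HeaderRotatingFileHandler(logging.handlers.RotatingFileHandler):
    """Size-based rotating handler that writes a header at the top of every new file."""

    def __init__(self, filename, header, max_bytes, backup_count):
        # The header must be set first, because the base class opens the file
        # (and so calls _open) from inside its own __init__.
        self._header = header
        super().__init__(filename, maxBytes=max_bytes, backupCount=backup_count)

    def _open(self):
        stream = super()._open()
        if stream.tell() == 0:  # brand-new or freshly rotated (empty) file
            stream.write(self._header + "\n")
        return stream

if __name__ == "__main__":
    handler = HeaderRotatingFileHandler(
        "app.log", "# version=1.0 log_level=INFO", max_bytes=1_000_000, backup_count=5
    )
    logger = logging.getLogger("app")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.info("application started")
```

Here max_bytes caps the size of each file and backup_count caps how many rotated files are kept, so total disk use is bounded without any external maintenance.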

Streaming Graphical Output

Our application architecture supports a real time streaming interface to our RoboComNV network visualization application. The software is instrumented to drive this generalized viewer application with content including communications nodes and node interconnects on a 2D layout view. The application also generates information that is plotted on a series of 2D line graphs. Since RoboComNV is a generalized viewer, the software architecture can be customized to display any combination of nodes and node interconnects and 2D line chart content.

Visualization & Analysis Toolkit

RoboComRT supports the capture of highly precise real time data tracing through the use of our embedded visualization toolkit. The code is instrumented with signal trace information that is captured into fast memory during the trace acquisition phase, and then a file containing this recorded trace information is available for visual analysis post capture.

The toolkit workflow follows the Verilog or VHDL simulation model of data acquisition into memory storage, followed by post-mortem interactive visualization of the generated trace files.

This workflow differs from the real time streaming output diagnostics described in the previous section: this toolkit provides much finer-grained detail of signal timing and relationships, with microsecond precision.

This toolkit has become increasingly valuable when integrating with other network components. With the capture of interface signal timing, there is no ambiguity about when and if something happened, as there is an observable record of events that can be captured, remotely transferred, and analyzed. The benefits of this toolkit have been proven to the point that external customers interfacing with our applications based on this software framework have used the signal capture process to develop their own software, and the captures are used in acceptance test procedures for verification.
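The capture-then-analyze workflow can be sketched in Python as an in-memory recorder that later writes its samples in Value Change Dump (VCD) format, the text format produced by Verilog simulators and read by common waveform viewers. The class name, signal names, and single-bit signals are simplifying assumptions for illustration; the toolkit's actual instrumentation API is not shown in this article.

```python
import time

class SignalTrace:
    """Records (timestamp, signal, value) samples in memory and dumps them as VCD text."""

    def __init__(self, signals):
        # VCD identifies each signal by a short printable code, starting at "!".
        # This simple scheme supports up to 94 signals.
        self._codes = {name: chr(33 + i) for i, name in enumerate(signals)}
        self._samples = []

    def record(self, signal, value, timestamp_us=None):
        """Acquisition phase: append one sample to memory, no file I/O."""
        if timestamp_us is None:
            timestamp_us = time.monotonic_ns() // 1000
        self._samples.append((timestamp_us, self._codes[signal], 1 if value else 0))

    def to_vcd(self):
        """Post-capture phase: render all samples as a VCD document."""
        lines = ["$timescale 1 us $end", "$scope module trace $end"]
        for name, code in self._codes.items():
            lines.append(f"$var wire 1 {code} {name} $end")
        lines += ["$upscope $end", "$enddefinitions $end"]
        samples = sorted(self._samples, key=lambda sample: sample[0])
        start = samples[0][0] if samples else 0
        last_time = None
        for timestamp, code, value in samples:
            relative = timestamp - start  # times are relative to the first sample
            if relative != last_time:
                lines.append(f"#{relative}")
                last_time = relative
            lines.append(f"{value}{code}")
        return "\n".join(lines) + "\n"
```

Keeping the acquisition path to a single in-memory append is what keeps the instrumentation cheap enough to leave in time-critical code, with the cost of formatting and file output deferred until after the capture.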

Application Deployment

For application deployment to various operating systems, we have developed formal package management installation applications for Windows platforms, and for desktop and embedded Linux in the form of RPMs or DEBs. The installation applications distribute program binaries, web content, and startup and operating configuration, and automatically start and stop requisite services or daemons.

Example Implementations

The following paragraphs describe projects that were developed using the RoboComRT framework. These projects were broadly based on the communications sector, but the framework is generalized and could be applied to a range of technical domains or sectors.

Example Implementation #1: iNET Link Manager

Among other initiatives, we used the RoboComRT software architecture to create the iNET Link Manager. The iNET Link Manager is implemented as a daemon operating on the Linux desktop platform, responsible for providing centralized quality of service (QoS) based TDMA scheduling control of an RF network of radios. The Link Manager is coded to be cross-platform across Windows and Linux, but has become primarily focused on Linux due to the advantages the Linux kernel provides for real time application execution.

The Link Manager provides near real time highly reliable control of a distributed radio network providing centralized shared channel time division multiple access (TDMA) channel access for the next generation of military test range networked communications. The Link Manager executes as a Linux daemon on a RedHat Linux platform. It features a secure HTTP(S) 1.1 web server and SNMPv3 agent for external control and monitoring, precise time synchronization via Precision Time Protocol (PTP) software, soft dynamic configuration via FTPS client loading of XML content from an external FTP server, and built in diagnostics including generation of real time signal timing via Value Change Dump (VCD) formatted output and standard Linux log files with internal rotation.

The Link Manager, to achieve its real time deadlines (50 ms scheduling), utilizes both multithreading and multi-core approaches to exploit the concurrency in its computational workload. It operates on a Dell R620 server platform with two separate processors, each with 6 physical cores, 12 logical cores utilizing hyper-threading, for a total of 24 accessible cores.

To provide for 24×7 100% reliable scheduling, the Link Manager platform benefits from performance tuning. Performance tuning of the Link Manager platform consists of runtime CPU shielding and real time thread priorities for selected application threads. Changes are made at runtime by the Link Manager based on thread priorities and the CPU cores available on the hardware platform. The Link Manager utilizes explicit threading by task/functional decomposition rather than data decomposition.

The Link Manager utilizes concurrency collections for creating table content which is concurrently accessed from a thin client interface, secondary threads, and a custom binary messaging protocol to an upstream controller.

The custom downstream messaging interface is used to control and monitor ground and airborne radios, providing transmit opportunity messages to the radio to coordinate access to the shared RF spectrum, and to receive queue status and link metrics from the radio necessary for making scheduling decisions.

Refer to the news post iNET Link Manager for additional information on this completed program.

Example Implementation #2: Anti-Jam Line of Sight Maritime Radio Network Stack

We developed the network stack for an R&D effort to create a low latency high throughput software defined radio (SDR). This stack was based on reuse of the RoboComRT software platform and operates as a Linux daemon on Xilinx ZC706 dual-core ARM based hardware platforms operating under embedded Linux. This solution highlights the fact that the RoboComRT software architecture can scale down to a more resource constrained embedded Linux platform.

The software implements smart bridge/firewall technology which limits Ethernet Layer 2 packets transmitted over the RF interface from the wired interface to only packets destined for known host devices passively learned from the network, packets necessary for network discovery and management, or packets filtered by Ethernet frame type.
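One possible reading of these filtering rules is sketched below in Python. It is our interpretation for illustration only: the class name, the default allowed frame types, and the treatment of broadcast ARP as the discovery traffic that is always forwarded are all assumptions, and the real bridge's rules are not detailed in this article.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806

class SmartBridgeFilter:
    """Decides which wired-side Ethernet frames are worth sending over the RF link."""

    def __init__(self, allowed_ethertypes=(ETHERTYPE_IPV4, ETHERTYPE_ARP)):
        self.allowed_ethertypes = set(allowed_ethertypes)
        self.rf_hosts = set()  # MAC addresses seen as sources on the RF side

    def learn_from_rf(self, source_mac):
        """Passively learn a host from the source address of a frame received over RF."""
        self.rf_hosts.add(source_mac.lower())

    def forward_to_rf(self, dest_mac, ethertype):
        """Return True if a frame from the wired side should be transmitted over RF."""
        if ethertype not in self.allowed_ethertypes:
            return False  # filtered by Ethernet frame type
        if ethertype == ETHERTYPE_ARP and dest_mac.lower() == BROADCAST:
            return True  # discovery traffic is needed to find hosts in the first place
        return dest_mac.lower() in self.rf_hosts  # only hosts known to be across the link
```

Dropping frames for unknown destinations at the wired interface keeps local traffic off the RF link, whose bandwidth is the scarce resource in this design.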

The software stack provides SNMP agent support with compatibility to legacy radio and configuration persistence to XML formatted files. The technology includes CPU pinning and real time thread priorities to obtain closer to real time performance on a generalized embedded Linux operating system distribution which provides support for user data rates up to 100 Mbps.

The headless daemon provides a full featured and mature web interface for command, control, and status.

Secured interfaces include support for the HTTPS, FTPS, and the SNMPv3 protocols. Distribution, configuration, and updates of the software components to various hardware platforms are by way of a developed Debian (.deb) package.

Refer to the post Maritime Radio SBIR Phase 2 for more information on this implementation.

Example Implementation #3: LTE Aggregator

In telecommunication, Long-Term Evolution (LTE) is a standard for high-speed wireless communication for mobile phones and data terminals. Carrier aggregation (CA) is used in LTE-Advanced in order to increase the bandwidth, and thereby increase the bitrate, and is used in both Frequency Division Duplex (FDD) and Time Division Duplex (TDD) modes.

We are currently working with a corporate partner to provide carrier aggregation to an LTE airborne deployment using external packet aggregation. This design provides increased data bandwidth and allows for uplink/downlink CA channel combinations which currently exceed the 3GPP specifications of 3 downlinks and 2 uplinks per user equipment (UE).

The RoboComRT software framework forms the basis of this standalone LTE Aggregator application, which executes as a Linux daemon on a 64-bit Ubuntu multi-core Intel desktop platform. This application is responsible for providing a gateway interface to host facing devices, and for bridging host traffic across an LTE network using tunneling/IP encapsulation and best channel selection based on dynamic link/channel metrics and transmit queue status. This implementation uses a RESTful interface rather than an SNMP interface for upstream network management status and control.

Conclusion

The RoboComRT framework provides a near real time framework that can be customized for a variety of control tasks, whether in communications, finance, document management, electronic medical records, or other – anywhere a low latency reliable control application is needed operating on a generalized operating system distribution on commodity hardware platforms as an alternative to an embedded RTOS. This framework can be customized efficiently to provide a solution to a variety of technical requirements on both desktop and embedded platforms.

The following features are implemented in RoboComRT:

  • The software framework is customized by both boot up and dynamic configuration in XML formatted files with XSD W3C validation.
  • SNMP agent and RESTful API interfaces are supported for upstream command, control and status monitoring. We can develop a custom MIB and implement support for it in the SNMP agent interface, or support legacy or user provided MIBs.
  • Secured Communications: SNMP communications are secured using the SNMPv3 protocol; web and file transfer communications are secured using HTTPS and FTPS through TLS/SSL.
  • A custom near real time messaging interface is provided when SNMP overhead is not desired for an upstream controller.
  • For the downstream controlled interface, RoboComRT supports custom messaging in either UDP or TCP/SSL formats.
  • An FTPS/FTP client is integral to the software architecture to allow for importing and exporting dynamic configuration content as XML formatted files, and exporting log files for remote diagnostics.
  • A full featured HTTPS/HTTP interface is provided for GUI command, control, and status monitoring. This interface lets the headless application be accessed remotely over the Internet and can be used as an alternative to the SNMP interface.
  • The HTTP interface provides RESTful API support for external command and control requirements using the HTTP protocol.
  • Diagnostic and testing support is provided by way of the generation of a standard Linux log file. Further diagnostics support is provided for use with our visualization toolkits.
  • Time synchronization of the hardware platform is provided via support for the PTPv2 protocol, either as PTP slave or master.
  • A built in test (BIT) suite is provided that is self validating and verifies the integrity of the system software at boot and during operation.