Process Types and Communication
Gerry Tyra December 2015
An avionics suite can be implemented in a great number of ways, using different services, network paradigms, network topologies and messaging systems. This paper argues that, with only minor abstraction, most of the commonly used methods reduce to a few simple cases.
Abstractly, there are only a few ways that any two processes can communicate. Data comes in, is transformed, and data goes out. The question is: from where, to where, in what protocol, and at what rate? The most basic differentiation is between a client/server relationship and a point-to-point relationship. We will look at both in turn.
Not all servers are created equal. The nature of data in an avionics system requires that different approaches be used to provide the correct data to the target clients in an efficient manner. In the real world, resources are never infinite, so whatever efficiencies are available must be recognized and exploited as appropriate.
In my opinion, the truth of the previous statement makes a lie of most publish-subscribe (PubSub) and/or service-oriented architecture (SOA) implementations. The weakness of PubSub and SOA systems is their general dependency on multi-cast messaging. As will be discussed below, this can result in saturation of the network, to no good purpose.
The first type of server actually does need multi-cast. This group handles frequently updated global data that is widely used. The Inertial Navigation System/Global Positioning System (INS/GPS) is an excellent example of this type of server. There is a single source of significant data that is widely used by many other applications. Once the INS/GPS server is running and the multi-cast address is available, any other process can get the periodic position data simply by joining the multi-cast group. Since the data is periodic and fairly concise, UDP is the obvious choice for sending a message. The server sends one message, and the client is waiting for one message. If it doesn't arrive, the next one will shortly.
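As a concrete sketch of this pattern, a periodic position publisher can pack a concise binary report and put it on the wire with a single UDP send. The group address, port, and message layout below are illustrative assumptions, not taken from any real interface specification:

```python
# Sketch of a periodic INS/GPS multicast publisher. The group address,
# port, and field layout are assumptions for illustration only.
import socket
import struct

NAV_GROUP = "239.1.1.1"   # assumed administratively scoped group address
NAV_PORT = 5004           # assumed port

# Fixed-size binary layout: 32-bit sequence number, then latitude and
# longitude (degrees) and altitude (meters) as little-endian doubles.
NAV_FMT = "<Iddd"

def encode_nav(seq, lat, lon, alt):
    """Pack one position report into a concise UDP payload."""
    return struct.pack(NAV_FMT, seq, lat, lon, alt)

def decode_nav(payload):
    """Unpack a position report; returns (seq, lat, lon, alt)."""
    return struct.unpack(NAV_FMT, payload)

def publish(seq, lat, lon, alt):
    """Send one report to the multicast group. Only clients that have
    joined the group will receive it; no per-client sends are needed."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    sock.sendto(encode_nav(seq, lat, lon, alt), (NAV_GROUP, NAV_PORT))
    sock.close()
```

Note that the server's send path is identical whether one client or fifty have joined; the replication happens in the switches, which is exactly what makes multi-cast efficient for this data type.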
Now consider adding other “legitimate” multi-cast servers, say the engine(s) status, radar warning, etc. Each of these message groups needs to be distributed to some number of clients for the overall system to function correctly. But, those clients aren't all the same!
It is now appropriate for a short digression into how multi-cast works.
When a server creates a multi-cast group, each switch, directly, or indirectly, connected to the server creates a routing table for the IP address associated with the multi-cast group. Any client that joins the group causes the switches between the server and the client to make entries in their respective routing tables for the group. In this way, when the server sends a message to the multi-cast group, the switch connected to the server replicates the message and sends a copy out on every port that leads to a client of the group. If two or more servers use the same group IP address, even with different port numbers, all of the messages will be routed to all of the clients.
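The join described above can be sketched from the client side. The IP_ADD_MEMBERSHIP socket option is what triggers the IGMP membership report, which in turn is what prompts the switches along the path to add entries to their forwarding tables for the group. The group address and port here are assumed values:

```python
# Client-side sketch of joining a multicast group. The setsockopt()
# call with IP_ADD_MEMBERSHIP emits the IGMP join that causes switches
# to start forwarding the group's traffic toward this host.
import socket

def make_membership_request(group, interface="0.0.0.0"):
    """Build the ip_mreq structure: group address + local interface,
    each as a 4-byte packed IPv4 address."""
    return socket.inet_aton(group) + socket.inet_aton(interface)

def join_group(group, port):
    """Return a socket that will receive the group's periodic messages."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    make_membership_request(group))
    return sock  # caller now blocks on recvfrom() for each message
```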
The other option is to assign a different IP address to each multi-cast server. In IPv4, the multi-cast block (224.0.0.0/4) provides roughly 268 million addresses, though only about 8M of them map to distinct Ethernet multicast MAC addresses. But each of these addresses requires its own routing table entry in every switch. Not only does this create potential memory problems in the switches (i.e., how many addresses before you saturate the switch), there is the CPU cycle cost of sorting through the tables in what is most likely set up as a sparse array. This is just a few more clock cycles on each message, but it adds up. Then, in a SOA, there is the issue of tracking which services are on which addresses. In a PubSub system this can be accomplished by multi-casting a notification of which services/servers are available and on what address. But this is more clock cycles in both the CPUs and on the network.
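The discovery notification described above might look like the following minimal sketch, where a directory process periodically multicasts a mapping of service names to group addresses and ports. The JSON wire format and the service names are assumptions for illustration, not any particular PubSub product's protocol:

```python
# Sketch of a PubSub discovery announcement: which services are
# available and on what multicast address/port. The JSON layout and
# service names are illustrative assumptions.
import json

def encode_announcement(services):
    """services: dict mapping service name -> (group_ip, port)."""
    return json.dumps(services).encode("utf-8")

def decode_announcement(payload):
    """Recover the service table; JSON turns tuples into lists, so
    convert the address pairs back to tuples."""
    raw = json.loads(payload.decode("utf-8"))
    return {name: tuple(addr) for name, addr in raw.items()}
```

Every client has to receive and parse these announcements even if it only cares about one service, which is part of the clock-cycle cost noted above.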
Just to complicate matters further, consider what happens when something like Open Mission Systems brings in a virtual machine (VM) environment. The VM has to have its own virtual network interfaces and switch, with all of the associated routing tables. And this switch runs at software speeds, not dedicated logic speeds. Even if a specific port in a group is not used in a particular VM, the virtual NIC's stack still has to process and filter the messages.
So when appropriate, multi-cast is a simplifying blessing. Without supporting system analysis, multi-cast for everything is somewhere between irresponsible and just stupidly wasteful. One size does not fit all.
If multi-cast isn't global, then there must be the option of a uni-cast server. I would argue that there are four basic types of servers that use uni-cast messaging:
A file server, such as FTP or HTTP. The server receives a TCP connection and a request to send or receive a block of data. TCP assures a long transfer is intact and properly sequenced at the receiving end.
A reliable UDP access server, such as a SQL database. The client wants to read or write some discrete unit of data. Reliable UDP is used to keep the packet format, vs. TCP stream format, with the assurance of an acknowledged message. This also applies to something like a server that provides aperiodic commands. It is only sending to one or two clients, but you really want to verify that the message got through.
Some applications may exist (your mileage may vary) where the client can tolerate non-reliable requests. This would require the server to periodically send data to all established clients, or have the clients periodically request the data. But to be honest, off the top of my head, I can't think of a case where this would be appropriate that isn't already covered by a multi-cast server.
A command server, using SNMP or UDP to control various remote modules. The modules may or may not be subscribing clients, but the server knows that they are there and sends commands, such as power state commands, as appropriate.
Any client has to be able to talk to the appropriate type of server. However, in keeping with KISS, if the Manager provides the connection information, then the MessageHandler can abstract all of the differences. Again, your mileage may vary, caveat emptor.
One of the questions that must always be asked about a “string of pearls” design is: Why? If it is really linear, how is the granularity justified for the individual processes/services? This can be done. Actually, it is critical that it be done, because it means that you are actually analyzing your data flow and the transformations that occur at each step.
In any case, most point-to-point data in an aircraft will be via UDP; whether it is reliable or not is a function of the data type. Command and control data should definitely be reliable. Periodic data may or may not be sent without the need for an acknowledgment. It is all a function of the data type and the ability of the receiving algorithms to bridge missing data.
It needs to be pointed out that while a “string of pearls” makes for a very nice analogy to the flow of data, in the real world the data flow tends to look more like macramé. As long as the data flows consistently in one direction, it is possible to record the state of a process so that it can be restarted. However, if you have looping data, the infinite impulse response can make a clean restart problematic.
One other attribute of a communications link is its potential use of redundancy. For the purposes of this paper consider three levels of use, primary, secondary, and backup.
To appreciate the differences, consider a system where the primary service/process, and its primary data links, are up and running. It is receiving, processing and sending data. This data may, or may not, have latency criteria. This is not to say that an aircraft will fall out of the sky with the first missed data frame, but late data is assumed to impair overall system performance. If the primary data input or output were to fail, or appear to fail, providing the same data on the secondary connection allows the chain of processes to continue to meet its latency requirements without interruption. But until the primary connection is questioned, the data on the secondary connection can be ignored, or at most, checked for validity.
A lower bandwidth solution is to provide a backup channel which, once connected, only sends a “heartbeat” message to confirm and maintain the connection. This has the advantage of lower bandwidth, but the disadvantage that it must first be enabled to transmit the full message set before it can take over.
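The receiving side of such a backup channel reduces to simple bookkeeping: record when each heartbeat arrives, and declare the link failed if too much time passes between beats. The "three missed periods" threshold below is an illustrative assumption, not a requirement from any standard:

```python
# Sketch of backup-channel heartbeat monitoring. The missed_limit of
# three periods is an arbitrary illustrative threshold.
class HeartbeatMonitor:
    def __init__(self, period_s, missed_limit=3):
        self.period_s = period_s        # expected heartbeat interval
        self.missed_limit = missed_limit
        self.last_beat = None           # timestamp of most recent beat

    def beat(self, now_s):
        """Record a heartbeat arrival (timestamp in seconds)."""
        self.last_beat = now_s

    def link_ok(self, now_s):
        """The link is healthy if a beat arrived within the last
        missed_limit periods; otherwise the channel is presumed down."""
        if self.last_beat is None:
            return False
        return (now_s - self.last_beat) <= self.missed_limit * self.period_s
```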
Depending on how critical the data is, all three data interfaces can be provided. The primary and secondary operate as described. In the event that either the primary or secondary connection fails, the backup is activated to replace the failed link.
The other consideration is the state of the primary process. Consider the difference between a Finite Impulse Response (FIR) and an Infinite Impulse Response (IIR). If the primary process were to fail, can the backup take over in an acceptably short period of time (FIR), or is an accumulation of past data critical to resuming correct operation (IIR)? If a given process is a FIR, then a backup process can be created and put to sleep, not waking until the primary has been detected as failing. However, in the IIR case, a secondary process is needed, which is synchronized with the primary, but either does not output data, or the various consumers ignore that data.
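The distinction can be made concrete with two toy filters. A moving average (FIR) forgets everything once its window refills, so a cold-started backup converges to the primary's output after a handful of fresh samples. An exponential filter (IIR) folds its entire history into one state value, so a cold start never exactly matches the primary without explicitly synchronizing that state. These particular filters are illustrative stand-ins, not any specific avionics algorithm:

```python
# Toy FIR vs IIR filters illustrating restartability. A cold-started
# FIR matches the primary once its window refills; a cold-started IIR
# does not, because its state encodes all past input.
from collections import deque

class MovingAverageFIR:
    """FIR: output depends only on the last n inputs."""
    def __init__(self, n):
        self.window = deque(maxlen=n)

    def step(self, x):
        self.window.append(x)
        return sum(self.window) / len(self.window)

class ExponentialIIR:
    """IIR: output depends on every input ever seen, via self.y."""
    def __init__(self, alpha, y0=0.0):
        self.alpha = alpha
        self.y = y0

    def step(self, x):
        self.y = self.alpha * x + (1.0 - self.alpha) * self.y
        return self.y
```

Feed the primary from sample zero and a backup starting several samples late: the FIR outputs become identical as soon as the backup's window fills, while the IIR outputs still disagree, which is why the IIR case demands a continuously synchronized hot standby.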
There are many valid approaches to distributing data within an airframe. Many are applicable on the same aircraft for different data types. One size rarely fits all.