Process Management: Static vs. Dynamic
Gerry Tyra, November 2015
When an embedded system needs to launch multiple processes at power up, the startup mechanism can be static or dynamic, with varying degrees of dynamism in between. Since one size does not fit all, this paper discusses the trade-offs implied by that spectrum.
The startup of a complex system can be chaotic or ordered, simple or complex. The “best” solution depends on the target environment. The intent of this paper is to look at this spectrum of solutions and evaluate the benefits and limitations along the way. We will start with completely static initialization and work through the various levels of the KISS Manager implementation.
In the simplest case, the boot loader on each processor node steps through a script, launching each of the processes listed in that script. This is simple, deterministic and the favorite of all Integration and Test teams. And an arbitrary level of redundancy can be built in.
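As a minimal sketch of this simplest case (the script format and process names here are hypothetical, not from any particular boot loader), the launcher just walks a script and spawns each listed process in a fixed order:

```python
import shlex
import subprocess

def launch_from_script(script_text):
    """Launch each non-comment line of a startup script as a process,
    in the fixed order listed.  Returns the Popen handles."""
    procs = []
    for line in script_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        procs.append(subprocess.Popen(shlex.split(line)))
    return procs

# Hypothetical startup script: the same processes, in the same order,
# on every boot -- which is exactly why I&T teams like it.
script = """
# launched in this exact order at power up
echo sensor_driver
echo telemetry
"""
handles = launch_from_script(script)
for p in handles:
    p.wait()
```

Determinism is the whole point: the script is the complete, inspectable description of what runs where.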
The downside of this approach is summed up in one word: laziness.
At some point in the life of the system, something will change. And when it does, all of the implicit design assumptions about where and how things execute will cause a raft of reintegration problems. There will be hard-coded data, either buried in the code or in define statements. In either case, there is a recompile in the offing, assuming you still have working compilers. Otherwise there is a port and/or re-coding.
Using a configuration file for each process would help, if enforced. But a large number of configuration files becomes its own Configuration Management problem.
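One hedged illustration of the per-process configuration idea (the file format and key names are invented for this sketch): each process reads its location and interface data from a small file, so a change means editing text rather than recompiling:

```python
def parse_config(text):
    """Parse simple KEY = VALUE configuration text into a dict,
    ignoring blank lines and comments."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        cfg[key.strip()] = value.strip()
    return cfg

# Hypothetical config for one process -- no recompile needed to change it.
cfg = parse_config("""
# where this process listens and who it talks to
LISTEN_PORT = 5100
PEER_ADDR   = 10.0.0.7
LOG_LEVEL   = INFO
""")
```

The catch, as noted above, is that every one of these files must now be version-controlled and kept consistent across the system.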
So, static implementations work, but they are also rigid.
The next option is to make a more sophisticated loader, or a simple adjunct manager responsible for launching all the local processes. This keeps the overall complexity about the same, but forces process developers to abstract location and interface data and rely on specifics provided by the loader.
This implementation has most of the advantages of the static, with fewer disadvantages. However, it does require greater up front design and discipline in implementation. There is no improved ability to handle failures.
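A sketch of how such an adjunct manager might inject location and interface specifics at launch time (the environment-variable convention and names here are assumptions, not a standard): the manager owns the deployment data, and the processes stay location-agnostic:

```python
import os
import subprocess

def launch_with_context(command, node_id, assignments):
    """Launch one process, passing its node assignment and peer
    addresses through the environment instead of compiled-in data."""
    env = dict(os.environ)
    env["NODE_ID"] = node_id
    for name, addr in assignments.items():
        env["PEER_" + name.upper()] = addr
    return subprocess.Popen(command, env=env)

# The manager, not the process, decides where peers live; the child
# just reads whatever it was given.
proc = launch_with_context(
    ["python3", "-c", "import os; print(os.environ['PEER_NAV'])"],
    node_id="node-3",
    assignments={"nav": "10.0.0.12:6000"},
)
proc.wait()
```

Moving a peer now means changing the manager's tables, not any process's source, which is the up-front design discipline the text refers to.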
Conceptually, this could become a mess. Arbitrary reassignment of processes is risky and makes rigorous testing difficult, if not impossible. Dynamic process management requires careful initial system design to constrain the number of permitted configurations. This may well be the most difficult aspect of the design.
In a dynamic system, the Manager must provide the capability to evaluate the current processing capability of the system and balance that against the resource needs of the processes required to meet current functional requirements. Or, more concisely, be able to get the job done with whatever is left.
Under normal design conditions, this is not a problem. The system resources are sized for worst case conditions, with a safety margin. This makes the initial state effectively equal to the centralized static case.
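The evaluation step above can be sketched as a simple feasibility check (the resource model, a single abstract "load" number per process and per node, is an assumption for illustration): sum what the required processes need and compare it against what the nodes still provide, with a safety margin:

```python
def can_host(required, available, margin=0.2):
    """Return True if the nodes' remaining capacity covers the
    required processes' total load, with a safety margin."""
    needed = sum(required.values()) * (1.0 + margin)
    return needed <= sum(available.values())

# Worst-case sizing with margin: in the undamaged system everything fits,
# so the initial state matches the static case.
required  = {"flight_ctl": 30, "nav": 20, "telemetry": 10}
available = {"node_a": 50, "node_b": 50}
fits = can_host(required, available)  # 60 * 1.2 = 72 <= 100
```

A real Manager would track multiple resource dimensions (CPU, memory, I/O bandwidth) per node, but the shape of the check is the same.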
The difference becomes evident when the system is degraded, whether by simple failure, accident or hostile action. In either of the static cases, there may be backup processes residing on other processors in the system. But if the nature of the damage takes out the processors hosting both the primary process and its backup, the functionality is lost to the system. In a dynamic system, however, assuming that access to any critical interfaces remains, additional copies of critical processes can be launched on compute resources that would normally not be used for those functions. If adequate resources are not immediately available, the Manager has the option of disabling lower priority processes to free the needed resources.
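That last step, shedding lower priority work to make room, can be sketched as follows (process names, priorities and loads are hypothetical): when a critical process must be relaunched and capacity is short, the Manager stops the least critical processes first until the new load fits:

```python
def free_capacity(running, needed, capacity):
    """Pick the lowest-priority running processes to stop until
    `needed` load fits within `capacity`.  Each entry of `running`
    is (name, priority, load); lower priority number = more critical.
    Returns the list of process names to shut down."""
    used = sum(load for _, _, load in running)
    to_stop = []
    # consider the least-critical processes first
    for name, prio, load in sorted(running, key=lambda r: -r[1]):
        if capacity - used >= needed:
            break  # enough room has been freed
        to_stop.append(name)
        used -= load
    return to_stop

running = [("flight_ctl", 0, 30), ("sensor_fusion", 5, 25),
           ("mission_plan", 6, 20)]
# Need 35 units for a relaunched critical process on a 90-unit node:
# stopping mission_plan alone frees exactly enough.
victims = free_capacity(running, needed=35, capacity=90)
```

The hard part in practice is not this loop but the priority assignments themselves, which must be settled during the constrained-configuration design discussed above.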
As an extreme example, and this is a stretch, consider a combat aircraft that has taken battle damage. The Remote Interface Units (RIUs) can still control the aircraft, but the flight control computers have been damaged. In theory, the Manager could launch the flight control processes on one of the mission system computers, shutting down some of the higher level functionality (dynamic mission planning, sensor fusion, etc.) to free up the resources in the mission computers to take up the load. This would reduce the combat effectiveness of the aircraft, but would help keep it in the air longer.
Static allocation of computational resources is easier to design, build and test. Adding dynamic allocation capabilities is expensive, and pointless unless a concerted effort is made to use the facilities provided to actually build a more robust system.