Tuesday, April 25, 2017
Mission Assured Systems
I was reading Feynman's lecture on Cargo Cult Science and was struck by his description of scientific integrity. It appears to be close to my own field, Mission Assured Systems. (Also called Mission Critical Systems.)
Let me be clear about what MAS are, first. I've been in this field for about fifteen years and I find it fascinating.
Mission Assured Systems are, simply, systems for which the outcome is assured.
Ha! you say. All systems are like that!
Not really.
Let's consider a military system and what I call the Moscow-New York problem. If you're aiming a ballistic missile at Moscow, you do not want it to hit New York by mistake.
(We'll leave why one shouldn't be lobbing missiles at Moscow as an exercise for the intelligent reader.)
The problem to be overcome in this example is not guiding the missile to the right place. It is executing a mission under circumstances that cannot be known for certain and cannot be tested, and still being assured the mission will go as planned. The aforesaid missile launch on Moscow cannot be tested. Or, rather, it can only be tested once, after which further research is pointless.
There are many systems out there that must operate properly in circumstances that may not have been foreseen by the designer. This is not Windows or Linux here-- if the system crashes, you restart. There are business mission assured systems where the scale of loss is so huge as to be worth the effort. Ultra low latency trading is an example here.
I'm more familiar with aerospace.
In aircraft instrumentation, the systems are typically redundant to some degree. For example, in a small plane there are often dual instruments-- or instrument functionality that can back up another instrument that may not be exactly the same type. But they all rely on the pilot. I had a friend who was flying from Boston to Washington, DC, once and was erroneously directed by an air traffic controller right into a thunderstorm cell. His attitude indicator picked that moment to go south.
Let me describe the situation: he had no outside visual reference. He was socked in by clouds and being rolled this way and that by the thunderstorm. This meant he had no human way to determine whether the airplane was even right side up, much less level.
The attitude indicator has two pieces to it. It indicates pitch-- the attitude of the plane front to back-- and roll-- the attitude of the plane left to right. In this circumstance, the roll indicator failed and the entire instrument face began turning leisurely in a clockwise direction. The interior part-- the pitch indicator-- was still accurate.
My friend had to piece together his roll state from another instrument and thread it through the pitch state provided by the attitude indicator, all while the attitude indicator was turning round and round.
The critical point here is that the mission assurance of the system was maintained by the pilot, not by a machine.
Automated systems, such as our example ballistic missile, do not have that luxury. Other, similar examples are autolanding systems in aircraft and pretty much any rocket launch system. Even the Saturn V had an automated launch computer: the Launch Vehicle Digital Computer.
In aerospace, automated systems are intended for those conditions where a human being is too slow, too imprecise, too vulnerable, or too expensive to handle the job. Since there is no human able to step in, the systems must be mission assured.
How do you get there? How does one make a mission assured system?
There are a number of paths to get there, and they depend on the desired outcome. For example, many systems are fail safe systems-- that means that if the system fails it ends up in a safe state. Nuclear power systems are like this: if something goes horribly wrong, the system is intended to fail into a state that will not endanger people. We have direct experience of how the mission assurance of these systems has not measured up in actual circumstances. A missile launched and going awry can be detonated-- that is a legitimate "fail safe" system.
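To make the fail safe idea concrete, here is a minimal sketch in C. Every name in it is invented for illustration-- it is not code from any real range-safety system-- but it shows the shape of the thing: on any detected fault, the system drops into one predefined safe state instead of pressing on with the mission.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical fail-safe sketch: any detected guidance fault drives the
 * system into a single predefined safe state (here, commanding destruct)
 * rather than continuing. The health check and the destruct command are
 * placeholders, not real interfaces. */

typedef enum { STATE_FLYING, STATE_SAFED } state_t;

static bool guidance_healthy(void) { return false; /* simulate a detected fault */ }
static void command_destruct(void) { puts("fault detected: commanding destruct (safe state)"); }

int main(void)
{
    state_t state = STATE_FLYING;

    while (state == STATE_FLYING) {
        if (!guidance_healthy()) {
            command_destruct();
            state = STATE_SAFED;   /* the only exit from FLYING on a fault */
        }
        /* ...normal guidance work would otherwise happen here... */
    }
    return 0;
}
```

The essential property is that there is exactly one transition out of the nominal state when something goes wrong, and it leads somewhere harmless.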
But fail safe doesn't always apply. Consider, for example, the autoland capability in a 747. Sure, the aircraft fails into a safe state right up until it strikes the ground. In these sorts of systems, functionality has to continue in the face of failure. These are fault tolerant systems.
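One common way to get that continued functionality is redundancy with voting-- triple modular redundancy being the classic form. The sketch below (again, the channel names and values are mine, purely illustrative) runs three independent channels and takes the median, so a single failed channel cannot take the function down.

```c
#include <stdio.h>

/* Hypothetical fault-tolerance sketch: a triple-modular-redundancy (TMR)
 * voter. Three independent channels report pitch in degrees; taking the
 * median means one faulty channel cannot corrupt the output. */

static double median3(double a, double b, double c)
{
    if ((a >= b && a <= c) || (a <= b && a >= c)) return a;
    if ((b >= a && b <= c) || (b <= a && b >= c)) return b;
    return c;
}

int main(void)
{
    /* Channel 2 has failed and is reporting nonsense. */
    double pitch_deg[3] = { 2.1, 2.3, -87.0 };

    double voted = median3(pitch_deg[0], pitch_deg[1], pitch_deg[2]);
    printf("voted pitch: %.1f degrees\n", voted);  /* prints 2.1: the fault is outvoted */
    return 0;
}
```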
And here's where it starts sounding like Feynman's lecture.
I'm a software engineer so I'm familiar with the assurance of software systems. Hardware systems have a somewhat different approach. However, they all involve software process.
Many people think of software as mere code. Code is certainly the place where the rubber meets the road. Software process is the organized means by which well-developed code is architected, designed, implemented and tested-- this isn't just Mickey Rooney saying Let's Put On A Show. It's a bunch of engineers pulling together an organized product that does something extraordinary and responds appropriately to unforeseen circumstances.
So, what is software process? There are several methodologies of process that I've worked in but the one I prefer is the Spiral Model.