Fault versus Failure

Ram Chillarege, 2010

Fault and Failure are two terms that are often confused. In everyday life they conjure up similar images of something going wrong. Even in technical circles the terms are used interchangeably, further blurring the differences. However, in technical terms they are two very different things. I say "things", but, really, only one of them comes close to being a thing - the other being an "event" and less of a thing. Let us study these a little deeper, since we need to be more exacting for the purposes of defect analysis and software engineering.

 

Let's start with Failure. A "failure" is when something does not happen as it should or, conversely, when something happens that should not. Flick a light switch, and there is no light. We just experienced a failure. We do not yet know what the reason is, but we know that we did not receive a service as we should have. That event is a failure.

Now, there could be several reasons for that failure. It could be a broken light switch, a broken bulb, or no electricity in the wires, and the list can be endless. The cause that led up to us experiencing a failure is the fault. The fault could be a "thing". Theoretically, faults can exist without a "thing", but that's getting into another level of abstraction. For our purposes, let's stay with the notion that the thing that broke is the fault.
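For readers who think in code, the light switch story can be sketched as a small Python model. This is only an illustration under assumed names: `LightCircuit`, `flick_switch`, `find_faults` and `observe_failure` are hypothetical and not part of ODC or any library. The point it tries to show is that the failure is the observed event (no light), while the fault is the broken thing behind it.

```python
from dataclasses import dataclass


@dataclass
class LightCircuit:
    # Hypothetical model: each flag says whether one component is intact.
    switch_ok: bool = True
    bulb_ok: bool = True
    power_ok: bool = True

    def flick_switch(self) -> bool:
        """Return True if the light comes on (the service is delivered)."""
        return self.switch_ok and self.bulb_ok and self.power_ok

    def find_faults(self) -> list:
        """Return the names of the broken components (the faults)."""
        faults = []
        if not self.switch_ok:
            faults.append("switch")
        if not self.bulb_ok:
            faults.append("bulb")
        if not self.power_ok:
            faults.append("power")
        return faults


def observe_failure(circuit: LightCircuit) -> bool:
    """A failure is the event: we flick the switch and get no light."""
    return not circuit.flick_switch()


broken_bulb = LightCircuit(bulb_ok=False)
no_power = LightCircuit(power_ok=False)

# Both circuits fail in the same observable way ...
print(observe_failure(broken_bulb), observe_failure(no_power))
# ... but the faults behind the failure differ.
print(broken_bulb.find_faults(), no_power.find_faults())
```

Notice that `observe_failure` tells us nothing about the cause. Two different faults produce an identical failure, which is why the two need to be recorded separately.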

In Orthogonal Defect Classification (ODC), part of the classification is on the Fault and part on the Failure. Understanding the difference makes it much easier to follow the ODC discussions on classification.

Here we are, having just distinguished fault and failure. Now, that was easy, wasn't it? Just watch out for our friends in the industry who use them interchangeably, and tell them this story. It's not a fatal error to mix them... just confusing. Now did I say "error"? Where did that come from? How does error relate to fault and failure? Well, that's for another day!