The ability of a system to respond gracefully to an unexpected hardware or software failure. There are many levels of fault tolerance, the lowest being the ability to continue operation in the event of a power failure. Many fault-tolerant computer systems mirror all operations -- that is, every operation is performed on two or more duplicate systems, so if one fails another can take over.
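The mirroring idea described above can be illustrated with a minimal sketch. It is not taken from any particular product: the class names Replica and MirroredStore, the failed flag used to simulate a crash, and the in-memory dictionaries are all assumptions made for illustration.

```python
class Replica:
    """Hypothetical in-memory replica; `failed` simulates a crashed system."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.failed = False

    def write(self, key, value):
        if self.failed:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value

    def read(self, key):
        if self.failed:
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]


class MirroredStore:
    """Performs every write on all replicas and reads from the first live one."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def write(self, key, value):
        successes = 0
        for replica in self.replicas:
            try:
                replica.write(key, value)  # mirror the operation
                successes += 1
            except ConnectionError:
                continue  # tolerate a failed replica
        if successes == 0:
            raise RuntimeError("all replicas failed")

    def read(self, key):
        for replica in self.replicas:
            try:
                return replica.read(key)
            except ConnectionError:
                continue  # fail over to the next replica
        raise RuntimeError("all replicas failed")


primary, mirror = Replica("primary"), Replica("mirror")
store = MirroredStore([primary, mirror])
store.write("balance", 100)

primary.failed = True          # simulate a failure of one system
print(store.read("balance"))   # prints 100, served by the mirror
```

This sketch leaves out what a real system would need, such as resynchronizing a replica that missed writes while it was down; it only shows how duplicating each operation lets a surviving system take over.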
Fault tolerance concepts with examples: Provides links to information on faults and failures, dependency relations, fault classes, and other fault attributes.
Fault Tolerance Overview: Provides an overview of fault tolerance, along with links to products from Reliable Software Technologies.
The Center for Reliable and High-Performance Computing: This group focuses on integrating research in the areas of reliable and high-performance computing, high-performance architectures, fault tolerance, and testing. The site links to numerous research groups, technical reports, and publication abstracts.