Resources
Fault Tolerant Computing
Preliminary: will be updated from time to time
Readings for Lecture Notes:
The lecture notes contain the concepts you need to know.
The Supporting reading materials give more detail on some of the topics covered in the Lecture Notes. The Further reading materials mentioned in the slides may provide additional insight.
1 Introduction: Lecture Notes 1 (pdf)
Supporting reading: A Conceptual Framework for System Fault Tolerance. Later we will discuss some of the reliability measures in more detail.
2 Digital Systems: Lecture Notes 2
Supporting reading: Use your Logic Design/Computer Architecture text. You can find a quick introduction to Karnaugh maps at karnaugh.pdf. We will talk about faults and testing in more detail later, but if you want, you can take a peek at http://www.cs.colostate.edu/~cs530/digital_testing.pdf
3 Fault Modeling: Lecture Notes 3
Supporting reading: Design for Testability in Digital Integrated Circuits, by Bob Strunz, Colin Flanagan, and Tim Hall, http://www.cs.colostate.edu/~cs530/digital_testing.pdf
4 Combinational Circuit Testing: Lecture Notes 4
Supporting reading: Design for Testability in Digital Integrated Circuits, by Bob Strunz, Colin Flanagan, and Tim Hall, http://www.cs.colostate.edu/~cs530/digital_testing.pdf
5 Sequential Circuit Modeling: Lecture Notes 5
Supporting reading: Design for Testability in Digital Integrated Circuits, by Bob Strunz, Colin Flanagan, and Tim Hall, http://www.cs.colostate.edu/~cs530/digital_testing.pdf
6 Probabilistic Methods: Overview: Lecture Notes 6
Supporting reading: Markov processes http://www.sics.se/~aeg/report/node10.html; Poisson process http://en.wikipedia.org/wiki/Poisson_process
7 Random Testing: Overview: Random Testing
Supporting reading: Partial Detectability Profile, An Examination of Fault Exposure Ratio
7a Reliability Part 1: Lecture Notes 7a
7b Reliability Part 2: Lecture Notes 7b
Advanced classic paper: The Use of Triple-Modular Redundancy to Improve Computer Reliability. Another interesting paper: TMR for process control. A related, controversial concept for software: N-version programming (has some well-known references).
8 Software Reliability: Lecture Notes 8a, Lecture Notes 8b, Lecture Notes 8c
Read this article written for an encyclopedia. Supporting reading: Software Reliability Handbook by Lakey and Neufelder
9 More Software Reliability: Lecture Notes 8d
Read this article written for an encyclopedia. Supporting reading: Software Reliability Handbook by Lakey and Neufelder
Texts:
Software Reliability Assurance Handbook (locally available as a PDF).
Parag Lala: "Fault Tolerant and Fault Testable Digital Design" (Prentice Hall International).
Reliable Computer Systems: Design and Evaluation, 3rd edition, by D. Siewiorek and R. Swarz, AK Peters, 1998.
D. K. Pradhan, editor, Fault-Tolerant Computer System Design, Prentice-Hall, 1996.
B.W. Johnson, Design and Analysis of Fault-Tolerant Digital Systems, Addison Wesley, 1989.
Essentials of Electronic Testing for Digital, Memory, and Mixed-Signal VLSI Circuits, by Michael L. Bushnell, Vishwani D. Agrawal, Springer 2000.
Testing, Reliability & Security research group web page (includes links to several articles)
Other interesting tutorials and articles on the web (this will be updated soon):
Hardware Testing:
Software Testing:
Reliability:
Probability Theory: a good, detailed discussion with examples.
Reliability Block Diagrams, from NASA.
Software Reliability:
Coding Theory: