Let It Crash ... Except When You Shouldn't
http://www.infoq.com/presentations/Let-It-Crash
Erlang is a language built around fault tolerance concepts.
Chaos Monkey (Netflix) -- deliberately taking down portions of a system to test fault tolerance and degraded but still functional performance. Reminds me of work I did at CERL (PLATO) for the zdegraded system variable.
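A rough sketch of the idea, to make it concrete: take one dependency down on purpose, then check that the system still answers in a degraded mode. This is not Netflix's tool or its API. Every name here (Service, ServiceDown, kill_random, render_page, and the service names) is made up for illustration.

```python
import random


class ServiceDown(Exception):
    """Raised when a dependency has been deliberately taken offline."""


class Service:
    """Hypothetical stand-in for a remote dependency."""

    def __init__(self, name):
        self.name = name
        self.up = True

    def call(self):
        if not self.up:
            raise ServiceDown(self.name)
        return f"{self.name}: ok"


def kill_random(services, rng):
    """Chaos step: take one randomly chosen service down and return it."""
    victim = rng.choice(services)
    victim.up = False
    return victim


def render_page(core, optional):
    """Require the core service; skip any optional service that is down."""
    parts = [core.call()]  # a core failure propagates to the caller
    degraded = False
    for svc in optional:
        try:
            parts.append(svc.call())
        except ServiceDown:
            degraded = True  # keep serving, minus this feature
    return parts, degraded


catalog = Service("catalog")
extras = [Service("recommendations"), Service("ratings")]

victim = kill_random(extras, random.Random(0))  # seeded so runs repeat
parts, degraded = render_page(catalog, extras)
print(f"killed {victim.name}; degraded={degraded}; served={parts}")
```

The point of the exercise is the check at the end: with one optional service gone, the page should still render from the core service and report itself as degraded instead of failing outright.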
Ran across an interesting blog post (Netflix again) about Lessons Learned Using Amazon Web Services.