Every system fails eventually, so the real question is how it fails.

Covers failure domains, fault isolation, and testing that isolation with chaos engineering.