There is exactly one place in a system where, if it fails, the entire system stops. To improve reliability, what is the MOST appropriate measure for this place?

Correct answer: D

Explanation

This question asks for the appropriate reliability measure for a single point of failure (SPOF). The key phrases are:

  • 1. "if it fails, the entire system stops": the very definition of a single point of failure (SPOF)
  • 2. "exactly one place": there is no alternative, and the whole system depends on that place
  • 3. "improve reliability": build a configuration that does not stop even when a failure occurs

A: Incorrect

Increase the amount of logging from that place.

More logging makes incident investigation easier, but it does not change the fact that the whole system stops if that place fails.

It cannot remove the outage risk itself, so this is incorrect.

B: Incorrect

Make the instance at that place larger.

A larger instance increases processing power, but the count is still one.

If that single instance fails, the whole system stops, so the single point of failure is not resolved, and this is incorrect.
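
A rough calculation can make the difference concrete. The sketch below assumes that instances fail independently, that instance size does not change the chance of failure, and that each instance has a hypothetical availability of 99%. None of these figures come from the question; they are illustrative only.

```python
def parallel_availability(per_instance: float, copies: int) -> float:
    """Availability of `copies` redundant instances, assuming independent failures.

    The group is down only when every copy is down at the same time.
    """
    return 1 - (1 - per_instance) ** copies


# Hypothetical per-instance availability; a larger instance is assumed
# to have the same value, because there is still only one of it.
PER_INSTANCE = 0.99

one_large_instance = parallel_availability(PER_INSTANCE, 1)
two_instances = parallel_availability(PER_INSTANCE, 2)

print(f"one instance (any size): {one_large_instance:.4%}")
print(f"two redundant instances: {two_instances:.4%}")
```

Under these assumptions, one instance stays at roughly 99% however large it is, while two redundant instances reach roughly 99.99%.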

C: Incorrect

Temporarily stop using that place.

Stopping the use of a required component means that function itself becomes unavailable.

Far from improving reliability, this leaves the service unable to function, so this is incorrect.

D: Correct

Make that place redundant by placing multiple copies across multiple Availability Zones.

This is correct. A single point of failure (SPOF) is removed by making the place redundant and distributing it across multiple instances and multiple AZs. Even if one instance fails, the rest take over processing, preventing the entire system from stopping. This is the basis of reliable design.
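
The takeover behavior can be sketched in a few lines of Python. This is a minimal simulation, not a real load balancer: the `Replica` class, the `route` function, and the instance and AZ names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    az: str               # hypothetical Availability Zone label
    healthy: bool = True

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} ({self.az}) handled {request}"


def route(request: str, replicas: list[Replica]) -> str:
    """Try each replica in order and return the first successful response."""
    for replica in replicas:
        try:
            return replica.handle(request)
        except ConnectionError:
            continue  # this replica failed; try the next one
    raise RuntimeError("all replicas are down")


# Three copies of the same component, one per (hypothetical) AZ.
replicas = [
    Replica("app-1", "az-a"),
    Replica("app-2", "az-b"),
    Replica("app-3", "az-c"),
]

replicas[0].healthy = False       # simulate the failure of one instance
print(route("GET /", replicas))   # prints "app-2 (az-b) handled GET /"
```

With a single replica in the list, the same failure would raise `RuntimeError`, which corresponds to the whole system stopping.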

Key Takeaway

"If it fails, everything stops" = a single point of failure (SPOF). The standard measure is redundancy (multiple copies, multiple AZs). Note that scaling up or more logging does not resolve a SPOF.
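
As a quick way to apply this takeaway, the sketch below scans a made-up topology and flags every component that runs as a single copy or in a single AZ. The `find_spofs` function, the component names, and the AZ labels are hypothetical, and the check assumes that an AZ outage takes down every instance inside that AZ.

```python
def find_spofs(topology: dict[str, list[str]]) -> list[str]:
    """Return components that run as a single copy or in a single AZ.

    `topology` maps a component name to the AZ of each of its instances.
    """
    spofs = []
    for component, instance_azs in topology.items():
        single_copy = len(instance_azs) < 2
        single_az = len(set(instance_azs)) < 2
        if single_copy or single_az:
            spofs.append(component)
    return spofs


# Hypothetical system layout, for illustration only.
topology = {
    "web": ["az-a", "az-b"],      # two copies in two AZs
    "database": ["az-a"],         # one copy
    "cache": ["az-a", "az-a"],    # two copies, but both in one AZ
}

print(find_spofs(topology))  # prints ['database', 'cache']
```

Here `cache` is flagged even though it has two copies, because both sit in the same AZ and would fail together if that AZ went down.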