System Design Interviews

Resources

Distributed Systems Concepts

  • Performance vs. Scalability
    • Performance: “How it is normally.”
    • Scalability: “It’s slow under heavy load.”
    • Scalability: “Adding resources improves performance linearly.”
  • Latency vs. Throughput
  • CAP Theorem (for distributed systems)
    • CAP Theorem: Can only guarantee two properties of:
      • Consistency: Every read receives the most recent write or an error.
      • Availability: Every request receives a non-error response. (Not necessarily the most recent.)
      • Partition Tolerance: The system continues to operate even with arbitrary network failures/delays.
    • Choices of guarantees:
      • CA: Consistency + Availability
        • Networks are never reliable. Almost never choose CA.
      • CP: Consistency + Partition Tolerance
        • Good if you need atomic reads/writes.
        • TODO: Atomic in what way?
      • AP: Availability + Partition Tolerance
        • Good if you need reliable access to the system.
        • TODO: It also talks about a business needing eventual consistency. Why would a business need it specifically?
  • Consistency Patterns
    • Weak Consistency: After a write, reads may or may not see it.
      • Example: real-time use cases like VoIP and multiplayer games.
    • Eventual Consistency: After a write, reads will eventually see it. (Typically milliseconds.)
      • Useful if you need high availability.
      • Example: DNS and email.
    • Strong consistency: After a write, reads will always see it.
      • Useful if you need transactions.
      • TODO: What is a transaction?
      • Example: File systems and RDBMSes.
  • Availability Patterns
    • Active-Passive Failover:
      • Active server normally manages traffic.
      • Only one “IP address” is needed.
        • TODO: Be more clear that “IP address” doesn’t necessarily mean a literal IP address?
      • If active server heartbeats stop, passive server takes over the “IP address”.
    • Active-Active Failover:
      • Load is spread between both servers.
      • Requires “two IP addresses”.
        • TODO: Be more clear that this doesn’t necessarily mean literal IP addresses?
    • Master-Slave Replication:
      • One master, many slaves.
      • Master server can read/write. Slave servers can only read.
      • If master fails, a slave can be promoted to master.
    • Master-Master Replication:
      • All servers are masters and can all read/write.
    • TODO: Talk more about the advantages/disadvantages of each approach?
  • Availability also depends on whether components are in series or parallel.
    • 99.9% availability in series with 99.9% = 99.8%
    • 99.9% availability in parallel with 99.9% = 99.9999%