"Testing Distributed Systems w/ Deterministic Simulation" by Will Wilson

The talk discusses the benefits, challenges, and techniques of simulation testing in distributed systems.

Summary

  • Will, from FoundationDB, discusses the difficulty of debugging distributed systems and introduces simulation testing as a solution.
  • He explains how simulation testing works: the team first built a deterministic simulation of the database and debugged it exhaustively before writing the actual implementation.
  • The talk covers the software engineering challenges faced and the strategies employed to address them, including creating a custom simulation framework called 'Flow'.
  • Simulation testing makes it possible to artificially increase failure rates and to manipulate the Hurst exponent, exploring more potential bugs faster than real-world testing.
  • Despite the effectiveness of simulation testing, testing on real hardware (through 'Sinkhole') is also necessary to catch bugs that simulation can't reproduce.

Chapter 1

Introduction to FoundationDB and Simulation Testing

0:00 - 19 sec

Will introduces himself, FoundationDB, and the concept of simulation testing.

  • Will works at FoundationDB, which provides a scalable and fault-tolerant database with ACID transactions.
  • He introduces the topic of simulation testing and its potential to make debugging distributed systems easier.

Chapter 2

The Challenge of Debugging Distributed Systems

0:19 - 40 sec

Will talks about the inherent difficulties in debugging distributed systems compared to simple systems.

  • Debugging distributed systems is complicated and is only slightly preferable to painful experiences like sticking a fork in one's eye.
  • The complexity of distributed systems is acknowledged, but Will proposes that the real challenge lies in their non-deterministic nature caused by networks.

Chapter 3

Simulation Testing to Address Debugging Challenges

0:59 - 1 min, 0 sec

Details on how simulation testing can address the debugging challenges in distributed systems.

  • Will uses a simple packet transfer example between two servers to illustrate how random network conditions can lead to rare but critical bugs.
  • The inability to repeat these conditions reliably in a real-world scenario highlights the need for a controlled testing environment.

Chapter 4

Creating a Deterministic Simulation Environment

1:59 - 3 min, 20 sec

Will discusses the creation of a deterministic simulation environment for FoundationDB.

  • FoundationDB started by writing a deterministic simulation of their database to debug exhaustively before actual implementation.
  • The simulation is built with 'Flow', which lets a network of communicating processes and their environment be simulated within a single physical process.
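
To illustrate running several communicating processes inside one physical process, here is a minimal Python sketch. It is not FoundationDB's code: the names Simulation, PingPong, and run_once are hypothetical, and the latency range is an arbitrary assumption. All randomness comes from one seeded generator and time is a simulated clock, so the same seed should yield the same trace.

```python
import heapq
import random


class Simulation:
    """Single-threaded event loop with virtual time and a seeded RNG."""

    def __init__(self, seed):
        self.rng = random.Random(seed)  # the only source of randomness
        self.now = 0.0                  # simulated clock, not wall-clock time
        self.queue = []                 # heap of (time, sequence, callback)
        self.seq = 0                    # tie-breaker keeps ordering stable
        self.trace = []                 # record of every delivery

    def schedule(self, delay, callback):
        heapq.heappush(self.queue, (self.now + delay, self.seq, callback))
        self.seq += 1

    def send(self, src, dst, payload):
        # Simulated network: latency is drawn from the seeded RNG.
        latency = self.rng.uniform(0.001, 0.050)
        self.schedule(latency, lambda: dst.receive(src, payload))

    def run(self):
        while self.queue:
            self.now, _, callback = heapq.heappop(self.queue)
            callback()


class PingPong:
    """A virtual process that bounces a counter back until it hits a limit."""

    def __init__(self, name, sim, limit=5):
        self.name = name
        self.sim = sim
        self.limit = limit

    def receive(self, sender, n):
        self.sim.trace.append((round(self.sim.now, 6), self.name, n))
        if n < self.limit:
            self.sim.send(self, sender, n + 1)


def run_once(seed):
    sim = Simulation(seed)
    a, b = PingPong("a", sim), PingPong("b", sim)
    sim.send(a, b, 0)
    sim.run()
    return sim.trace
```

Calling run_once(1) twice returns identical traces, which is the property the rest of the testing approach depends on.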

Chapter 5

Generating a Simulated Database

5:19 - 2 min, 39 sec

The process of generating a simulated database and the software engineering challenges involved.

  • FoundationDB's simulation creates virtual processes within a single physical process to avoid non-determinism.
  • Flow allows writing actor-based concurrency in C++ using a syntactic extension that transforms actor definitions into callback-based code.
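
The idea of turning sequential-looking actor code into callbacks can be sketched by analogy in Python. This is not Flow and not its syntax: Flow works on C++ source, whereas this sketch uses a generator that is driven at run time. Future, spawn, and add_async are hypothetical names chosen for illustration.

```python
class Future:
    """A value that arrives later; callbacks fire when it is set."""

    def __init__(self):
        self.done = False
        self.value = None
        self.callbacks = []

    def on_ready(self, callback):
        if self.done:
            callback(self.value)
        else:
            self.callbacks.append(callback)

    def set(self, value):
        self.done = True
        self.value = value
        for callback in self.callbacks:
            callback(value)
        self.callbacks = []


def spawn(generator):
    """Drive a generator-style actor: each yielded Future resumes it via a callback."""
    result = Future()

    def step(value):
        try:
            pending = generator.send(value)
        except StopIteration as stop:
            result.set(stop.value)   # the actor returned; publish its result
            return
        pending.on_ready(step)       # resume the actor when the Future is ready

    step(None)
    return result


def add_async(x_future, y_future):
    # Reads sequentially, but each `yield` is a suspension point.
    x = yield x_future
    y = yield y_future
    return x + y


x, y = Future(), Future()
total = spawn(add_async(x, y))
y.set(2)    # values may arrive in any order
x.set(40)
```

After both values are set, total.done is True and total.value holds their sum. The actor body never mentions callbacks; the driver supplies them.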

Chapter 6

Simulation Testing Approaches and Techniques

7:57 - 4 min, 10 sec

Explains the detailed approaches and techniques used in simulation testing.

  • Test files declare objectives and potential failure scenarios, including random clogging and network attrition.
  • Various failure modes, including hardware issues and network disruptions, are simulated to expose bugs.
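
A test that declares a workload together with the faults to inject might look roughly like the following Python sketch. The spec format, the probabilities, and the function run_test are assumptions made for illustration, not FoundationDB's test file format. Duplicate delivery is an extra fault added here so the invariant has something to catch.

```python
import random
from collections import deque

# Hypothetical test spec: a workload size plus the faults to inject.
TEST_SPEC = {
    "messages": 200,
    "clog_probability": 0.30,       # message is held back and delivered later
    "kill_probability": 0.02,       # receiver is down; the message is lost
    "duplicate_probability": 0.10,  # network delivers the message twice
}


def run_test(spec, seed):
    rng = random.Random(seed)       # all fault decisions come from this RNG
    applied = set()                 # message ids the receiver has applied
    applications = 0                # how many times state was actually changed
    faults = {"clogged": 0, "killed": 0, "duplicated": 0}
    outstanding = deque(range(spec["messages"]))

    def deliver(msg_id):
        nonlocal applications
        if msg_id not in applied:   # idempotent apply guards against duplicates
            applied.add(msg_id)
            applications += 1

    while outstanding:
        msg_id = outstanding.popleft()
        roll = rng.random()
        if roll < spec["kill_probability"]:
            faults["killed"] += 1
            outstanding.append(msg_id)   # sender retries the lost message
        elif roll < spec["kill_probability"] + spec["clog_probability"]:
            faults["clogged"] += 1
            outstanding.append(msg_id)   # delayed behind later messages
        else:
            deliver(msg_id)
            if rng.random() < spec["duplicate_probability"]:
                faults["duplicated"] += 1
                deliver(msg_id)

    # Invariant: every message was applied exactly once despite the faults.
    assert applications == spec["messages"]
    assert applied == set(range(spec["messages"]))
    return applications, faults
```

Raising the probabilities in the spec makes rare failure combinations show up in far fewer runs, and because the faults are driven by the seed, any failing run can be replayed.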

Chapter 7

The Importance of Determinism in Simulation

12:07 - 3 min, 53 sec

Will highlights the crucial role of determinism in the simulation process.

  • Determinism is essential to ensure that simulations are repeatable, with the same input leading to the same output.
  • A small percentage of simulation runs are executed twice to ensure determinism.
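
Re-running a sample of seeds and comparing the results can be sketched as follows in Python. The summary does not say how the two runs are compared; hashing the event trace is an assumption, and simulate is only a stand-in for a real simulation run. The names simulate and check_determinism are hypothetical.

```python
import hashlib
import random


def simulate(seed, steps=200):
    """Stand-in for a simulation run: returns a digest of its event trace."""
    rng = random.Random(seed)
    digest = hashlib.sha256()
    state = 0
    for step in range(steps):
        state = (state * 31 + rng.randrange(1_000_000)) % (2**61 - 1)
        digest.update(f"{step}:{state}\n".encode())
    return digest.hexdigest()


def check_determinism(seeds, sample_rate, picker_seed=0):
    """Re-run a sampled subset of seeds and report any whose digests differ."""
    picker = random.Random(picker_seed)  # decides which runs get repeated
    rechecked = 0
    mismatches = []
    for seed in seeds:
        first = simulate(seed)
        if picker.random() < sample_rate:
            rechecked += 1
            if simulate(seed) != first:
                mismatches.append(seed)  # same seed, different trace
    return rechecked, mismatches
```

A non-empty mismatch list would mean some source of non-determinism, such as wall-clock time or an unseeded random call, has leaked into the simulation.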

Chapter 8

Challenges in Debugging with Simulation Testing

16:00 - 5 min, 54 sec

Discusses the debugging challenges that arise with simulation testing.

  • Debugging code laced with callbacks within a simulation environment is difficult, leaving printf debugging as a primary tool.
  • Deterministic simulation facilitates debugging by ensuring the same sequence of events upon reruns.

Chapter 9

Simulation Limitations and Real-World Testing

21:54 - 10 min, 57 sec

Addresses the limitations of simulation testing and the need for real-world testing on hardware.

  • Simulation cannot account for all real-world scenarios, particularly those involving other people's software or hardware-specific issues.
  • FoundationDB uses a real hardware cluster, 'Sinkhole', to test against power failures and hardware malfunctions.

Chapter 10

Continuous Improvement in Simulation Testing

32:51 - 7 min, 27 sec

Will discusses the ongoing efforts to improve simulation testing and address its pitfalls.

  • Will likens the risk of training programmers to write bugs that pass simulation tests to antibiotic resistance.
  • Potential solutions include having multiple simulation frameworks, more real-world testing, and additional hardware to reduce debugging cycle time.
