close
Skip to content

apache/cassandra-simulator

Repository files navigation

Standalone Cassandra Simulator

A self-contained deterministic concurrency simulator extracted from the Apache Cassandra test infrastructure. User code running inside sim.simulate(...) executes under a single-threaded scheduler; all calls that might cause threads to interleave differently (like parks, waits, volatile/atomic writes, signals/monitors) are intercepted and turned into scheduler decisions.

CAUTION: highly experimental! Simulator was extracted from Cassandra codebase with help of a coding agent. Even though this was done with a lot of human handholding, I still can not vouch for this as if it was my own code).

Quick start

try (Simulator sim = new Simulator(42L)) {
    sim.simulate((SerializableRunnable) () -> {
        CountDownLatch latch = new CountDownLatch(1);
        new Thread(() -> latch.await()).start();
        latch.countDown();
    });
}

Debug park capture

Pass captureParks = true as the third constructor argument to record the call site at which each simulated thread parks. The site is included in every action description printed by the scheduler, making it easy to follow which line of user code caused a thread to pause.

try (Simulator sim = new Simulator(420L, 0f, /*captureParks*/ true)) {
    sim.simulate((SerializableRunnable) () -> {
        CountDownLatch latch = new CountDownLatch(1);
        CountDownLatch done  = new CountDownLatch(1);

        new Thread(() -> {
            try { latch.await(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); return; }
            done.countDown();
        }, "waiter").start();

        new Thread(() -> latch.countDown(), "decrementer").start();

        try { done.await(); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    });
}

Scheduler output:

[t]Thread[thread-0]
[t]Invoke Thread[waiter,5,sim-420] with Thread[sim-420_waiter:1,5,sim-420]
[t]Invoke Thread[decrementer,5,sim-420] with Thread[sim-420_decrementer:1,5,sim-420]
[tw]Wakeup Thread[sim-420_waiter:1,5,sim-420] parkedAt[org.apache.cassandra.simulator_test.CountDownLatchSimTest.lambda$countDownLatchWakesWaiter$0(CountDownLatchSimTest.java:52)]
[tw]Wakeup Thread[sim-420_thread-0:1,5,sim-420] parkedAt[org.apache.cassandra.simulator_test.CountDownLatchSimTest.lambda$countDownLatchWakesWaiter$81c80a4a$1(CountDownLatchSimTest.java:60)]

Reading the output:

Part Meaning
[t] / [tw] Action modifier flags — t = START_THREAD, w = WAKE_UP_THREAD
Thread[thread-0] The root body thread starting up
Invoke … with Thread[sim-420_waiter:1,5,sim-420] A new InterceptibleThread is being started; the name encodes <group>_<name>:<node>,<priority>,<group>
Wakeup Thread[sim-420_waiter:1,5,sim-420] The scheduler is about to unpark this thread
parkedAt[…] First non-simulator stack frame captured when the thread parked — the exact user-code line that called await() or park()

Two transformation mechanisms work together to cover all call sites:

  • InstanceClassLoader loads application classes through an ASM transformer that rewrites call sites before the class is defined. JDK classes, the simulator framework (org.apache.cassandra.simulator.context.*), and logging classes are excluded — they are delegated to sharedClassLoader and reach the JVM unmodified.
  • Java agent (InterceptAgent, -javaagent:simulator-asm.jar) uses the Instrumentation API to rewrite call sites inside JDK classes that InstanceClassLoader cannot reach: the java.util.concurrent.locks.* family (AQS, ReentrantLock, …), java.lang.Object, java.util.Random, ThreadLocalRandom, and ConcurrentHashMap.

Concepts

A simulation runs an ActionPlan — a set of initial Actions whose transitive consequences drive the simulation forward. Each Action represents one discrete unit of simulation work; executing it yields zero or more child/continuation Actions. The simulation completes once all transitive consequences of the initial actions have completed.

Execution order is controlled by an ActionSchedule together with a RunnableActionScheduler (which ready action runs next) and a FutureActionScheduler (actions deferred to a future simulated time).

Most non-trivial actions derive from SimulatedAction, which runs on an InterceptibleThread — a thread whose blocking calls have been replaced by scheduler decisions. A threaded action simultaneously represents the initial synchronous step of an execution context and any future continuations of that context; when a thread blocks (e.g. enters an unbounded wait), the action has no ready consequences until another action wakes it.

Actions carry Modifier options that control behaviour — applied to the action itself, transitively to all descendants, or filtered to specific consequence verbs. OrderOn constraints enforce sequential ordering or rate limits among groups of actions, supporting both executor-service simulation and correctness constraints (e.g. where a parent action's children must occur in strict order).

Collectively the simulator controls: monitors, LockSupport, blocking data structures, threads and executors, random number generation, and time.

Terminology

  • Interceptible* — an entity that can be intercepted. InterceptibleThread is a thread the simulator is able to pause. It's the subject of interception.
  • Intercepting* — the mechanism that does the intercepting. InterceptingMonitors intercepts synchronized/wait/notify. InterceptingExecutor intercepts task submission. These are the active interception components.
  • Intercepted* — an event that has been intercepted and is now sitting in the scheduler queue. InterceptedWait is a wait that was pulled out of its thread and handed to the simulator to schedule. It's the result of interception.
  • Instrumented* — a abstraction wired up to the simulator. InstrumentedSemaphore implements Cassandra's Semaphore interface and routes its waits through the simulator. InstrumentedCountDownLatch implements Cassandra's CountDownLatch.

Runtime activation conditions

The bytecode transformer rewrites call sites, but each stub performs runtime checks before activating the simulator path. A JDK call is only intercepted when all of the following hold:

  1. Calling thread is an InterceptibleThread. Threads created outside the simulator (plain new Thread(...) never passed through startThread()) are not InterceptibleThread instances; every stub detects this and falls through to the real JDK method.

  2. Thread is actively intercepting (interceptor != null). Between the moment a simulated thread finishes its current action and the moment the scheduler re-dispatches it, the interceptor field is null. park* / wait* stubs check isIntercepting() and fall through when it is false.

  3. For unpark: the target thread must also be an InterceptibleThread. If either the calling thread or the thread being unparked is a plain OS thread, LockSupport.unpark() is called directly.

When any condition fails the stub delegates to the real JDK method unchanged, so code paths that happen to run on non-simulated threads continue to work normally even inside instrumented classes.


What is instrumented

InstanceClassLoader scope — application code loaded by sim.simulate() and any class whose package matches the GLOBAL_METHODS or MONITORS patterns: org.apache.cassandra.** (excluding org.apache.cassandra.simulator.**) and accord.**.

Agent scope — JDK classes listed above: java.util.concurrent.locks.* (call sites to LockSupport.park*), java.lang.Object (hashCode → deterministic id), java.util.Random / ThreadLocalRandom (deterministic seeds), and ConcurrentHashMap (deterministic probing).

Thread creation

Call site Redirected to
new Thread(...).start() InterceptorOfGlobalMethods$Global.startThread() — scheduled as a simulator action

JDK synchronizers (constructor redirect)

Constructors are rewritten at call sites so the simulator's subclass is instantiated instead:

Original Simulator replacement
new CountDownLatch(n) InterceptingCountDownLatch
new ConcurrentHashMap(...) InterceptibleConcurrentHashMap (deterministic hash codes)
new IdentityHashMap(...) InterceptedIdentityHashMap (deterministic hash codes)

Cassandra concurrent-utility factories (integration-test examples)

The integration-test module (integration-test/src/main) includes Cassandra-style concurrent-utility interfaces with @Intercept-annotated factory methods. These are registered at test setup via sim.intercept(Interface.class, Impl.class) — they are not built-in simulator defaults:

Original Simulator replacement
WaitQueue.newWaitQueue() InstrumentedWaitQueue
CassandraCountDownLatch.newCountDownLatch(n) InterceptingAwaitable.InterceptingCountDownLatch
Condition.newOneTimeCondition() InterceptingAwaitable.InterceptingCondition
Semaphore.newSemaphore(n) / newFairSemaphore(n) InstrumentedSemaphore

Executor factories

All zero- and single-int-arg Executors factory methods are redirected by default. Tasks submitted to the returned executor run on InterceptibleThread instances under scheduler control:

Original Simulator replacement
Executors.newSingleThreadExecutor() InterceptingExecutorFactory.pooled(name, 1)
Executors.newFixedThreadPool(int) InterceptingExecutorFactory.pooled(name, n)
Executors.newCachedThreadPool() InterceptingExecutorFactory.sequential(name)
Executors.newScheduledThreadPool(int) InterceptingExecutorFactory.sequential(name)
Executors.newSingleThreadScheduledExecutor() InterceptingExecutorFactory.sequential(name)

Pool size is irrelevant to the simulator's serial scheduler; newCachedThreadPool and the two scheduled variants map to a sequential executor. ThreadFactory-overload variants are also redirected; the supplied factory is silently dropped because the intercepting executor always creates InterceptibleThread instances.

LockSupport (all variants)

LockSupport.park* and unpark call sites are redirected to InterceptorOfSystemMethods$Global in both transformation paths:

  • Call sites in application code — rewritten by InstanceClassLoader / GlobalMethodTransformer
  • Call sites inside java.util.concurrent.locks.* (e.g. AQS) — rewritten by the Java agent

Covered variants:

  • park(), park(Object blocker)
  • parkNanos(long), parkNanos(Object blocker, long nanos)
  • parkUntil(long), parkUntil(Object blocker, long millis)
  • unpark(Thread)

Monitor and Object wait/notify

For classes with the MONITORS flag, every synchronized block entry and exit is intercepted, giving the scheduler an opportunity to switch threads. Inside those blocks:

  • Object.wait(), wait(long), wait(long, int) — intercepted
  • Object.notify(), notifyAll() — intercepted

Thread sleep and timed waits

Original Notes
Thread.sleep(long) / Thread.sleep(long, int) Redirected to simulated sleep
TimeUnit.sleep(long) Redirected to simulated sleep
Clock.waitUntil(long), Awaitable$SyncAwaitable.waitUntil(long) Redirected
com.google.common.util.concurrent.Uninterruptibles.sleepUninterruptibly(...) Redirected

Time sources

Original Condition Notes
System.nanoTime() SYSTEM_CLOCK flag Returns simulated time
System.currentTimeMillis() SYSTEM_CLOCK flag Returns simulated time
TimeUtils.timestampMicros() GLOBAL_CLOCK flag Returns monotonically-increasing simulated micros

Determinism helpers

Original Notes
System.identityHashCode(Object) Returns a deterministic per-object id assigned by the simulator
UUID.randomUUID() Returns a deterministic UUID from the simulator's random source
ThreadLocalRandom.getProbe() / advanceProbe() / localInit() Stubbed/redirected for determinism inside ConcurrentHashMap

Future.get()

Future.get() call sites inside instrumented code are redirected to InterceptorOfGlobalMethods$Global.futureGet(), which spin-polls future.isDone() and yields to the scheduler via parkNanos(1000L) between attempts. This works for any Future implementation — the future itself does not need to be simulation-aware.

The timed variant Future.get(long, TimeUnit) is also redirected; it uses simulated time for the deadline check.


What is NOT instrumented

  • Most JDK internals — only java.util.concurrent.locks.*, java.lang.Object, java.util.Random, ThreadLocalRandom, and ConcurrentHashMap are rewritten by the agent. All other JDK classes are untouched.
  • Code in org.apache.cassandra.simulator.* (excluding simulator.test) — the simulator infrastructure itself is excluded from transformation to avoid infinite recursion.
  • Any class not matching the GLOBAL_METHODS / MONITORS patterns — third-party libraries (other than accord.* and io.netty.util.concurrent.FastThreadLocal) are not instrumented.

Architecture overview

-javaagent:simulator-asm.jar  (InterceptAgent)
  └── Instrumentation API — rewrites at class-load time:
        java.util.concurrent.locks.* → LockSupport call sites redirected
        java.lang.Object             → hashCode → deterministic id
        java.util.Random / ThreadLocalRandom → deterministic seeds
        ConcurrentHashMap            → deterministic probing

sim.simulate(body)
  └── InstanceClassLoader   (loads + ASM-transforms application classes)
        ├── GlobalMethodTransformer  (rewrites call sites listed above)
        └── InterceptClasses         (decides which flags apply per class)

InterceptorOfGlobalMethods$Global  (static dispatch, called by transformed code)
  └── InterceptingGlobalMethods     (concrete implementation wired to the scheduler)
        ├── InterceptingExecutorFactory  (creates InterceptibleThread-backed executors)
        └── InterceptingMonitors         (synchronized / park / wait handling)

ActionSchedule  (cooperative scheduler; runs one action at a time)
  └── RunnableActionScheduler  (Sequential for deterministic tests; RandomUniform for fuzzing)

Building and testing

./gradlew :simulator-core:test

Tests require JDK 17 (JDK 21 support is in progress — ShadowingTransformer strips ACC_FINAL from shadow subclasses to handle classes that became final in JDK 21).

About

Apache cassandra

Resources

Code of conduct

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages