A self-contained deterministic concurrency simulator extracted from the Apache Cassandra test
infrastructure. User code running inside sim.simulate(...) executes under a single-threaded
scheduler; all calls that might cause threads to interleave differently (like parks, waits,
volatile/atomic writes, signals/monitors) are intercepted and turned into scheduler decisions.
CAUTION: highly experimental! The simulator was extracted from the Cassandra codebase with the help of a coding agent. Even though this was done with a lot of human handholding, I still cannot vouch for it as if it were my own code.
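
```java
try (Simulator sim = new Simulator(42L)) {
    sim.simulate((SerializableRunnable) () -> {
        CountDownLatch latch = new CountDownLatch(1);
        new Thread(() -> {
            // await() throws the checked InterruptedException, so it is handled inside the lambda
            try { latch.await(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }).start();
        latch.countDown();
    });
}
```

Pass captureParks = true as the third constructor argument to record the call site at which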
each simulated thread parks. The site is included in every action description printed by the
scheduler, making it easy to follow which line of user code caused a thread to pause.

```java
try (Simulator sim = new Simulator(420L, 0f, /*captureParks*/ true)) {
    sim.simulate((SerializableRunnable) () -> {
        CountDownLatch latch = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(1);
        new Thread(() -> {
            try { latch.await(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); return; }
            done.countDown();
        }, "waiter").start();
        new Thread(() -> latch.countDown(), "decrementer").start();
        try { done.await(); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    });
}
```

Scheduler output:

```
[t]Thread[thread-0]
[t]Invoke Thread[waiter,5,sim-420] with Thread[sim-420_waiter:1,5,sim-420]
[t]Invoke Thread[decrementer,5,sim-420] with Thread[sim-420_decrementer:1,5,sim-420]
[tw]Wakeup Thread[sim-420_waiter:1,5,sim-420] parkedAt[org.apache.cassandra.simulator_test.CountDownLatchSimTest.lambda$countDownLatchWakesWaiter$0(CountDownLatchSimTest.java:52)]
[tw]Wakeup Thread[sim-420_thread-0:1,5,sim-420] parkedAt[org.apache.cassandra.simulator_test.CountDownLatchSimTest.lambda$countDownLatchWakesWaiter$81c80a4a$1(CountDownLatchSimTest.java:60)]
```

Reading the output:

| Part | Meaning |
|---|---|
| `[t]` / `[tw]` | Action modifier flags: `t` = `START_THREAD`, `w` = `WAKE_UP_THREAD` |
| `Thread[thread-0]` | The root body thread starting up |
| `Invoke … with Thread[sim-420_waiter:1,5,sim-420]` | A new `InterceptibleThread` is being started; the name encodes `<group>_<name>:<node>,<priority>,<group>` |
| `Wakeup Thread[sim-420_waiter:1,5,sim-420]` | The scheduler is about to unpark this thread |
| `parkedAt[…]` | First non-simulator stack frame captured when the thread parked: the exact user-code line that called `await()` or `park()` |

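
How a `parkedAt[…]` site could be derived from a captured stack can be sketched as follows. This is a minimal illustration, not the simulator's code: `ParkSite`, `firstFrameOutside`, and the `frameworkPrefix` parameter are hypothetical names, and the sketch assumes the stack is scanned from the top for the first frame that belongs neither to the JDK nor to the framework package.

```java
import java.util.Optional;

/** Hypothetical helper: locates the user-code frame in a captured stack. */
final class ParkSite {
    /**
     * Returns the first frame, scanning from the top of the stack, whose class
     * is neither a JDK class nor inside the given framework package prefix.
     */
    static Optional<StackTraceElement> firstFrameOutside(StackTraceElement[] stack,
                                                         String frameworkPrefix) {
        for (StackTraceElement frame : stack) {
            String cls = frame.getClassName();
            // Skip JDK frames and frames that belong to the framework itself.
            if (cls.startsWith("java.") || cls.startsWith("jdk.") || cls.startsWith(frameworkPrefix))
                continue;
            return Optional.of(frame);
        }
        return Optional.empty(); // no user-code frame in this stack
    }
}
```

A caller would pass something like `Thread.currentThread().getStackTrace()` together with the framework's package prefix; printing the returned `StackTraceElement` gives the familiar `Class.method(File.java:line)` form seen in the output above.
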
Two transformation mechanisms work together to cover all call sites:

- `InstanceClassLoader` loads application classes through an ASM transformer that rewrites call sites before the class is defined. JDK classes, the simulator framework (`org.apache.cassandra.simulator.context.*`), and logging classes are excluded: they are delegated to `sharedClassLoader` and reach the JVM unmodified.
- Java agent (`InterceptAgent`, `-javaagent:simulator-asm.jar`) uses the `Instrumentation` API to rewrite call sites inside JDK classes that `InstanceClassLoader` cannot reach: the `java.util.concurrent.locks.*` family (AQS, `ReentrantLock`, …), `java.lang.Object`, `java.util.Random`, `ThreadLocalRandom`, and `ConcurrentHashMap`.

A simulation runs an ActionPlan — a set of initial Actions whose transitive
consequences drive the simulation forward. Each Action represents one discrete unit of
simulation work; executing it yields zero or more child/continuation Actions. The simulation
completes once all transitive consequences of the initial actions have completed.
Execution order is controlled by an ActionSchedule together with a
RunnableActionScheduler (which ready action runs next) and a
FutureActionScheduler (actions deferred to a future simulated time).
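
The action model described above can be pictured with a toy scheduler. This is only a sketch under stated assumptions: `ToyAction` and `ToySchedule` are hypothetical names, not the simulator's `Action` or `ActionSchedule` API, a seeded `java.util.Random` stands in for the runnable-action scheduler, and simulated time (the future-action side) is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Toy model: an action performs one unit of work and returns its consequences. */
interface ToyAction {
    List<ToyAction> perform(List<String> trace);
}

final class ToySchedule {
    /** Runs the initial actions and all transitive consequences; the seed fixes the order. */
    static List<String> run(long seed, List<ToyAction> initial) {
        Random random = new Random(seed);
        List<ToyAction> ready = new ArrayList<>(initial);
        List<String> trace = new ArrayList<>();
        while (!ready.isEmpty()) {
            // The scheduling decision: choose which ready action runs next.
            ToyAction next = ready.remove(random.nextInt(ready.size()));
            // Executing an action yields zero or more child actions.
            ready.addAll(next.perform(trace));
        }
        return trace; // complete once no consequences remain
    }

    /** An action that records its name and yields the given children. */
    static ToyAction named(String name, ToyAction... children) {
        return trace -> {
            trace.add(name);
            return List.of(children);
        };
    }
}
```

Running the same plan twice with the same seed produces the same trace, which is the property that makes a failing interleaving reproducible; changing the seed explores a different order.
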
Most non-trivial actions derive from SimulatedAction, which runs on an
InterceptibleThread — a thread whose blocking calls have been replaced by scheduler
decisions. A threaded action simultaneously represents the initial synchronous step of an
execution context and any future continuations of that context; when a thread blocks (e.g.
enters an unbounded wait), the action has no ready consequences until another action wakes it.
Actions carry Modifier options that control behaviour — applied to the action itself,
transitively to all descendants, or filtered to specific consequence verbs. OrderOn
constraints enforce sequential ordering or rate limits among groups of actions, supporting
both executor-service simulation and correctness constraints (e.g. where a parent action's
children must occur in strict order).
Collectively the simulator controls: monitors, LockSupport, blocking data structures,
threads and executors, random number generation, and time.

- `Interceptible*`: an entity that can be intercepted. `InterceptibleThread` is a thread the simulator is able to pause. It is the subject of interception.
- `Intercepting*`: the mechanism that does the intercepting. `InterceptingMonitors` intercepts `synchronized`/`wait`/`notify`. `InterceptingExecutor` intercepts task submission. These are the active interception components.
- `Intercepted*`: an event that has been intercepted and is now sitting in the scheduler queue. `InterceptedWait` is a wait that was pulled out of its thread and handed to the simulator to schedule. It is the result of interception.
- `Instrumented*`: an abstraction wired up to the simulator. `InstrumentedSemaphore` implements Cassandra's `Semaphore` interface and routes its waits through the simulator. `InstrumentedCountDownLatch` implements Cassandra's `CountDownLatch`.

The bytecode transformer rewrites call sites, but each stub performs runtime checks before activating the simulator path. A JDK call is only intercepted when all of the following hold:

- Calling thread is an `InterceptibleThread`. Threads created outside the simulator (plain `new Thread(...)` never passed through `startThread()`) are not `InterceptibleThread` instances; every stub detects this and falls through to the real JDK method.
- Thread is actively intercepting (`interceptor != null`). Between the moment a simulated thread finishes its current action and the moment the scheduler re-dispatches it, the `interceptor` field is `null`. `park*`/`wait*` stubs check `isIntercepting()` and fall through when it is false.
- For `unpark`: the target thread must also be an `InterceptibleThread`. If either the calling thread or the thread being unparked is a plain OS thread, `LockSupport.unpark()` is called directly.

When any condition fails, the stub delegates to the real JDK method unchanged, so code paths that happen to run on non-simulated threads continue to work normally even inside instrumented classes.
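
The runtime checks listed above can be sketched as a guard in front of the real JDK call. The names here (`ToyInterceptibleThread`, `ToyStubs`, and a `Runnable` standing in for the interceptor) are hypothetical and chosen for illustration; the real stubs and their scheduler hand-off are not shown in this document.

```java
import java.util.concurrent.locks.LockSupport;

/** Hypothetical stand-in for the simulator's thread type. */
class ToyInterceptibleThread extends Thread {
    /** Non-null only while the scheduler has dispatched this thread. */
    volatile Runnable interceptor;

    ToyInterceptibleThread(Runnable body) { super(body); }

    boolean isIntercepting() { return interceptor != null; }
}

final class ToyStubs {
    /** Park and wait rule: the caller must be simulated and actively intercepting. */
    static boolean interceptsPark(Thread caller) {
        return caller instanceof ToyInterceptibleThread
            && ((ToyInterceptibleThread) caller).isIntercepting();
    }

    /** Unpark rule: both the caller and the target must be simulated threads. */
    static boolean interceptsUnpark(Thread caller, Thread target) {
        return caller instanceof ToyInterceptibleThread
            && target instanceof ToyInterceptibleThread;
    }

    /** Stub that a rewritten LockSupport.parkNanos call site would land on. */
    static String parkNanos(long nanos) {
        Thread caller = Thread.currentThread();
        if (interceptsPark(caller)) {
            // Simulator path: hand control to the scheduler hook instead of blocking.
            ((ToyInterceptibleThread) caller).interceptor.run();
            return "simulated";
        }
        // Fall-through path: behave exactly like untransformed code.
        LockSupport.parkNanos(nanos);
        return "real";
    }
}
```

Called from an ordinary thread, `ToyStubs.parkNanos(...)` takes the fall-through branch, which mirrors how instrumented classes keep working on non-simulated threads.
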
InstanceClassLoader scope — application code loaded by sim.simulate() and any class whose
package matches the GLOBAL_METHODS or MONITORS patterns:
org.apache.cassandra.** (excluding org.apache.cassandra.simulator.**) and accord.**.
Agent scope — JDK classes listed above: java.util.concurrent.locks.* (call sites to
LockSupport.park*), java.lang.Object (hashCode → deterministic id), java.util.Random /
ThreadLocalRandom (deterministic seeds), and ConcurrentHashMap (deterministic probing).

| Call site | Redirected to |
|---|---|
| `new Thread(...).start()` | `InterceptorOfGlobalMethods$Global.startThread()`, scheduled as a simulator action |

Constructors are rewritten at call sites so the simulator's subclass is instantiated instead:

| Original | Simulator replacement |
|---|---|
| `new CountDownLatch(n)` | `InterceptingCountDownLatch` |
| `new ConcurrentHashMap(...)` | `InterceptibleConcurrentHashMap` (deterministic hash codes) |
| `new IdentityHashMap(...)` | `InterceptedIdentityHashMap` (deterministic hash codes) |

The integration-test module (integration-test/src/main) includes Cassandra-style
concurrent-utility interfaces with @Intercept-annotated factory methods. These are
registered at test setup via sim.intercept(Interface.class, Impl.class) — they are
not built-in simulator defaults:

| Original | Simulator replacement |
|---|---|
| `WaitQueue.newWaitQueue()` | `InstrumentedWaitQueue` |
| `CassandraCountDownLatch.newCountDownLatch(n)` | `InterceptingAwaitable.InterceptingCountDownLatch` |
| `Condition.newOneTimeCondition()` | `InterceptingAwaitable.InterceptingCondition` |
| `Semaphore.newSemaphore(n)` / `newFairSemaphore(n)` | `InstrumentedSemaphore` |

All zero- and single-int-arg Executors factory methods are redirected by default.
Tasks submitted to the returned executor run on InterceptibleThread instances under
scheduler control:

| Original | Simulator replacement |
|---|---|
| `Executors.newSingleThreadExecutor()` | `InterceptingExecutorFactory.pooled(name, 1)` |
| `Executors.newFixedThreadPool(int)` | `InterceptingExecutorFactory.pooled(name, n)` |
| `Executors.newCachedThreadPool()` | `InterceptingExecutorFactory.sequential(name)` |
| `Executors.newScheduledThreadPool(int)` | `InterceptingExecutorFactory.sequential(name)` |
| `Executors.newSingleThreadScheduledExecutor()` | `InterceptingExecutorFactory.sequential(name)` |

Pool size is irrelevant to the simulator's serial scheduler; newCachedThreadPool and
the two scheduled variants map to a sequential executor.
ThreadFactory-overload variants are also redirected; the supplied factory is silently
dropped because the intercepting executor always creates InterceptibleThread instances.
LockSupport.park* and unpark call sites are redirected to InterceptorOfSystemMethods$Global
in both transformation paths:

- Call sites in application code: rewritten by `InstanceClassLoader`/`GlobalMethodTransformer`
- Call sites inside `java.util.concurrent.locks.*` (e.g. AQS): rewritten by the Java agent

Covered variants:

- `park()`, `park(Object blocker)`
- `parkNanos(long)`, `parkNanos(Object blocker, long nanos)`
- `parkUntil(long)`, `parkUntil(Object blocker, long millis)`
- `unpark(Thread)`

For classes with the MONITORS flag, every synchronized block entry and exit is intercepted,
giving the scheduler an opportunity to switch threads. Inside those blocks:

- `Object.wait()`, `wait(long)`, `wait(long, int)`: intercepted
- `Object.notify()`, `notifyAll()`: intercepted


| Original | Notes |
|---|---|
| `Thread.sleep(long)` / `Thread.sleep(long, int)` | Redirected to simulated sleep |
| `TimeUnit.sleep(long)` | Redirected to simulated sleep |
| `Clock.waitUntil(long)`, `Awaitable$SyncAwaitable.waitUntil(long)` | Redirected |
| `com.google.common.util.concurrent.Uninterruptibles.sleepUninterruptibly(...)` | Redirected |


| Original | Condition | Notes |
|---|---|---|
| `System.nanoTime()` | `SYSTEM_CLOCK` flag | Returns simulated time |
| `System.currentTimeMillis()` | `SYSTEM_CLOCK` flag | Returns simulated time |
| `TimeUtils.timestampMicros()` | `GLOBAL_CLOCK` flag | Returns monotonically-increasing simulated micros |

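
One way to picture simulated time is a clock that only moves when the simulation says so. The class below is a hypothetical toy, not the simulator's clock: it assumes that a simulated sleep simply advances the clock rather than blocking, and that consecutive `timestampMicros()` calls never return the same value.

```java
/** Hypothetical toy clock: time advances only through explicit simulated sleeps. */
final class ToySimulatedClock {
    private long nanos;            // current simulated time in nanoseconds
    private long lastMicros = -1;  // last value handed out by timestampMicros()

    /** Stand-in for System.nanoTime() under simulation. */
    long nanoTime() { return nanos; }

    /** Stand-in for System.currentTimeMillis() under simulation. */
    long currentTimeMillis() { return nanos / 1_000_000L; }

    /** Simulated sleep: advance the clock instead of blocking the thread. */
    void sleep(long millis) { nanos += millis * 1_000_000L; }

    /** Assumption: each call returns a value strictly greater than the previous one. */
    long timestampMicros() {
        lastMicros = Math.max(lastMicros + 1, nanos / 1_000L);
        return lastMicros;
    }
}
```

Because no wall-clock source is consulted, two runs that perform the same sequence of sleeps observe identical timestamps.
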

| Original | Notes |
|---|---|
| `System.identityHashCode(Object)` | Returns a deterministic per-object id assigned by the simulator |
| `UUID.randomUUID()` | Returns a deterministic UUID from the simulator's random source |
| `ThreadLocalRandom.getProbe()` / `advanceProbe()` / `localInit()` | Stubbed/redirected for determinism inside `ConcurrentHashMap` |

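
The idea behind these replacements can be sketched with a seeded source. `DeterministicIds` is a hypothetical class for illustration only: it assumes ids are assigned in first-use order and that UUID bits come straight from a seeded `java.util.Random`, which may differ from what the simulator does (for example, the UUID version bits are not set here).

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Random;
import java.util.UUID;

/** Hypothetical sketch: replaces address-based and random ids with seeded ones. */
final class DeterministicIds {
    private final Random random;
    private final Map<Object, Integer> ids = new IdentityHashMap<>();
    private int nextId = 1;

    DeterministicIds(long seed) { this.random = new Random(seed); }

    /** Stand-in for System.identityHashCode: ids are handed out in first-use order. */
    int identityHashCode(Object o) {
        return ids.computeIfAbsent(o, key -> nextId++);
    }

    /** Stand-in for UUID.randomUUID: both halves come from the seeded source. */
    UUID randomUUID() {
        long most = random.nextLong();
        long least = random.nextLong();
        return new UUID(most, least);
    }
}
```

Two instances built with the same seed return the same UUID sequence, and an object keeps the id it was first given, so hash-ordered iteration no longer depends on memory addresses.
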
Future.get() call sites inside instrumented code are redirected to
InterceptorOfGlobalMethods$Global.futureGet(), which spin-polls future.isDone() and
yields to the scheduler via parkNanos(1000L) between attempts. This works for any
Future implementation — the future itself does not need to be simulation-aware.
The timed variant Future.get(long, TimeUnit) is also redirected; it uses simulated
time for the deadline check.
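
The polling approach described above can be sketched with plain JDK calls. `FuturePolling` is a hypothetical name and this is not the simulator's `futureGet()` implementation; interrupt handling and the timed variant are left out. Under the simulator the `parkNanos` call would itself be intercepted and become a scheduler decision, while here it is an ordinary short park.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical sketch of a polling Future.get() replacement. */
final class FuturePolling {
    static <T> T futureGet(Future<T> future) throws InterruptedException, ExecutionException {
        // Poll rather than block inside the future, so any Future implementation works.
        while (!future.isDone()) {
            // Yield between attempts; this is the point a scheduler could switch threads.
            LockSupport.parkNanos(1000L);
        }
        // The future is already done, so this call returns (or throws) without blocking.
        return future.get();
    }
}
```

The design choice is that the waiting thread never blocks inside code the simulator cannot see: every pause goes through a park that the scheduler controls.
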

Not intercepted:

- Most JDK internals: only `java.util.concurrent.locks.*`, `java.lang.Object`, `java.util.Random`, `ThreadLocalRandom`, and `ConcurrentHashMap` are rewritten by the agent. All other JDK classes are untouched.
- Code in `org.apache.cassandra.simulator.*` (excluding `simulator.test`): the simulator infrastructure itself is excluded from transformation to avoid infinite recursion.
- Any class not matching the `GLOBAL_METHODS`/`MONITORS` patterns: third-party libraries (other than `accord.*` and `io.netty.util.concurrent.FastThreadLocal`) are not instrumented.

```
-javaagent:simulator-asm.jar (InterceptAgent)
  └── Instrumentation API: rewrites at class-load time:
        java.util.concurrent.locks.*         → LockSupport call sites redirected
        java.lang.Object                     → hashCode → deterministic id
        java.util.Random / ThreadLocalRandom → deterministic seeds
        ConcurrentHashMap                    → deterministic probing

sim.simulate(body)
  └── InstanceClassLoader (loads + ASM-transforms application classes)
        ├── GlobalMethodTransformer (rewrites call sites listed above)
        └── InterceptClasses (decides which flags apply per class)

InterceptorOfGlobalMethods$Global (static dispatch, called by transformed code)
  └── InterceptingGlobalMethods (concrete implementation wired to the scheduler)
        ├── InterceptingExecutorFactory (creates InterceptibleThread-backed executors)
        └── InterceptingMonitors (synchronized / park / wait handling)

ActionSchedule (cooperative scheduler; runs one action at a time)
  └── RunnableActionScheduler (Sequential for deterministic tests; RandomUniform for fuzzing)
```

```bash
./gradlew :simulator-core:test
```

Tests require JDK 17 (JDK 21 support is in progress — ShadowingTransformer strips
ACC_FINAL from shadow subclasses to handle classes that became final in JDK 21).
