ScyllaDB vs Cassandra: what the p99 actually looks like at fintech scale

We ran Apache Cassandra in production for two years before migrating the user-identity lookup path to ScyllaDB. The decision was not made from a benchmark blog post. It was made after watching a p99 read latency of 180 ms on a 3-node Cassandra cluster serve a path that had a 50 ms SLO.

This post is a working note on what we measured, why Cassandra behaved the way it did, and what changed after the migration. A longer post with the full LSM-tree internals and compaction math is in progress.


#The problem in one number

A user-identity lookup on the BNPL approval path had a budget of 50 ms. Cassandra was hitting p99 of 180 ms under load — 3.6x over budget — despite the cluster being at roughly 30% CPU utilisation.

The symptom was GC pauses. Cassandra’s JVM heap was collecting under read pressure, and the stop-the-world pauses were showing up directly in the tail latency.

ScyllaDB is a C++ reimplementation of the Cassandra storage engine with a shard-per-core architecture and no JVM. The GC pause problem is structurally absent.


#What the migration changed

Metric Cassandra (3-node) ScyllaDB (3-node)
p50 read latency 4 ms 1.8 ms
p99 read latency 180 ms 9 ms
p99.9 read latency 340 ms 22 ms
CPU at peak QPS 31% 18%

Same hardware, same data model, same replication factor. The p99 improvement is 20x. The p99.9 improvement is 15x.


#Why Cassandra’s p99 drifts at load

Cassandra’s read path merges data from the memtable and potentially multiple SSTables (after compaction, ideally one — but compaction is async and never perfectly caught up). Each SSTable read involves a bloom filter check, a partition index lookup, and a block read from disk or page cache.

Under concurrent read pressure, the JVM heap fills with bloom filter and index structures. When the GC fires — even a minor collection — every in-flight read on that node pauses. The pause duration is proportional to heap pressure, which is proportional to read concurrency.

The math is simple: if GC fires every TT seconds and pauses for Δt\Delta t milliseconds, any request in flight during that window takes at least Δt\Delta t ms extra. At p99 and p99.9, requests are almost certain to hit at least one GC event across their lifetime.

ScyllaDB eliminates this by allocating off-heap (no GC) and using a seastar reactor per core with cooperative scheduling. Tail latency is bounded by I/O, not by the runtime.


Full post coming: LSM-tree compaction strategies, bloom filter false-positive rates, and the exact data model we used. Numbers above are from a production environment, redacted for specifics.