ScyllaDB vs Cassandra: what the p99 actually looks like at fintech scale
We ran Apache Cassandra in production for two years before migrating the user-identity lookup path to ScyllaDB. The decision was not made from a benchmark blog post. It was made after watching a p99 read latency of 180 ms on a 3-node Cassandra cluster serve a path that had a 50 ms SLO.
This post is a working note on what we measured, why Cassandra behaved the way it did, and what changed after the migration. A longer post with the full LSM-tree internals and compaction math is in progress.
#The problem in one number
A user-identity lookup on the BNPL approval path had a budget of 50 ms. Cassandra was hitting p99 of 180 ms under load — 3.6x over budget — despite the cluster being at roughly 30% CPU utilisation.
The symptom was GC pauses. Cassandra’s JVM heap was collecting under read pressure, and the stop-the-world pauses were showing up directly in the tail latency.
ScyllaDB is a C++ reimplementation of the Cassandra storage engine with a shard-per-core architecture and no JVM. The GC pause problem is structurally absent.
#What the migration changed
| Metric | Cassandra (3-node) | ScyllaDB (3-node) |
|---|---|---|
| p50 read latency | 4 ms | 1.8 ms |
| p99 read latency | 180 ms | 9 ms |
| p99.9 read latency | 340 ms | 22 ms |
| CPU at peak QPS | 31% | 18% |
Same hardware, same data model, same replication factor. The p99 improvement is 20x. The p99.9 improvement is 15x.
#Why Cassandra’s p99 drifts at load
Cassandra’s read path merges data from the memtable and potentially multiple SSTables (after compaction, ideally one — but compaction is async and never perfectly caught up). Each SSTable read involves a bloom filter check, a partition index lookup, and a block read from disk or page cache.
Under concurrent read pressure, the JVM heap fills with bloom filter and index structures. When the GC fires — even a minor collection — every in-flight read on that node pauses. The pause duration is proportional to heap pressure, which is proportional to read concurrency.
The math is simple: if GC fires every seconds and pauses for milliseconds, any request in flight during that window takes at least ms extra. At p99 and p99.9, requests are almost certain to hit at least one GC event across their lifetime.
ScyllaDB eliminates this by allocating off-heap (no GC) and using a seastar reactor per core with cooperative scheduling. Tail latency is bounded by I/O, not by the runtime.
Full post coming: LSM-tree compaction strategies, bloom filter false-positive rates, and the exact data model we used. Numbers above are from a production environment, redacted for specifics.