<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Animesh Kumar Mishra</title>
  
  <subtitle>Math is substrate. Everything else is emergent.</subtitle>
  <link href="https://animesh.cloud/atom.xml" rel="self"/>
  
  <link href="https://animesh.cloud/"/>
  <updated>2026-05-30T19:31:00.213Z</updated>
  <id>https://animesh.cloud/</id>
  
  <author>
    <name>Animesh Kumar Mishra</name>
    
  </author>
  
  <generator uri="https://hexo.io/">Hexo</generator>
  
  <entry>
    <title>Data compression math: what each database actually stores on disk</title>
    <link href="https://animesh.cloud/2025/05/13/data-compression-math/"/>
    <id>https://animesh.cloud/2025/05/13/data-compression-math/</id>
    <published>2025-05-13T18:30:00.000Z</published>
    <updated>2026-05-30T19:31:00.213Z</updated>
    
    <content type="html"><![CDATA[<p>A <code>transaction_status</code> column holds three values: <code>APPROVED</code>, <code>DECLINED</code>, <code>PENDING</code>. In a careless schema each row stores a VARCHAR(20) — call it 8 bytes average. At 10 million rows that is 80 MB. Shannon’s entropy bound says the minimum is 1.16 bits per symbol. That is 1.5 MB. The gap is 53x, and it exists entirely because of the choices the storage engine makes (or doesn’t make) between serialisation and disk.</p><p>This post is the math behind those choices across four storage architectures.</p><hr><h2 id="the-floor-shannon-entropy"><a class="header-anchor" href="#the-floor-shannon-entropy">#</a>The floor: Shannon entropy</h2><p>For a random variable <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>X</mi></mrow><annotation encoding="application/x-tex">X</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathnormal" style="margin-right:0.0785em;">X</span></span></span></span> with <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>n</mi></mrow><annotation encoding="application/x-tex">n</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal">n</span></span></span></span> outcomes and probabilities <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>p</mi><mn>1</mn></msub><mo separator="true">,</mo><mo>…</mo><mo separator="true">,</mo><msub><mi>p</mi><mi>n</mi></msub></mrow><annotation encoding="application/x-tex">p_1, \ldots, p_n</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span 
class="strut" style="height:0.625em;vertical-align:-0.1944em;"></span><span class="mord"><span class="mord mathnormal">p</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="minner">…</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord mathnormal">p</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">n</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span>:</p><p class="katex-block "><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>H</mi><mo stretchy="false">(</mo><mi>X</mi><mo stretchy="false">)</mo><mo>=</mo><mo>−</mo><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></munderover><msub><mi>p</mi><mi>i</mi></msub><msub><mrow><mi>log</mi><mo>⁡</mo></mrow><mn>2</mn></msub><msub><mi>p</mi><mi>i</mi></msub><mspace 
width="1em"/><mtext>bits/symbol</mtext></mrow><annotation encoding="application/x-tex">H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i \quad \text{bits/symbol}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.0813em;">H</span><span class="mopen">(</span><span class="mord mathnormal" style="margin-right:0.0785em;">X</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:2.9291em;vertical-align:-1.2777em;"></span><span class="mord">−</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.6514em;"><span style="top:-1.8723em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mrel mtight">=</span><span class="mord mtight">1</span></span></span></span><span style="top:-3.05em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∑</span></span></span><span style="top:-4.3em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">n</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.2777em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord mathnormal">p</span><span class="msupsub"><span class="vlist-t vlist-t2"><span 
class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mop"><span class="mop">lo<span style="margin-right:0.0139em;">g</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.207em;"><span style="top:-2.4559em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2441em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord mathnormal">p</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:1em;"></span><span class="mord text"><span class="mord">bits/symbol</span></span></span></span></span></span></p><p>For the status column with distribution APPROVED=70%, DECLINED=20%, PENDING=10%:</p><p class="katex-block "><span class="katex-display"><span class="katex"><span 
class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>H</mi><mo>=</mo><mo>−</mo><mo stretchy="false">(</mo><mn>0.70</mn><msub><mrow><mi>log</mi><mo>⁡</mo></mrow><mn>2</mn></msub><mn>0.70</mn><mo>+</mo><mn>0.20</mn><msub><mrow><mi>log</mi><mo>⁡</mo></mrow><mn>2</mn></msub><mn>0.20</mn><mo>+</mo><mn>0.10</mn><msub><mrow><mi>log</mi><mo>⁡</mo></mrow><mn>2</mn></msub><mn>0.10</mn><mo stretchy="false">)</mo><mo>≈</mo><mn>1.16</mn><mtext> bits/symbol</mtext></mrow><annotation encoding="application/x-tex">H = -(0.70 \log_2 0.70 + 0.20 \log_2 0.20 + 0.10 \log_2 0.10) \approx 1.16 \text{ bits/symbol}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathnormal" style="margin-right:0.0813em;">H</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">−</span><span class="mopen">(</span><span class="mord">0.70</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mop"><span class="mop">lo<span style="margin-right:0.0139em;">g</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.207em;"><span style="top:-2.4559em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2441em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord">0.70</span><span class="mspace" style="margin-right:0.2222em;"></span><span 
class="mbin">+</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.9386em;vertical-align:-0.2441em;"></span><span class="mord">0.20</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mop"><span class="mop">lo<span style="margin-right:0.0139em;">g</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.207em;"><span style="top:-2.4559em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2441em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord">0.20</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">0.10</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mop"><span class="mop">lo<span style="margin-right:0.0139em;">g</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.207em;"><span style="top:-2.4559em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2441em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord">0.10</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em;"></span><span 
class="mrel">≈</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">1.16</span><span class="mord text"><span class="mord"> bits/symbol</span></span></span></span></span></span></p><p>No lossless compressor that treats each row as an independent draw from this distribution can average fewer bits per symbol. Everything below is about how close each engine gets, and what it gives up to get there.</p><hr><h2 id="rdbms-postgresql"><a class="header-anchor" href="#rdbms-postgresql">#</a>RDBMS (PostgreSQL)</h2><p>PostgreSQL stores rows in 8 KB heap pages. A row with a long text or JSONB column triggers <strong>TOAST</strong> (The Oversized-Attribute Storage Technique) once its size crosses the <code>toast_tuple_target</code> threshold (default: about 2 KB).</p><p>TOAST first tries to compress the attribute inline, then spills it to a separate table if the row is still too large. Core PostgreSQL offers two TOAST compression methods, pglz and LZ4, selected per column or through <code>default_toast_compression</code>. ZSTD is listed for comparison: in core PostgreSQL 15+ it is available for WAL and base-backup compression, not for TOAST.</p><table><thead><tr><th>Algorithm</th><th>Compression ratio (typical text)</th><th>Decompression throughput</th></tr></thead><tbody><tr><td>pglz (default)</td><td>2.0–2.5x</td><td>~500 MB/s</td></tr><tr><td>LZ4 (PG 14+, opt-in)</td><td>2.0–3.0x</td><td>~4 GB/s</td></tr><tr><td>ZSTD (not a TOAST option in core)</td><td>3.0–5.0x</td><td>~1.5 GB/s</td></tr></tbody></table><p>LZ4 is the right choice for any latency-sensitive path. ZSTD wins on bulk analytics where decompression throughput is not the bottleneck, but applying it to column data requires an extension or external format that supports it.</p><p><strong>Dictionary encoding</strong>, which PostgreSQL does not do natively for row-oriented heaps but Citus columnar and Parquet-backed foreign tables do, changes the picture entirely for low-cardinality columns. 
The status column with three values needs <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false">⌈</mo><msub><mrow><mi>log</mi><mo>⁡</mo></mrow><mn>2</mn></msub><mn>3</mn><mo stretchy="false">⌉</mo><mo>=</mo><mn>2</mn></mrow><annotation encoding="application/x-tex">\lceil \log_2 3 \rceil = 2</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mopen">⌈</span><span class="mop"><span class="mop">lo<span style="margin-right:0.0139em;">g</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.207em;"><span style="top:-2.4559em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2441em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord">3</span><span class="mclose">⌉</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">2</span></span></span></span> bits per row in a dictionary scheme, versus 8 bytes raw. 
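</p><p>As a rough check on these figures, the sketch below compares the raw, dictionary-encoded, and entropy-floor sizes of the status column. The helper names <code>entropy_bits</code> and <code>dictionary_bits</code> and the per-value row counts are illustrative assumptions, not any engine's API; the 8-byte raw width is the estimate used earlier in this post.</p><pre><code class="language-python">import math
from collections import Counter


def entropy_bits(counts):
    """Shannon entropy in bits/symbol of an empirical distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def dictionary_bits(cardinality):
    """Fixed-width code size: ceil(log2(cardinality)) bits per value."""
    return max(1, math.ceil(math.log2(cardinality)))


# Hypothetical column matching the 70/20/10 split used in this post.
counts = Counter(APPROVED=7_000_000, DECLINED=2_000_000, PENDING=1_000_000)
rows = sum(counts.values())

raw_mb = rows * 8 / 1e6                                  # assumed 8 bytes per row
dict_mb = rows * dictionary_bits(len(counts)) / 8 / 1e6  # 2 bits per row
floor_mb = rows * entropy_bits(counts) / 8 / 1e6         # entropy lower bound

print(f"raw={raw_mb:.1f} MB dict={dict_mb:.1f} MB floor={floor_mb:.2f} MB")
</code></pre><p>Under those assumptions the fixed 2-bit code comes to 2.5 MB against 80 MB raw, roughly 1.7x above the entropy floor. The sketch ignores the dictionary itself, which for three short strings is negligible.</p><p>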
On 10M rows:</p><p class="katex-block "><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mtext>savings</mtext><mo>=</mo><mfrac><mrow><mn>8</mn><mo>×</mo><mn>8</mn><mtext> bits</mtext><mo>−</mo><mn>2</mn><mtext> bits</mtext></mrow><mrow><mn>8</mn><mo>×</mo><mn>8</mn><mtext> bits</mtext></mrow></mfrac><mo>≈</mo><mn>96.9</mn><mi mathvariant="normal">%</mi></mrow><annotation encoding="application/x-tex">\text{savings} = \frac{8 \times 8 \text{ bits} - 2 \text{ bits}}{8 \times 8 \text{ bits}} \approx 96.9\%</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8623em;vertical-align:-0.1944em;"></span><span class="mord text"><span class="mord">savings</span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:2.1408em;vertical-align:-0.7693em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.3714em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">8</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord">8</span><span class="mord text"><span class="mord"> bits</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">8</span><span class="mspace" style="margin-right:0.2222em;"></span><span 
class="mbin">×</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord">8</span><span class="mord text"><span class="mord"> bits</span></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord">2</span><span class="mord text"><span class="mord"> bits</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.7693em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">≈</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.8056em;vertical-align:-0.0556em;"></span><span class="mord">96.9%</span></span></span></span></span></p><p>This is why columnar formats compress <code>enum</code>-like columns near-perfectly even before applying a secondary byte-level compressor on top.</p><hr><h2 id="scylladb"><a class="header-anchor" href="#scylladb">#</a>ScyllaDB</h2><p>ScyllaDB compresses at the <strong>SSTable chunk level</strong>. Each SSTable is a sequence of fixed-size chunks; each chunk is compressed independently. 
The default chunk size is 4 KB.</p><p>Chunk-level independence has a cost: every single-row read must decompress the full chunk that contains it.</p><p class="katex-block "><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mtext>read amplification (rows)</mtext><mo>=</mo><mfrac><mtext>chunk_size_bytes</mtext><mtext>avg_row_size_bytes</mtext></mfrac></mrow><annotation encoding="application/x-tex">\text{read amplification (rows)} = \frac{\text{chunk\_size\_bytes}}{\text{avg\_row\_size\_bytes}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord text"><span class="mord">read amplification (rows)</span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:2.3904em;vertical-align:-0.996em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.3944em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord text"><span class="mord">avg_row_size_bytes</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.7em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord text"><span class="mord">chunk_size_bytes</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.996em;"><span></span></span></span></span></span><span class="mclose 
nulldelimiter"></span></span></span></span></span></span></p><p>At a 4 KB chunk and a 128 B average row: <strong>32 rows decompressed per point read</strong>. Increasing the chunk to 64 KB improves compression ratio by 15–25% (a longer window gives LZ more back-references to find) but raises read amplification to 512 rows per point read.</p><p>Supported algorithms and typical ratios on event-stream data:</p><table><thead><tr><th>Algorithm</th><th>Ratio (time-series)</th><th>Ratio (UUID-heavy)</th><th>Notes</th></tr></thead><tbody><tr><td>LZ4</td><td>3–5x</td><td>1.5–2x</td><td>Default; lowest CPU cost</td></tr><tr><td>Snappy</td><td>2.5–4x</td><td>1.4–1.8x</td><td>Slightly lower ratio than LZ4</td></tr><tr><td>ZSTD</td><td>4–8x</td><td>2–3x</td><td>Best ratio; higher CPU on write path</td></tr><tr><td>Deflate</td><td>3–6x</td><td>2–2.5x</td><td>Slowest; avoid unless storage-constrained</td></tr></tbody></table><p>The write path keeps data uncompressed in the memtable (RAM) and compresses only when an SSTable is written, at flush or compaction. Reads go through Scylla's own row cache, which bypasses the OS page cache and holds <strong>decompressed</strong> rows (unlike Cassandra, whose reliance on the OS page cache means it can cache compressed chunks). 
At high read concurrency the cache hit rate dominates latency far more than the compression ratio.</p><hr><h2 id="graph-databases"><a class="header-anchor" href="#graph-databases">#</a>Graph databases</h2><p>A graph <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>G</mi><mo>=</mo><mo stretchy="false">(</mo><mi>V</mi><mo separator="true">,</mo><mi>E</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">G = (V, E)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathnormal">G</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mopen">(</span><span class="mord mathnormal" style="margin-right:0.2222em;">V</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord mathnormal" style="margin-right:0.0576em;">E</span><span class="mclose">)</span></span></span></span> needs to store adjacency. 
Three representations:</p><p><strong>Adjacency matrix</strong>: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">∣</mi><mi>V</mi><msup><mi mathvariant="normal">∣</mi><mn>2</mn></msup></mrow><annotation encoding="application/x-tex">|V|^2</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.0641em;vertical-align:-0.25em;"></span><span class="mord">∣</span><span class="mord mathnormal" style="margin-right:0.2222em;">V</span><span class="mord"><span class="mord">∣</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span></span> bits. 
At <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">∣</mi><mi>V</mi><mi mathvariant="normal">∣</mi><mo>=</mo><mn>1</mn><mtext>M</mtext></mrow><annotation encoding="application/x-tex">|V| = 1\text{M}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">∣</span><span class="mord mathnormal" style="margin-right:0.2222em;">V</span><span class="mord">∣</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord">1</span><span class="mord text"><span class="mord">M</span></span></span></span></span> nodes: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mn>10</mn><mn>12</mn></msup></mrow><annotation encoding="application/x-tex">10^{12}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8141em;"></span><span class="mord">1</span><span class="mord"><span class="mord">0</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">12</span></span></span></span></span></span></span></span></span></span></span></span> bits = 125 GB (about 116 GiB) even for a bit-packed binary matrix. 
Viable only for dense graphs.</p><p><strong>Adjacency list</strong>: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>O</mi><mo stretchy="false">(</mo><mi mathvariant="normal">∣</mi><mi>V</mi><mi mathvariant="normal">∣</mi><mo>+</mo><mi mathvariant="normal">∣</mi><mi>E</mi><mi mathvariant="normal">∣</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">O(|V| + |E|)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.0278em;">O</span><span class="mopen">(</span><span class="mord">∣</span><span class="mord mathnormal" style="margin-right:0.2222em;">V</span><span class="mord">∣</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">∣</span><span class="mord mathnormal" style="margin-right:0.0576em;">E</span><span class="mord">∣</span><span class="mclose">)</span></span></span></span> with pointer overhead per node — typically 24–48 bytes per node entry plus 4–8 bytes per edge.</p><p><strong>Compressed Sparse Row (CSR)</strong>: two flat arrays.</p><ul><li><code>row_ptr[0..V]</code>: V+1 integers; <code>row_ptr[i]</code> is the index into <code>col_idx</code> where node i’s neighbors begin</li><li><code>col_idx[0..E-1]</code>: E integers; the actual neighbor IDs</li></ul><p class="katex-block "><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mtext>CSR size</mtext><mo>=</mo><mo stretchy="false">(</mo><mi mathvariant="normal">∣</mi><mi>V</mi><mi 
mathvariant="normal">∣</mi><mo>+</mo><mn>1</mn><mo>+</mo><mi mathvariant="normal">∣</mi><mi>E</mi><mi mathvariant="normal">∣</mi><mo stretchy="false">)</mo><mo>×</mo><mn>4</mn><mtext> bytes (int32)</mtext></mrow><annotation encoding="application/x-tex">\text{CSR size} = (|V| + 1 + |E|) \times 4 \text{ bytes (int32)}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord text"><span class="mord">CSR size</span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mopen">(</span><span class="mord">∣</span><span class="mord mathnormal" style="margin-right:0.2222em;">V</span><span class="mord">∣</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.7278em;vertical-align:-0.0833em;"></span><span class="mord">1</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">∣</span><span class="mord mathnormal" style="margin-right:0.0576em;">E</span><span class="mord">∣</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">4</span><span class="mord text"><span class="mord"> bytes (int32)</span></span></span></span></span></span></p><p>For <span class="katex"><span 
class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">∣</mi><mi>V</mi><mi mathvariant="normal">∣</mi><mo>=</mo><mn>1</mn><mtext>M</mtext></mrow><annotation encoding="application/x-tex">|V| = 1\text{M}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">∣</span><span class="mord mathnormal" style="margin-right:0.2222em;">V</span><span class="mord">∣</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord">1</span><span class="mord text"><span class="mord">M</span></span></span></span></span>, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">∣</mi><mi>E</mi><mi mathvariant="normal">∣</mi><mo>=</mo><mn>10</mn><mtext>M</mtext></mrow><annotation encoding="application/x-tex">|E| = 10\text{M}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">∣</span><span class="mord mathnormal" style="margin-right:0.0576em;">E</span><span class="mord">∣</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord">10</span><span class="mord text"><span class="mord">M</span></span></span></span></span>: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false">(</mo><mn>1,000,001</mn><mo>+</mo><mn>10,000,000</mn><mo 
stretchy="false">)</mo><mo>×</mo><mn>4</mn><mo>=</mo><mn>44</mn><mtext> MB</mtext></mrow><annotation encoding="application/x-tex">(1{,}000{,}001 + 10{,}000{,}000) \times 4 = 44 \text{ MB}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mopen">(</span><span class="mord">1</span><span class="mord"><span class="mpunct">,</span></span><span class="mord">000</span><span class="mord"><span class="mpunct">,</span></span><span class="mord">001</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">10</span><span class="mord"><span class="mpunct">,</span></span><span class="mord">000</span><span class="mord"><span class="mpunct">,</span></span><span class="mord">000</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">4</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord">44</span><span class="mord text"><span class="mord"> MB</span></span></span></span></span>, versus 122 GB for the adjacency matrix. CSR is what graph analytics engines (GraphX, cuGraph, igraph) use internally.</p><p>Neo4j uses a different approach: <strong>index-free adjacency</strong> — each node record stores a pointer to its first relationship record, and each relationship record is a doubly-linked list node. 
This trades memory compactness for O(degree) traversal without an index lookup. Compared with CSR, Neo4j’s native format uses more bytes per edge but achieves sub-millisecond hop traversal because no secondary index is touched.</p><p><strong>Delta encoding</strong> on sorted neighbor lists further reduces CSR storage. If the neighbors of node <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>i</mi></mrow><annotation encoding="application/x-tex">i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6595em;"></span><span class="mord mathnormal">i</span></span></span></span> are sorted, store differences rather than absolute IDs:</p><p class="katex-block "><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msub><mi>δ</mi><mi>j</mi></msub><mo>=</mo><msub><mtext>neighbor</mtext><mi>j</mi></msub><mo>−</mo><msub><mtext>neighbor</mtext><mrow><mi>j</mi><mo>−</mo><mn>1</mn></mrow></msub></mrow><annotation encoding="application/x-tex">\delta_j = \text{neighbor}_j - \text{neighbor}_{j-1}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9805em;vertical-align:-0.2861em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.0379em;">δ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:-0.0379em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0572em;">j</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" 
style="height:0.2861em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1.0747em;vertical-align:-0.3802em;"></span><span class="mord"><span class="mord text"><span class="mord">neighbor</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2175em;"><span style="top:-2.4559em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0572em;">j</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.3802em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:1.0747em;vertical-align:-0.3802em;"></span><span class="mord"><span class="mord text"><span class="mord">neighbor</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2175em;"><span style="top:-2.4559em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.0572em;">j</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.3802em;"><span></span></span></span></span></span></span></span></span></span></span></p><p>For a social graph where the average node-ID delta is small (local community structure), varint 
encoding <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>δ</mi><mi>j</mi></msub></mrow><annotation encoding="application/x-tex">\delta_j</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9805em;vertical-align:-0.2861em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.0379em;">δ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em;"><span style="top:-2.55em;margin-left:-0.0379em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0572em;">j</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em;"><span></span></span></span></span></span></span></span></span></span> reduces the <code>col_idx</code> array by 60–70% over fixed int32. This is what WebGraph and similar large-scale graph compression formats do.</p><hr><h2 id="vector-databases"><a class="header-anchor" href="#vector-databases">#</a>Vector databases</h2><p>This is where compression math gets structurally different. 
The compression is <strong>lossy</strong>: you trade exact distance preservation for storage and compute savings.</p><p><strong>Baseline: float32</strong></p><p>A <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>d</mi></mrow><annotation encoding="application/x-tex">d</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal">d</span></span></span></span>-dimensional embedding stored as float32 takes <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>d</mi><mo>×</mo><mn>4</mn></mrow><annotation encoding="application/x-tex">d \times 4</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7778em;vertical-align:-0.0833em;"></span><span class="mord mathnormal">d</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">4</span></span></span></span> bytes.</p><table><thead><tr><th>Model</th><th>Dimensions</th><th>Bytes/vector</th><th>10M vectors</th></tr></thead><tbody><tr><td>text-embedding-3-small</td><td>1536</td><td>6,144 B</td><td>61.4 GB</td></tr><tr><td>text-embedding-3-large</td><td>3072</td><td>12,288 B</td><td>122.9 GB</td></tr><tr><td>Cohere Embed v3</td><td>1024</td><td>4,096 B</td><td>41.0 GB</td></tr></tbody></table><p>These sizes cover the raw vectors only, before the HNSW index. 
Add roughly <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>M</mi><mo>×</mo><mn>8</mn></mrow><annotation encoding="application/x-tex">M \times 8</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7667em;vertical-align:-0.0833em;"></span><span class="mord mathnormal" style="margin-right:0.109em;">M</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">8</span></span></span></span> bytes per vector for the graph layer (M=16 is common), so another 1.3 GB for 10M vectors.</p><p><strong>Scalar quantization (SQ8)</strong></p><p>Map each float32 dimension to uint8 via per-dimension affine transform:</p><p class="katex-block "><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msubsup><mi>x</mi><mi>q</mi><mrow><mo stretchy="false">(</mo><mi>d</mi><mo stretchy="false">)</mo></mrow></msubsup><mo>=</mo><mi mathvariant="normal">clamp</mi><mo>⁡</mo><mtext> ⁣</mtext><mrow><mo fence="true">(</mo><mi mathvariant="normal">round</mi><mo>⁡</mo><mtext> ⁣</mtext><mrow><mo fence="true">(</mo><mfrac><mrow><msup><mi>x</mi><mrow><mo stretchy="false">(</mo><mi>d</mi><mo stretchy="false">)</mo></mrow></msup><mo>−</mo><munder><mrow><mi>min</mi><mo>⁡</mo></mrow><mi>d</mi></munder></mrow><mrow><munder><mrow><mi>max</mi><mo>⁡</mo></mrow><mi>d</mi></munder><mo>−</mo><munder><mrow><mi>min</mi><mo>⁡</mo></mrow><mi>d</mi></munder></mrow></mfrac><mo>×</mo><mn>255</mn><mo fence="true">)</mo></mrow><mo separator="true">,</mo><mtext> </mtext><mn>0</mn><mo separator="true">,</mo><mtext> </mtext><mn>255</mn><mo fence="true">)</mo></mrow></mrow><annotation 
encoding="application/x-tex">x_q^{(d)} = \operatorname{clamp}\!\left(\operatorname{round}\!\left(\frac{x^{(d)} - \min_d}{\max_d - \min_d} \times 255\right),\, 0,\, 255\right)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.3211em;vertical-align:-0.3831em;"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.938em;"><span style="top:-2.453em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0359em;">q</span></span></span><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mopen mtight">(</span><span class="mord mathnormal mtight">d</span><span class="mclose mtight">)</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.3831em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:2.515em;vertical-align:-0.95em;"></span><span class="mop"><span class="mord mathrm">clamp</span></span><span class="mspace" style="margin-right:-0.1667em;"></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="minner"><span class="mopen delimcenter" style="top:0em;"><span class="delimsizing size3">(</span></span><span class="mop"><span class="mord mathrm">round</span></span><span class="mspace" style="margin-right:-0.1667em;"></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="minner"><span 
class="mopen delimcenter" style="top:0em;"><span class="delimsizing size3">(</span></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.565em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mop"><span class="mop">max</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em;"><span style="top:-2.55em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">d</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord">−</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mop"><span class="mop">min</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em;"><span style="top:-2.55em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">d</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.888em;"><span 
style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mopen mtight">(</span><span class="mord mathnormal mtight">d</span><span class="mclose mtight">)</span></span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mop"><span class="mop">min</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em;"><span style="top:-2.55em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">d</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.836em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord">255</span><span class="mclose delimcenter" style="top:0em;"><span class="delimsizing size3">)</span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord">0</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord">255</span><span class="mclose delimcenter" style="top:0em;"><span class="delimsizing 
size3">)</span></span></span></span></span></span></span></p><p>Store <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mrow><mi>min</mi><mo>⁡</mo></mrow><mi>d</mi></msub></mrow><annotation encoding="application/x-tex">\min_d</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8179em;vertical-align:-0.15em;"></span><span class="mop"><span class="mop">min</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em;"><span style="top:-2.55em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">d</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span> and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mrow><mi>max</mi><mo>⁡</mo></mrow><mi>d</mi></msub></mrow><annotation encoding="application/x-tex">\max_d</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5806em;vertical-align:-0.15em;"></span><span class="mop"><span class="mop">max</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em;"><span style="top:-2.55em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">d</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span> per dimension (2 floats × 
<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>d</mi></mrow><annotation encoding="application/x-tex">d</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal">d</span></span></span></span> = negligible). Each vector: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>d</mi></mrow><annotation encoding="application/x-tex">d</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal">d</span></span></span></span> bytes. Compression: <strong>4x</strong>. Recall@10 degradation on MSMARCO-style benchmarks: typically 0.5–1.5%.</p><p><strong>Product quantization (PQ)</strong></p><p>Split the <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>d</mi></mrow><annotation encoding="application/x-tex">d</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal">d</span></span></span></span>-dimensional space into <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>m</mi></mrow><annotation encoding="application/x-tex">m</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal">m</span></span></span></span> sub-spaces of <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>d</mi><mi mathvariant="normal">/</mi><mi>m</mi></mrow><annotation 
encoding="application/x-tex">d/m</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal">d</span><span class="mord">/</span><span class="mord mathnormal">m</span></span></span></span> dimensions. For each sub-space, learn <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>k</mi></mrow><annotation encoding="application/x-tex">k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal" style="margin-right:0.0315em;">k</span></span></span></span> centroids offline via k-means. At index time, each sub-vector is replaced by the index of its nearest centroid, which takes <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mrow><mi>log</mi><mo>⁡</mo></mrow><mn>2</mn></msub><mi>k</mi></mrow><annotation encoding="application/x-tex">\log_2 k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9386em;vertical-align:-0.2441em;"></span><span class="mop"><span class="mop">lo<span style="margin-right:0.0139em;">g</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.207em;"><span style="top:-2.4559em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2441em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord mathnormal" style="margin-right:0.0315em;">k</span></span></span></span> 
bits.</p><p>With <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>k</mi><mo>=</mo><mn>256</mn></mrow><annotation encoding="application/x-tex">k = 256</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal" style="margin-right:0.0315em;">k</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">256</span></span></span></span> (8 bits per sub-space), each vector stores <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>m</mi></mrow><annotation encoding="application/x-tex">m</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal">m</span></span></span></span> bytes:</p><p class="katex-block "><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mtext>compression ratio</mtext><mo>=</mo><mfrac><mrow><mi>d</mi><mo>×</mo><mn>4</mn></mrow><mi>m</mi></mfrac></mrow><annotation encoding="application/x-tex">\text{compression ratio} = \frac{d \times 4}{m}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8623em;vertical-align:-0.1944em;"></span><span class="mord text"><span class="mord">compression ratio</span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" 
style="height:2.0574em;vertical-align:-0.686em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.3714em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord mathnormal">m</span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord mathnormal">d</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord">4</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.686em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></span></p><p>For <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>d</mi><mo>=</mo><mn>1536</mn></mrow><annotation encoding="application/x-tex">d = 1536</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal">d</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">1536</span></span></span></span>, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>m</mi><mo>=</mo><mn>96</mn></mrow><annotation encoding="application/x-tex">m = 
96</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal">m</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">96</span></span></span></span>: ratio = <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mfrac><mn>6144</mn><mn>96</mn></mfrac><mo>=</mo><mn>64</mn><mo>×</mo></mrow><annotation encoding="application/x-tex">\frac{6144}{96} = 64\times</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.1901em;vertical-align:-0.345em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8451em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">96</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">6144</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" 
style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.7278em;vertical-align:-0.0833em;"></span><span class="mord">64</span><span class="mord">×</span></span></span></span>. The 10M float32 corpus shrinks from 61.4 GB to <strong>960 MB</strong>.</p><p>Distance at query time uses <strong>asymmetric distance computation (ADC)</strong>: precompute a lookup table of distances from the query sub-vector to all <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>k</mi></mrow><annotation encoding="application/x-tex">k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal" style="margin-right:0.0315em;">k</span></span></span></span> centroids in each sub-space (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>m</mi><mo>×</mo><mi>k</mi></mrow><annotation encoding="application/x-tex">m \times k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6667em;vertical-align:-0.0833em;"></span><span class="mord mathnormal">m</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal" style="margin-right:0.0315em;">k</span></span></span></span> entries), then approximate each database vector’s distance as a sum of <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>m</mi></mrow><annotation encoding="application/x-tex">m</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" 
style="height:0.4306em;"></span><span class="mord mathnormal">m</span></span></span></span> table lookups. The per-vector scan needs no multiply-accumulate, just memory reads and additions.</p><p class="katex-block "><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi mathvariant="normal">∥</mi><mi>q</mi><mo>−</mo><mi>x</mi><msup><mi mathvariant="normal">∥</mi><mn>2</mn></msup><mo>≈</mo><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>m</mi></munderover><mtext>dist_table</mtext><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo><mo stretchy="false">[</mo><mtext>code</mtext><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo><mo stretchy="false">[</mo><mi>x</mi><mo stretchy="false">]</mo><mo stretchy="false">]</mo></mrow><annotation encoding="application/x-tex">\|q - x\|^2 \approx \sum_{i=1}^{m} \text{dist\_table}[i][\text{code}[i][x]]</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">∥</span><span class="mord mathnormal" style="margin-right:0.0359em;">q</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:1.1141em;vertical-align:-0.25em;"></span><span class="mord mathnormal">x</span><span class="mord"><span class="mord">∥</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em;"></span><span 
class="mrel">≈</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:2.9291em;vertical-align:-1.2777em;"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.6514em;"><span style="top:-1.8723em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mrel mtight">=</span><span class="mord mtight">1</span></span></span></span><span style="top:-3.05em;"><span class="pstrut" style="height:3.05em;"></span><span><span class="mop op-symbol large-op">∑</span></span></span><span style="top:-4.3em;margin-left:0em;"><span class="pstrut" style="height:3.05em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">m</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.2777em;"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord text"><span class="mord">dist_table</span></span><span class="mopen">[</span><span class="mord mathnormal">i</span><span class="mclose">]</span><span class="mopen">[</span><span class="mord text"><span class="mord">code</span></span><span class="mopen">[</span><span class="mord mathnormal">i</span><span class="mclose">]</span><span class="mopen">[</span><span class="mord mathnormal">x</span><span class="mclose">]]</span></span></span></span></span></p><p>PQ recall degrades more than SQ8 — typically 3–8% at Recall@10 for <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>m</mi><mo>=</mo><mi>d</mi><mi mathvariant="normal">/</mi><mn>16</mn></mrow><annotation encoding="application/x-tex">m = 
d/16</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal">m</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal">d</span><span class="mord">/16</span></span></span></span>. The tradeoff is navigated by adjusting <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>m</mi></mrow><annotation encoding="application/x-tex">m</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal">m</span></span></span></span> (more sub-spaces = better recall, larger index).</p><p><strong>Binary quantization (BQ)</strong></p><p class="katex-block "><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msubsup><mi>x</mi><mi>b</mi><mrow><mo stretchy="false">(</mo><mi>d</mi><mo stretchy="false">)</mo></mrow></msubsup><mo>=</mo><mn mathvariant="double-struck">1</mn><mo stretchy="false">[</mo><msup><mi>x</mi><mrow><mo stretchy="false">(</mo><mi>d</mi><mo stretchy="false">)</mo></mrow></msup><mo>&gt;</mo><mn>0</mn><mo stretchy="false">]</mo><mo>∈</mo><mo stretchy="false">{</mo><mn>0</mn><mo separator="true">,</mo><mn>1</mn><mo stretchy="false">}</mo></mrow><annotation encoding="application/x-tex">x_b^{(d)} = \mathbb{1}[x^{(d)} &gt; 0] \in \{0, 1\}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.3461em;vertical-align:-0.3013em;"></span><span class="mord"><span class="mord 
mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.0448em;"><span style="top:-2.3987em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">b</span></span></span><span style="top:-3.2198em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mopen mtight">(</span><span class="mord mathnormal mtight">d</span><span class="mclose mtight">)</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.3013em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1.188em;vertical-align:-0.25em;"></span><span class="mord">1</span><span class="mopen">[</span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.938em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mopen mtight">(</span><span class="mord mathnormal mtight">d</span><span class="mclose mtight">)</span></span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">&gt;</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">0</span><span class="mclose">]</span><span class="mspace" 
style="margin-right:0.2778em;"></span><span class="mrel">∈</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mopen">{</span><span class="mord">0</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord">1</span><span class="mclose">}</span></span></span></span></span></p><p>Storage: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>d</mi><mi mathvariant="normal">/</mi><mn>8</mn></mrow><annotation encoding="application/x-tex">d/8</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal">d</span><span class="mord">/8</span></span></span></span> bytes per vector. <strong>32x compression</strong>. Distance: Hamming distance via <code>POPCNT</code>, which CPUs execute in a single instruction per 64-bit word.</p><p class="katex-block "><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mtext>hamming</mtext><mo stretchy="false">(</mo><msub><mi>q</mi><mi>b</mi></msub><mo separator="true">,</mo><msub><mi>x</mi><mi>b</mi></msub><mo stretchy="false">)</mo><mo>=</mo><mtext>popcount</mtext><mo stretchy="false">(</mo><msub><mi>q</mi><mi>b</mi></msub><mo>⊕</mo><msub><mi>x</mi><mi>b</mi></msub><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">\text{hamming}(q_b, x_b) = \text{popcount}(q_b \oplus x_b)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord text"><span class="mord">hamming</span></span><span class="mopen">(</span><span 
class="mord"><span class="mord mathnormal" style="margin-right:0.0359em;">q</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em;"><span style="top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">b</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">b</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord text"><span class="mord">popcount</span></span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal" style="margin-right:0.0359em;">q</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em;"><span style="top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal 
mtight">b</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">⊕</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">b</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span></p><p>BQ works well only for embedding models with approximately symmetric dimension distributions (Matryoshka representations, Cohere Embed v3 trained with BQ in mind). On general-purpose embeddings it loses 5–15% Recall@10. 
Qdrant, Weaviate, and Vespa all support it; Pinecone does not as of this writing.</p><hr><h2 id="summary"><a class="header-anchor" href="#summary">#</a>Summary</h2><table><thead><tr><th>Engine</th><th>Primary technique</th><th>Typical ratio</th><th>Lossy?</th></tr></thead><tbody><tr><td>PostgreSQL (row)</td><td>LZ4 / ZSTD on TOAST values</td><td>2–5x</td><td>No</td></tr><tr><td>PostgreSQL (columnar)</td><td>Dictionary + LZ4/ZSTD</td><td>10–50x on low-cardinality columns</td><td>No</td></tr><tr><td>ScyllaDB</td><td>Chunk-level LZ4/ZSTD on SSTables</td><td>3–8x</td><td>No</td></tr><tr><td>Graph (CSR + delta)</td><td>Delta-varint encoding</td><td>3–5x over adjacency list</td><td>No</td></tr><tr><td>Vector (SQ8)</td><td>Per-dimension affine quantization</td><td>4x</td><td>Yes (&lt; 2% recall loss)</td></tr><tr><td>Vector (PQ)</td><td>Sub-space centroid codes</td><td>16–128x</td><td>Yes (3–8% recall loss)</td></tr><tr><td>Vector (BQ)</td><td>Sign quantization + Hamming</td><td>32x</td><td>Yes (model-dependent)</td></tr></tbody></table><p>The RDBMS, ScyllaDB, and graph cases are lossless: the data that comes out matches what went in, and the compressor only exploits redundancy in the byte stream. Vector quantization is structurally different: you are approximating the metric space, and the approximation quality is measurable (Recall@k) and tunable. 
Getting the tuning right requires knowing both your distance distribution and your recall SLO before you pick <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>m</mi></mrow><annotation encoding="application/x-tex">m</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal">m</span></span></span></span> or the quantization scheme.</p><hr><p><em>Primary sources: <a href="https://github.com/facebook/zstd">Zstd compression levels</a>, ScyllaDB <a href="https://docs.scylladb.com/stable/operating-scylla/admin-tools/sstable/compression.html">Compression documentation</a>, <a href="https://inria.hal.science/inria-00514462/document">Product Quantization — Jégou et al. 2011</a>, <a href="https://qdrant.tech/documentation/guides/quantization/">Qdrant quantization docs</a>.</em></p>]]></content>
    
    
    <summary type="html">Shannon gives you the floor. Every database engine then makes different tradeoffs between compression ratio, read amplification, and CPU cost. Here is the math for four storage architectures.</summary>
    
    
    
    <category term="Data" scheme="https://animesh.cloud/categories/data/"/>
    
    
    <category term="compression" scheme="https://animesh.cloud/tags/compression/"/>
    
    <category term="information-theory" scheme="https://animesh.cloud/tags/information-theory/"/>
    
    <category term="postgresql" scheme="https://animesh.cloud/tags/postgresql/"/>
    
    <category term="scylladb" scheme="https://animesh.cloud/tags/scylladb/"/>
    
    <category term="vector-db" scheme="https://animesh.cloud/tags/vector-db/"/>
    
    <category term="graph-db" scheme="https://animesh.cloud/tags/graph-db/"/>
    
  </entry>
  
  <entry>
    <title>Math is substrate</title>
    <link href="https://animesh.cloud/2025/04/20/math-is-substrate/"/>
    <id>https://animesh.cloud/2025/04/20/math-is-substrate/</id>
    <published>2025-04-20T18:30:00.000Z</published>
    <updated>2026-05-30T19:09:09.693Z</updated>
    
    <content type="html"><![CDATA[<p>There is a view of mathematics as a tool — a language humans invented to describe patterns they observed. Useful, precise, but ultimately a map drawn by mapmakers. The territory is physical; the map is symbolic.</p><p>I don’t hold that view. I think the map <em>is</em> the territory.</p><hr><h2 id="the-hierarchy"><a class="header-anchor" href="#the-hierarchy">#</a>The hierarchy</h2><p>Physics is applied mathematics. Every physical law is a mathematical statement: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>F</mi><mo>=</mo><mi>m</mi><mi>a</mi></mrow><annotation encoding="application/x-tex">F = ma</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathnormal" style="margin-right:0.1389em;">F</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal">ma</span></span></span></span>, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>E</mi><mo>=</mo><mi>m</mi><msup><mi>c</mi><mn>2</mn></msup></mrow><annotation encoding="application/x-tex">E = mc^2</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathnormal" style="margin-right:0.0576em;">E</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.8141em;"></span><span class="mord mathnormal">m</span><span class="mord"><span class="mord mathnormal">c</span><span 
class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span></span>, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>i</mi><mi mathvariant="normal">ℏ</mi><mi mathvariant="normal">∂</mi><mi>ψ</mi><mi mathvariant="normal">/</mi><mi mathvariant="normal">∂</mi><mi>t</mi><mo>=</mo><mover accent="true"><mi>H</mi><mo>^</mo></mover><mi>ψ</mi></mrow><annotation encoding="application/x-tex">i\hbar \partial\psi/\partial t = \hat{H}\psi</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal">i</span><span class="mord">ℏ</span><span class="mord" style="margin-right:0.0556em;">∂</span><span class="mord mathnormal" style="margin-right:0.0359em;">ψ</span><span class="mord">/</span><span class="mord" style="margin-right:0.0556em;">∂</span><span class="mord mathnormal">t</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1.1412em;vertical-align:-0.1944em;"></span><span class="mord accent"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.9468em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord mathnormal" style="margin-right:0.0813em;">H</span></span><span style="top:-3.2523em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.1944em;"><span class="mord">^</span></span></span></span></span></span></span><span class="mord 
mathnormal" style="margin-right:0.0359em;">ψ</span></span></span></span>. Remove the math and there is no physics — just hand-waving about objects moving and energy changing. The math is not a description of the physics; the math <em>is</em> the physics, made legible.</p><p>Chemistry is applied physics. Molecular geometry is quantum mechanical probability distributions. The bond angles in water (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>104.5</mn><mi mathvariant="normal">°</mi></mrow><annotation encoding="application/x-tex">104.5°</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord">104.5°</span></span></span></span>) are not arbitrary — they are the solution to a Schrödinger equation for the electron configuration of oxygen. The Gibbs free energy <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">Δ</mi><mi>G</mi><mo>=</mo><mi mathvariant="normal">Δ</mi><mi>H</mi><mo>−</mo><mi>T</mi><mi mathvariant="normal">Δ</mi><mi>S</mi></mrow><annotation encoding="application/x-tex">\Delta G = \Delta H - T\Delta S</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord">Δ</span><span class="mord mathnormal">G</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.7667em;vertical-align:-0.0833em;"></span><span class="mord">Δ</span><span class="mord mathnormal" style="margin-right:0.0813em;">H</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">−</span><span class="mspace" 
style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathnormal" style="margin-right:0.1389em;">T</span><span class="mord">Δ</span><span class="mord mathnormal" style="margin-right:0.0576em;">S</span></span></span></span> decides which reactions happen. Thermodynamics is just statistical mechanics applied to many-body quantum systems. Chemistry is physics at the molecular scale, which is mathematics at the molecular scale.</p><p>Biology is applied chemistry. The double helix is stable because of hydrogen bond energetics, which are electrostatic, which are quantum mechanical, which are mathematical. Protein folding — the problem that occupied structural biology for fifty years — is an optimisation problem over an energy landscape defined by physical forces that are defined by mathematics. Evolution is a search algorithm over genotype space. The logistic growth equation <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>d</mi><mi>N</mi><mi mathvariant="normal">/</mi><mi>d</mi><mi>t</mi><mo>=</mo><mi>r</mi><mi>N</mi><mo stretchy="false">(</mo><mn>1</mn><mo>−</mo><mi>N</mi><mi mathvariant="normal">/</mi><mi>K</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">dN/dt = rN(1 - N/K)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal">d</span><span class="mord mathnormal" style="margin-right:0.109em;">N</span><span class="mord">/</span><span class="mord mathnormal">d</span><span class="mord mathnormal">t</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span 
class="mord mathnormal" style="margin-right:0.0278em;">r</span><span class="mord mathnormal" style="margin-right:0.109em;">N</span><span class="mopen">(</span><span class="mord">1</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.109em;">N</span><span class="mord">/</span><span class="mord mathnormal" style="margin-right:0.0715em;">K</span><span class="mclose">)</span></span></span></span> emerges from the same conservation-of-resources math as any constrained optimisation.</p><p>Life is emergent biology. Consciousness, culture, language, cities — all of it is patterns running on biological substrate, which is chemical substrate, which is physical substrate, which is mathematical substrate.</p><hr><h2 id="what-this-means-practically"><a class="header-anchor" href="#what-this-means-practically">#</a>What this means practically</h2><p>If you accept this view, then learning mathematics is not a career investment. It is literacy. An engineer who cannot read mathematics is in the same position as an engineer who cannot read — they can do a great deal by pattern-matching and imitation, but they cannot go to first principles when imitation fails.</p><p>Most engineering failures I have seen were not failures of implementation. They were failures of model. Someone built the wrong thing, correctly. The wrong model was usually not a software architecture mistake; it was a mathematical mistake — an assumption that did not hold, a distribution that was not stationary, a latency budget that did not add up.</p><p>The habit I am trying to build on this blog: before writing the code, write the equation. If you cannot write the equation, you do not understand the system.</p><hr><p><em>This is a note, not an argument. 
I am not claiming to have proved anything. I am reporting what working in production data systems for fourteen years has made me believe.</em></p>]]></content>
    
    
    <summary type="html">A note on why I think mathematics is not a language we invented to describe nature, but the structure nature is made of.</summary>
    
    
    
    <category term="Life" scheme="https://animesh.cloud/categories/life/"/>
    
    
    <category term="philosophy" scheme="https://animesh.cloud/tags/philosophy/"/>
    
    <category term="mathematics" scheme="https://animesh.cloud/tags/mathematics/"/>
    
    <category term="first-principles" scheme="https://animesh.cloud/tags/first-principles/"/>
    
  </entry>
  
  <entry>
    <title>Inside a BNPL fraud score: the 300 ms budget and where it goes</title>
    <link href="https://animesh.cloud/2025/03/07/bnpl-fraud-score-300ms/"/>
    <id>https://animesh.cloud/2025/03/07/bnpl-fraud-score-300ms/</id>
    <published>2025-03-07T18:30:00.000Z</published>
    <updated>2026-05-30T19:09:09.678Z</updated>
    
    <content type="html"><![CDATA[<p>A BNPL checkout approval has one constraint that shapes every architectural decision: the latency budget. The user tapped “Pay Later.” Their thumb is already moving toward the confirm button. Somewhere between 150 and 300 milliseconds from now, a spinner that is still visible stops feeling like “loading” and starts feeling like “something is wrong.”</p><p>Inside that window the risk engine must fetch a feature vector, score it, apply rule vetoes, and commit a decision your compliance team can defend twelve months later.</p><hr><h2 id="where-the-300-ms-goes"><a class="header-anchor" href="#where-the-300-ms-goes">#</a>Where the 300 ms goes</h2><table><thead><tr><th>Step</th><th>Budget</th><th>Bottleneck</th></tr></thead><tbody><tr><td>Feature fetch (Redis)</td><td>5–12 ms</td><td>Network RTT + serialisation</td></tr><tr><td>Feature fetch (Postgres)</td><td>8–25 ms</td><td>Index scan + connection pool</td></tr><tr><td>Model inference</td><td>2–8 ms</td><td>Serialised feature vector</td></tr><tr><td>Rule engine</td><td>&lt; 1 ms</td><td>In-memory</td></tr><tr><td>Audit write (async)</td><td>off critical path</td><td>Kafka producer</td></tr><tr><td>Network + overhead</td><td>10–20 ms</td><td>Service mesh, TLS</td></tr><tr><td><strong>Total p99 budget</strong></td><td><strong>&lt; 300 ms</strong></td><td></td></tr></tbody></table><p>The model is not the bottleneck. The feature fetch is.</p><hr><h2 id="the-feature-staleness-tradeoff"><a class="header-anchor" href="#the-feature-staleness-tradeoff">#</a>The feature staleness tradeoff</h2><p>A Redis lookup costs ~1–3 ms at the server (the 5–12 ms budget above also covers network RTT and serialisation) and returns a feature vector that is only as fresh as the upstream stream processor. 
If the user made a transaction 4 seconds ago that would change their velocity feature, and the Kafka consumer lag is 6 seconds, the model scores a stale vector.</p><p>A Postgres lookup costs 8–25 ms and returns fresh data — but at p99, under connection pool pressure, it can spike to 80 ms and breach the SLO.</p><p>The decision: which features are read from Redis (fast, slightly stale) and which from Postgres (slow, fresh) is the primary engineering decision in a real-time risk system. It is not a data science decision. It is a latency-vs-staleness tradeoff that only the engineer who has seen both p99s can make correctly.</p><p>The math is not complicated:</p><p class="katex-block "><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mtext>expected loss from staleness</mtext><mo>=</mo><mi>P</mi><mo stretchy="false">(</mo><mtext>fraud</mtext><mo>∣</mo><mtext>stale feature</mtext><mo stretchy="false">)</mo><mo>×</mo><mtext>loss per fraud</mtext><mo>−</mo><mi>P</mi><mo stretchy="false">(</mo><mtext>fraud</mtext><mo>∣</mo><mtext>fresh feature</mtext><mo stretchy="false">)</mo><mo>×</mo><mtext>loss per fraud</mtext></mrow><annotation encoding="application/x-tex">\text{expected loss from staleness} = P(\text{fraud} \mid \text{stale feature}) \times \text{loss per fraud} - P(\text{fraud} \mid \text{fresh feature}) \times \text{loss per fraud}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8889em;vertical-align:-0.1944em;"></span><span class="mord text"><span class="mord">expected loss from staleness</span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal" 
style="margin-right:0.1389em;">P</span><span class="mopen">(</span><span class="mord text"><span class="mord">fraud</span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">∣</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord text"><span class="mord">stale feature</span></span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.8889em;vertical-align:-0.1944em;"></span><span class="mord text"><span class="mord">loss per fraud</span></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.1389em;">P</span><span class="mopen">(</span><span class="mord text"><span class="mord">fraud</span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">∣</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord text"><span class="mord">fresh feature</span></span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.8889em;vertical-align:-0.1944em;"></span><span class="mord text"><span class="mord">loss per fraud</span></span></span></span></span></span></p><p>If the staleness window is 6 seconds and the fraud velocity feature changes meaningfully in 6 seconds for fewer than 
0.1% of sessions, the expected loss from staleness is smaller than the cost of the p99 latency spikes from going to Postgres.</p><hr><h2 id="what-the-audit-trail-requires"><a class="header-anchor" href="#what-the-audit-trail-requires">#</a>What the audit trail requires</h2><p>Every decision must be auditable: which model version, which features, which rules fired, what the score was, what the outcome was. This sounds obvious until a regulator asks, eighteen months later, why a user was declined, and you need to reconstruct the exact state of the world at that moment. That is only possible if the feature vector was stored at decision time, and once you store it, you own it.</p><p>The audit write is off the critical path: the Kafka producer sends asynchronously, so the request never waits on it, while acknowledgements and retries give at-least-once delivery. The consumer writes to an append-only Postgres table. The feature snapshot is stored as JSONB alongside the decision.</p><hr><p><em>This is a working note. A longer post with the full data model, the rule engine architecture, and the false-positive economics is in progress.</em></p>]]></content>
    
    
    <summary type="html">A BNPL approval has one hard constraint: the user&#39;s thumb is already moving. Here is how the latency budget gets spent, and why the feature fetch — not the model — is the bottleneck.</summary>
    
    
    
    <category term="Fintech" scheme="https://animesh.cloud/categories/fintech/"/>
    
    
    <category term="bnpl" scheme="https://animesh.cloud/tags/bnpl/"/>
    
    <category term="fraud-detection" scheme="https://animesh.cloud/tags/fraud-detection/"/>
    
    <category term="risk-engine" scheme="https://animesh.cloud/tags/risk-engine/"/>
    
    <category term="feature-store" scheme="https://animesh.cloud/tags/feature-store/"/>
    
    <category term="latency" scheme="https://animesh.cloud/tags/latency/"/>
    
  </entry>
  
  <entry>
    <title>ScyllaDB vs Cassandra: what the p99 actually looks like at fintech scale</title>
    <link href="https://animesh.cloud/2025/02/11/scylladb-vs-cassandra-latency/"/>
    <id>https://animesh.cloud/2025/02/11/scylladb-vs-cassandra-latency/</id>
    <published>2025-02-11T18:30:00.000Z</published>
    <updated>2026-05-30T19:09:09.658Z</updated>
    
    <content type="html"><![CDATA[<p>We ran Apache Cassandra in production for two years before migrating the user-identity lookup path to ScyllaDB. The decision was not made from a benchmark blog post. It was made after watching a 3-node Cassandra cluster serve a path with a 50 ms SLO at a p99 read latency of 180 ms.</p><p>This post is a working note on what we measured, why Cassandra behaved the way it did, and what changed after the migration. A longer post with the full LSM-tree internals and compaction math is in progress.</p><hr><h2 id="the-problem-in-one-number"><a class="header-anchor" href="#the-problem-in-one-number">#</a>The problem in one number</h2><p>A user-identity lookup on the BNPL approval path had a budget of 50 ms. Cassandra was hitting a p99 of 180 ms under load (3.6x over budget) despite the cluster being at roughly 30% CPU utilisation.</p><p>The cause was GC pauses. Cassandra’s JVM was garbage-collecting under read pressure, and the stop-the-world pauses were showing up directly in the tail latency.</p><p>ScyllaDB is a C++ reimplementation of the Cassandra storage engine with a shard-per-core architecture and no JVM. The GC pause problem is structurally absent.</p><hr><h2 id="what-the-migration-changed"><a class="header-anchor" href="#what-the-migration-changed">#</a>What the migration changed</h2><table><thead><tr><th>Metric</th><th>Cassandra (3-node)</th><th>ScyllaDB (3-node)</th></tr></thead><tbody><tr><td>p50 read latency</td><td>4 ms</td><td>1.8 ms</td></tr><tr><td>p99 read latency</td><td>180 ms</td><td>9 ms</td></tr><tr><td>p99.9 read latency</td><td>340 ms</td><td>22 ms</td></tr><tr><td>CPU at peak QPS</td><td>31%</td><td>18%</td></tr></tbody></table><p>Same hardware, same data model, same replication factor. The p99 improvement is 20x. 
The p99.9 improvement is roughly 15x.</p><hr><h2 id="why-cassandra’s-p99-drifts-at-load"><a class="header-anchor" href="#why-cassandra’s-p99-drifts-at-load">#</a>Why Cassandra’s p99 drifts at load</h2><p>Cassandra’s read path merges data from the memtable and potentially multiple SSTables (after compaction, ideally one, but compaction is asynchronous and never perfectly caught up). Each SSTable read involves a bloom filter check, a partition index lookup, and a block read from disk or page cache.</p><p>Under concurrent read pressure, the JVM heap fills with bloom filter and index structures. When the GC fires, even for a minor collection, every in-flight read on that node pauses. The pause duration grows with heap pressure, which in turn grows with read concurrency.</p><p>The math is simple: if GC fires every <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>T</mi></mrow><annotation encoding="application/x-tex">T</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathnormal" style="margin-right:0.1389em;">T</span></span></span></span> seconds and pauses for <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">Δ</mi><mi>t</mi></mrow><annotation encoding="application/x-tex">\Delta t</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord">Δ</span><span class="mord mathnormal">t</span></span></span></span> milliseconds, any request in flight during that window is delayed by up to <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">Δ</mi><mi>t</mi></mrow><annotation encoding="application/x-tex">\Delta t</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord">Δ</span><span class="mord mathnormal">t</span></span></span></span> ms. The share of requests that overlap a pause is roughly the pause duration divided by the interval between collections; once that share exceeds 1%, the pause length sets the p99, and the p99.9 is affected even sooner.</p><p>ScyllaDB eliminates this by managing memory manually in C++ (no garbage collector) and using a Seastar reactor per core with cooperative scheduling. Tail latency is bounded by I/O, not by the runtime.</p><hr><p><em>Full post coming: LSM-tree compaction strategies, bloom filter false-positive rates, and the exact data model we used. Numbers above are from a production environment, redacted for specifics.</em></p>]]></content>
    
    
    <summary type="html">We ran both in production. The headline numbers are real but the context around them matters more than the benchmarks vendors publish.</summary>
    
    
    
    <category term="Data" scheme="https://animesh.cloud/categories/data/"/>
    
    
    <category term="scylladb" scheme="https://animesh.cloud/tags/scylladb/"/>
    
    <category term="latency" scheme="https://animesh.cloud/tags/latency/"/>
    
    <category term="cassandra" scheme="https://animesh.cloud/tags/cassandra/"/>
    
    <category term="nosql" scheme="https://animesh.cloud/tags/nosql/"/>
    
    <category term="lsm-tree" scheme="https://animesh.cloud/tags/lsm-tree/"/>
    
  </entry>
  
  <entry>
    <title>The KV cache miss your load balancer caused</title>
    <link href="https://animesh.cloud/2025/01/14/llm-inference-kv-cache-routing/"/>
    <id>https://animesh.cloud/2025/01/14/llm-inference-kv-cache-routing/</id>
    <published>2025-01-14T18:30:00.000Z</published>
    <updated>2025-01-14T18:30:00.000Z</updated>
    
    <content type="html"><![CDATA[<p>The prefill for a 6,000-token enterprise system prompt on Qwen3-32B takes about 4.3 seconds cold. With KV cache, the second request for that same prefix takes 0.6 seconds. You lose 3.7 seconds every time a request lands on a pod that doesn’t hold the cache.</p><p>In an eight-pod cluster with round-robin routing, that miss rate is 87.5%. Most of your cluster’s prefill compute is re-deriving attention for tokens you already computed — on a different pod, five milliseconds ago.</p><p><a href="https://github.com/llm-d/llm-d">llm-d</a> is a Kubernetes inference scheduler that makes routing KV-cache-aware. On 16 H100s running Qwen3-32B, it moved p90 TTFT from 92 seconds to 0.54 seconds with no hardware changes. 170x. The rest of this post is the explanation.</p><hr><h2 id="the-cost-of-a-kv-cache-miss"><a class="header-anchor" href="#the-cost-of-a-kv-cache-miss">#</a>The cost of a KV cache miss</h2><p>A transformer computes attention over the full context at every forward pass. For each new output token, it needs the key and value projections for every prior token in the sequence. 
The KV cache stores those projections so they don’t have to be recomputed.</p><p>For a model with <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>n</mi><mi>L</mi></msub></mrow><annotation encoding="application/x-tex">n_L</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5806em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathnormal">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3283em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">L</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span> layers, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>n</mi><mi>h</mi></msub></mrow><annotation encoding="application/x-tex">n_h</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5806em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathnormal">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">h</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" 
style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span> attention heads (after GQA/MQA), and head dimension <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>d</mi><mi>h</mi></msub></mrow><annotation encoding="application/x-tex">d_h</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">h</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span>, caching a prefix of <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>L</mi></mrow><annotation encoding="application/x-tex">L</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathnormal">L</span></span></span></span> tokens costs:</p><p class="katex-block "><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mtext>KV memory</mtext><mo>=</mo><mn>2</mn><mo>⋅</mo><msub><mi>n</mi><mi>L</mi></msub><mo>⋅</mo><msub><mi>n</mi><mi>h</mi></msub><mo>⋅</mo><msub><mi>d</mi><mi>h</mi></msub><mo>⋅</mo><mi>L</mi><mo>⋅</mo><mtext>bytes_per_element</mtext></mrow><annotation encoding="application/x-tex">\text{KV memory} = 
2 \cdot n_L \cdot n_h \cdot d_h \cdot L \cdot \text{bytes\_per\_element}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em;"></span><span class="mord text"><span class="mord">KV memory</span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">2</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">⋅</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.5945em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathnormal">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3283em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">L</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">⋅</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.5945em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathnormal">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal 
mtight">h</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">⋅</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">h</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">⋅</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathnormal">L</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">⋅</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:1.0044em;vertical-align:-0.31em;"></span><span class="mord text"><span class="mord">bytes_per_element</span></span></span></span></span></span></p><p>The factor of 2 is keys and values. 
In practice, vLLM blocks KV memory in 128-token chunks and manages it as a paged allocator — the block is the unit of both storage and cache lookup.</p><p>The computation savings on a cache hit are direct: if a request carries a prefix of <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>L</mi><mi>p</mi></msub></mrow><annotation encoding="application/x-tex">L_p</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9694em;vertical-align:-0.2861em;"></span><span class="mord"><span class="mord mathnormal">L</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">p</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em;"><span></span></span></span></span></span></span></span></span></span> cached tokens and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>L</mi><mi>n</mi></msub></mrow><annotation encoding="application/x-tex">L_n</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathnormal">L</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">n</span></span></span></span><span 
class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span> new tokens, the fraction of prefill work that disappears is:</p><p class="katex-block "><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>η</mi><mo>=</mo><mfrac><msub><mi>L</mi><mi>p</mi></msub><mrow><msub><mi>L</mi><mi>p</mi></msub><mo>+</mo><msub><mi>L</mi><mi>n</mi></msub></mrow></mfrac></mrow><annotation encoding="application/x-tex">\eta = \frac{L_p}{L_p + L_n}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em;"></span><span class="mord mathnormal" style="margin-right:0.0359em;">η</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:2.3324em;vertical-align:-0.9721em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.3603em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mord mathnormal">L</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">p</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em;"><span></span></span></span></span></span></span><span class="mspace" 
style="margin-right:0.2222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord"><span class="mord mathnormal">L</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">n</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mord mathnormal">L</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em;"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">p</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em;"><span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.9721em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></span></p><p>With a 6,000-token system prompt and 1,200-token query: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>η</mi><mo>=</mo><mn>6000</mn><mi 
mathvariant="normal">/</mi><mn>7200</mn><mo>=</mo><mn>0.83</mn></mrow><annotation encoding="application/x-tex">\eta = 6000 / 7200 = 0.83</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em;"></span><span class="mord mathnormal" style="margin-right:0.0359em;">η</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord">6000/7200</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">0.83</span></span></span></span>. Eighty-three percent of the prefill computation is skipped if the cache hits. Anthropic’s pricing reflects this: $3.00 per million uncached input tokens vs. $0.30 per million cached, a 10x cost difference that roughly tracks the compute saved.</p><hr><h2 id="why-naive-routing-destroys-cache-locality"><a class="header-anchor" href="#why-naive-routing-destroys-cache-locality">#</a>Why naive routing destroys cache locality</h2><p>vLLM’s prefix cache is in-process. Each pod maintains its own KV block index: a hash map from a 64-bit hash of each 128-token block’s content to a GPU memory address. 
A cache hit requires that the pod serving the request holds the exact prefix in its local cache.</p><p>In a cluster of <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>k</mi></mrow><annotation encoding="application/x-tex">k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal" style="margin-right:0.0315em;">k</span></span></span></span> pods with round-robin routing, a prefix cached on exactly one pod gets a hit with probability:</p><p class="katex-block "><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>P</mi><mo stretchy="false">(</mo><mtext>hit</mtext><mo>∣</mo><mtext>round-robin</mtext><mo stretchy="false">)</mo><mo>=</mo><mfrac><mn>1</mn><mi>k</mi></mfrac></mrow><annotation encoding="application/x-tex">P(\text{hit} \mid \text{round-robin}) = \frac{1}{k}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.1389em;">P</span><span class="mopen">(</span><span class="mord text"><span class="mord">hit</span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">∣</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord text"><span class="mord">round-robin</span></span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:2.0074em;vertical-align:-0.686em;"></span><span 
class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.3214em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.0315em;">k</span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.686em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></span></p><p>For <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>k</mi><mo>=</mo><mn>8</mn></mrow><annotation encoding="application/x-tex">k = 8</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em;"></span><span class="mord mathnormal" style="margin-right:0.0315em;">k</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">8</span></span></span></span>: 12.5%. The remaining 87.5% of requests recompute the prefix from scratch. 
With cache-aware routing that steers each request to the pod with the highest prefix match:</p><p class="katex-block "><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>P</mi><mo stretchy="false">(</mo><mtext>hit</mtext><mo>∣</mo><mtext>precise</mtext><mo stretchy="false">)</mo><mo>≈</mo><mn>1.0</mn></mrow><annotation encoding="application/x-tex">P(\text{hit} \mid \text{precise}) \approx 1.0</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.1389em;">P</span><span class="mopen">(</span><span class="mord text"><span class="mord">hit</span></span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">∣</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord text"><span class="mord">precise</span></span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">≈</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">1.0</span></span></span></span></span></p><p>provided the cluster has enough total KV capacity for the working set. 
At 73% utilization in the benchmark below, it does.</p><p>Expected savings shift from <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>0.125</mn><mo>×</mo><mn>0.83</mn><mo>=</mo><mn>10</mn><mi mathvariant="normal">%</mi></mrow><annotation encoding="application/x-tex">0.125 \times 0.83 = 10\%</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7278em;vertical-align:-0.0833em;"></span><span class="mord">0.125</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">0.83</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.8056em;vertical-align:-0.0556em;"></span><span class="mord">10%</span></span></span></span> to <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>1.0</mn><mo>×</mo><mn>0.83</mn><mo>=</mo><mn>83</mn><mi mathvariant="normal">%</mi></mrow><annotation encoding="application/x-tex">1.0 \times 0.83 = 83\%</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7278em;vertical-align:-0.0833em;"></span><span class="mord">1.0</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em;"></span></span><span class="base"><span class="strut" style="height:0.6444em;"></span><span class="mord">0.83</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" 
style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.8056em;vertical-align:-0.0556em;"></span><span class="mord">83%</span></span></span></span> of prefill work per request, an 8x difference in expectation. The measured TTFT improvement is 170x because of queueing amplification: a cache miss causes a full prefill run, which holds GPU memory longer, which reduces available batch slots, which delays other requests, which fills the queue. The delays cascade. The vLLM wait queue in the benchmark averaged 27 requests under random routing and 0.1 requests under precise routing, so most of the gap between the 8x expected compute saving and the 170x measured TTFT improvement comes from queueing, not raw compute.</p><hr><h2 id="what-llm-d-does"><a class="header-anchor" href="#what-llm-d-does">#</a>What llm-d does</h2><p>llm-d adds a routing layer above vLLM on Kubernetes. It has three components:</p><p><strong>InferencePool</strong> is a Kubernetes CRD grouping pods that serve the same model: a “KV-aware Service.” It is the unit of routing policy and is being standardized in the <a href="https://gateway-api.sigs.k8s.io/">Kubernetes Gateway API Inference Extension</a> SIG.</p><p><strong>Proxy</strong> is a standard L7 proxy (Envoy, Istio, or cloud-managed ALB) that handles connection management and TLS. It delegates the routing decision to the EPP via Envoy’s <code>ext-proc</code> external processing protocol.</p><p><strong>Endpoint Picker (EPP)</strong> is the scheduler. It runs a filter → score → pick pipeline over candidate pods, using real-time pod metrics and KV cache state, and returns the selected pod address to the Proxy.</p><p>The EPP’s cache awareness comes from a continuous event stream: vLLM emits a <code>KVEvents</code> stream (one event per block create or evict), and the EPP consumes it to maintain a <strong>KV-Block Index</strong>: a map from block hash to the set of pods holding that block and the memory tier (GPU HBM, CPU DRAM, or disk). 
When a request arrives, the EPP tokenizes its prefix, hashes each 128-token block, and queries the index: what fraction of this request’s KV state is already resident on each candidate pod?</p><p>The metadata overhead is negligible. Managing the full KV cache of an 8× H200 DeepSeek R1 cluster — 365 GB of KV VRAM — requires 339 KB of index state on the scheduler side. Data-to-metadata ratio: over 1,000,000:1. The scheduler has a complete real-time map of every cached block across the cluster for essentially nothing.</p><hr><h2 id="precise-vs-approximate"><a class="header-anchor" href="#precise-vs-approximate">#</a>Precise vs. approximate</h2><p>The EPP ships two scheduling modes:</p><p><strong>Approximate</strong> builds a routing history: if past requests with this prefix hash went to pod A, steer future ones there. No <code>KVEvents</code> stream required; works with any vLLM version. Cost: the index is a guess. Pods evict blocks under memory pressure without the scheduler knowing, so affinity decisions can become stale.</p><p><strong>Precise</strong> consumes the live <code>KVEvents</code> stream for an exact real-time view. The scheduler knows which blocks are resident where and computes a true cache affinity score per pod. Cost: vLLM must support the <code>KVEvents</code> API (supported in current vLLM), and the EPP maintains more state.</p><p>The benchmark: 8 pods, 16 H100s, Qwen3-32B TP=2, 307,328-token KV cache per pod. Workload: 150 enterprise customers × 5 users each, 6,000-token system prompts, 1,200-token queries, 3–60 QPS. 
Total KV demand: 73% of cluster capacity.</p><table><thead><tr><th>Scheduler</th><th>Throughput (tok/s)</th><th>TTFT p90 (s)</th><th>TTFT mean (s)</th><th>Queue depth (mean)</th></tr></thead><tbody><tr><td>precise</td><td><strong>8730</strong></td><td><strong>0.54</strong></td><td><strong>0.30</strong></td><td><strong>0.1</strong></td></tr><tr><td>approximate</td><td>6944</td><td>31.1</td><td>13.3</td><td>8.1</td></tr><tr><td>load-aware</td><td>4429</td><td>94.9</td><td>47.0</td><td>28.9</td></tr><tr><td>random</td><td>4429</td><td>92.6</td><td>45.3</td><td>27.3</td></tr></tbody></table><p><em>Source: <a href="https://llm-d.ai/blog/v0.5-release">llm-d v0.5 release — Precise Inference Scheduling</a></em></p><p>Load-aware and random are nearly identical: when you don’t know where the cache is, how you distribute load doesn’t matter much. Approximate brings the mean queue depth down to 8 (from 28) by making better guesses most of the time, but collapses at the p90 tail when guesses go stale. Precise holds at 0.54s because the scheduler never guesses.</p><p>The gap between approximate and precise, 31.1s vs. 0.54s p90 TTFT, is not a marginal improvement. It is a different operating regime.</p><hr><h2 id="what-this-maps-to"><a class="header-anchor" href="#what-this-maps-to">#</a>What this maps to</h2><p>This is a cache locality problem of the kind backend engineers have been solving for twenty years: hot tier, cold tier, miss penalty, routing policy. The KV cache is the hot tier. Every miss is a full recompute. The router is the only component that can enforce locality, because neither the model server (which doesn’t know about other pods’ caches) nor the Kubernetes Service (which doesn’t know about LLM semantics) can make the decision.</p><p>The architecture is composable: EPP plugs into Envoy’s <code>ext-proc</code> protocol, which makes it a drop-in addition to any existing Kubernetes networking stack. 
It requires no changes to the model-serving containers and no specialized networking fabric for the routing layer itself (disaggregated prefill/decode needs RDMA, but that’s a separate feature).</p><p>For engineers who’ve spent time on Redis cache-aside patterns, consistent hashing for cache affinity, or the latency math of tiered storage, the concepts transfer directly. The penalty for a miss is just measured in seconds instead of microseconds.</p><hr><p><em>llm-d is Apache 2.0 licensed and a CNCF Sandbox project (March 2026). Source: <a href="https://github.com/llm-d/llm-d">github.com/llm-d/llm-d</a>. Benchmarks reproduced from the v0.5 release post at <a href="https://llm-d.ai">llm-d.ai</a>. Numbers above are from the precise-scheduling benchmark on the public benchmark platform at <a href="https://prism.llm-d.ai">prism.llm-d.ai</a>.</em></p>]]></content>
    
    
    <summary type="html">On an 8-pod vLLM cluster, p90 TTFT went from 92 seconds to 0.54 seconds with no hardware changes. The only variable was whether the router knew where the KV cache was. This is a cache locality problem. Here is the math.</summary>
    
    
    
    <category term="AI" scheme="https://animesh.cloud/categories/ai/"/>
    
    
    <category term="llm-inference" scheme="https://animesh.cloud/tags/llm-inference/"/>
    
    <category term="kv-cache" scheme="https://animesh.cloud/tags/kv-cache/"/>
    
    <category term="kubernetes" scheme="https://animesh.cloud/tags/kubernetes/"/>
    
    <category term="vllm" scheme="https://animesh.cloud/tags/vllm/"/>
    
    <category term="scheduling" scheme="https://animesh.cloud/tags/scheduling/"/>
    
  </entry>
  
</feed>
