Public benchmark
Latency benchmark for NeutralAI masking overhead
This page tracks the reproducible warm-path latency impact of NeutralAI masking, the extra cost of semantic observe mode, and the TTFT overhead from output redaction. It is a product benchmark, not a provider network benchmark.
500-token overhead P95
102 ms
500-token neutralization P95
102 ms
Semantic observe delta P95
26.0 ms
TTFT impact P95
0.03 ms
Throughput
12.1 RPS
Production observation
Live gateway measurement from 2026-05-08. It isolates NeutralAI security and masking work and excludes downstream model generation and provider stream time.
Masking/security mean
41.4 ms
PII detection mean
17.3 ms
n=5
Current launch target status
We publish the target and the current measured warm-path result together. That keeps the benchmark useful for customers and honest for the team.
Masking overhead target (500 tokens, P95)
50.0 ms
Current measured result: 102 ms (above target)
TTFT impact target (P95)
20.0 ms
Current measured result: 0.03 ms (within target)
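The target comparison above can be sketched as a small check. This is an illustrative sketch, not the benchmark's actual tooling: the `LaunchTarget` class and its field names are hypothetical, and it assumes a latency target counts as met when the measured P95 is at or below the target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaunchTarget:
    """Hypothetical record pairing a published target with its measured result."""
    name: str
    target_ms: float    # launch target (P95), from this page
    measured_ms: float  # current measured warm-path result (P95), from this page

    @property
    def met(self) -> bool:
        # Assumption: a latency target is met when measured <= target.
        return self.measured_ms <= self.target_ms

    @property
    def ratio(self) -> float:
        # Measured result as a multiple of the target.
        return self.measured_ms / self.target_ms


TARGETS = [
    LaunchTarget("Masking overhead (500 tokens, P95)", 50.0, 102.0),
    LaunchTarget("TTFT impact (P95)", 20.0, 0.03),
]


def status_lines(targets):
    """Render one human-readable status line per target."""
    return [
        f"{t.name}: {'met' if t.met else 'not met'} "
        f"({t.measured_ms:g} ms measured, {t.target_ms:g} ms target)"
        for t in targets
    ]


if __name__ == "__main__":
    for line in status_lines(TARGETS):
        print(line)
```

With the numbers on this page, the masking overhead result is roughly twice its target, while the TTFT impact sits far below its target.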
Latency by payload size
Measurements use warm-path synthetic prompts containing a small amount of embedded PII and progressively larger neutral filler text.
| Payload | P50 (observe disabled) | P95 (observe disabled) | P99 (observe disabled) | Observe delta P95 |
|---|---|---|---|---|
| 100 tokens | 24.4 ms | 26.2 ms | 26.4 ms | 18.1 ms |
| 500 tokens | 98.4 ms | 102 ms | 102 ms | 26.0 ms |
| 2000 tokens | 353 ms | 366 ms | 369 ms | 38.1 ms |
| 5000 tokens | 896 ms | 911 ms | 912 ms | 702 ms |
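One way to read the table is to normalize each row by payload size and to compare the observe delta against the base P95. The sketch below does that arithmetic on the published rows only; the helper names are hypothetical, and it takes the table's P95 column as the base masking overhead.

```python
# Rows copied from the table above:
# (payload tokens, P50 ms, P95 ms, P99 ms, observe delta P95 ms)
ROWS = [
    (100, 24.4, 26.2, 26.4, 18.1),
    (500, 98.4, 102.0, 102.0, 26.0),
    (2000, 353.0, 366.0, 369.0, 38.1),
    (5000, 896.0, 911.0, 912.0, 702.0),
]


def per_token_p95_ms(rows):
    """P95 latency divided by payload size, in ms per token."""
    return {tokens: p95 / tokens for tokens, _p50, p95, _p99, _obs in rows}


def observe_delta_ratio(rows):
    """Observe delta P95 as a fraction of the base P95 for the same payload."""
    return {tokens: obs / p95 for tokens, _p50, p95, _p99, obs in rows}


if __name__ == "__main__":
    per_token = per_token_p95_ms(ROWS)
    ratio = observe_delta_ratio(ROWS)
    for tokens in per_token:
        print(f"{tokens:>5} tokens: {per_token[tokens]:.3f} ms/token, "
              f"observe delta = {ratio[tokens]:.0%} of base P95")
```

On these rows, the per-token P95 cost falls from about 0.26 ms at 100 tokens to about 0.18 ms at 5000 tokens. The observe delta is about 10% of the base P95 at 2000 tokens and about 77% at 5000 tokens, so the largest payload is the row that most warrants a closer look.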
Methodology note
Cold starts are excluded from the public numbers. The no-proxy baseline is a harness floor, not a provider latency claim. TTFT here measures streaming redaction overhead only. The production observation covers NeutralAI gateway overhead only. Public numbers link back to the checked-in benchmark artifact generated on 2026-05-07.
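A warm-path measurement that excludes cold start can be sketched as follows. This is a minimal illustration under stated assumptions, not the checked-in harness: `measure_warm`, its `warmup` and `runs` parameters, and the nearest-rank percentile method are all choices made for this example, and `fn` stands in for whatever call path is being timed.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Nearest-rank: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_warm(fn, payload, warmup=5, runs=50):
    """Time fn(payload) in milliseconds, discarding the first `warmup` calls.

    Assumption: discarding early calls is how cold start is kept out of the
    reported numbers; the real harness may do this differently.
    """
    for _ in range(warmup):
        fn(payload)  # warm-up call, result and timing discarded
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(payload)
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples):
    """Return the P50, P95, and P99 of the samples, keyed by name."""
    return {f"p{p}": percentile(samples, p) for p in (50, 95, 99)}


if __name__ == "__main__":
    # Stand-in workload; replace with the masked call path under test.
    samples = measure_warm(lambda text: text.upper(), "example prompt " * 100)
    print(summarize(samples))
```

Running the same loop against a no-proxy call path would give the harness floor mentioned above, which is a property of the test setup and not of any provider.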