Public benchmark

Latency benchmark for NeutralAI masking overhead

This page tracks the reproducible warm-path latency impact of NeutralAI masking, the extra cost of semantic observe mode, and the TTFT overhead from output redaction. It is a product benchmark, not a provider network benchmark.

500-token overhead P95

102 ms

500-token neutralization P95

102 ms

Semantic observe delta P95

26.0 ms

TTFT impact P95

0.03 ms

Throughput

12.1 RPS

Production observation

Live gateway measurement from 2026-05-08. This isolates NeutralAI security and masking work only, not downstream model generation or provider stream time.

Masking/security mean

41.4 ms

PII detection mean

17.3 ms

n=5

Current launch target status

We publish the target and the current measured warm-path result together. That keeps the benchmark useful for customers and honest for the team.

Optimization still required

Masking overhead target (500 tokens, P95)

50.0 ms

Current measured result: 102 ms

TTFT impact target (P95)

20.0 ms

Current measured result: 0.03 ms
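The status line above follows from comparing each measured P95 with its published target. A minimal sketch of that check in Python, using the figures on this page. The `Target` record, the `launch_status` helper, the "Targets met" label, and the less-than-or-equal pass rule are all hypothetical names and assumptions for illustration, not part of NeutralAI's tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    name: str
    target_ms: float    # published launch target (P95)
    measured_ms: float  # current measured warm-path result (P95)

    @property
    def met(self) -> bool:
        # Assumption: a target counts as met when measured <= target.
        return self.measured_ms <= self.target_ms


def launch_status(targets):
    """Return one status label for the whole set of targets."""
    # "Targets met" is a hypothetical label; the page only shows the failing case.
    return "Targets met" if all(t.met for t in targets) else "Optimization still required"


targets = [
    Target("Masking overhead (500 tokens, P95)", 50.0, 102.0),
    Target("TTFT impact (P95)", 20.0, 0.03),
]

for t in targets:
    print(f"{t.name}: target {t.target_ms} ms, measured {t.measured_ms} ms, met={t.met}")
print(launch_status(targets))
```

Under these assumptions the TTFT target passes and the masking overhead target does not, so the overall status stays at "Optimization still required".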

Latency by payload size

Measurements use warm-path synthetic prompts that contain a small amount of embedded PII and progressively larger amounts of neutral filler text.

Payload        Disabled P50   Disabled P95   Disabled P99   Observe delta P95
100 tokens     24.4 ms        26.2 ms        26.4 ms        18.1 ms
500 tokens     98.4 ms        102 ms         102 ms         26.0 ms
2000 tokens    353 ms         366 ms         369 ms         38.1 ms
5000 tokens    896 ms         911 ms         912 ms         702 ms
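One way to read the table above is to normalize each Disabled P95 value by payload size. The sketch below does that arithmetic in Python with the published figures. The `per_token_ms` helper is a hypothetical name, and the per-token view is an illustration, not a metric the benchmark itself reports.

```python
# (payload tokens, Disabled P95 in ms, Observe delta P95 in ms), copied from the table.
rows = [
    (100, 26.2, 18.1),
    (500, 102.0, 26.0),
    (2000, 366.0, 38.1),
    (5000, 911.0, 702.0),
]


def per_token_ms(tokens, p95_ms):
    """P95 latency divided by payload size, in milliseconds per token."""
    return p95_ms / tokens


for tokens, p95, observe_delta in rows:
    print(
        f"{tokens:>5} tokens: {per_token_ms(tokens, p95):.3f} ms/token at P95, "
        f"observe delta {observe_delta} ms"
    )
```

On these figures the per-token P95 cost falls from roughly 0.26 ms at 100 tokens to roughly 0.18 ms at 5000 tokens, while the observe delta rises sharply between the 2000-token and 5000-token rows.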

Methodology note

Cold start is excluded from the public numbers. The no-proxy baseline is only a harness floor, not a provider latency claim. TTFT here measures streaming redaction overhead only. The production observation covers NeutralAI gateway overhead only. Public numbers link back to the checked-in benchmark artifact generated on 2026-05-07.
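A warm-path measurement that excludes cold start, as described above, could look roughly like the following Python sketch. It is not the checked-in harness. The `measure_ms`, `percentile`, and `summarize` helpers, the warm-up count, the nearest-rank percentile definition, and the stand-in `fake_mask` workload are all assumptions made for illustration.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_ms(fn, runs=200, warmup=20):
    """Time fn() in milliseconds, discarding the first `warmup` calls (cold path)."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not recorded
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples):
    """Report the percentiles used on this page."""
    return {f"P{p}": percentile(samples, p) for p in (50, 95, 99)}


def fake_mask():
    # Hypothetical stand-in workload; a real harness would call the masking path here.
    sum(range(1000))


if __name__ == "__main__":
    print(summarize(measure_ms(fake_mask, runs=50, warmup=5)))
```

Running the same loop against a no-op function gives a harness floor in the sense the note describes: it bounds the timing cost of the loop itself and says nothing about provider latency.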