Public benchmark

Latency benchmark for NeutralAI masking overhead

This page tracks the reproducible warm-path latency impact of NeutralAI masking, the extra cost of semantic observe mode, and the time-to-first-token (TTFT) overhead from output redaction. It is a product benchmark, not a provider network benchmark.

500-token overhead P95

102 ms

500-token neutralization P95

102 ms

Semantic observe delta P95

26.0 ms

TTFT impact P95

0.03 ms

Throughput

12.1 RPS

Production observation

Live gateway measurement from 2026-05-08. This isolates NeutralAI security and masking work only, not downstream model generation or provider stream time.

Masking/security mean

41.4 ms

PII detection mean

17.3 ms

n=5

Current launch target status

We publish the target and the current measured warm-path result together. That keeps the benchmark useful for customers and honest for the team.

Optimization still required: masking overhead is above its target; TTFT impact is within its target.

Masking overhead target (500 tokens, P95)

50.0 ms

Current measured result: 102 ms

TTFT impact target (P95)

20.0 ms

Current measured result: 0.03 ms
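
The comparison between targets and measured results can be sketched in a few lines of Python. This is an illustrative check only: the dictionary keys and the `target_status` helper are hypothetical names, not part of any NeutralAI tooling, and the numbers are copied from the figures above.

```python
# Targets and measured P95 values copied from this page (milliseconds).
# The key names are hypothetical labels chosen for this sketch.
TARGETS_MS = {
    "masking_overhead_p95_500_tokens": 50.0,
    "ttft_impact_p95": 20.0,
}
MEASURED_MS = {
    "masking_overhead_p95_500_tokens": 102.0,
    "ttft_impact_p95": 0.03,
}


def target_status(targets, measured):
    """Compare each measured value against its target."""
    report = {}
    for name, target in targets.items():
        value = measured[name]
        report[name] = {
            "meets_target": value <= target,
            "ratio_to_target": value / target,
            # How many milliseconds must be removed to reach the target.
            "gap_ms": max(0.0, value - target),
        }
    return report


status = target_status(TARGETS_MS, MEASURED_MS)
for name, row in status.items():
    print(name, row)
```

With the published figures, masking overhead is roughly twice its target, while TTFT impact sits well inside its target.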

Latency by payload size

Measurements use warm-path synthetic prompts that contain a small amount of embedded PII and progressively larger amounts of neutral filler text.

Payload	Disabled P50	Disabled P95	Disabled P99	Observe delta P95
100 tokens	24.4 ms	26.2 ms	26.4 ms	18.1 ms
500 tokens	98.4 ms	102 ms	102 ms	26.0 ms
2000 tokens	353 ms	366 ms	369 ms	38.1 ms
5000 tokens	896 ms	911 ms	912 ms	702 ms
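
One way to read the table is to normalize each row by payload size. The sketch below, using only the Disabled P95 and Observe delta P95 columns above, computes milliseconds per token and the observe delta as a fraction of the Disabled P95 figure. The function and field names are hypothetical, and the calculation is a reading aid rather than part of the benchmark harness.

```python
# Rows copied from the table above:
# (payload tokens, Disabled P95 in ms, Observe delta P95 in ms)
ROWS = [
    (100, 26.2, 18.1),
    (500, 102.0, 26.0),
    (2000, 366.0, 38.1),
    (5000, 911.0, 702.0),
]


def summarize(rows):
    """Normalize each row by payload size and by its Disabled P95 value."""
    out = []
    for tokens, disabled_p95, observe_delta in rows:
        out.append({
            "tokens": tokens,
            # Disabled P95 latency divided by payload size.
            "ms_per_token": disabled_p95 / tokens,
            # Observe delta relative to the Disabled P95 figure.
            "observe_share": observe_delta / disabled_p95,
        })
    return out


summary = summarize(ROWS)
for row in summary:
    print(f'{row["tokens"]:>5} tokens  '
          f'{row["ms_per_token"]:.3f} ms/token  '
          f'observe delta = {row["observe_share"]:.0%} of Disabled P95')
```

On these figures the per-token cost falls slightly as payloads grow, while the observe delta at 5000 tokens is far larger relative to the other rows than the 100 to 2000 token trend would suggest.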

Methodology note

Cold start is excluded from the public numbers. The no-proxy baseline is a harness floor only, not a provider latency claim. TTFT here measures streaming redaction overhead only, and the production observation covers NeutralAI gateway overhead only. Public numbers link back to the checked-in benchmark artifact generated on 2026-05-07.
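
As a rough illustration of what excluding cold start means for P50, P95, and P99, the sketch below drops the first run and computes nearest-rank percentiles over the remaining warm samples. Everything here is an assumption made for illustration: the sample latencies are made up, the helper names are hypothetical, and the page does not say which percentile method or warm-up count the real harness uses.

```python
import math


def nearest_rank_percentile(samples, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def warm_path_summary(latencies_ms, warmup_runs=1):
    """Discard the first `warmup_runs` samples (cold start), then summarize the rest."""
    warm = latencies_ms[warmup_runs:]
    return {pct: nearest_rank_percentile(warm, pct) for pct in (50, 95, 99)}


# Made-up latencies in milliseconds; the first value stands in for a cold start.
samples_ms = [250.0, 98.0, 99.0, 97.5, 101.0, 100.0, 98.5, 102.0, 99.5, 98.2, 100.5]

warm_summary = warm_path_summary(samples_ms, warmup_runs=1)
print(warm_summary)
```

Keeping the cold-start sample would push the tail percentiles toward the 250 ms outlier in this example, which is why a warm-path number and a cold-start number should not be compared directly.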
