ChatGPT Image Metadata: What DALL-E Puts in Your Downloads

ChatGPT images carry a C2PA manifest and a SynthID pixel watermark. Here's what's in the file, where it lives, and how to remove ChatGPT image metadata.

TL;DR: Every image generated through ChatGPT or the OpenAI API since May 19, 2026 carries two independent signals that mark it as AI. The first is a C2PA manifest: a JUMBF superbox embedded in the PNG (or WebP) file. It contains a claim_generator string like ChatGPT c2pa-rs/0.28.4 or OpenAI-API c2pa-rs/0.28.4, a c2pa.ai_generative_training assertion, a creation timestamp, and an OpenAI X.509 signature. C2PA adds roughly 3% file overhead on PNG and up to 30% on WebP. The second signal is SynthID, a Google DeepMind pixel-level watermark embedded in the image bitmap itself; it survives screenshots, resaves, and format conversions. Metadata stripping in Metadata Cleaner removes the C2PA manifest plus EXIF, XMP, and PNG tEXt/iTXt chunks in one pass. The pixel watermark is a different layer and does not come off with metadata tools.

Where the AI signals live in a ChatGPT image file

The download button on a ChatGPT image hands you either a PNG or a WebP. On the API, the output_format parameter controls the format, with PNG as the default. Inside that file, OpenAI writes several layers of identifying information, and they sit in different parts of the byte stream.

PNG files are a sequence of chunks. After the eight-byte signature, every chunk is a four-byte length, a four-byte type code, the payload, and a four-byte CRC. The mandatory chunks for a renderable PNG are IHDR, IDAT, and IEND, plus PLTE for palette-based images. Everything else is optional. Descriptive metadata lives in three text chunks: tEXt (Latin-1, uncompressed), zTXt (Latin-1, zlib-compressed), and iTXt (UTF-8, optionally compressed). Adobe's XMP specification places XMP packets in an iTXt chunk with the keyword XML:com.adobe.xmp. EXIF can live in an eXIf chunk, which was registered in version 1.5.0 of the PNG extensions (2017).
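
The chunk layout described above is simple enough to walk by hand. Below is a minimal sketch using only the Python standard library. It is an illustration, not a production parser: the `Chunk` tuple and `iter_png_chunks` name are our own. It reads each length/type/payload/CRC record, checks the CRC over the type code plus payload, and stops at IEND.

```python
import struct
import zlib
from typing import Iterator, NamedTuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


class Chunk(NamedTuple):
    ctype: bytes  # four-byte type code, e.g. b"IHDR"
    data: bytes   # payload only (no length, type, or CRC)
    start: int    # file offset of the chunk's length field
    end: int      # file offset just past the chunk's CRC


def iter_png_chunks(png: bytes) -> Iterator[Chunk]:
    """Walk a PNG's chunk chain, verifying each CRC, and stop at IEND."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG: bad signature")
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        if pos + 8 > len(png):
            raise ValueError("truncated chunk header")
        (length,) = struct.unpack(">I", png[pos:pos + 4])  # big-endian length
        ctype = png[pos + 4:pos + 8]
        data_end = pos + 8 + length
        if data_end + 4 > len(png):
            raise ValueError(f"truncated {ctype!r} chunk")
        data = png[pos + 8:data_end]
        (crc,) = struct.unpack(">I", png[data_end:data_end + 4])
        # The CRC covers the type code and payload, not the length field.
        if zlib.crc32(ctype + data) != crc:
            raise ValueError(f"CRC mismatch in {ctype!r} chunk")
        yield Chunk(ctype, data, pos, data_end + 4)
        pos = data_end + 4
        if ctype == b"IEND":
            break
```

Pointed at the bytes of a ChatGPT download, printing `chunk.ctype` for each entry shows the layers side by side: IHDR, any text or eXIf chunks, the caBX chunk discussed next, the IDAT run, and IEND.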

C2PA lives somewhere else entirely. The Coalition for Content Provenance and Authenticity specifies a JUMBF (JPEG Universal Metadata Box Format) superbox, embedded as a single contiguous block inside the host file:

- In a PNG, it is a caBX chunk.
- In a WebP, it is a C2PA chunk in the RIFF chain.
- In a JPEG, it is carried in APP11 marker segments.

The C2PA 2.4 Technical Specification lays this out. Inside the JUMBF box sit a CBOR-encoded assertion store, a COSE signature, and a credential chain.

The C2PA payload in OpenAI's PNG outputs is small, about 3% of the file. On WebP, the same payload only looks heavier because WebP files are themselves smaller: roughly the same bytes come to about 30% of the file. The percentage difference is a property of the host format, not a change in the payload.
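
To see the overhead figure for yourself, you can locate the caBX chunk and divide its size by the file size. This is a minimal sketch with hypothetical helper names. It assumes, per our reading of the C2PA spec, that the caBX payload begins directly with the JUMBF superbox, whose box header is a four-byte length followed by the type `jumb`. It checks only that header; it does not decode the CBOR assertions or verify the COSE signature.

```python
import struct
from typing import Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def find_c2pa_chunk(png: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (total chunk size, payload) for the first caBX chunk, or None."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        ctype = png[pos + 4:pos + 8]
        if ctype == b"caBX":
            # 12 = 4-byte length + 4-byte type + 4-byte CRC around the payload.
            return 12 + length, png[pos + 8:pos + 8 + length]
        if ctype == b"IEND":
            break
        pos += 12 + length
    return None


def looks_like_jumbf_superbox(payload: bytes) -> bool:
    """A JUMBF superbox opens with a 4-byte box length and the box type 'jumb'."""
    return len(payload) >= 8 and payload[4:8] == b"jumb"


def c2pa_overhead(png: bytes) -> float:
    """Share of the file's bytes taken by the caBX chunk (0.0 when absent)."""
    found = find_c2pa_chunk(png)
    return 0.0 if found is None else found[0] / len(png)
```

Formatting the result with `f"{c2pa_overhead(data):.1%}"` gives a percentage you can compare against the roughly 3% PNG figure. The exact share depends on how large the pixel data is in your particular file.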

The last layer, and the second AI signal, doesn't show up in exiftool at all: SynthID. SynthID modulates pixel values in a pattern keyed to a private detector. It isn't a chunk or a tag. It lives in the pixel values themselves (in a PNG, the IDAT payload), which is why it gets its own section below.

What's actually in the C2PA manifest

A C2PA manifest is a cryptographically signed provenance record. It states what the asset is, who created it, what tool generated it, and lays out a chain of claims any validator can check. For an OpenAI-generated image, the manifest is signed by an OpenAI certificate that chains up to a publicly listed C2PA root.

If you drop a ChatGPT-generated PNG onto contentcredentials.org/verify or openai.com/verify, the validator pulls the manifest out, checks the signature, walks the assertion store, and prints the fields. The same data is readable in the terminal with c2patool from the open-source Content Authenticity Initiative tools.

The claim_generator field names the software. For an image generated in chat, it reads something like ChatGPT c2pa-rs/0.28.4. That names the host product and the version of the Rust C2PA library OpenAI builds against. From the REST API, the same field reads OpenAI-API c2pa-rs/0.28.4. The library version changes whenever OpenAI rebuilds, but the prefix is the tell: it identifies which surface produced the image.

Several other fields sit alongside it:

- A digital source type value resolves to an IPTC URI ending in trainedAlgorithmicMedia, the IPTC vocabulary code for "synthetic, generated by a trained AI model." The IPTC publishes this vocabulary as part of its photo metadata standards.
- A creation timestamp is encoded in ISO 8601.
- A title field is usually a UUID-like instance ID, not a human-readable name.
- A hard-binding hash covers the file's bytes outside the manifest itself, pixel data included. That hash is how validators detect tampering, because any modification to the pixel data invalidates it.
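
Once a tool such as c2patool has pulled the claim_generator value out of the manifest, telling the surfaces apart is a string operation. The sketch below only parses that string. The two prefixes it maps are the ones quoted above. The `ClaimGenerator` tuple, the surface labels, and the mapping table are our own hypothetical names, and any other prefix is reported as "unknown" rather than guessed.

```python
from typing import NamedTuple, Optional


class ClaimGenerator(NamedTuple):
    product: str              # host product, e.g. "ChatGPT" or "OpenAI-API"
    library: Optional[str]    # signing library, e.g. "c2pa-rs"
    version: Optional[str]    # library version, e.g. "0.28.4"


# Prefixes quoted in this article; anything else is reported as "unknown".
OPENAI_SURFACES = {"ChatGPT": "chat", "OpenAI-API": "api"}


def parse_claim_generator(value: str) -> ClaimGenerator:
    """Split 'ChatGPT c2pa-rs/0.28.4' into product, library, and version."""
    tokens = value.split()
    if not tokens:
        raise ValueError("empty claim_generator")
    library: Optional[str] = None
    version: Optional[str] = None
    if len(tokens) > 1:
        name, sep, ver = tokens[1].partition("/")
        library = name
        version = ver if sep else None
    return ClaimGenerator(tokens[0], library, version)


def openai_surface(value: str) -> str:
    """Map the product prefix to the surface that produced the image."""
    return OPENAI_SURFACES.get(parse_claim_generator(value).product, "unknown")
```

Keying on the prefix rather than the full string matches the point above: the library version moves between builds, but the product token stays stable.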

By default, the consumer chat flow leaves three things out of the manifest: the prompt text, the account ID, and the model checkpoint name. The chat UI does not propagate the prompt into the manifest. The API can carry a prompt assertion when you generate via images.generate with the right request shape, and that difference has long been a source of confusion. Chat-generated: no prompt in the file. API-generated with prompt passthrough: the prompt can land there.

The signature chain is what makes the whole thing verifiable. OpenAI publishes the certificate; the image carries the manifest; the manifest carries an OpenAI signature over the assertion store and the bitmap hash. Any validator that trusts the OpenAI root can confirm the file came from OpenAI's pipeline at the moment the signature was issued. Strip the C2PA box and the signed claim goes with it; the file is then unsigned and indistinguishable from any other image at the C2PA layer. The pixels are still the pixels — which is the distinction the next section turns on.
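
The relationship between the hash and the box can be shown with a simplified model of hard binding. This is an illustration, not the real C2PA procedure. The actual c2pa.hash.data assertion records its own hash algorithm and explicit exclusion ranges, and the signature covers the claim that references it. Here we simply hash every byte of the PNG except the caBX chunk(s). Two things follow. Any pixel change alters the hash. And the hashed bytes are identical whether or not the box is present, so removing the box leaves the pixels intact while the signed claim about them disappears.

```python
import hashlib
import struct
from typing import List, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def c2pa_exclusions(png: bytes) -> List[Tuple[int, int]]:
    """Byte ranges (start, end) of every caBX chunk, header and CRC included."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG")
    ranges = []
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        ctype = png[pos + 4:pos + 8]
        end = pos + 12 + length
        if ctype == b"caBX":
            ranges.append((pos, end))
        pos = end
        if ctype == b"IEND":
            break
    return ranges


def binding_hash(png: bytes) -> str:
    """SHA-256 over every byte of the file except the excluded C2PA ranges."""
    h = hashlib.sha256()
    cursor = 0
    for start, end in c2pa_exclusions(png):
        h.update(png[cursor:start])  # bytes before the manifest chunk
        cursor = end                 # skip the manifest chunk itself
    h.update(png[cursor:])           # everything after the last exclusion
    return h.hexdigest()
```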

The SynthID pixel-level watermark

On May 19, 2026, OpenAI announced that every image generated through ChatGPT, Codex, and the OpenAI API would carry a second signal in parallel with C2PA: a SynthID invisible watermark, embedded in partnership with Google DeepMind. The framing positioned the two as complementary layers — C2PA carrying detailed context any validator can read, SynthID providing a signal that survives the kinds of transformations that strip context.

SynthID is not metadata. It is a perturbation pattern applied to the pixel values during image generation, designed to be statistically detectable by a private classifier while remaining imperceptible to a human viewer. There is a public preprint on SynthID-Image at internet scale that describes the approach. The signal is robust to JPEG re-encoding, modest cropping, brightness and contrast changes, color-space conversions, and the resave-on-upload that social platforms apply at ingest. Screenshots also survive — the headline characteristic, because screenshotting has been the default tactic for stripping C2PA for two years.

What the watermark does not survive cleanly is heavy editing. Inpainting, large crops, aggressive downsampling, and adversarial reconstruction can degrade the signal until it falls below the detector's confidence threshold. The detector returns a probability, not a yes/no — so the practical question is whether the score crosses whatever threshold the verifier sets.

For a creator stripping metadata, the implication is straightforward. Metadata Cleaner walks the chunk chain and removes the C2PA manifest, the EXIF, the XMP, the tEXt and iTXt chunks, and anything else that isn't needed to render the image. The pixel watermark lives inside the IDAT, as part of the image itself. Removing it would mean altering pixels enough to disrupt the pattern, which in practice means visibly degrading the image. Tools that claim to remove it generally rely on aggressive JPEG round-trips and resampling, which leave visible artifacts. That's not a tradeoff most creators want.

This is the same pattern we covered in the Day 7 piece on the difference between metadata and audio watermarks — stripping the descriptive layer is easy; removing the signal embedded into the content itself is a different problem.

How to remove ChatGPT image metadata in the browser

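Stripping a ChatGPT image's metadata means walking the host file's chunk chain and dropping everything that isn't required to render the picture.

For a PNG:

- Keep IHDR, all IDAT chunks, and IEND.
- Also keep PLTE and tRNS when the image uses them.
- Keep the iCCP, sRGB, gAMA, and cHRM color chunks if you want colors to render identically.
- Drop the C2PA caBX chunk, any eXIf, any tEXt / zTXt / iTXt text chunks (including the XMP packet keyed XML:com.adobe.xmp), and any private chunks.

For a WebP, the same logic applies to the RIFF chain:

- Keep VP8 / VP8L / VP8X, plus ALPH / ICCP for transparency and color.
- Drop the C2PA, EXIF, and XMP RIFF chunks.
- Clear the EXIF and XMP flag bits in the VP8X header, and rewrite the RIFF size field to match the shorter file. These two bookkeeping steps keep the WebP valid.

Pixel data passes through byte-for-byte.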
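
The PNG half of that procedure fits in a few lines. This is a minimal sketch, not Metadata Cleaner's actual implementation. It keeps IHDR, PLTE, tRNS, IDAT, and IEND, and optionally the color-management chunks (the `keep_color` switch is our own design choice). Everything else is dropped, including caBX, eXIf, the three text chunk types, and private chunks. Kept chunks are copied verbatim with their original CRCs, so the pixel data is byte-identical. The sketch does not re-validate CRCs.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Chunks needed to decode the pixels (PLTE/tRNS only appear for some color types).
REQUIRED = {b"IHDR", b"PLTE", b"tRNS", b"IDAT", b"IEND"}
# Color-management chunks: optional, but dropping them can shift rendered colors.
COLOR = {b"iCCP", b"sRGB", b"gAMA", b"cHRM"}


def strip_png(png: bytes, keep_color: bool = True) -> bytes:
    """Return a copy of the PNG containing only rendering-related chunks."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG")
    keep = REQUIRED | COLOR if keep_color else REQUIRED
    out = bytearray(PNG_SIGNATURE)
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        ctype = png[pos + 4:pos + 8]
        end = pos + 12 + length  # length + type + payload + CRC
        if ctype in keep:
            out += png[pos:end]  # copy the whole chunk, CRC included, verbatim
        pos = end
        if ctype == b"IEND":
            break
    return bytes(out)
```

Because the filter is an allowlist rather than a blocklist, chunk types the sketch has never heard of are dropped by default. That is the safer failure mode for a privacy tool.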

Metadata Cleaner does this in the browser, and the file never uploads. Drop the PNG or WebP into the page, click Clean, and the cleaned image lands back in your downloads. Running exiftool -G1 -a cleaned.png should show no XMP, no EXIF, no JUMBF, and no text chunks. File size drops by the C2PA overhead plus any XMP and EXIF that were present. That is roughly 3% on PNG and a much larger share on WebP, in line with the overhead figures above. The C2PA coalition maintains the specification that verifiers read against. An empty result on contentcredentials.org/verify indicates no manifest is left in the file.
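
The WebP side of the chunk-level strip described above needs the two bookkeeping steps. Here is a minimal sketch, assuming the layout in Google's WebP container specification:

- The file starts with a `RIFF` header whose little-endian size excludes the first 8 bytes.
- Odd-sized payloads are followed by one padding byte.
- In the first VP8X payload byte, the EXIF and XMP feature flags are 0x08 and 0x04.

Beyond the chunks listed in the text, the sketch also keeps ANIM/ANMF so an animated file would survive. The function and constant names are hypothetical.

```python
import struct

# Chunks needed to decode and color-manage the image; ANIM/ANMF keep animations intact.
KEEP = {b"VP8 ", b"VP8L", b"VP8X", b"ALPH", b"ICCP", b"ANIM", b"ANMF"}
EXIF_FLAG = 0x08  # VP8X feature-flag bits, per the WebP container spec
XMP_FLAG = 0x04


def strip_webp(webp: bytes) -> bytes:
    """Drop EXIF, XMP, C2PA, and unknown chunks from a RIFF/WEBP file."""
    if len(webp) < 12 or webp[:4] != b"RIFF" or webp[8:12] != b"WEBP":
        raise ValueError("not a RIFF/WEBP file")
    (riff_size,) = struct.unpack("<I", webp[4:8])  # little-endian, excludes first 8 bytes
    limit = min(len(webp), 8 + riff_size)
    body = bytearray(b"WEBP")
    pos = 12
    while pos + 8 <= limit:
        fourcc = webp[pos:pos + 4]
        (size,) = struct.unpack("<I", webp[pos + 4:pos + 8])
        end = pos + 8 + size + (size & 1)  # odd payloads carry one padding byte
        if fourcc in KEEP:
            chunk = bytearray(webp[pos:end])
            if fourcc == b"VP8X" and size >= 1:
                # The header must not keep advertising chunks that are now gone.
                chunk[8] &= ~(EXIF_FLAG | XMP_FLAG) & 0xFF
            body += chunk
        pos = end
    # New RIFF size = everything after the 8-byte "RIFF" + size header.
    return b"RIFF" + struct.pack("<I", len(body)) + bytes(body)
```

The alpha and ICC bits in VP8X are left untouched, because their chunks are kept. Only the flags for removed chunks change.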

What stripping the file doesn't reach is the part the May 19 announcement was designed to address. Chunk-level cleaning leaves the SynthID watermark in the pixel data untouched: the C2PA layer goes, but the pixel signal stays. A platform running the SynthID detector can still identify the image as AI-generated at high confidence. That isn't a bug in the cleaner; it's the layered design of the disclosure stack.

Removing the descriptive provenance is a privacy and competitive decision, since most creators don't want every downstream viewer to see "made by OpenAI" on the file. The pixel watermark is a separate question. Removing it would require altering the image itself, which is not what a metadata tool does. If you upload AI work to platforms running the metadata detection systems we covered in Day 8, the C2PA strip changes what the upload pipeline reads on ingest. Second-pass detection on platforms that license SynthID is unaffected.

The same chunk-walking pattern applies to other AI image tools. We covered the comparable behaviour for Midjourney, DALL-E, and Stable Diffusion exports in Day 5, and for Adobe Photoshop's Content Credentials in Day 15. Where a tool embeds C2PA, the underlying mechanism is the same; what differs is which chunk type the manifest sits in and which vendor-specific tags surround it. For the broader explainer on what C2PA actually is and why most major commercial AI image tools now ship it, see the Day 10 piece on what C2PA metadata is.

FAQ

Does every ChatGPT image carry C2PA? Yes. Every image generated through ChatGPT, the OpenAI API, and related products carries a C2PA manifest in the output file. The May 19, 2026 update added the SynthID pixel watermark as a second, parallel signal — it didn't replace the C2PA layer.

Will removing the C2PA manifest break the image? No. The C2PA box sits beside the pixel chunks in the file structure. Removing it leaves the pixel data byte-identical and the file fully renderable. Validators that look for a manifest report "no provenance" rather than "broken provenance."

Can I just convert to JPEG to strip it? Sometimes. Converting through an encoder that doesn't carry the JUMBF box forward will drop the C2PA manifest, and ImageMagick conversions are widely reported to do this. But JPEG is lossy, so the conversion also re-encodes the pixels and can degrade visible quality. It also leaves the SynthID watermark in place, since that signal is designed to survive re-encoding. A chunk-level strip on the original format is cleaner.

Does exiftool show the C2PA fields? Partially. ExifTool reads the JUMBF group and surfaces tag names within it; per its documentation, C2PA tags are not currently writable, but the JUMBF group can be deleted with -JUMBF:all=. That's a workable strip path for command-line users. The browser flow in Metadata Cleaner is the same operation without installing anything.

What if the image was made with DALL-E 3 specifically? DALL-E 3 was the prior generation of OpenAI's image stack and is still reachable via API for legacy compatibility. The C2PA manifest shape is the same — same JUMBF box, same IPTC trainedAlgorithmicMedia assertion, same OpenAI signing certificate. The claim_generator string will name the older version. SynthID was added to all production OpenAI image surfaces on May 19, 2026.

Try Metadata Cleaner free. It runs in your browser, strips the C2PA manifest and every other metadata chunk in one pass, and never uploads your file.