AI image generators tag their output with hidden provenance data. Here is what Midjourney, DALL-E, and Firefly embed, and how to strip it before sharing.

TL;DR: Every major AI image generator now writes provenance metadata into the files it produces. OpenAI's DALL-E and ChatGPT images carry a signed C2PA manifest, stored in a JUMBF box inside the JPEG APP11 marker segment, alongside an invisible SynthID pixel watermark. Midjourney skips C2PA and instead fills standard EXIF and IPTC fields: the full prompt, parameters, and Job ID land in the Description field, and the IPTC Digital Source Type is set to trainedAlgorithmicMedia. None of Midjourney's tags are cryptographically signed, so they strip cleanly. A metadata cleaner removes the C2PA, EXIF, XMP, and IPTC layers in one pass, but it cannot touch a SynthID watermark woven into the pixels themselves.

When you download a picture from Midjourney or save one out of ChatGPT, the visible image is only part of what you receive. Embedded in the file, sometimes cryptographically signed and sometimes just sitting in plain text fields, is a record of how that image was made: which model produced it, what you typed to get it, and a machine-readable flag declaring it synthetic. This is deliberate. Generators add this data to support content provenance, and regulators are starting to require it. But if you are publishing AI work commercially, the same record can leak your exact prompt, your account name, or a job ID that ties dozens of images back to you. We built Metadata Cleaner to handle exactly this kind of embedded data, so it is worth understanding what each platform actually writes.

What does "metadata" mean for an AI image?

An AI-generated file can carry up to four distinct metadata layers, and they are not interchangeable. The first is EXIF, the same tag block a camera uses, repurposed to store fields like Software and Artist. The second is XMP, Adobe's extensible packet, which is where the standardized AI-origin flags now live. The third is IPTC photo metadata, the press-industry schema that defines the Digital Source Type vocabulary. The fourth, and the most technically distinct, is a C2PA manifest: a signed, tamper-evident provenance record rather than a loose set of tags.

The reason this matters is that stripping one layer does not touch the others. A tool that only clears EXIF will leave an XMP packet and a C2PA manifest fully intact. If you want the file to be genuinely anonymous, every layer has to go. For a deeper primer on the camera side of this, see our explainer on what EXIF data is, and for how the three classic schemas differ, our breakdown of IPTC vs. EXIF vs. XMP.
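To make the layering concrete, here is a minimal sketch that walks the marker segments of a JPEG and reports which of the four layers are present. It is an illustration, not how any particular cleaner is implemented. The function names are ours, and the signature checks are simplifying assumptions: EXIF and XMP are recognized by their APP1 identifier strings, IPTC by the Photoshop header in APP13, and C2PA by an APP11 segment starting with the JUMBF "JP" identifier.

```python
import struct

# (marker byte, payload prefix) -> layer name. Heuristic signatures only.
LAYER_SIGNATURES = {
    (0xE1, b"Exif\x00\x00"): "EXIF",
    (0xE1, b"http://ns.adobe.com/xap/1.0/\x00"): "XMP",
    (0xED, b"Photoshop 3.0\x00"): "IPTC",
    (0xEB, b"JP"): "C2PA/JUMBF",
}


def make_segment(marker: int, payload: bytes) -> bytes:
    """Build one JPEG marker segment (helper for the demo below)."""
    # The 2-byte length field counts itself plus the payload.
    return b"\xff" + bytes([marker]) + struct.pack(">H", len(payload) + 2) + payload


def find_layers(data: bytes) -> set:
    """Return the names of the metadata layers found before the image data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    found = set()
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker == 0xFF:          # padding byte, skip it
            pos += 1
            continue
        if marker == 0xDA:          # start of scan: compressed pixels follow
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        for (sig_marker, prefix), name in LAYER_SIGNATURES.items():
            if marker == sig_marker and payload.startswith(prefix):
                found.add(name)
        pos += 2 + length
    return found


if __name__ == "__main__":
    demo = (
        b"\xff\xd8"
        + make_segment(0xE1, b"Exif\x00\x00II")
        + make_segment(0xEB, b"JP\x00\x01")
        + b"\xff\xda\x00\x02\xff\xd9"
    )
    print(sorted(find_layers(demo)))
```

A file that reports EXIF but also C2PA/JUMBF is exactly the case an EXIF-only tool leaves half cleaned.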

How are OpenAI's DALL-E and ChatGPT images tagged?

OpenAI takes the most thorough approach of the major generators. Images created or edited with DALL-E 3 in ChatGPT, through the API, or via Codex carry a C2PA Content Credential, the open provenance standard from the Coalition for Content Provenance and Authenticity. The credential records the application and tool that made the image, the actions taken on it, such as a format conversion or an edit, and a claim generator string identifying OpenAI's software. Because users can now edit generated images inside ChatGPT, OpenAI designed the manifest to carry provenance forward through those edits rather than dropping it.

OpenAI does not stop at metadata. According to the company's description of its provenance work, it also embeds Google DeepMind's SynthID, an invisible watermark baked into the pixels. The two systems are intentionally redundant: C2PA carries rich, readable context, while SynthID preserves a signal even when the metadata is stripped. Keep that pairing in mind, because it changes what "removing the metadata" can and cannot accomplish, a point we return to at the end.

A wall of source code, the same kind of structured text a C2PA manifest holds inside an AI image, only there it is serialized as binary CBOR and cryptographically signed. Photo by Markus Spiske on Pexels

How does Midjourney tag its images?

Midjourney takes a noticeably lighter approach. As of early 2026, Midjourney does not embed a C2PA manifest in the images you download from Discord or the web app. Instead, it writes generation data into ordinary EXIF and IPTC fields. The Description field typically holds the complete prompt text, the parameters you passed, the Job ID, and any image reference URLs. The Author field stores your Midjourney username. And the file's IPTC Digital Source Type is set to trainedAlgorithmicMedia, the standardized value that marks an image as fully AI-generated.

The practical consequence is twofold. First, anyone who opens the file in a metadata inspector can read your exact prompt and the username attached to your account, which is a real privacy exposure for creators selling work or protecting a process. Second, because none of these fields are cryptographically signed, they behave like any other EXIF or IPTC tag: they can be edited or removed without breaking anything. Midjourney has been a member of the Content Authenticity Initiative since 2023, so signed credentials may arrive in a future version, but today the data is plain and strippable.

What is the IPTC "Digital Source Type" field?

The one tag the whole industry is converging on is the IPTC Digital Source Type. It is a controlled vocabulary maintained by the International Press Telecommunications Council that classifies how an image was created, and it is written into the XMP packet rather than EXIF. For fully AI-generated work the value is trainedAlgorithmicMedia, identified by the URI http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia. A partly synthetic image, such as an inpainted or outpainted edit, gets compositeWithTrainedAlgorithmicMedia; a composite that mixes in synthetic elements more generally gets compositeSynthetic; and output from a non-AI algorithm, like a fractal, gets algorithmicMedia.
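As a rough illustration of how a platform might read this tag, the sketch below pulls the Digital Source Type out of an XMP packet that has already been extracted as text. The function names and the label wording are ours, and the regular expression is a shortcut that assumes the value appears either as an attribute or as a simple element; a production reader would use a real XMP parser.

```python
import re
from typing import Optional

# Matches the vocabulary URI whether it is written as an XML attribute
# (DigitalSourceType="...") or as element text (<...DigitalSourceType>...).
PATTERN = re.compile(
    r"DigitalSourceType\s*(?:=\s*[\"']|>\s*)"
    r"https?://cv\.iptc\.org/newscodes/digitalsourcetype/(\w+)"
)

# Plain-language labels for the values discussed in this article.
LABELS = {
    "trainedAlgorithmicMedia": "fully AI-generated",
    "compositeSynthetic": "composite with synthetic elements",
    "algorithmicMedia": "non-AI algorithmic output",
}


def digital_source_type(xmp: str) -> Optional[str]:
    """Return the vocabulary term from an XMP packet, or None if absent."""
    match = PATTERN.search(xmp)
    return match.group(1) if match else None


def describe(xmp: str) -> str:
    """Turn the raw term into a short human-readable label."""
    term = digital_source_type(xmp)
    if term is None:
        return "no Digital Source Type found"
    return LABELS.get(term, "unrecognized value: " + term)


if __name__ == "__main__":
    sample = (
        '<rdf:Description Iptc4xmpExt:DigitalSourceType='
        '"http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia"/>'
    )
    print(describe(sample))
```

The point of the example is how little it takes: one string match is enough for a platform to decide whether an upload gets an "AI" label.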

This field is becoming load-bearing. Getty Images, Shutterstock, and Adobe Stock all read it to label or filter submissions, and Meta has said it uses embedded IPTC metadata to apply "AI Info" labels across Facebook, Instagram, and Threads. With the EU AI Act's transparency requirements taking effect in August 2026, expect the Digital Source Type to show up in more generators and to be checked by more platforms automatically. If a single tag determines whether your image gets a visible "AI" badge on a social feed, it is worth knowing it is there.

What does a C2PA manifest actually contain?

The C2PA layer is worth understanding in detail because it is structurally unlike the other three. A manifest is not a flat list of tags. It is a bundle of assertions, such as the actions performed and the claim generator, wrapped in a claim that is then hashed and signed with a certificate. Tampering with the pixels or the assertions breaks the signature, which is what makes the credential tamper-evident rather than merely informational.
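The hash-then-sign idea can be sketched in a few lines. This is a deliberately simplified analogy, not the C2PA format: it uses JSON instead of CBOR, and an HMAC with a hypothetical shared key as a stand-in for the certificate-based signature a real manifest carries. All names here are ours.

```python
import hashlib
import hmac
import json

# Hypothetical demo key. A real signer uses a private key tied to a certificate.
SIGNING_KEY = b"demo-signing-key"


def _sign(claim: dict) -> str:
    """Sign a canonical encoding of the claim."""
    encoded = json.dumps(claim, sort_keys=True).encode("utf-8")
    return hmac.new(SIGNING_KEY, encoded, hashlib.sha256).hexdigest()


def make_manifest(pixels: bytes, assertions: dict) -> dict:
    """Bind the assertions to the pixel data, then sign the resulting claim."""
    claim = {
        "assertions": assertions,
        "content_hash": hashlib.sha256(pixels).hexdigest(),
    }
    return {"claim": claim, "signature": _sign(claim)}


def verify(pixels: bytes, manifest: dict) -> bool:
    """True only if neither the claim nor the pixels changed since signing."""
    claim = manifest["claim"]
    signature_ok = hmac.compare_digest(_sign(claim), manifest["signature"])
    pixels_ok = claim["content_hash"] == hashlib.sha256(pixels).hexdigest()
    return signature_ok and pixels_ok


if __name__ == "__main__":
    image = b"raw pixel bytes"
    manifest = make_manifest(image, {"action": "created", "generator": "example-tool"})
    print(verify(image, manifest))             # unchanged image and claim
    print(verify(image + b"!", manifest))      # pixels edited after signing
```

Editing the pixels changes the content hash, and editing an assertion changes the signed bytes, so either kind of tampering makes verification fail. Removing the manifest entirely is a different matter: nothing is left to verify.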

Mechanically, the manifest is serialized as CBOR, a compact binary format, and embedded in a JUMBF box, the JPEG Universal Metadata Box Format. In a JPEG that box rides in the APP11 marker segment; in a PNG it lives in a dedicated caBX chunk, separate from the usual iTXt and tEXt text chunks. This is why an EXIF-only cleaner misses it entirely: the manifest is not in the EXIF APP1 segment at all. Removing it means recognizing and dropping the JUMBF box specifically. If you have worked with Adobe's Content Credentials, this is the same provenance plumbing, and we cover the consumer-facing version in our guide to removing C2PA Content Credentials.
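On the PNG side, the container is simple enough to inspect by hand. The sketch below lists the chunks in a PNG and picks out the ones that carry metadata. It assumes the C2PA manifest sits in a chunk of type caBX, which is our reading of the C2PA specification; the function names and the set of chunk types treated as metadata are our own choices for illustration.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Chunk types treated as metadata in this sketch: the assumed C2PA chunk,
# the three text chunk types, and the EXIF chunk.
METADATA_CHUNK_TYPES = {"caBX", "iTXt", "tEXt", "zTXt", "eXIf"}


def make_chunk(chunk_type: bytes, body: bytes) -> bytes:
    """Build one PNG chunk: length, type, body, CRC (helper for the demo)."""
    crc = zlib.crc32(chunk_type + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + chunk_type + body + struct.pack(">I", crc)


def list_chunks(data: bytes) -> list:
    """Return (type, body length) for every chunk in the file, in order."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks = []
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        chunks.append((chunk_type.decode("latin-1"), length))
        pos += 12 + length  # 4 length + 4 type + body + 4 CRC
    return chunks


def metadata_chunks(data: bytes) -> list:
    """Return only the chunk types that carry metadata."""
    return [name for name, _ in list_chunks(data) if name in METADATA_CHUNK_TYPES]


if __name__ == "__main__":
    demo = (
        PNG_SIGNATURE
        + make_chunk(b"IHDR", b"\x00" * 13)
        + make_chunk(b"caBX", b"placeholder manifest bytes")
        + make_chunk(b"tEXt", b"Description\x00a prompt")
        + make_chunk(b"IEND", b"")
    )
    print(metadata_chunks(demo))
```

A cleaner that only looks for text chunks would report the tEXt entry here and walk straight past the manifest.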

A close-up of generated digital art: visually finished, yet carrying an invisible provenance record the moment it leaves the model. Photo by Merlin Lightpainting on Pexels

How do you remove AI image metadata?

The good news is that every metadata layer described above, the EXIF fields, the XMP packet with its Digital Source Type, the IPTC block, and the C2PA manifest, is removable, because each one is data stored in the file rather than in the pixels. The workflow is the same whether the image came from Midjourney, DALL-E, or Firefly. Open metadatacleaner.app, drop in the PNG or JPEG, and let the tool parse the file. It reads the C2PA manifest, EXIF, XMP, and IPTC blocks, then strips them in a single pass, including the prompt text, the Job ID, the username, and the signed provenance manifest. Download the cleaned copy, and before you publish, confirm that no Content Credentials or EXIF fields remain. Everything runs in your browser, so the image never leaves your device. If you want to audit a file independently first, a command-line utility such as ExifTool will dump the raw tags so you can see exactly what was there.
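For readers who want to see what "strip them in a single pass" means at the byte level, here is a minimal sketch for JPEG files. It is not Metadata Cleaner's implementation, and the function names are ours. It assumes the metadata sits in APP1 (EXIF and XMP), APP11 (the C2PA JUMBF box), and APP13 (IPTC) segments, and it copies every other segment through untouched.

```python
import struct

# Marker bytes of the segments to drop: APP1, APP11, APP13.
METADATA_MARKERS = {0xE1, 0xEB, 0xED}


def make_segment(marker: int, payload: bytes) -> bytes:
    """Build one JPEG marker segment (helper for the demo below)."""
    return b"\xff" + bytes([marker]) + struct.pack(">H", len(payload) + 2) + payload


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return a copy of the JPEG with the metadata segments removed."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    out = bytearray(data[:2])
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker == 0xDA:          # start of scan: stop parsing segments
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length
        if marker not in METADATA_MARKERS:
            out += data[pos:end]    # keep structural segments as they are
        pos = end
    out += data[pos:]               # compressed image data, copied unchanged
    return bytes(out)


if __name__ == "__main__":
    original = (
        b"\xff\xd8"
        + make_segment(0xE0, b"JFIF\x00")
        + make_segment(0xE1, b"Exif\x00\x00my exact prompt")
        + make_segment(0xEB, b"JP\x00\x01")
        + b"\xff\xda\x00\x02\x11\x22\xff\xd9"
    )
    cleaned = strip_jpeg_metadata(original)
    print(len(original), len(cleaned), b"prompt" in cleaned)
```

Because the compressed image data is copied byte-for-byte, the picture itself is not re-encoded. One side effect worth knowing: dropping APP1 also removes the EXIF orientation tag, so a viewer may display a rotated photo differently afterward.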

One note on scope: this works on media files, not documents. Metadata Cleaner handles images, video, and audio, so a generated MP4 from a video model is covered too, but it is not a PDF tool. If your AI art is wrapped inside a document, export the image first.

Does this metadata survive when you upload the image?

Sometimes, and that unpredictability is the problem. Many social platforms re-encode every image on upload, which incidentally discards loose EXIF, IPTC, and XMP tags. So the prompt sitting in a Midjourney Description field will often vanish on its own the moment you post to a feed that re-compresses uploads. The trouble is that this is a side effect, not a guarantee, and it does not apply everywhere. Cloud storage, direct file transfers, messaging apps that send originals, email attachments, and stock marketplaces all tend to preserve the file byte-for-byte, prompt and all.

The picture is shifting for C2PA specifically. Rather than stripping manifests, a growing number of platforms now deliberately preserve Content Credentials so the provenance chain stays intact, which means the signed manifest may persist exactly where you assumed an upload would clean it. And a SynthID watermark, as we are about to cover, survives the re-encode regardless. Counting on the destination platform to scrub your file is therefore a coin flip. The reliable move is to strip the metadata yourself before the image ever leaves your machine.

What does stripping metadata not remove?

Here is the honest limit, and it is an important one. Removing metadata clears the declared provenance, the readable manifest and tags, but it does nothing to a watermark embedded in the pixels. OpenAI's images carry SynthID, and SynthID is not metadata. It lives in the frequency-domain patterns and pixel values of the image itself, so you can delete every byte of metadata and the watermark remains. It is also built to survive JPEG compression, scaling, cropping, color shifts, and format conversion, the very operations that would casually wipe a metadata tag.

So the honest summary is this: stripping metadata makes your file anonymous to anyone reading its tags, removes the prompt and account identifiers Midjourney embeds, and clears the C2PA manifest OpenAI signs. It does not make an image undetectable as AI-generated when a platform checks for a pixel watermark like SynthID, and it is not a way to launder provenance or defeat the EU AI Act's labeling intent. Metadata removal is a privacy and hygiene step, not an invisibility cloak.

For most creators that distinction is exactly what they need. You want your prompt, your username, and your job history out of a file before it goes public, and you want control over the loose tags that follow an image around the web. That is squarely what a metadata cleaner does. Try Metadata Cleaner free and check what your AI images are actually carrying before you share them.
