FOR AI LABS

Label what you generate. Track the training data you register.

Hashproof signs generated output with durable C2PA provenance, records the training-data lineage you register so you can check membership against it by perceptual similarity, and produces EU AI Act disclosure reports from your stored corpus. It is built on C2PA, the standard regulators and platforms are converging on.

What is going wrong

Three pressures arriving at once: regulation, rights, and durability.

Disclosure is becoming mandatory

EU AI Act Article 50 requires synthetic content to be marked as such in a machine-readable format, with obligations applying from 2 August 2026. Voluntary labeling that any tool can strip is easy to lose and hard to evidence when a regulator asks for a durable, checkable record.

Training-data lineage is opaque

When a rights holder asks whether their work was in a training set, or a downstream user asks what a model was trained on, most labs cannot answer with evidence. The provenance of training corpora is rarely captured in a checkable form.

Output labels do not survive the internet

A generated image usually loses its metadata the first time it is screenshotted or re-encoded by a platform. A label that lives only in EXIF is gone within one hop, exactly when downstream systems need to know the content is synthetic.

What Hashproof brings

Four primitives spanning output labeling and training-data lineage.

Store synthetic-content manifests

Store the C2PA manifests your generation pipeline signs, including AI-generation assertions such as c2pa.ai_generative. Stored manifests carrying those assertions are counted in EU AI Act compliance reports and stay retrievable by ID.

Track training-data lineage

Register datasets and ingest the fingerprints of your training samples, then check membership: does an asset perceptually match something you registered? A match is a similarity match against the corpus you declared, not proof of what a model actually trained on. A non-match means only that nothing you registered was close enough, not that the asset was never used.
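
As a rough illustration of what a perceptual membership check involves, the sketch below uses a simple difference hash and a Hamming-distance threshold. It is not Hashproof's fingerprinting algorithm or API: `dhash`, `check_membership`, the threshold of 2 bits, and the sample IDs are all assumptions made for the example, and the pixel grids stand in for images that have already been downscaled to grayscale.

```python
from typing import Dict, List, Optional, Tuple

Pixels = List[List[int]]  # grayscale rows, values 0-255 (assumed pre-downscaled)


def dhash(pixels: Pixels) -> int:
    """Difference hash: one bit per horizontally adjacent pixel pair."""
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def check_membership(
    query: int, registered: Dict[str, int], max_distance: int = 2
) -> Optional[Tuple[str, int]]:
    """Return the closest registered sample within the threshold, or None."""
    best: Optional[Tuple[str, int]] = None
    for sample_id, fingerprint in registered.items():
        distance = hamming(query, fingerprint)
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (sample_id, distance)
    return best


original = [
    [10, 20, 30, 40, 50],
    [50, 40, 30, 20, 10],
    [10, 50, 10, 50, 10],
    [90, 80, 90, 80, 90],
]
# Brightened copy with one pixel disturbed, standing in for a re-encode.
reencoded = [
    [15, 25, 35, 45, 44],
    [55, 45, 35, 25, 15],
    [15, 55, 15, 55, 15],
    [95, 85, 95, 85, 95],
]
unrelated = [
    [50, 40, 30, 20, 10],
    [10, 20, 30, 40, 50],
    [50, 10, 50, 10, 50],
    [80, 90, 80, 90, 80],
]

registered = {"sample-001": dhash(original)}  # hypothetical registered corpus
match = check_membership(dhash(reencoded), registered)
no_match = check_membership(dhash(unrelated), registered)
```

Here `match` identifies `sample-001` at a distance of one bit, while `no_match` is `None`. The `None` result says only that nothing registered fell within the threshold, which mirrors the caveat above.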

Labels that survive re-encoding

Soft-binding resolution can often reconnect a screenshotted or re-compressed generation to candidate manifests, so downstream platforms can often recover the synthetic label even after re-encoding. The result is a similarity match, not a guarantee.
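
To make the "candidates, not a guarantee" point concrete, here is a minimal sketch of resolution as a ranked lookup. The index layout, the `resolve_candidates` name, the 16-bit fingerprints, and the threshold are assumptions for illustration, not Hashproof's resolution API.

```python
from typing import Dict, List, Tuple


def resolve_candidates(
    query_fp: int, index: Dict[str, int], max_distance: int = 6
) -> List[Tuple[str, int]]:
    """Return (manifest_id, distance) pairs within the threshold, closest first."""
    hits = []
    for manifest_id, fingerprint in index.items():
        distance = bin(query_fp ^ fingerprint).count("1")
        if distance <= max_distance:
            hits.append((manifest_id, distance))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


# Hypothetical soft-binding index: manifest ID -> stored fingerprint.
index = {
    "manifest-a": 0b1111000011110000,
    "manifest-b": 0b1111000011111111,
    "manifest-c": 0b0000111100001111,
}
query = 0b1111000011110001  # fingerprint of a re-encoded copy
candidates = resolve_candidates(query, index)
```

The call returns two candidates, `manifest-a` at distance 1 and `manifest-b` at distance 3, and drops `manifest-c`. A caller would still need to decide which candidate, if any, to accept.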

Disclosure reports from the corpus

Generate EU AI Act Article 50 reports directly from your stored manifests. Each finding cites its manifest, so disclosure becomes a query against the record, not a manual audit.
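
The idea of a report as a query can be sketched in a few lines. The manifest dictionaries, field names, and `build_report` function below are hypothetical and do not reflect the shape of Hashproof's compliance endpoint; only the `c2pa.ai_generative` label is taken from the text above.

```python
from typing import Any, Dict, List

AI_ASSERTION = "c2pa.ai_generative"  # assertion label as named in the text above


def build_report(manifests: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Summarize stored manifests, citing each finding by manifest ID."""
    findings = []
    for manifest in manifests:
        labels = [a["label"] for a in manifest.get("assertions", [])]
        findings.append(
            {
                "manifest_id": manifest["id"],
                "ai_disclosed": AI_ASSERTION in labels,
            }
        )
    disclosed = sum(1 for f in findings if f["ai_disclosed"])
    return {"total": len(findings), "disclosed": disclosed, "findings": findings}


# Hypothetical stored corpus.
stored = [
    {"id": "m-001", "assertions": [{"label": "c2pa.ai_generative"}]},
    {"id": "m-002", "assertions": [{"label": "c2pa.actions"}]},
    {"id": "m-003", "assertions": []},
]
report = build_report(stored)
```

Because every finding carries its `manifest_id`, a reviewer can trace each line of the summary back to a stored record.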

Where it sits in the lifecycle

From training-data registration to disclosure. Your training and inference stacks stay in place.

01
Register data
Register training datasets and ingest perceptual fingerprints through the API. The corpus becomes queryable for membership without exposing the underlying assets.
02
Store output manifests
Your generation pipeline stores a C2PA manifest for each output, carrying the AI-generation assertions your signing tooling adds. The stored record is retrievable and verifiable through the API.
03
Answer queries
You can answer membership queries from rights holders and auditors against your registered datasets, and verify any generated asset against its manifest — including re-encoded copies that soft binding can often match back to candidate manifests.
04
Report
The compliance endpoint produces Article 50 disclosure reports from the stored corpus, with each finding cited by manifest ID, ready for a regulator or a partner.

How Hashproof is different

Disclosure infrastructure that does not ask you to expose the model.

The standard regulators are converging on

C2PA is referenced across emerging AI-content rules and backed by a membership, hosted under the Linux Foundation, that spans model providers, platforms, and hardware makers. Building on it means your disclosures verify in the same tools regulators and platforms use.

Provenance without exposing the model

Manifests record that content is synthetic, and dataset registration lets you check membership against the corpus you declared, without revealing weights, prompts, or proprietary pipeline detail. Selective disclosure lets verification reveal only the fields you choose and replace the rest with their hashes. A zero-knowledge proof that confirms valid provenance while revealing nothing else is in development, available today as a labeled beta preview.
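
One common way to get selective disclosure is to replace each withheld field with a salted hash that can be checked later if the value is revealed. The sketch below shows that general technique under stated assumptions: the record fields, the per-field salts, and the `redact` and `field_digest` helpers are illustrative, not Hashproof's actual scheme.

```python
import hashlib
import json
from typing import Any, Dict, Set


def field_digest(name: str, value: Any, salt: str) -> str:
    """SHA-256 over a canonical encoding of (salt, field name, value)."""
    payload = json.dumps([salt, name, value], sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def redact(
    record: Dict[str, Any], salts: Dict[str, str], disclose: Set[str]
) -> Dict[str, Dict[str, Any]]:
    """Reveal only the chosen fields; replace the rest with their digests."""
    view: Dict[str, Dict[str, Any]] = {}
    for name, value in record.items():
        if name in disclose:
            view[name] = {"value": value}
        else:
            view[name] = {"sha256": field_digest(name, value, salts[name])}
    return view


# Hypothetical provenance record and per-field salts held by the lab.
record = {"synthetic": True, "model": "internal-model-v3", "prompt": "a red bicycle"}
salts = {"synthetic": "s1", "model": "s2", "prompt": "s3"}

view = redact(record, salts, disclose={"synthetic"})

# If the lab later reveals the prompt and its salt, anyone can recheck the digest.
recheck = field_digest("prompt", "a red bicycle", "s3") == view["prompt"]["sha256"]
```

The salt is there so that short or guessable values cannot be recovered by hashing likely candidates.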

A substrate, not a content moderator

Hashproof labels and verifies. It does not judge outputs or gate generation. The provenance layer states facts about an asset; policy decisions stay yours.

Compliance fit

One provenance layer across disclosure, transparency, and interoperability requirements.

EU AI Act Article 50: Article 50 requires machine-readable marking of synthetic content and provider transparency. Hashproof produces disclosure reports from your stored manifests. Article 50 obligations apply from 2 August 2026.
Training-data transparency: Dataset registration and membership queries let you check whether an asset matches the training data you registered — a similarity match against your declared corpus, not a record of the actual training run. This supports transparency obligations and rights-holder inquiries.
Provenance interoperability: Manifests follow the C2PA 2.x specification and can federate with peer registrars, so a disclosure made once is checkable everywhere the standard is read.

For the reporting surface, see the compliance reporting page.

Build your Article 50 disclosure record.

Start on the Free tier and sign the output of a generation pipeline end to end. Higher volumes live on Scale and Enterprise; a standard DPA is published for all customers.