Soft bindings: finding provenance after the bytes change
C2PA supports two ways to connect an asset to its manifest. A hard binding is a cryptographic hash of the asset's exact bytes, recorded in a manifest that may itself be embedded in the file. It is exact and unforgiving: change one byte and the binding no longer matches. A soft binding is a perceptual fingerprint that survives transformations a human would consider visually equivalent, such as re-compression or resizing.
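To make the "one byte" point concrete, here is a minimal sketch of a hard-binding check. It assumes SHA-256 as the hash and uses a made-up byte string in place of a real asset; the function name is hypothetical, not part of any C2PA library.

```python
import hashlib


def hard_binding_digest(asset_bytes: bytes) -> str:
    """Hex digest over the exact bytes. SHA-256 is an assumption of this sketch."""
    return hashlib.sha256(asset_bytes).hexdigest()


original = bytes(range(256)) * 4      # stand-in for an asset's bytes
tampered = bytearray(original)
tampered[100] ^= 0x01                 # flip a single bit in one byte

recorded = hard_binding_digest(original)

# Identical bytes reproduce the recorded digest.
print(recorded == hard_binding_digest(original))
# A single changed byte yields a different digest, so the binding fails.
print(recorded == hard_binding_digest(bytes(tampered)))
```

A re-encode changes far more than one byte, which is why a hard binding alone cannot follow an asset through a platform's processing.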
Hard bindings are the right default. But assets in the wild get re-encoded by every platform they pass through, cropped for thumbnails, and stripped of metadata on upload. By the time a file reaches a verifier, its embedded manifest is often gone and its bytes no longer hash to anything on record. Soft bindings are how you recover provenance anyway.
How the resolver works
Hashproof computes a 64-bit perceptual hash using a two-dimensional discrete cosine transform. The DCT concentrates the structural information of an image into low-frequency coefficients, which are stable under re-compression and resizing. The hash captures those coefficients and discards the high-frequency detail that re-encoding tends to alter.
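The general technique can be sketched as follows. This is an illustrative DCT-based hash in pure Python, not Hashproof's implementation: the 32x32 grayscale input, the 8x8 low-frequency block, the median cutoff, and every function name are assumptions of the sketch.

```python
import math
from statistics import median


def dct_1d(values):
    """Unnormalized DCT-II of a sequence of numbers."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def dct_2d(pixels):
    """2D DCT: transform every row, then every column. Result is [ky][kx]."""
    rows = [dct_1d(row) for row in pixels]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def perceptual_hash(pixels, hash_side=8):
    """Pack the signs of the low-frequency coefficients into an integer.

    `pixels` is a square grayscale matrix (assumed already resized, e.g. 32x32).
    With hash_side=8 the result is a 64-bit value.
    """
    coeffs = dct_2d(pixels)
    low = [coeffs[y][x] for y in range(hash_side) for x in range(hash_side)]
    # Exclude the DC term (overall brightness) when choosing the cutoff.
    cutoff = median(low[1:])
    bits = 0
    for c in low:
        bits = (bits << 1) | (1 if c > cutoff else 0)
    return bits


# Synthetic 32x32 test image and a uniformly brightened copy of it.
image = [
    [128 + 50 * math.sin(x / 4 + y / 9) + 40 * math.cos(x * y / 100) + 0.5 * x
     for x in range(32)]
    for y in range(32)
]
brighter = [[p + 20 for p in row] for row in image]

h1 = perceptual_hash(image)
h2 = perceptual_hash(brighter)
print(f"{h1:016x}")
print(bin(h1 ^ h2).count("1"))  # number of differing bits; expected to be small
```

A uniform brightness shift only moves the DC coefficient, which this sketch leaves out of the cutoff, so the two hashes should stay close. Real re-encodings perturb more coefficients, which is what the distance threshold is for.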
At verification time, we compute the same fingerprint for the incoming file and search the stored-manifest index by Hamming distance, which counts how many bits differ between two hashes. Candidates within the threshold are returned in ranked order, each with a confidence score derived from its distance.
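The lookup step might look like the sketch below. It uses a linear scan over an in-memory dictionary, an inclusive threshold, and a linear confidence formula (1 minus distance divided by hash length). All three are assumptions for illustration, as are the names; a production index would likely use a faster search structure.

```python
from typing import Dict, List, NamedTuple


class Candidate(NamedTuple):
    manifest_id: str
    distance: int
    confidence: float


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def resolve(query_hash: int, index: Dict[str, int],
            threshold: int = 10, hash_bits: int = 64) -> List[Candidate]:
    """Return stored manifests within `threshold` bits, closest first."""
    matches = []
    for manifest_id, stored_hash in index.items():
        distance = hamming_distance(query_hash, stored_hash)
        if distance <= threshold:  # inclusive cutoff is an assumption
            # Hypothetical score: 1.0 for identical hashes, lower as bits differ.
            confidence = 1.0 - distance / hash_bits
            matches.append(Candidate(manifest_id, distance, confidence))
    matches.sort(key=lambda c: c.distance)
    return matches


# Made-up index: manifest IDs mapped to stored 64-bit fingerprints.
index = {
    "manifest-a": 0xFFFF0000FFFF0000,
    "manifest-b": 0xFFFF0000FFFF0007,
    "manifest-c": 0x0000FFFF0000FFFF,
}
results = resolve(0xFFFF0000FFFF0001, index)
for candidate in results:
    print(candidate)
```

Here `manifest-a` differs from the query in 1 bit and `manifest-b` in 2, so both are returned in that order, while `manifest-c` differs in 63 bits and is excluded.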
Choosing a threshold
The default threshold is 10 bits on a 64-bit hash, so a match may differ in up to about 16% of its bits. We tuned that against a re-encoding gauntlet modeled on common content-delivery transforms: quality reduction, format conversion, and moderate resizing. With a threshold below 10 bits, legitimate re-encodings start getting missed; well above it, unrelated images begin to collide.
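One way to build intuition for the upper end of that trade-off is a back-of-the-envelope calculation. The sketch below assumes an idealized model in which unrelated images produce independent, uniformly random hash bits, and it treats the threshold as inclusive. Real perceptual hashes are correlated, so actual collision rates will be higher; treat the numbers as a lower bound on the shape of the curve, not a measurement.

```python
from math import comb


def chance_collision_probability(threshold: int, hash_bits: int = 64) -> float:
    """Probability that two random hashes differ in at most `threshold` bits.

    Idealized model: every bit is an independent fair coin flip.
    """
    within = sum(comb(hash_bits, k) for k in range(threshold + 1))
    return within / 2 ** hash_bits


for t in (6, 10, 16, 22):
    print(t, chance_collision_probability(t))
```

Under this model the chance of an accidental match at 10 bits is on the order of 1e-8 per comparison, and it climbs steeply as the threshold widens. Multiplied across a large index, that growth is why thresholds well above the default start returning unrelated images.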
The threshold is configurable per request on JSON and GET fingerprint lookups; the multipart upload path uses the default. A pipeline processing noisy sources, like screenshots of screenshots, can widen it and accept more candidates for human review. A high-precision workflow can tighten it. The right value is a function of your tolerance for false negatives versus false positives, not a universal constant.
What soft bindings do not do
A soft binding is a match, not a proof. It tells you an incoming asset is perceptually close to one you have a manifest for. The signature on that manifest is still what establishes authenticity. Soft bindings reconnect an orphaned file to its record; the cryptography is what makes the record trustworthy.
They also do not defend against an adversary who deliberately edits content to change its meaning while staying under the threshold. Perceptual hashing answers "is this the same asset," not "has this asset been manipulated." Manipulation detection is a separate analysis, which is why forensic verification exists alongside resolution.