1. What this proves and doesn't
Layer 1 makes three claims. The benchmark runs occurred on the hardware described. The per-seed metrics in the whitepaper match what that hardware actually produced. No byte of the published bundle has been altered since publication. That is the full scope.
Layer 1 does not prove that the technique works on data outside this benchmark. It does not prove the kernel itself is correct. Higher layers answer those questions.
| | Layer 1 (this) | Layer 2 audit | Layer 3+ pilot, license |
|---|---|---|---|
| Runs occurred on attested hardware | ✓ | ✓ | ✓ |
| Metrics match the published whitepaper | ✓ | ✓ | ✓ |
| Verification record has not been altered | ✓ | ✓ | ✓ |
| Multi-seed consistency | ✓ | ✓ | ✓ |
| Kernel works on novel data | — | ✓ | ✓ |
| Generalization beyond UCF-101 | — | ✓ | ✓ |
| Kernel internal logic verified | — | — | Layer 4 only |
Honest framing. Layer 1 proves the runs happened and the numbers are real. It does not prove the technique generalizes. That is Layer 2's job.
2. The bundle format
The published artifact is two files: `receipts.tar.gz` holds the evidence, and `receipts.sigstore.json` is the signature envelope. An optional third file, `timestamp.ots`, carries the Bitcoin anchor.
Inside the tarball, `manifest.json` indexes every file with its SHA-256 hash. Alongside it sit six per-result receipt JSON files and per-seed metric files for both the SGM-Sparse run and the dense baseline. NVIDIA NRAS attestation reports tie each run to specific H100 hardware. Environment specs record the container digest, CUDA version, driver, and command line. SHA-256 hashes of the initial weight tensors lock in the starting state.
The full layout, including the exact JSON schema for each receipt and for the manifest, is in `docs/receipts_format.md` in the validation repo. Every numerical metric in the whitepaper has a corresponding receipt. The mapping lives in `claims/claims_to_receipts.json`.
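Because the mapping is machine-readable, the coverage claim can be checked mechanically. A minimal sketch, assuming hypothetical shapes for both files (the real schemas are in `docs/receipts_format.md` and may differ): `claims/claims_to_receipts.json` as a map from claim ID to receipt path, and `manifest.json` as a flat map from path to SHA-256. The claim IDs below are invented for illustration.

```python
def find_unbacked_claims(mapping: dict[str, str], manifest: dict[str, str]) -> list[str]:
    """Return claim IDs whose receipt file is not indexed in the manifest.

    Hypothetical shapes:
      mapping:  {"compression_ratio": "claims/compression_ratio.receipt.json", ...}
      manifest: {"claims/compression_ratio.receipt.json": "<sha256 hex>", ...}
    """
    return sorted(
        claim_id
        for claim_id, receipt_path in mapping.items()
        if receipt_path not in manifest
    )


if __name__ == "__main__":
    # Invented claim IDs and a placeholder hash, for illustration only.
    mapping = {
        "compression_ratio": "claims/compression_ratio.receipt.json",
        "accuracy_lift": "claims/accuracy_lift.receipt.json",
    }
    manifest = {"claims/compression_ratio.receipt.json": "00" * 32}
    print(find_unbacked_claims(mapping, manifest))  # ['accuracy_lift']
```

An empty result means every mapped claim points at a file the manifest, and therefore the signature, covers.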
3. The signing chain
The bundle is bound by three independent trust roots. Each is verifiable with off-the-shelf tools. None of them are built by COSIMO. Defeating any one is hard. Defeating all three at once is the threat model Layer 1 cannot defend against. That is what an in-person Layer 2 audit is for.
3a. Fulcio certificate
Sigstore Fulcio issues short-lived X.509 certificates bound to OIDC identities. Long-lived signing keys can be stolen and used silently for years. Fulcio avoids that. The publisher proves their identity to a trusted OIDC issuer at signing time, receives a certificate valid for ten minutes, and signs the artifact. The certificate itself is logged in Rekor.
For COSIMO's bundles, the OIDC issuer is Google Workspace and the identity is `validation@cosimo.ai`. To verify, you check that the certificate chains to Fulcio's root, that the OIDC issuer is `https://accounts.google.com`, and that the identity ends in `@cosimo.ai`. The verification recipe does all three in one `cosign verify-blob` call.
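Once the certificate chain to Fulcio's root has been verified, the rest is a small policy over fields read from the certificate. The sketch below illustrates that policy only; it is not cosign's implementation. `CertInfo` is a hypothetical container (no X.509 parsing or chain validation happens here), and `signed_at` is assumed to come from the Rekor entry's timestamp, since a ten-minute certificate only means something if the signature was logged while it was valid.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

EXPECTED_ISSUER = "https://accounts.google.com"
# Same intent as the recipe's '.*@cosimo\.ai$'; \Z is used because Python's $
# also matches just before a trailing newline.
IDENTITY_PATTERN = re.compile(r".*@cosimo\.ai\Z")


@dataclass
class CertInfo:
    """Hypothetical fields pulled from a Fulcio certificate whose chain is already verified."""
    oidc_issuer: str
    identity: str
    not_before: datetime
    not_after: datetime


def check_identity(cert: CertInfo, signed_at: datetime) -> list[str]:
    """Return every policy failure; an empty list means the certificate passes."""
    failures = []
    if cert.oidc_issuer != EXPECTED_ISSUER:
        failures.append(f"unexpected OIDC issuer {cert.oidc_issuer!r}")
    if not IDENTITY_PATTERN.match(cert.identity):
        failures.append(f"identity {cert.identity!r} does not end in @cosimo.ai")
    if not (cert.not_before <= signed_at <= cert.not_after):
        failures.append("signing time is outside the certificate's validity window")
    return failures


if __name__ == "__main__":
    t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)  # arbitrary example time
    cert = CertInfo(EXPECTED_ISSUER, "validation@cosimo.ai", t0, t0 + timedelta(minutes=10))
    print(check_identity(cert, t0 + timedelta(minutes=3)))   # []
    print(check_identity(cert, t0 + timedelta(minutes=15)))  # a single validity-window failure
```

Collecting all failures instead of stopping at the first one keeps the report specific about which check broke.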
3b. Rekor transparency log
Rekor is Sigstore's public, append-only Merkle-tree log of signatures, including those made with Fulcio certificates. Each entry comes with an inclusion proof: a path of hashes from the entry up to the log's signed root. The root is published periodically and witnessed externally.
This prevents silent tampering. If an attacker briefly compromised a signing identity and forged a bundle, the forged signature would still have to appear in Rekor, because verification requires an inclusion proof. The entry would be tied to a specific point in time and could not be altered or removed afterward, so the legitimate publisher, or any auditor, can detect the unauthorized signature later. Reviewers can also detect divergence by spotting two conflicting signatures of artifacts with the same name.
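Rekor's log uses RFC 6962-style Merkle hashing (leaves prefixed with `0x00`, interior nodes with `0x01`), so checking an inclusion proof is the standard audit-path walk from RFC 9162, section 2.1.3.2. The sketch below implements that walk over raw leaf bytes. It is not Rekor client code: fetching the entry, the proof, and the signed tree head is out of scope, and `merkle_root` and `audit_path` are included only to build proofs for testing.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    # RFC 6962 domain separation: leaves are hashed with a 0x00 prefix.
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # Interior nodes are hashed with a 0x01 prefix.
    return hashlib.sha256(b"\x01" + left + right).digest()


def verify_inclusion(leaf: bytes, index: int, tree_size: int,
                     path: list[bytes], root: bytes) -> bool:
    """Return True if `path` proves `leaf` sits at `index` in a tree with `root`."""
    if index >= tree_size:
        return False
    fn, sn = index, tree_size - 1
    r = leaf_hash(leaf)
    for p in path:
        if sn == 0:
            return False  # proof is longer than the tree is tall
        if (fn & 1) == 1 or fn == sn:
            r = node_hash(p, r)  # sibling is on the left
            # Skip the levels where this node was promoted without a sibling.
            while (fn & 1) == 0 and fn != 0:
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)  # sibling is on the right
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root


def _split(n: int) -> int:
    # Largest power of two strictly less than n (n >= 2).
    return 1 << ((n - 1).bit_length() - 1)


def merkle_root(leaves: list[bytes]) -> bytes:
    """RFC 6962 Merkle Tree Hash over a non-empty list of leaves."""
    if len(leaves) == 1:
        return leaf_hash(leaves[0])
    k = _split(len(leaves))
    return node_hash(merkle_root(leaves[:k]), merkle_root(leaves[k:]))


def audit_path(index: int, leaves: list[bytes]) -> list[bytes]:
    """RFC 6962 audit path: sibling hashes ordered from the leaf up to the root."""
    if len(leaves) <= 1:
        return []
    k = _split(len(leaves))
    if index < k:
        return audit_path(index, leaves[:k]) + [merkle_root(leaves[k:])]
    return audit_path(index - k, leaves[k:]) + [merkle_root(leaves[:k])]


if __name__ == "__main__":
    leaves = [f"entry-{i}".encode() for i in range(7)]
    root = merkle_root(leaves)
    print(all(verify_inclusion(leaves[i], i, 7, audit_path(i, leaves), root) for i in range(7)))  # True
    print(verify_inclusion(b"forged", 3, 7, audit_path(3, leaves), root))  # False
```

The proof only shows consistency with whichever root you trust; in practice that root comes from Rekor's signed tree head, whose signature must be checked as well.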
3c. Manifest hash chain
Inside the bundle, `manifest.json` lists every file with its SHA-256 hash. The browser verifier walks this list, fetches each file individually, recomputes its hash with `SubtleCrypto`, and compares. Any discrepancy fails the check immediately: a change to even a single byte shows up as a mismatch, with the expected and actual hashes printed side by side.
This is the workhorse layer. The signature on the outside binds the bundle as a whole. The manifest binds every file inside the bundle. Together they form an unbroken chain from a trusted identity to every byte of every metric file.
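The browser verifier runs the manifest walk in JavaScript with `SubtleCrypto`. The same walk over a locally extracted bundle looks roughly like the Python sketch below, assuming (hypothetically) that `manifest.json` is a flat map from relative path to hex SHA-256; check `docs/receipts_format.md` for the real schema. It stops at the first mismatch, reports both hashes, and refuses manifest paths that would escape the bundle directory.

```python
import hashlib
import tempfile
from pathlib import Path, PurePosixPath


class ManifestMismatch(Exception):
    """Raised for the first manifest entry that fails verification."""


def sha256_file(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()


def verify_manifest(root: Path, manifest: dict[str, str]) -> int:
    """Verify every entry under `root`; return the number of files checked."""
    for rel_path, expected in sorted(manifest.items()):
        p = PurePosixPath(rel_path)
        if p.is_absolute() or ".." in p.parts:
            raise ManifestMismatch(f"{rel_path}: path escapes the bundle")
        # A missing file raises FileNotFoundError, which also fails the check.
        actual = sha256_file(root / rel_path)
        if actual != expected.lower():
            raise ManifestMismatch(f"{rel_path}\n  expected {expected}\n  actual   {actual}")
    return len(manifest)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "runs").mkdir()
        (root / "runs" / "metrics.json").write_bytes(b'{"footprint": 1.0}')
        manifest = {"runs/metrics.json": hashlib.sha256(b'{"footprint": 1.0}').hexdigest()}
        print(verify_manifest(root, manifest))  # 1
        (root / "runs" / "metrics.json").write_bytes(b'{"footprint": 1.1}')  # one byte changed
        try:
            verify_manifest(root, manifest)
        except ManifestMismatch as err:
            print(err)
```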
3d. OpenTimestamps Bitcoin anchor
OpenTimestamps anchors the bundle hash into Bitcoin's blockchain via a chain of Merkle aggregations published by OpenTimestamps calendar servers. Once the bundle hash is committed to a Bitcoin block, the anchor proves the bundle existed before that block was mined.
This is independent of Sigstore. If Sigstore's infrastructure were compromised, OpenTimestamps would still bear witness. Defeating it requires rewriting Bitcoin's chain history, which is computationally infeasible at any scale that matters.
The OpenTimestamps anchor is optional. Skipping it costs little in formal security guarantees, because Rekor already provides transparency. What it adds is a fundamentally different trust root with different failure modes: belt-and-suspenders rigor for reviewers who care.
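The Merkle aggregation behind an OpenTimestamps proof boils down to a list of operations that carry the bundle hash forward, step by step, to a value committed in a Bitcoin block. The sketch below replays such a list. It is a simplified model, not the binary `.ots` format or the `ots` client: it covers only append, prepend, and SHA-256 steps, and it stops at the final commitment instead of checking it against a real block header.

```python
import hashlib

# One step of a commitment path: ("append" | "prepend" | "sha256", argument bytes).
Op = tuple[str, bytes]


def replay(digest: bytes, ops: list[Op]) -> bytes:
    """Apply commitment operations in order and return the final message."""
    msg = digest
    for name, arg in ops:
        if name == "append":
            msg = msg + arg
        elif name == "prepend":
            msg = arg + msg
        elif name == "sha256":
            msg = hashlib.sha256(msg).digest()  # the argument is ignored
        else:
            raise ValueError(f"unsupported operation: {name}")
    return msg


if __name__ == "__main__":
    bundle_hash = hashlib.sha256(b"receipts.tar.gz contents").digest()  # stand-in bytes
    sibling = bytes(32)  # another submitter's hash in the calendar's Merkle tree
    commitment = replay(bundle_hash, [("append", sibling), ("sha256", b"")])
    print(commitment == hashlib.sha256(bundle_hash + sibling).digest())  # True
```

Because every step ends in a hash, nobody can find a different starting value that reaches the same commitment, so a commitment recorded in a block shows the bundle hash existed when that block was mined.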
4. Per-result verification
The signing chain proves the bundle is intact and from the right publisher. The per-result layer binds each headline number directly to its source data. That covers 3.12× compression, +12.4 pp accuracy lift, 27× lower inference VRAM, and the rest.
Each headline performance result has a per-result verification record: a JSON file that declares the value (`claim_value`), the documented computation method (`computation_method`, for example `median(dense.footprint / sgm.footprint)` across 5 seeds), and a `source_data_hashes` map. The map lists the exact files the result was computed from, with their SHA-256 hashes.
The verifier does two things:
- Fetches each source file referenced in `source_data_hashes` and recomputes its SHA-256. Every hash must match.
- Re-runs the documented `computation_method` on the source data. The result must match the declared `claim_value` within numerical tolerance.
Both must pass. If either fails, the result is flagged with the exact discrepancy. A reviewer can independently re-run the math and confirm. That is what makes this the credibility-load-bearing layer.
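For the compression-ratio record, both steps fit in a short script. The sketch below assumes a hypothetical layout: `source_data_hashes` maps relative paths to hex SHA-256, each per-seed `metrics.json` under `runs/sgm_sparse/` and `runs/raw_3dcnn/` carries a numeric `footprint` field, and `claim_value` is rounded to two decimals, hence the 0.005 tolerance. `write_demo_bundle` and its footprint values are made up for illustration and are not benchmark data.

```python
import hashlib
import json
import math
import statistics
import tempfile
from pathlib import Path

SEEDS = range(5)  # seed_0 .. seed_4


def verify_receipt(root: Path, receipt: dict, abs_tol: float = 0.005) -> float:
    """Check hashes, recompute median(dense.footprint / sgm.footprint), return it."""
    hashes = receipt["source_data_hashes"]

    # Step 1: every referenced source file must hash to its recorded value.
    for rel_path, expected in hashes.items():
        actual = hashlib.sha256((root / rel_path).read_bytes()).hexdigest()
        if actual != expected:
            raise ValueError(f"{rel_path}: expected {expected}, got {actual}")

    def footprint(rel_path: str) -> float:
        # Only files bound by a hash may feed the recomputation.
        if rel_path not in hashes:
            raise ValueError(f"{rel_path} is not listed in source_data_hashes")
        return json.loads((root / rel_path).read_text())["footprint"]

    # Step 2: re-run the documented computation method.
    ratios = [
        footprint(f"runs/raw_3dcnn/seed_{s}/metrics.json")
        / footprint(f"runs/sgm_sparse/seed_{s}/metrics.json")
        for s in SEEDS
    ]
    value = statistics.median(ratios)
    if not math.isclose(value, receipt["claim_value"], rel_tol=0.0, abs_tol=abs_tol):
        raise ValueError(f"recomputed {value:.4f}, receipt claims {receipt['claim_value']}")
    return value


def write_demo_bundle(root: Path, dense_footprints: list[float]) -> dict[str, str]:
    """Hypothetical helper: write made-up metrics (sgm footprint fixed at 1.0)."""
    hashes = {}
    for s, dense in zip(SEEDS, dense_footprints):
        for run, fp in (("sgm_sparse", 1.0), ("raw_3dcnn", dense)):
            rel = f"runs/{run}/seed_{s}/metrics.json"
            data = json.dumps({"footprint": fp}).encode()
            (root / rel).parent.mkdir(parents=True, exist_ok=True)
            (root / rel).write_bytes(data)
            hashes[rel] = hashlib.sha256(data).hexdigest()
    return hashes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        hashes = write_demo_bundle(root, [3.0, 3.2, 3.12, 3.1, 3.15])
        print(verify_receipt(root, {"claim_value": 3.12, "source_data_hashes": hashes}))  # 3.12
```

Refusing any input file that is not listed in `source_data_hashes` is what keeps the recomputed number tied to the signed data rather than to whatever happens to sit on disk.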
5. Verification recipe
The full step-by-step recipe with `cosign`, `jq`, and (optionally) NVIDIA's attestation CLI lives in the validation repo at `VERIFICATION_RECIPE.md`. The minimum to confirm the signature:
```shell
curl -sLO https://github.com/COSIMOAI/validation/releases/latest/download/receipts.tar.gz
curl -sLO https://github.com/COSIMOAI/validation/releases/latest/download/receipts.sigstore.json
cosign verify-blob \
  --bundle receipts.sigstore.json \
  --certificate-identity-regexp '.*@cosimo\.ai$' \
  --certificate-oidc-issuer https://accounts.google.com \
  receipts.tar.gz
```
For the per-result check, the recipe walks through extracting the bundle and using `jq` to compare receipt values against the underlying metric files. Eight commands total, all standard Unix tools, none of them built by COSIMO.
Or use the browser verifier. Same chain. Same outputs. No install needed.
6. What's in the bundle
The inner artifact (`receipts.tar.gz`) contains:
- `manifest.json`. Index of every file with its SHA-256 hash.
- `runs/sgm_sparse/seed_{0..4}/`. Per-seed metrics, attestation reports, environment specs, initial weight hashes.
- `runs/raw_3dcnn/seed_{0..4}/`. Same structure for the dense baseline.
- `inference/`. Latency distribution and memory profile from the inference run.
- `claims/`. Per-result verification records and the whitepaper-to-records mapping.
- `canonical_results.md`. The canonical results document, embedded for cross-reference.
- `whitepaper_numeric_audit.md`. Every numerical metric in the whitepaper, enumerated.
- `timestamp.ots`. The optional Bitcoin anchor.

Schema reference and field-by-field walkthrough: `docs/receipts_format.md`.
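A quick completeness check against this layout can run on the tarball without extracting it, which also sidesteps path-traversal concerns with untrusted archives. A minimal sketch, assuming members are stored relative to the archive root and that every seed directory in both runs holds `metrics.json` and `attestation.json` (the names the reviewer's tour uses for `sgm_sparse/seed_0`). `timestamp.ots` is left out because it is optional, and `make_tar` exists only to build test input.

```python
import io
import tarfile

RUNS = ("sgm_sparse", "raw_3dcnn")
SEEDS = range(5)  # seed_{0..4}


def expected_paths() -> list[str]:
    """Files this sketch expects; the per-seed file names are assumptions."""
    paths = [
        "manifest.json",
        "canonical_results.md",
        "whitepaper_numeric_audit.md",
        "claims/claims_to_receipts.json",
    ]
    for run in RUNS:
        for s in SEEDS:
            paths.append(f"runs/{run}/seed_{s}/metrics.json")
            paths.append(f"runs/{run}/seed_{s}/attestation.json")
    return paths


def missing_entries(tar_gz: bytes) -> list[str]:
    """Return expected paths absent from a gzipped tarball, read in memory."""
    with tarfile.open(fileobj=io.BytesIO(tar_gz), mode="r:gz") as tar:
        names = {m.name.removeprefix("./") for m in tar.getmembers() if m.isfile()}
    return [p for p in expected_paths() if p not in names]


def make_tar(paths: list[str]) -> bytes:
    """Build a small in-memory .tar.gz whose files all contain b'{}'."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for p in paths:
            info = tarfile.TarInfo(name=p)
            info.size = 2
            tar.addfile(info, io.BytesIO(b"{}"))
    return buf.getvalue()


if __name__ == "__main__":
    complete = expected_paths()
    print(missing_entries(make_tar(complete)))       # []
    print(missing_entries(make_tar(complete[:-1])))  # ['runs/raw_3dcnn/seed_4/attestation.json']
```

Against the real bundle, pass the bytes of `receipts.tar.gz` to `missing_entries`; an empty list means the layout matches these assumptions.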
7. Reviewer's tour
For the reviewer who wants the strongest signal in the shortest time, here is a fifteen-minute path through the bundle.
- Confirm the signature. Run `cosign verify-blob` (above). Expect "Verified OK". This binds the artifact to an `@cosimo.ai` identity; to pin `validation@cosimo.ai` exactly, pass `--certificate-identity validation@cosimo.ai` instead of the regexp.
- Spot-check the manifest. Pick one file, for example `runs/sgm_sparse/seed_0/metrics.json`. Compute its SHA-256 with `shasum -a 256` and compare it to the entry in `manifest.json`. They should match.
- Re-run one result's math. Open `claims/compression_ratio.receipt.json`, read the `computation_method`, pull the source files, and compute the ratio yourself. It should match `claim_value: 3.12` within rounding.
- Inspect one attestation. Run `nv-attestation-cli verify --report runs/sgm_sparse/seed_0/attestation.json`. It should report the H100 GPU UUID, container image digest, and driver version, all signed by NVIDIA's NRAS.
- Check the Bitcoin anchor (optional). Run `ots verify timestamp.ots`. It should return a Bitcoin block height and timestamp.
At each stop, the verification either passes cleanly or fails with a specific discrepancy. There is no middle ground. That is the design.
8. What the next layer offers
Layer 1 ends where independent generalization begins. Each higher layer closes a different gap.
- Layer 2. Independent audit. An independent auditor runs the sealed kernel under NDA on customer-provided data and produces a redacted report. Closes "does it work on novel data?"
- Layer 3. Black-box pilot. Deployment in your environment for broader testing. Closes "does it hold up in my pipeline?"
- Layer 4. Kernel license. Full kernel integration under commercial terms. Closes "we want to use this in production."
Tier-up paths are linked from the verification page's footer. For specific questions, see the FAQ.