Methodology · COSIMO Layer 1

1. What this proves and doesn't

Layer 1 makes three claims. The benchmark runs occurred on the hardware described. The per-seed metrics in the whitepaper match what that hardware actually produced. No byte of the published bundle has been altered since publication. That is the full scope.

Layer 1 does not prove that the technique works on data outside this benchmark. It does not prove the kernel itself is correct. Higher layers answer those questions.

Guarantee	Layer 1 (this page)	Layer 2 (audit)	Layer 3+ (pilot, license)
Runs occurred on attested hardware	✓	✓	✓
Metrics match the published whitepaper	✓	✓	✓
Verification record has not been altered	✓	✓	✓
Multi-seed consistency	✓	✓	✓
Kernel works on novel data	—	✓	✓
Generalization beyond UCF-101	—	✓	✓
Kernel internal logic verified	—	—	Layer 4 only

Honest framing. Layer 1 proves the runs happened and the numbers are real. It does not prove the technique generalizes. That is Layer 2's job.

2. The bundle format

The published artifact is two files. receipts.tar.gz holds the evidence. receipts.sigstore.json is the signature envelope. An optional timestamp.ots carries the Bitcoin anchor.

Inside the tarball:

manifest.json, which indexes every file with its SHA-256 hash.
Six per-result receipt JSON files.
Per-seed metric files for both the SGM-Sparse run and the dense baseline.
NVIDIA NRAS attestation reports that tie each run to specific H100 hardware.
Environment specs recording the container digest, CUDA version, driver, and command line.
SHA-256 hashes of the initial weight tensors, which lock in the starting state.

The full layout is in docs/receipts_format.md in the validation repo, including the exact JSON schema for each receipt and the manifest. Every numerical metric in the whitepaper has a corresponding receipt. The mapping lives in claims/claims_to_receipts.json.

Proves the published artifacts are reachable, parseable, and structurally complete. Every metric in the whitepaper has a verification record to back it.
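To make "structurally complete" concrete, here is a minimal Python sketch of such a check. It is not COSIMO's verifier. It assumes, hypothetically, that manifest.json has the shape {"files": {path: sha256_hex}} and that claims/claims_to_receipts.json maps each claim ID to a receipt path; the real schemas are in docs/receipts_format.md. The expected seed directories follow the bundle layout described on this page (five seeds for each of the two runs).

```python
import json
import tarfile

# Seed directories implied by the bundle layout: five seeds for each of the two runs.
EXPECTED_RUN_DIRS = [
    f"runs/{run}/seed_{seed}/"
    for run in ("sgm_sparse", "raw_3dcnn")
    for seed in range(5)
]


def load_members(tar_path: str) -> dict[str, bytes]:
    """Read every regular file in the tarball into memory, keyed by its path."""
    members = {}
    with tarfile.open(tar_path, "r:gz") as tar:
        for info in tar.getmembers():
            if info.isfile():
                name = info.name.removeprefix("./")
                members[name] = tar.extractfile(info).read()
    return members


def check_structure(members: dict[str, bytes]) -> list[str]:
    """Return human-readable problems; an empty list means structurally complete."""
    required = ("manifest.json", "claims/claims_to_receipts.json")
    missing = [path for path in required if path not in members]
    if missing:
        return [f"missing {path}" for path in missing]
    try:
        manifest = json.loads(members["manifest.json"])
        claims = json.loads(members["claims/claims_to_receipts.json"])
    except json.JSONDecodeError as exc:
        return [f"unparseable JSON: {exc}"]

    listed = set(manifest["files"])  # assumed shape: {"files": {path: sha256_hex}}
    problems = [f"listed in manifest but absent: {path}"
                for path in sorted(listed - set(members))]
    for claim_id, receipt_path in claims.items():  # assumed shape: {claim_id: path}
        if receipt_path not in listed:
            problems.append(f"claim {claim_id!r} has no receipt in the manifest: {receipt_path}")
    for run_dir in EXPECTED_RUN_DIRS:
        if not any(path.startswith(run_dir) for path in listed):
            problems.append(f"no files listed under {run_dir}")
    return problems
```

Hash checking is deliberately left out of this sketch; that is the job of the manifest hash chain.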

3. The signing chain

The bundle is bound by three independent trust roots. Each is verifiable with off-the-shelf tools. None of them are built by COSIMO. Defeating any one is hard. Defeating all three at once is the threat model Layer 1 cannot defend against. That is what an in-person Layer 2 audit is for.

3a. Fulcio certificate

Sigstore Fulcio issues short-lived X.509 certificates bound to OIDC identities. Long-lived signing keys can be stolen and used silently for years. Fulcio avoids that. The publisher proves their identity to a trusted OIDC issuer at signing time, receives a certificate valid for ten minutes, and signs the artifact. The certificate itself is logged in Rekor.

For COSIMO's bundles, the OIDC issuer is Google Workspace. The identity is validation@cosimo.ai. To verify, you check that the certificate is signed by Fulcio's root, that the OIDC issuer is https://accounts.google.com, and that the identity ends in @cosimo.ai. The verification recipe does this in one cosign verify-blob call.

Proves COSIMO's verified publishing identity signed this exact artifact at a specific moment, and the certificate chain back to Fulcio is intact.

Doesn't prove that the publisher is honest about what is inside the bundle, only that they signed it. The other steps cover that.
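The identity policy that the cosign flags express can be sketched in Python. This illustrates the checks described above; it is not how cosign is implemented. It assumes the certificate has already been chain-validated against Fulcio's root and its fields extracted, and that the signing time comes from the Rekor entry, which, to our understanding, is how sigstore verifiers judge a short-lived certificate that has since expired. The function name is hypothetical.

```python
import re
from datetime import datetime

# Mirrors --certificate-oidc-issuer and --certificate-identity-regexp from the recipe.
EXPECTED_ISSUER = "https://accounts.google.com"
# Used with fullmatch, so "validation@cosimo.ai.example.com" is rejected.
IDENTITY_PATTERN = re.compile(r".*@cosimo\.ai")


def certificate_policy_ok(
    san_email: str,
    oidc_issuer: str,
    not_before: datetime,
    not_after: datetime,
    signed_at: datetime,
) -> bool:
    """Return True if an already chain-validated Fulcio certificate satisfies the policy."""
    if oidc_issuer != EXPECTED_ISSUER:
        return False
    if IDENTITY_PATTERN.fullmatch(san_email) is None:
        return False
    # Fulcio certificates are valid for about ten minutes; the signature must fall inside that window.
    return not_before <= signed_at <= not_after
```

For a stricter policy pinned to the single publishing identity, cosign also accepts --certificate-identity validation@cosimo.ai in place of the regular expression.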

3b. Rekor transparency log

Rekor is a public, append-only Merkle-tree log that records the signatures made with Fulcio certificates. Each entry comes with an inclusion proof: a path of hashes from the entry up to the log's signed root. The root is published periodically and witnessed externally.

This prevents silent tampering. If an attacker briefly compromised the signing identity and forged a bundle, the forged signature would have to appear in Rekor for standard verification to accept it. That entry is tied to a specific point in time and cannot be quietly removed, so the legitimate publisher, or any auditor, can detect the unauthorized signature later. Reviewers can also detect divergence directly by spotting two conflicting signatures over artifacts with the same name.

Proves the signature is publicly logged at a specific moment and cannot be retroactively forged or removed without breaking the log's Merkle consistency.
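The inclusion-proof check can be sketched in Python. It follows the inclusion-proof verification algorithm of RFC 9162 (Certificate Transparency v2), which uses the RFC 6962 leaf and node hashing that, to our understanding, Rekor's Merkle tree also uses. The small tree builder exists only so the example can generate proofs to check; a real client would take the proof, leaf index, tree size, and signed root from the Rekor entry instead.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    # RFC 6962: leaves are hashed with a 0x00 prefix...
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # ...and interior nodes with 0x01, so a leaf can never pose as a node.
    return hashlib.sha256(b"\x01" + left + right).digest()


def _split(n: int) -> int:
    """Largest power of two strictly less than n (n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def merkle_root(leaves: list[bytes]) -> bytes:
    """RFC 6962 Merkle tree hash of a non-empty list of leaves."""
    if len(leaves) == 1:
        return leaf_hash(leaves[0])
    k = _split(len(leaves))
    return node_hash(merkle_root(leaves[:k]), merkle_root(leaves[k:]))


def audit_path(index: int, leaves: list[bytes]) -> list[bytes]:
    """Sibling hashes from leaf `index` up to the root (what the log hands out)."""
    if len(leaves) == 1:
        return []
    k = _split(len(leaves))
    if index < k:
        return audit_path(index, leaves[:k]) + [merkle_root(leaves[k:])]
    return audit_path(index - k, leaves[k:]) + [merkle_root(leaves[:k])]


def verify_inclusion(leaf: bytes, index: int, tree_size: int,
                     proof: list[bytes], root: bytes) -> bool:
    """Recompute the root from the leaf and its audit path, per RFC 9162."""
    if index >= tree_size:
        return False
    fn, sn = index, tree_size - 1
    r = leaf_hash(leaf)
    for p in proof:
        if sn == 0:
            return False  # proof is longer than the tree is tall
        if fn & 1 or fn == sn:
            r = node_hash(p, r)
            # Skip levels where this node is the last one and has no right sibling.
            while fn and not fn & 1:
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root
```

The proof is logarithmic in the log's size, which is why a reviewer can check inclusion without downloading the log.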

3c. Manifest hash chain

Inside the bundle, manifest.json lists every file with its SHA-256 hash. The browser verifier walks this list, fetches each file individually, recomputes its hash with the Web Crypto SubtleCrypto API, and compares. Any discrepancy fails the check immediately: even a single changed byte shows up as a mismatch, with the expected and actual hashes printed side by side.

This is the workhorse layer. The signature on the outside binds the bundle as a whole. The manifest binds every file inside the bundle. Together they form an unbroken chain from a trusted identity to every byte of every metric file.

Proves no file in the bundle has been altered since publication. The bundle is byte-identical to what was signed.

Doesn't prove that the signed bundle was honest at the moment of signing. For that, you also need the per-result verification in section 4.
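A command-line analogue of the browser verifier's manifest walk can be sketched in Python over an extracted bundle. The manifest shape {"files": {path: sha256_hex}} is a hypothetical assumption; docs/receipts_format.md has the real schema. Unlike the browser verifier, which stops at the first discrepancy, this sketch collects every mismatch so a reviewer sees them all at once.

```python
import hashlib
import json
from pathlib import Path, PurePosixPath


def sha256_file(path: Path) -> str:
    """Stream a file through SHA-256 so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(bundle_dir: str) -> list[tuple[str, str, str]]:
    """Return (path, expected, actual) for every file whose hash does not match."""
    root = Path(bundle_dir)
    manifest = json.loads((root / "manifest.json").read_text())
    mismatches = []
    for rel_path, expected in sorted(manifest["files"].items()):
        entry = PurePosixPath(rel_path)
        # Refuse manifest entries that could point outside the bundle directory.
        if entry.is_absolute() or ".." in entry.parts:
            mismatches.append((rel_path, expected, "<unsafe path>"))
            continue
        target = root / rel_path
        actual = sha256_file(target) if target.is_file() else "<missing>"
        if actual != expected.lower():
            mismatches.append((rel_path, expected, actual))
    return mismatches
```

An empty result means every listed file is byte-identical to what the manifest records; each tuple otherwise carries the expected and actual hashes side by side.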

3d. OpenTimestamps Bitcoin anchor

OpenTimestamps anchors the bundle hash into Bitcoin's blockchain via a chain of Merkle aggregations published by OpenTimestamps calendar servers. Once the bundle hash is committed to a Bitcoin block, the anchor proves the bundle existed before that block was mined.

This is independent of sigstore. If sigstore's infrastructure were compromised, OpenTimestamps would still bear witness. Defeating it requires rewriting Bitcoin's chain history. That is computationally infeasible at any scale that matters.

The OpenTimestamps anchor is optional. Skipping it costs little in formal security, because Rekor already provides transparency. What the anchor adds is a fundamentally different trust root with different failure modes: belt-and-suspenders rigor for reviewers who care.

Proves the bundle existed before a specific Bitcoin block, independent of sigstore's trust model.
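At its core, an OpenTimestamps proof is a list of commitment operations (append, prepend, hash) that carries the bundle's digest step by step to a value committed in a Bitcoin block. The Python sketch below replays such a list. It is a simplification under stated assumptions: real .ots files use a binary encoding and can branch across several calendars, and ots verify fetches the block data itself. Here the operation list and the committed value are hypothetical inputs, and the function names are illustrative.

```python
import hashlib


def replay_commitments(digest: bytes, ops: list[tuple[str, bytes]]) -> bytes:
    """Apply each commitment operation to the running message, in order."""
    msg = digest
    for name, arg in ops:
        if name == "append":
            msg = msg + arg
        elif name == "prepend":
            msg = arg + msg
        elif name == "sha256":
            msg = hashlib.sha256(msg).digest()
        else:
            raise ValueError(f"unsupported operation: {name}")
    return msg


def anchored(bundle_digest: bytes, ops: list[tuple[str, bytes]], committed_value: bytes) -> bool:
    """True if replaying the proof from the bundle digest reaches the committed value."""
    return replay_commitments(bundle_digest, ops) == committed_value
```

Because every step is a hash or a concatenation, changing one byte of the bundle changes its digest, and the replay no longer reaches the value recorded in the block.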

4. Per-result verification

The signing chain proves the bundle is intact and from the right publisher. The per-result layer binds each headline number directly to its source data. That covers 3.12× compression, +12.4 pp accuracy lift, 27× lower inference VRAM, and the rest.

Each headline performance result has a per-result verification record, stored as a JSON file. The record declares the value (claim_value), the documented computation method (computation_method, for example median(dense.footprint / sgm.footprint) across 5 seeds), and a source_data_hashes map listing the exact files the result was computed from, with their SHA-256 hashes.

The verifier does two things:

Fetches each source file referenced in source_data_hashes and recomputes its SHA-256. Every hash must match.
Re-runs the documented computation_method on the source data. The result must match the declared claim_value within numerical tolerance.

Both must pass. If either fails, the result is flagged with the exact discrepancy. A reviewer can independently re-run the math and confirm. That is what makes this the credibility-load-bearing layer.

Proves every headline number was computed from the source data the verification record points to, using the documented method, with no synthesis after the fact.
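The two checks can be sketched in Python for the compression-ratio example. Several details are hypothetical: per-seed metrics are assumed to live at runs/<run>/seed_<n>/metrics.json with a "footprint" field, the documented method string is mapped to a hand-written reference function through an illustrative METHODS registry (the published recipe uses jq instead), and the 0.005 absolute tolerance is an assumption matching two-decimal rounding. Hashes are checked first, and the math is only re-run on inputs that pass.

```python
import hashlib
import json
import math
import statistics


def median_footprint_ratio(sources: dict[str, dict]) -> float:
    """median(dense.footprint / sgm.footprint) across 5 seeds (hypothetical field names)."""
    ratios = []
    for seed in range(5):
        dense = sources[f"runs/raw_3dcnn/seed_{seed}/metrics.json"]["footprint"]
        sgm = sources[f"runs/sgm_sparse/seed_{seed}/metrics.json"]["footprint"]
        ratios.append(dense / sgm)
    return statistics.median(ratios)


# Hypothetical registry: documented method string -> reference implementation.
METHODS = {"median(dense.footprint / sgm.footprint)": median_footprint_ratio}


def verify_receipt(receipt: dict, files: dict[str, bytes], abs_tol: float = 0.005) -> list[str]:
    """Return discrepancies; an empty list means both checks passed."""
    # Check 1: every source file must hash to the value recorded in the receipt.
    problems = []
    for path, expected in receipt["source_data_hashes"].items():
        data = files.get(path)
        actual = hashlib.sha256(data).hexdigest() if data is not None else "<missing>"
        if actual != expected:
            problems.append(f"{path}: expected {expected}, got {actual}")
    if problems:
        return problems  # never recompute from inputs that failed the hash check

    # Check 2: re-run the documented method and compare with the declared value.
    method = METHODS.get(receipt["computation_method"])
    if method is None:
        return [f"no reference implementation for {receipt['computation_method']!r}"]
    sources = {path: json.loads(files[path]) for path in receipt["source_data_hashes"]}
    computed = method(sources)
    if not math.isclose(computed, receipt["claim_value"], rel_tol=0.0, abs_tol=abs_tol):
        problems.append(f"claim_value {receipt['claim_value']} != recomputed {computed:.4f}")
    return problems
```

Refusing to recompute after a hash failure keeps the two checks ordered: the math is only meaningful on inputs that are provably the published ones.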

5. Verification recipe

The full step-by-step recipe with cosign, jq, and (optionally) NVIDIA's attestation CLI lives in the validation repo at VERIFICATION_RECIPE.md. The minimum to confirm the signature:

```shell
# Fetch the bundle and its signature envelope.
curl -sLO https://github.com/COSIMOAI/validation/releases/latest/download/receipts.tar.gz
curl -sLO https://github.com/COSIMOAI/validation/releases/latest/download/receipts.sigstore.json

# Check the signature, the signing identity, and the OIDC issuer in one call.
cosign verify-blob \
  --bundle receipts.sigstore.json \
  --certificate-identity-regexp '.*@cosimo\.ai$' \
  --certificate-oidc-issuer https://accounts.google.com \
  receipts.tar.gz
```

For the per-result check, the recipe walks through extracting the bundle and using jq to compare receipt values against the underlying metric files. Eight commands total, all off-the-shelf tools, none of them built by COSIMO.

Or use the browser verifier. Same chain. Same outputs. No install needed.

6. What's in the bundle

The inner artifact (receipts.tar.gz) contains:

manifest.json. Index of every file with its SHA-256 hash.
runs/sgm_sparse/seed_{0..4}/. Per-seed metrics, attestation reports, environment specs, initial weight hashes.
runs/raw_3dcnn/seed_{0..4}/. Same structure for the dense baseline.
inference/. Latency distribution and memory profile from the inference run.
claims/. Per-result verification records and the whitepaper-to-records mapping.
canonical_results.md. The canonical results document, embedded for cross-reference.
whitepaper_numeric_audit.md. Every numerical metric in the whitepaper, enumerated.
timestamp.ots. The optional Bitcoin anchor.

Schema reference and field-by-field walkthrough: docs/receipts_format.md.

7. Reviewer's tour

A fifteen-minute path through the bundle, for the reviewer who wants the strongest signal in the shortest time.

Confirm the signature. Run cosign verify-blob (above). Expect "Verified OK". This binds the artifact to validation@cosimo.ai.
Spot-check the manifest. Pick one file, for example runs/sgm_sparse/seed_0/metrics.json. Compute its SHA-256 with shasum -a 256 and compare to the entry in manifest.json. Should match.
Re-run one result's math. Open claims/compression_ratio.receipt.json. Read the computation_method. Pull the source files. Compute the ratio yourself. Should match claim_value: 3.12 within rounding.
Inspect one attestation. Run nv-attestation-cli verify --report runs/sgm_sparse/seed_0/attestation.json. Should report the H100 GPU UUID, container image digest, and driver version, all signed by NVIDIA's NRAS.
Check the Bitcoin anchor (optional). ots verify timestamp.ots. Should return a Bitcoin block height and timestamp.

At each stop, the verification either passes cleanly or fails with a specific discrepancy. There is no middle ground. That is the design.

8. What the next layer offers

Layer 1 ends where independent generalization begins. Each higher layer closes a different gap.

Layer 2. Independent audit. An independent auditor runs the sealed kernel under NDA on customer-provided data and produces a redacted report. Closes "does it work on novel data?"
Layer 3. Black-box pilot. Deployment in your environment for broader testing. Closes "does it hold up in my pipeline?"
Layer 4. Kernel license. Full kernel integration under commercial terms. Closes "we want to use this in production."

Tier-up paths are linked from the verification page's footer. For specific questions, see the FAQ.
