The Data
Embodied data, with receipts.
Every episode ships with its full provenance chain. The chain is not metadata about the product — it is half of the product.
EU AI Act — Article 10
The provenance record the AI Act now requires.
From 2 August 2026, providers of high-risk AI systems must document the origin of their training data and the original purpose of its collection. Most datasets on the market cannot provide this — their lineage was never recorded.
Every Aiscéal bundle ships it by construction: a hash-sealed, recomputable chain of custody from capture to export. Your dataset isn’t just training data — it’s your Annex IV origin-of-data record. Provenance stops being a receipt and becomes the reason you buy.
Five Pillars
What each pillar captures.
Episodes
Sense
Languages
Lab
Vision
Delivery
What a delivery looks like.
The unit of delivery is the Aiscéal Dataset Bundle. Its contract surface is manifest.json, and every episode ships its full ordered provenance chain from capture to export, sealed by a chain-head hash recorded in the manifest. Below, a redacted excerpt.
// manifest.json (excerpt, redacted)
{
  "contract_version": "1.1",
  "dataset_id": "…redacted",
  "format": "lerobot_v3",
  "annotation_schemas": ["rubric_v1"],
  "task_schemas": [{ "id": "…redacted", "name": "Manipulation outcome rubric", "version": 1 }],
  "kappa": { "threshold": 0.75, "min": "…redacted", "mean": "…redacted" },
  "regions": ["ovh-eu"],
  "payload_delivery": "pointer_only",
  "episodes": [
    {
      "id": "…redacted",
      "duration_seconds": "…redacted",
      "cohen_kappa": "…redacted",
      "sensitivity": "standard",
      "payload_key": "…redacted",
      "payload_region": "ovh-eu",
      "payload_sha256": null,
      "provenance_head_hash": "c41e…redacted",
      "rec_clearance_id": null
    }
  ]
}
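The cohen_kappa field and the 0.75 threshold in the manifest refer to Cohen's κ, a standard measure of agreement between two annotators that corrects for agreement expected by chance. The following is a minimal sketch of that statistic in Python, not the pipeline's own implementation. The label values, the sample data, and the choice to treat the threshold as inclusive are all assumptions made for illustration.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)

KAPPA_THRESHOLD = 0.75  # value shown in the manifest excerpt

# Hypothetical rubric answers for ten items; the annotators differ on one.
annotator_1 = ["success"] * 5 + ["fail"] * 5
annotator_2 = ["success"] * 5 + ["success"] + ["fail"] * 4

kappa = cohen_kappa(annotator_1, annotator_2)  # observed 0.9, chance 0.5
passes = kappa >= KAPPA_THRESHOLD              # inclusive comparison is assumed
print(f"kappa={kappa:.2f} passes={passes}")
```

With one disagreement in ten items and balanced labels, κ comes out at about 0.8, which clears the 0.75 threshold.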
// provenance/<episode-id>.json (excerpt, redacted)
[
  { "event_type": "captured", "actor_role": "operator", "content_hash": "2d7c…redacted" },
  { "event_type": "annotated", "actor_role": "annotator", "content_hash": "77b0…redacted" },
  { "event_type": "reviewed", "actor_role": "reviewer", "content_hash": null },
  { "event_type": "qa_passed", "actor_role": "reviewer", "content_hash": null },
  { "event_type": "stamped", "actor_role": "system", "content_hash": "5b18…redacted" },
  { "event_type": "packaged", "actor_role": "system", "content_hash": null },
  { "event_type": "exported", "actor_role": "system", "content_hash": null }
]
Every bundle ships with its provenance ledger extract. Recompute the hashes yourself.
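As an illustration of what recomputing a chain-head hash can look like, here is a minimal Python sketch. The hashing scheme is an assumption: this page does not specify how events are serialised or linked, so the sketch uses a common pattern in which each link is the SHA-256 of the previous link plus the canonical JSON of the event. The functions chain_head and verify_episode and the sample events are hypothetical; the bundle contract defines the real procedure.

```python
import hashlib
import json

def chain_head(events):
    """Fold an ordered list of events into one head hash.

    ASSUMED scheme, not taken from the bundle contract:
    link = SHA-256(previous_link_hex + canonical JSON of the event).
    """
    link = ""  # assumed genesis value for the first event
    for event in events:
        # Sorted keys and fixed separators give a stable byte representation.
        canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
        link = hashlib.sha256((link + canonical).encode("utf-8")).hexdigest()
    return link

def verify_episode(events, expected_head):
    """True if the recomputed head matches the one recorded in the manifest."""
    return chain_head(events) == expected_head

# Hypothetical events with placeholder hashes (not values from a real bundle).
events = [
    {"event_type": "captured", "actor_role": "operator", "content_hash": "aa" * 32},
    {"event_type": "annotated", "actor_role": "annotator", "content_hash": "bb" * 32},
    {"event_type": "exported", "actor_role": "system", "content_hash": None},
]
head = chain_head(events)

# Changing any field of any event, or the order of events, changes the head.
tampered = [dict(events[0], actor_role="system")] + events[1:]
print(verify_episode(events, head), verify_episode(tampered, head))
```

Because every link depends on the one before it, a single head hash per episode is enough to detect an altered, removed, or reordered event anywhere in the chain.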
Bundle Contents
Inside every bundle.
Media payloads
LeRobot v3 episode files in payload/ — or region-locked pointers into EU infrastructure for sensitive categories, never bulk export.
Annotation payloads
One schema-versioned JSONL file per episode in annotations/, always from at least two independent annotators, answering the task schema.
The task schema itself
The frozen, human-signed questionnaire the annotations answer ships in task_schemas/ — the questionnaire is part of the product.
Provenance ledger extract
The full ordered event chain per episode in provenance/, hashes included. A bundle missing provenance/ is invalid by definition.
Conformity documentation
EU AI Act Article 10 data-governance documentation and a κ QA report with per-episode scores and thresholds, in conformity/.
The frozen manifest
manifest.json carries the contract version, per-episode provenance chain-head hashes, κ statistics, sensitivity counts, and REC clearance references.
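The layout described above lends itself to a quick structural check on receipt. The following Python sketch tests for manifest.json and the directories named in this section. It rests on assumptions: only a missing provenance/ is stated here to invalidate a bundle, so treating the other directories as required is a guess, and payload/ is left out because pointer-only deliveries may not include it. The function check_bundle_layout is hypothetical.

```python
import tempfile
from pathlib import Path

# Directory names taken from this section; "required" status is assumed
# for everything except provenance/, which the section makes mandatory.
REQUIRED_DIRS = ["annotations", "task_schemas", "provenance", "conformity"]

def check_bundle_layout(bundle_root):
    """Return a list of layout problems; an empty list means none were found."""
    root = Path(bundle_root)
    problems = []
    if not (root / "manifest.json").is_file():
        problems.append("missing manifest.json")
    for name in REQUIRED_DIRS:
        if not (root / name).is_dir():
            problems.append(f"missing {name}/")
    return problems

# Demonstration on a throwaway directory standing in for a bundle.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "manifest.json").write_text("{}")
    for name in ("annotations", "task_schemas", "conformity"):
        (root / name).mkdir()
    missing_provenance = check_bundle_layout(root)  # provenance/ not created yet
    (root / "provenance").mkdir()
    complete = check_bundle_layout(root)

print(missing_provenance, complete)
```

A check like this only confirms that the expected files and folders are present; it says nothing about whether the hashes or annotations inside them are valid.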
Catalog
Browse or commission.
Commission datasets to your task spec, QA’d through the same audited pipeline.
