The Data

Embodied data, with receipts.

Every episode ships with its full provenance chain. The chain is not metadata about the product — it is half of the product.

EU AI Act — Article 10

The provenance record the AI Act requires.

From 2 August 2026, providers of high-risk AI systems must document the origin of their training data and, for personal data, the original purpose of its collection (Article 10(2)(b)). Many datasets on the market cannot provide this, because their lineage was never recorded.

Every Aiscéal bundle ships it by construction: a hash-sealed, recomputable chain of custody from capture to export. Your dataset is not just training data; it supplies the origin-of-data record for your Annex IV technical documentation. Provenance stops being a receipt and becomes the reason you buy.

Five Pillars

What each pillar captures.

Episodes

Robot task episodes — teleoperated and autonomous manipulation runs with full sensor and video payloads, delivered as LeRobot v3. Example tasks: pick-and-place, object handover, tool-use sequences.

Sense

Multimodal sensor streams across the modalities embodied models train on, captured synchronously with each run. Example streams: camera video, joint states, gripper and contact signals.

Languages

Irish and low-resource language data, dual-annotator QA’d for accuracy. Example tasks: transcription, translation pairs, spoken-instruction grounding.

Lab

Clinical and laboratory protocol data, EU-resident and region-locked, handled under Research Ethics Committee (REC) clearance. Example tasks: protocol step annotation, instrument-handling episodes.

Vision

Visual grounding and annotation, with every label produced by two independent annotators and checked for agreement. Example tasks: object grounding, scene labelling, outcome rubrics on video.
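The delivery manifest records a per-episode cohen_kappa and a dataset-level threshold of 0.75, so the agreement check is Cohen's κ between two annotators. A minimal sketch, assuming each episode's rubric answers can be flattened into two aligned lists of categorical labels; the function names and the gate are hypothetical, not part of the Aiscéal contract:

```python
from collections import Counter

KAPPA_THRESHOLD = 0.75  # threshold value shown in the manifest's "kappa" block

def cohen_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("expected two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items both annotators labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)

def passes_gate(kappa: float, threshold: float = KAPPA_THRESHOLD) -> bool:
    # NaN compares False, so an undefined kappa never passes the gate.
    return kappa >= threshold

# Usage: two annotators grading six outcomes on a hypothetical success/fail rubric.
a = ["success", "success", "fail", "success", "fail", "success"]
b = ["success", "fail", "fail", "success", "fail", "success"]
k = cohen_kappa(a, b)   # p_o = 5/6, p_e = 0.5, so kappa = 2/3
print(round(k, 3), passes_gate(k))
```

Raw percent agreement here is 83%, yet κ is about 0.667 and fails a 0.75 gate, because half of that agreement is expected by chance from the label frequencies alone.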

Delivery

What a delivery looks like.

The unit of delivery is the Aiscéal Dataset Bundle. Its contract surface is manifest.json, and every episode ships its full ordered provenance chain from capture to export, sealed by a chain-head hash recorded in the manifest. Below are redacted excerpts of a manifest and of one episode's provenance chain.

// manifest.json (excerpt, redacted)
{
  "contract_version": "1.1",
  "dataset_id": "…redacted",
  "format": "lerobot_v3",
  "annotation_schemas": ["rubric_v1"],
  "task_schemas": [{ "id": "…redacted", "name": "Manipulation outcome rubric", "version": 1 }],
  "kappa": { "threshold": 0.75, "min": "…redacted", "mean": "…redacted" },
  "regions": ["ovh-eu"],
  "payload_delivery": "pointer_only",
  "episodes": [
    {
      "id": "…redacted",
      "duration_seconds": "…redacted",
      "cohen_kappa": "…redacted",
      "sensitivity": "standard",
      "payload_key": "…redacted",
      "payload_region": "ovh-eu",
      "payload_sha256": null,
      "provenance_head_hash": "c41e…redacted",
      "rec_clearance_id": null
    }
  ]
}

// provenance/<episode-id>.json (excerpt, redacted)
[
  { "event_type": "captured",  "actor_role": "operator",  "content_hash": "2d7c…redacted" },
  { "event_type": "annotated", "actor_role": "annotator", "content_hash": "77b0…redacted" },
  { "event_type": "reviewed",  "actor_role": "reviewer",  "content_hash": null },
  { "event_type": "qa_passed", "actor_role": "reviewer",  "content_hash": null },
  { "event_type": "stamped",   "actor_role": "system",    "content_hash": "5b18…redacted" },
  { "event_type": "packaged",  "actor_role": "system",    "content_hash": null },
  { "event_type": "exported",  "actor_role": "system",    "content_hash": null }
]

Every bundle ships with its provenance ledger extract. Recompute the hashes yourself.
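This page does not publish how the chain-head hash is computed, so the sketch below assumes a common hash-chain construction: each link is SHA-256 over the previous link concatenated with a canonical JSON encoding of the event, starting from 32 zero bytes, and the final link is compared with the episode's provenance_head_hash in manifest.json. The canonicalisation rule, the seed, and every function name are hypothetical; confirm the real scheme in the bundle's contract documentation before relying on it. Recomputing the per-event content_hash values would additionally require the underlying artefacts and is not covered here.

```python
import hashlib
import json

def canonical_bytes(event: dict) -> bytes:
    # ASSUMPTION: events are canonicalised as sorted-key, whitespace-free UTF-8 JSON.
    return json.dumps(
        event, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")

def chain_head(events: list[dict]) -> str:
    # ASSUMPTION: link_0 is 32 zero bytes; link_i = SHA-256(link_{i-1} + event_i).
    link = bytes(32)
    for event in events:
        link = hashlib.sha256(link + canonical_bytes(event)).digest()
    return link.hex()

def verify_chain(events: list[dict], expected_head: str) -> bool:
    # The chain must run in order from capture to export.
    if (
        not events
        or events[0].get("event_type") != "captured"
        or events[-1].get("event_type") != "exported"
    ):
        return False
    return chain_head(events) == expected_head.lower()

def verify_episode_file(path: str, expected_head: str) -> bool:
    # Load provenance/<episode-id>.json and check it against the manifest's head hash.
    with open(path, encoding="utf-8") as f:
        return verify_chain(json.load(f), expected_head)
```

Because each link folds in the one before it, editing, dropping, or reordering any event changes the head, so a single comparison against the manifest covers the whole chain.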

Bundle Contents

Inside every bundle.

  • Media payloads

    LeRobot v3 episode files in payload/, or, for sensitive categories, region-locked pointers into EU infrastructure instead of a bulk export.

  • Annotation payloads

    One schema-versioned JSONL file per episode in annotations/, holding answers to the task schema from at least two independent annotators.

  • The task schema itself

    The frozen, human-signed questionnaire the annotations answer ships in task_schemas/ — the questionnaire is part of the product.

  • Provenance ledger extract

    The full ordered event chain per episode in provenance/, hashes included. A bundle missing provenance/ is invalid by definition.

  • Conformity documentation

    EU AI Act Article 10 data-governance documentation and a κ (Cohen's kappa) QA report with per-episode scores and thresholds, in conformity/.

  • The frozen manifest

    manifest.json carries the contract version, per-episode provenance chain-head hashes, κ statistics, sensitivity counts, and REC clearance references.
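A consumer might run a structural check on a bundle before training on it. A minimal sketch, assuming the directory names listed above, a hypothetical annotations/<episode-id>.jsonl file naming, and a hypothetical rule that payload/ may be absent when payload_delivery is "pointer_only". It checks layout, manifest fields, and the κ gate only; hash verification is a separate step. All names are illustrative, not an Aiscéal API.

```python
import json
from pathlib import Path

REQUIRED_DIRS = ("annotations", "task_schemas", "provenance", "conformity")

def check_bundle(manifest: dict, present: set[str]) -> list[str]:
    """Return a list of problems; an empty list means every check passed.

    `present` holds bundle-relative POSIX paths (directories and files).
    """
    problems = []
    for d in REQUIRED_DIRS:
        if d not in present:
            problems.append(f"missing directory {d}/")
    # HYPOTHETICAL rule: payload/ is only required when payloads ship inline.
    if manifest.get("payload_delivery") != "pointer_only" and "payload" not in present:
        problems.append("missing directory payload/")
    threshold = manifest.get("kappa", {}).get("threshold")
    for ep in manifest.get("episodes", []):
        ep_id = ep["id"]
        if f"provenance/{ep_id}.json" not in present:
            problems.append(f"episode {ep_id}: no provenance chain")
        # HYPOTHETICAL naming: one annotations/<episode-id>.jsonl per episode.
        if f"annotations/{ep_id}.jsonl" not in present:
            problems.append(f"episode {ep_id}: no annotation file")
        if not ep.get("provenance_head_hash"):
            problems.append(f"episode {ep_id}: no chain-head hash in manifest")
        kappa = ep.get("cohen_kappa")
        if isinstance(kappa, (int, float)) and isinstance(threshold, (int, float)):
            if kappa < threshold:
                problems.append(f"episode {ep_id}: kappa {kappa} below threshold {threshold}")
    return problems

def validate_bundle(root: str) -> list[str]:
    base = Path(root)
    manifest_path = base / "manifest.json"
    if not manifest_path.is_file():
        return ["missing manifest.json"]
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    # Every file and directory under the bundle root, as a relative POSIX path.
    present = {p.relative_to(base).as_posix() for p in base.rglob("*")}
    return check_bundle(manifest, present)
```

Keeping the rules in check_bundle, separate from the filesystem walk, lets the same checks run against a listing of object-store keys when payloads are delivered as region-locked pointers.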

Catalog

Browse or commission.

Commission datasets to your task spec, QA’d through the same audited pipeline.

The customer dataset catalog in the Aiscéal portal.