VIDS: Verified Imaging Dataset Standard

An open standard for organizing, validating, and delivering annotated medical imaging datasets for AI/ML development. Structure · Provenance · Quality.

Mandatory Provenance

Every annotation carries a structured record of who created it, when, with what tool, and under what quality controls. Provenance is a validation requirement, not a recommendation.

Automated Validation

21 machine-enforceable validation rules. One command. Binary PASS/FAIL. Compliance is determined by running the validator, not by reading a checklist. The full VIDS standard defines 22 compliance dimensions; 21 of these are machine-enforceable through the open-source validator, and the remaining one is assessed during a Compliance Evaluation.
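To make the PASS/FAIL semantics concrete, here is a minimal Python sketch of how individual rule outcomes could be tallied into a single verdict. It assumes, based on the example report on this page, that only failed rules fail a dataset (warnings and skipped rules do not) and that Q-prefixed (quality/) and M-prefixed (ml/) rules are optional under the POC profile. `RuleResult`, `summarize`, and `applies` are hypothetical names, not the vids-validator API.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PASS = "pass"
    FAIL = "fail"
    WARN = "warn"
    SKIP = "skip"


@dataclass(frozen=True)
class RuleResult:
    rule_id: str  # e.g. "S001" or "A005"
    status: Status
    message: str = ""


# Assumption: quality/ (Q) and ml/ (M) rules are optional under the POC
# profile, as in the example report.
OPTIONAL_IN_POC = ("Q", "M")


def applies(rule_id: str, profile: str) -> bool:
    """Return False when a rule should be reported as skipped for this profile."""
    return not (profile.lower() == "poc" and rule_id.startswith(OPTIONAL_IN_POC))


def summarize(results: list[RuleResult]) -> dict:
    """Tally outcomes; assumption: only FAIL results make the verdict FAIL."""
    counts = {status: 0 for status in Status}
    for result in results:
        counts[result.status] += 1
    return {
        "passed": counts[Status.PASS],
        "failed": counts[Status.FAIL],
        "warnings": counts[Status.WARN],
        "skipped": counts[Status.SKIP],
        "verdict": "PASS" if counts[Status.FAIL] == 0 else "FAIL",
    }
```

Keeping warnings separate from failures lets recommended-but-optional items (such as a missing CHANGES.md) surface in the report without blocking acceptance.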

Framework Export

Curate once in VIDS, then export to nnU-Net, MONAI, COCO JSON, or flat NIfTI. Provenance travels with the data through every format conversion.
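As an illustration of what an nnU-Net export involves, the Python sketch below maps one VIDS image/segmentation pair onto nnU-Net v2's raw-data layout (`imagesTr/{case}_0000.nii.gz` for input channel 0, `labelsTr/{case}.nii.gz` for the label map). It is not the VIDS exporter. It assumes the directory layout of the example dataset on this page (`sub-*/ses-*/<modality>/`, with annotations mirrored under `derivatives/annotations/`), and the dataset name `Dataset501_LungNodule` and the `provenance/` folder used to carry the sidecar along are hypothetical choices.

```python
from pathlib import PurePosixPath


def nnunet_targets(subject: str, session: str, modality: str,
                   dataset_name: str = "Dataset501_LungNodule") -> dict[str, str]:
    """Map one VIDS image/segmentation pair to nnU-Net v2 raw-data paths.

    Keys are VIDS paths relative to the dataset root; values are paths
    relative to nnUNet_raw. dataset_name is a hypothetical example.
    """
    stem = f"sub-{subject}_ses-{session}_{modality}"
    case_id = f"sub-{subject}_ses-{session}"
    img_dir = PurePosixPath(f"sub-{subject}", f"ses-{session}", modality)
    ann_dir = PurePosixPath("derivatives", "annotations") / img_dir
    out = PurePosixPath(dataset_name)
    return {
        # Single-channel CT: nnU-Net expects a 4-digit channel suffix.
        str(img_dir / f"{stem}_img.nii.gz"): str(out / "imagesTr" / f"{case_id}_0000.nii.gz"),
        str(ann_dir / f"{stem}_seg.nii.gz"): str(out / "labelsTr" / f"{case_id}.nii.gz"),
        # Hypothetical: keep the provenance sidecar next to the export,
        # outside the folders nnU-Net reads.
        str(ann_dir / f"{stem}_seg.json"): str(out / "provenance" / f"{case_id}.json"),
    }
```

A complete export would also write nnU-Net's `dataset.json` (channel names, labels, `numTraining`, `file_ending`) and copy the files; both steps are omitted here.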

Quick Start

Install the validator and check any dataset in seconds:

pip install vids-validator

vids-validate /path/to/dataset

vids-validate /path/to/dataset --profile full --json

Example validator output

============================================================
VIDS Validation Report
Profile: POC
Dataset: /data/lung-nodule-ct-100
============================================================

  ✅ S001: .vids marker file present
  ✅ S002: dataset_description.json valid
  ✅ S003: Participants file present (json)
  ✅ S004: README.md present
  ✅ S005: 100 subject directories found
  ✅ S006: All subjects have session directories
  ✅ I001: All subjects have imaging files
  ✅ I002: All subjects have imaging sidecar JSONs
  ✅ I003: 100 imaging JSONs valid
  ✅ I004: All files follow VIDS naming convention
  ✅ A001: derivatives/annotations/ exists
  ✅ A002: 100 segmentation files found
  ✅ A003: 100 annotation sidecar JSONs found
  ✅ A004: 100 annotation JSONs valid
  ✅ A005: All annotations have complete provenance
  ⏭  Q001: POC profile — quality/ optional
  ⏭  Q002: POC profile — quality_summary optional
  ⏭  Q003: POC profile — annotation_agreement optional
  ⏭  M001: POC profile — ml/ optional
  ⏭  M002: POC profile — splits optional
  ⚠️  D001: CHANGES.md missing (recommended)

------------------------------------------------------------
  Passed:   15
  Failed:   0
  Warnings: 1
  Skipped:  5
------------------------------------------------------------

✅ VALIDATION PASSED (15/21 rules)
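For CI pipelines that capture the text report rather than the `--json` output, a small parser can recover per-rule statuses. This Python sketch keys off the status icons in the example report above; the ❌ icon for failed rules is an assumption, since the example contains no failures, and the text format may change between validator versions. `parse_report` and `report_passed` are hypothetical helper names.

```python
import re

# Per-rule lines look like "  ✅ S001: .vids marker file present".
# Icons: U+2705 pass, U+274C fail (assumed), U+26A0 warn, U+23ED skip;
# an optional U+FE0F emoji variation selector may follow the icon.
RULE_LINE = re.compile(r"^\s*(\u2705|\u274c|\u26a0|\u23ed)\ufe0f?\s+([A-Z]\d{3}):\s*(.*)$")
ICON_STATUS = {"\u2705": "pass", "\u274c": "fail", "\u26a0": "warn", "\u23ed": "skip"}


def parse_report(text: str) -> dict[str, str]:
    """Return {rule_id: status} for every rule line in a text report."""
    results = {}
    for line in text.splitlines():
        match = RULE_LINE.match(line)
        if match:
            results[match.group(2)] = ICON_STATUS[match.group(1)]
    return results


def report_passed(text: str) -> bool:
    """Treat the run as passing when no rule line reports a failure."""
    return "fail" not in parse_report(text).values()
```

The summary banner ("VALIDATION PASSED") is deliberately not matched, because `VALIDATION` does not fit the rule-ID pattern.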

What a VIDS Dataset Looks Like

lung-nodule-ct-100/
├── .vids                              # Profile marker (poc or full)
├── dataset_description.json           # Dataset metadata
├── participants.json                  # Subject registry
├── README.md                          # Human-readable description
│
├── sub-001/
│   └── ses-baseline/
│       └── ct/
│           ├── sub-001_ses-baseline_ct_img.nii.gz
│           └── sub-001_ses-baseline_ct_img.json
│
├── derivatives/
│   └── annotations/
│       └── sub-001/
│           └── ses-baseline/
│               └── ct/
│                   ├── sub-001_ses-baseline_ct_seg.nii.gz
│                   └── sub-001_ses-baseline_ct_seg.json    ← provenance lives here
│
├── quality/                           # Full profile only
│   ├── quality_summary.json
│   └── annotation_agreement.json
│
└── ml/                                # Full profile only
    └── splits.json
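The naming convention can be approximated with a regular expression. The Python sketch below is inferred from the example file names only, so it is not the normative rule behind I004; it also shows how an imaging file maps to the annotation sidecar path where its provenance is expected. `VIDS_NAME` and `annotation_sidecar_for` are hypothetical names, and restricting labels to `[A-Za-z0-9]` is an assumption.

```python
import re
from pathlib import PurePosixPath

# Hypothetical pattern inferred from the example tree, e.g.
# sub-001_ses-baseline_ct_img.nii.gz or sub-001_ses-baseline_ct_seg.json
VIDS_NAME = re.compile(
    r"^sub-(?P<sub>[A-Za-z0-9]+)_ses-(?P<ses>[A-Za-z0-9]+)"
    r"_(?P<mod>[a-z0-9]+)_(?P<suffix>img|seg)\.(?P<ext>nii\.gz|json)$"
)


def annotation_sidecar_for(image_path: str) -> str:
    """Return the expected segmentation sidecar path for an imaging file.

    image_path is relative to the dataset root; the annotation tree mirrors
    the subject/session/modality layout under derivatives/annotations/.
    """
    name = PurePosixPath(image_path).name
    match = VIDS_NAME.match(name)
    if match is None or match["suffix"] != "img":
        raise ValueError(f"not a VIDS imaging file: {image_path}")
    sub, ses, mod = match["sub"], match["ses"], match["mod"]
    sidecar_dir = PurePosixPath("derivatives", "annotations", f"sub-{sub}", f"ses-{ses}", mod)
    return str(sidecar_dir / f"sub-{sub}_ses-{ses}_{mod}_seg.json")
```

Deriving the sidecar path from the image name, rather than searching for it, makes a missing provenance file an immediate, precise error.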

Every annotation sidecar carries structured provenance:

{
  "VIDSVersion": "1.0",
  "AnnotationType": "segmentation",
  "SourceImage": "sub-001_ses-baseline_ct_img.nii.gz",
  "Provenance": {
    "Annotator": {
      "ID": "radiologist_001",
      "Name": "Dr. Jane Smith",
      "Credentials": "MD, Board-certified radiologist, 10 years experience",
      "Specialty": "Chest Radiology"
    },
    "AnnotationProcess": {
      "Tool": "3D Slicer",
      "ToolVersion": "5.6.2",
      "Date": "2026-02-12",
      "TimeSpent_minutes": 15,
      "Method": "Manual segmentation"
    },
    "QualityControl": {
      "ReviewedBy": "senior_radiologist_001",
      "ReviewDate": "2026-02-13",
      "ReviewOutcome": "approved",
      "Confidence": 0.93
    }
  }
}
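A completeness check in the spirit of rule A005 ("All annotations have complete provenance") might look like the following minimal Python sketch. The required-field list is an assumption drawn from the example sidecar above, and the ISO 8601 date check is an illustrative extra; the full specification is authoritative. `REQUIRED`, `lookup`, and `provenance_problems` are hypothetical names.

```python
from datetime import date

# Hypothetical required fields, taken from the example sidecar above.
REQUIRED = [
    "VIDSVersion",
    "AnnotationType",
    "SourceImage",
    "Provenance.Annotator.ID",
    "Provenance.AnnotationProcess.Tool",
    "Provenance.AnnotationProcess.ToolVersion",
    "Provenance.AnnotationProcess.Date",
    "Provenance.QualityControl.ReviewedBy",
    "Provenance.QualityControl.ReviewOutcome",
]
DATE_FIELDS = ["Provenance.AnnotationProcess.Date", "Provenance.QualityControl.ReviewDate"]


def lookup(doc: dict, dotted: str) -> object:
    """Follow a dotted key path through nested dicts; None if any key is absent."""
    node = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def provenance_problems(sidecar: dict) -> list[str]:
    """Return human-readable problems; an empty list means complete."""
    problems = [f"missing {field}" for field in REQUIRED
                if lookup(sidecar, field) in (None, "")]
    for field in DATE_FIELDS:
        value = lookup(sidecar, field)
        if value is None:
            continue  # absence of required dates is already reported above
        try:
            date.fromisoformat(value)
        except (TypeError, ValueError):
            problems.append(f"{field} is not an ISO 8601 date: {value!r}")
    return problems
```

Load each sidecar with `json.load` and treat any non-empty result as a failure for that annotation.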

Why VIDS?

Medical imaging datasets used to train AI models typically lack structured, machine-readable provenance. Annotation methodology lives in journal papers, README files, or institutional memory, where no automated compliance or audit process can see it.

Three converging pressures are making this untenable:

Pressure	Problem	What VIDS Provides
Dataset duplication	Datasets are copied across platforms without attribution or provenance	Every file carries its own structured metadata
Synthetic data	AI-generated images are indistinguishable from real scans	Origin documentation is a structural requirement
Regulatory mandates	EU AI Act, FDA, CDSCO require training data transparency	Machine-readable, auditable provenance chain

How VIDS Compares

VIDS complements existing standards rather than replacing them: it fills a gap that none of them cover.

	DICOM	BIDS	NIfTI	nnU-Net / MONAI	VIDS
Image storage
Neuro folder structure
Multi-modality datasets
Annotation provenance
Quality documentation
Automated validation
ML framework export
Compliance profiles

For Buyers of Medical Imaging Datasets

Acquiring annotated imaging datasets from external vendors? VIDS provides the procurement infrastructure to make dataset quality verifiable before acceptance.

Reference Procurement Language — drop-in contract clauses that make a VIDS PASS the acceptance condition for vendor deliveries. Free to use under CC BY 4.0.

Compliance Evaluation — diagnostic assessment of an existing dataset or vendor sample against the 22 VIDS dimensions. 48-hour turnaround.

Scoring Rubric — the published criteria behind every VIDS score, dimension by dimension.

Validation Attestation — the signed artifact a vendor delivers as proof of compliance.

See the full For Buyers section →

Citation

VIDS: A Verified Imaging Dataset Standard for Medical AI. Dr. Joan S. Muthu, John Shalen. Princeton Medical Systems · arXiv:2604.17525

@misc{muthu2026vids,
  title   = {VIDS: A Verified Imaging Dataset Standard for Medical AI},
  author  = {Muthu, Joan S. and Shalen, John},
  year    = {2026},
  eprint  = {2604.17525},
  archivePrefix = {arXiv},
  primaryClass  = {eess.IV},
  url     = {https://arxiv.org/abs/2604.17525}
}

VIDS is an open standard. Specification: CC BY 4.0 · Tools: Apache 2.0
