
VIDS — Verified Imaging Dataset Standard

An open standard for organizing, validating, and delivering annotated medical imaging datasets for AI/ML development. Structure · Provenance · Quality.

Specification v1.0

Mandatory Provenance

Every annotation carries a structured record of who created it, when, with what tool, and under what quality controls. Provenance is a validation requirement, not a recommendation.

Automated Validation

21 machine-enforceable validation rules. One command. Binary PASS/FAIL. Compliance is determined by running the validator, not by reading a checklist. The full VIDS standard defines 22 compliance dimensions; the open-source validator enforces 21 of them automatically, and the remaining dimension is assessed during a Compliance Evaluation.

Framework Export

Curate once in VIDS, export to nnU-Net, MONAI, COCO JSON, or flat NIfTI. Provenance travels with the data through every format conversion.
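To make "provenance travels with the data" concrete, here is a minimal Python sketch of how one VIDS image/segmentation pair could map onto the nnU-Net v2 raw-data layout (`imagesTr/<case>_0000.nii.gz` for images, `labelsTr/<case>.nii.gz` for labels). The helper `to_nnunet` and the folder name `Dataset001_Example` are hypothetical; this is not the VIDS exporter. It only computes target paths plus a pointer back to the annotation sidecar, and it copies no files.

```python
from pathlib import PurePosixPath


def to_nnunet(image_path: str, seg_path: str, dataset_dir: str = "Dataset001_Example") -> dict:
    """Map one VIDS image/segmentation pair to nnU-Net v2 raw-data paths.

    Hypothetical helper: computes paths only; the official VIDS exporter may differ.
    """
    img = PurePosixPath(image_path)
    seg = PurePosixPath(seg_path)
    # VIDS file names start with sub-<label>_ses-<label>; reuse that as the nnU-Net case ID.
    sub, ses = img.name.split("_")[:2]
    case_id = f"{sub}_{ses}"
    root = PurePosixPath(dataset_dir)
    return {
        # nnU-Net v2 expects a 4-digit channel suffix on images (single CT channel -> 0000).
        "image": str(root / "imagesTr" / f"{case_id}_0000.nii.gz"),
        # Labels carry the bare case ID.
        "label": str(root / "labelsTr" / f"{case_id}.nii.gz"),
        # Pointer to the annotation sidecar so provenance can be shipped alongside the export.
        "provenance_sidecar": str(seg.with_name(seg.name.replace(".nii.gz", ".json"))),
    }


if __name__ == "__main__":
    print(to_nnunet(
        "sub-001/ses-baseline/ct/sub-001_ses-baseline_ct_img.nii.gz",
        "derivatives/annotations/sub-001/ses-baseline/ct/sub-001_ses-baseline_ct_seg.nii.gz",
    ))
```

A real exporter would also need to write nnU-Net's `dataset.json` and copy or link the files; the sketch stops at the path mapping.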


Quick Start

Install the validator and check any dataset in seconds:

pip install vids-validator
vids-validate /path/to/dataset
vids-validate /path/to/dataset --profile full --json
Example validator output
============================================================
VIDS Validation Report
Profile: POC
Dataset: /data/lung-nodule-ct-100
============================================================

  ✅ S001: .vids marker file present
  ✅ S002: dataset_description.json valid
  ✅ S003: Participants file present (json)
  ✅ S004: README.md present
  ✅ S005: 100 subject directories found
  ✅ S006: All subjects have session directories
  ✅ I001: All subjects have imaging files
  ✅ I002: All subjects have imaging sidecar JSONs
  ✅ I003: 100 imaging JSONs valid
  ✅ I004: All files follow VIDS naming convention
  ✅ A001: derivatives/annotations/ exists
  ✅ A002: 100 segmentation files found
  ✅ A003: 100 annotation sidecar JSONs found
  ✅ A004: 100 annotation JSONs valid
  ✅ A005: All annotations have complete provenance
  ⏭  Q001: POC profile — quality/ optional
  ⏭  Q002: POC profile — quality_summary optional
  ⏭  Q003: POC profile — annotation_agreement optional
  ⏭  M001: POC profile — ml/ optional
  ⏭  M002: POC profile — splits optional
  ⚠️  D001: CHANGES.md missing (recommended)

------------------------------------------------------------
  Passed:   15
  Failed:   0
  Warnings: 1
  Skipped:  5
------------------------------------------------------------

✅ VALIDATION PASSED (15/21 rules)
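The report's arithmetic (15 passed + 0 failed + 1 warning + 5 skipped = 21 rules, with a PASS verdict) can be modelled as follows. This is a hypothetical sketch inferred from the example output alone: it assumes that under the POC profile every Q (quality) and M (ML) rule is reported as skipped, and that only failures, never warnings or skips, turn the verdict to FAIL. `effective_status` and `summarize` are illustrative names, not part of `vids-validator`.

```python
from collections import Counter

# Assumption inferred from the example report: Q and M rules are optional under POC.
OPTIONAL_IN_POC = ("Q", "M")


def effective_status(rule_id: str, raw_status: str, profile: str) -> str:
    """Return the reported status of one rule, given the active profile."""
    if profile == "poc" and rule_id.startswith(OPTIONAL_IN_POC):
        return "skip"
    return raw_status


def summarize(results: dict, profile: str) -> dict:
    """Tally 'pass'/'fail'/'warn'/'skip' statuses and derive a binary verdict."""
    statuses = {rid: effective_status(rid, s, profile) for rid, s in results.items()}
    counts = Counter(statuses.values())  # missing keys count as 0
    verdict = "PASSED" if counts["fail"] == 0 else "FAILED"
    return {
        "passed": counts["pass"],
        "failed": counts["fail"],
        "warnings": counts["warn"],
        "skipped": counts["skip"],
        "verdict": f"VALIDATION {verdict} ({counts['pass']}/{len(statuses)} rules)",
    }


if __name__ == "__main__":
    passing = ([f"S00{i}" for i in range(1, 7)] + [f"I00{i}" for i in range(1, 5)]
               + [f"A00{i}" for i in range(1, 6)])
    results = {rid: "pass" for rid in passing}
    # quality/ and ml/ are absent here; under this model they would fail a full-profile run.
    results.update({rid: "fail" for rid in ("Q001", "Q002", "Q003", "M001", "M002")})
    results["D001"] = "warn"
    print(summarize(results, "poc")["verdict"])  # VALIDATION PASSED (15/21 rules)
```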

What a VIDS Dataset Looks Like

lung-nodule-ct-100/
├── .vids                              # Profile marker (poc or full)
├── dataset_description.json           # Dataset metadata
├── participants.json                  # Subject registry
├── README.md                          # Human-readable description
├── sub-001/
│   └── ses-baseline/
│       └── ct/
│           ├── sub-001_ses-baseline_ct_img.nii.gz
│           └── sub-001_ses-baseline_ct_img.json
├── derivatives/
│   └── annotations/
│       └── sub-001/
│           └── ses-baseline/
│               └── ct/
│                   ├── sub-001_ses-baseline_ct_seg.nii.gz
│                   └── sub-001_ses-baseline_ct_seg.json    ← provenance lives here
├── quality/                           # Full profile only
│   ├── quality_summary.json
│   └── annotation_agreement.json
└── ml/                                # Full profile only
    └── splits.json
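Rule I004 checks the VIDS naming convention. A minimal sketch, assuming the pattern `sub-<label>_ses-<label>_<modality>_<img|seg>.<nii.gz|json>` suggested by the example tree (the normative grammar lives in the specification and may allow more), parses a file name and confirms that its directory matches: raw images under `sub-*/ses-*/<modality>/`, segmentations under `derivatives/annotations/`. `parse_vids_name` and `is_well_placed` are hypothetical helpers.

```python
import re
from pathlib import PurePosixPath

# Hypothetical pattern inferred only from the example file names above.
VIDS_NAME = re.compile(
    r"^sub-(?P<sub>[A-Za-z0-9]+)"
    r"_ses-(?P<ses>[A-Za-z0-9]+)"
    r"_(?P<modality>[a-z0-9]+)"
    r"_(?P<suffix>img|seg)"
    r"\.(?P<ext>nii\.gz|json)$"
)


def parse_vids_name(filename: str):
    """Return the name's entities as a dict, or None if it does not match."""
    m = VIDS_NAME.match(filename)
    return m.groupdict() if m else None


def is_well_placed(rel_path: str) -> bool:
    """Check that a file's directory agrees with the entities in its name."""
    path = PurePosixPath(rel_path)
    fields = parse_vids_name(path.name)
    if fields is None:
        return False
    expected = PurePosixPath(f"sub-{fields['sub']}", f"ses-{fields['ses']}", fields["modality"])
    if fields["suffix"] == "seg":
        # Annotations mirror the raw layout under derivatives/annotations/.
        expected = PurePosixPath("derivatives", "annotations") / expected
    return path.parent == expected
```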

Every annotation sidecar carries structured provenance:

{
  "VIDSVersion": "1.0",
  "AnnotationType": "segmentation",
  "SourceImage": "sub-001_ses-baseline_ct_img.nii.gz",
  "Provenance": {
    "Annotator": {
      "ID": "radiologist_001",
      "Name": "Dr. Jane Smith",
      "Credentials": "MD, Board-certified radiologist, 10 years experience",
      "Specialty": "Chest Radiology"
    },
    "AnnotationProcess": {
      "Tool": "3D Slicer",
      "ToolVersion": "5.6.2",
      "Date": "2026-02-12",
      "TimeSpent_minutes": 15,
      "Method": "Manual segmentation"
    },
    "QualityControl": {
      "ReviewedBy": "senior_radiologist_001",
      "ReviewDate": "2026-02-13",
      "ReviewOutcome": "approved",
      "Confidence": 0.93
    }
  }
}
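Rule A005 reports whether every annotation has complete provenance. A minimal sketch of such a check walks the sidecar's nested keys and lists any that are absent or empty. The `REQUIRED_FIELDS` list is an assumption (a subset of the keys in the example sidecar), not the normative list from the VIDS specification, and `missing_fields` is a hypothetical helper.

```python
import json

# Assumed subset of the example sidecar's keys; the specification defines the real list.
REQUIRED_FIELDS = [
    "VIDSVersion",
    "AnnotationType",
    "SourceImage",
    "Provenance.Annotator.ID",
    "Provenance.AnnotationProcess.Tool",
    "Provenance.AnnotationProcess.ToolVersion",
    "Provenance.AnnotationProcess.Date",
    "Provenance.AnnotationProcess.Method",
    "Provenance.QualityControl.ReviewedBy",
    "Provenance.QualityControl.ReviewOutcome",
]


def missing_fields(sidecar: dict) -> list:
    """Return the dotted paths from REQUIRED_FIELDS that are absent or empty."""
    missing = []
    for dotted in REQUIRED_FIELDS:
        node = sidecar
        for key in dotted.split("."):
            if not isinstance(node, dict) or key not in node:
                missing.append(dotted)
                break
            node = node[key]
        else:  # every key was found; reject null or empty-string values
            if node in (None, ""):
                missing.append(dotted)
    return missing


if __name__ == "__main__":
    record = json.loads(
        '{"VIDSVersion": "1.0", "AnnotationType": "segmentation",'
        ' "SourceImage": "sub-001_ses-baseline_ct_img.nii.gz",'
        ' "Provenance": {"Annotator": {"ID": "radiologist_001"}}}'
    )
    # Reports the six missing AnnotationProcess and QualityControl fields.
    print(missing_fields(record))
```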

Why VIDS?

Medical imaging datasets used to train AI models typically lack structured, machine-readable provenance. Annotation methodology is recorded, if at all, in journal papers, README files, or institutional memory, where it is invisible to any automated compliance or audit process.

Three converging pressures are making this untenable:

| Pressure | Problem | What VIDS provides |
|---|---|---|
| Dataset duplication | Datasets are copied across platforms without attribution or provenance | Every file carries its own structured metadata |
| Synthetic data | AI-generated images can be indistinguishable from real scans | Origin documentation is a structural requirement |
| Regulatory mandates | The EU AI Act, FDA, and CDSCO require training-data transparency | A machine-readable, auditable provenance chain |

How VIDS Compares

VIDS complements existing standards rather than replacing them: it fills a gap that none of them covers.

DICOM BIDS NIfTI nnU-Net / MONAI VIDS
Image storage
Neuro folder structure
Multi-modality datasets
Annotation provenance
Quality documentation
Automated validation
ML framework export
Compliance profiles

For Buyers of Medical Imaging Datasets

Acquiring annotated imaging datasets from external vendors? VIDS provides the procurement infrastructure to make dataset quality verifiable before acceptance.

Reference Procurement Language — drop-in contract clauses that make a VIDS PASS the acceptance condition for vendor deliveries. Free to use under CC BY 4.0.

Compliance Evaluation — diagnostic assessment of an existing dataset or vendor sample against the 22 VIDS dimensions. 48-hour turnaround.

Scoring Rubric — the published criteria behind every VIDS score, dimension by dimension.

Validation Attestation — the signed artifact a vendor delivers as proof of compliance.

See the full For Buyers section →


Citation

VIDS: A Verified Imaging Dataset Standard for Medical AI
Dr. Joan S. Muthu, John Shalen · Princeton Medical Systems · arXiv:2604.17525

@misc{muthu2026vids,
  title   = {VIDS: A Verified Imaging Dataset Standard for Medical AI},
  author  = {Muthu, Joan S. and Shalen, John},
  year    = {2026},
  eprint  = {2604.17525},
  archivePrefix = {arXiv},
  primaryClass  = {eess.IV},
  url     = {https://arxiv.org/abs/2604.17525}
}

VIDS is an open standard. Specification: CC BY 4.0 · Tools: Apache 2.0
