Skip to content
VIDS Logo

VIDS — Verified Imaging Dataset Standard

An open standard for organizing, validating, and delivering annotated medical imaging datasets for AI/ML development. Structure · Provenance · Quality.

PyPI GitHub License Spec v1.0

Mandatory Provenance

Every annotation carries a structured record of who created it, when, with what tool, and under what quality controls. Provenance is a validation requirement, not a recommendation.

Automated Validation

21 rules. One command. Binary PASS/FAIL. Compliance is determined by running the validator — not by reading a checklist.

Framework Export

Curate once in VIDS, export to nnU-Net, MONAI, COCO JSON, or flat NIfTI. Provenance travels with the data through every format conversion.


Quick Start

Install the validator and check any dataset in seconds:

pip install vids-validator
vids-validate /path/to/dataset
vids-validate /path/to/dataset --profile full --json
Example validator output
============================================================
VIDS Validation Report
Profile: POC
Dataset: /data/lung-nodule-ct-100
============================================================

  ✅ S001: .vids marker file present
  ✅ S002: dataset_description.json valid
  ✅ S003: Participants file present (json)
  ✅ S004: README.md present
  ✅ S005: 100 subject directories found
  ✅ S006: All subjects have session directories
  ✅ I001: All subjects have imaging files
  ✅ I002: All subjects have imaging sidecar JSONs
  ✅ I003: 100 imaging JSONs valid
  ✅ I004: All files follow VIDS naming convention
  ✅ A001: derivatives/annotations/ exists
  ✅ A002: 100 segmentation files found
  ✅ A003: 100 annotation sidecar JSONs found
  ✅ A004: 100 annotation JSONs valid
  ✅ A005: All annotations have complete provenance
  ⏭  Q001: POC profile — quality/ optional
  ⏭  Q002: POC profile — quality_summary optional
  ⏭  Q003: POC profile — annotation_agreement optional
  ⏭  M001: POC profile — ml/ optional
  ⏭  M002: POC profile — splits optional
  ⚠️  D001: CHANGES.md missing (recommended)

------------------------------------------------------------
  Passed:   15
  Failed:   0
  Warnings: 1
  Skipped:  5
------------------------------------------------------------

✅ VALIDATION PASSED (15/21 rules)

What a VIDS Dataset Looks Like

lung-nodule-ct-100/
├── .vids                              # Profile marker (poc or full)
├── dataset_description.json           # Dataset metadata
├── participants.json                  # Subject registry
├── README.md                          # Human-readable description
├── sub-001/
│   └── ses-baseline/
│       └── ct/
│           ├── sub-001_ses-baseline_ct_img.nii.gz
│           └── sub-001_ses-baseline_ct_img.json
├── derivatives/
│   └── annotations/
│       └── sub-001/
│           └── ses-baseline/
│               └── ct/
│                   ├── sub-001_ses-baseline_ct_seg.nii.gz
│                   └── sub-001_ses-baseline_ct_seg.json    ← provenance lives here
├── quality/                           # Full profile only
│   ├── quality_summary.json
│   └── annotation_agreement.json
└── ml/                                # Full profile only
    └── splits.json

Every annotation sidecar carries structured provenance:

{
  "VIDSVersion": "1.0",
  "AnnotationType": "segmentation",
  "SourceImage": "sub-001_ses-baseline_ct_img.nii.gz",
  "Provenance": {
    "Annotator": {
      "ID": "radiologist_001",
      "Name": "Dr. Jane Smith",
      "Credentials": "MD, Board-certified radiologist, 10 years experience",
      "Specialty": "Chest Radiology"
    },
    "AnnotationProcess": {
      "Tool": "3D Slicer",
      "ToolVersion": "5.6.2",
      "Date": "2026-02-12",
      "TimeSpent_minutes": 15,
      "Method": "Manual segmentation"
    },
    "QualityControl": {
      "ReviewedBy": "senior_radiologist_001",
      "ReviewDate": "2026-02-13",
      "ReviewOutcome": "approved",
      "Confidence": 0.93
    }
  }
}

Why VIDS?

Medical imaging datasets used to train AI models typically lack structured, machine-readable provenance. Annotation methodology exists in journal papers, README files, or institutional memory — invisible to any automated compliance or audit process.

Three converging pressures are making this untenable:

Pressure Problem What VIDS Provides
Dataset duplication Datasets are copied across platforms without attribution or provenance Every file carries its own structured metadata
Synthetic data AI-generated images are indistinguishable from real scans Origin documentation is a structural requirement
Regulatory mandates EU AI Act, FDA, CDSCO require training data transparency Machine-readable, auditable provenance chain

How VIDS Compares

VIDS is complementary to existing standards — it fills the gap none of them cover.

DICOM BIDS NIfTI nnU-Net / MONAI VIDS
Image storage
Neuro folder structure
Multi-modality datasets
Annotation provenance
Quality documentation
Automated validation
ML framework export
Compliance profiles

Citation

@misc{vids2026,
  title   = {VIDS: Verified Imaging Dataset Standard, Specification v1.0},
  author  = {{Princeton Medical Systems}},
  year    = {2026},
  version = {1.0},
  url     = {https://vidsstandard.org/spec/1.0}
}

VIDS is an open standard. Specification: CC BY 4.0 · Tools: Apache 2.0

GitHub PyPI Full Specification