Dataset Release
The SpineDAO Collaborative Group is releasing SpineBase SI Fusion Synthetic v1.0 — a certified synthetic dataset of 1,000 sacroiliac joint fusion patient records generated from the SpineBase multicenter registry.
This dataset is the first synthetic spine surgery dataset with formal three-domain validation (fidelity, utility, privacy) and blockchain-anchored provenance.
Dataset Characteristics
Source: 125 real SI fusion patients from IDRISS Institute (Hôpital Privé du dos Francheville, Périgueux, France), 4 surgeons
Generator: GaussianCopula (SDV v1.9.1)
Variables (25):
- Demographics: age, sex, BMI, smoking status, occupation
- Preoperative: ODI, VAS back/leg, diagnostic injection
- Operative: approach, implant count, operative time, length of stay, ASA class, laterality
- Follow-up outcomes at 3, 6, 12, and 24 months: ODI, VAS back/leg, satisfaction, would repeat
- Complications: early complication, reoperation
Cohort profile:
- Mean age: 58.3 ± 11.2 years
- Female: 68%
- Predominant approach: percutaneous SI fusion (Arthrodèse ASI percutanée radioguidée)
- Mean preoperative ODI: 39.5 ± 14.1 (upper end of the moderate disability band, bordering on severe)
- Mean preoperative VAS back: 6.8/10
- Satisfied or better at 12 months: 65.4%
- Early complication rate: 8.0%
Validation Certificate
| Domain | Metric | Result | Pass |
|--------|--------|--------|------|
| Fidelity | Mean KS p-value | 0.52 | ✅ |
| Fidelity | Mean JS divergence | 0.13 | ✅ |
| Privacy | NNDR >1.0 | 98.9% of records | ✅ |
| Privacy | Membership inference AUROC | 0.57 | ✅ |
| Privacy | k-anonymity proxy | 54.9 | ✅ |
| Utility | TSTR Pearson r (12mo ODI) | 0.29 (N=125) | ✅ |
SHA-256 hash: d14106fb6ec76f6b9c7e5ed7dde7c1c86027faa019fc49028ee9649b98fe7dd1
Blockchain anchor: Solana mainnet NFT FJew1PXWhZUCnZLoz4WykbXBcxvbNpffPQ3qVgdTvs5U
Arweave: https://arweave.net/aTZbQl18JTZPL6sR_lj_kHxIORPcCfzIp6pONCLgXeQ
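For readers unfamiliar with the fidelity metrics in the certificate, here is a minimal sketch of the two-sample Kolmogorov-Smirnov statistic for a single numeric variable. It is an illustration only, not the validation pipeline used for the certificate: the function name `ks_statistic` and the toy values are assumptions, and the p-value step is omitted.

```python
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs.

    Hypothetical helper for illustration; not taken from the SpineBase code.
    """
    a = sorted(real)
    b = sorted(synthetic)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The empirical CDFs are step functions, so the maximum gap
    # is always reached at one of the observed values.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Toy values standing in for one variable (e.g. age); not real or synthetic records.
real_age = [52, 58, 61, 47, 66]
synthetic_age = [50, 59, 60, 49, 68]
print(ks_statistic(real_age, synthetic_age))
```

A statistic near 0 means the two distributions are close. In practice a library routine such as `scipy.stats.ks_2samp` returns both the statistic and a p-value, and a per-variable p-value averaged over all numeric columns would give a summary comparable in spirit to the "Mean KS p-value" row.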
Why Synthetic?
Synthetic data allows:
- Global distribution without patient privacy risk
- No IRB restrictions on secondary use
- Expert annotation at scale (as demonstrated in our companion Spine Reviews study)
- AI model development and benchmarking
- Replication of statistical analyses without accessing real patient records
Access
The certified 1,000-patient synthetic dataset is available for research use upon request to the corresponding author (vincent@spinedao.com), subject to a data use agreement. The SHA-256 hash enables independent verification that any distributed dataset is identical to the certified version.
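A recipient could run the hash check along the following lines. This is a sketch under one assumption: that the published hash was computed over the raw bytes of the distributed file. The release does not state the file name or serialization, so the file name in the usage comment is hypothetical.

```python
import hashlib

# SHA-256 published in the validation certificate.
CERTIFIED_SHA256 = "d14106fb6ec76f6b9c7e5ed7dde7c1c86027faa019fc49028ee9649b98fe7dd1"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_certificate(path, expected=CERTIFIED_SHA256):
    """True if the file's digest equals the expected hash (case-insensitive)."""
    return sha256_of_file(path) == expected.strip().lower()


# Usage (hypothetical file name):
# matches_certificate("spinebase_si_fusion_synthetic_v1.0.csv")
```

Any change to the file, including a re-export with different line endings or column order, changes the digest, so a mismatch signals a different file but does not by itself say what differs.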
For access to the full 10,000-patient synthetic dataset or the SpineBase ENTERPRISE tier (real anonymized data), visit spinebase.app.
Preprint
Challier V, Jacquemin C, Diebo B, Dehouche N, Denisov A, Cristini J, Campana M, Castelain JE, Lonjon G, Lafage V, Ghailane S; on behalf of SpineDAO Collaborative Group. Validated Synthetic Data Generation from a Multicenter Spine Surgery Registry: Methodology and Benchmark. medRxiv 2026. doi: 10.64898/2026.04.07.26350316