Promptly Health · JPR / PaLaDIn · Synthetic Dataset Specification v1.0

Parkinson’s Disease — Synthetic Dataset Data Requirements

Item-level data specification for the PD synthetic dataset model, covering registry (JPR@Research+Me, PPMI, UKPEN), PROMs (Interactium platform), wearable digital biomarker and clinical trial (EJS ACT-PD) use cases. Domains A–K: demographics, genetics, motor, non-motor, PROMs, treatment, digital, trial, healthcare events, equity, milestones. FHIR R4 UK Core and OMOP CDM v5.4 mappings are provided.
Feasibility ratings: GREEN = routinely capturable · AMBER = partially capturable or effort required · RED = significant gap.
Distinct sources: 13 (across 6 categories)
Data items: - (one row per field)
Requirements: 78 (across domains A–K)
Domain sections: 11 (A = Demographics to K = Milestones)
Gold outputs: 3 (registry + trial readouts)
Source table columns: # · Source · Category · Items · Tag · Description · Variants · Spec
Item types: SOURCE · ENTRY · SILVER · CALC · GOLD
Domain key: A = Demographics · B = Genetics · C = Motor · D = Non-Motor · E = PROMs · F = Treatment · G = Digital · H = Trial · I = Healthcare · J = Equity · K = Milestones
Requirements table columns: # · Domain · ID · Requirement · Data Item · Rating · Source(s) · Spec Field Name · Type · FHIR R4 · OMOP CDM · Notes / Implementation
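To make the requirements-table columns concrete, here is a minimal sketch that models one row as a Python dataclass. The feasibility rating and item type are enums, and a helper counts ratings per domain (for example, to locate RED-rated gaps). Class and field names are assumptions that simply mirror the column headers. The meaning of the SILVER, CALC and GOLD types is not defined here.

```python
# Sketch: one requirements-table row with typed rating and item type.
# Names mirror the column headers; example values are hypothetical.
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Rating(Enum):
    GREEN = "routinely capturable"
    AMBER = "partially capturable or effort required"
    RED = "significant gap"


class ItemType(Enum):
    SOURCE = "SOURCE"
    ENTRY = "ENTRY"
    SILVER = "SILVER"
    CALC = "CALC"
    GOLD = "GOLD"


@dataclass(frozen=True)
class Requirement:
    domain: str           # domain letter, e.g. "E" for PROMs
    req_id: str           # hypothetical ID format
    data_item: str
    rating: Rating
    spec_field_name: str
    item_type: ItemType
    fhir_r4: str = ""
    omop_cdm: str = ""


def rating_counts(requirements):
    """Count requirements per (domain, rating name)."""
    return Counter((r.domain, r.rating.name) for r in requirements)


if __name__ == "__main__":
    reqs = [
        Requirement("E", "E-01", "hypothetical item", Rating.GREEN, "FIELD_A", ItemType.CALC),
        Requirement("G", "G-01", "hypothetical item", Rating.RED, "FIELD_B", ItemType.SOURCE),
    ]
    print(rating_counts(reqs))
```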
Interactium PROMs Platform · PDQ-39 Oxford · EMA / FDA PRO Guidance
Patient-reported outcome and experience measures
PD-specific instruments: PDQ-39 (39 items, 8 domains, Summary Index 0–100) is the dominant PD-specific HRQL measure, and both FDA and EMA have accepted the PDQ-39 SI as a trial endpoint. PDQ-8 (8-item short form) is appropriate for high-frequency monitoring. NMSS covers 30 non-motor symptoms across 9 domains. The Parkinson’s Fatigue Scale (PFS-16) addresses fatigue, the most burdensome non-motor symptom for many patients. PDSS-2 captures sleep quality. Hauser ON/OFF diaries are required for trials enrolling patients with motor fluctuations.
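The PDQ-39 Summary Index is a derived field. The sketch below assumes the commonly published scoring:
- items are scored 0 to 4;
- each dimension is scaled to 0–100 as the item sum divided by four times the number of items;
- the SI is the mean of the eight dimension scores.

It also assumes complete responses. Missing-item handling is left out and would need to follow the instrument's own guidance. Dimension keys are hypothetical field names.

```python
# Sketch: PDQ-39 dimension scores and Summary Index (SI) as a derived field.
# Assumes complete responses, each item scored 0 (never) to 4 (always / cannot do).
# Missing-item rules are not handled. Dimension keys are hypothetical field names.

PDQ39_DIMENSIONS = {          # dimension -> number of items (total 39)
    "mobility": 10,
    "activities_of_daily_living": 6,
    "emotional_wellbeing": 6,
    "stigma": 4,
    "social_support": 3,
    "cognition": 4,
    "communication": 3,
    "bodily_discomfort": 3,
}


def dimension_score(item_scores: list[int]) -> float:
    """Scale a dimension's raw item sum to 0-100."""
    if any(s not in range(5) for s in item_scores):
        raise ValueError("PDQ-39 items must be scored 0-4")
    return 100.0 * sum(item_scores) / (4 * len(item_scores))


def pdq39_summary_index(responses: dict[str, list[int]]) -> tuple[dict[str, float], float]:
    """Return per-dimension scores and the SI (mean of the 8 dimension scores)."""
    scores = {}
    for dim, n_items in PDQ39_DIMENSIONS.items():
        items = responses[dim]
        if len(items) != n_items:
            raise ValueError(f"{dim}: expected {n_items} items, got {len(items)}")
        scores[dim] = dimension_score(items)
    si = sum(scores.values()) / len(scores)
    return scores, si
```

For example, a respondent who scores 2 on all ten mobility items and 0 elsewhere has a mobility score of 50 and an SI of 6.25.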

Generic instruments: EQ-5D-5L (utility + VAS) supports health-economic modelling and is required for NICE appraisals. PROMIS Fatigue and Pain Interference Short Forms are FDA-qualified. PHQ-9 and GAD-7 address depression and anxiety (each prevalent in ~30–40% of PD patients).
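PHQ-9 and GAD-7 totals and their widely used severity bands are straightforward to derive. The sketch below applies the general-population bands:
- PHQ-9: 0–4, 5–9, 10–14, 15–19, 20–27;
- GAD-7: 0–4, 5–9, 10–14, 15–21.

Whether PD-specific cut-offs should be used instead is left open as a design decision. Function names are hypothetical.

```python
# Sketch: totals and conventional severity bands for PHQ-9 (9 items, 0-27)
# and GAD-7 (7 items, 0-21); every item is scored 0-3.

PHQ9_BANDS = [(4, "minimal"), (9, "mild"), (14, "moderate"),
              (19, "moderately severe"), (27, "severe")]
GAD7_BANDS = [(4, "minimal"), (9, "mild"), (14, "moderate"), (21, "severe")]


def score_instrument(items, n_items, bands):
    """Sum items scored 0-3 and return (total, severity band)."""
    if len(items) != n_items or any(i not in (0, 1, 2, 3) for i in items):
        raise ValueError(f"expected {n_items} items scored 0-3")
    total = sum(items)
    for upper, label in bands:   # bands are ordered by upper bound
        if total <= upper:
            return total, label
    raise ValueError("total exceeds instrument maximum")  # unreachable after validation


def phq9(items):
    return score_instrument(items, 9, PHQ9_BANDS)


def gad7(items):
    return score_instrument(items, 7, GAD7_BANDS)
```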

Carer-specific: PDQ-Carer and Zarit Burden Interview (12-item) are required given the disproportionate carer burden in PD.

PDQ-39 Oxford · EQ-5D-5L · PROMIS
Table columns: # · Domain · ID · Requirement · Field Name · Rating · Notes
PKG · Verily Study Watch · Empatica · Digital Biomarkers for PD
Digital & wearable biomarker streams
Digital biomarkers for PD are at varying stages of regulatory qualification. The Personal KinetiGraph (PKG) device generates FDA-cleared bradykinesia and dyskinesia scores from 6 days of continuous wrist wear. PPMI’s Verily Study Watch programme (2023 onward) provides high-resolution IMU data.

Three generative streams for synthetic data:
(1) Tabular clinical visit data: CTGAN/TabSyn.
(2) Summary wearable features: aggregated daily statistics, modelled as tabular data.
(3) Raw time series: aspirational; requires temporal generative models.
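Stream (2) amounts to collapsing epoch-level device output into one tabular row per patient-day, which a tabular generator can then model. The sketch below assumes a hypothetical epoch record with per-epoch bradykinesia and dyskinesia scores. The summary features (median and 75th percentile) are illustrative choices, not the PKG export format.

```python
# Sketch: collapse epoch-level wearable scores into one feature row per patient-day.
# The Epoch fields and feature names are hypothetical.
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import median, quantiles


@dataclass
class Epoch:
    patient_id: str
    start: datetime       # epoch start time
    bradykinesia: float   # hypothetical per-epoch score
    dyskinesia: float     # hypothetical per-epoch score


def daily_features(epochs):
    """Group epochs by (patient, calendar day) and emit one feature row per group."""
    groups = defaultdict(list)
    for e in epochs:
        groups[(e.patient_id, e.start.date())].append(e)

    rows = []
    for (pid, day), day_epochs in sorted(groups.items()):
        bk = [e.bradykinesia for e in day_epochs]
        dk = [e.dyskinesia for e in day_epochs]
        rows.append({
            "patient_id": pid,
            "date": day.isoformat(),
            "n_epochs": len(day_epochs),
            "bk_median": median(bk),
            "dk_median": median(dk),
            # quantiles() needs at least two points on older Python versions
            "bk_p75": quantiles(bk, n=4)[2] if len(bk) >= 2 else bk[0],
        })
    return rows
```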

ON/OFF state tagging is critical: tremor and bradykinesia readings cannot be interpreted without the medication state at the time of measurement. PKG v4+ provides automated ON/OFF state detection per 2-minute epoch.
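The dependency on medication state can be enforced in code. Tag each reading with the state interval that covers it, and exclude untagged readings from any summary. The sketch below assumes state intervals arrive as sorted, non-overlapping (start, end, state) tuples, whether device-detected or diary-reported. The function names and the UNKNOWN label are hypothetical.

```python
# Sketch: attach a medication state to each reading via interval lookup,
# then summarise by state while excluding readings with no recorded state.
from bisect import bisect_right
from datetime import datetime, timedelta
from statistics import mean


def tag_states(timestamps, intervals):
    """Return a state per timestamp; intervals are sorted (start, end, state), end exclusive."""
    starts = [start for start, _, _ in intervals]
    tags = []
    for t in timestamps:
        i = bisect_right(starts, t) - 1   # last interval starting at or before t
        if i >= 0 and t < intervals[i][1]:
            tags.append(intervals[i][2])
        else:
            tags.append("UNKNOWN")        # no state recorded: reading is uninterpretable
    return tags


def mean_by_state(values, tags):
    """Mean value per medication state, ignoring UNKNOWN readings."""
    by_state = {}
    for value, state in zip(values, tags):
        if state != "UNKNOWN":
            by_state.setdefault(state, []).append(value)
    return {state: mean(vs) for state, vs in by_state.items()}


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 8, 0)
    two_min = timedelta(minutes=2)
    intervals = [(t0, t0 + 2 * two_min, "OFF"), (t0 + 2 * two_min, t0 + 4 * two_min, "ON")]
    readings = [t0 + k * two_min for k in range(5)]
    print(tag_states(readings, intervals))
```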

PKG / Global Kinetics · PPMI Wearables
Table columns: # · Domain · ID · Requirement · Field Name · Rating · Notes
JPR@Research+Me · EJS ACT-PD · MRC CTU UCL · Newcastle / Plymouth
Trial enrolment & JPR registry interface
JPR@Research+Me (launched July 2025) is a national UK recruitment-matching registry built on the Research+Me platform. It is co-led by Newcastle University (Prof. Camille Carroll) and University Hospitals Plymouth NHS Trust. Its first study is EJS ACT-PD, running at 43 UK sites. The primary endpoint is the combined MDS-UPDRS Part II + Part III score, assessed in the practically defined OFF state after overnight dopaminergic withdrawal.
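As a derived trial variable, the combined endpoint adds the 13 Part II item scores (0–52) to the 33 Part III scores (0–132). The sketch below also gates the value on a practically defined OFF assessment. The 12-hour minimum since the last dopaminergic dose is an assumed threshold that should be taken from the trial protocol. Field names are hypothetical.

```python
# Sketch: combined MDS-UPDRS Part II + III score, returned only for an OFF assessment.
# Field names are hypothetical; OFF_MIN_HOURS is an assumed protocol threshold.
from dataclasses import dataclass
from datetime import datetime

PART_II_ITEMS = 13      # items 2.1-2.13, each scored 0-4 (range 0-52)
PART_III_SCORES = 33    # 33 scores from 18 items, each 0-4 (range 0-132)
OFF_MIN_HOURS = 12.0    # assumed minimum time since last dopaminergic dose


@dataclass
class UpdrsVisit:
    part_ii: list[int]
    part_iii: list[int]
    assessed_at: datetime
    last_dopaminergic_dose_at: datetime


def _checked_sum(scores, expected_len, label):
    if len(scores) != expected_len or any(s not in range(5) for s in scores):
        raise ValueError(f"{label}: expected {expected_len} scores in 0-4")
    return sum(scores)


def combined_ii_iii(v: UpdrsVisit) -> int:
    return (_checked_sum(v.part_ii, PART_II_ITEMS, "Part II")
            + _checked_sum(v.part_iii, PART_III_SCORES, "Part III"))


def is_practically_defined_off(v: UpdrsVisit) -> bool:
    hours = (v.assessed_at - v.last_dopaminergic_dose_at).total_seconds() / 3600
    return hours >= OFF_MIN_HOURS


def endpoint_value(v: UpdrsVisit):
    """Combined II+III score for a qualifying OFF assessment, otherwise None."""
    return combined_ii_iii(v) if is_practically_defined_off(v) else None
```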

Role of JPR in the synthetic dataset model: JPR is a recruitment-matching registry, not a deep-phenotyping registry. Its records serve as:
(1) a recruitment simulation layer;
(2) a linkage anchor for consent-based NHS data linkage;
(3) a target population for re-weighting toward the real UK PD population, which is older, more comorbid and less enriched than PPMI.
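Re-weighting toward the UK PD population can be sketched as post-stratification. Each record in the source cohort receives a weight equal to its stratum's share in the target population divided by that stratum's share in the sample. The strata and proportions in the example are hypothetical placeholders, not JPR or PPMI figures. A production design might rake over several variables instead of using a single one.

```python
# Sketch: post-stratification weights for one stratifying variable (e.g. age band).
from collections import Counter


def poststrat_weights(sample_strata, target_props):
    """Weight per row = target share of its stratum / sample share of that stratum."""
    counts = Counter(sample_strata)
    n = len(sample_strata)
    unknown = set(counts) - set(target_props)   # sample strata with no target share
    empty = set(target_props) - set(counts)     # target strata absent from the sample
    if unknown or empty:
        raise ValueError(f"strata mismatch: unknown={sorted(unknown)}, empty={sorted(empty)}")
    return [target_props[s] / (counts[s] / n) for s in sample_strata]
```

Take sample strata `["<65", "<65", "<65", "65+"]` and a hypothetical 50/50 target. The three under-65 rows get weight 2/3 and the 65+ row gets weight 2.0, so the weighted 65+ share becomes 0.5.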

JPR@Research+Me · MRC CTU at UCL
Table columns: # · Domain · ID · Requirement · Field Name · Rating · Notes
HL7 FHIR R4 · UK Core · LOINC · SNOMED CT
FHIR R4 resource mappings
PD data items are mapped to FHIR R4 resources using UK Core profiles. PROMs are modelled as QuestionnaireResponse resources linked to a Questionnaire definition. Clinical assessments use Observation with LOINC codes. Medications use MedicationStatement, with LEDD recorded as a derived Observation. DBS uses Procedure coded in SNOMED CT. Consent uses the UK Core Consent profile.
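Here is a minimal sketch of the PROM mapping: a PDQ-39 response serialised as a FHIR R4 QuestionnaireResponse. It uses core elements only (status, questionnaire, subject, authored, item.linkId, item.answer.valueInteger). The Questionnaire canonical URL and the linkId scheme are hypothetical. UK Core meta.profile, LOINC item codes and any extensions are omitted.

```python
# Sketch: minimal FHIR R4 QuestionnaireResponse for PDQ-39, built as a plain dict.
# The canonical URL and linkIds are hypothetical; no UK Core profile is asserted.
import json
from datetime import date

QUESTIONNAIRE_URL = "https://example.org/fhir/Questionnaire/pdq-39"  # hypothetical canonical


def pdq39_questionnaire_response(patient_id: str, authored: date, item_scores: list[int]) -> dict:
    """Build a QuestionnaireResponse for 39 PDQ-39 item scores (0-4)."""
    if len(item_scores) != 39 or any(s not in range(5) for s in item_scores):
        raise ValueError("expected 39 PDQ-39 item scores in 0-4")
    return {
        "resourceType": "QuestionnaireResponse",
        "status": "completed",
        "questionnaire": QUESTIONNAIRE_URL,
        "subject": {"reference": f"Patient/{patient_id}"},
        "authored": authored.isoformat(),  # FHIR dateTime allows a date-only value
        "item": [
            {"linkId": f"pdq39-{i}", "answer": [{"valueInteger": score}]}  # hypothetical linkIds
            for i, score in enumerate(item_scores, start=1)
        ],
    }


if __name__ == "__main__":
    qr = pdq39_questionnaire_response("example-patient", date(2025, 3, 1), [1] * 39)
    print(json.dumps(qr, indent=2)[:300])
```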
Table columns: # · Field · FHIR Resource · Path / Element · Terminology · UK Core Profile · Notes
OMOP CDM v5.4 · SNOMED CT · LOINC · RxNorm · OHDSI
OMOP CDM v5.4 domain mappings
PD diagnosis maps to CONDITION_OCCURRENCE (SNOMED 49049000 → concept 313414). MDS-UPDRS subscores map to MEASUREMENT. PROMs use the SURVEY_CONDUCT extension (CDM v5.4). Medications map to DRUG_EXPOSURE, with LEDD derived as a MEASUREMENT. DBS maps to PROCEDURE_OCCURRENCE. Death uses the DEATH table. Custom concept IDs use the PaLaDIn namespace.
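LEDD is a derived quantity on both the FHIR and OMOP sides. The sketch below computes a daily LEDD from (drug, total daily mg) pairs. It uses a subset of commonly cited conversion factors (Tomlinson et al., 2010), including an extra 0.33 × levodopa dose when entacapone is co-prescribed. Check the factors against whatever LEDD table the project adopts. The drug keys are hypothetical identifiers, not RxNorm or OMOP concept IDs.

```python
# Sketch: daily levodopa-equivalent dose (LEDD) from drug exposures.
# Factor subset after Tomlinson et al. (2010); verify before use. Keys are hypothetical.
LEDD_FACTORS = {
    "levodopa": 1.0,
    "pramipexole": 100.0,
    "ropinirole": 20.0,
    "rotigotine": 30.0,
    "rasagiline": 100.0,
    "amantadine": 1.0,
}
ENTACAPONE_FACTOR = 0.33   # added LEDD = total daily levodopa mg x 0.33


def daily_ledd(exposures, on_entacapone=False):
    """exposures: iterable of (drug, total_daily_mg); returns LEDD in mg/day."""
    total = 0.0
    levodopa_mg = 0.0
    for drug, mg in exposures:
        if drug not in LEDD_FACTORS:
            raise KeyError(f"no LEDD factor for {drug!r}")
        total += mg * LEDD_FACTORS[drug]
        if drug == "levodopa":
            levodopa_mg += mg
    if on_entacapone:
        total += levodopa_mg * ENTACAPONE_FACTOR
    return total
```

The resulting value would then be written as a MEASUREMENT row (OMOP) or a derived Observation (FHIR) under a project-defined concept.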
Table columns: # · Field · OMOP Table · Column · Domain / Concept · Notes
Domains A–K · Registry Architecture · Development Roadmap
Domain rationale & development considerations
Each domain represents a thematic cluster of data items in the PD synthetic dataset specification. Cards below summarise the clinical and scientific rationale for inclusion in a Parkinson’s research registry, and flag key considerations for further development. RED-rated items within a domain indicate known data gaps where targeted investment is required to reach full registry capability.
Promptly Synthetic Data Pipeline · CTGAN / TabSyn / Diffusion · PaLaDIn Architecture
Synthetic data: state of the art, frontiers, and domain roadmap
Synthetic data generation for Parkinson’s disease sits at the intersection of two converging advances. The first is the maturation of tabular generative models (CTGAN, TabSyn, TabDDPM). The second is the emergence of PD-specific, high-density longitudinal datasets (the PPMI 2022 WGS release and the PPMI Verily smartwatch programme).

The Promptly 6-stage pipeline runs from rule-based generation (Stage 1), through hybrid deep-learning generation and multi-layer validation (Stages 4 and 5), to iterative refinement (Stage 6). It is the operational framework deployed for TREAT-NMD (PaLaDIn programme) and is directly transferable to the PD FLDN.

The core limitations remain unchanged: synthetic data cannot create new biological truths, cannot overcome unmeasured confounders and cannot validate itself. Real-world data must therefore anchor every generation step. The domain sections below map the state of the art, frontiers and concrete next steps for each of the 11 data domains.

Key references: Miletic & Sariyar 2025 (systematic review of longitudinal synthetic health data) · medRxiv preprint, May 2025 (quantitative comparison of synthetic data generation methods on PPMI tabular data) · Li et al. 2025 (survey of diffusion models for tabular data) · Elvatun et al. 2025 (synthetic external control arms, PLOS Digital Health) · medRxiv preprint, May 2025 (evaluation of AI-generated Parkinson’s diaries).
Promptly 6-stage generation pipeline
Stage 1 · Rule-Based Generation (Low fidelity): Deterministic + stochastic rules. Fast, explainable, regulator-friendly.
Stage 2 · Clinical Panel Augmentation (Medium fidelity): Multidisciplinary panel encodes tacit clinical knowledge and care pathways.
Stage 3 · RWD Preparation (Conditioning): Bias tagging, missingness encoding, temporal alignment, PPRL linkage.
Stage 4 · Hybrid DL Generation (High fidelity): CTGAN / TabSyn / TabDDPM conditioned on RWD priors with hard clinical constraints.
Stage 5 · Multi-layer Validation (Validation): Statistical, ML utility, clinical panel face validity, patient panel burden realism.
Stage 6 · Iterative Refinement (Continuous): Validation feedback loop, versioned datasets, new RWD integration, SoC updates.
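Two of the checks named in Stages 4 and 5 can be sketched compactly:
- rejecting generated rows that violate hard clinical constraints;
- a two-sample Kolmogorov–Smirnov statistic comparing one marginal distribution between real and synthetic data.

The constraint set and field names are illustrative assumptions, not the pipeline's actual rule base. They are range checks on Hoehn & Yahr and MDS-UPDRS Part III, age at diagnosis not exceeding current age, and non-negative LEDD.

```python
# Sketch: (a) hard-constraint rejection of generated rows; (b) two-sample KS statistic
# for marginal fidelity. Constraints and field names are illustrative assumptions.
from bisect import bisect_right


def violates_constraints(row):
    """True if a generated row breaks any illustrative hard constraint."""
    return (
        not 0 <= row["hoehn_yahr"] <= 5
        or not 0 <= row["updrs_iii"] <= 132
        or row["age_at_diagnosis"] > row["age"]
        or row["ledd_mg"] < 0
    )


def filter_rows(rows):
    """Rejection step: keep only rows that satisfy every constraint."""
    return [r for r in rows if not violates_constraints(r)]


def ks_statistic(real, synthetic):
    """Two-sample KS statistic: max gap between the two empirical CDFs."""
    a, b = sorted(real), sorted(synthetic)
    gap = 0.0
    for x in a + b:                        # the supremum is attained at a data point
        fa = bisect_right(a, x) / len(a)   # share of real values <= x
        fb = bisect_right(b, x) / len(b)   # share of synthetic values <= x
        gap = max(gap, abs(fa - fb))
    return gap
```

In practice, `scipy.stats.ks_2samp` returns the same statistic together with a p-value. The hand-rolled version only shows what the fidelity layer measures.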
NICE QS164 · NICE NG71 · GIRFT Rec 15 · NNAG Optimal Pathway · UKPEN 2022 Audit
Care pathway conformance — standard vs actual patient experience
Domain L encodes the gap between the standardised Parkinson’s care pathway and the actual patient experience. Each field maps to a specific NICE quality statement (QS164), GIRFT recommendation, NNAG optimal pathway checkpoint, or UKPEN audit metric. The PATHWAY_CONFORMANCE_SCORE composite counts QS164 statements met per patient, enabling patient-level and site-level benchmarking analogous to SSNAP’s stroke care bundle scoring.
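The PATHWAY_CONFORMANCE_SCORE composite can be sketched as a count of QS164 statements met per patient, with a site-level mean for benchmarking. Statement keys are hypothetical. Statements recorded as not assessed (None) are not counted as met. That is one of several defensible choices; a proportion of applicable statements is another.

```python
# Sketch: patient-level count of QS164 statements met, and site-level mean score.
# Statement keys are hypothetical placeholders.
from collections import defaultdict
from statistics import mean


def conformance_score(statements):
    """Count statements met; None (not assessed) and False both score 0."""
    return sum(1 for met in statements.values() if met is True)


def site_means(patients):
    """Mean patient-level score per site."""
    by_site = defaultdict(list)
    for patient in patients:
        by_site[patient["site"]].append(conformance_score(patient["qs164"]))
    return {site: mean(scores) for site, scores in by_site.items()}
```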

Key GIRFT findings this domain addresses: 44% of district general hospital (DGH) sites lack a dedicated PD clinic · only 42% of inpatients received their medication on time (UKPEN 2022) · DBS referral pathways are shaped by historical arrangements rather than clinical criteria · the distribution of Parkinson’s disease nurse specialists (PDNS) has not been reviewed nationally · lack of joined-up care was the primary patient complaint.
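The inpatient medication-timing metric can be sketched as the share of scheduled doses administered within a tolerance window. The 30-minute window and the (scheduled, administered) tuple format are assumptions. The window should match the audit definition in use. Missed doses count as not on time.

```python
# Sketch: share of scheduled doses given within a tolerance window of the prescribed time.
from datetime import datetime, timedelta
from typing import Optional

WINDOW = timedelta(minutes=30)  # assumed tolerance; align with the audit definition


def on_time_fraction(doses: list[tuple[datetime, Optional[datetime]]],
                     window: timedelta = WINDOW) -> float:
    """Share of doses given within +/- window; missed doses (None) count as not on time."""
    if not doses:
        raise ValueError("no scheduled doses")
    on_time = sum(
        1 for scheduled, given in doses
        if given is not None and abs(given - scheduled) <= window
    )
    return on_time / len(doses)
```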
Table columns: # · Domain · ID · Standard / Requirement · Field Name · Rating · Notes & Benchmark