A comparison of synthetic data generation and federated analysis for enabling international evaluations of cardiovascular health
Sharing health data for research purposes across international jurisdictions has been a challenge due to privacy concerns. Two privacy enhancing technologies that can enable such sharing are synthetic data generation (SDG) and federated analysis, but their relative strengths and weaknesses have not been evaluated thus far. In this study we compared SDG with federated analysis to enable such international comparative studies.
The objective of the analysis was to assess country-level differences in the role of sex on cardiovascular health (CVH) using a pooled dataset of Canadian and Austrian individuals. The Canadian data was synthesized and sent to the Austrian team for analysis. The utility of the pooled (synthetic Canadian + real Austrian) dataset was evaluated by comparing the regression results from the two approaches. The privacy of the Canadian synthetic data was assessed using a membership disclosure test which showed an F1 score of 0.001, indicating low privacy risk. The outcome variable of interest was CVH, calculated through a modified CANHEART index. The main and interaction effect parameter estimates of the federated and pooled analyses were consistent and directionally the same.
It took approximately one month to set up the synthetic data generation platform and generate the synthetic data, whereas it took over 1.5 years to set up the federated analysis system. Synthetic data generation can be an efficient and effective tool for enabling multi-jurisdictional studies while addressing privacy concerns.
Zahra Azizi, Simon Lindner, Yumika Shiba, Valeria Raparelli, Colleen M. Norris, Karolina Kublickiene, Maria Trinidad Herrero, Alexandra Kautzky-Willer, Peter Klimek, Teresa Gisinger, Louise Pilote & Khaled El Emam, A comparison of synthetic data generation and federated analysis for enabling international evaluations of cardiovascular health, Scientific Reports volume 13, Article number: 11540 (2023)