CSH Talk by Dr. Khaled El Emam: “A Synthetic Data Augmentation Approach for Mitigating Covariate Bias in Health Data”

Mar 20, 2023 | 15:00–16:00


Professor Dr. Khaled El Emam will present a talk on Monday, March 20, 2023 at 3 PM in the Salon.


If you would like to attend the talk, please send an email to office@csh.ac.at.


Title: “A Synthetic Data Augmentation Approach for Mitigating Covariate Bias in Health Data”




Data bias (covariate imbalance) is common in real-world datasets, for example in sex, race, and age. In regression modeling, data bias produces imprecise predictions and inconsistent estimates. Our study evaluated a synthetic data generation technique, synthetic minority augmentation (SMA), as a means of mitigating the effects of bias on logistic regression models when the nature of the bias is known. SMA is compared with traditional rebalancing methods such as oversampling, undersampling, SMOTE, propensity score methods, and ensemble machine learning methods. At low to medium bias severity (a missing proportion below 50%), SMA produces the results with the least bias (the difference between the model estimate and the ground truth) and the highest confidence interval overlap of all the approaches. It also performs better than the other approaches on fairness metrics, and its predictive accuracy is comparable to theirs. In high-bias cases (e.g., a missing proportion above 80%), no single method performs consistently better than the others. For many practical bias scenarios, SMA can be a good method for mitigating the impact of data bias on regression model performance.




Dr. Khaled El Emam is the Canada Research Chair (Tier 1) in Medical AI at the University of Ottawa, where he is a Professor in the School of Epidemiology and Public Health. He is also a Senior Scientist at the Children’s Hospital of Eastern Ontario Research Institute and Director of the multi-disciplinary Electronic Health Information Laboratory, conducting research on privacy enhancing technologies to enable the sharing of health data for secondary purposes, including synthetic data generation and de-identification methods.


Khaled is a co-founder of Replica Analytics, a company that develops synthetic data generation technology, which was recently acquired by Aetion. As an entrepreneur, Khaled founded or co-founded six product and services companies involved with data management and data analytics, with some having successful exits. Prior to his academic roles, he was a Senior Research Officer at the National Research Council of Canada. He also served as the head of the Quantitative Methods Group at the Fraunhofer Institute in Kaiserslautern, Germany.


He participates in a number of committees: he is a member of the European Medicines Agency Technical Anonymization Group, the Panel on Research Ethics advising on the TCPS, and the Strategic Advisory Council of the Office of the Information and Privacy Commissioner of Ontario. He is also co-editor-in-chief of the JMIR AI journal.


In 2003 and 2004, he was ranked as the top systems and software engineering scholar worldwide by the Journal of Systems and Software, based on his research on measurement and on quality evaluation and improvement. He held the Canada Research Chair in Electronic Health Information at the University of Ottawa from 2005 to 2015. Khaled has a PhD from the Department of Electrical and Electronics Engineering, King’s College, University of London, England.




Peter Klimek
Simon Lindner


Complexity Science Hub Vienna
Josefstaedter Straße 39