The lecture by Kristina Lerman from the University of Southern California will take place at the Complexity Science Hub Vienna in room 201.
If you are interested in participating, please email office@csh.ac.at.
Abstract
Social data often comes from a heterogeneous population composed of non-randomly sampled subgroups, each with different characteristics and behaviors. A population-level trend in the aggregate data may disappear or reverse itself when the same data is disaggregated into its underlying subgroups. This effect, known as Simpson’s paradox, confounds analyses of social data, including inference of trends and causal effects.
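Here is a small Python sketch of the reversal, using hypothetical counts. The option names X and Y, the two subgroups, and all the numbers are invented for illustration and are not taken from the talk. Option X has the higher success rate within each subgroup. Option Y has the higher rate once the subgroups are pooled, because most of X's trials fall in the low-success subgroup.

```python
from fractions import Fraction

# Hypothetical (successes, trials) per subgroup for two options; illustrative only.
DATA = {
    "X": {"group1": (8, 10), "group2": (30, 100)},
    "Y": {"group1": (70, 100), "group2": (2, 10)},
}


def subgroup_rates(option):
    """Success rate of an option within each subgroup (disaggregated view)."""
    return {g: Fraction(s, n) for g, (s, n) in DATA[option].items()}


def aggregate_rate(option):
    """Success rate of an option after pooling all subgroups (aggregate view)."""
    successes = sum(s for s, _ in DATA[option].values())
    trials = sum(n for _, n in DATA[option].values())
    return Fraction(successes, trials)


def shows_simpsons_paradox(a, b):
    """True if `a` beats `b` in every subgroup but loses to `b` in aggregate."""
    ra, rb = subgroup_rates(a), subgroup_rates(b)
    wins_every_subgroup = all(ra[g] > rb[g] for g in ra)
    return wins_every_subgroup and aggregate_rate(a) < aggregate_rate(b)


if __name__ == "__main__":
    for option in DATA:
        rates = {g: float(r) for g, r in subgroup_rates(option).items()}
        print(option, rates, "aggregate:", round(float(aggregate_rate(option)), 3))
    print("Simpson's paradox:", shows_simpsons_paradox("X", "Y"))
```

Exact fractions avoid floating-point ties when comparing rates. The same comparison applies to any rate or trend that is measured once on pooled data and again within subgroups.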
I illustrate the problem with several examples showing how the paradox can distort the conclusions of an analysis, and describe recent algorithmic efforts to address it. Our algorithm systematically disaggregates data to identify subgroups whose behavior deviates significantly from the rest of the population. The method allows us to leverage Simpson's paradox to uncover interesting patterns in real-world social data, such as data from the Q&A site Stack Exchange and the online learning platforms Khan Academy and Duolingo.
About Kristina Lerman
Kristina Lerman is a Principal Scientist at the University of Southern California Information Sciences Institute and holds a joint appointment as a Research Associate Professor in the USC Computer Science Department. Trained as a physicist, she now applies network analysis and machine learning to problems in computational social science, including crowdsourcing and the analysis of social networks and social media. Her recent work on modeling and understanding cognitive biases in social networks has been covered by the Washington Post, the Wall Street Journal, and MIT Tech Review.