A new and better way to create word lists


Mar 13, 2023

Word lists are the basis of so much research in so many fields. Researchers at the Complexity Science Hub have now developed an algorithm that works across different languages and expands word lists significantly better than existing tools.

Many projects start with the creation of word lists, not only in companies, for example when mind maps are drawn up, but in all areas of research. Imagine you want to find out on which days people are in a particularly good mood by analyzing Twitter posts. Just looking for the word “happy” wouldn’t be enough.

Instead, you would have to use an algorithm that detects all tweets indicating that someone is happy. “So the first step is to create a list of all the words that indicate just that. The whole research stands or falls on doing so,” explains Anna Di Natale, a researcher at the Complexity Science Hub in Vienna. But how can one come up with word lists that are as accurate and complete as possible?
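
To make the idea concrete, here is a minimal Python sketch of word-list-based detection, assuming a simple word-matching approach (the article does not describe the study's own pipeline). The function `matches_lexicon` and the seed list `happy_words` are hypothetical names used only for illustration. The example also shows why a list containing only “happy” would miss a tweet that says “delighted”.

```python
import re

def matches_lexicon(text: str, lexicon: set[str]) -> bool:
    """Return True if any lowercase word token in `text` is in `lexicon`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(token in lexicon for token in tokens)

# Hypothetical seed list; in practice it would be expanded with a dedicated tool.
happy_words = {"happy", "joyful", "delighted", "glad"}

tweets = [
    "So happy it's finally spring!",
    "Absolutely delighted with the results",
    "Stuck in traffic again.",
]

# Keep only the tweets that contain at least one word from the list.
flagged = [t for t in tweets if matches_lexicon(t, happy_words)]
print(flagged)

# With "happy" alone, the second tweet would be missed.
print(matches_lexicon("Absolutely delighted with the results", {"happy"}))
```

The quality of the result depends entirely on the word list: every relevant word that is missing from it silently drops matching tweets.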

A problem that concerns many

This widespread problem concerns not only opinion researchers who want to find out how politicians’ statements are received by the public. Companies, too, use sentiment analysis to find out how their products are perceived.

To improve matters, Di Natale has developed a new method, called LEXpander, that outperforms previous algorithms, and does so in two different languages, German and English. Moreover, she has developed, for the very first time, a way to compare the different tools with one another.

Improved performance

Compared with four other tools for word list expansion (WordNet, Empath 2.0, FastText and GloVe), LEXpander performed significantly better, especially in German. For example, the researchers found that when expanding an English word list of words with a positive meaning, LEXpander guesses 43% of the words correctly. FastText, a very popular model, is right only 28% of the time.
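
The article does not spell out how “guesses right” is measured. One common protocol, assumed here rather than taken from the study, is to hide part of an existing word list, let a tool suggest new words, and report the share of suggestions that appear in the hidden part (precision). The function name `share_correct` is hypothetical, and the word lists below are invented toy data, not results from the study.

```python
def share_correct(suggested: list[str], held_out: set[str]) -> float:
    """Fraction of suggested words that appear in the held-out gold list."""
    if not suggested:
        return 0.0
    hits = sum(1 for word in suggested if word in held_out)
    return hits / len(suggested)

# Toy data, invented for illustration only.
held_out_positive = {"cheerful", "glad", "pleased", "content", "joyful"}
suggestions = ["glad", "pleased", "smile", "sunny"]

print(share_correct(suggestions, held_out_positive))  # 0.5
```

With these toy inputs, two of the four suggestions are in the held-out list, so the score is 0.5. The study itself may use different or additional metrics.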

Complexity Science Hub researcher Anna Di Natale finds a new and better way to create word lists

Independence from the language itself

The reason is that the tool works independently of any particular language. Instead of being based on a single language, it relies on a so-called colexification network. This established linguistic concept builds on homonymy and polysemy: single words that have two or more distinct meanings. For example, the ancient Greek word φάρμακον (pharmakon) can mean medicine or poison, two different things that are nonetheless thematically close. Other cases suggest no kinship at all, such as “bank”, which can mean a financial institution or the land alongside a river.

“If you collect them across many languages (and here we analyzed about 19 different languages), you can see connections between them,” Di Natale says. The network takes shape when the same colexifications occur in several languages from different language families, with each shared colexification creating a connection between the meanings involved.
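
The following Python sketch illustrates the general idea of a colexification network, assuming concepts as nodes and an edge between two concepts whenever a single word expresses both; each edge records the languages in which that happens. It is not LEXpander's implementation. The functions `build_colexification_network` and `expand`, the threshold `min_languages`, and the toy entries (apart from φάρμακον from the example above) are hypothetical placeholders.

```python
from collections import defaultdict
from itertools import combinations

def build_colexification_network(entries):
    """Map each concept pair to the set of languages that colexify it.

    `entries` is an iterable of (language, word, concepts) tuples, where
    `concepts` is the set of meanings that one word can express.
    """
    network = defaultdict(set)
    for language, _word, concepts in entries:
        # Every pair of meanings sharing one word becomes (or strengthens) an edge.
        for a, b in combinations(sorted(concepts), 2):
            network[(a, b)].add(language)
    return network

def expand(seed_concepts, network, min_languages=2):
    """Return concepts linked to a seed in at least `min_languages` languages."""
    expanded = set()
    for (a, b), languages in network.items():
        if len(languages) < min_languages:
            continue  # ignore links attested in too few languages
        if a in seed_concepts and b not in seed_concepts:
            expanded.add(b)
        elif b in seed_concepts and a not in seed_concepts:
            expanded.add(a)
    return expanded

# Toy entries: "lang_A", "lang_B", "w1"... are placeholders, not real data.
entries = [
    ("lang_A", "w1", {"HAPPY", "GLAD"}),
    ("lang_B", "w2", {"HAPPY", "GLAD"}),
    ("lang_B", "w3", {"HAPPY", "LUCKY"}),
    ("Ancient Greek", "φάρμακον", {"MEDICINE", "POISON"}),
]

network = build_colexification_network(entries)
print(sorted(expand({"HAPPY"}, network)))                   # ['GLAD']
print(sorted(expand({"HAPPY"}, network, min_languages=1)))  # ['GLAD', 'LUCKY']
```

Raising `min_languages` keeps only links confirmed by several languages, in line with the point that connections emerge when colexifications recur across languages; counting distinct language families instead of languages would be a straightforward variation.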

 

This independence from the language itself allows LEXpander to achieve better results in different languages. “There are many methods developed for English. They work very well and quickly and everyone uses them. Trying to apply them to other languages works, but not as well as it might work if you had started developing a method for German or Italian,” Di Natale explains. 

Especially important for new topics

For many topics, good word lists already exist. But for new topics, such as COVID when it first emerged, new lists have to be created. Until now, this was usually done by hand, brainstorming with colleagues and drawing on several tools for help, but there was no way to compare those tools. Anna Di Natale and her team have now made such a comparison possible and have also developed a new tool that performs better than the others. This could become an important cornerstone for many future research projects in various fields.

FIND OUT MORE

The study “LEXpander: Applying colexification networks to automated lexicon expansion” has been published in Behavior Research Methods.

