A new and better way to create word lists


Mar 13, 2023

Word lists are the basis of so much research in so many fields. Researchers at the Complexity Science Hub have now developed an algorithm that can be applied to different languages and can expand word lists significantly better than others.

Many projects start with the creation of word lists, not only in companies, where mind maps are drawn up, but also in all areas of research. Imagine you want to find out on which days people are in a particularly good mood by analyzing Twitter postings. Just looking for the word “happy” wouldn’t be enough.

Instead, you would have to use an algorithm that detects all tweets that indicate that someone is happy. “So the first step is to create a list of all the words that indicate just that. The whole research stands or falls on doing so,” explains Anna Di Natale, a researcher at the Complexity Science Hub in Vienna. But how do you come up with the most accurate and complete word lists possible?

A problem that concerns many

This widespread problem concerns not only opinion researchers who want to find out how politicians’ statements are received by the public. Companies, too, use sentiment analyses to find out how their products are perceived.

To improve things, Di Natale has now developed a new method, called LEXpander, that outperforms previous algorithms, and it does so in two different languages: German and English. Moreover, for the first time, she has developed a way to compare different word list expansion tools with one another.

Improved performance

In a comparison with four other algorithms for word list expansion (WordNet, Empath 2.0, FastText and GloVe), LEXpander performed significantly better, especially in German. For example, the researchers found that LEXpander gets 43% of words right when expanding an English word list for positive meaning. A very popular model, FastText, by comparison, is right only 28% of the time.
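The article does not spell out how these percentages are computed. One plausible reading, assumed here purely for illustration, is the share of suggested words that also appear in a trusted reference word list. The function name `share_correct` and the tiny word lists below are hypothetical and are not taken from the study.

```python
def share_correct(suggested, reference):
    """Fraction of suggested words that also appear in the reference word list."""
    suggested = set(suggested)
    if not suggested:
        return 0.0
    return len(suggested & set(reference)) / len(suggested)


# Invented example data: a reference list for "positive" words and the
# suggestions a hypothetical expansion tool returned for a seed list.
reference = {"happy", "glad", "joyful", "cheerful", "content"}
suggested = ["glad", "joyful", "table", "cheerful"]

score = share_correct(suggested, reference)
print(f"{score:.0%} of the suggested words are in the reference list")
```

Scoring every tool against the same reference list in this way is what makes their results directly comparable.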

Independence from the language itself

The reason is that the tool works independently of any particular language. It is based not on one language but on a so-called colexification network. This recognized linguistic concept rests on homonymy and polysemy: single words that have two or more distinct meanings. For example, the ancient Greek word φάρμακον (pharmakon) can mean medicine or poison: two different things, but thematically close. Other words carry meanings that don’t suggest any kinship, such as “bank” as a financial institution or the land alongside a river.

“If you collect them across many languages – and here we analyzed about 19 different languages – you can see connections between them,” Di Natale says. The network is formed when these colexifications occur in several languages across different language families, creating connections.
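The idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not LEXpander's actual procedure: the lexicon entries are invented placeholders, the names `build_colexification_network` and `expand` are hypothetical, and the threshold counts languages rather than language families for simplicity.

```python
from collections import defaultdict
from itertools import combinations

# Invented toy data: (language, word form, concepts the word form expresses).
TOY_LEXICON = [
    ("lang_a", "w1", {"HAPPY", "LUCKY"}),
    ("lang_b", "w2", {"HAPPY", "LUCKY"}),
    ("lang_c", "w3", {"HAPPY", "LUCKY", "BLESSED"}),
    ("lang_a", "w4", {"HAPPY", "CONTENT"}),
    ("lang_b", "w5", {"HAPPY", "CONTENT"}),
    ("lang_c", "w6", {"BANK_MONEY", "RIVERBANK"}),
]


def build_colexification_network(lexicon):
    """Map each concept pair to the set of languages where one word covers both."""
    languages_per_edge = defaultdict(set)
    for language, _word, concepts in lexicon:
        for a, b in combinations(sorted(concepts), 2):
            languages_per_edge[(a, b)].add(language)
    return languages_per_edge


def expand(seed, network, min_languages=2):
    """Add every concept linked to a seed concept in at least `min_languages` languages."""
    expanded = set(seed)
    for (a, b), languages in network.items():
        if len(languages) >= min_languages:
            if a in seed:
                expanded.add(b)
            if b in seed:
                expanded.add(a)
    return expanded


network = build_colexification_network(TOY_LEXICON)
expanded = expand({"HAPPY"}, network)
print(sorted(expanded))
```

In this toy network, a link seen in only one language (such as the two senses of “bank”) stays below the threshold and is ignored, while links that recur across languages are used to grow the seed list.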

 

This independence from the language itself allows LEXpander to achieve better results in different languages. “There are many methods developed for English. They work very well and quickly and everyone uses them. Trying to apply them to other languages works, but not as well as it might work if you had started developing a method for German or Italian,” Di Natale explains. 

Especially important for new topics

For many topics there are already good word lists. But for new topics, such as when COVID came up, new ones have to be created. Until now, they were usually created by hand, by brainstorming with colleagues and with the help of several tools, but there was no way to compare those tools. Anna Di Natale and her team have now made that comparison possible and have also developed a new tool that performs better than the others. This can be an important cornerstone for many future research projects in various fields.

FIND OUT MORE

The study “LEXpander: Applying colexification networks to automated lexicon expansion” has been published in Behavior Research Methods.

