A systematic evaluation of text mining methods for short texts: Mapping individuals' internal states from online posts
Source
Behavior Research Methods, 56, 4, (2024), pp. 2782-2803
Publication type
Article / Letter to editor
Organization
SW OZ RSCR SOC
Journal title
Behavior Research Methods
Volume
vol. 56
Issue
iss. 4
Languages used
English (eng)
Page start
p. 2782
Page end
p. 2803
Subject
Inequality, cohesion and modernization; Ongelijkheid, cohesie en modernisering
Abstract
Short texts generated by individuals in online environments can provide social and behavioral scientists with rich insights into these individuals' internal states. Trained manual coders can reliably interpret expressions of such internal states in text. However, manual coding imposes restrictions on the number of texts that can be analyzed, limiting our ability to extract insights from large-scale textual data. We evaluate the performance of several automatic text analysis methods in approximating trained human coders' evaluations across four coding tasks encompassing expressions of motives, norms, emotions, and stances. Our findings suggest that commonly used dictionaries, although performing well in identifying infrequent categories, generate false positives too frequently compared to other methods. We show that large language models trained on manually coded data yield the highest performance across all case studies. However, there are also instances where simpler methods show almost equal performance. Additionally, we evaluate the effectiveness of cutting-edge generative language models like GPT-4 in coding texts for internal states with the help of short instructions (so-called zero-shot classification). While promising, these models fall short of the performance of models trained on manually analyzed data. We discuss the strengths and weaknesses of various models and explore the trade-offs between model complexity and performance in different applications. Our work informs social and behavioral scientists of the challenges associated with text mining of large textual datasets, while providing best-practice recommendations.
This item appears in the following Collection(s)
- Academic publications [245263]
- Electronic publications [132514]
- Faculty of Social Sciences [30345]
- Open Access publications [106157]