Multi-method annotation of management theses and data validation
Abstract
Our project investigates gendered club effects in the composition of French PhD committees in management sciences. Using data from Theses.fr enriched with IdRef, we construct a panel of theses defended between 2011 and 2023 that links doctoral candidates, supervisors, and jurors. The dataset includes inferred gender, academic age, institutional affiliation, and jury participation history. In this seminar, I will present a question we asked ourselves: "Are the clubs of researchers that we identify in our work driven by gender-related or topic-related effects?" To answer it, we will walk through the methodology design, its implementation and, most importantly, the hours of struggle it took to produce a simple dataset of automatically annotated texts.
About this workshop
The aim of this workshop is to promote technical and practical exchanges between researchers who use NLP methods. Participants are encouraged to go into the details of their code (R/Python), share tips, and discover new methods and models.
Periodicity: Thursdays from 12h15 to 13h30, by videoconference.