Skill distance and job transitions of unemployed workers after a training program
Abstract
Machine learning techniques apply to relations between objects (e.g. job offers) or categories of objects (e.g. occupations). Social sciences often investigate category-to-category relations (e.g. is one occupation more prestigious than another?), whereas supervised learning applies mostly to object-to-category relations (e.g. which occupation is this offer about?). In this talk, I show how the latter can contribute to the former. First, I present the methodology of Frick et al. (2025) for obtaining a meaningful representation of occupations in terms of skills from a corpus of job offers. Predicting the occupation of an offer serves as a pretext task to learn simultaneously a position for each occupation in a 20-dimensional space and a function mapping job offers to that space. Second, I discuss the quantitative and qualitative validation of their representation. The resulting distance between occupations generalizes better than available alternatives based on structured fields. Third, I investigate the mathematical properties of their methodology. I study the types of category-to-category relations and metrics that can be rationalized with this representation. I also relate the cosine distances between categories learned through the pretext task to the misclassification errors in the training dataset.
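To make the geometry concrete, here is a minimal Python sketch of the two operations the abstract refers to: scoring an offer against occupation positions, and measuring the cosine distance between two occupations. It is an illustration under stated assumptions, not the implementation of Frick et al. (2025): the occupation names, the toy 3-dimensional vectors (the talk uses 20 dimensions), and the dot-product scoring rule are all hypothetical.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of u and v."""
    return 1.0 - dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def predict_occupation(offer_vector, occupation_vectors):
    """Pick the occupation whose position scores highest against the offer.

    Assumption: the classifier scores by dot product, so the argmax of a
    softmax over these scores is the occupation with the largest dot product.
    """
    return max(occupation_vectors,
               key=lambda name: dot(offer_vector, occupation_vectors[name]))

# Hypothetical occupation positions (toy 3-dimensional space).
occupation_vectors = {
    "nurse": [1.0, 0.0, 0.0],
    "caregiver": [0.8, 0.6, 0.0],
    "welder": [0.0, 0.0, 1.0],
}

# Hypothetical output of the function mapping a job offer to the same space.
offer_vector = [0.9, 0.1, 0.0]

predicted = predict_occupation(offer_vector, occupation_vectors)
close = cosine_distance(occupation_vectors["nurse"], occupation_vectors["caregiver"])
far = cosine_distance(occupation_vectors["nurse"], occupation_vectors["welder"])
```

In this toy setup, occupations whose positions point in similar directions have a small cosine distance, and an offer located between them is the kind of case where a misclassification between the two would be expected.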
About this workshop
The aim of this workshop is to promote technical and practical exchanges between researchers who use NLP methods. Participants are encouraged to go into the details of their code (R/Python), share tips, and discover new methods and models.
Schedule: Thursdays from 12:15 to 13:30, by videoconference.