Past Session
Monday, July 7, 2025
17:30h
Presented by
Prashant Garg (Imperial College Business School)
https://www.prashantgarg.org/

Retrieving and Generating Data using LLMs

Abstract

Large language models (LLMs) can structure big data that is either available or unavailable to researchers. I distinguish two core approaches: retrieval and generation. Retrieval uses LLMs to extract structured information from extensive document corpora provided by the user, while generation taps directly into the model's internal knowledge to produce structured outputs from minimal inputs. I will present papers that use these methods separately: (i) document-based retrieval techniques for extracting key data points and causal relationships from long and messy documents (e.g., www.causal.claims), and (ii) generative methods to construct granular production networks (e.g., aipnet.io) and comprehensive keyword dictionaries for political text classification (e.g., www.academicexpression.online). Additionally, I'll cover principles I've encountered regarding the LLM pipeline to effectively scale and adapt these tools to a wide set of research opportunities.

About this workshop

The Public Governance workshop is an online seminar series focused on state-of-the-art research in political economy that uses non-traditional data and data-intensive methods.

The workshop provides a platform for research on the role of governance in designing and developing better policies. Key themes include the political environment, the role of the media, the engagement of stakeholders such as civil society and firms, market structure and the level of competition, and the independence of public regulators. Particular emphasis is placed on research using NLP methods, given the proven usefulness of transforming text into data for further econometric analysis.

Periodicity: Mondays from 17h30 to 19h.