Research
Working Papers
- Content vs. Form: What Drives the Writing Score Gap Across Socioeconomic Backgrounds? A Generated Panel Approach with P. Pertusi arXiv Abstract: Students from different socioeconomic backgrounds exhibit persistent gaps in test scores, gaps that can translate into unequal educational and labor-market outcomes later in life. In many assessments, performance reflects not only what students know but also how effectively they can communicate that knowledge. This distinction is especially salient in writing assessments, where scores jointly reward the substance of students’ ideas and the way those ideas are expressed. As a result, observed score gaps may conflate differences in underlying content with differences in expressive skill. A central question, therefore, is how much of the socioeconomic-status (SES) gap in scores is driven by differences in what students say versus how they say it. We study this question using a large corpus of persuasive essays written by U.S. middle- and high-school students. We introduce a new measurement strategy that separates content from style by leveraging large language models to generate multiple stylistic variants of each essay. These rewrites preserve the underlying arguments while systematically altering surface expression, creating a “generated panel” that introduces controlled within-essay variation in style. This approach allows us to decompose SES gaps in writing scores into contributions from content and style. We find an SES gap of 0.67 points on a 1–6 scale. Approximately 69% of the gap is attributable to differences in essay content quality, style differences account for 26%, and differences in evaluation standards across SES groups account for the remaining 5%. These patterns appear stable across demographic subgroups and writing tasks.
More broadly, our approach shows how large language models can be used to generate controlled variation in observational data, enabling researchers to isolate and quantify the contributions of otherwise entangled factors.
- Linear Regression in a Nonlinear World PDF arXiv Abstract: The interpretation of coefficients from multivariate linear regression relies on the assumption that the conditional expectation function is linear in the variables. However, in many cases the underlying data-generating process is nonlinear. This paper examines how to interpret regression coefficients under nonlinearity. We show that if the relationships between the variable of interest and other covariates are linear, then the coefficient on the variable of interest represents a weighted average of the derivatives of the outcome conditional expectation function with respect to the variable of interest. If these relationships are nonlinear, the regression coefficient becomes biased relative to this weighted average. We show that this bias is interpretable and analogous to the measurement-error and omitted-variable biases of the standard linear model.
- Polarization by Design: How Elites Could Shape Mass Preferences as AI Reduces Persuasion Costs arXiv Abstract: In democracies, major policy decisions typically require some form of majority or consensus, so elites must secure mass support to govern. Historically, elites could shape support only through limited instruments like schooling and mass media; advances in AI-driven persuasion sharply reduce the cost and increase the precision of shaping public opinion, making the distribution of preferences itself an object of deliberate design. We develop a dynamic model in which elites choose how much to reshape the distribution of policy preferences, subject to persuasion costs and a majority rule constraint. With a single elite, any optimal intervention tends to push society toward more polarized opinion profiles (a “polarization pull”), and improvements in persuasion technology accelerate this drift. When two opposed elites alternate in power, the same technology also creates incentives to park society in “semi-lock” regions where opinions are more cohesive and harder for a rival to overturn, so advances in persuasion can either heighten or dampen polarization depending on the environment. Taken together, cheaper persuasion technologies recast polarization as a strategic instrument of governance rather than a purely emergent social byproduct, with important implications for democratic stability as AI capabilities advance.
- The (Short-Term) Effects of Large Language Models on Unemployment and Earnings with D. Chen, C. Kane, A. Kozlowski, and J. A. Evans arXiv Abstract: Large language models (LLMs) have spread rapidly since the release of ChatGPT in late 2022, accompanied by claims of major productivity gains but also by concerns about job displacement. This paper examines the short-run labor market effects of LLM adoption by comparing earnings and unemployment across occupations with differing levels of exposure to these technologies. Using a synthetic difference-in-differences approach, we estimate the impact of LLM exposure on earnings and unemployment. Our findings show that workers in highly exposed occupations experienced earnings increases following ChatGPT’s introduction, while unemployment rates remained unchanged. These results suggest that initial labor market adjustments to LLMs operate primarily through earnings rather than worker reallocation.
- Biased AI improves human decision-making but reduces trust with S. Lai, J. Kim, Y. Potter, and J. Evans arXiv Abstract: Current AI systems minimize risk by enforcing ideological neutrality, yet this may introduce automation bias by suppressing cognitive engagement in human decision-making. We conducted randomized trials with 2,500 participants to test whether culturally biased AI enhances human decision-making. Participants interacted with politically diverse GPT-4o variants on information evaluation tasks. Partisan AI assistants enhanced human performance, increased engagement, and reduced evaluative bias compared to non-biased counterparts, with amplified benefits when participants encountered opposing views. These gains carried a trust penalty: participants underappreciated biased AI and overcredited neutral systems. Exposing participants to two AIs whose biases flanked human perspectives closed the perception-performance gap. These findings complicate conventional wisdom about AI neutrality, suggesting that strategic integration of diverse cultural biases may foster improved and resilient human decision-making.
- Measuring (a Sufficient) World Model in LLMs: A Variance Decomposition Framework with J. A. Evans arXiv Abstract: Understanding whether large language models (LLMs) possess a world model (a structured understanding of the world that supports generalization beyond surface-level patterns) is central to assessing their reliability, especially in high-stakes applications. We propose a formal framework for evaluating whether an LLM exhibits a sufficiently robust world model, defined as producing consistent outputs across semantically equivalent prompts while distinguishing between prompts that express different intents. To measure this, we introduce a new evaluation approach that decomposes model response variability into three components: variability due to user purpose, user articulation, and model instability. An LLM with a strong world model should attribute most of the variability in its responses to changes in foundational purpose rather than to superficial changes in articulation. This approach allows us to quantify how much of a model’s behavior is semantically grounded rather than driven by model instability or alternative wording. We apply this framework to evaluate LLMs across diverse domains. Our results show that larger models attribute a greater share of output variability to changes in user purpose, indicating a more robust world model. This improvement is not uniform, however: larger models do not consistently outperform smaller ones across all domains, and their advantage in robustness is often modest. These findings highlight the importance of moving beyond accuracy-based benchmarks toward semantic diagnostics that more directly assess the structure and stability of a model’s internal understanding of the world.
- Introspective Growth: Automatically Advancing LLM Expertise in Technology Judgment with S. Wu, H. Bao, and J. A. Evans arXiv Abstract: Large language models (LLMs) are rapidly becoming core tools for science, engineering, and innovation. Their promise lies not just in remembering facts, but in putting knowledge to work. Despite their impressive ability to answer increasingly difficult questions, it remains unclear whether LLMs truly use their knowledge when confronted with new and challenging tasks. We address this question with a patent classification task that requires deep conceptual understanding: distinguishing objectively different but semantically similar patents. For this task, we introduce a challenging new benchmark of 1.3 million post-2015 computer-science patent pairs, characterized by dense technical jargon and strategically complex writing. We find that LLMs often fail our benchmark and struggle to distinguish among semantically similar patents. To probe this failure, we introduce a novel framework that decomposes model errors into two sources: missing and unused knowledge. Our approach asks models to generate clarifying questions to improve their understanding, and then compares three settings: raw performance, self-answered questions, and externally supplied answers. This decomposition reveals that LLMs often possess the relevant knowledge internally but fail to deploy it, while a smaller share of errors arises from genuine knowledge gaps. We then ask whether the ability of models to construct a task-specific database of questions and answers differs across models. We find that smaller models generate simpler, broadly transferable questions, while larger models propose more complex but less generalizable ones. This suggests new strategies for combining strengths across models. Our findings highlight a critical limitation of current LLMs and their evaluation: models often know more than they can use.
LLM evaluation should shift from recall of static facts to application of dynamic knowledge.
- Bridging the Gap: Information, Returns and Choices PDF Abstract: How much of the gap in choices across social groups is driven by differences in returns, and how much by differences in the ability to predict these returns? To address this question, we employ a decomposition exercise, based on a structural model, to quantify the roles of information quality and differences in returns in driving this gap, focusing on the college attendance decisions of White and Hispanic high school students in Texas. We find that the average monetary returns from college are almost zero for Hispanics, whereas they are high for Whites. We then estimate the extent to which differences in returns and information quality contribute to the gap in choices. Our findings indicate that differences in information quality across the two groups help mitigate the choice gap, whereas differences in returns drive the gap. Finally, we use our model to show that achieving parity in choice between the two groups would require policymakers to provide highly accurate additional information, which could explain between 24% and 49% of the variance in post-college earnings.
- It’s Not Who You Are, It’s What They Know: Wage Gaps and Informational Frictions PDF Abstract: Can informational asymmetries among firms account for all observed wage gaps across social groups? We show that they can, using a parsimonious common-value auction model of the labor market with unspecified information structures. Firms with identical characteristics encounter workers with unobserved productivity and extend wage offers based on their information about worker productivity and competing offers. Using 2010 American Community Survey data, we show that wage disparities among both Black and White men and women can be explained using a common productivity distribution for all social groups and differences in what firms know, if the mean of this common productivity distribution ranges between $48,000 and $132,800. Our results emphasize the importance of understanding what firms know in shaping wage distributions and explaining wage disparities.
- Uncovering Latent Types in Sequential Choice Data Using Text Embedding Algorithm PDF Abstract: In economic analyses of agents making a series of discrete choices, deciding what constitutes an alternative is crucial. This paper introduces a technique for categorizing similar alternatives in contexts where forward-looking agents make a series of decisions. The proposed method groups options that are equivalent from the perspective of the agents, using the word2vec algorithm (Mikolov et al., 2013b, Mikolov et al., 2013a) from the Natural Language Processing literature. The paper discusses the link between the word2vec method and the underlying dynamic optimization problem of the agent.
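The three-way split of response variability described in "Measuring (a Sufficient) World Model in LLMs" can be illustrated with a minimal sketch. It assumes that each model response has already been mapped to a scalar score, that the design is balanced (every purpose has the same number of articulations and every prompt is queried the same number of times), and that the split follows the law of total variance. The array layout, the function name `decompose_variance`, and the simulated effect sizes are hypothetical choices for illustration, not the paper's estimator.

```python
import numpy as np


def decompose_variance(y):
    """Split the variance of scalar responses y[p, a, r] into three shares.

    Hypothetical layout (an illustrative assumption):
      p indexes user purposes, a indexes articulations (paraphrases) of
      each purpose, and r indexes repeated queries of the same prompt.
    The split follows the law of total variance and is exact only for a
    balanced design (same number of articulations and repeats everywhere).
    """
    y = np.asarray(y, dtype=float)
    grand_mean = y.mean()
    purpose_means = y.mean(axis=(1, 2))  # shape (P,)
    prompt_means = y.mean(axis=2)        # shape (P, A)

    # Variability driven by what the user wants.
    purpose = np.mean((purpose_means - grand_mean) ** 2)
    # Variability across paraphrases of the same purpose.
    articulation = np.mean((prompt_means - purpose_means[:, None]) ** 2)
    # Variability across repeated queries of an identical prompt.
    instability = np.mean((y - prompt_means[:, :, None]) ** 2)

    total = purpose + articulation + instability  # equals y.var() here
    return {
        "purpose": purpose / total,
        "articulation": articulation / total,
        "instability": instability / total,
    }


# Usage on simulated scores (all effect sizes are made up for illustration).
rng = np.random.default_rng(0)
P, A, R = 4, 5, 3
scores = (
    rng.normal(0.0, 2.0, size=(P, 1, 1))    # purpose effects
    + rng.normal(0.0, 0.5, size=(P, A, 1))  # wording effects
    + rng.normal(0.0, 0.3, size=(P, A, R))  # run-to-run noise
)
print(decompose_variance(scores))
```

In the abstract's terms, a model with a stronger world model would place a larger share on "purpose" and smaller shares on "articulation" and "instability".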
Work in Progress
- Quantifying Uncertainty over the Lifecycle Slides Abstract: We examine the welfare implications of income uncertainty, specifically its differential impact across social groups. Using a new lifecycle metric for uncertainty costs, we compare utility outcomes from the expected and optimal consumption profiles under certainty. To perform this analysis, we use a new approach based on a generative AI model (a normalizing flow) to estimate and simulate future consumption and income trajectories. Using comprehensive household survey data from India, we find small but persistent disparities in uncertainty costs across different castes, under the assumption of homogeneous utility functions. The results suggest that, in the absence of preference heterogeneity, income can be adequately mapped to welfare without accounting for uncertainty.
- Integrating Minority Perspectives: An Analysis of Women’s Induction into the Field of Economics
- Can complementarity explain path dependence in innovation? Evidence from the secondary market of patents
Publications
- On the Interpretation of the Intergenerational Elasticity and the Rank-Rank Coefficients for Cross Country Comparison. Economics Letters (2024) PDF Proofs Abstract: This paper investigates Intergenerational Elasticity (IGE) and Rank-Rank coefficients, employing Yitzhaki’s theorem (Yitzhaki, 1996) to express them as weighted averages of underlying causal mechanisms driving mobility. We highlight the challenges of interpreting cross-country comparisons using IGE or Rank-Rank coefficients due to the regression weighting scheme. We also show that, while the Rank-Rank coefficient is more interpretable for positional mobility, it offers little insight into the underlying mechanisms driving mobility across countries. The analysis demonstrates potential drawbacks of using linear regression coefficients as summary statistics in the context of intergenerational mobility comparisons.
- Network-Mediated Knowledge Spillovers in ICT/Information Security. Review of Network Economics (2021) with Neil Gandal and Lee Branstetter Link Abstract: A large literature has used patent data to measure knowledge spillovers across inventions, but few papers have explicitly measured the impact of the collaboration networks formed by inventors on the quality of invention. This paper develops a method to measure the impact of collaboration networks of inventors on invention quality. We apply this methodology to the information and communication technology (ICT) and information security sectors in Israel and find that the quality of Israeli inventions is systematically linked to the structure of the collaborative network in these sectors.
- The High-Tech Sector, Chapter 17, The Israeli Economy, 1995–2017: Light and Shadow in a Market Economy. with Neil Gandal and Stefania Gandal Link
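The Economics Letters paper above relies on Yitzhaki's (1996) theorem, and the "Linear Regression in a Nonlinear World" abstract uses the same weighted-average-of-derivatives reading of a regression coefficient. In the single-regressor case, the theorem says the OLS slope of Y on X equals the integral of w(t) times the derivative of E[Y | X = t], with nonnegative weights w(t) = (E[X | X >= t] - E[X]) * P(X >= t) / Var(X) that integrate to 1. The minimal sketch below checks this numerically on simulated data. The data-generating process, sample size, grid, and helper names (`yitzhaki_weights`, `trapezoid`) are illustrative assumptions; the sketch does not reproduce either paper's multivariate results or estimators.

```python
import numpy as np


def yitzhaki_weights(x, grid):
    """Empirical Yitzhaki (1996) weights evaluated at each point t in `grid`.

    w(t) = (E[X | X >= t] - E[X]) * P(X >= t) / Var(X)
         = Cov(1{X >= t}, X) / Var(X),
    which is nonnegative and integrates to 1 over the support of X.
    """
    x = np.asarray(x, dtype=float)
    xs = np.sort(x)
    dev = xs - x.mean()
    # tail[k] = sum of dev[k:], with tail[n] = 0 for points above the maximum.
    tail = np.append(np.cumsum(dev[::-1])[::-1], 0.0)
    # Index of the first sorted observation with xs >= t.
    first_above = np.searchsorted(xs, grid, side="left")
    return tail[first_above] / x.size / x.var()


def trapezoid(f, t):
    """Trapezoid-rule integral of sampled values f over the grid t."""
    return float(np.sum((f[1:] + f[:-1]) * np.diff(t)) / 2)


def cef(t):
    """Hypothetical nonlinear conditional expectation E[Y | X = t]."""
    return np.log1p(t) + 0.5 * t**2


def cef_prime(t):
    """Derivative of the hypothetical conditional expectation."""
    return 1.0 / (1.0 + t) + t


# Illustrative data-generating process (an assumption, not from the papers).
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=50_000)
y = cef(x) + rng.normal(scale=1.0, size=x.size)

ols_slope = np.mean((x - x.mean()) * (y - y.mean())) / x.var()

grid = np.linspace(x.min(), x.max(), 4_000)
w = yitzhaki_weights(x, grid)
weighted_avg_derivative = trapezoid(w * cef_prime(grid), grid)

print(f"weights integrate to      {trapezoid(w, grid):.3f}")
print(f"OLS slope                 {ols_slope:.3f}")
print(f"weighted avg of dE[Y|X]/dX {weighted_avg_derivative:.3f}")
```

Because the weights depend on the distribution of X, two samples with the same derivative function but different regressor distributions can yield different slopes, which is the cross-country comparison concern raised in the abstract.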