Disagreements, Fast And Slow

Henry Farrell explains how I’m mistaken — and I’m loving it

J. Nathan Matias
8 min read · Apr 6, 2024

A few weeks ago, I critiqued an article that tried to explain online toxicity. I’m delighted to share that Henry Farrell, one of the article’s authors, has published a post explaining where I’m mistaken and also where there are legitimately open questions.

Marble jar photograph CC-BY by Darrenn Tunnicliff

The most extraordinary part of this exchange is that Henry was generous and kind enough to disagree with me on the Internet. I’ll admit that one of the great disappointments of my career as a professor has been how uncommon it is for people to constructively disagree. People on the Internet sometimes threaten my life for studying gender-based harassment. Grumpy professors will send me (or my editors) nitpicks for suggesting that universities in democracies might need to serve all the nation’s people. But substantive disagreements that yield new insights? They are very rare in my worlds.

If I had to hazard a guess, I would blame the pandemic, burnout from the replication crisis in science, and the intense pressure faced by precarious scholars to maximize papers and citation counts. It’s simply too risky and exhausting to have a disagreement in 2024, especially when scholarly differences can escalate into something bigger with a real-time global audience. But the world of ideas is much poorer for it.

I also appreciate Henry’s considered response because I was writing fast and misunderstood a few things (you can read the post here). At a high level, I was questioning two things about the short article: the use of the concept of toxicity, and the specific way that it was modeled. I think more clarity there is possible now.

Toxicity

I think Henry and Cosma probably stepped unawares into the toxicity conversation, given that this is one of the areas they’re planning to adjust in the future. For those who aren’t aware, computer scientists really love benchmarks. If someone gives us a way to quantify something, we as a field have a history of running with it and trying to optimize it without subjecting the benchmark to the critical scrutiny of science (even if there are many exceptions).

Consider for example the case of Microsoft’s “Sparks of Artificial General Intelligence” paper from 2023 (version 1), which cites race science in the eugenics tradition in the opening paragraph’s definition of intelligence. The paper’s authors adjusted the article a month later after people pointed out the discredited science behind their definition of intelligence. In the kindest interpretation of this situation, fourteen of the world’s leading AI researchers were so uninformed about the history of eugenics in the study of intelligence that they mistakenly borrowed whatever they saw on Google for their paper’s introduction without looking twice.

I tend to believe this generous interpretation of the Microsoft story. After all, I get emails from AI researchers just days before a deadline asking me for references on complex social and ethical issues that take months or years to understand — so they can plug in a paragraph to acknowledge the issue after the project is over. I saw this all the time with my work on gender inference as a graduate student. I spent years writing about the ethics and privacy risks of automated gender inference, and then watched so many technologists copy/paste my code without any apparent attention to the risks I had so carefully written about.

Something similar may be happening with the concept of “Toxicity.” Even though this word has no scientific meaning and its pragmatic meaning is deliberately ambiguous, multiple companies have created automated tools to measure so-called “toxicity.” Many busy scientists are using these tools for research, even though the measures have no agreed-upon meaning, are constantly changing, and cannot be reproduced. The authors of a recent peer-reviewed article explain their decision as one of convenience:

The efficacy and constraints of current machine-learning-based automated toxicity detection systems have recently been debated [11, 35]. Despite these discussions, automated systems are still the most practical means for large-scale analyses.
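In practice, that convenience looks something like the sketch below. Everything in it is an assumption made for illustration rather than a detail from their study: the keyword heuristic stands in for a vendor’s hosted classifier, and the model version string and the 0.7 cutoff are invented. The point is that the score and the threshold are opaque, shifting choices that quietly shape every downstream finding.

```python
# Illustrative sketch only: score_toxicity is a stand-in for an opaque,
# hosted "toxicity" classifier. The keyword heuristic, the model_version
# string, and the 0.7 threshold are all assumptions for illustration.
from typing import Dict, List

def score_toxicity(comment: str, model_version: str = "2024-03") -> float:
    """Return a score in [0, 1], as hosted classifiers typically do.

    Real services compute this from an unpublished model that changes
    with each version (hence the otherwise-unused model_version argument),
    which is part of why such measures are hard to reproduce or interpret.
    """
    flagged = {"idiot", "hate", "stupid"}  # toy word list, not a real model
    words = [w.strip(".,!?").lower() for w in comment.split()]
    return min(1.0, 5 * sum(w in flagged for w in words) / max(len(words), 1))

def label_comments(comments: List[str], threshold: float = 0.7) -> Dict[str, bool]:
    # The cutoff is a research-team choice with no agreed-upon meaning;
    # moving it (or updating the model) moves every downstream result.
    return {c: score_toxicity(c) >= threshold for c in comments}

print(label_comments(["You make a fair point.", "You absolute idiot."]))
```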

I get it — one of the hardest challenges in science involves being clear about what we can’t currently know — even if that high-demand knowledge could benefit society. That’s a hard situation for scientists, and even harder for a field like computing with (some) founding figures who argued against reliable science as a goal in favor of inventing things and solving problems.

The problem of technologists running after flawed definitions is a longstanding challenge, described in Philip Agre’s seminal 1997 paper on artificial intelligence, “Toward a Critical Technical Practice.” Agre found that the more he questioned this pattern, the less people thought of him as a computer scientist — a hypothesis consistent with more recent large-scale meta-scientific research. This pattern is what makes caring about the science so exhausting: even if you convince thousands of people to think differently, there will always be others who move forward quickly in an unconsidered way, growing their status with fast papers that rack up a lot of citations. I don’t have an answer to it.

So I’m glad to see that Henry and Cosma are rethinking their use of the term “toxicity,” since they may not have been aware of this wider pattern in computing or the growing critical mass of concern about vague definitions of toxicity. I would hate to see them spend the next decade regretting how some technologists treated their conversation-provoking article as an API to borrow without further consideration.

Models

Henry is right that I didn’t fully understand the models in their paper, and I’m grateful that he took the time to further explain. We all agree that simple models are important tools for thinking. Where I misunderstood their study was in the specification of the outcome variable. I’m still not sure I fully understand, but I’m going to try to summarize.

Henry and Cosma are setting up a counterfactual — trying to imagine what the world would look like without the kind of “engagement maximizing” systems that power Facebook’s news feed, advertising platforms, and even the technologies used by advocacy organizations (which I have studied with the Upworthy Archive).

I think one of the reasons they turn to simulation is that they’re interested in something that empirical social scientists would struggle to reliably measure — what they call “rationalizations.” These rationalizations are informational and psychological, and they are also tied to the structure of arguments, of discourse, and of groups. In their model, groups form around rationalizations as people adopt them and also use them to sort themselves.

In my first several reads of the paper, I thought Henry and Cosma were interested in the average size of groups that form through the process they are simulating. I think that’s because their simulation explanation concludes with the statement that “Under plausible assumptions, previously dominant interfaces (such as search) can explain people’s separation into self-reinforcing bubbles.”

But on a closer read and after reading Henry’s follow-up post, I think they’re arguing about the distribution in the size and structure of the groups — the mix of large and small groups that result. They’re not arguing that the stones in the jar are too small; they’re arguing that the mix of large and small stones makes it impossible to pour without breaking. And as Henry explains, it’s not just size — it’s structure: they believe that interactions between this mix of small and large groups enables conflicts to escalate.
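To make that distinction concrete, here is a toy sketch, emphatically not their model: agents adopt one of many rationalizations with probability proportional to how many others already hold it, a simple rich-get-richer rule I am assuming purely for illustration. The mean group size that comes out is unremarkable, while the mix it summarizes is skewed toward a few large groups and a long tail of tiny ones, and that mix is the kind of thing the argument turns on.

```python
# Toy sketch only, not Farrell and Shalizi's model. The agent count, the
# number of rationalizations, and the rich-get-richer adoption rule are
# assumptions chosen to show how a mean can hide a skewed mix of sizes.
import random
from collections import Counter

def simulate_sorting(n_agents: int = 1000, n_rationalizations: int = 50,
                     seed: int = 0) -> Counter:
    """Each agent adopts a rationalization with probability proportional
    to how many agents already hold it, then stays in that group."""
    rng = random.Random(seed)
    counts = Counter({r: 1 for r in range(n_rationalizations)})  # seed groups
    for _ in range(n_agents):
        pick = rng.choices(list(counts), weights=list(counts.values()))[0]
        counts[pick] += 1
    return counts

groups = simulate_sorting()
sizes = sorted(groups.values(), reverse=True)
print(f"mean group size: {sum(sizes) / len(sizes):.1f}")  # the average alone
print("five largest groups: ", sizes[:5])                 # vs. the actual mix
print("five smallest groups:", sizes[-5:])
```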

Overall, I think the main source of my misunderstanding may be that Henry and Cosma are using simulation as a clever way to call a bluff: to critique a techno-determinist argument about the role of technology in society, they are using simulations — which could be described as a form of probabilistic determinism. Reading Henry’s follow-up explanation of the work of the philosopher Cailin O’Connor, I think I fell for the trap of working too hard to (incompletely) understand their simulation and too little to understand the argument they hoped their simulation would open the door to.

Easter Sunday on the road to the Finger Lakes National Forest

Democracy

Last Sunday as I rode my bicycle near the Finger Lakes National Forest and tried not to get too distracted by the work I’m doing to support faculty of color under threat, I listened to an essay by Marilynne Robinson about what it means to believe in democracy. I found myself transfixed by a Walt Whitman quote she shared in the preface (from his work Democratic Vistas (1871)):

[Democracy] is a great word, whose history, I suppose, remains unwritten, because that history has yet to be enacted.

In Protestant Christianity, we talk often about things that are “already, but not yet.” And so I was grateful to Robinson (one of the great Protestant writers about democracy) for the reminder that democracy is also one of those things — a truth that Americans of color have painfully and hopefully borne and continue to bear. Because democracy is still an incomplete idea, it’s hard to write about, hard to simulate, and especially hard to observe empirically in lived experience. Yet because democracy matters to the survival, flourishing, and dignity of our world’s people, we need creativity in how we imagine and enact this elusive word.

So overall I do agree with Henry in his follow-up post that one of the most important questions is to arrive at “an understanding of democracy that is both (a) more just and egalitarian, and (b) stable against urgent threats, which do include polarization.” And I look forward to an ongoing conversation that I know I will learn from, whether or not it continues along this track.

Disagreements, Fast and Slow

The best part of this exchange has been the opportunity to have this conversation — and to have it in public.

We live in a time when academics face tremendous pressures that disincentivize thoughtful, graceful disagreement. The last fifteen years of social media have given us few opportunities or incentives to practice the art of being slow and thoughtful online, and patterns of online moral outrage do not reward considered conversation. And even without engagement-maximizing social media, I’ll admit that as a junior scholar, I feel tremendous pressures to publish more papers and rack up citations. Just writing this post, I can hear imaginary versions of my mentors (whether or not they would actually say this) whispering “why are you doing this instead of getting out that next paper?”

For now, my answer is that the calling of professors is to contribute to human understanding and steward it well for the generations to come. That only works when at least some of us are willing to model that endeavor in public, especially when we disagree — with the faith that other people will engage out of an interest in us as fellow humans and our shared fate.

At a time when that feels rare, I’m grateful to Henry for this generous gift.
