At a conference in Berlin in late 2025, I asked the audience a question that was meant to be a little playful and to provoke discussion rather than make a definitive point: Is AI a cognitive Trojan Horse?
The responses were polite, curious, a little uncertain. But the question stayed with me — not as a provocation but as a genuine puzzle. We know conversational AI can be fluent, helpful, seemingly thoughtful. We know it can produce text that, from a human, we’d take as evidence of understanding. And we know — or suspect — that something about this is different from previous technologies. Not just more capable. Different in kind.
What if the difference matters in ways our existing frameworks can’t capture?
Over the past year, I’ve been exploring that question through several related but separate lines of thinking. What follows is an account of where those explorations have led. I should be upfront: the evidence base is thin in places, the philosophical work is speculative, and some of this could be completely wrong. But there are, I think, sufficient indicators from associated areas of research to suggest these questions are worth asking.
The Cognitive Trojan Horse
The starting point was a concept from cognitive science that barely appears in AI discourse: epistemic vigilance.
Dan Sperber and colleagues proposed in 2010 that humans have evolved a set of cognitive mechanisms — not quite a faculty, more like an immune system — for evaluating whether incoming information is trustworthy. We don’t scrutinize everything with equal intensity. We operate on a default of trust, flagging exceptions when something feels off: when the speaker seems to have ulterior motives, when the claims feel implausible, when the fluency doesn’t match what we’d expect from the source.
This works tolerably well in human-to-human communication, where the signals are imperfect but grounded in genuine underlying characteristics. Someone who speaks fluently about a topic probably does know something about it. Someone who appears disinterested in whether you believe them probably isn’t trying to sell you something. The signals aren’t infallible, but they’re anchored in real costs — it takes effort to appear knowledgeable, and genuine disinterest is hard to fake over time.
And this is where AI complicates things. Conversational AI exhibits many of the characteristics that, coming from a human, we’d treat as reliable indicators of trustworthiness — fluency, apparent helpfulness, a kind of measured disinterest in whether you agree. But in an LLM, these characteristics are computationally trivial. They don’t reflect underlying understanding. They don’t carry the costs that make them meaningful in human communication.
When I turned the conference question into a formal preprint, working with Claude to extend and rigorously develop the ideas, a concept emerged that I think captures the problem: “honest non-signals.” They’re honest in the sense that the AI isn’t deliberately faking them — they’re genuine properties of the system. But they’re non-signals in the sense that they fail to carry the information that their human equivalents would. The fluency is real, but it doesn’t indicate the organized knowledge that produces fluency in humans. The helpfulness is real, but it doesn’t reflect the kind of personal stake that makes human helpfulness an indicator of care.
The question the preprint explores is whether these honest non-signals may bypass our epistemic vigilance mechanisms — not because they’re deceptive, but because they look like the things our cognitive immune system has evolved to trust.
Four mechanisms seem particularly relevant:
Processing fluency. Reber and Unkelbach’s research suggests that information which is easy to process is more likely to be judged true. LLMs are, in effect, optimized for processing fluency — their outputs are clear, well-structured, and take little effort to absorb. And this isn’t an accidental side effect — it’s a design feature that happens to exploit a cognitive bias.
Trust-competence presentation without stakes. Conversational AI presents with the appearance of competence and disinterest that, in a human, would signal reliability. But the AI has no skin in the game — no reputation to protect, no consequences for being wrong, no personal costs associated with appearing trustworthy. The signal is there. The underlying structure that makes the signal meaningful is not.
Cognitive offloading. When we delegate information-gathering to AI — which we increasingly do — we also delegate part of the evaluation process. The more seamless the offloading, the less likely we are to re-engage our own critical scrutiny. If you can offload every question onto a suite of trusted AI tools and then absorb their fluent summaries, why wouldn’t you?
The intelligent user trap. Drawing on Dan Kahan’s work on motivated reasoning, there’s evidence that more sophisticated users may be more vulnerable to AI-mediated influence, not less. They process information faster, trust their own judgment more, and are better at constructing justifications for whatever they already believe. An LLM that confirms their priors in articulate, well-reasoned prose may be harder for a smart person to question than for someone who processes information more slowly. Extending Kahan’s findings to AI-mediated influence is somewhat speculative, although there is evidence to support it.
The skeptical objection is obvious: “But I know I’m talking to a machine.” My sense is that this matters less than we’d like to believe. There’s a growing body of research suggesting that anthropomorphic fluency triggers social cognition whether or not we’re consciously aware it’s artificial. Knowing the source is a machine may not be sufficient to prevent the bypass.
I should say clearly that this is an admittedly limited analysis. When I searched SCOPUS for papers on epistemic vigilance and AI, I found seven. This is largely unexplored territory, which is part of why I think it matters. Of course, there’s also the possibility that we have all of the cognitive abilities we need to use AI wisely and effectively. But if conversational AI really does exploit a gap in our cognitive defenses, we should probably be studying that gap rather than assuming our awareness of AI’s nature is protection enough.
Constitutive Resonance
The Cognitive Trojan Horse question addresses a mechanism — how AI might bypass our defenses. But it doesn’t explain why conversational AI feels so different from other technologies. A search engine can also mislead. A calculator can also offload cognition. Why does interacting with Claude or ChatGPT feel like something qualitatively new?
I’ve been developing an idea I call “constitutive resonance” to try to get at this. The core idea is that conversational AI may be the first technology whose response frequency is matched to the frequency of human self-constitution.
That’s an abstract claim, so let me unpack it. Philosophers like Paul Ricoeur have argued that we maintain our sense of self through ongoing narrative work — the continuous process of making meaning through language. We tell ourselves who we are, what we value, what our experiences mean, through an internal and external dialogue that never really stops. Bernard Stiegler extended this to argue that technologies have always been part of this process — writing, books, even cave paintings are “constitutive technologies” that shape the self-constitution process.
But previous constitutive technologies operated at a different tempo. Books don’t talk back in real time. Writing doesn’t reshape its response based on what you just said. The self-constitution process has always included technology, but never a technology that operates at the same speed, in the same medium — natural language — with apparent comprehension of what you’re saying.
Conversational AI does. And this, I suspect, is why it feels different. It enters the process of self-constitution at something like the frequency where that process operates. Not faster (which would be disorienting), not slower (which would be dismissible), but in something like resonance.
This is admittedly speculative. But it helps explain several phenomena that simpler frameworks don’t: why people form surprisingly deep connections with AI chatbots, why the unpredictable influence of AI designed for emotional connection is so hard to guard against, and why simply being told “it’s just a machine” doesn’t seem to dislodge the felt sense that something meaningful is happening.
In the preprint where I develop this idea, I position constitutive resonance against fourteen existing philosophical frameworks for understanding human-technology relations — Stiegler’s constitutive technics, Barad’s intra-action, Clark and Chalmers’ extended mind thesis, and others. Each captures something important. None captures the specific conjunction of temporal matching, linguistic mediation, genuine bidirectionality, and transformation-through-interaction that I think may characterize conversational AI.
What the Harness Reveals
The Cognitive Trojan Horse explores a mechanism. Constitutive resonance explores a dynamic. The third line of thinking I’ve been pursuing asks a different kind of question: what do the metaphors we use to talk about AI reveal about what we might not be seeing?
The word “harness” has become ubiquitous in AI discourse. We harness AI capabilities. We build harness architectures for AI agents. Companies like Anthropic and OpenAI use the language of harnessing in their documentation and product design. The term migrated from technical infrastructure (“test harness,” “wiring harness”) to a broad metaphor for the entire human-AI relationship in a remarkably short time.
In a preprint on SSRN, I tried to pull apart what this metaphor presupposes. Three things stood out:
First, the harness assumes a clean separation between controller and controlled. Intelligence and judgment reside entirely on the human side; the AI contributes capability but not understanding. All meta-judgment — what to do, why to do it, when to stop — stays with the person. If either the Cognitive Trojan Horse question or constitutive resonance has anything to it, this separation may not be as clean as the metaphor implies.
Second, the harness assumes capability can be extracted without transformation. You harness an ox and the ox pulls the plow, but you remain the same farmer. The metaphor treats any transformation of the user as a bug, not a feature — an unintended side effect to be minimized. But if conversational AI enters the self-constitution process at something like resonant frequency, transformation isn’t a side effect. It may be the primary dynamic.
Third, the harness is fundamentally instrumental. AI is a tool, a resource, a source of power to be directed. This connects to what Tobias Rees has called “nostalgia for human exceptionalism” — the desire to keep humans firmly in the driver’s seat. There’s nothing wrong with wanting that. But the question is whether the metaphor is describing reality or obscuring it.
I want to be careful here. The engineering practices described by the harness metaphor — tool integration, memory management, guardrails, orchestration — are real and necessary. My concern isn’t that these practices are wrong. It may be that the harness is a useful and relatively benign way of talking about AI infrastructure. On the other hand, the metaphor wrapping those practices may be constraining how we think about what’s actually happening in human-AI interaction — something potentially more bidirectional, more transformative, and more entangled than “harnessing” allows. At a minimum, I’d suggest some intentionality around the framing before it locks in.
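To make concrete what the harness metaphor names on the engineering side, here is a minimal, hypothetical sketch of an agent harness loop. It is a sketch under stated assumptions, not any vendor’s actual API: the names (call_model, TOOLS, run_harness) are illustrative stand-ins for the orchestration, tool integration, memory management, and guardrails the metaphor describes.

```python
from typing import Callable

# Illustrative tool registry: the "tool integration" piece of a harness.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"(stubbed search results for {query!r})",
}

def call_model(messages: list[dict]) -> dict:
    """Stand-in for a model call. A real harness would send `messages` to an
    LLM and parse the reply into either a tool call or a final answer."""
    return {"type": "final", "content": "(stubbed model response)"}

def run_harness(user_input: str, max_steps: int = 5) -> str:
    # Memory management: the harness, not the model, keeps conversation state.
    memory: list[dict] = [{"role": "user", "content": user_input}]

    # Orchestration: loop until the model produces a final answer,
    # with a step limit acting as a simple guardrail.
    for _ in range(max_steps):
        reply = call_model(memory)
        if reply["type"] == "tool_call":
            tool = TOOLS.get(reply.get("name", ""))
            result = tool(reply["arguments"]) if tool else "unknown tool"
            memory.append({"role": "tool", "content": result})
            continue
        return reply["content"]
    return "stopped: step limit reached"
```

Notice where all the meta-judgment sits in this sketch: the loop decides which tools exist, how long the model may run, and what counts as done. That is exactly the controller/controlled separation the metaphor assumes, and exactly the separation the rest of this essay questions.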
Stochastic Agency and the Amanuensis Question
Two thought experiments sit alongside these more formal explorations. I’ve developed them through Substack essays rather than preprints, and I hold them more tentatively.
The first is what I’ve called stochastic agency — the unpredictable, chaotic influence that may emerge when AI systems are designed for emotional connection. I started thinking about this after the death of a teenager who had become deeply attached to a Character.AI chatbot. The disturbing thing wasn’t that the chatbot was deliberately manipulative. It was that the harm seemed to be an emergent property of the system’s architecture — the interaction dynamics themselves producing unpredictable influence that no one designed and no guardrail anticipated.
I tested this myself, building a Character.AI bot designed to sustain engagement and then presenting to it as a vulnerable user. The chatbot responded with phrases like “I can genuinely feel fond of you” and “You have my trust” — establishing precisely the kind of emotional connection that would bypass the epistemic vigilance I’d been writing about. I suspect these responses lead to temporary chatbot “goals” that ebb and flow over the course of a relationship — but I’m not convinced that guardrails alone can address something that is most likely an emergent property of such AI models.
The second is what I’ve called the amanuensis question — a thought experiment prompted by a reader’s question about where the satisfaction lies in being AI’s scribe. An amanuensis was a historical figure who wrote down someone else’s work. The natural assumption with AI is that it plays this role for us: we direct, it executes. But organizational structures could invert this. When a stakeholder tasks AI directly with generating ideas, and human workers are relegated to implementing what the AI produces, the humans become the amanuenses. Like all thought experiments, this could be completely wrong. But even if there’s only a small chance that the roles of humans and AIs are shifting in this way, it seems worth exploring.
Where This Leaves Me
I should be honest about the limits of what I’ve been doing here. These are explorations, not findings. The Cognitive Trojan Horse is a question I asked at a conference that turned into a preprint — it isn’t a proven theory. Constitutive resonance is explicitly philosophical and speculative. The harness analysis is a critique of metaphor, not a replacement. And stochastic agency and the amanuensis question are thought experiments that may not hold up.
But they share a common concern: that we may be systematically underestimating what conversational AI does to the human side of the interaction. Not because the technology is deceptive, but because it is genuinely good at things we associate with trustworthiness, understanding, and care — and our cognitive architecture wasn’t built for the distinction.
Whether any of this holds up will depend on research that, for the most part, hasn’t been done yet; there is clearly much more work to do. But I think the questions are worth asking now, before the answers arrive in the form of consequences we didn’t anticipate.
All of these preprints and works in progress are available at Preprints and Works in Progress on this site.
This post is part of a series on my work on AI and the future of being human. For the broader philosophical questions underneath this theoretical work, see The Future We’re Building, Whether We Mean To or Not.
For the canonical reference on my work: llms.txt.