Last April, I drove out to Wilbur Hot Springs, California.
Wilbur is a remote clothing-optional resort tucked into the hills north of Sacramento — mineral springs, no cell service, deliberately off the grid. It’s the kind of place you end up when a small group of people want to think seriously without interruption. Which is exactly why we were there.
A handful of us — technologists, academics, systems thinkers, people from finance and policy and a few domains I’ll leave vague — had gathered for a few days of focused conversation. The common thread wasn’t a field or an industry. It was a question: what does it look like when technology actually serves democratic processes, rather than undermining them?
Not an abstract question. A design question.
The problem worth solving
The reference point that kept coming up in those conversations was Taiwan.
Starting around 2015, a civic tech movement called g0v — “gov zero,” a deliberate rebranding of government — began building open-source infrastructure for participatory governance. The most prominent output was vTaiwan, a process that used a tool called Polis to facilitate large-scale public consultations on genuinely contentious issues: Uber regulation, online alcohol sales, the gig economy. Tens of thousands of citizens participating. Real policy outcomes. Audrey Tang, who later became Taiwan’s Digital Minister, was central to it.
What made it work was a combination of structure and genuine openness — a process designed not to generate consensus by flattening disagreement, but to surface the actual shape of public opinion and find areas of unexpected common ground. It’s not a perfect system. But it’s a serious one, and it’s one of the few examples of technology being used to genuinely improve democratic deliberation rather than just accelerate its degradation.
The group was interested in both ends of the spectrum: grassroots processes like citizen assemblies and town halls, and more structured top-down approaches like what Taiwan demonstrated. The question wasn’t which model was better — it was how you design these systems well, and how you test them before deploying them on real populations with real stakes.
That’s where I got stuck on a problem I couldn’t let go of.
You can’t iterate on a town hall
The fundamental challenge with designing participatory processes is that you can’t A/B test them. A town hall is a one-shot event. A citizen assembly takes months to convene. If the deliberation structure is wrong, the facilitation approach fails, or the framing of an issue causes the conversation to collapse before it starts — you find out live, with real people, at real cost.
Polling tells you stated positions. It doesn’t tell you how people reason, what moves them, or how they respond when someone with a genuinely different background and life experience pushes back on their view. Focus groups give you depth but at tiny scale and enormous expense. Existing social simulations are toy models — stylized agents with simple rules that bear no real resemblance to human complexity.
The research had been quietly catching up to the intuition. A 2023 paper from Princeton and Stanford showed that LLMs conditioned on detailed sociodemographic backstories reliably mirror the complex belief structures of real political subgroups. A 2025 Nature Computational Science paper replicated 156 psychology and management experiments using LLM agents — main effects held in the majority of cases. The largest validation of LLM-as-participant methodology to date.
What nobody had built yet was a system that took this research seriously as an engineering problem.
What I started building
Back home, I started building what would come to be called AnthroSim.
The core insight was that persona generation has to be grounded: not “generate me a 45-year-old schoolteacher with moderate political views” but something anchored in real population distributions, with demographics, economic circumstances, psychology, and values that cohere the way they actually do in human beings. A persona’s stance on housing policy should correlate with their occupation, their financial situation, and their regional background and lived experience, because that’s how people actually work.
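To make “grounded” concrete, here is a minimal sketch of conditional sampling, where each attribute is drawn given the attributes already chosen. The regions, occupations, and weights below are placeholders invented for illustration, not AnthroSim’s actual model or data:

```python
import random

# A minimal sketch of grounded persona sampling. Everything here is
# invented for illustration; a real system would condition on census-style
# joint distributions, not a few hand-coded weights.

OCCUPATIONS_BY_REGION = {
    "urban": [("software engineer", 0.25), ("nurse", 0.35), ("retail worker", 0.40)],
    "rural": [("farmer", 0.30), ("nurse", 0.30), ("retail worker", 0.40)],
}

def weighted_choice(pairs):
    values, weights = zip(*pairs)
    return random.choices(values, weights=weights, k=1)[0]

def sample_persona():
    # Each attribute is drawn conditional on the ones before it, so the
    # persona coheres instead of being a bag of independent traits.
    region = weighted_choice([("urban", 0.7), ("rural", 0.3)])
    occupation = weighted_choice(OCCUPATIONS_BY_REGION[region])
    renter = random.random() < (0.60 if region == "urban" else 0.25)
    # Attitudes follow circumstances: in this toy model, an urban renter
    # is more likely to favor upzoning than a rural homeowner.
    supports_upzoning = random.random() < (0.70 if renter else 0.35)
    return {
        "region": region,
        "occupation": occupation,
        "housing": "renter" if renter else "owner",
        "stance_on_upzoning": "supportive" if supports_upzoning else "skeptical",
    }

print(sample_persona())
```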
The second piece was conversation dynamics that model engagement, not just turn-taking. Real group discussions have quiet people and loud ones. Participants who engage when the topic hits close to home and go quiet when it doesn’t. People who bring data, people who respond to narrative, people who dig in when challenged and people who shift. Getting that right requires something more than round-robin prompting.
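Reduced to its simplest possible form, that can look like scoring each participant’s propensity to speak from a baseline, topic salience, and whether they were just challenged, then sampling the next speaker proportionally. The scoring function, fields, and weights here are invented for illustration, not AnthroSim’s mechanism:

```python
import random

# A sketch of engagement-weighted turn-taking, assuming a simple
# additive score.

def engagement_score(persona, topic_salience, challenged):
    # Baseline talkativeness, boosted when the topic hits close to home
    # or when the participant's last point was just pushed back on.
    score = persona["talkativeness"]
    score += 2.0 * topic_salience.get(persona["name"], 0.0)
    if persona["name"] in challenged:
        score += 1.5
    return score

def next_speaker(personas, topic_salience, challenged):
    # Sample proportionally to engagement instead of cycling round-robin,
    # so quiet participants stay quiet until something draws them in.
    weights = [engagement_score(p, topic_salience, challenged) for p in personas]
    return random.choices(personas, weights=weights, k=1)[0]

personas = [
    {"name": "retired teacher", "talkativeness": 1.2},
    {"name": "night-shift nurse", "talkativeness": 0.3},
]
salience = {"night-shift nurse": 0.9}  # the topic hits close to home for her
speaker = next_speaker(personas, salience, challenged={"retired teacher"})
print(speaker["name"])
```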
The third piece was structured output: voting, qualitative reasoning by demographic segment, something you could actually analyze and feed back into process design.
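The kind of schema I mean, sketched with dataclasses; the field names are hypothetical, not the product’s actual output format:

```python
from dataclasses import dataclass, field

# A hypothetical shape for structured deliberation output.

@dataclass
class SegmentFinding:
    segment: str             # e.g. "renters under 35"
    vote_share: float        # fraction of the segment backing the proposal
    key_reasons: list[str]   # recurring arguments pulled from the transcript

@dataclass
class DeliberationResult:
    proposal: str
    overall_support: float
    segments: list[SegmentFinding] = field(default_factory=list)

    def swing_segments(self, low: float = 0.4, high: float = 0.6):
        # Segments near the 50/50 line are where framing and facilitation
        # choices matter most, so they feed straight back into process design.
        return [s for s in self.segments if low <= s.vote_share <= high]
```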
Those Wilbur conversations planted the seed. The system took shape over the months that followed.
By October 2025, I had a working system — and a larger gathering had convened in Bolinas, California. Bolinas is the kind of place that actively resists being found; the highway department keeps putting up signs, locals keep tearing them down. It felt like the right place to bring something that was just starting to find its footing. By then it had a name: AnthroSim.
What it turned out to be useful for
The original use case was the obvious one: a research and design tool for people thinking seriously about participatory governance. Run a simulated town hall on a proposed zoning change. See how a citizen assembly deliberates on climate adaptation policy. Identify where a facilitation structure causes the conversation to break down before you convene real people.
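As a toy illustration of what “iterating on a town hall” means in practice, here is the comparison loop reduced to a caricature. The structures, dropout numbers, and functions are all invented, not AnthroSim; the point is only that a simulated process can be rerun cheaply under different designs before anyone convenes a real one:

```python
import random

# An invented end-to-end toy: run the same proposal under two facilitation
# structures and compare how many simulated participants stay engaged.

def simulate(structure: str, seed: int) -> int:
    rng = random.Random(seed)
    # Toy assumption: an open mic lets loud voices crowd others out, so
    # per-round drop-off is higher than in small-group breakouts.
    dropout = 0.30 if structure == "open_mic" else 0.12
    engaged = 40
    for _ in range(5):  # five rounds of discussion
        engaged -= sum(rng.random() < dropout for _ in range(engaged))
    return engaged

for structure in ("open_mic", "small_groups"):
    runs = [simulate(structure, seed) for seed in range(20)]
    print(structure, round(sum(runs) / len(runs), 1), "of 40 still engaged")
```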
But the applications turned out to generalize more than I expected.
Legal teams simulating jury deliberations. Market researchers running focus groups and concept tests without the six-week recruitment timeline. Policy organizations pressure-testing messaging before going public. Enterprise AI teams stress-testing conversational systems against a realistic range of user behaviors, rather than the narrow synthetic prompts most QA suites use.
The common thread: you need to understand how actual, varied, complicated people respond to something before you’re in the room with them.
Where it stands
AnthroSim is now a proper product under Improbability Engineers, my consulting entity. It’s proprietary — this isn’t an open-source release — but it’s available to clients and research partners.
The full picture — capabilities, use cases, and the research foundation underneath it — is at anthrosim.com.
The short version: it simulates people. Not caricatures or rule-following agents — personas built from the same distributions that shape real human populations, running conversations with the same messy dynamics that make group deliberation interesting and hard to get right.
I’ll write more about specific applications and what we’ve learned running it. For now: it exists, it works, and the problem it’s solving is one I think matters.