How well can AI and humans work together? Scientists are turning to Dungeons & Dragons to find out

2026-02-05 14:45

D&D is being used as a benchmark to see how well models can make long-term plans, adhere to rules and strategize with a team.

By Alan Bradley, published 5 February 2026

Vancouver, Canada - January 15, 2012: A hobgoblin archer from the Wizards of the Coast tabletop Dungeons and Dragons game, posed on a rocky background. (Image credit: BrendanHunter/Getty Images)

Artificial intelligence (AI) models have been playing the popular tabletop role-playing game Dungeons & Dragons (D&D) so that researchers can test their ability to create long-term strategies and collaborate with both other AI systems and human players.

In a study presented at the NeurIPS 2025 conference, which ran from Dec. 2 to Dec. 7 in San Diego, researchers said D&D is an optimal test bed thanks to the game's unique blend of creativity and rigid rules.

For the experiments, a single model could assume the role of the Dungeon Master (DM), the individual who creates the story and plays the role of the monsters, as well as a hero (each scenario had one DM and four heroes). In the framework built for the study, called D&D Agents, models can also play alongside other large language models (LLMs), or human players can fill any or all of the roles themselves. For instance, an LLM could assume the role of the DM while two LLMs and two human players play the heroes.

"Dungeons & Dragons is a natural testing ground to evaluate multistep planning, adhering to rules and team strategy," the study's senior author, Raj Ammanabrolu, an assistant professor in the University of California, San Diego Department of Computer Science and Engineering, said in a statement. "Because play unfolds through dialog, D&D also opens a direct avenue for human-AI interaction: agents can assist or coplay with other people."

The simulation doesn't replicate an entire D&D campaign; instead, it focuses on combat encounters, drawn from a pre-written adventure called "Lost Mine of Phandelver." To create the parameters of a test, the team chose one of three combat scenarios from the adventure, a set of four characters, and the characters' power levels (low, medium or high). Each episode lasted 10 turns, and then the results were collected.
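As a rough illustration of that setup, the sketch below enumerates the test conditions for one party. It is not the study's code: the scenario identifiers, the class names in the example party, and the `EpisodeConfig` and `all_configs` names are hypothetical placeholders, and only the counts (three scenarios, four characters, three power levels, 10 turns) come from the description above.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple

# Placeholder identifiers: the article says only that there were three
# combat scenarios, not what they were called.
SCENARIOS = ("scenario_1", "scenario_2", "scenario_3")
POWER_LEVELS = ("low", "medium", "high")
PARTY_SIZE = 4   # four heroes per scenario
MAX_TURNS = 10   # each episode lasted 10 turns


@dataclass(frozen=True)
class EpisodeConfig:
    """One test condition: a scenario, a party of heroes and a power level."""
    scenario: str
    party: Tuple[str, ...]
    power_level: str
    max_turns: int = MAX_TURNS

    def __post_init__(self) -> None:
        # Reject anything outside the parameters the article describes.
        if self.scenario not in SCENARIOS:
            raise ValueError(f"unknown scenario: {self.scenario}")
        if self.power_level not in POWER_LEVELS:
            raise ValueError(f"unknown power level: {self.power_level}")
        if len(self.party) != PARTY_SIZE:
            raise ValueError(f"party must have {PARTY_SIZE} characters")


def all_configs(party: Tuple[str, ...]) -> List[EpisodeConfig]:
    """Pair one party with every scenario and power level."""
    return [
        EpisodeConfig(scenario, party, level)
        for scenario, level in product(SCENARIOS, POWER_LEVELS)
    ]


if __name__ == "__main__":
    # Hypothetical party; the character classes are illustrative only.
    example_party = ("fighter", "cleric", "rogue", "wizard")
    for config in all_configs(example_party):
        print(config.scenario, config.power_level, config.max_turns)
```

Under these assumptions, one fixed party yields nine conditions (three scenarios times three power levels), each capped at 10 turns.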

A framework for strategy and decision-making

The researchers ran three different AI models through the simulation (DeepSeek-V3, Claude Haiku 3.5 and GPT-4) and used D&D to gauge how well the models demonstrated long-horizon planning and tool-use capabilities, among other qualities.

These capabilities are key for real-world applications, like supply chain optimization or designing manufacturing lines. The researchers also tested how well models could coordinate and plan together, which would apply to scenarios like disaster response modeling or multi-agent search-and-rescue systems.

Overall, Claude Haiku 3.5 demonstrated the best combat efficiency, particularly in harder scenarios. In easier scenarios, resource conservation was similar across all three models. In D&D, resources are things like the number of spells or abilities a character can use each day or the number of healing potions available. Because these were isolated combat scenarios, there was little incentive to save resources for later, as you might if you were playing a complete adventure.

In more difficult situations, Claude Haiku 3.5 showed a greater willingness to spend its allotted resources, which led to better outcomes. GPT-4 was close behind, and DeepSeek-V3 struggled the most.

The researchers also evaluated how well the models could stay in character throughout the simulation. They created an Acting Quality metric that isolated the models' narrative speech (generated as text responses) and balanced how well the models stayed in character with how many voices the models sustained during play.

They found that DeepSeek-V3 generated lots of pithy, first-person barks and taunts (like "I dart left" or "Get them!") but that it often reused the same voices. Claude Haiku 3.5, on the other hand, tailored its diction more specifically to the class or monster it was playing, whether it was a holy paladin or a nature-loving druid. GPT-4, meanwhile, fell somewhere in the middle, producing a mix of in-character narration and meta-tactical phrasing.

Some of the most interesting and idiosyncratic combat barks came when the models were playing the role of monsters. Different creatures began to develop distinct personalities, leading to goblins shrieking mid-battle: "Heh — shiny man's gonna bleed!"

The researchers said this sort of testing framework is important for evaluating how well models can operate without human input for long stretches. It's a measure of an AI's ability to act independently while remaining coherent and reliable — a capability that requires memory and strategic thinking.

In the future, the team hopes to implement full D&D campaigns that model all of the narrative and action outside of combat, further stressing AI's creativity and ability to improvise in response to input from people or other LLMs.

Alan Bradley, freelance contributor

Alan is a freelance tech and entertainment journalist who specializes in computers, laptops, and video games. He's previously written for sites like PC Gamer, GamesRadar, and Rolling Stone. If you need advice on tech, or help finding the best tech deals, Alan is your man.
