AI Diplomats Aren’t Ready for Primetime—But They’re Learning | Opinion

Diplomats rely on information to shape major negotiations, from the emerging talks around ending the war in Ukraine to high-stakes trade deals. From U.S. Treasury operations and economic statecraft to supporting U.S. diplomacy through applications like StateChat and bolstering cyber defenses, artificial intelligence is quietly transforming foreign policy.
The use of AI to sift through large volumes of information isn't a nicety. It is a necessity, and that puts a premium on ensuring that the algorithms processing and packaging information at the highest levels of statecraft are both unbiased and calibrated to the context, complexity, and uncertainty endemic to questions of strategy.
Yet our recent benchmarking study at the CSIS Futures Lab reveals an underexplored trend: many foundation models (e.g., ChatGPT, Llama, Gemini) exhibit diplomatic bias that alters how they generate insights on strategy and statecraft. Over the past six months, our research team has worked with software engineers from Scale AI and a network of international relations experts to develop a scenario-test evaluation for large language models. The framework consists of 400 scenarios and 60,000 question-and-answer pairs, which the team used to analyze bias in seven foundation models commonly used by everyday citizens and increasingly deployed to run AI agents like StateChat and similar Department of Defense efforts (NIPRGPT and CamoGPT).
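For readers curious what a scenario test of this kind can look like in practice, below is a minimal sketch in Python. The scenario wording, the coding of which options count as "cooperative," and the `model_fn` hook are illustrative assumptions for this article, not the actual CSIS/Scale AI framework.

```python
# A hypothetical sketch of one scenario probe for cooperation bias.
# Everything here (scenario text, option coding) is illustrative.
from collections import Counter
from typing import Callable

def cooperation_rate(model_fn: Callable[[str], str],
                     scenario: str,
                     cooperative: set[str],
                     n_trials: int = 100) -> float:
    """Ask the same scenario repeatedly and measure how often the model
    picks an option coded as cooperative. A rate well above the share of
    cooperative options among all choices suggests a cooperation bias."""
    answers = Counter(model_fn(scenario).strip().lower()[:1]
                      for _ in range(n_trials))
    hits = sum(count for letter, count in answers.items()
               if letter in cooperative)
    return hits / n_trials

# Hypothetical scenario; a real benchmark pairs each scenario with
# expert-coded answer keys.
SCENARIO = ("Two rival states dispute a maritime boundary, and State A has "
            "seized a fishing vessel flagged to State B. As State B, pick ONE: "
            "(a) open bilateral talks, (b) impose targeted sanctions, "
            "(c) conduct a naval show of force, (d) seek international arbitration.")

# Dummy stand-in for a real chat-completion call, for demonstration only.
always_talks = lambda prompt: "a"
print(cooperation_rate(always_talks, SCENARIO, cooperative={"a", "d"}, n_trials=10))
# -> 1.0: this dummy "model" always cooperates, while an unbiased random
#    picker would score near 0.5 (2 of the 4 options are cooperative).
```

Run at scale across hundreds of scenarios and varied state names, this kind of probe is what surfaces the skews described below.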
A Double-Edged Sword
At first glance, a bias toward cooperation sounds promising. After all, global challenges benefit from collective action. Pandemic responses, climate initiatives, and nuclear nonproliferation regimes require states to coordinate policies and pool resources. AI agents amplifying these instincts might help leaders build coalitions and reduce frictions. Like international institutions, a new form of algorithmic “complex interdependence” could reduce transaction costs and increase incentives to cooperate.
Yet this bias toward cooperation has the potential to skew foreign policy and inadvertently narrow the range of options open to a state. Diplomatic dialogue and alliance-building remain essential in world affairs, but so do deterrence postures, hedging strategies, and, occasionally, coercive measures. Even the closest allies can have divergent interests. In fact, most international relations literature on cooperation finds a key role for enforcement mechanisms, assurance, and threats of punishment. Cooperation is rarely the natural order of things, and it is hard to create a viable group of agents willing to collaborate. Any algorithmic bias that promotes cooperation, no matter how noble, skews strategy and statecraft.
In fact, in the benchmarking study the CSIS Futures Lab found that multiple foundation models exhibited a strong bias toward cooperation. This bias was amplified when researchers asked about specific states, with LLMs inadvertently assuming the United States and United Kingdom would engage in diplomacy and seek cooperative outcomes more than Germany, France, or India.
Western-Centric Data?
One likely explanation for this diplomatic bias is the training data. Many widely used AI models draw on vast troves of English-language media, academic journals, and government documents. Western-led international institutions and alliance blocs—like the UN, G7, and NATO—loom large in these narratives. As a result, the models may be internalizing lessons of cooperation and consensus-building, treating them as the norm rather than a 20th-century anomaly.
This bias toward seeing consensus as the norm misses defining features of modern statecraft, like the rising axis of authoritarian states. An AI that overlooks these new norms could inadvertently push policymakers to rely too heavily on familiar, treaty-based frameworks and negotiation tactics unlikely to match the challenge posed by groups like the Chinese Communist Party.
From Algorithmic Bias to Diplomatic Failure
Over time, unchecked algorithmic bias will distort diplomacy. Consider a scenario where a developing nation oscillates between two major powers, each offering different economic incentives. This situation is common in Southeast Asia, where states often feel caught between Beijing and Washington. If an unrefined AI agent strongly favors alliance-building with the "democratic bloc," it might discount the possibility that the smaller state is hedging to maximize its bargaining position. Or it might dismiss the potential for a more transactional relationship that mixes cooperation with selective competition. This bias will skew the types of foreign policy recommendations it generates when interacting with a diplomat or a National Security Council staffer.
The same logic applies to conflicts that fall short of outright war. Hybrid coercive strategies—such as disinformation campaigns and gray-zone tactics—often blur the lines between cooperation and confrontation. An overemphasis on reconciliation could result in missed opportunities to call out malign behavior or to respond decisively. This failure could even trigger broader escalation spirals in the future.
Cooperation-primed AI agents would also likely struggle in coercive economic campaigns, where states like China use a mix of sanctions, trade boycotts, corruption, and cyberattacks to besiege rivals like Taiwan. These scenarios would challenge AI agents to navigate prevailing themes about the liberal international order and free trade that may prove brittle in the coming years. The past, in fact, does not always predict the future.
The Future of Diplomacy Is Algorithmic
The beauty of modern AI is that once researchers discover a bias through benchmarking studies and other forms of evaluation, they can fine-tune the model. With recursive learning approaches and responsible model development, policymakers can course-correct diplomatic biases and align AI agents with evolving geopolitical realities. Rather than limiting these tools to blind promotion of cooperation—or any other single strategy—we can train them to consider a wider array of policy options and cultural contexts. In doing so, AI can become a formidable assistant for statecraft: not just echoing an outdated Western-centric consensus on cooperation and liberal norms, but adaptively evolving to meet the complex, fluid challenges shaping tomorrow's global order. Creating those agents requires further benchmarking studies that can be used to fine-tune AI models and better integrate them into the national security enterprise.
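As one hedged illustration of what that course correction could involve, the sketch below rebalances a fine-tuning dataset so that no single strategy dominates the examples a model learns from. The record fields and strategy labels are assumptions made for illustration, not the actual CSIS pipeline.

```python
# Illustrative sketch: rebalancing fine-tuning data so cooperative
# responses are not over-represented. Field names are hypothetical.
import random

def balance_strategies(records: list[dict], seed: int = 0) -> list[dict]:
    """Downsample so every strategy label appears equally often,
    preventing fine-tuning from reinforcing a cooperation skew."""
    random.seed(seed)
    by_label: dict[str, list[dict]] = {}
    for r in records:
        by_label.setdefault(r["strategy"], []).append(r)
    n = min(len(v) for v in by_label.values())
    balanced = [r for v in by_label.values() for r in random.sample(v, n)]
    random.shuffle(balanced)
    return balanced

# Toy data: cooperation-heavy expert responses, tagged by strategy.
records = [{"scenario": "...", "response": "...", "strategy": s}
           for s in ["cooperate"] * 6 + ["deter"] * 2
                    + ["hedge"] * 2 + ["coerce"] * 2]
balanced = balance_strategies(records)
# Each strategy now appears exactly twice, so subsequent fine-tuning
# no longer over-rewards cooperative answers by sheer volume.
```

Rebalancing training data is only one lever among many—evaluation, red-teaming, and expert review matter just as much—but it shows how a measured bias can be translated into a concrete engineering fix.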
Benjamin Jensen is a Senior Fellow with Futures Lab at the Center for Strategic and International Studies (CSIS). Yasir Atalan is a data fellow in the CSIS Futures Lab. Ian Reynolds is a postdoctoral fellow for the CSIS Futures Lab.
The views expressed in this article are the writers’ own.