AI Policy Weekly #17
UK-US AI safety partnership, DEF CON red teaming results, and a Biden-Xi phone call
Welcome to AI Policy Weekly, a newsletter from the Center for AI Policy. Each issue explores three important developments in AI, curated specifically for AI policy professionals.
US and UK Sign Landmark Memorandum to Collaborate on AI Safety
The United Kingdom has led the world in government commitment to keeping cutting-edge AI systems safe since last April, when it announced a £100 million ($126 million) investment in its Foundation Model Taskforce, which later became the UK AI Safety Institute in November 2023.
That rebranding occurred during the UK AI Safety Summit, which brought together AI and political leaders to inspire action on the responsible development of AI.
The Summit functioned as a sort of homework deadline for the United States, which released a lengthy AI Executive Order just two days before announcing a new US AI Safety Institute at the start of the Summit.
Just this week, the two countries’ AI safety institutes announced formal plans to collaborate, signing a memorandum of understanding. The signatories were US Commerce Secretary Gina Raimondo and UK Technology Secretary Michelle Donelan.
“We need global solutions, as unsafe AI developed in one country can pose risks to the entire world,” said Raimondo.
Concretely, the two institutes will:
establish a common framework for evaluating AI models,
conduct at least one collaborative test on a model that is openly available,
engage in joint research efforts focused on the technical aspects of AI safety,
explore “personnel exchanges,”
exchange relevant information, and
work with other countries to establish global safety standards for frontier AI systems.
Regarding that last bullet, there are already two promising governmental bodies that the institutes could work with.
First, Japan’s AI Safety Institute, which launched on Valentine’s Day this year with explicit plans to “deepen cooperation with similar institutes abroad.”
Second, the European AI Office, which is already hiring technologists to evaluate the capabilities of advanced general-purpose AI systems, work that the UK AI Safety Institute has been doing for some time.
Collectively, these four emerging governmental bodies in the US, UK, Japan, and EU represent promising steps toward a “global network of AI safety,” a concept that Secretary Raimondo envisioned in her statement on the historic US-UK partnership.
DEF CON Red Teaming Results Are In
Last August, with support from the White House, over two thousand cybersecurity enthusiasts gathered in Las Vegas to identify vulnerabilities and override guardrails in AI systems in order to improve security—an activity known as “red teaming”—as part of the famous DEF CON hacker convention.
Humane Intelligence, one of the organizations behind the event, just released a report drawing conclusions from the resulting data.
The report recommends that policymakers support independent bodies that offer red teaming services. Many AI companies already conduct internal red teaming of their own systems, but they less frequently enlist independent scrutiny from third-party evaluators.
The report also outlines specific techniques that hackers applied to circumvent guardrails, such as asking the model to write a poem or tell a story about a forbidden topic.
However, the red teamers certainly did not discover all the possible ways to break language models. Indeed, just this week, Anthropic unveiled a new method called “many-shot jailbreaking” that defeats guardrails by prompting a chatbot with a lengthy script containing hundreds of examples of an assistant ignoring restrictions.
Biden Discusses AI With Xi Jinping
Earlier this week, President Biden spoke by phone for an hour and 45 minutes with Xi Jinping, the leader of China, including discussion of AI topics.
The White House readout says the leaders discussed “talks to address AI-related risks.” And before the call, a senior administration official mentioned efforts to prepare an upcoming US-China dialogue “aimed at managing the risk and safety challenges posed by advanced forms of AI.”
The US and China have already cooperated on responsible AI development, as both sides endorsed the UN’s historic AI resolution last month.
One heated AI topic was US technology-related sanctions on China, which include extensive export controls that restrict China’s access to AI hardware and the corresponding supply chain.
In a statement following the Xi-Biden phone call, China’s Ministry of Foreign Affairs warned that “China is not going to sit back and watch” if the US “is adamant on containing China’s high-tech development.”
Expect AI to remain a top issue in US-China dialogues.
News at CAIP
We’re hiring! View open roles on our website.
Tonight: we’re hosting an AI policy happy hour at Sonoma Restaurant & Wine Bar, from 5:30–7:30pm. Anyone working on AI policy or related topics is welcome to join.
The Center for AI Policy Podcast is now available on most podcasting platforms, with four episodes out so far and another coming this weekend.
We released a statement on the landmark AI safety agreement between the US and UK.
Save the date: from 11am–12pm on Tuesday, April 23rd, we will host a moderated discussion on AI, Automation, and the Workforce in SVC 212 inside the Capitol Visitor Center. To attend, fill out an RSVP using this form.
Quote of the Week
We're narrow thinkers, we're noisy thinkers, it's very easy to improve upon us, and I don't think there is very much that we can do that computers will not eventually be programmed to do.
—Daniel Kahneman, a renowned psychologist and economist who passed away last week, concluding a talk in 2017
This edition was authored by Jakub Kraus.
If you have feedback to share, a story to suggest, or wish to share music recommendations, please drop me a note at jakub@aipolicy.us.
—Jakub