AI Policy Weekly #29
Anthropic’s Claude 3.5, Stability AI’s lifeline, and Congress’s AI Public Awareness and Education Campaign Act
Welcome to AI Policy Weekly, a newsletter from the Center for AI Policy. Each issue explores three important developments in AI, curated specifically for US AI policy professionals.
Claude 3.5 Sonnet Extends the Frontier of AI Capabilities
In March 2023, OpenAI released GPT-4, a successor to ChatGPT that used nearly ten times as much computation during training. Since then, OpenAI has continued improving the model, with updates such as GPT-4 Turbo and GPT-4o.
Since 2023—and arguably as far back as 2019—the GPT lineage has maintained a dominant position in the field of general-purpose AI models.
While many models are close behind, Google’s Gemini and Anthropic’s Claude stand out as the two main rivals to GPT.
Last week, Anthropic released a new system, Claude 3.5 Sonnet, that has arguably unseated GPT-4o as the world’s best AI chatbot.
First, there are results from Anthropic’s internal evaluations. The model scored almost twice as well as the previous top Claude model on an “internal agentic coding evaluation.”
The test provided the AI with a codebase and instructions for a code change, such as fixing a bug or adding a feature. Claude 3.5 Sonnet had to implement these changes across multiple files without seeing the tests used to verify its work, simulating real-world software engineering.
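Anthropic has not published the details of this harness, so the sketch below is a hypothetical, simplified illustration of the general pattern just described: copy a codebase, ask the model for edits, then grade the result with held-out tests. The function name, the `agent` callable, and the pytest grading step are assumptions made for exposition, not Anthropic’s actual setup.

```python
import json
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Callable


def run_agentic_coding_task(
    repo_dir: str,
    task_description: str,
    agent: Callable[[str], dict],
    hidden_test_command: tuple = ("pytest", "-q"),
) -> bool:
    """Illustrative sketch (not Anthropic's harness): ask `agent` to edit a
    copy of `repo_dir` per `task_description`, then grade the result with
    tests the agent never sees."""
    with tempfile.TemporaryDirectory() as workdir:
        workspace = Path(workdir) / "repo"
        shutil.copytree(repo_dir, workspace)

        # Build a prompt from the task plus the current source files.
        files = {
            str(p.relative_to(workspace)): p.read_text()
            for p in workspace.rglob("*.py")
        }
        prompt = (
            f"Task: {task_description}\n\n"
            f"Current files:\n{json.dumps(files, indent=2)}\n\n"
            "Return a JSON object mapping file paths to new file contents."
        )

        # The agent proposes edits. A production harness would let the model
        # iterate (run commands, read errors, retry) rather than act one-shot.
        proposed_edits = agent(prompt)
        for rel_path, new_contents in proposed_edits.items():
            target = workspace / rel_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(new_contents)

        # Grade with held-out tests that were never shown to the agent.
        result = subprocess.run(
            list(hidden_test_command), cwd=workspace, capture_output=True
        )
        return result.returncode == 0
```

The key design choice is that the grading tests stay hidden from the model, so a passing score reflects genuine problem-solving rather than tailoring code to a visible checker.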
Accordingly, Anthropic’s AI researchers are already using the model to assist in their work. One engineer was stunned by the model’s coding abilities, remarking that “this is pretty unprecedented for me.” Another researcher is “frequently asking it to explain [AI] papers.”
Sonnet also outclassed its competitors with top test scores on popular AI benchmarks:
59% on the Google-Proof Q&A Benchmark (GPQA), a set of multiple-choice questions written by domain experts and designed to stump even PhD-level test-takers from other fields, despite unlimited time and internet access.
67% on GPQA when using additional prompting and answer-selection techniques designed to boost performance.
90% on Massive Multitask Language Understanding (MMLU), a set of multiple-choice questions designed for high school and college students in 57 different subjects ranging from chemistry to psychology to philosophy.
96% on Grade School Math 8K (GSM8K), a dataset of over 8,000 math word problems designed for grade school students.
Importantly, Anthropic announced that it will release Claude 3.5 Opus, a successor to Claude 3 Opus, “later this year.”
Claude 3.5 Opus will likely perform significantly better than 3.5 Sonnet on GPQA, because Claude 3 Opus was a larger and more capable model than Claude 3 Sonnet, the precursor to 3.5 Sonnet.
But for MMLU and GSM8K, there’s little room left for improvement, as Sonnet already scores 90% or higher on both.
Former GitHub CEO Nat Friedman noticed this, commenting that “we’re gonna need some new benchmarks.”
But before new tests arrive to replace the old ones—which lasted for only a few years—it’s worth contemplating the types of questions that could realistically stump AI systems for the foreseeable future.
Stated differently: what are the least impressive capabilities that AI models definitely won’t have by 2030?
And if AI progress exceeds those expectations like it has in the past, will the US government be ready to respond?
At the Center for AI Policy, we think there’s no time like the present to begin preparing for AI’s impacts. That’s why we support policies like funding the US AI Safety Institute and strengthening security at top AI companies.
Stability AI Attempts Recovery From Financial Woes
Many startups fail. AI ventures are not immune to this trend.
Indeed, Stability AI, a generative AI startup that was once considered a rising star, seemed poised to join the ranks of failed ventures. Yet in a surprising turn of events, the company recently secured a pathway to potential survival.
Stability’s story began in February 2022, when it had zero developers and zero researchers. It made its first hires the following month.
Stability initially attracted attention by supporting a famous text-to-image system, “Stable Diffusion,” with computing resources. In October 2022, the startup raised $101 million at a reported valuation of $1 billion.
But it quickly burned through its funds under the leadership of founder and CEO Emad Mostaque, a former hedge fund manager. By October 2023, the company had only $4 million left.
The two most significant expenses were supercomputers and R&D talent. Last October, the company’s projected 2023 costs included $99 million for compute and $54 million for wages and operating expenses.
Meanwhile, its projected 2023 revenue was a mere $11 million. For comparison, OpenAI is likely to earn billions of dollars in 2024.
The company also failed to pay for licensed training data, which can easily cost tens of millions of dollars. As a result, it faces an ongoing copyright infringement lawsuit from Getty Images.
Thus, Stability’s implosion highlights the steep costs of competing in cutting-edge AI development. Big AI is swiftly becoming a Big Tech project.
Nonetheless, the company gained a glimmer of hope this week. Facebook’s first president, Sean Parker, helped coordinate an effort to bring in $80 million in fresh funding and an overhaul of company leadership.
The new investors also “struck a deal with suppliers to forgive some $100 million owed by Stability,” and released the company from $300 million in future obligations. Much of that money had been earmarked for computing resources.
Time will tell whether Stability can recover. But if it hopes to catch up to AI leaders like OpenAI and Anthropic, then it will probably need to spend hundreds of millions of dollars, if not more.
Senators Introduce Legislation to Raise Awareness of AI’s Effects on Daily Life
Last week, Senators Todd Young (R-IN) and Brian Schatz (D-HI) introduced the AI Public Awareness and Education Campaign Act.
The bill would direct the Secretary of Commerce to run an educational campaign to inform the public about AI’s benefits, risks, and prevalence in daily life.
Specific outreach efforts would include promoting best practices for detecting AI-generated content, informing vulnerable populations about AI-related scams, and highlighting AI-related workforce opportunities.
This legislation is timely because more and more Americans are beginning to recognize and use AI.
For example, a recent poll found that 79% of US teachers are familiar with ChatGPT, up from 55% in February 2023. Additionally, 49% of surveyed K–12 students reported using ChatGPT at least weekly, up from 22% the previous year.
Thus, the bipartisan AI Public Awareness and Education Campaign Act underscores the growing importance of AI literacy as a skill for navigating the 21st century.
News at CAIP
We’re pleased to announce the newest full-time member of CAIP: Claudia Wilson is joining the team as Senior Policy Analyst. Claudia earned her master’s degree in public policy at Yale’s Jackson School of Global Affairs, where she was part of the Schmidt Program on Artificial Intelligence, Emerging Technologies, and National Power. She also brings several years of consulting experience from her time at Boston Consulting Group (BCG).
We hosted a panel discussion on AI and privacy in the Rayburn House Office Building. Stay tuned for a recording and transcript.
Our latest research report explores AI and privacy. We find that AI will both intensify existing privacy concerns and fundamentally restructure the privacy landscape.
Jason Green-Lowe wrote a memo regarding tonight’s presidential debate. If there’s one thing this year’s presidential candidates agree on, it’s that artificial intelligence is scary.
CAIP on the road: today, we are hosting a booth at RecruitMilitary’s job fair at Joint Base Myer-Henderson Hall.
We’re hiring for an External Affairs Director.
Quote of the Week
The vast majority of research and development that has national security implications used to be government programs, and now it is happening in the private sector, so these companies became really potentially lucrative targets from a Chinese perspective.
—Lt. General H.R. McMaster (Ret.), former United States National Security Advisor, commenting on state-sponsored foreign espionage threats to US tech companies
This edition was authored by Jakub Kraus.
If you have feedback to share, a story to suggest, or wish to share music recommendations, please drop me a note at jakub@aipolicy.us.
—Jakub