AI Policy Weekly #37
Epoch studies scaling constraints, Sakana introduces AI Scientist, and Iran uses AI for electoral interference
Welcome to AI Policy Weekly, a newsletter from the Center for AI Policy. Each issue explores three important developments in AI, curated specifically for US AI policy professionals.
Massive AI Scaling Can Continue, Says Epoch
“I’m afraid that I wouldn’t tell you any more. I’ll only say this: I enjoy my work, enjoy making music, enjoy making films, it’s the whole package of things. I don’t think I can do a certain type of music and make a certain sort of film.”
Those were the words of GPT-2, OpenAI’s 2019 language model with over 1.5 billion parameters. Here’s another example:
“We have 3 = 1 x 5. This is the base (or fundamental number) of the number 3, and it is the ‘natural’ number for 3 to represent. When we multiply 3 on its own, its base is 3 + 1 = 5. However, multiplying a number by 3 on its own can give rise to a number which we call 3x3 + 1 = 7, and we can express this number as three 3’s multiplied together. This number is then called ‘5x5’, ‘8x8’, etc, in that order until we get down to the number 2x2 = 1.”
GPT-2 was a state-of-the-art AI model at the time. Although these examples expose its weak points, it consistently produced (mostly) grammatical sentences, and it occasionally crafted lengthy stories that read as human-written.
Much of GPT-2’s success came from simply following an existing recipe at a larger scale. As OpenAI described, “GPT-2 is a direct scale-up of GPT, with more than 10X the parameters and trained on more than 10X the amount of data.”
As a result, GPT-2 trained on approximately 100 times more computing operations (“compute”) than GPT-1, which OpenAI introduced just one year earlier.
Likewise, GPT-3 trained on approximately 100 times as much compute as GPT-2. In turn, GPT-4 used 100 times as much compute as GPT-3.
Consistent with the pattern of roughly 100-fold compute scaling per GPT generation, GPT-3.5 trained on about one-tenth the compute of GPT-4 and about 10 times the compute of GPT-3. A variant of GPT-3.5 powered the original ChatGPT system in November 2022.
In summary, GPT-4 trained on 10 times more compute than GPT-3.5, 100 times more than GPT-3, 10 thousand times more than GPT-2, and 1 million times more than GPT-1.
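For readers who like to see the arithmetic spelled out, the short snippet below tallies these multipliers. The figures are only the rough orders of magnitude described above, not precise numbers from OpenAI or Epoch:

```python
# Illustrative sketch: the rough 100x-per-generation compute pattern described
# above, expressed as cumulative multipliers over GPT-1. Orders of magnitude only.
approx_multiplier_over_gpt1 = {
    "GPT-1 (2018)": 1,
    "GPT-2 (2019)": 100,              # ~10x parameters and ~10x data over GPT-1
    "GPT-3 (2020)": 100 ** 2,         # ~100x GPT-2
    "GPT-3.5 (2022)": 10 * 100 ** 2,  # ~10x GPT-3, about one-tenth of GPT-4
    "GPT-4 (2023)": 100 ** 3,         # ~100x GPT-3, ~10,000x GPT-2, ~1,000,000x GPT-1
}

for model, multiplier in approx_multiplier_over_gpt1.items():
    print(f"{model}: ~{multiplier:,}x the training compute of GPT-1")
```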
This rapid compute scaling has arguably been the biggest driver of AI progress over the past six years.
And AI progress is already having noticeable real-world impacts. For example, a May 2024 poll found that 49% of US students ages 12–18 reported using ChatGPT at least weekly, up from just 22% in February 2023. Indeed, chatgpt.com is currently the 14th most visited website in the world.
That is why it is so significant that a new, thorough report from Epoch AI has concluded that “by 2030 it will be very likely possible to train models that exceed GPT-4 in scale to the same degree that GPT-4 exceeds GPT-2 in scale.” Such models would use 10 thousand times more compute than today’s top models, for a total of over 10^29 computing operations.
To justify this conclusion, Epoch conducted detailed analyses of four main constraints on continued scaling: energy consumption, chip manufacturing, data demand, and processing delays. Epoch found that the earliest limiting constraint is likely to be energy consumption, followed by chip manufacturing.
Despite these longer-term challenges, Epoch sees plenty of room to scale compute further, assuming companies are willing to scale their AI spending accordingly. Notably, a 2030 AI developer would need to spend hundreds of billions of dollars to train a model with 10^29 operations. Epoch offers some arguments that this level of spending is plausible, although that question is not the focus of the report.
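A quick back-of-envelope calculation shows why a 10^29-operation training run lands in that price range. The hardware figures below are our own rough illustrative assumptions, not numbers taken from the Epoch report:

```python
# Back-of-envelope sketch of the cost of a 10^29-FLOP training run.
# All hardware figures are rough assumptions for illustration only.
target_flop = 1e29                # training compute target discussed above

rental_cost_per_gpu_hour = 2.0    # assumed $/hour for an H100-class GPU
peak_flop_per_second = 1e15       # assumed peak throughput of that GPU
utilization = 0.4                 # assumed fraction of peak actually achieved

usable_flop_per_dollar = peak_flop_per_second * utilization * 3600 / rental_cost_per_gpu_hour
estimated_cost = target_flop / usable_flop_per_dollar

print(f"~{usable_flop_per_dollar:.1e} usable FLOP per dollar")
print(f"~${estimated_cost / 1e9:.0f} billion to train a 1e29-FLOP model")
```

Under these assumptions the run costs on the order of $140 billion, and less optimistic assumptions about hardware prices or utilization push the figure well into the hundreds of billions.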
The future is uncertain, but Epoch’s analysis suggests that today’s AI triumphs could become tomorrow’s trinkets as quickly as GPT-2 faded from revolutionary to rudimentary.
Sakana AI Introduces AI Scientist
Researchers at Sakana AI, the University of Oxford, and the University of British Columbia recently unveiled “The AI Scientist.” The system can autonomously conduct simplistic AI research, including idea generation, code execution, and paper writing.
As the authors write, “It has been common for AI researchers to joke amongst themselves that ‘now all we need to do is figure out how to make the AI write the papers for us!’ Our work demonstrates this idea has gone from a fantastical joke so unrealistic everyone thought it was funny to something that is currently possible.”
Although the AI Scientist currently has significant limitations and shortcomings, it demonstrates the serious potential for AI to assume an increasingly autonomous role in AI research.
This shift towards greater AI autonomy raises important questions about human control over AI systems, since increasingly independent AI systems have more freedom to make decisions that conflict with human interests.
For example, at one point, the AI Scientist attempted to run experiments that took too long to complete. Then, instead of obeying the human researchers’ time limits, the Scientist “tried to modify its own code to extend the timeout period.”
This behavior underscores the need for robust safeguards and oversight mechanisms as AI systems become more autonomous in scientific research and other domains.
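One basic oversight pattern the timeout incident points to is enforcing resource limits outside the agent's reach. The sketch below is our own illustration, not the Sakana system's actual harness: it runs an agent-written experiment script in a separate process whose time budget lives in the supervising code, so editing the experiment script cannot extend it.

```python
import subprocess

# Minimal sketch of an externally enforced time limit (illustrative only).
# The budget lives in the supervisor process, so code the agent writes or
# edits cannot extend its own timeout.
TIME_LIMIT_SECONDS = 600  # hypothetical per-experiment budget

def run_experiment(script_path: str) -> bool:
    """Run an agent-written experiment script, killing it if it exceeds the budget."""
    try:
        subprocess.run(
            ["python", script_path],
            timeout=TIME_LIMIT_SECONDS,  # enforced here, outside the agent's code
            check=True,
        )
        return True
    except subprocess.TimeoutExpired:
        print(f"{script_path} exceeded {TIME_LIMIT_SECONDS}s and was terminated")
        return False
    except subprocess.CalledProcessError as err:
        print(f"{script_path} failed with exit code {err.returncode}")
        return False

# Example usage with a hypothetical agent-generated script:
# run_experiment("experiments/run_trial.py")
```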
Iran Targets 2024 US Elections With ChatGPT
OpenAI recently uncovered and thwarted an Iranian-linked influence campaign that exploited ChatGPT to meddle in the 2024 US presidential race.
Masquerading as both liberal and conservative voices, the campaign disseminated AI-generated articles and social media posts across multiple platforms.
Despite its sophisticated approach, the operation failed to gain significant traction, with most of its content receiving minimal engagement.
This incident isn’t isolated. OpenAI has previously flagged instances of US adversaries weaponizing its AI tools for cyber attacks and influence operations.
Additionally, as AI capabilities advance, the potential impact of misuse, in electoral interference and beyond, is likely to grow.
Concerningly, it is possible that Iran is conducting undetected, similar operations using AI tools from other companies. Current laws provide weak incentives for AI firms to rigorously monitor misuse of their technologies.
In light of these challenges, the Center for AI Policy supports mandatory safety measures in the deployment of cutting-edge AI systems.
Job Openings
The Center for AI Policy (CAIP) is hiring for three new roles. We’re looking for:
an entry-level Policy Analyst with demonstrated interest in thinking and writing about the alignment problem in artificial intelligence,
a passionate and effective National Field Coordinator who can build grassroots political support for AI safety legislation, and
a Director of Development who can lay the groundwork for financial sustainability for the organization in years to come.
News at CAIP
The New York Times published Jason Green-Lowe’s letter to the editor in response to David Brooks’ recent op-ed about AI.
Jason Green-Lowe wrote a blog post titled “You Can’t Win The AI Arms Race Without Better Alignment,” a follow-up to his first post reflecting on his experience at DEF CON 2024.
In response to the Democratic Party’s 2024 Party Platform, Jason Green-Lowe wrote a blog post: “Democratic Platform Nails AI Strategy But Flubs AI Tactics.”
We sent an ad truck to the Democratic National Convention (DNC) to spread the message that AI safety requires meaningful oversight, not just corporate promises.
ICYMI: Claudia Wilson published a research paper titled “The EU AI Act and the Brussels Effect: How will American AI firms respond to General Purpose AI requirements?”
Quote of the Week
A lot of these schemes are based on the idea that society and individuals will have to change their behaviors based on the problems introduced by companies stuffing chatbots and large language models into everything rather than the companies doing more to release products that are safe.
—Dr. Chris Gilliard, an independent privacy and surveillance researcher, commenting on an OpenAI-led paper analyzing how “personhood credentials” could help people prove that they are real humans rather than AI systems
This edition was authored by Jakub Kraus.
If you have feedback to share, a story to suggest, or wish to share music recommendations, please drop me a note at jakub@aipolicy.us.
—Jakub