Welcome to AI Policy Weekly, a newsletter from the Center for AI Policy (CAIP). Each issue explores three important developments in AI, curated specifically for U.S. AI policy professionals.
OpenAI Launches New AI Agent for Conducting Research
On Tuesday, OpenAI launched a new AI system called “Deep Research.” The tool is available to all $200/month ChatGPT Pro subscribers, who can submit up to 100 queries per month.
In response to a single prompt, Deep Research will “find, analyze, and synthesize hundreds of online sources to create a comprehensive report.”
This process can take “anywhere from 5 to 30 minutes.” For example, Deep Research produced a 30-page treatise on tabletop role-playing games in 15 minutes. In this sense, Deep Research is a type of AI “agent”: an autonomous system that completes tasks without human handholding.
Under the hood, Deep Research is currently powered by “a version of the upcoming OpenAI o3 model that’s optimized for web browsing and data analysis.”
Astute AI Policy Weekly readers will remember that Google released a similar product in December with the exact same name: “Deep Research.”
According to UPenn Associate Professor Ethan Mollick, Google’s version “surfaces far more citations, but they are often a mix of websites of varying quality.” The resulting reports are “much more surface-level” than those produced by OpenAI’s Deep Research.
Perhaps the most impressive accomplishment of OpenAI’s Deep Research is its record-breaking 26.6% accuracy on “Humanity’s Last Exam,” a brand-new collection of 3,000 expert-crafted questions spanning over 100 disciplines that stumped prior AI models. OpenAI’s o1 reasoning model previously led the pack with a score of just 9.1%.
There are already some rave reviews about Deep Research. Tyler Cowen, a professor of economics at George Mason University, used it to write a ten-page paper on 19th-century economist David Ricardo’s theory of rent.
“I compared it to a number of other sources online, and thought it was better, and so I am using it for my history of economic thought class,” says Cowen. “I do not currently see signs of originality, but the level of accuracy and clarity is stunning.”
Other reviews are less enthusiastic.
“The general theme,” writes AI commentator Zvi Mowshowitz, is that “it will give you a lot of text, most of it accurate, but not all, and it will have some insights but pile the slop and unimportant stuff high on top of it without noticing which is which.”
“That is highly useful,” Mowshowitz adds, “if you know how to use it.”

AI Experts Publish Thorough Scientific Report on AI Safety
In an unprecedented collaboration, 96 AI experts backed by 30 countries have produced what is arguably the most comprehensive and rigorous assessment of AI safety to date.
Led by AI pioneer Yoshua Bengio, the 2025 International AI Safety Report studies the capabilities and risks of general-purpose AI systems, as well as approaches to risk management. Here are some notable findings:
“We are in the midst of a technological revolution that will fundamentally alter the way we live, work, and relate to one another.”
“There is no consistently up-to-date comprehensive index of AI capabilities.”
“It is difficult to predict when specific capabilities will appear.”
“Machine-generated synthetic data could dramatically alleviate data bottlenecks, but evidence for its utility is mixed.”
“General-purpose AI systems are increasingly deployed to automate and accelerate AI research and development.”
“Abuse using fake pornographic or intimate content overwhelmingly targets women and girls.”
“Current techniques for identifying content generated by general-purpose AI are helpful but often easy to circumvent.”
“General-purpose AI offers significant dual-use cyber capabilities.”
“LLMs can now provide detailed, step-by-step plans for creating chemical and biological weapons, improving on plans written by people with a relevant PhD.”
“Hypothesized outcomes from loss of control [of AI] vary in severity, but include the marginalization or extinction of humanity.”
“Involuntary job loss can cause long-lasting and severe harms for affected workers.”
“The development of state-of-the-art general-purpose AI requires enormous financial investment, often reaching hundreds of millions of US dollars.”
“Whistleblowers can play an important role in alerting authorities to dangerous risks at AI companies due to the proprietary nature of many AI advancements.”
“General-purpose AI developers understand little about how their models operate internally.”
“The culture of ‘build-then-test’ in AI hinders comprehensive risk assessment and mitigation.”
The report’s conclusion is clear: “AI does not happen to us; choices made by people determine its future.”
Those choices are being made now.

Atlas Energy Deploys Self-Driving Kodiak RoboTrucks to Deliver Sand
Atlas Energy Solutions recently reached a major milestone by completing 100 deliveries of fracking sand using driverless trucks equipped with Kodiak Robotics’ autonomous driving software.
“This is the first time, as far as we’re aware, that the customer is owning and operating the driverless vehicle, instead of the AV company,” said Kodiak CEO Don Burnette. “We think this is the model of the future.”
TechCrunch reports that “Kodiak is now generating revenue from Atlas through a combined hardware and software annual subscription.”
According to a press release, Atlas is currently using just two Kodiak-powered “RoboTrucks,” but the company “intends to scale its RoboTruck deployment considerably over the course of 2025 with multiple RoboTruck deployments expected throughout the year.”
To support this expansion, Kodiak has established an office in Odessa, Texas, with over a dozen employees.
Adding to the automation atmosphere, Atlas has begun operating “Dune Express, a 42-mile long, fully-electric conveyor system that carries sand from Atlas’s Kermit, Texas sand facility to an end-of-line loadout facility in eastern New Mexico.”
Atlas plans to connect this conveyor system with its self-driving trucks, which would transport sand “from the Dune Express to Atlas’s customers across the Delaware Basin.” (Note that the Delaware Basin is not in Delaware; it’s in Texas and New Mexico.)
These advancements are the latest stage in a partnership between Atlas and Kodiak that began last year, when the companies completed a 21-mile test delivery in West Texas.
So begins the era of driverless trucks.

CAIP News
On January 30th, CAIP convened public safety leaders, federal officials, and AI experts to examine AI threats to emergency response. The event featured discussions of tabletop exercise scenarios and took place in Fairfax County’s McConnell Public Safety and Transportation Operations Center.
Jason Green-Lowe wrote a blog post on Super Bowl LIX, AI-powered commercials, and the gradual loss of human control over AI.
Mark Reddish wrote a blog post about DeepSeek R1’s internal reasoning and the need for careful audits of AI systems’ inner workings.
Claudia Wilson wrote a blog post reviewing Meta’s Frontier AI Framework.
Jason Green-Lowe wrote a blog post on Humanity’s Last Exam and the dwindling edge humans retain over machines in building benchmarks for advanced AI.
Willamette University’s alumni news published a profile of Bethany Abbate BA’21, featuring her contributions to CAIP’s panel discussion on AI and education.
CAIP joined several AI policy organizations in urging the Senate to confirm Howard Lutnick as the incoming U.S. Secretary of Commerce.
ICYMI: Jakub Kraus wrote a blog post on the flurry of AI policy actions from the U.S. executive branch during Biden’s final week and Trump’s first week in office.
Quote of the Week
Honestly I’m pretty terrified by the pace of AI development these days. When I think about where I’ll raise a future family, or how much to save for retirement, I can’t help but wonder: Will humanity even make it to that point?
—Steven Adler, an AI researcher who worked at OpenAI for four years before departing in November 2024
This edition was authored by Jakub Kraus.
If you have feedback to share, a story to suggest, or music to recommend, please drop me a note at jakub@aipolicy.us.
—Jakub