{"id":19825,"date":"2026-03-20T12:34:06","date_gmt":"2026-03-20T12:34:06","guid":{"rendered":"https:\/\/ideainthebox.com\/index.php\/2026\/03\/20\/openai-is-throwing-everything-into-building-a-fully-automated-researcher\/"},"modified":"2026-03-20T12:34:06","modified_gmt":"2026-03-20T12:34:06","slug":"openai-is-throwing-everything-into-building-a-fully-automated-researcher","status":"publish","type":"post","link":"https:\/\/ideainthebox.com\/index.php\/2026\/03\/20\/openai-is-throwing-everything-into-building-a-fully-automated-researcher\/","title":{"rendered":"OpenAI is throwing everything into building a fully automated researcher"},"content":{"rendered":"<div>\n<div data-chronoton-summary=\"&lt;ul&gt;&lt;br&gt;&lt;li&gt;&lt;strong&gt;A fully automated research lab:&lt;\/strong&gt; OpenAI has set a new &quot;north star&quot; \u2014 building an AI system capable of tackling large, complex scientific problems entirely on its own, with a research intern prototype due by September and a full multi-agent system planned for 2028.&lt;\/li&gt;&lt;br&gt;&lt;li&gt;&lt;strong&gt;Coding agents as a proof of concept:&lt;\/strong&gt; OpenAI's existing tool Codex, which can already handle substantial programming tasks autonomously, is the early blueprint \u2014 the bet is that if AI can solve coding problems, it can solve almost any problem formulated in text or code.&lt;\/li&gt;&lt;br&gt;&lt;li&gt;&lt;strong&gt;Serious risks with no clean answers:&lt;\/strong&gt; Chief scientist Jakub Pachocki admits that a system this powerful running with minimal human oversight raises hard questions \u2014 from hacking and misuse to bioweapons \u2014 and that chain-of-thought monitoring is the best safeguard available, for now.&lt;\/li&gt;&lt;br&gt;&lt;li&gt;&lt;strong&gt;Power concentrated in very few hands:&lt;\/strong&gt; Pachocki says governments, not just OpenAI, will need to figure out where the lines are drawn.&lt;\/li&gt;&lt;br&gt;&lt;\/ul&gt;\" data-chronoton-post-id=\"1134438\" 
data-chronoton-expand-collapse=\"1\" data-chronoton-analytics-enabled=\"1\"><\/div>\n<p>OpenAI is refocusing its research efforts and throwing its resources into a new grand challenge. The San Francisco firm has set its sights on building what it calls an AI researcher, a fully automated agent-based system that will be able to go off and tackle large, complex problems by itself. OpenAI says that the new goal will be its \u201cnorth star\u201d for the next few years, pulling together multiple research strands, including work on reasoning models, agents, and <a href=\"https:\/\/www.technologyreview.com\/2026\/01\/12\/1129782\/ai-large-language-models-biology-alien-autopsy\/\">interpretability<\/a>.<\/p>\n<p>There\u2019s even a timeline. OpenAI plans to build \u201can autonomous AI research intern\u201d\u2014a system that can take on a small number of specific research problems by itself\u2014by September. The AI intern will be the precursor to a fully automated multi-agent research system that the company plans to debut in 2028. This AI researcher (OpenAI says) will be able to tackle problems that are too large or complex for humans to cope with.<\/p>\n<p>Those tasks might be related to math and physics\u2014such as coming up with new proofs or conjectures\u2014or life sciences like biology and chemistry, or even business and policy dilemmas. In theory, you could throw any kind of problem that can be formulated in text, code, or whiteboard scribbles at such a tool\u2014which covers a lot.<\/p>\n<p>OpenAI has been setting the agenda for the AI industry for years. Its early <a href=\"https:\/\/www.technologyreview.com\/2023\/03\/03\/1069311\/inside-story-oral-history-how-chatgpt-built-openai\/\">dominance with large language models<\/a> shaped the technology that hundreds of millions of people use every day. But it now faces fierce competition from rival model makers like Anthropic and Google DeepMind. 
What OpenAI decides to build next matters\u2014for itself and for the future of AI.\u00a0\u00a0\u00a0<\/p>\n<p>A big part of that decision falls to Jakub Pachocki, OpenAI\u2019s chief scientist. Alongside chief research officer Mark Chen, Pachocki is one of <a href=\"https:\/\/www.technologyreview.com\/2025\/07\/31\/1120885\/the-two-people-shaping-the-future-of-openais-research\/\">two people responsible for setting the company\u2019s long-term research goals<\/a>. Pachocki played key roles in the development of both GPT-4, a game-changing LLM released in 2023, and so-called reasoning models, a technology that first appeared in 2024 and now underpins all major chatbots and agent-based systems.\u00a0<\/p>\n<p>In an exclusive interview this week, Pachocki talked me through OpenAI\u2019s new grand challenge. \u201cI think we are getting close to a point where we\u2019ll have models capable of working indefinitely in a coherent way just like people do,\u201d he says. \u201cOf course, you still want people in charge and setting the goals. But I think we will get to a point where you kind of have a whole research lab in a data center.\u201d<\/p>\n<p>Such big claims aren\u2019t new. Saving the world by solving its hardest problems is the stated mission of all the top AI firms. Demis Hassabis told me back in 2022 that it was <a href=\"https:\/\/www.technologyreview.com\/2022\/02\/23\/1045016\/ai-deepmind-demis-hassabis-alphafold\/\">why he started DeepMind<\/a>. Anthropic CEO Dario Amodei says he is building the equivalent of a <a href=\"https:\/\/darioamodei.com\/essay\/machines-of-loving-grace\">country of geniuses in a data center<\/a>. Pachocki\u2019s boss, Sam Altman, <a href=\"https:\/\/www.technologyreview.com\/2025\/12\/15\/1129169\/a-brief-history-of-sam-altmans-hype\/\">wants to cure cancer<\/a>. 
But Pachocki says OpenAI now has most of what it needs to get there.<\/p>\n<p>In January, OpenAI released Codex, an agent-based app that can spin up code on the fly to carry out tasks on your computer. It can analyze documents, generate charts, make you a daily digest of your inbox and social media, and much more. OpenAI claims that most of its technical staff now use Codex in their work. You can look at Codex as a very early version of the AI researcher, says Pachocki: \u201cI expect Codex to get fundamentally better.\u201d<\/p>\n<p>The key is to make a system that can run for longer periods of time, with less human guidance. \u201cWhat we\u2019re really looking at for an automated research intern is a system that you can delegate tasks that would take a person a few days,\u201d says Pachocki.<\/p>\n<p>\u201cThere are a lot of people excited about building systems that can do more long-running scientific research,\u201d says Doug Downey, a research scientist at the Allen Institute for AI, who is not connected to OpenAI. \u201cI think it\u2019s largely driven by the success of these coding agents. The fact that you can delegate quite substantial coding tasks to tools like Codex is incredibly useful and incredibly impressive. And it raises the question: Can we do similar things outside coding, in broader areas of science?\u201d<\/p>\n<p>For Pachocki, that\u2019s a clear <em>Yes<\/em>. In fact, he thinks it\u2019s just a matter of pushing ahead on the path we\u2019re already on. A simple boost in all-round capability also leads to models working for longer without help, he says. He points to the leap from <a href=\"https:\/\/www.technologyreview.com\/2020\/07\/20\/1005454\/openai-machine-learning-language-generator-gpt-3-nlp\">2020\u2019s GPT-3<\/a> to <a href=\"https:\/\/www.technologyreview.com\/2023\/03\/14\/1069823\/gpt-4-is-bigger-and-better-chatgpt-openai\">2023\u2019s GPT-4<\/a>, two of OpenAI\u2019s previous models. 
GPT-4 was able to work on a problem for far longer than its predecessor, even without specialized training, he says.\u00a0<\/p>\n<p>So-called reasoning models brought another bump. Training LLMs to work through problems step by step, backtracking when they make a mistake or hit a dead end, has also made models better at working for longer periods of time. And Pachocki is convinced that OpenAI\u2019s reasoning models will continue to get better.<\/p>\n<p>But OpenAI is also training its systems to work by themselves for longer by feeding them specific samples of complex tasks, such as hard puzzles taken from math and coding contests, which force models to learn how to do things like keep track of very large chunks of text and split problems up into (and then manage) multiple subtasks.<\/p>\n<p>The aim isn\u2019t to build models that just win math competitions. \u201cThat lets you prove that the technology works before you connect it to the real world,\u201d says Pachocki. \u201cIf we really wanted to, we could build an amazing automated mathematician, we have all the tools, and I think it would be relatively easy. But it\u2019s not something we\u2019re going to prioritize now because, you know, at the point where you believe you can do it, there\u2019s much more urgent things to do.\u201d<\/p>\n<p>\u201cWe are much more focused now on research that\u2019s relevant in the real world,\u201d he adds.<\/p>\n<p>Right now that means taking what Codex (and tools like it) can do with coding and trying to apply that to problem-solving in general. \u201cThere\u2019s a big change happening, especially in programming,\u201d he says. \u201cOur jobs are now totally different than they were even a year ago. Nobody really edits code all the time anymore. 
Instead, you manage a group of Codex agents.\u201d If Codex can solve coding problems (the argument goes), it can solve any problem.<\/p>\n<h3 class=\"wp-block-heading\"><strong>The line always goes up<\/strong><\/h3>\n<p>It\u2019s true that OpenAI has had a handful of remarkable successes in the last few months. Researchers have used GPT-5 (the LLM that powers Codex) to discover new solutions to a number of unsolved math problems and punch through apparent dead ends in a <a href=\"https:\/\/www.technologyreview.com\/2026\/01\/26\/1131728\/inside-openais-big-play-for-science\/\">handful of biology, chemistry and physics puzzles<\/a>.\u00a0\u00a0\u00a0<\/p>\n<p>\u201cJust looking at these models coming up with ideas that would take most PhD weeks, at least, makes me expect that we\u2019ll see much more acceleration coming from this technology in the near future,\u201d Pachocki says.<\/p>\n<p>But Pachocki admits that it\u2019s not a done deal. He also understands why some people still have doubts about how much of a game-changer the technology really is. He thinks it depends on how people like to work and what they need to do. \u201cI can believe some people don\u2019t find it very useful yet,\u201d he says.<\/p>\n<p>He tells me that he didn\u2019t even use autocomplete\u2014the<a href=\"https:\/\/www.technologyreview.com\/2023\/12\/06\/1084457\/ai-assistants-copilot-changing-code-software-development-github-openai\/\"> most basic version of generative coding tech<\/a>\u2014a year ago himself. \u201cI\u2019m very pedantic about my code,\u201d he says. \u201cI like to type it all manually in vim if I can help it.\u201d (Vim is a text editor favored by many hardcore programmers that you interact with via dozens of keyboard shortcuts instead of a mouse.)<\/p>\n<p>But that changed when he saw what the latest models could do. He still wouldn\u2019t hand over complex design tasks, but it\u2019s a time saver when he just wants to try out a few ideas. 
\u201cI can have it run experiments in a weekend that previously would have taken me like a week to code,\u201d he says.<\/p>\n<p>\u201cI don\u2019t think it is at the level where I would just let it take the reins and design the whole thing,\u201d he adds. \u201cBut once you see it do something that would take a week to do, I mean that\u2019s hard to argue with.\u201d<\/p>\n<p>Pachocki\u2019s game plan is to supercharge the existing problem-solving abilities that tools like Codex have now and apply them across the sciences.\u00a0\u00a0<\/p>\n<p>Downey agrees that the idea of an automated researcher is very cool: \u201cIt would be exciting if we could come back tomorrow morning and the agent\u2019s done a bunch of work and there\u2019s new results we can examine,\u201d he says.<\/p>\n<p>But he cautions that building such a system could be harder than Pachocki makes out. Last summer, Downey and his colleagues <a href=\"https:\/\/arxiv.org\/abs\/2510.21652\">tested several top-tier LLMs on a range of scientific tasks<\/a>. OpenAI\u2019s latest model, GPT-5, came out on top but still made lots of errors. <\/p>\n<p>\u201cIf you have to chain tasks together then the odds that you get several of them right in succession tend to go down,\u201d he says. Downey admits that things move fast and he has not tested the latest versions of GPT-5 (OpenAI released GPT-5.4 two weeks ago). \u201cSo those results might already be stale,\u201d he says.\u00a0<\/p>\n<h3 class=\"wp-block-heading\"><strong>Serious unanswered questions<\/strong><\/h3>\n<p>I ask Pachocki about the risks that may come with a system that can solve large, complex problems by itself with little human oversight. Pachocki says people at OpenAI talk about those risks all the time.<\/p>\n<p>\u201cIf you believe that AI is about to substantially accelerate research, including AI research, that\u2019s a big change in the world, that\u2019s a big thing,\u201d he says. 
\u201cAnd it comes with some serious unanswered questions. If it\u2019s so smart and capable, if it can run an entire research program, what if it does something bad?\u201d<\/p>\n<p>The way Pachocki sees it, that could happen in a number of ways. The system could go off the rails. It could get hacked. Or it could simply misunderstand its instructions.<\/p>\n<p>The best technique OpenAI has right now to address these concerns is to train its reasoning models to share details about what they are doing as they work. This approach to keeping tabs on LLMs is known as <a href=\"https:\/\/www.technologyreview.com\/2026\/01\/12\/1129782\/ai-large-language-models-biology-alien-autopsy\/\">chain-of-thought monitoring<\/a>.<\/p>\n<p>In short, LLMs are trained to jot down notes about what they are doing in a kind of scratchpad as they step through tasks. Researchers can then use those notes to make sure a model is behaving as expected. Yesterday OpenAI published new details on how it is <a href=\"https:\/\/openai.com\/index\/how-we-monitor-internal-coding-agents-misalignment\/\">using chain-of-thought monitoring in-house to study Codex<\/a>.\u00a0<\/p>\n<p>\u201cOnce we get to systems working mostly autonomously for a long time in a big data center, I think this will be something that we\u2019re really going to depend on,\u201d says Pachocki.<\/p>\n<p>The idea would be to monitor an AI researcher\u2019s scratchpads using other LLMs and catch unwanted behavior before it\u2019s a problem, rather than stop that bad behavior from happening in the first place. LLMs are not understood well enough to control them fully.<\/p>\n<p>\u201cI think it\u2019s going to be a long time before we can really be like, okay, this problem is solved,\u201d he says. 
\u201cUntil you can really trust the systems, you definitely want to have restrictions in place.\u201d Pachocki thinks that very powerful models should be deployed in sandboxes cut off from anything they could break or use to cause harm.\u00a0<\/p>\n<p>AI tools have already been used to come up with novel cyberattacks. Some worry that they will be used to design synthetic pathogens that could be used as bioweapons. You can insert any number of evil-scientist scare stories here. \u201cI definitely think there are worrying scenarios that we can imagine,\u201d says Pachocki.\u00a0<\/p>\n<p>\u201cIt\u2019s going to be a very weird thing, it\u2019s extremely concentrated power that\u2019s in some ways unprecedented,\u201d says Pachocki. \u201cImagine you get to a world where you have a data center that can do all the work that OpenAI or Google can do. Things that in the past required large human organizations would now be done by a couple of people.\u201d<\/p>\n<p>\u201cI think this is a big challenge for governments to figure out,\u201d he adds.<\/p>\n<p>And yet some people would say governments are part of the problem. The US government wants to use AI on the battlefield, for example. The recent showdown between Anthropic and the Pentagon revealed that there is little agreement across society about where we draw red lines for how this technology should and should not be used\u2014let alone who should draw them. In the immediate aftermath of that dispute, OpenAI stepped up to sign a deal with the Pentagon instead of its rival. The situation remains murky.<\/p>\n<p>I push Pachocki on this. Does he really trust other people to figure it out or does he, as a key architect of the future, feel personal responsibility? \u201cI do feel personal responsibility,\u201d he says. \u201cBut I don\u2019t think this can be resolved by OpenAI alone, pushing its technology in a particular way or designing its products in a particular way. 
We\u2019ll definitely need a lot of involvement from policy makers.\u201d<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>OpenAI is refocusing its research efforts and throwing its resources  [&#8230;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[226],"tags":[],"class_list":["post-19825","post","type-post","status-publish","format-standard","hentry","category-technology"],"acf":[],"_links":{"self":[{"href":"https:\/\/ideainthebox.com\/index.php\/wp-json\/wp\/v2\/posts\/19825","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ideainthebox.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ideainthebox.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ideainthebox.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ideainthebox.com\/index.php\/wp-json\/wp\/v2\/comments?post=19825"}],"version-history":[{"count":0,"href":"https:\/\/ideainthebox.com\/index.php\/wp-json\/wp\/v2\/posts\/19825\/revisions"}],"wp:attachment":[{"href":"https:\/\/ideainthebox.com\/index.php\/wp-json\/wp\/v2\/media?parent=19825"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ideainthebox.com\/index.php\/wp-json\/wp\/v2\/categories?post=19825"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ideainthebox.com\/index.php\/wp-json\/wp\/v2\/tags?post=19825"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}