{"id":20703,"date":"2026-04-06T11:01:51","date_gmt":"2026-04-06T11:01:51","guid":{"rendered":"https:\/\/ideainthebox.com\/index.php\/2026\/04\/06\/how-to-reap-compound-benefits-from-generative-ai\/"},"modified":"2026-04-06T11:01:51","modified_gmt":"2026-04-06T11:01:51","slug":"how-to-reap-compound-benefits-from-generative-ai","status":"publish","type":"post","link":"https:\/\/ideainthebox.com\/index.php\/2026\/04\/06\/how-to-reap-compound-benefits-from-generative-ai\/","title":{"rendered":"How to Reap Compound Benefits From Generative AI"},"content":{"rendered":"<div>\n<figure class=\"article-inline\">\n<img decoding=\"async\" src=\"https:\/\/sloanreview.mit.edu\/wp-content\/uploads\/2026\/04\/Kiron-1290x860-1.jpg\" alt=\"\" class=\"wp-image-126461\"><figcaption>\n<p class=\"attribution\">Carolyn Geason-Beissel\/MIT SMR | Minneapolis Institute of Art<\/p>\n<\/figcaption><\/figure>\n<p><span class=\"smr-leadin\">In domain after domain<\/span>, AI has compressed work that used to be expensive \u2014 generating drafts, code, prototypes, and analyses. The marginal cost of a first attempt has dropped sharply. What remains expensive is what happens after the output arrives: evaluating what gets generated. That involves separating signals from noise, catching errors, capturing what was learned, and applying those lessons to the next iteration.<\/p>\n<p>This shift changes what organizations should optimize for. The old question was \u201cHow do we produce more, faster?\u201d The new question is \u201cHow do we systematically learn from, and with, what AI produces?\u201d<\/p>\n<p>Most organizations still overinvest in answering the old question. They treat artificial intelligence as a throughput accelerator: task in, output out, loop closes. This is consumption economics. 
A serious CFO instantly recognizes the pattern: asset depreciation.<\/p>\n<p>The organizations pulling ahead answer the new question. They treat AI as a capability accelerator: task in, output out. But they also ask, \u201cWhat worked? What failed? What should change next time?\u201d Insights get captured, converted into shared knowledge, and applied to subsequent interactions. Each cycle makes the next more effective. This is compounding value. Serious CFOs recognize this pattern, too: asset appreciation.<\/p>\n<p>The data bears this out. Organizations that build systematic feedback loops between humans and AI are six times more likely to derive substantial financial benefits from AI, according to research by <cite>MIT Sloan Management Review<\/cite> and Boston Consulting Group.<a id=\"reflink1\" class=\"reflink\" href=\"https:\/\/sloanreview.mit.edu\/article\/how-to-reap-compound-benefits-from-generative-ai\/#ref1\">1<\/a> Organizations that invest in learning with AI are 73% more likely to achieve significant financial impact.<a id=\"reflink2\" class=\"reflink\" href=\"https:\/\/sloanreview.mit.edu\/article\/how-to-reap-compound-benefits-from-generative-ai\/#ref2\">2<\/a> Yet, as of 2024, 70% of companies had adopted AI, but only 15% were using it for organizational learning.<a id=\"reflink3\" class=\"reflink\" href=\"https:\/\/sloanreview.mit.edu\/article\/how-to-reap-compound-benefits-from-generative-ai\/#ref3\">3<\/a><\/p>\n<p>Leaders seeking compound returns must build what most companies don\u2019t yet understand, let alone possess: systems that verify AI outputs, evaluate what they reveal, and capture what was learned so that each interaction becomes a building block for the next. This type of ROI with GenAI \u2014 return on iteration \u2014 doesn\u2019t happen by accident; it requires infrastructure. 
Let\u2019s examine what that infrastructure looks like.<\/p>\n<h3>Why This Moment Is Structurally Different<\/h3>\n<p>This is not old productivity advice dressed in new rhetoric. Two complementary economic dynamics, reinforcing each other in a virtuous cycle, make compounding management an imperative.<\/p>\n<p>In his 1966 book <cite>The Tacit Dimension<\/cite>, philosopher Michael Polanyi observed that humans know more than they can articulate. For decades, that tacit knowledge protected knowledge workers. What could not be explicitly described could not be automated. Tacit expertise was a moat.<\/p>\n<p>AI breaches that moat \u2014 not by codifying tacit knowledge but by inferring it from behavioral traces at scale. Large language models (LLMs) absorb how experts actually work, including knowledge the experts never articulated. Legal reasoning in briefs and opinions, financial judgment in analyst reports and trading patterns, strategic thinking in board presentations: As these behavioral traces become more legible to AI models, the tacit expertise embedded in them becomes readable by machines.<\/p>\n<p>Boris Cherny, who led the development of Claude Code, described a revealing moment: After he gave Claude the tools to interact with his file system, the <a href=\"https:\/\/newsletter.pragmaticengineer.com\/p\/how-claude-code-is-built\" target=\"_blank\">AI began exploring the system on its own<\/a> to find answers. \u201cIt was mind-blowing,\u201d Cherny said. He had not programmed that capability. The model inferred how developers work from the traces they had left behind \u2014 behaviors that no one had previously formalized.<\/p>\n<p>The second dynamic makes the economic case for compounding even more compelling. In 1865, economist William Stanley Jevons observed that when steam engines became more efficient, coal consumption increased rather than decreased. Efficiency gains made the capability cheaper, stimulating demand. 
As tacit expertise becomes readable by machines, the cost of sophisticated capability drops dramatically. Projects that were previously too expensive to prototype can proliferate. Iteration cycles that once took months compress to hours. More expertise becomes readable to machines, expanding what AI can access and improving its capability. More capability expands what organizations attempt. The loop feeds itself.<\/p>\n<p>The data supports this structural shift. Organizations that combine strong organizational learning with learning specific to AI are up to 80% more effective at managing uncertainty.<a id=\"reflink4\" class=\"reflink\" href=\"https:\/\/sloanreview.mit.edu\/article\/how-to-reap-compound-benefits-from-generative-ai\/#ref4\">4<\/a> The implication is direct: Becoming better learners with AI is at least as important as using AI to create efficiencies.<\/p>\n<p>The challenge for organizations worldwide is not whether or how AI will access their people\u2019s domain expertise \u2014 that appears computationally inevitable. The issue is developing the competence and commitment to install mechanisms that reap compounding returns on human-AI interactions before competitors do.<\/p>\n<h3>Three Steps to Compounding Benefits<\/h3>\n<p>What do those essential mechanisms look like? We argue that organizations must prioritize three distinct but interrelated operations. When all three of the following steps are present and connected, organizations can reap compounding benefits on AI use. When any step is missing, organizations merely consume AI outputs.<\/p>\n<p><strong>1. Verification.<\/strong> The question here is \u201cDoes this output meet the standard?\u201d This step produces a binary answer: correct or incorrect, usable or not. Verification compares output against a criterion that already exists. Unverified AI output is noise with a confident tone. 
But verification, used alone, catches errors without generating learning.<\/p>\n<p><strong>2. Evaluation.<\/strong> For this step, the question is \u201cWhat does this output reveal?\u201d Where verification compares output against existing standards, evaluation may generate standards that did not exist before. This is why evaluation requires domain expertise in ways verification often does not. The expert as evaluator is not merely checking quality. They are discovering <em>what quality means<\/em> in this new context. With AI outputs, evaluation is required across three dimensions: volume, variety, and velocity. Human bandwidth to do evaluations, not AI access, becomes the binding constraint.<\/p>\n<p><strong>3. Learning capture.<\/strong> The third question is \u201cHow do we ensure that this insight persists?\u201d When evaluation is not recorded, knowledge does not compound; it evaporates after each interaction. Learning capture converts single insights into organizational knowledge, such as documented criteria, updated prompts, and shared repositories of what worked and why. Think of it as version control for organizational judgment. Without it, evaluation is a one-time event. And learning capture alone (documentation without verification or evaluation upstream) produces nothing but organized noise.<\/p>\n<p>Those three steps dynamically reinforce one another. Better verification produces cleaner signals for evaluation. Better evaluation generates richer material for capture. Better capture improves the criteria used in the next round of verification. The cycle is the point.<\/p>\n<p>There is yet another valuable and scalable learning dividend: Most experts cannot fully articulate what makes their judgment good. Forcing that judgment into written standards, such as the way developers write CLAUDE.md files that specify what \u201cgood\u201d code looks like, makes the tacit explicit for colleagues and for AI alike. 
The gap between what an LLM delivers and what the expert wanted surfaces unspoken knowledge. <\/p>\n<p>At Anthropic, Cherny gives the AI a way to verify its own work \u2014 a test suite, a browser check \u2014 before a human ever sees it. To evaluate the work\u2019s quality, he concurrently runs 10 to 15 Claude instances that generate swarms of smart subagents: One checks style while another hunts bugs, then a second cohort challenges the first for false positives. Capture is key: A CLAUDE.md file gathers mistakes, corrections, and design principles inside the workflow itself \u2014 not after its completion but while it is happening. Each new session inherits what every prior session learned. For Cherny and his developers, the benefits compound.<\/p>\n<p>There are analogous questions for leaders of other business functions: What is your equivalent of version control for organizational decisions? Of automated testing for new approaches? Of code review to make evaluation criteria explicit and shared? The \u201cverification-evaluation-learning capture\u201d flywheel offers both challenge and opportunity for managers and executives who want to use AI to do measurably more than simply cut costs and improve efficiencies.<\/p>\n<p>Consider a marketing team using AI to generate campaign briefs. Verification asks whether the brief meets basic brand standards, such as consistent tone, correct product claims, and regulation-compliant disclaimers. Automation is fast and cheap. Evaluation asks what the brief reveals: Did AI surface customer insights the team hadn\u2019t named? Did it miss the emotional register entirely? Are these insights \u201cactionable\u201d \u2014 meaning, can they trigger interactions and offers to cultivate relationships and\/or close deals? These judgments require a senior strategist, not a checklist. 
<\/p>\n<p>Learning capture asks whether that strategist\u2019s correction \u2014 \u201cOur brand never leads with product features; it leads with customer identity\u201d \u2014 gets written into a shared prompt template or brief standard for the whole team to use the next time. Without that last step, the strategist\u2019s insight dies with the session. With it, every subsequent brief starts smarter. And perhaps that brief becomes the charter for designing an intelligent marketing agent.<\/p>\n<p>The moment a CMO and\/or CFO builds dashboards around those questions and criteria, the organization has begun compounding.<\/p>\n<h3>When Verification Masquerades as Evaluation<\/h3>\n<p>The machinery requires a human who holds the loop open when every instinct says to close it.<\/p>\n<p>Jaana Dogan, a principal engineer at Google responsible for developer infrastructure on the Gemini API, ran a revealing experiment. She pointed Claude Code \u2014 a rival\u2019s tool \u2014 at a problem her team had spent many months solving. Given a short prompt with no proprietary Google data, Claude Code generated a design solution comparable to the one her team had landed on, along with a working prototype.<\/p>\n<p>Most managers, seeing that output, would just verify: \u201cDoes this match what we built? Close enough? Adopt or reject.\u201d Verification is fast, comfortable, and binary. It answers the question already in your head.<\/p>\n<p>Dogan did something different. She <a href=\"https:\/\/x.com\/rakyll\/status\/2007240188645581224\" target=\"_blank\" rel=\"noopener noreferrer\">decided<\/a>, \u201cIt\u2019s not perfect and I\u2019m iterating on it.\u201d <\/p>\n<p>Evaluation interrogates what the output reveals \u2014 about the problem, about your assumptions, and about what you haven\u2019t yet named. Dogan could do this because she had months of judgment to bring to the encounter. AI compressed the implementation; it could not compress the formation of expertise. 
Without that prior work, only two moves exist: Accept or reject. With it, a third move opens up: Stay in the encounter and learn.<\/p>\n<p>This is the distinction most organizations miss. They treat AI outputs as verdicts to be confirmed rather than starting points to be interrogated. The result is consumption dressed up as adoption \u2014 verification mistaken for the whole job.<\/p>\n<p>The implication: Deploy AI first in domains where your people already have deep expertise, not because AI needs hand-holding but because evaluation requires someone capable of recognizing what \u201cnot perfect\u201d actually means and knowing what iteration may reveal. The expert as evaluator is not a transitional role.<\/p>\n<p>But Dogan\u2019s insight lives only in her head until infrastructure captures it. The question for any organization is not whether individual experts can hold loops open \u2014 some always will. It\u2019s whether the machinery exists to convert their judgment into shared knowledge that persists.<\/p>\n<p>That machinery is what most organizations lack. They have experts. Some even have experts with the right disposition. What they don\u2019t have is the infrastructure that makes compounding automatic rather than incidental.<\/p>\n<h3>Building the Capability<\/h3>\n<p>Translating these practices into infrastructure for business functions beyond software is the work that remains for leaders. This requires a minimum of five moves.<\/p>\n<p><strong>1. Preserve your company\u2019s evaluation expertise.<\/strong> To reap compound returns, you depend on people who can accurately evaluate AI output. This is domain expertise repositioned: the expert as evaluator rather than the expert as producer. Organizations that let people\u2019s deep expertise atrophy because \u201cAI can do that now\u201d will lose this very valuable capability.<\/p>\n<p><strong>2. 
Build verification mechanisms.<\/strong> As noted above, the cycle cannot begin without verification of output. Software verification is cheap: Code runs or it doesn\u2019t. Finance has moderate verification costs; models can be stress-tested against historical data, for example. Strategic planning has high verification costs: Long bets may not resolve for years. Most organizations treat high verification costs as a reason not to start such work with AI tools. Instead, the smart move is doing <em>minimally viable verification<\/em>, the cheapest credible check that an AI output is not wrong. Consider multijudge systems that surface disagreement, and consistency checks that compare outputs across different formulations of the same problem. None of these guarantees correctness, but each offers enough verification to start the cycle.<\/p>\n<p><strong>3. Institute evaluation practices.<\/strong> Few organizations systematically evaluate AI outputs. After every significant AI interaction, users should ask three questions: What worked? What failed? What was interestingly wrong \u2014 wrong in a way that reveals something about the problem the team has not previously articulated? That third question is where hidden value lives. An output that fails in a way the expert noticed but had not yet named becomes new organizational knowledge: It is tacit expertise becoming explicit. People must be prompted to ask these questions as part of the existing workflow. Build evaluation into workflows to pave the way for value to compound.<\/p>\n<p><strong>4. Create capture systems.<\/strong> Evaluation without capture evaporates. Capture systems operate on two levels: inferential (learning from patterns in accumulated traces, the way AI learns from historical data) and explicit (recording human judgment in retrievable form). Both matter. 
A practical approach to both is lightweight infrastructure: decision journals that record not just what was decided but why; prompt repositories that preserve what worked and what failed instructively; and evaluation logs that make the team\u2019s evolving standards searchable. The design principle is retrievability, not comprehensiveness. A marketing team\u2019s capture system is a prompt library and a shared brief template. A finance team\u2019s is an annotated model log. Every function can build its equivalent of CLAUDE.md. Discipline, not cost or creativity, is the true constraint.<\/p>\n<p><strong>5. Measure the cycle, not just the output.<\/strong> Most organizations judge an AI deployment\u2019s success using measures like tools adopted, hours saved, or tasks completed. These are consumption metrics. Organizations trying to reap compound returns measure the cycle: How many interactions were verified? How many were evaluated? How much learning was captured? How quickly did captured learning change subsequent practice? Did your team leaders learn things from AI interactions last week that changed how they worked this week? If not, the cycle is not running.<\/p>\n<h3>The Deeper Transformation<\/h3>\n<p>Leaders want to consume AI. They ask, \u201cHow do we produce faster, better, cheaper with AI?\u201d The new question is \u201cHow do we learn, systematically and at speed, from what AI produces?\u201d<\/p>\n<p>Productivity in an era of generative AI is not just output per unit of input; it is also measurable learning per unit of interaction. Organizations that build the machinery to run the cycle \u2014 verify, evaluate, capture, apply \u2014 will compound that capability over time. Those that do not will consume AI without converting it into knowledge. 
They\u2019ll be busy, perhaps, but not learning and not reaping compound benefits.<\/p>\n<p>Dogan\u2019s eight words embody this shift: \u201cIt\u2019s not perfect and I\u2019m iterating on it.\u201d She verified that the output was usable. She evaluated what it revealed.<\/p>\n<p>She is iterating; her learning is being applied to the next interaction. The compounding cycle is running. It is available to any organization willing to build the machinery that makes it possible.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Carolyn Geason-Beissel\/MIT SMR | Minneapolis Institute of Art In domain  [&#8230;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[194],"tags":[],"class_list":["post-20703","post","type-post","status-publish","format-standard","hentry","category-graphic-design"],"acf":[],"_links":{"self":[{"href":"https:\/\/ideainthebox.com\/index.php\/wp-json\/wp\/v2\/posts\/20703","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ideainthebox.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ideainthebox.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ideainthebox.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ideainthebox.com\/index.php\/wp-json\/wp\/v2\/comments?post=20703"}],"version-history":[{"count":0,"href":"https:\/\/ideainthebox.com\/index.php\/wp-json\/wp\/v2\/posts\/20703\/revisions"}],"wp:attachment":[{"href":"https:\/\/ideainthebox.com\/index.php\/wp-json\/wp\/v2\/media?parent=20703"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ideainthebox.com\/index.php\/wp-json\/wp\/v2\/categories?post=20703"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ideainthebox.com\/index.php\/wp-json\/w
p\/v2\/tags?post=20703"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}