The AI industry is entering a new phase. For the past decade, the defining challenge was training larger and more capable models to produce useful answers. Now, attention is shifting toward inference: how to run those models reliably, cheaply, and at scale in real-world environments. NVIDIA’s launch of Dynamo at GTC 2026 — an open-source operating system designed specifically for inference workloads — reflected that shift. Google followed with inference-optimized TPUs, while Amazon, Google, Microsoft, and Meta are collectively expected to spend more than $600 billion on AI infrastructure this year alone.[12]
Yet despite the investment, most enterprise AI agents still struggle in production. According to Forrester, 88% of agent pilots fail to scale, with many breaking down not because the models cannot reason, but because they cannot reliably access the information needed to act.[10] That gap points to something specific about what the inference era actually demands. NVIDIA and the hyperscalers are racing to make models run faster and cheaper. A parallel race, quieter and arguably more consequential, is underway around grounding those models in reliable, current information at the moment they act. Running a model in production is not just a compute problem. It is a context problem, and search is emerging as one of the most critical yet least-discussed layers of the AI production stack, one that helps determine whether agents succeed or fail.
This article was contributed by Nicholas Choong, Insignia Ventures Academy Cohort 10 Venture Fellow and Development Partner at Enterprise Singapore. If you’re building next-generation companies in AI and deeptech, you can reach out to him at nicholas.choong@insigniaacademy.vc.
From Copilots to Vertical Agents
During the first wave of the Generative AI boom, most of us experienced AI as a copilot. We prompted; it answered. Sometimes the answer was brilliant. Sometimes it was wrong. But in almost every case, the human remained in charge: reading, judging, editing, and deciding what to do next.
That relationship is beginning to change.
The next wave of AI is moving beyond copilots that assist human work toward AI agents that mimic human decision-making to independently solve complex tasks in real time. In practice, this means agents can proactively and autonomously retrieve information, reason through options, trigger workflows, coordinate with other agents, and make decisions.
This does not mean agents will replace entire human workflows overnight, however. According to McKinsey, the more realistic shift is that workflows will be progressively redesigned around human–AI collaboration: some tasks automated end-to-end, others accelerated through AI assistance, and still others routed back to humans for judgment [1]. In this emerging “agentic organization,” work and workflows become fundamentally “AI-first,” with humans selectively reintroduced where strategic direction, trust, or human interaction matters most.
Enterprises are already making that shift. McKinsey’s 2025 State of AI survey found that 23% of enterprises are already scaling an agentic AI system in at least one business unit, while another 39% have begun experimenting with AI agents [2]. The next challenge, then, is not simply getting employees to use AI. It is embedding AI into the workflows where business value is created.
This is where agentic AI becomes more than a productivity tool.
The opportunity is not merely to automate isolated tasks. It is to reimagine entire workflows — the sequence of steps involving people, processes, and technology — where agents can operate at lower marginal cost and with less supervision than traditional software-enabled processes. As Bessemer Venture Partners notes, AI applications are increasingly targeting high-cost, repetitive, language-based work in sectors such as legal, healthcare, and finance — areas that legacy vertical software struggled to fully reach [3].
But if agents are expected to complete these workflows, they need more than general intelligence. They need to understand the environment in which the work happens: the relevant sources, approval paths, exceptions, risks, and thresholds for action.
Therefore, while the first wave of generative AI asked whether machines could produce useful answers, the next wave will ask whether they can take useful action.
When AI Agents Act, Mistakes Travel Further
As the old adage goes, “With great power comes great responsibility.” The promise of agentic AI is that agents can take work off human hands. However, the more authority we give AI agents, the more consequential their mistakes become.
In the first wave of generative AI, the human was usually the final checkpoint. If an AI-generated answer was incomplete, outdated, or wrong, the user would read it, challenge it, or edit it. The error lived inside a paragraph, a draft, or a recommendation. It was visible, reviewable, and often reversible.
Agents change that equation.
An agent does not simply produce content. It may autonomously retrieve information, query internal systems, compare options, update records, draft messages, trigger workflows, or escalate decisions. Once AI enters a workflow, its mistakes no longer stay confined to an output a human reviews. They can become inputs into the next step of a business process.
Consider a compliance agent monitoring regulatory changes for a financial-services firm. If it works from stale or incomplete information, it might miss a new enforcement notice, rely on superseded guidance, overlook a jurisdiction-specific filing change, or fail to flag a newly sanctioned counterparty. The recommendation may still sound professional and its reasoning may appear coherent. But the decision is compromised because the context is outdated.
That is the difference between a bad answer and a bad outcome.
This is why the hard part of agentic AI is reliability. As Andrew Ng, founder and CEO of Landing AI, cautions: agentic proofs of concept can often be built quickly, but making them robust enough for enterprise use requires much more evaluation and engineering work than many organizations expect [4].
If agents are going to reshape how work gets done, they must be dependable not only in controlled environments, but also across messy, interconnected real-world operating contexts. That shift introduces new risks, including chained vulnerabilities, cross-agent task escalation, synthetic-identity risk, untraceable data leakage, and data corruption propagation. Put simply, when agents can act across systems, one bad input, flawed handoff, or compromised agent identity can cascade through the workflow.
Yet governance is still catching up. In Deloitte’s 2026 State of AI in the Enterprise report, nearly 75% of companies reported that they were planning to deploy agentic AI within two years, but only 21% had a mature model for agent governance [5]. That gap matters. Enterprises are moving quickly toward agents, but many are still learning how to govern AI systems that retrieve information, act across business processes, and influence decisions.
The Bottleneck Is Moving From Models to Context
One natural response to the risks of agentic AI is to ask for better models. That instinct is understandable. Much of the last few years of AI progress has been framed around model capability: stronger reasoning, larger context windows, faster inference, lower costs, and better benchmarks.
But as AI moves deeper into workflows, the bottleneck begins to shift.
In many real-world settings, the limiting factor is not whether the model can reason in the abstract. It is whether the model has the right information to reason with. A capable model working from stale data, incomplete documents, or the wrong version of an internal policy can still produce a poor decision. Worse, because the output may be fluent and coherent, the failure may not be obvious at first.
This is why the next phase of AI will not be defined by the model alone. It will be defined by how effectively that model is connected to the context required to act: data, tools, workflows, feedback loops, and enterprise knowledge. Model APIs are only one layer of the stack. The real differentiator will be the surrounding systems that integrate AI into the operational fabric of the enterprise.
Context has two sides.
The first is continuity: what an agent should retain from the past. This is where memory matters. A useful agent should remember prior decisions, repeated workflows, enterprise rules, user preferences, and relevant history. Without memory, every interaction starts from zero.
The second is grounding: what an agent needs to know from the world now. This is where search matters. A useful agent must be able to retrieve current, relevant, and trustworthy information before it reasons or acts. Without fresh information, even a capable model can produce answers that sound coherent but are wrong.
Memory tells the agent what it has learned. Search tells the agent what it still needs to know.
Both are important. But grounding becomes especially critical as agents move into dynamic production environments, where the right answer may depend on information that changed yesterday, this morning, or five minutes ago.
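The distinction between memory and grounding can be made concrete with a small freshness check. The sketch below is illustrative only: the `Fact` structure, the `needs_refresh` helper, and the age thresholds are assumptions introduced here, not part of any product named in this article. It shows how an agent might decide whether remembered context is still safe to act on or must be re-retrieved first.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Fact:
    """A piece of context the agent holds (hypothetical structure)."""
    text: str
    retrieved_at: datetime  # when the agent last fetched or confirmed it


def needs_refresh(fact: Fact, max_age: timedelta, now: datetime) -> bool:
    """Return True when the fact is older than the workflow tolerates."""
    return now - fact.retrieved_at > max_age


# Fixed clock so the example is reproducible; the dates are illustrative.
now = datetime(2026, 5, 1, 12, 0, tzinfo=timezone.utc)
sanctions_list = Fact("sanctions list snapshot", now - timedelta(hours=6))
house_style = Fact("report formatting rules", now - timedelta(days=30))

# Assumed thresholds: fast-changing data gets a short tolerance,
# slow-changing internal rules get a long one.
stale_sanctions = needs_refresh(sanctions_list, timedelta(minutes=5), now)
stale_style = needs_refresh(house_style, timedelta(days=90), now)
```

Under these assumed thresholds, the six-hour-old sanctions snapshot is flagged for a fresh search, while the month-old formatting rules can be served from memory.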
Traditional Search Is Not Good Enough for Agentic AI
If the next bottleneck is grounding, what happens when the primary way of finding information was built for people, not agents?
Traditional search engines like Google or Bing were designed for human users. A person can type a query, scan results, open a few tabs, judge credibility, ignore irrelevant pages, and decide when to stop. Search engines do not need to understand the workflow behind the query. They only need to point users toward potentially relevant information.
Agents do not search that way.
When search becomes part of an agentic workflow, the output cannot simply be a ranked list of links. Agents cannot reliably act on links alone. They need fine-grained, usable context: the relevant passage, the source, the publication date, supporting evidence, and some indication that the information is reliable enough to inform the next action.
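One way to picture "usable context" is as a typed record rather than a link. The following is a minimal sketch, assuming a hypothetical `GroundedPassage` structure and a hypothetical `usable` check. The field names, the 0-to-1 reliability scale, and the example URLs are all invented for illustration and do not describe any vendor's actual response format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class GroundedPassage:
    """What an agent might need back from search (hypothetical schema)."""
    passage: str                          # the relevant excerpt itself
    source_url: str                       # where it came from
    published: date                       # publication date of the source
    supporting_urls: tuple[str, ...] = () # corroborating evidence, if any
    reliability: float = 0.0              # assumed 0.0-1.0 score; method unspecified


def usable(p: GroundedPassage, min_reliability: float, not_before: date) -> bool:
    """Gate a passage on reliability and recency before the agent acts on it."""
    return p.reliability >= min_reliability and p.published >= not_before


notice = GroundedPassage(
    passage="Entity X was added to the list on 2026-04-30.",
    source_url="https://example.org/official-notice",
    published=date(2026, 4, 30),
    supporting_urls=("https://example.org/press-release",),
    reliability=0.9,
)
old_blog = GroundedPassage(
    passage="Entity X is not currently listed.",
    source_url="https://example.org/blog-post",
    published=date(2025, 1, 15),
    reliability=0.4,
)

cutoff = date(2026, 4, 1)
actionable = [p for p in (notice, old_blog) if usable(p, 0.8, cutoff)]
```

Here only the recent, well-sourced notice survives the gate. A ranked list of links gives the agent none of the fields it would need to make that call.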
Consider the compliance agent from earlier. If it were asked to monitor sanctions exposure across a bank’s corporate customers, it could not simply search “latest sanctions updates” and return a list of articles. Before it can flag a risk, it needs to check official sanctions lists, match company names and aliases, distinguish similarly named entities, identify the relevant jurisdiction, compare the update against internal customer records, and decide whether to escalate. If the agent works from stale or incomplete information, it may miss a newly sanctioned entity or wrongly flag an unrelated customer. The problem is not that the answer sounds unprofessional. It is that the agent is reasoning from the wrong context.
This is not a single search query. It is a multi-step retrieval and reasoning process that traditional search systems were never designed to handle.
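To see why this is more than a query, consider just the matching step. The sketch below is a deliberately simplified illustration, not a real screening method: the entity names are invented, the suffix list is an assumption, and production systems would likely use fuzzy matching and richer identifiers. It normalizes names, checks aliases, and uses jurisdiction to separate a likely hit from a similarly named entity.

```python
import re
from dataclasses import dataclass

# Assumed list of corporate suffixes to ignore when comparing names.
SUFFIXES = {"ltd", "limited", "llc", "inc", "plc", "gmbh"}


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and drop corporate suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


@dataclass(frozen=True)
class ListedEntity:
    name: str
    aliases: tuple[str, ...]
    jurisdiction: str


@dataclass(frozen=True)
class Customer:
    name: str
    jurisdiction: str


def screen(customer: Customer, listed: list[ListedEntity]) -> str:
    """Return 'escalate', 'review', or 'clear' for one customer record."""
    key = normalize(customer.name)
    for entity in listed:
        names = {normalize(entity.name), *map(normalize, entity.aliases)}
        if key in names:
            # Same name and jurisdiction: likely the listed entity.
            if customer.jurisdiction == entity.jurisdiction:
                return "escalate"
            # Same name, different jurisdiction: may be a namesake.
            return "review"
    return "clear"


# Invented example data.
listed = [ListedEntity("Orion Trading Ltd.", ("Orion Trade",), "AE")]
exact_hit = screen(Customer("ORION TRADING LIMITED", "AE"), listed)
alias_hit = screen(Customer("Orion Trade LLC", "SG"), listed)
lookalike = screen(Customer("Orion Travel Inc", "AE"), listed)
```

Even this toy version depends on the sanctions list being current and complete. If the retrieval step hands it yesterday's list, every downstream result is wrong no matter how careful the matching logic is.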
A human skimming through links can intuitively separate signal from noise. An agent, however, must do this systematically: retrieve pages, extract relevant content, compare sources, and structure the results for the next stage of reasoning. In complex workflows, this burden compounds quickly. Latency rises, extraction errors accumulate, and the model may end up reasoning from stale snippets, incomplete pages, or irrelevant text.
The cost compounds too. Every model has a finite context window. If an agent fills that window with navigation menus, ads, cookie banners, stale snippets, or duplicated text, it leaves less room for the evidence that matters. And because agents often search iteratively — refining prompts, cross-checking sources, and issuing follow-up queries as part of a reasoning chain — every unnecessary loop adds token cost and latency.
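The context-window trade-off can be sketched as a packing problem. The example below is a rough illustration under stated assumptions: `pack_context` is a hypothetical helper, the word count stands in for a real tokenizer, and the boilerplate markers are a crude placeholder for proper page extraction. It keeps the highest-scoring passages, skips duplicates and obvious page furniture, and stops adding text once the budget is used up.

```python
# Assumed markers of page furniture; a real extractor would be far more robust.
BOILERPLATE_MARKERS = ("cookie", "subscribe", "sign in", "advertisement")


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def pack_context(passages: list[tuple[float, str]], budget: int) -> list[str]:
    """Select (relevance, text) passages that fit within a token budget."""
    seen: set[str] = set()
    kept: list[str] = []
    used = 0
    # Consider the most relevant passages first.
    for _score, text in sorted(passages, key=lambda p: p[0], reverse=True):
        fingerprint = " ".join(text.lower().split())
        if fingerprint in seen:
            continue  # duplicated text
        if any(marker in fingerprint for marker in BOILERPLATE_MARKERS):
            continue  # banners, menus, and similar noise
        cost = estimate_tokens(text)
        if used + cost > budget:
            continue  # does not fit; a smaller passage still might
        seen.add(fingerprint)
        kept.append(text)
        used += cost
    return kept


passages = [
    (0.9, "Regulator adds Orion Trading to the sanctions list effective today"),
    (0.8, "regulator adds orion trading to the sanctions list effective today"),
    (0.7, "Accept all cookies to continue reading"),
    (0.6, "Filing deadline for affected banks is thirty days"),
    (0.2, "Unrelated market commentary about quarterly earnings and dividend policy outlook"),
]
context = pack_context(passages, budget=20)
```

With a 20-word budget, only the sanctions update and the filing deadline make it into the context. The duplicate, the cookie banner, and the low-relevance commentary are left out, so they consume none of the space available for evidence.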
For agents to function reliably in real workflows, the search layer must do more than surface information. It must transform messy, fast-changing data into context the model can use: relevant, structured, current, and tied to sources. Traditional search was designed to help humans find information. Agentic search must help machines use it.
The Rise of Next-Gen Search Infrastructure
Unsurprisingly, a new infrastructure race is emerging around AI-native search.
The timing is not accidental. Just as agents are beginning to require fresh, citable, machine-readable information, the traditional search API market is becoming less accessible. Microsoft has retired its Bing Search APIs and is directing customers toward Grounding with Bing Search as part of Azure AI Agents. For developers, that shift matters because it narrows access to flexible, programmable web-search infrastructure at the very moment agents need richer ways to ground models in live information.
That gap is creating room for a new generation of search infrastructure companies that believe the search layer will become a critical part of the AI production stack itself.
Independent AI-native search players such as Tavily, Parallel, and Octen are competing from different angles to become the underlying retrieval layer for LLMs, agents, and enterprise applications. Instead of optimising for page rank, advertising, and click behaviour, they are optimising for what matters inside AI workflows: freshness, latency, structured outputs, citation quality, semantic relevance, and developer usability.
One battleground is speed. Octen claims its infrastructure supports sub-100 millisecond latency and throughput on the order of a million queries per second, with real-time indexing that can make data available within minutes [6]. These claims should be validated in production settings, but the strategic point is clear: in agentic workflows, search latency sits on the critical path. Slow retrieval delays downstream actions and increases token usage, tool-call volume, and infrastructure cost.
Another battleground is precision and provenance. The question is not simply whether a search system can find relevant information, but whether it can deliver evidence in a form that agents can use and enterprises can verify. Parallel is a useful example. Its web search and research APIs are built to return structured outputs with citations, relevant excerpts, reasoning traces, and confidence signals — features that become important when an agent’s output must be reviewed, audited, or used in a professional workflow. Parallel’s recent $100 million Series B at a $2 billion valuation, together with notable enterprise customers such as Harvey, Notion, and Clay, suggests that demand for agent-grade web infrastructure is already appearing in production-facing AI applications [7]. In these settings, live web context is not a one-time lookup; it becomes a recurring input into how work gets done.
These developments change the build-versus-buy calculus. If every agent team has to assemble its own retrieval layer — search, crawling, parsing, ranking, citations, fallbacks, and monitoring — deployment becomes slower, more expensive, and harder to scale. Next-gen search infrastructure offers a different path: reusable retrieval systems that let teams spend less time rebuilding plumbing and more time designing the workflows where agents create value.
Nebius’s agreement to acquire Tavily points in the same direction. Better known for cloud infrastructure and high-performance inference, Nebius is acquiring the young agentic search company in a deal valued at up to $400 million [8]. The headline figure is striking, but the more important story is architectural.
For years, the AI race was defined by models, chatbots, and compute. Now attention is shifting to the connective tissue around models: the systems that help agents find reliable information, verify it, and act on it. Tavily fills that gap by adding real-time web access to Nebius’s production inference platform, helping agents ground their reasoning in current information. Nebius is thus betting that agent platforms will need an integrated stack that combines reasoning, retrieval, grounding, and governance.
The strategic implication is clear: search is moving from an add-on to critical infrastructure for AI production. As agents take on more complex work, the challenge is no longer simply accessing information, but accessing the right information quickly and reliably. If next-gen search infrastructure succeeds, it will make agents less dependent on brittle retrieval pipelines and more capable of operating in real workflows. Search becomes not just how agents find information, but how they understand the world well enough to act.
The Winners of the Agentic Search Race
The rise of next-gen search infrastructure does not mean the winners are already clear. Over the next two years, the category may be shaped by who can solve the operational problems that make search usable in production.
The first test is content access. The most valuable information on the web is not always freely available or legally straightforward to use. Publishers are already pushing back against AI search providers, and Perplexity’s Comet Plus initiative — including a $42.5 million publisher pool and an 80% revenue share for participating publishers — points to one possible path [9]: search providers that turn content owners into partners may gain more durable access to premium information. Those that cannot may face higher costs, narrower coverage, or legal risk.
The second test is platform independence. If OpenAI, Google, Microsoft, or other large platforms bundle “good enough” search directly into model runtimes, many developers may default to the built-in option. Independent providers will need to prove why they still matter. Their case will likely rest on factors like choice, transparency, and cost control. In high-stakes workflows, many enterprises will not want a single model platform to become their only source of truth.
The third test is trust by design. Once agents retrieve information and act on it, search becomes part of the risk surface. Retrieved content may contain prompt injection attacks, sensitive information may be exposed, and excessive agency can turn a bad retrieval decision into a business incident. For agentic search companies, citations, provenance, permissions, audit trails, and observability will not be enterprise add-ons. They will be core product requirements.
The fourth test is economics. Agents search at machine scale. A single task may trigger multiple sub-queries, retries, extractions, comparisons, and verification steps before the model produces an answer or takes action. Each step adds latency, token usage, and infrastructure cost. The future category leaders will not simply make agents search more. They will make agents search more selectively: knowing when to retrieve, when to stop, and how to return only the context the model needs.
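"Knowing when to stop" can be illustrated with a simple retrieval loop. This is a minimal sketch under loud assumptions: `retrieve_selectively` is a hypothetical function, the `search` callable is a stub that returns an invented evidence score and token cost, and real systems would estimate both very differently. It stops as soon as the evidence is sufficient or the token budget is spent, rather than running every planned query.

```python
from typing import Callable


def retrieve_selectively(
    queries: list[str],
    search: Callable[[str], tuple[float, int]],
    enough: float,
    max_tokens: int,
) -> dict[str, object]:
    """Run queries in order until evidence suffices or the budget is spent.

    `search(query)` is assumed to return (evidence gained, tokens consumed).
    """
    evidence = 0.0
    tokens = 0
    calls = 0
    reason = "queries exhausted"
    for query in queries:
        gained, cost = search(query)
        evidence += gained
        tokens += cost
        calls += 1
        if evidence >= enough:
            reason = "enough evidence"
            break
        if tokens >= max_tokens:
            reason = "token budget spent"
            break
    return {"calls": calls, "tokens": tokens, "reason": reason}


# Stubbed search results: (evidence gained, tokens consumed). Values are invented.
FAKE_RESULTS = {
    "official list update": (0.6, 400),
    "entity aliases": (0.3, 300),
    "news coverage": (0.1, 900),
    "blog commentary": (0.05, 1200),
}
plan = list(FAKE_RESULTS)

early_stop = retrieve_selectively(plan, FAKE_RESULTS.__getitem__, enough=0.85, max_tokens=2000)
budget_stop = retrieve_selectively(plan, FAKE_RESULTS.__getitem__, enough=2.0, max_tokens=1500)
```

In the first run, two targeted queries are enough and the two most expensive ones never execute. In the second, the evidence bar is never met and the loop halts once the token budget is crossed. The ordering of the plan matters as much as the stopping rule, since cheap, high-yield queries placed first are what make early stopping possible.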
Taken together, these tests suggest that the winners in agentic search will not simply be the companies that retrieve the most information. They will be the companies that solve the grounding problem most effectively: delivering information that is fresh, relevant, permission-aware, traceable, and usable inside real workflows.
As foundation models become more capable and widely available, that capability will matter even more. If many companies can access similar models, advantage will shift to the infrastructure around those models: the data integrations, retrieval systems, permissions, memory, evaluation, and governance that turn a general model into a vertical agent.
The future of AI may therefore be shaped not only by who builds the smartest models, but by who builds the systems that help those models understand the world well enough to act in it.
References
[1] Sukharevsky, Alexander, et al. “The Agentic Organization: Contours of the Next Paradigm for the AI Era.” McKinsey, September 26, 2025. Link
[2] Singla, Alex, et al. “The State of AI in 2025: Agents, Innovation, and Transformation.” McKinsey, November 5, 2025. Link
[3] Wade, Janelle Teng, et al. “AI Infrastructure Roadmap: Five Frontiers for 2026.” Bessemer Venture Partners, March 30, 2026. Link
[4] Whitten, Chuck. “Winning in the Agentic Era: A Conversation with Andrew Ng.” Bain, April 28, 2026. Link
[5] Rowan, Jim, et al. “The State of AI in the Enterprise — 2026 AI Report.” Deloitte, January 2026. Link
[6] “Octen Sets New Global Benchmark for Search Infrastructure.” Yahoo! Finance, April 22, 2026. Link
[7] “Parallel Raises at $2 Billion Valuation to Scale Web Infrastructure for Agents.” Yahoo! Finance, April 29, 2026. Link
[8] CTech. “What Nebius Really Bought When It Bought Tavily.” Calcalist Tech, February 11, 2026. Link
[9] “Introducing Comet Plus.” Perplexity, August 25, 2025. Link
[10] “AI Agent Adoption 2026: 120+ Enterprise Data Points.” Digital Applied, April 19, 2026. Link
[11] “Why AI’s Next Phase Will Likely Demand More Computational Power, Not Less.” Deloitte Technology, Media & Telecom Predictions 2026. Link
[12] “NVIDIA Enters Production With Dynamo, the Broadly Adopted Inference Operating System for AI Factories.” NVIDIA Investor Relations, March 2026. Link
[13] “AI Infrastructure at Next ’26.” Google Cloud Blog, April 2026. Link
Nicholas Choong is a Venture Fellow with Insignia Ventures Academy and a Development Partner at Enterprise Singapore, where he works at the intersection of startups, frontier technologies, and innovation ecosystem development. If you are building ambitious AI or deeptech companies shaping the future, feel free to reach him at nicholas.choong@insigniaacademy.vc.