MIT and Asari AI’s EnCompass Framework Shows How AI Agents Can Search Smarter Using Large Language Models

EnCompass workflow compilation and inference-time scaling via search over nondeterministic execution paths. Credit: arXiv (2025).

Artificial intelligence agents are quickly becoming practical tools for real-world work. Researchers, software engineers, and even business leaders are increasingly relying on AI agents powered by large language models (LLMs) to automate tasks, explore ideas, and solve complex problems. But while LLMs are powerful, they are also unpredictable. They can make mistakes, produce inconsistent outputs, or take unhelpful paths while reasoning through a task.

To address this challenge, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Asari AI have developed a new framework called EnCompass, designed to make AI agents far more reliable, flexible, and efficient by adding structured search directly into how these agents run.


What EnCompass Is and Why It Matters

At its core, EnCompass is a programming framework for AI agents that allows developers to automatically explore multiple possible execution paths when an AI model’s output is uncertain. Instead of committing to the first response an LLM produces, EnCompass enables agents to backtrack, retry, and search for better solutions at inference time.

This is particularly important because many AI agents rely on LLMs for critical steps such as writing code, translating programs, planning tasks, or generating intermediate reasoning. When an LLM makes an error midway through a workflow, traditional agent implementations require developers to manually write complex logic to detect failures, rewind execution, and try again. That added logic can be as large, and as difficult to write, as the original program itself.

EnCompass removes that burden from developers, allowing them to focus on defining what the agent should do, while the framework handles how to search for the best outcome.


Separating Workflow Logic From Search Strategy

One of the most important ideas behind EnCompass is the clean separation between an agent’s workflow and its search strategy.

In most AI agents today, the logic for retrying or exploring alternatives is tightly coupled with the main program. If developers want to switch from simple retries to a more advanced method like beam search or Monte Carlo tree search, they often need to rewrite large portions of their code.

With EnCompass, the agent’s workflow is written once, and the search strategy is defined separately. Developers simply annotate parts of the program where outcomes may vary, such as calls to an LLM. These annotated points are known as branchpoints.

Once branchpoints are defined, EnCompass compiles the agent into a search space representing all possible execution paths the program could take depending on different LLM outputs. The framework then applies a chosen search strategy to explore that space and find the most successful path.
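
To make the idea concrete, here is a minimal Python sketch, not the actual EnCompass API: the names (Branchpoint, fake_llm, translate_file, best_of_n) are invented for illustration. The workflow declares an uncertain step once, together with a way to score candidates, and a separately written strategy decides how many outputs to sample and which to keep.

    import random
    from typing import Callable

    class Branchpoint:
        # Marks an uncertain step (e.g., an LLM call) whose output may be re-sampled.
        def __init__(self, sample: Callable[[], str], score: Callable[[str], float]):
            self.sample = sample
            self.score = score

    def fake_llm(prompt: str) -> str:
        # Stand-in for a real LLM call: sometimes helpful, sometimes not.
        return random.choice(["def solution():\n    return 42",
                              "# sorry, I lost track of the task"])

    def translate_file(java_source: str) -> Branchpoint:
        # The workflow says *what* to ask and how to judge a candidate;
        # it says nothing about retries or backtracking.
        prompt = "Translate this Java file to Python:\n" + java_source
        return Branchpoint(
            sample=lambda: fake_llm(prompt),
            score=lambda code: 1.0 if code.startswith("def ") else 0.0,
        )

    def best_of_n(bp: Branchpoint, n: int = 4) -> str:
        # One interchangeable search strategy: sample n candidates, keep the best.
        candidates = [bp.sample() for _ in range(n)]
        return max(candidates, key=bp.score)

    print(best_of_n(translate_file("class Hello { }")))

Swapping best_of_n for a beam search or tree search would change only that last layer; translate_file itself would stay untouched, which is the separation EnCompass is built around.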


How EnCompass Handles Uncertainty in AI Agents

EnCompass is built around a programming model known as probabilistic angelic nondeterminism. In simple terms, this approach treats uncertain operations—like LLM calls—as choices that can lead to multiple futures. Instead of assuming one output is final, EnCompass assumes that a better outcome might exist and actively searches for it.

This makes an AI agent’s execution resemble a choose-your-own-adventure structure, where different decisions lead to different outcomes. EnCompass navigates this structure using established search techniques, such as:

  • Beam search, which keeps the most promising options at each step
  • Monte Carlo tree search, which balances exploration and exploitation
  • Other customizable or built-in strategies depending on the task

Crucially, these strategies can be swapped in and out without rewriting the agent’s main logic.
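
As a rough illustration of how such a strategy can be written independently of any particular agent, here is a toy beam search in Python over a sequence of uncertain steps. It is a sketch of the general technique, not EnCompass code, and the step and scoring functions are invented for the example.

    from typing import Callable, List, Sequence

    def beam_search(
        steps: Sequence[Callable[[str], List[str]]],
        score: Callable[[str], float],
        width: int = 3,
    ) -> str:
        # Each step proposes several continuations of a partial result
        # (e.g., alternative LLM outputs); only the `width` highest-scoring
        # partial results survive to the next step.
        beam: List[str] = [""]
        for step in steps:
            candidates = [cont for partial in beam for cont in step(partial)]
            beam = sorted(candidates, key=score, reverse=True)[:width]
        return beam[0]

    # Toy run: each step appends one of a few tokens, and the score rewards
    # the letter "a". A real agent would propose model outputs instead.
    steps = [
        lambda partial, opts=opts: [partial + o for o in opts]
        for opts in (["a", "b"], ["c", "aa"], ["d", "ab"])
    ]
    print(beam_search(steps, score=lambda s: s.count("a"), width=2))

Because the driver only sees candidates and scores, the same search code can be reused across very different workflows, which is the kind of decoupling EnCompass automates.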


Significant Reductions in Coding Effort

One of the most striking findings from the researchers’ experiments is just how much coding effort EnCompass saves.

In a case study involving an agent that translates entire code repositories from Java to Python, implementing search logic manually required hundreds of lines of additional code. When the same agent was implemented using EnCompass, the researchers found that coding effort dropped by up to 80 percent. In concrete terms, EnCompass eliminated 348 lines of code that would otherwise have been necessary to handle retries, scoring, and backtracking.

This reduction makes it much easier for developers to experiment with different agent designs and optimization strategies without getting bogged down in engineering complexity.


Measurable Performance Improvements

Beyond saving time, EnCompass also improves results.

By enabling structured search over multiple LLM outputs, the framework significantly boosts task accuracy. In the code translation experiments, the researchers tested several search strategies and found that a two-level beam search delivered the best overall performance.

Using this approach, agents achieved accuracy improvements ranging from 15 to 40 percent across five different code repositories. These gains were observed at a search budget equivalent to 16 times the number of LLM calls made by an agent without search, highlighting how intelligent exploration can outperform naive single-pass execution.


Where EnCompass Works Best

EnCompass is designed for program-in-control agents, where a traditional program defines the high-level workflow and uses LLMs for specific steps. This includes tasks such as:

  • Translating large codebases
  • Discovering transformation rules in structured data
  • Automating multi-step software engineering workflows

However, the framework is less applicable to fully autonomous agents where an LLM decides every step on the fly without a predefined program structure. In those cases, there is no underlying workflow for EnCompass to augment with search and backtracking.


Broader Implications for AI Development

As LLMs become deeply embedded in everyday software, the need for systematic ways to manage their unpredictability is growing. EnCompass represents an important shift toward treating AI agent execution as a search problem rather than a single-shot inference task.

By allowing developers to explore multiple reasoning paths, EnCompass improves reliability while preserving flexibility. This approach could be especially valuable in areas such as:

  • Managing massive and evolving code libraries
  • Designing and executing scientific experiments
  • Developing complex hardware systems and simulations
  • Supporting human-AI collaboration in creative and technical domains

Experts in the field have noted that this clean separation between workflow logic and search strategy provides a strong foundation for building more dependable AI systems.


What Comes Next for EnCompass

The researchers plan to expand EnCompass to support more general search frameworks and to test it on increasingly complex real-world tasks. They are also exploring how the system can help AI agents collaborate more effectively with humans, particularly in brainstorming and large-scale engineering projects.

As AI agents continue to reshape workflows across industries, tools like EnCompass offer a practical way to harness the strengths of large language models while minimizing their weaknesses.


Research Paper:
https://arxiv.org/abs/2512.03571
