The AI agent is an innovative way to harness the power of LLMs. It provides additional tools and capabilities that let an LLM perform web searches, retrieve data from databases, and invoke general services, among other functions. LangChain offers a well-designed and adaptable implementation of AI agents.
In this post, I would like to share how the AI agent is implemented and structured in the LangChain project. More information is available from the LangChain project itself.
The Agent Module
The Agent Definition
An agent is composed of:
- An LLMChain-type member that represents the large language model to be called.
- An output parser that turns the LLM’s text output into the next action or a final answer.
- A list to hold all available tools.
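The key functionality of an agent is the plan() method. It calls the llm_chain to decide the next step, and the LLM’s output is parsed by the output_parser into either the next tool to be executed or a final answer. The chosen tool is then run and its result is fed back into the next call to plan().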
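A rough sketch of that structure may help. It does not use LangChain itself: SketchAgent, Action, Finish, fake_llm, and simple_parser are hypothetical names invented for this example, and the "Action: tool[input]" text format is an assumption. In this sketch plan() only decides the next step; running the tool is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass
class Action:
    """Hypothetical stand-in for the next tool call."""
    tool: str
    tool_input: str


@dataclass
class Finish:
    """Hypothetical stand-in for the final answer."""
    output: str


@dataclass
class SketchAgent:
    # Stand-in for the LLMChain member: any callable from prompt to text.
    llm_chain: Callable[[str], str]
    # Stand-in for the output parser: turns LLM text into Action or Finish.
    output_parser: Callable[[str], Union[Action, Finish]]
    # Names of the tools the agent may pick from.
    allowed_tools: List[str]

    def plan(
        self, intermediate_steps: List[Tuple[Action, str]], question: str
    ) -> Union[Action, Finish]:
        # Build a prompt from the question and earlier (action, observation) pairs.
        scratchpad = "".join(
            f"\nAction: {a.tool}[{a.tool_input}]\nObservation: {obs}"
            for a, obs in intermediate_steps
        )
        text = self.llm_chain(f"Question: {question}{scratchpad}")
        return self.output_parser(text)


def fake_llm(prompt: str) -> str:
    # Canned replies: ask for a tool first, answer once an observation exists.
    if "Observation:" in prompt:
        return "Final Answer: 4"
    return "Action: calculator[2+2]"


def simple_parser(text: str) -> Union[Action, Finish]:
    if text.startswith("Final Answer:"):
        return Finish(output=text[len("Final Answer:"):].strip())
    tool, _, rest = text[len("Action:"):].strip().partition("[")
    return Action(tool=tool, tool_input=rest.rstrip("]"))


agent = SketchAgent(fake_llm, simple_parser, allowed_tools=["calculator"])
first = agent.plan([], "What is 2+2?")
second = agent.plan([(first, "4")], "What is 2+2?")
```

Here first is an Action asking for the calculator, and second is a Finish, because the second prompt already contains an observation.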
Represent the Action of the Agent
To represent the tool to be executed by the agent, and to encapsulate any necessary data for the agent to execute a tool, AgentAction
, AgentStep
, and AgentFinish
are provided:
AgentAction
represents the action to do in the next step which is planned by the llm. To be specific, it is the next tool to be invoked.AgentStep
encapsulates the result of tool execution.AgentFinish
denotes the termination of the entire process. The user’s task is considered complete when anAgentFinish
object is returned by the agent.
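To make the three roles concrete, here is a minimal sketch using plain dataclasses. These are local stand-ins that borrow the class names for readability; the field names (tool, tool_input, log, observation, return_values) are assumptions chosen for illustration and should be checked against the LangChain version you use.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class AgentAction:
    """The next tool to invoke, as planned by the LLM."""
    tool: str        # name of the tool to run
    tool_input: str  # argument to pass to the tool
    log: str = ""    # raw LLM text that led to this action


@dataclass
class AgentStep:
    """An action together with what the tool returned."""
    action: AgentAction
    observation: str


@dataclass
class AgentFinish:
    """Signals that the task is complete."""
    return_values: Dict[str, str]
    log: str = ""


# One pass through the life cycle: plan an action, record its result, finish.
action = AgentAction(tool="search", tool_input="LangChain agents")
step = AgentStep(action=action, observation="LangChain is a framework ...")
finish = AgentFinish(return_values={"output": "LangChain is a framework."})
```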
The Output Parser
The output parser should implement the abstract parse() method, which turns the LLM’s text output into the next action to run or a final answer.
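As an illustration, a parser of this kind could look like the sketch below. SketchOutputParser, Action, and Finish are hypothetical names, and the "Action: / Action Input: / Final Answer:" text layout is an assumed prompt convention, not something taken from the post.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass
class Action:
    tool: str
    tool_input: str


@dataclass
class Finish:
    output: str


class SketchOutputParser:
    """Hypothetical parser for an assumed ReAct-style text format."""

    ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.*)", re.DOTALL)
    FINAL_MARKER = "Final Answer:"

    def parse(self, text: str) -> Union[Action, Finish]:
        # A final answer takes priority over any action in the same text.
        if self.FINAL_MARKER in text:
            return Finish(output=text.split(self.FINAL_MARKER, 1)[1].strip())
        match = self.ACTION_RE.search(text)
        if match is None:
            # Surface unparseable output instead of guessing a tool.
            raise ValueError(f"Could not parse LLM output: {text!r}")
        return Action(tool=match.group(1).strip(), tool_input=match.group(2).strip())


parser = SketchOutputParser()
action = parser.parse(
    "Thought: I should look this up.\nAction: search\nAction Input: LangChain"
)
finish = parser.parse("Thought: I know the answer.\nFinal Answer: 42")
```

Raising on unparseable text is one possible design choice; a caller could instead catch the error and feed it back to the LLM as an observation.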
The Tool Module
The Tool class represents a tool.
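A tool can be pictured as a named, described wrapper around a function. The sketch below is a simplified stand-in: SketchTool and its fields (name, description, func, return_direct) are assumptions made for this example rather than the library's actual class.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SketchTool:
    name: str                    # how the LLM refers to the tool
    description: str             # tells the LLM when to use it
    func: Callable[[str], str]   # the code that actually runs
    return_direct: bool = False  # if True, the tool's output is the final answer

    def run(self, tool_input: str) -> str:
        return self.func(tool_input)


word_count = SketchTool(
    name="word_count",
    description="Counts the words in a piece of text.",
    func=lambda text: str(len(text.split())),
)
result = word_count.run("agents call tools")
```

The description matters as much as the function: it is the only information the LLM has when deciding whether the tool fits the task.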
The Implementation of AgentExecutor
LangChain uses the AgentExecutor to actually chain together all components: Agent, Tools, OutputParser, etc. The tools are added and managed by the AgentExecutor. The whole iteration is also implemented and controlled by the AgentExecutor: call the LLM for task planning -> the agent picks the right tool and runs it with the parameters given by the LLM -> parse the result and feed it to the LLM again.
First, let’s see how the AgentExecutor is structured:
As you can see from the UML graph:
- The AgentExecutor holds the Agent object, which is used to make decisions by calling the LLM (by invoking the plan() method defined by the Agent).
- Tools are registered to the AgentExecutor.
- The max_iterations and max_execution_time members are used to control the iteration.
- The actual agent execution is defined and implemented in member functions of the AgentExecutor.
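The way max_iterations and max_execution_time bound the loop can be sketched as a small standalone function. This is an illustrative approximation, not the library's code; the function name and the default values are assumptions, and None is taken to mean "no limit".

```python
from typing import Optional


def should_continue(
    iterations: int,
    time_elapsed: float,
    max_iterations: Optional[int] = 15,
    max_execution_time: Optional[float] = None,
) -> bool:
    """Return True while neither limit (if set) has been reached."""
    if max_iterations is not None and iterations >= max_iterations:
        return False
    if max_execution_time is not None and time_elapsed >= max_execution_time:
        return False
    return True


# Fresh run: no limit reached yet.
can_start = should_continue(0, 0.0)
# Iteration budget exhausted.
hit_iteration_cap = should_continue(15, 0.0)
# Time budget exhausted even though few iterations ran.
hit_time_cap = should_continue(3, 10.0, max_execution_time=5.0)
```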
And now let’s have a look at how the AgentExecutor works under the hood:
```python
def _call(
    self,
    inputs: Dict[str, str],
    run_manager: Optional[CallbackManagerForChainRun] = None,
) -> Dict[str, Any]:
    """Run text through and get agent response."""
    # Construct a mapping of tool name to tool for easy lookup
    name_to_tool_map = {tool.name: tool for tool in self.tools}
    # We construct a mapping from each tool to a color, used for logging.
    color_mapping = get_color_mapping(
        [tool.name for tool in self.tools], excluded_colors=["green", "red"]
    )
    intermediate_steps: List[Tuple[AgentAction, str]] = []
    # Let's start tracking the number of iterations and time elapsed
    iterations = 0
    time_elapsed = 0.0
    start_time = time.time()
    # We now enter the agent loop (until it returns something).
    while self._should_continue(iterations, time_elapsed):
        next_step_output = self._take_next_step(
            name_to_tool_map,
            color_mapping,
            inputs,
            intermediate_steps,
            run_manager=run_manager,
        )
        if isinstance(next_step_output, AgentFinish):
            return self._return(
                next_step_output, intermediate_steps, run_manager=run_manager
            )
        intermediate_steps.extend(next_step_output)
        if len(next_step_output) == 1:
            next_step_action = next_step_output[0]
            # See if tool should return directly
            tool_return = self._get_tool_return(next_step_action)
            if tool_return is not None:
                return self._return(
                    tool_return, intermediate_steps, run_manager=run_manager
                )
        iterations += 1
        time_elapsed = time.time() - start_time
    output = self.agent.return_stopped_response(
        self.early_stopping_method, intermediate_steps, **inputs
    )
    return self._return(output, intermediate_steps, run_manager=run_manager)
```
The code works as follows:
- It initializes an empty list for storing intermediate steps, along with variables for tracking the number of iterations and the time elapsed. The start time is recorded using time.time().
- The method enters a loop that continues until the _should_continue method returns False. This method checks whether the agent should continue based on the number of iterations and the time elapsed.
- Within the loop, the _take_next_step method is called, which performs the next action in the agent’s process. It calls the agent to ask the LLM which tool to execute, then runs that tool. The LLM’s output is parsed by the output_parser held by the agent, so the step yields either the executed actions with their observations or an AgentFinish.
- If the output of the next step is an instance of AgentFinish, the _return method is called, which finalizes the agent’s process and returns the result.
- If the agent hasn’t finished, the output of the next step is added to the intermediate steps. If there is only one step in the output, the _get_tool_return method is called to check whether the tool should return directly. If it should, the _return method is called.
- The number of iterations is incremented, and the time elapsed is updated.
- If the loop ends without returning a result, the return_stopped_response method of the agent is called, and its output is returned using the _return method.
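The steps above can be condensed into a small, self-contained loop. This is a simplified sketch and not LangChain's implementation: run_agent, scripted_plan, Action, Finish, and the stop message are hypothetical, callbacks and return_direct handling are left out, and a scripted function stands in for the LLM.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class Action:
    tool: str
    tool_input: str


@dataclass
class Finish:
    output: str


Step = Tuple[Action, str]  # (action, observation)


def run_agent(
    plan: Callable[[List[Step], str], Union[Action, Finish]],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_iterations: int = 5,
    max_execution_time: float = 10.0,
) -> Tuple[str, List[Step]]:
    intermediate_steps: List[Step] = []
    iterations = 0
    time_elapsed = 0.0
    start_time = time.time()
    while iterations < max_iterations and time_elapsed < max_execution_time:
        decision = plan(intermediate_steps, question)
        if isinstance(decision, Finish):
            return decision.output, intermediate_steps
        tool = tools.get(decision.tool)
        # An unknown tool name is reported back to the planner as an observation.
        if tool is None:
            observation = f"{decision.tool} is not a valid tool"
        else:
            observation = tool(decision.tool_input)
        intermediate_steps.append((decision, observation))
        iterations += 1
        time_elapsed = time.time() - start_time
    # The loop ended without a Finish: return a stopped response instead.
    return "Agent stopped due to iteration or time limit.", intermediate_steps


def scripted_plan(steps: List[Step], question: str) -> Union[Action, Finish]:
    # Stand-in for the LLM: look up once, then answer with the observation.
    if not steps:
        return Action(tool="lookup", tool_input=question)
    return Finish(output=steps[-1][1])


answer, steps = run_agent(
    scripted_plan, {"lookup": lambda q: "Paris"}, "Capital of France?"
)
```

With the scripted planner, the loop runs the lookup tool once and then finishes with its observation. A planner that never returns a Finish would instead hit max_iterations and get the stopped response.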
Overall Structure
Summary
In this post, we have delved into the implementation details of the LangChain project’s AI agent framework. We have discussed the key components of an agent, how they are interconnected, and the code behind the execution loop.