GenAI models are good at a handful of tasks such as text summarization, question answering, and code generation. If you have a business process that can be broken down into a set of steps, and one or more of these steps involves one of these GenAI superpowers, then you will be able to partially automate your business process using GenAI. We call the software application that automates such a step an agent.
While agents use LLMs just to process text and generate responses, this basic capability can support quite advanced behavior, such as the ability to invoke backend services autonomously.
Let's say that you want to build an agent that is able to answer questions such as "Is it raining in Chicago?". You cannot answer a question like this using just an LLM, because it is not a task that can be accomplished by memorizing patterns from large volumes of text. Instead, to answer this question, you need to reach out to real-time sources of weather information.
There is an open and free API from the US National Weather Service (NWS) that provides the short-term weather forecast for a location. However, using this API to answer a question like "Is it raining in Chicago?" requires several additional steps (see Figure 1):
- We will need to set up an agentic framework to coordinate the rest of these steps.
- What location is the user interested in? The answer in our example sentence is "Chicago". It's not as simple as just extracting the last word of the sentence: if the user were to ask "Is Orca Island hot today?", the location of interest would be "Orca Island". Because extracting the location from a question requires understanding natural language, you can prompt an LLM to identify the location the user is interested in.
- The NWS API operates on latitudes and longitudes. If you want the weather in Chicago, you have to convert the string "Chicago" into a point latitude and longitude and then invoke the API. This is called geocoding. Google Maps has a Geocoder API that, given a place name such as "Chicago", responds with its latitude and longitude. Tell the agent to use this tool to get the coordinates of the location.
- Send the location coordinates to the NWS weather API. You'll get back a JSON object containing weather data.
- Tell the LLM to extract the relevant weather forecast (for example, depending on whether the question is about now, tonight, or next Monday) and add it to the context of the question.
- Based on this enriched context, the agent is finally able to answer the user's question.
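Under stated assumptions, the steps above can be sketched end to end in plain Python. Every helper here is a hypothetical stub standing in for the LLM prompts and API calls described in the text, not real implementations:

```python
# Minimal sketch of the pipeline in Figure 1. All helpers are hypothetical
# stubs standing in for the LLM prompts and API calls described in the text.

def extract_location(question: str) -> str:
    # Stand-in for step 2: in the real agent, an LLM extracts the location.
    return question.rsplit(" in ", 1)[-1].rstrip("?")

def geocode(location: str) -> tuple[float, float]:
    # Stand-in for step 3: the real agent calls the Google Maps Geocoder.
    return {"Chicago": (41.8781, -87.6298)}.get(location, (0.0, 0.0))

def fetch_forecast(lat: float, lon: float) -> str:
    # Stand-in for step 4: the real agent calls the NWS forecast API.
    return "Showers likely tonight. Low around 68."

def answer(question: str) -> str:
    # Steps 5 and 6: enrich the question's context with the forecast, then answer.
    lat, lon = geocode(extract_location(question))
    return f"{question} -> {fetch_forecast(lat, lon)}"

print(answer("Is it raining in Chicago?"))
```

The rest of the article replaces each stub with a real LLM prompt or API call.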
Let's go through these steps one by one.
First, we'll use Autogen, an open-source agentic framework created by Microsoft. To follow along, clone my Git repository and get API keys following the directions provided by Google Cloud and OpenAI. Switch to the genai_agents folder, and update the keys.env file with your keys.
GOOGLE_API_KEY=AI…
OPENAI_API_KEY=sk-…
Next, install the required Python modules using pip:
pip install -r requirements.txt
This will install the autogen module and client libraries for Google Maps and OpenAI.
Follow the discussion below by referring to ag_weather_agent.py.
Autogen treats agentic tasks as a conversation between agents. So, the first step in Autogen is to create the agents that will carry out the individual steps. One will be the proxy for the end-user; it will initiate chats with the AI agent that we will refer to as the Assistant:
user_proxy = UserProxyAgent("user_proxy",
    code_execution_config={"work_dir": "coding", "use_docker": False},
    is_termination_msg=lambda x: autogen.code_utils.content_str(x.get("content")).find("TERMINATE") >= 0,
    human_input_mode="NEVER",
)
There are three things to note about the user proxy above:
- If the Assistant responds with code, the user proxy is capable of executing that code in a sandbox.
- The user proxy terminates the conversation if the Assistant's response contains the word TERMINATE. This is how the LLM tells us that the user's question has been fully answered. Making the LLM do this is part of the hidden system prompt that Autogen sends to the LLM.
- The user proxy never asks the end-user any follow-up questions. If there were follow-ups, we would specify the condition under which the human is asked for more input.
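The termination check above reduces to a substring test. A minimal standalone sketch, assuming messages arrive as dicts with a "content" field that may be a string or None:

```python
# Standalone sketch of the termination predicate used in the user proxy.
# Assumes messages are dicts whose "content" field is a string or None.
def is_termination_msg(msg: dict) -> bool:
    content = msg.get("content") or ""
    return "TERMINATE" in content

print(is_termination_msg({"content": "The answer is 42.\nTERMINATE"}))  # True
print(is_termination_msg({"content": "Still working on it..."}))        # False
```

In the real agent, `autogen.code_utils.content_str()` additionally normalizes multimodal message content to a string before this check.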
Even though Autogen is from Microsoft, it is not limited to Azure OpenAI. The AI assistant can use OpenAI:
openai_config = {
"config_list": [
{
"model": "gpt-4",
"api_key": os.environ.get("OPENAI_API_KEY")
}
]
}
or Gemini:
gemini_config = {
"config_list": [
{
"model": "gemini-1.5-flash",
"api_key": os.environ.get("GOOGLE_API_KEY"),
"api_type": "google"
}
],
}
Anthropic and Ollama are supported as well.
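For example, an Anthropic configuration would follow the same shape as the OpenAI and Gemini configs above. The api_type value and the model name below are my assumptions, so check the Autogen documentation for your version:

```python
import os

# Assumed shape, mirroring the OpenAI and Gemini configs; "api_type" and the
# model name are illustrative, not verified against your Autogen version.
anthropic_config = {
    "config_list": [
        {
            "model": "claude-3-5-sonnet-20240620",
            "api_key": os.environ.get("ANTHROPIC_API_KEY"),
            "api_type": "anthropic",
        }
    ],
}
```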
Supply the appropriate LLM configuration to create the Assistant:
assistant = AssistantAgent(
"Assistant",
llm_config=gemini_config,
max_consecutive_auto_reply=3
)
Before we wire up the rest of the agentic framework, let's ask the Assistant to answer our sample query.
response = user_proxy.initiate_chat(
assistant, message=f"Is it raining in Chicago?"
)
print(response)
The Assistant responds with this code to reach out to an existing Google web service and scrape the response:
```python
# filename: weather.py
import requests
from bs4 import BeautifulSoup
url = "https://www.google.com/search?q=weather+chicago"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
weather_info = soup.find('div', {'id': 'wob_tm'})
print(weather_info.text)
```
This gets at the power of an agentic framework when powered by a frontier foundation model: the Assistant has autonomously found a web service that provides the desired functionality and is using its code generation and execution capability to produce something akin to the desired functionality! However, it's not quite what we wanted: we asked whether it was raining, and we got back the whole website instead of the desired answer.
Secondly, the autonomous capability doesn't really meet our pedagogical needs. We are using this example as an illustration of enterprise use cases, and it is unlikely that the LLM will know about your internal APIs and tools well enough to use them autonomously. So, let's proceed to build out the framework shown in Figure 1 to invoke the specific APIs we want to use.
Because extracting the location from the question is just text processing, you can simply prompt the LLM. Let's do this with a single-shot example:
SYSTEM_MESSAGE_1 = """
Within the query beneath, what location is the consumer asking about?
Instance:
Query: What is the climate in Kalamazoo, Michigan?
Reply: Kalamazoo, Michigan.
Query:
"""
Now, when we initiate the chat by asking whether it's raining in Chicago:
response1 = user_proxy.initiate_chat(
assistant, message=f"{SYSTEM_MESSAGE_1} Is it raining in Chicago?"
)
print(response1)
we get back:
Answer: Chicago.
TERMINATE
So, step 2 of Figure 1 is complete.
Step 3 is to get the latitude and longitude coordinates of the location that the user is interested in. Write a Python function that will call the Google Maps API and extract the required coordinates:
def geocoder(location: str) -> (float, float):
    geocode_result = gmaps.geocode(location)
    return (round(geocode_result[0]['geometry']['location']['lat'], 4),
            round(geocode_result[0]['geometry']['location']['lng'], 4))
Next, register this function so that the Assistant can call it in its generated code, and the user proxy can execute it in its sandbox:
autogen.register_function(
    geocoder,
    caller=assistant,  # The assistant agent can suggest calls to the geocoder.
    executor=user_proxy,  # The user proxy agent can execute the geocoder calls.
    name="geocoder",  # By default, the function name is used as the tool name.
    description="Finds the latitude and longitude of a location or landmark",  # A description of the tool.
)
Note that, at the time of writing, function calling is supported in Autogen only for GPT-4 models.
We now expand the example in the prompt to include the geocoding step:
SYSTEM_MESSAGE_2 = """
Within the query beneath, what latitude and longitude is the consumer asking about?
Instance:
Query: What is the climate in Kalamazoo, Michigan?
Step 1: The consumer is asking about Kalamazoo, Michigan.
Step 2: Use the geocoder device to get the latitude and longitude of Kalmazoo, Michigan.
Reply: (42.2917, -85.5872)
Query:
"""
Now, when we initiate the chat by asking whether it's raining in Chicago:
response2 = user_proxy.initiate_chat(
assistant, message=f"{SYSTEM_MESSAGE_2} Is it raining in Chicago?"
)
print(response2)
we get back:
Answer: (41.8781, -87.6298)
TERMINATE
Now that we have the latitude and longitude coordinates, we are ready to invoke the NWS API to get the weather data. Step 4, to get the weather data, is similar to geocoding, except that we are invoking a different API and extracting a different object from the web service response. Please look at the code on GitHub to see the full details.
The upshot is that the system prompt expands to encompass all the steps in the agentic application:
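For reference, the NWS lookup is a two-step call: the points endpoint returns metadata that includes a forecast URL, and the forecast response contains a list of named periods. The helper below is my reconstruction of such a function, not the repo code:

```python
import json
import urllib.request

def get_weather_from_nws(latitude: float, longitude: float) -> list:
    """Fetch NWS forecast periods for a point. A sketch only; the repo's
    actual implementation may differ."""
    headers = {"User-Agent": "weather-agent-demo"}  # NWS asks callers for a UA string

    def fetch(url: str) -> dict:
        req = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    # Step 1: the points endpoint returns metadata, including the forecast URL.
    points = fetch(f"https://api.weather.gov/points/{latitude},{longitude}")
    # Step 2: the forecast response holds named periods ("Tonight", "Monday", ...).
    forecast = fetch(points["properties"]["forecast"])
    return forecast["properties"]["periods"]

def detailed_forecast(periods: list, period_name: str) -> str:
    # Pick the detailed forecast for a named period such as "Tonight".
    for p in periods:
        if p["name"].lower() == period_name.lower():
            return p["detailedForecast"]
    return ""
```

In the agent, the LLM itself performs the role of detailed_forecast(), picking the period relevant to the user's question.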
SYSTEM_MESSAGE_3 = """
Comply with the steps within the instance beneath to retrieve the climate info requested.
Instance:
Query: What is the climate in Kalamazoo, Michigan?
Step 1: The consumer is asking about Kalamazoo, Michigan.
Step 2: Use the geocoder device to get the latitude and longitude of Kalmazoo, Michigan.
Step 3: latitude, longitude is (42.2917, -85.5872)
Step 4: Use the get_weather_from_nws device to get the climate from the Nationwide Climate Service on the latitude, longitude
Step 5: The detailed forecast for tonight reads 'Showers and thunderstorms earlier than 8pm, then showers and thunderstorms doubtless. A number of the storms might produce heavy rain. Principally cloudy. Low round 68, with temperatures rising to round 70 in a single day. West southwest wind 5 to eight mph. Likelihood of precipitation is 80%. New rainfall quantities between 1 and a pair of inches attainable.'
Reply: It is going to rain tonight. Temperature is round 70F.
Query:
"""
Based on this prompt, the response to the question about Chicago weather extracts the appropriate information and answers the question correctly.
In this example, we allowed Autogen to select the next agent in the conversation autonomously. We can also specify a different next speaker selection strategy: in particular, setting this to be "manual" inserts a human in the loop, and allows the human to select the next agent in the workflow.
Where Autogen treats agentic workflows as conversations, LangGraph is an open source framework that allows you to build agents by treating a workflow as a graph. This is inspired by the long history of representing data processing pipelines as directed acyclic graphs (DAGs).
In the graph paradigm, our weather agent would look as shown in Figure 2.
There are a few key differences between Figures 1 (Autogen) and 2 (LangGraph):
- In Autogen, each of the agents is a conversational agent. Workflows are treated as conversations between agents that talk to each other. Agents jump into the conversation when they believe it is "their turn". In LangGraph, workflows are treated as a graph whose nodes the workflow cycles through based on rules that we specify.
- In Autogen, the AI assistant is not capable of executing code; instead, the Assistant generates code, and it is the user proxy that executes it. In LangGraph, there is a special ToolsNode that consists of capabilities made available to the Assistant.
You can follow along with this section by referring to the file lg_weather_agent.py in my GitHub repository.
We set up LangGraph by creating the workflow graph. Our graph consists of two nodes: the Assistant Node and a ToolsNode. Communication within the workflow happens via a shared state.
workflow = StateGraph(MessagesState)
workflow.add_node("assistant", call_model)
workflow.add_node("tools", ToolNode(tools))
The tools are Python functions:
@tool
def latlon_geocoder(location: str) -> (float, float):
    """Converts a place name such as "Kalamazoo, Michigan" to latitude and longitude coordinates"""
    geocode_result = gmaps.geocode(location)
    return (round(geocode_result[0]['geometry']['location']['lat'], 4),
            round(geocode_result[0]['geometry']['location']['lng'], 4))

tools = [latlon_geocoder, get_weather_from_nws]
The Assistant calls the language model:
model = ChatOpenAI(model='gpt-3.5-turbo', temperature=0).bind_tools(tools)

def call_model(state: MessagesState):
    messages = state['messages']
    response = model.invoke(messages)
    # This message gets appended to the existing list
    return {"messages": [response]}
LangGraph uses LangChain, so changing the model provider is straightforward. To use Gemini, you can create the model using:
model = ChatGoogleGenerativeAI(model='gemini-1.5-flash',
                               temperature=0).bind_tools(tools)
Next, we define the graph's edges:
workflow.set_entry_point("assistant")
workflow.add_conditional_edges("assistant", assistant_next_node)
workflow.add_edge("tools", "assistant")
The first and last lines above are self-explanatory: the workflow starts with a question being sent to the Assistant, and any time a tool is called, the next node in the workflow is the Assistant, which will use the result of the tool. The middle line sets up a conditional edge in the workflow, since the node that follows the Assistant is not fixed. Instead, the Assistant calls a tool or ends the workflow based on the contents of the last message:
def assistant_next_node(state: MessagesState) -> Literal["tools", END]:
    messages = state['messages']
    last_message = messages[-1]
    # If the LLM makes a tool call, then we route to the "tools" node
    if last_message.tool_calls:
        return "tools"
    # Otherwise, we stop (reply to the user)
    return END
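The routing decision can be exercised without LangGraph by stubbing the last message. The stub class below is illustrative, since LangGraph passes real LangChain message objects:

```python
from dataclasses import dataclass, field

END = "__end__"  # stand-in for langgraph.graph.END

@dataclass
class StubMessage:
    # Only the attribute the router inspects; real messages carry much more.
    tool_calls: list = field(default_factory=list)

def next_node(last_message) -> str:
    # Same decision as assistant_next_node above: route to the tools node
    # if the LLM requested a tool call, otherwise end the workflow.
    return "tools" if last_message.tool_calls else END

print(next_node(StubMessage(tool_calls=[{"name": "latlon_geocoder"}])))  # tools
print(next_node(StubMessage()))                                          # __end__
```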
Once the workflow has been created, compile the graph and then run it by passing in questions:
app = workflow.compile()
final_state = app.invoke(
{"messages": [HumanMessage(content=f"{system_message} {question}")]}
)
The system message and question are exactly what we used in Autogen:
system_message = """
Comply with the steps within the instance beneath to retrieve the climate info requested.
Instance:
Query: What is the climate in Kalamazoo, Michigan?
Step 1: The consumer is asking about Kalamazoo, Michigan.
Step 2: Use the latlon_geocoder device to get the latitude and longitude of Kalmazoo, Michigan.
Step 3: latitude, longitude is (42.2917, -85.5872)
Step 4: Use the get_weather_from_nws device to get the climate from the Nationwide Climate Service on the latitude, longitude
Step 5: The detailed forecast for tonight reads 'Showers and thunderstorms earlier than 8pm, then showers and thunderstorms doubtless. A number of the storms might produce heavy rain. Principally cloudy. Low round 68, with temperatures rising to round 70 in a single day. West southwest wind 5 to eight mph. Likelihood of precipitation is 80%. New rainfall quantities between 1 and a pair of inches attainable.'
Reply: It is going to rain tonight. Temperature is round 70F.
Query:
"""
query="Is it raining in Chicago?"
The result is that the agent framework uses the steps to come up with an answer to our question:
Step 1: The user is asking about Chicago.
Step 2: Use the latlon_geocoder tool to get the latitude and longitude of Chicago.
[41.8781, -87.6298]
[{"number": 1, "name": "This Afternoon", "startTime": "2024-07-30T14:00:00-05:00", "endTime": "2024-07-30T18:00:00-05:00", "isDaytime": true, …]
There is a chance of showers and thunderstorms after 8pm tonight. The low will be around 73 degrees.
Between Autogen and LangGraph, which one should you choose? A few considerations:
Of course, the level of Autogen support for non-OpenAI models and other tooling may improve by the time you are reading this. LangGraph may add autonomous capabilities, and Autogen may give you more fine-grained control. The agent space is moving fast!
- ag_weather_agent.py: https://github.com/lakshmanok/lakblogs/blob/main/genai_agents/ag_weather_agent.py
- lg_weather_agent.py: https://github.com/lakshmanok/lakblogs/blob/main/genai_agents/lg_weather_agent.py
This article is an excerpt from a forthcoming O'Reilly book, "Visualizing Generative AI", that I am writing with Priyanka Vergadia. All diagrams in this post were created by the author.