July 31, 2024

AI Agent Observability with Langfuse

Easily monitor, trace and debug your AI agents. Explore tools like LangGraph, Llama Agents, Dify, Flowise, and Langflow, and see how Langfuse helps to monitor and optimize your application.

Jannik Maierhöfer

What are AI Agents?

An AI agent is a system that autonomously performs tasks by planning its task execution and utilizing available tools. AI Agents leverage large language models (LLMs) to understand and respond to user inputs step-by-step and decide when to call external tools.

To solve tasks, agents use:

planning by devising step-by-step actions from the given task
tools to extend their capabilities like RAG, external APIs, or code interpretation/execution
memory to store and recall past interactions for additional contextual information

What are Agents Used For?

Common use cases include:

Customer Support: AI agents use RAG to automate responses, autonomously take action and efficiently handle inquiries with accurate information.
Market Research: Agents collect and synthesize information from various sources, delivering accurate and concise summaries to users.
Software Development: AI agents break coding tasks into smaller sub-tasks and then recombine them to create a complete solution.

Design Patterns of AI Agents

An AI agent usually consists of 5 parts: A language model with general-purpose capabilities that serves as the main brain or coordinator, and four sub-modules: a planning module to divide the task into smaller steps, an action module that enables the agent to use external tools, a memory module to store and recall past interactions and a profile module, to describe the behavior of the agent.

AI Agent Design

In single-agent setups, one agent is responsible for solving the entire task autonomously. In multi-agent setups, multiple specialized agents collaborate, each handling different aspects of the task to achieve a common goal more efficiently. These agents are also often referred to as state-based or stateful agents as they route the task through different states.

Single and Multi Agent Designs

What is AI Agent Observability?

Observing agents means tracking and analyzing the performance, behavior, and interactions of AI agents. This includes real-time monitoring of multiple LLM calls, control flows, decision-making processes, and outputs to ensure agents operate efficiently and accurately.

Langfuse is an open-source LLM engineering platform that provides deep insights into metrics such as latency, cost, and error rates, enabling developers to debug, optimize, and enhance their AI systems. Using Langfuse observability, teams can identify and resolve issues, streamline workflows, and maintain high-quality outputs by evaluating agent responses in complex, multi-step AI agents.

Why AI Agent Observability is Important

Debugging and Edge Cases

Agents use multiple steps to solve complex tasks, and inaccurate intermediary results can cause failures of the entire system. Tracing these intermediate steps and testing your application on known edge cases is essential.

When deploying LLMs, some edge cases will always slip through in initial testing. A proper analytics set-up helps identify these cases, allowing you to add them to future test sets for more robust agent evaluations. With Datasets, Langfuse allows you to collect examples of inputs and expected outputs to benchmark new releases before deployment. Datasets can be incrementally updated with new edge cases found in production and integrated with existing CI/CD pipelines.

Tradeoff of Accuracy and Costs

LLMs are stochastic by nature, meaning they are a statistical process that can produce errors or hallucinations. Calling language models multiple times while selecting the best or most common answer can increase accuracy. This can be a major advantage of using agentic workflows.

However, this comes with a cost. The tradeoff between accuracy and costs in LLM-based agents is crucial, as higher accuracy often leads to increased operational expenses. Often, the agent decides autonomously how many LLM calls or paid external API calls it needs to make to solve a task, potentially leading to high costs for single-task executions. Therefore, it is important to monitor model usage and costs in real-time.

Langfuse monitors both costs and accuracy, enabling you to optimize your application for production.

Understanding User Interactions

AI agents analytics allows you to capture how users interact with your LLM applications. This information is crucial for refining your AI application and tailoring responses to better meet user needs.

Langfuse Analytics derives insights from production data, helping you measure quality through user feedback and model-based scoring over time and across different versions. It also allows you to monitor cost and latency metrics in real-time, broken down by user, session, geography, and model version, enabling precise optimizations for your LLM application.

Tools to build AI Agents

You do not need any specific tools to build AI agents. However, there are several open-source frameworks that can help you build complex, stateful, multi-agent applications.

Application Frameworks

LangGraph

LangGraph (GitHub (opens in a new tab)) is an open-source framework by the LangChain team for building complex, stateful, multi-agent applications. LangGraph includes built-in persistence to save and resume state, which enables error recovery and human-in-the-loop workflows.

LangGraph agents can be monitored with Langfuse to observe and debug the steps of an agent.

LangGraph Example

Example of an agent setup created with LangGraph and visualized with Mermaid, featuring two executing agents: One research agent and one agent to tell the current time.

Example Trace in Langfuse

Example traces in Langfuse (view trace (opens in a new tab))

Llama Agents

Llama Agents (GitHub (opens in a new tab)) is an open-source framework designed to simplify the process of building, iterating, and deploying multi-agent AI systems and turn your agents into production microservices.

Langfuse offers a simple integration for automatic capture of traces and metrics generated in LlamaIndex applications.

Llama Agent Example

Llama agent system layout.

No-code Agent Builders

For prototypes and development by non-developers, no-code builders can be a great starting point.

Flowise

Flowise (GitHub (opens in a new tab)) is a no-code builder. It lets you build customized LLM flows with a drag-and-drop editor. With the native Langfuse integration, you can use Flowise to quickly create complex LLM applications in no-code and then use Langfuse to analyse and improve them.

Flowise Example

Example of a catalog chatbot created in Flowise to answer any questions related to shop products.

Langflow

Langflow (GitHub (opens in a new tab)) is a UI for LangChain, designed with react-flow to provide an effortless way to experiment and prototype flows.

With the native integration, you can use Langflow to quickly create complex LLM applications in no code and then use Langfuse to monitor and debug them.

Langflow Example

Example of a chat agent with chain-of-thought reasoning built in Langflow by Cobus Greyling (opens in a new tab).

Dify

Dify (GitHub (opens in a new tab)) is an open-source LLM app development platform. Using their Agent Builder nd variety of templates, you can easily build an AI agent and then grow it into a more complex system via Dify workflows.

With the native Langfuse integration, you can use Dify to quickly create complex LLM applications and then use Langfuse to monitor and improve them.

Dify Example

Example of a Dify Agent that summarizes meetings.

Get Started

If you want to get started with building your AI Agents and monitoring them with Langfuse, check out our end-to-end example of building a simple agent with LangGraph and tracking it with Langfuse.