Building Production grade AI Agents - Part 3

Read Part 1 & 2 here

Building Production grade AI Agents – Part 1

Building Production grade AI Agents – Part 2

Over the past month at Breakout, we’ve been busy building and shipping features while welcoming new customers. Amid this organised chaos, I’m excited to share one of my favorite topics in this whole series: leveraging the Actor model to build a comprehensive multi-agent framework.

In our previous posts, we explored the capabilities of large language models and laid the groundwork for building reliable agentic workflows. Today, I’m taking a deeper dive into an architecture that combines those ideas with the Actor model, an approach that brings modular decision-making and robust state management to distributed systems, and showing how it can power multi-agent frameworks for your production AI agents.

The Actor model is a computational paradigm designed for handling concurrency and distributed processing. Introduced by Carl Hewitt in 1973, it treats each Actor as an independent unit that encapsulates its state and interacts solely through asynchronous message passing. This means:

Isolation of State: Each Actor maintains its own state, which is never shared directly with others.
Asynchronous Communication: Actors exchange information only by passing messages, reducing the risk of race conditions.
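Scalability and Fault Tolerance: Since Actors operate independently, systems built on this model can scale out by adding more actors (and machines), and a failure stays contained in the actor where it occurs, so the system remains resilient under load.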
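To make isolation and asynchronous messaging concrete, here is a minimal, illustrative sketch using Python's asyncio. `CounterActor` and its methods are hypothetical names, not part of any library or of our framework. The actor owns a private counter and drains its mailbox one message at a time, so even though each update is a read-modify-write that deliberately yields to the event loop halfway through, 1,000 concurrent increments cannot race.

```python
import asyncio
from typing import Optional


class CounterActor:
    """Toy actor: private state plus a mailbox drained one message at a time."""

    def __init__(self) -> None:
        self._count = 0                          # private state, never shared
        self._mailbox: asyncio.Queue = asyncio.Queue()
        self._worker: Optional[asyncio.Task] = None

    def start(self) -> None:
        # Must be called from inside a running event loop.
        self._worker = asyncio.create_task(self._run())

    async def _run(self) -> None:
        while True:
            message = await self._mailbox.get()
            if message == "increment":
                current = self._count
                await asyncio.sleep(0)           # yield mid-update on purpose
                self._count = current + 1        # still safe: only this loop writes
            self._mailbox.task_done()

    def send(self, message: str) -> None:
        # Asynchronous message passing: enqueue and return immediately.
        self._mailbox.put_nowait(message)

    async def drain(self) -> int:
        # Wait until every queued message has been handled, then report state.
        await self._mailbox.join()
        return self._count

    async def stop(self) -> None:
        if self._worker is not None:
            self._worker.cancel()
            try:
                await self._worker
            except asyncio.CancelledError:
                pass


async def main() -> int:
    actor = CounterActor()
    actor.start()

    async def sender(n: int) -> None:
        for _ in range(n):
            actor.send("increment")
            await asyncio.sleep(0)               # interleave with other senders

    # Ten concurrent senders, 100 messages each.
    await asyncio.gather(*(sender(100) for _ in range(10)))
    total = await actor.drain()
    await actor.stop()
    return total


if __name__ == "__main__":
    print(asyncio.run(main()))  # 1000
```

If the ten senders instead shared a plain counter and performed the same read, yield, write sequence themselves, updates would be lost. Routing every change through the actor's mailbox is what removes the race.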

[Figure: Representation of the Actor framework in an e-commerce system]

I was first introduced to the Actor model while building a Kafka consumer framework with Akka early in my career. Later, at Dream11, India’s largest fantasy sports platform, we used the Actor model to design and build the world’s largest multiplayer join engine, capable of handling multiple transactions and validations across a distributed system in under 50 ms at a throughput of 20M joins per minute 🤯.

But this post is not about building a high-scale system. Scale is one big advantage of the framework, but it is not the reason we chose it for our next-generation AI agentic architecture.

What truly captivates me about the Actor model is its simplicity and its ability to directly address complex system challenges:

Modularity: It’s straightforward to build systems where decision-making is broken down into manageable, independent components.
- In an AI agentic system, this lets us build complex, intelligent workflows freely, without the system turning into a monolithic behemoth that is impossible to debug and scale. Code modularity FTW.
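Message Driven: Actor systems are inherently message-driven, which matches how most AI agentic systems work: an agent reacts to an input (a user request, a tool result, an LLM response) and emits new messages in turn.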
Robust State Management: Each Actor handles its own state, making the overall system more reliable and easier to maintain.
- As we discussed in the last post, proper state management is important for building reliable agentic systems. By keeping state local to an Actor, we avoid unintended race conditions and intelligence bleed.
Scalability: The lightweight, decoupled nature of Actors makes them ideal for high-concurrency environments.
- We can easily build DAG-like workflows that run independent tasks in parallel. Most agentic systems are painfully slow, but in an actor-based reactive system independent branches run concurrently, so end-to-end latency approaches that of the longest dependency chain rather than the sum of every step.
- Don’t believe me? Try out the Breakout agent on our website 😉
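As a rough illustration of why parallelising independent branches matters, the sketch below stands in for LLM or tool calls with `asyncio.sleep`; `call_tool`, the branch names, and the latencies are all hypothetical. The three independent lookups run together with `asyncio.gather`, so they cost about as long as the slowest one, and the dependent summarise step simply awaits their results.

```python
import asyncio
import time
from typing import Dict, List


async def call_tool(name: str, seconds: float) -> str:
    """Hypothetical stand-in for an LLM or tool call; sleeps to simulate latency."""
    await asyncio.sleep(seconds)
    return f"{name}:done"


async def run_sequential(tasks: Dict[str, float]) -> List[str]:
    # Each call waits for the previous one: total latency is the sum.
    results = []
    for name, seconds in tasks.items():
        results.append(await call_tool(name, seconds))
    return results


async def run_parallel(tasks: Dict[str, float]) -> List[str]:
    # Independent branches run concurrently: total latency is roughly the max.
    # asyncio.gather returns results in argument order, not completion order.
    return list(await asyncio.gather(*(call_tool(n, s) for n, s in tasks.items())))


async def workflow() -> str:
    # Stage 1: three lookups with no edges between them in the DAG.
    research = await run_parallel({"crm": 0.2, "web": 0.3, "docs": 0.1})
    # Stage 2: depends on every stage-1 result, so it starts only after all finish.
    return await call_tool("summarise(" + ",".join(research) + ")", 0.1)


async def main() -> None:
    tasks = {"crm": 0.2, "web": 0.3, "docs": 0.1}
    for label, runner in (("sequential", run_sequential), ("parallel", run_parallel)):
        start = time.perf_counter()
        await runner(tasks)
        print(f"{label}: {time.perf_counter() - start:.2f}s")


if __name__ == "__main__":
    asyncio.run(main())
```

On a typical machine this reports roughly 0.6s for the sequential run and roughly 0.3s for the parallel one.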

In the remainder of this post, I’ll walk you through the architecture and the benefits it brings to our workflows.

As mentioned earlier, an actor maintains its own memory to manage its state and has the capability to spawn new actors to handle additional work. Crucially, actors communicate exclusively via message passing, ensuring a clean separation of concerns and state isolation. In our framework, we’ve encapsulated this functionality in an Actor class with its own attached memory.

Our communication model between actors (or between an actor and a controller) is built around two primary methods:

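Ask: This method sends a message to an actor and awaits a response. From the sender's point of view it is a blocking call: the sender expects a result before proceeding. (With asyncio, only the calling coroutine is suspended; the event loop keeps running other tasks.)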
Tell: This method sends a message without waiting for a response, allowing the sender to immediately move on to its next task.

These two patterns let us create both synchronous and asynchronous workflows to effectively coordinate complex tasks.
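For comparison with our base class, here is a minimal sketch of the classic mailbox formulation used by Akka-style frameworks. `MailboxActor` and its methods are illustrative names, not part of our framework or any library. `ask` attaches an asyncio future as a reply channel and awaits it, while `tell` enqueues the message with no reply channel and returns at once. Because a single worker drains the mailbox, messages are handled strictly in arrival order.

```python
import asyncio
from typing import Any, List, Optional, Tuple


class MailboxActor:
    """Minimal actor with a mailbox; `ask` waits for a reply, `tell` does not."""

    def __init__(self) -> None:
        self._mailbox: asyncio.Queue = asyncio.Queue()
        self._worker: Optional[asyncio.Task] = None
        self.log: List[Any] = []  # exposed only so the example can inspect it

    def start(self) -> None:
        # Must be called from inside a running event loop.
        self._worker = asyncio.create_task(self._run())

    async def _run(self) -> None:
        while True:
            message, reply_to = await self._mailbox.get()
            try:
                result = await self.on_receive(message)
                if reply_to is not None and not reply_to.done():
                    reply_to.set_result(result)
            except Exception as exc:
                # ask() callers get the error; tell() errors are dropped in this sketch.
                if reply_to is not None and not reply_to.done():
                    reply_to.set_exception(exc)
            finally:
                self._mailbox.task_done()

    async def on_receive(self, message: Any) -> Any:
        await asyncio.sleep(0.01)  # pretend to do some work
        self.log.append(message)
        return f"handled {message}"

    async def ask(self, message: Any, timeout: float = 5.0) -> Any:
        # The future is the reply channel; the actor resolves it when done.
        reply_to = asyncio.get_running_loop().create_future()
        self._mailbox.put_nowait((message, reply_to))
        return await asyncio.wait_for(reply_to, timeout)

    def tell(self, message: Any) -> None:
        # Fire-and-forget: no reply channel, the caller continues immediately.
        self._mailbox.put_nowait((message, None))

    async def stop(self) -> None:
        if self._worker is not None:
            self._worker.cancel()
            try:
                await self._worker
            except asyncio.CancelledError:
                pass


async def main() -> Tuple[Any, List[Any]]:
    actor = MailboxActor()
    actor.start()
    actor.tell("warm-up")                # enqueued; returns immediately
    reply = await actor.ask("question")  # suspends until the actor replies
    await actor.stop()
    return reply, list(actor.log)


if __name__ == "__main__":
    print(asyncio.run(main()))  # ('handled question', ['warm-up', 'question'])
```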

Below is a sample base class for our Actor model in Python. We use the asyncio library to manage synchronous (ask) and asynchronous (tell) tasks.

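```python
import asyncio
import uuid
from abc import ABC, abstractmethod
from typing import Any, Generic, Optional, Set, TypeVar

T = TypeVar("T")


class Actor(Generic[T], ABC):
    def __init__(
        self,
        initial_memory: T,
        actor_id: Optional[str] = None,
    ):
        """
        Initialise actor with either new or persisted memory.
        If no actor_id is provided, generates a new UUID.

        Args:
            initial_memory: Default memory if no persisted state exists
            actor_id: Optional identifier to retrieve persisted memory
        """
        self._id: str = actor_id or str(uuid.uuid4())
        # A TypeVar cannot be instantiated (T() raises TypeError), so the caller
        # passes default memory explicitly; persisted memory wins if it exists.
        persisted = self._load_memory(actor_id) if actor_id else None
        self._memory: T = persisted if persisted is not None else initial_memory
        # Serialises message handling so concurrent ask/tell calls never
        # interleave inside on_receive and race on self._memory.
        # (Python 3.10+: asyncio.Lock no longer binds to a loop at creation.)
        self._lock = asyncio.Lock()
        # Strong references to in-flight tell() tasks: the event loop keeps only
        # weak references, so an unreferenced task can be garbage-collected.
        self._pending: Set[asyncio.Task] = set()

    @property
    def id(self) -> str:
        """
        Get the actor's identifier.

        Returns:
            str: The actor's unique identifier
        """
        return self._id

    @property
    def memory(self) -> T:
        return self._memory

    def _load_memory(self, actor_id: str) -> Optional[T]:
        """
        Load persisted memory for this actor.
        Override in persistent actors; the default returns None,
        meaning "no persisted state, use initial_memory".

        Args:
            actor_id: The identifier for the persisted memory
        Returns:
            Optional[T]: The loaded memory object, or None
        """
        return None

    @abstractmethod
    async def on_receive(self, message: Any) -> Any:
        """
        Handle incoming messages and orchestrate tasks.
        Must be implemented by concrete actors.

        Args:
            message: The message to be processed
        Returns:
            Any: The result of processing the message
        """

    async def _handle(self, message: Any) -> Any:
        # One message at a time per actor, as in the classic Actor model.
        # Note: awaiting self.ask() from inside on_receive would deadlock.
        async with self._lock:
            return await self.on_receive(message)

    async def ask(self, message: Any) -> Any:
        """
        Blocking call to get a result from the actor: the calling coroutine
        suspends until on_receive returns; the event loop keeps running.

        Args:
            message: The message to be processed
        Returns:
            Any: The result from on_receive
        """
        return await self._handle(message)

    def tell(self, message: Any) -> None:
        """
        Non-blocking call to send a message to the actor.
        Must be called while an asyncio event loop is running.

        Args:
            message: The message to be processed
        """
        task = asyncio.create_task(self._handle(message))
        self._pending.add(task)
        task.add_done_callback(self._pending.discard)

    def get_memory(self) -> T:
        """
        Accessor for the actor's local memory.

        Returns:
            T: The actor's memory
        """
        return self._memory

    def set_memory(self, memory: T) -> None:
        """
        Mutator for the actor's local memory.

        Args:
            memory: The new memory object
        """
        self._memory = memory

    def persist_memory(self, external_storage: Any) -> None:
        """
        Store the actor's memory externally.
        Override in persistent actors; the default does nothing.

        Args:
            external_storage: The storage mechanism to persist memory
        """
```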

In production systems, error handling is crucial. In an actor-based framework:

Supervision Trees: Similar to Akka, we can implement a supervision strategy where parent actors monitor their children and restart them upon failures.
Graceful Degradation: Actors can implement fallback mechanisms or retries to handle transient errors without compromising overall system stability. This is especially useful with LLMs, where inference calls can fail transiently (timeouts, rate limits).
Back Propagation: An Actor can propagate its status to its parent and ultimately to the source, so the client always knows what is happening.
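A supervisor covering all three ideas can be sketched in a few dozen lines. Everything here is hypothetical: `InferenceActor` fakes an LLM call that times out, `failing_factory` decides which child instances fail, and the retry limits are arbitrary. The supervisor restarts the child by replacing it with a fresh instance (as Akka's restart directive does), backs off exponentially between attempts, reports every event through an `on_status` callback so status flows back toward the client, and degrades to a canned answer once restarts are exhausted.

```python
import asyncio
from typing import Callable, List, Optional


class InferenceActor:
    """Hypothetical child actor wrapping an LLM call that may fail transiently."""

    def __init__(self, should_fail: bool) -> None:
        self._should_fail = should_fail

    async def on_receive(self, prompt: str) -> str:
        await asyncio.sleep(0)  # stand-in for a real network call
        if self._should_fail:
            raise TimeoutError("simulated inference timeout")
        return f"answer to {prompt}"


class Supervisor:
    """Restarts a failing child, backs off between attempts, then degrades."""

    def __init__(
        self,
        child_factory: Callable[[], InferenceActor],
        max_restarts: int = 2,
        backoff_s: float = 0.01,
        on_status: Optional[Callable[[str], None]] = None,
    ) -> None:
        self._child_factory = child_factory
        self._max_restarts = max_restarts
        self._backoff_s = backoff_s
        # Status events propagate to whoever owns the supervisor (e.g. the client).
        self._on_status = on_status or (lambda event: None)

    async def ask(self, prompt: str) -> str:
        child = self._child_factory()
        for attempt in range(self._max_restarts + 1):
            try:
                result = await child.on_receive(prompt)
                self._on_status(f"attempt {attempt} ok")
                return result
            except TimeoutError:
                self._on_status(f"attempt {attempt} failed")
                if attempt == self._max_restarts:
                    break
                # Exponential backoff, then replace the child with a fresh instance.
                await asyncio.sleep(self._backoff_s * 2 ** attempt)
                child = self._child_factory()
                self._on_status("restarted")
        # Graceful degradation: a canned fallback instead of crashing the workflow.
        self._on_status("degraded")
        return "fallback answer"


def failing_factory(fail_first: int) -> Callable[[], InferenceActor]:
    """Hypothetical helper: the first `fail_first` children fail, later ones succeed."""
    created = 0

    def factory() -> InferenceActor:
        nonlocal created
        created += 1
        return InferenceActor(should_fail=created <= fail_first)

    return factory


async def main() -> None:
    events: List[str] = []
    supervisor = Supervisor(failing_factory(2), max_restarts=3, on_status=events.append)
    print(await supervisor.ask("hello"))  # answer to hello
    print(events)  # ['attempt 0 failed', 'restarted', 'attempt 1 failed', 'restarted', 'attempt 2 ok']


if __name__ == "__main__":
    asyncio.run(main())
```

Only `TimeoutError` is treated as transient here; a production supervisor would decide per error type whether to restart, escalate, or stop.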

The modular decision-making provided by the Actor model makes it an excellent match for workflows involving large language models (LLMs). By isolating state and using asynchronous messaging, you can:

Enhance Modular Workflows: Each component (or agent) can focus on a specific task, such as preprocessing, inference, or post-processing.
Improve Fault Tolerance: Isolated state management minimises the risk of cascading failures in complex AI workflows.
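Here is a hedged sketch of a preprocessing, inference, and post-processing pipeline built from three stage actors. `StageActor`, `fake_llm`, and `run_pipeline` are hypothetical; a real inference stage would call a model API. Each stage owns its mailbox and a local counter, hands its output to the next stage with a tell-style enqueue, and only the last stage resolves the caller's future. When one message fails, only that message's future carries the error, and the stage keeps processing the rest.

```python
import asyncio
from typing import Any, Callable, List, Optional


class StageActor:
    """One pipeline stage: transform the payload, then tell the next stage."""

    def __init__(self, fn: Callable[[Any], Any], next_stage: Optional["StageActor"] = None) -> None:
        self._fn = fn
        self._next = next_stage
        self._mailbox: asyncio.Queue = asyncio.Queue()
        self._task: Optional[asyncio.Task] = None
        self.processed = 0  # local state: messages this stage handled successfully

    def start(self) -> None:
        self._task = asyncio.create_task(self._run())

    async def _run(self) -> None:
        while True:
            payload, reply_to = await self._mailbox.get()
            try:
                result = self._fn(payload)
                self.processed += 1
                if self._next is not None:
                    self._next.tell(result, reply_to)  # hand off without waiting
                elif not reply_to.done():
                    reply_to.set_result(result)        # last stage answers the caller
            except Exception as exc:
                if not reply_to.done():
                    reply_to.set_exception(exc)        # the failure stays with this message
            finally:
                self._mailbox.task_done()

    def tell(self, payload: Any, reply_to: asyncio.Future) -> None:
        self._mailbox.put_nowait((payload, reply_to))

    async def stop(self) -> None:
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass


def fake_llm(prompt: str) -> str:
    """Hypothetical inference stub."""
    if not prompt:
        raise ValueError("empty prompt")
    return f"summary of {prompt}"


async def run_pipeline(prompts: List[str]) -> List[Any]:
    post = StageActor(str.upper)
    infer = StageActor(fake_llm, next_stage=post)
    pre = StageActor(lambda text: text.strip().lower(), next_stage=infer)
    stages = [pre, infer, post]
    for stage in stages:
        stage.start()
    loop = asyncio.get_running_loop()
    futures = []
    for prompt in prompts:
        reply_to = loop.create_future()
        pre.tell(prompt, reply_to)  # fire-and-forget into the first stage
        futures.append(reply_to)
    # return_exceptions=True: one failed prompt does not hide the other results.
    results = await asyncio.gather(*futures, return_exceptions=True)
    for stage in stages:
        await stage.stop()
    return list(results)


if __name__ == "__main__":
    print(asyncio.run(run_pipeline(["  Hello World ", "Pricing PAGE"])))
```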

With these principles in mind, you can design a multi-agent framework for workflows or ReAct agents in which each actor has well-defined responsibilities. Imagine creating an AI coder where:

One actor is responsible for planning the project.
Another actor manages modules and files.
A different actor writes the code.
Yet another actor reviews the code.
One more actor tests the code.
And finally, an actor deploys the code.
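A stripped-down version of the AI coder idea might look like this. The roles, the fixed plan, and the review rule are all hypothetical placeholders for LLM calls, and the tester and deployer actors are left out for brevity (they would follow the same pattern). The planner spawns one `CoderActor` per file, lets the coders work concurrently, and hands each result to a reviewer; every coder keeps its draft history in its own memory.

```python
import asyncio
from typing import Dict, List


class CoderActor:
    """Hypothetical role actor that writes one file."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.drafts: List[str] = []  # local memory, invisible to other actors

    async def ask(self, spec: str) -> str:
        await asyncio.sleep(0)  # stand-in for an LLM call
        code = f"# {self.path}\n# implements: {spec}\n"
        self.drafts.append(code)
        return code


class ReviewerActor:
    """Hypothetical role actor that approves or rejects a file."""

    async def ask(self, code: str) -> bool:
        await asyncio.sleep(0)  # stand-in for an LLM call
        # Placeholder review rule: every file must state what it implements.
        return "# implements:" in code


class PlannerActor:
    """Breaks a project into files, spawns one coder per file, then reviews."""

    def __init__(self) -> None:
        self.children: Dict[str, CoderActor] = {}
        self.reviewer = ReviewerActor()

    def plan(self, project: str) -> Dict[str, str]:
        # Placeholder plan; a real planner actor would ask an LLM.
        return {"api.py": f"{project} HTTP endpoints", "models.py": f"{project} data models"}

    async def ask(self, project: str) -> Dict[str, bool]:
        plan = self.plan(project)
        # Spawn child actors, one per file.
        self.children = {path: CoderActor(path) for path in plan}
        # Independent files are written concurrently, then reviewed concurrently.
        codes = await asyncio.gather(*(self.children[p].ask(spec) for p, spec in plan.items()))
        reviews = await asyncio.gather(*(self.reviewer.ask(code) for code in codes))
        return dict(zip(plan, reviews))


if __name__ == "__main__":
    print(asyncio.run(PlannerActor().ask("todo app")))  # {'api.py': True, 'models.py': True}
```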

By separating these responsibilities, you can build a robust system that delivers high-quality outcomes, arguably the most important metric in any AI agent product. This division of labor improves clarity and maintainability, and it also enables parallel development and quicker iteration cycles, since each component can be optimised independently.

The Actor model is more than a concurrency mechanism; it is an architectural paradigm that enables modular, scalable, and fault-tolerant systems. By combining it with agentic workflows and modern LLMs, you can build robust AI-driven applications that are both easy to manage and highly performant.

I hope this deep dive helps you see the full potential of the Actor model and implement it in your next AI project. As always, your feedback and questions are welcome—feel free to share your thoughts in the comments!

At Breakout, we are putting these ideas into practice to build the smartest AI Sales Rep in the world. If this sounds like a challenge you would enjoy, do reach out to me at ashfakh@getbreakout.ai
