I began thinking about creating an app with Laravel that allows users to interact primarily with AI agents. I’ve been exploring various design ideas, and recently, an interesting concept has come to mind. Let me share it with you.
Paga Fácil
The idea is that users can make payments in Venezuela without ever leaving the WhatsApp app. Compared to the current method, it should be easier and faster.
The user should simply write in natural language that they want to make a new payment, and the service will guide them through the payment step by step without leaving WhatsApp.
Under the hood, we use the Cobra Fácil API, which allows us to make transfers from one account to another.
We use passwordless login; every time users log in to the platform, they receive a 6-digit code via email. Users must be authenticated on the platform to perform actions such as creating a payment method or updating their profile.
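To make the passwordless login concrete, here is a minimal sketch of issuing and checking a 6-digit code. It is written in Python for brevity rather than as Laravel code, and the function names, the 10-minute lifetime, and the choice to store only a hash of the code are all assumptions, not details of the actual app.

```python
import hashlib
import hmac
import secrets
from datetime import datetime, timedelta, timezone

CODE_TTL = timedelta(minutes=10)  # assumed lifetime; the post does not specify one


def generate_login_code() -> str:
    """Return a random 6-digit code, zero-padded (so it is always 6 characters)."""
    return f"{secrets.randbelow(1_000_000):06d}"


def hash_code(code: str) -> str:
    """Hash the code so the plain value never needs to be stored."""
    return hashlib.sha256(code.encode()).hexdigest()


def verify_login_code(submitted: str, stored_hash: str,
                      issued_at: datetime, now: datetime) -> bool:
    """Accept the code only if it has not expired and matches the stored hash."""
    if now - issued_at > CODE_TTL:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(hash_code(submitted), stored_hash)


# Usage: issue a code (it would be emailed to the user), then check the reply.
issued_at = datetime.now(timezone.utc)
code = generate_login_code()
stored = hash_code(code)
print(verify_login_code(code, stored, issued_at, datetime.now(timezone.utc)))
```

A real implementation would also limit the number of attempts per code, which this sketch leaves out.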
For now, we only support mobile payments, which is the option Cobra Fácil allows.
So, let’s start designing this app.
General Architecture
From the graph, this is the basic flow:
Users send a message to WhatsApp, and WhatsApp sends that message to our Laravel app via a webhook call.
We process the inbound message and pass it to a “Routing System”. The routing system decides which “Agent” receives the message.
The “Agent” processes the message; it may interact with other agents in the process. Then, it sends a response back to the WhatsApp service and the users.
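The three steps above can be sketched roughly as follows. This is a Python illustration, not Laravel code: the payload shape, the agent functions, and the `outbox` list standing in for the WhatsApp send call are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Conversation:
    state: str = "intent_detection"
    outbox: List[str] = field(default_factory=list)  # stands in for the WhatsApp send call


Agent = Callable[[Conversation, str], None]


def intent_detection_agent(conversation: Conversation, text: str) -> None:
    conversation.outbox.append("Do you want to make a payment or ask a question?")


def customer_service_agent(conversation: Conversation, text: str) -> None:
    conversation.outbox.append("Happy to help with questions about the service.")


# The routing system: the conversation state decides which agent gets the message.
AGENTS: Dict[str, Agent] = {
    "intent_detection": intent_detection_agent,
    "customer_service": customer_service_agent,
}


def route(conversation: Conversation, text: str) -> None:
    agent = AGENTS.get(conversation.state)
    if agent is None:
        raise ValueError(f"No agent registered for state {conversation.state!r}")
    agent(conversation, text)


def handle_webhook(payload: dict, conversation: Conversation) -> None:
    # Hypothetical, simplified payload; the real WhatsApp webhook body is more nested.
    route(conversation, payload["text"])
```

Keeping the router as a plain state-to-agent lookup means adding a new agent only requires registering it under its state.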
We need a way to store the application’s state. For this, we are going to create two tables: “conversations” and “conversation_messages”. Let’s see the data architecture now.
Data Architecture
User: This is the basic user, defined by Laravel.
Conversation: The conversation model has
User ID: Because it belongs to a user.
State: This is the current state of the conversation. Each state is handled by a specific agent, so this value determines which agent handles the conversation.
Is Active: To know if the conversation is active. Since each user has a single chat with us, there should be only one active conversation per user.
Expires At: If the conversation has been active for more than a particular time, we should expire it.
Metadata: This column stores all relevant data in JSON format.
Conversation Message: This model has
Conversation ID: Because it belongs to a conversation.
Role: This is to define whether it is a message from the user or the assistant.
Direction: This defines whether the message is inbound or outbound.
Content: Content of the message.
Payment Method: This model specifies the bank account from which the user transfers the money.
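As a rough picture of these models, here is a sketch using Python dataclasses instead of Eloquent models. The field names mirror the columns listed above; the enum values and the `active_conversation` helper are assumptions, and Payment Method is left out because its columns are not described here.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Any, Dict, List, Optional


class Role(str, Enum):
    USER = "user"
    ASSISTANT = "assistant"


class Direction(str, Enum):
    INBOUND = "inbound"
    OUTBOUND = "outbound"


@dataclass
class Conversation:
    user_id: int
    state: str
    is_active: bool = True
    expires_at: Optional[datetime] = None
    metadata: Dict[str, Any] = field(default_factory=dict)  # stored as JSON


@dataclass
class ConversationMessage:
    conversation_id: int
    role: Role
    direction: Direction
    content: str


def active_conversation(conversations: List[Conversation],
                        user_id: int) -> Optional[Conversation]:
    """Return the user's single active conversation, or None if there is none."""
    active = [c for c in conversations if c.user_id == user_id and c.is_active]
    if len(active) > 1:
        raise RuntimeError("more than one active conversation for this user")
    return active[0] if active else None
```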
Now, let’s take a closer look at the conversation’s flow.
Conversation Flow
When users start a new conversation, we direct them to Intent Detection, and the Intent Detection Agent manages this state.
Intent Detection Agent
This agent ensures we understand the user’s intent, and based on that, we route them to either the Payment Data Collection state or the Customer Service state.
Here, we take the inbound message and pass it to an LLM to work out the user’s intent. When the LLM knows what the user wants, it calls a function to transition to one of the allowed states mentioned earlier.
How does the LLM know what the allowed states are?
We create the StateGraph class to define all relations between the states. We call those “edges”. In this class, we can determine things like what the initial state of a conversation is, what state is terminal, what the neighbors of a specific state are, and if a state can transition to another state.
In this case, we pass the LLM the neighbor states of the Intent Detection state (Payment Data Collection and Customer Service).
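A minimal sketch of what such a StateGraph could look like, written in Python rather than PHP. The state identifiers and method names are assumptions based on the states named in this post, and the post does not say where Customer Service leads, so it has no outgoing edges here.

```python
from typing import Dict, List


class StateGraph:
    """Directed graph of conversation states; edges are the allowed transitions."""

    INITIAL = "intent_detection"
    TERMINAL = {"payment_success", "payment_failed"}

    EDGES: Dict[str, List[str]] = {
        "intent_detection": ["payment_data_collection", "customer_service"],
        "payment_data_collection": ["otp_capture"],
        "otp_capture": ["payment_execution"],
        "payment_execution": ["payment_status_monitoring"],
        "payment_status_monitoring": ["payment_success", "payment_failed"],
        "customer_service": [],  # assumption: no outgoing edges are described
    }

    @classmethod
    def initial_state(cls) -> str:
        return cls.INITIAL

    @classmethod
    def neighbors(cls, state: str) -> List[str]:
        """States reachable in one step; this is what we hand to the LLM."""
        return list(cls.EDGES.get(state, []))

    @classmethod
    def can_transition(cls, source: str, target: str) -> bool:
        return target in cls.EDGES.get(source, [])

    @classmethod
    def is_terminal(cls, state: str) -> bool:
        return state in cls.TERMINAL
```

With this shape, `StateGraph.neighbors("intent_detection")` yields exactly the two states the Intent Detection Agent is allowed to move to.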
If one of the intents is detected, we transition to the corresponding state. Since we want the next agent to respond to the user, we call the next agent directly.
Otherwise, the LLM will continue asking questions to understand the user’s intent.
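One way to picture this step is sketched below in Python. The tool definition follows the common JSON Schema style for function calling, but it is not tied to any provider's API, and the normalized `result` shape, the `transition_to` name, and the dictionary-based conversation are hypothetical.

```python
from typing import Dict, List, Optional


def build_transition_tool(allowed_states: List[str]) -> Dict:
    """Describe a function the LLM may call, restricted to the allowed states."""
    return {
        "name": "transition_to",
        "description": "Move the conversation to the state matching the user's intent.",
        "parameters": {
            "type": "object",
            "properties": {"state": {"type": "string", "enum": list(allowed_states)}},
            "required": ["state"],
        },
    }


def apply_llm_result(conversation: Dict, result: Dict,
                     allowed_states: List[str]) -> Optional[str]:
    """Return a reply for the user, or None if we transitioned.

    `result` is a hypothetical normalized LLM output: either {"text": "..."}
    or {"tool_call": {"name": "transition_to", "arguments": {"state": "..."}}}.
    """
    call = result.get("tool_call")
    if call is None:
        return result["text"]  # intent still unclear: keep asking the user
    if call["name"] != "transition_to":
        raise ValueError(f"Unexpected tool call: {call['name']!r}")
    target = call["arguments"]["state"]
    if target not in allowed_states:
        raise ValueError(f"Transition to {target!r} is not allowed")
    conversation["state"] = target
    return None  # the caller invokes the next agent, which answers the user
```

Validating the target against the allowed states a second time guards against the LLM returning a state outside the enum it was given.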
Let’s say the user wants to make a payment. We’ll call the Payment Data Collection Agent then.
Payment Data Collection Agent
This agent gathers all the information needed to make a payment. It asks for details such as the amount, the recipient’s name, the recipient’s phone number, and their document ID, among other things.
Once it has all the information, it calls a function to store it in the conversation metadata and calls the next agent.
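The "store it once everything is collected" step might look like the Python sketch below. The field names and the `payment` metadata key are assumptions; the real list of required fields depends on what Cobra Fácil asks for.

```python
from typing import Any, Dict, List

# Assumed field names, based on the details mentioned above.
REQUIRED_FIELDS = ("amount", "recipient_name", "recipient_phone", "document_id")


def missing_fields(collected: Dict[str, Any]) -> List[str]:
    """List the required fields the user has not provided yet."""
    return [name for name in REQUIRED_FIELDS if collected.get(name) in (None, "")]


def store_payment_data(metadata: Dict[str, Any],
                       collected: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the conversation metadata with the payment data added."""
    missing = missing_fields(collected)
    if missing:
        raise ValueError(f"Cannot store payment data, missing: {missing}")
    updated = dict(metadata)
    updated["payment"] = {name: collected[name] for name in REQUIRED_FIELDS}
    return updated
```

While `missing_fields` returns anything, the agent keeps asking; only when it returns an empty list does the agent store the data and hand over to the next agent.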
OTP Capture Agent
This agent retrieves the data collected by the previous agent and makes the first API call to Cobra Fácil, which prompts the user’s bank to send the user a verification code confirming that they really want to initiate a transfer from their bank account. The agent also sends the user a message asking for this code. Then, we update the conversation state, but we don’t call the next agent. We need to wait for the user to reply with the OTP code before we can call it.
Payment Execution Agent
With the OTP code and the payment information, this agent makes a second API call to Cobra Fácil to execute the payment. We update the conversation state and tell the user that the payment is in progress.
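The two-step handoff between the OTP Capture Agent and the Payment Execution Agent can be sketched as follows, in Python. The gateway methods `request_otp` and `execute_payment` are hypothetical stand-ins, not the real Cobra Fácil API, and the state names and the idea that the OTP step moves the state to payment execution before waiting are assumptions about how the flow is wired.

```python
from typing import Dict, List, Tuple


class FakeGateway:
    """In-memory stand-in for the payment API, for illustration only."""

    def __init__(self) -> None:
        self.otp_requests: List[Dict] = []
        self.executed: List[Tuple[Dict, str]] = []

    def request_otp(self, payment: Dict) -> None:
        self.otp_requests.append(payment)

    def execute_payment(self, payment: Dict, otp: str) -> str:
        self.executed.append((payment, otp))
        return "tx-1"  # hypothetical transaction identifier


def otp_capture_agent(conversation: Dict, gateway: FakeGateway) -> str:
    """First API call: ask the bank to send the user a code, then wait."""
    gateway.request_otp(conversation["metadata"]["payment"])
    # Update the state but do NOT call the next agent: the user's next
    # inbound message (the code) will be routed to it.
    conversation["state"] = "payment_execution"
    return "Your bank has sent you a verification code. Please reply with it."


def payment_execution_agent(conversation: Dict, otp: str, gateway: FakeGateway) -> str:
    """Second API call: execute the payment with the code the user sent."""
    payment = conversation["metadata"]["payment"]
    transaction_id = gateway.execute_payment(payment, otp.strip())
    conversation["metadata"]["transaction_id"] = transaction_id
    conversation["state"] = "payment_status_monitoring"
    return "Your payment is in progress. We will let you know when it completes."
```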
Payment Status Monitoring
We don’t handle this state like the others. Instead, a Laravel command monitors the pending payment transaction in Cobra Fácil. Once the transaction is processed, we update the conversation state and send the user a message with the outcome: the payment details if it succeeded, or a notice that it failed. Hence, the last two states are ‘Payment Failed’ and ‘Payment Success’.
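The body of such a monitoring command could look roughly like this Python sketch. The `fetch_status` and `notify` callables, the status strings, and the message texts are all hypothetical; in the real app they would wrap the Cobra Fácil client and the WhatsApp sender.

```python
from typing import Callable, Dict, List


def monitor_pending_payments(conversations: List[Dict],
                             fetch_status: Callable[[str], str],
                             notify: Callable[[Dict, str], None]) -> int:
    """Resolve conversations waiting on a payment; return how many were resolved."""
    resolved = 0
    for conversation in conversations:
        if conversation["state"] != "payment_status_monitoring":
            continue
        status = fetch_status(conversation["metadata"]["transaction_id"])
        if status == "pending":
            continue  # still processing: check again on the next run
        if status == "success":
            conversation["state"] = "payment_success"
            notify(conversation, "Your payment was completed successfully.")
        else:
            conversation["state"] = "payment_failed"
            notify(conversation, "Your payment could not be completed.")
        resolved += 1
    return resolved
```

Running this on a schedule keeps the agents themselves free of polling logic: they only react to inbound messages.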
Customer Service Agent
On the other hand, if the user asks questions about the service, this agent handles them. It simply responds to any questions the user has about the service.
The conversation automatically expires once the user completes a payment or a specified amount of time has passed without any interaction from the user.
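A small Python sketch of that expiry rule, using the Expires At and Is Active columns from the data model. The 30-minute idle timeout, the helper names, and the treatment of both final payment states as "completed" are assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict

IDLE_TIMEOUT = timedelta(minutes=30)  # assumed; the post does not give a value
FINAL_STATES = {"payment_success", "payment_failed"}


def touch(conversation: Dict, now: datetime) -> None:
    """Call on every user interaction to push the expiry forward."""
    conversation["expires_at"] = now + IDLE_TIMEOUT


def expire_if_needed(conversation: Dict, now: datetime) -> bool:
    """Deactivate the conversation if it is finished or idle; return True if it expired."""
    finished = conversation["state"] in FINAL_STATES
    timed_out = now >= conversation["expires_at"]
    if conversation["is_active"] and (finished or timed_out):
        conversation["is_active"] = False
        return True
    return False
```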
And that’s it! As you can see, we can handle flows by designing a “state machine” where each node is an agent. This approach allows us to create highly flexible conversation flows.
Should I create a package to make these types of applications easier to build?