It’s clear: 2023 was the year of AI. Beginning with the release of ChatGPT, a technological revolution unfolded. What began as conversational agents quickly moved to indexing documents (retrieval-augmented generation, or RAG), and now to connecting to data sources and enabling data analysis with a simple sentence.

With the success of ChatGPT, a lot of people promised last year to deliver systems powered by large language models (LLMs) soon … and very few of those promises have been fulfilled. Some of the important reasons for that are:

We are building AI agents, not LLMs
People are treating the problem as a research problem, not an engineering problem
Bad data

In this blog, we’ll examine the role of AI agents as a way to link LLMs with backend systems. Then, we’ll look at how the use of intuitive, interactive semantics to comprehend user intent is setting up AI agents as the next generation of user interface and user experience (UI/UX). Finally, as AI agents make their way into software, we’ll talk about why we need to bring back some principles of software engineering that people seem to have forgotten in the past few months.
I Want a Pizza in 20 Minutes

LLMs offer a more intuitive, streamlined approach to UI/UX interactions compared to traditional point-and-click methods. To illustrate this, suppose you want to order a “gourmet margherita pizza delivered in 20 minutes” through a food delivery app.

This seemingly straightforward request can trigger a series of complex interactions in the app, potentially spanning several minutes of interactions using normal UI/UX. For example, you would probably have to choose the “Pizza” category, search for a restaurant with appetizing pictures, check if they have margherita pizza, and then find out whether they can deliver quickly enough—as well as backtrack if any of your criteria aren’t met. This flowchart expresses the interaction with the app.
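
To make that flow concrete, here is a minimal Python sketch of the search-and-backtrack loop the user performs by hand; the restaurant data and field names are hypothetical:

```python
# Hypothetical catalog; in a real app this would come from backend services.
RESTAURANTS = [
    {"name": "Luigi's", "menu": ["margherita", "pepperoni"], "delivery_minutes": 35},
    {"name": "Napoli Express", "menu": ["margherita"], "delivery_minutes": 18},
    {"name": "Crust & Co.", "menu": ["hawaiian"], "delivery_minutes": 15},
]

def find_restaurant(dish: str, max_minutes: int):
    """Mimic the manual UI flow: open a restaurant page, check the menu,
    check the delivery estimate, and backtrack whenever a criterion fails."""
    for restaurant in RESTAURANTS:
        if dish not in restaurant["menu"]:
            continue  # backtrack: the dish is not on the menu
        if restaurant["delivery_minutes"] > max_minutes:
            continue  # backtrack: delivery is too slow
        return restaurant
    return None  # every candidate failed some criterion

print(find_restaurant("margherita", 20))  # -> the Napoli Express entry
```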

We Need More than LLMs

LLMs are AI models trained on vast amounts of textual data, enabling them to understand and generate remarkably accurate human-like language. Models such as OpenAI’s GPT-3 have demonstrated exceptional abilities in natural language processing, text completion, and even generating coherent and contextually relevant responses.

Although more recent LLMs can perform data analysis, summarization, and representation on their own, the ability to connect external data sources, algorithms, and specialized interfaces to an LLM gives it even more flexibility. This can enable it to perform tasks that involve analysis of domain-specific, real-time data, as well as open the door to tasks not yet possible with today’s LLMs.

This “pizza” example illustrates the complexity that natural language processing (NLP) must hide. Even this relatively simple request necessitates connecting with multiple backend systems, such as restaurant databases, inventory management systems, delivery tracking systems, and more. Each of these connections contributes to the successful execution of the order.

Furthermore, the connections required may vary depending on the request. The more kinds of requests you want the system to understand and recognize, the more connections to different backend systems will need to be made. This flexibility and adaptability in establishing connections is crucial to accommodating diverse customer requests and ensuring a seamless experience.
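
To sketch what those varying connections look like, here is a hedged Python example in which each backend is a stub; the names (search_restaurants, check_inventory, estimate_delivery) are hypothetical stand-ins for real services:

```python
# Hypothetical backend clients, stubbed out for illustration.
def search_restaurants(dish: str) -> list[str]:
    return ["Napoli Express", "Luigi's"]   # stub: restaurant database query

def check_inventory(restaurant: str, dish: str) -> bool:
    return restaurant == "Napoli Express"  # stub: inventory management system

def estimate_delivery(restaurant: str) -> int:
    return 18                              # stub: delivery tracking system (minutes)

def order_with_deadline(dish: str, max_minutes: int) -> list[str]:
    """One request type; each new kind of request pulls in its own backends."""
    return [r for r in search_restaurants(dish)
            if check_inventory(r, dish) and estimate_delivery(r) <= max_minutes]

print(order_with_deadline("margherita", 20))  # -> ['Napoli Express']
```
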
AI Agents

LLMs serve as the foundation for AI agents. To respond to a diverse range of queries, an AI agent leverages an LLM in conjunction with several integral auxiliary components:

The agent core uses the LLM and orchestrates the agent’s overall functionality.
The memory module enables the agent to make context-aware decisions.
The planner formulates the agent’s course of action based on the tools at hand.
Various tools and resources support specific domains, enabling the AI agent to effectively process data, reason, and generate appropriate responses. The set of tools includes data sources, algorithms, and visualizations (or UI interactions).
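
As a structural sketch, the four components can be wired together as plain Python classes; all names here are hypothetical, not a reference to any particular agent framework:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Memory:
    history: list = field(default_factory=list)   # past inputs, outputs, outcomes
    context: dict = field(default_factory=dict)   # current situation and preferences

@dataclass
class Planner:
    def plan(self, question: str, context: dict) -> list[str]:
        # A real planner would prompt the LLM (see the template below)
        # to break the question into sub-questions.
        return [question]

@dataclass
class AgentCore:
    llm: Callable[[str], str]                     # the underlying language model
    memory: Memory
    planner: Planner
    tools: dict = field(default_factory=dict)     # name -> callable
```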

Agent core

The agent core is the “brain” of the AI agent, managing decision-making, communication, and coordination of modules and subsystems to help the agent operate seamlessly and interact efficiently with its environment or tasks.

The agent core receives inputs, processes them, and generates actions or responses. It also maintains a representation of the agent’s knowledge, beliefs, and intentions to guide its reasoning and behavior. Finally, the core oversees the update and retrieval of information from the agent’s memory to help it make relevant, context-based decisions.
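
Reduced to code, the core’s responsibilities form a small loop. This sketch continues the hypothetical skeleton above:

```python
def core_step(core, user_input: str) -> str:
    context = core.memory.context                        # retrieve current context
    plan = core.planner.plan(user_input, context)        # decide on a course of action
    partial_answers = [core.llm(step) for step in plan]  # process each sub-question
    # core.tools would be invoked here for data access (omitted for brevity).
    answer = " ".join(partial_answers)                   # compose the final response
    core.memory.history.append((user_input, answer))     # update history memory
    return answer
```
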
Memory

The memory module encompasses history memory and context memory components, which store and manage data the AI agent can use to simultaneously apply past experiences and current context to inform its decision-making.

History memory stores records of previous inputs, outputs, and outcomes. These records let the agent learn from past interactions and gain insights into effective strategies and patterns that help it make better-informed decisions and avoid repeating mistakes.

Context memory, meanwhile, enables the agent to interpret and respond appropriately to the specific, current circumstances using information about the environment, the user’s preferences or intentions, and many other contextual factors.
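
Here is a minimal sketch of such a memory module, in-memory only, with naive keyword matching standing in for real relevance search:

```python
class AgentMemory:
    """History memory (past interactions) plus context memory (current state)."""

    def __init__(self):
        self.history = []   # records of previous inputs, outputs, and outcomes
        self.context = {}   # e.g., user preferences, session state, environment

    def store(self, user_input: str, output: str, outcome=None):
        self.history.append({"input": user_input, "output": output, "outcome": outcome})

    def recall(self, query: str, k: int = 3) -> list[dict]:
        # Naive relevance: most recent records sharing a word with the query.
        words = set(query.lower().split())
        hits = [record for record in reversed(self.history)
                if words & set(record["input"].lower().split())]
        return hits[:k]

    def update_context(self, **facts):
        self.context.update(facts)   # e.g., update_context(city="Berlin")
```
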
Planner

The planner component analyzes the state of the agent’s environment, constraints, and factors such as goals, objectives, resources, rules, and dependencies to determine the most effective steps to achieve the desired outcome.

Here’s an example of a prompt template the planner could use, according to Nvidia:

GENERAL INSTRUCTIONS

You are a domain expert. Your task is to break down a complex question into simpler sub-parts. If you cannot answer the question, request a helper or use a tool. Fill with Nil where no tool or helper is required.

AVAILABLE TOOLS

– Search Tool

– Math Tool

CONTEXTUAL INFORMATION

<information from Memory to help LLM to figure out the context around question>

USER QUESTION

“How to order a margherita pizza in 20 min in my app?”

ANSWER FORMAT

“sub-questions”:[“<FILL>”]

Using this, the planner could generate a plan to serve as a roadmap for the agent’s actions, enabling it to navigate complex problems and strategically accomplish its goals.
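
As a sketch, the planner could render this template and parse the model’s answer as follows; the llm callable, and the assumption that the model replies with only the JSON answer format, are mine, based on the template above:

```python
import json

TEMPLATE = """GENERAL INSTRUCTIONS
You are a domain expert. Your task is to break down a complex question
into simpler sub-parts. If you cannot answer the question, request a
helper or use a tool. Fill with Nil where no tool or helper is required.

AVAILABLE TOOLS
- Search Tool
- Math Tool

CONTEXTUAL INFORMATION
{context}

USER QUESTION
{question}

ANSWER FORMAT
{{"sub-questions": ["<FILL>"]}}"""

def plan(llm, question: str, context: str) -> list[str]:
    prompt = TEMPLATE.format(context=context, question=question)
    raw = llm(prompt)                        # assumed to return the model's text
    return json.loads(raw)["sub-questions"]  # parse the sub-questions list
```
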
Tools

Various other tools help the AI agent perform specific tasks or functions. For example:

Retrieval-augmented generation (RAG) tools enable the agent to retrieve and use knowledge base content to generate coherent, contextually appropriate responses.
Database connections allow the AI agent to query and retrieve relevant information from structured data sources to inform decisions or responses.
Natural language processing (NLP) libraries offer text tokenization, named entity recognition, sentiment analysis, language modeling, and other functionality.
Machine learning (ML) frameworks enable the agent to leverage ML techniques such as supervised, unsupervised, or reinforcement learning to enhance its capabilities.
Visualization tools help the agent represent and interpret data or outputs visually, and can help the agent understand and analyze patterns, relationships, or trends in the data.
Simulation environments provide a virtual environment where the agent can sharpen its skills, test strategies, and evaluate potential outcomes without affecting the real world.
Monitoring and logging frameworks facilitate the tracking and recording of agent activities, performance metrics, or system events to help evaluate the agent’s behavior, identify potential issues or anomalies, and support debugging and analysis.
Data preprocessing tools use techniques like data cleaning, normalization, feature selection, and dimensionality reduction to ensure raw data is relevant and high-quality before the agent ingests it.
Evaluation frameworks provide methodologies and metrics that enable the agent to measure its successes, compare approaches, and iterate on its capabilities.

These and other tools empower AI agents with functionality and resources to perform specific tasks, process data, make informed decisions, and enhance their overall capabilities.
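
One common way to expose such tools to the agent is a simple name-to-function registry that the planner can select from. The following is a sketch with stubbed implementations, not any specific framework’s API:

```python
# Hypothetical tool registry: the planner refers to tools by name.
TOOLS: dict = {}

def tool(name: str):
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("search")
def search(query: str) -> list[str]:
    return []  # stub: would query a RAG index or database connection

@tool("math")
def calculate(expression: str) -> float:
    # Stub only: a production tool needs a safe expression parser, not eval.
    return float(eval(expression, {"__builtins__": {}}))

print(TOOLS["math"]("2 * 21"))  # 42.0
```
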
Adding LLM-based Intelligent Agents to Your Data Is an Engineering Problem, Not a Research Problem

People have realized that natural language makes it much easier, and more forgiving (not to say relaxed), to specify the use cases required for software development. But because English can be ambiguous and imprecise, this is leading to a new problem in software development, where systems are not well specified or understood.

Fred Brooks outlined many central software engineering principles in his 1975 book The Mythical Man-Month and his 1986 essay “No Silver Bullet” (bundled into the book’s anniversary edition), some of which people seem to have forgotten during the LLM rush. For instance:

No silver bullet. This is the first principle people have forgotten with LLMs. They believe LLMs are the silver bullet that will eliminate the need for proper software engineering practices.

The second-system effect. LLM-based systems are shaping up as a classic second system: people treat LLMs as so powerful that they forget the LLMs’ limitations.

The tendency toward an irreducible number of errors. Even if you get the LLM implementation right, LLMs can hallucinate, and they can expose latent backend errors that stayed hidden only because nothing had exercised the backend in those ways before.

Progress tracking. I remember the first thing I heard from Brooks’ book: “How does a project get to be a year late? One day at a time.” I have seen people assume that if they sweep problems under the rug, the problems will disappear. Machine learning models, and LLMs in particular, inherit the same problems of ill-designed systems, with the added problem of amplifying bad data, which we describe later.

Conceptual integrity. The problem has shifted from designing use cases (or user stories) that preserve the conceptual integrity of the entire system to assuming the LLM will magically bind together any inconsistencies in the software. For example, take a user story for ordering in a food app, “I want to order a gourmet margherita pizza in 20 min,” and change the question to:

Can I get a gourmet margherita pizza delivered in 20 minutes?
Show me all pizza places that can deliver a gourmet margherita pizza in 20 minutes.
Show me all pizza places that can deliver a gourmet margherita pizza in 20 minutes ranked by user preference.

We can easily see that different types of data, algorithms, and visualizations are required to address each variation.
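
A sketch of what that means in practice: each phrasing pulls in a different set of capabilities, so none of them is “free” just because an LLM sits in front (the component names are hypothetical):

```python
# Each variant of "the same" user story needs a different pipeline.
REQUIRED_COMPONENTS = {
    "can I get one in 20 minutes?": ["menu_db", "delivery_eta"],  # yes/no answer
    "show all places that can deliver": ["menu_db", "delivery_eta", "list_view"],
    "... ranked by user preference": ["menu_db", "delivery_eta", "list_view",
                                      "preference_model", "ranker"],  # adds ML ranking
}
```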

The manual and formal documents. Thanks to hype, this is probably the most forgotten principle in the age of LLMs. It’s not enough to say “develop a system that will tell me how to order things like a gourmet margherita pizza in 20 minutes.” This requires documentation of a whole array of other use cases, required backend systems, new types of visualizations to be created, and—crucially—specifications of what the system will not do. “Things like” seems to have become a norm in LLM software development, as if an LLM can magically connect to backend systems and visualize data it has never learned to understand.

The pilot system. Because of these limitations, software systems with LLM-based intelligent agents have not left the pilot stage at several companies, simply because they are not able to reason beyond the simple questions used as “example use cases.”

In a recent paper, we addressed the first issue, the lack of proper specification of software systems, and showed a way to create formal specifications for LLM-based intelligent systems so that they can follow sound software engineering principles.
Bad Data

In a recent post on LinkedIn, we described the importance of “librarians” to LLM-based intelligent agents. (Apparently, this post was misunderstood, as several teachers and actual librarians liked the post.) We were referring to the need to use more formal data organization and writing methodologies to ensure LLM-based intelligent agents work.

The cloud fulfilled its promise of never requiring us to delete data, just letting us store it. With this came the pressure to create user documentation quickly. The result is a “data dump” where old data lives alongside new data, where old specifications that were never implemented are still alive, and where outdated descriptions of system functionality persist, never having been updated in the documentation. Finally, documents seem to have forgotten what a “topic sentence” is.

LLM-based systems expect documentation to have well-written text, as recently shown when OpenAI stated that it is “impossible” to train AI without using copyrighted works. This alludes not only to the fact that we need a tremendous amount of text to train these models, but also that good quality text is required.

This becomes even more important if you use RAG-based technologies. In RAG, we index document chunks (for example, using embedding technologies in vector databases), and whenever a user asks a question, we return the top-ranked chunks to a generator LLM that in turn composes the answer. Needless to say, RAG technology requires well-written indexed text to generate the answers.
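
Here is a minimal sketch of that index-retrieve-generate loop; a toy bag-of-words overlap stands in for the embedding model and vector database:

```python
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())   # toy stand-in for an embedding model

def similarity(a: Counter, b: Counter) -> int:
    return sum((a & b).values())           # word overlap instead of vector distance

# 1. Index the document chunks.
CHUNKS = [
    "Margherita pizza: tomato, mozzarella, and fresh basil.",
    "Downtown orders are guaranteed delivery within 20 minutes.",
]
INDEX = [(chunk, embed(chunk)) for chunk in CHUNKS]

# 2. Retrieve the top-ranked chunks for a question.
def retrieve(question: str, k: int = 2) -> list[str]:
    query = embed(question)
    ranked = sorted(INDEX, key=lambda item: similarity(query, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# 3. Pass the retrieved chunks to a generator LLM (a stub here) for the answer.
def answer(llm, question: str) -> str:
    context = "\n".join(retrieve(question))
    return llm(f"Context:\n{context}\n\nQuestion: {question}")
```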

RAG pipeline, according to https://arxiv.org/abs/2005.11401
Conclusions

We have shown that there is an explosion of LLM-based promises in the field, and very few are coming to fruition. To build intelligent AI systems, it is time to accept that we are building complex software engineering systems, not prototypes.

LLM-based intelligent systems bring another level of complexity to system design. We need to consider to what extent we must specify and test such systems properly, and we need to treat data as a first-class citizen, as these intelligent systems are much more susceptible to bad data than other systems.
