Language models in generative AI – does size matter?

While SLMs may be useful for smaller generative AI applications or edge AI deployments, there is another field where they have great potential. Agentic AI, the latest iteration of generative AI, uses multiple agents trained to fulfill specific tasks in order to produce results. The aim here is to create and support a process from beginning to end with multiple, specialized agents. Whereas LLM services can be useful for responding generically to queries and interacting with users, agentic AI takes advantage of specialized SLMs to provide more targeted responses that support different steps in an end-to-end process.

With different autonomous agents involved at different steps, SLMs can play an important role in how you design agentic systems. The reason for this is that multi-agent applications can use far more resources than stand-alone AI applications to reach their end result. A generative AI application consumes a certain number of tokens to process each response, for example when embedding requests into vectors. Tokens roughly correspond to the words and word fragments in a prompt, so longer and more complex prompts consume more tokens.
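
To make token consumption concrete, here is a minimal sketch using the open-source tiktoken tokenizer. The library choice, the cl100k_base encoding, and the example prompts are illustrative assumptions, not anything specific to the services discussed in this article.

```python
# Rough illustration of how prompt length maps to token consumption.
# Assumes the open-source `tiktoken` package (pip install tiktoken);
# the "cl100k_base" encoding is an assumption for illustration only.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

short_prompt = "Summarise this order status."
long_prompt = (
    "Summarise this order status, list any delayed items, propose a "
    "customer-facing apology, and draft a follow-up email in a formal tone."
)

for label, prompt in [("short", short_prompt), ("long", long_prompt)]:
    tokens = enc.encode(prompt)  # list of token ids for this prompt
    print(f"{label} prompt: {len(prompt.split())} words -> {len(tokens)} tokens")
```

Running this shows that the longer, more detailed prompt produces several times as many tokens as the short one, which is exactly the effect that compounds across agents in a pipeline.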

Each component in an application will consume tokens to respond to a request. Depending on the number of agents and steps within a process, token consumption is significantly higher for agentic AI: each agent generates a response that consumes tokens, then passes it on to the next step, which consumes tokens of its own to create the next response, and so on until the final response is created and sent back to the user. Capgemini estimates that, for a service carrying out one request per minute in response to one sensor event, a single-agent service would cost around $0.41 per day, while a multi-agent system would cost around $10.54 per day, approximately 26 times more expensive.
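
That accumulation can be sketched with a back-of-the-envelope calculation. The agent count, tokens per step, and price per token below are illustrative assumptions, not the inputs behind Capgemini's figures; the point is only to show how each agent's output becomes the next agent's context, so tokens, and therefore cost, multiply with every hop.

```python
# Back-of-the-envelope sketch of why token costs multiply in a multi-agent
# pipeline. All numbers (tokens per step, price per token, agent count) are
# illustrative assumptions, not the figures behind Capgemini's estimate.

REQUESTS_PER_DAY = 60 * 24      # one request per minute, as in the example above
PRICE_PER_1K_TOKENS = 0.0005    # assumed blended input/output price in USD

def daily_cost(tokens_per_request: int) -> float:
    """Daily cost for a service handling one request per minute."""
    return REQUESTS_PER_DAY * tokens_per_request / 1000 * PRICE_PER_1K_TOKENS

# Single-agent service: one prompt in, one response out.
single_agent_tokens = 600

# Multi-agent pipeline: each agent consumes the previous agent's output as
# extra context and produces its own response, so tokens accumulate per hop.
agents = 5
tokens_per_hop = 600
context_growth = 400            # extra context carried forward at each hop
multi_agent_tokens = sum(tokens_per_hop + i * context_growth for i in range(agents))

print(f"single-agent: {single_agent_tokens:>5} tokens/request, "
      f"${daily_cost(single_agent_tokens):.2f}/day")
print(f"multi-agent:  {multi_agent_tokens:>5} tokens/request, "
      f"${daily_cost(multi_agent_tokens):.2f}/day")
```

Even with these modest assumptions the multi-agent pipeline consumes an order of magnitude more tokens per request, which is why swapping smaller, specialized SLMs into individual steps can meaningfully reduce the cost of an agentic system.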
