This latest release also includes cost-based load balancing, enabling Kong to route requests based on token usage and pricing. For example, low-complexity prompts can go to cheaper models, while high-value tasks route to premium providers. This is especially helpful for companies using multiple LLMs for different use cases, allowing them to optimize for both performance and budget.
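To make the idea concrete, here is a minimal, illustrative sketch of cost-based routing logic. This is not Kong's configuration or API; the model names, per-token prices, and the word-count complexity heuristic are all hypothetical placeholders standing in for whatever signal a gateway would actually use.

```python
# Conceptual sketch of cost-based LLM routing (illustrative only; not Kong's API).
# Model names and per-1K-token prices are hypothetical placeholders.
MODELS = {
    "budget-model": {"cost_per_1k_tokens": 0.0005},
    "premium-model": {"cost_per_1k_tokens": 0.03},
}

def estimate_complexity(prompt: str) -> float:
    """Crude complexity heuristic: longer prompts score higher (0.0 to 1.0)."""
    return min(len(prompt.split()) / 200, 1.0)

def route(prompt: str, threshold: float = 0.5) -> str:
    """Send low-complexity prompts to the cheaper model, the rest to premium."""
    if estimate_complexity(prompt) < threshold:
        return "budget-model"
    return "premium-model"

print(route("Summarize this sentence."))  # short prompt -> budget-model
print(route("word " * 150))               # long prompt -> premium-model
```

A real gateway would replace the word-count heuristic with something richer (token counts, route metadata, or a classifier), but the decision structure is the same: score the request, then pick the cheapest model that clears the bar.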
This visual outlines the breadth of Kong AI Gateway features, including LLM orchestration, load balancing, prompt management, and more.
Additionally, Kong now supports pgvector, extending semantic capabilities like routing, caching, and guardrails to Postgres-compatible databases. This gives platform teams more flexibility when designing AI pipelines within existing cloud-native environments such as Amazon RDS for PostgreSQL or Azure Cosmos DB for PostgreSQL.
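The semantic caching that pgvector enables boils down to nearest-neighbor search over embeddings. The in-memory sketch below illustrates that idea only; it is not Kong's implementation, and the toy three-dimensional vectors stand in for real embedding-model output. In Postgres, the `cache_get` lookup would instead be a pgvector distance query.

```python
import math

# In-memory sketch of semantic caching -- the pattern pgvector implements with
# nearest-neighbor search inside Postgres. Toy vectors stand in for embeddings.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

cache = {}  # maps prompt text -> (embedding, cached response)

def cache_put(prompt, embedding, response):
    cache[prompt] = (embedding, response)

def cache_get(embedding, threshold=0.9):
    """Return the cached response for the most similar prompt, if close enough."""
    best = max(cache.values(),
               key=lambda entry: cosine_similarity(embedding, entry[0]),
               default=None)
    if best and cosine_similarity(embedding, best[0]) >= threshold:
        return best[1]
    return None  # cache miss: fall through to the LLM

cache_put("capital of France?", [0.9, 0.1, 0.0], "Paris")
print(cache_get([0.88, 0.12, 0.01]))  # near-duplicate query -> "Paris"
print(cache_get([0.0, 0.0, 1.0]))     # unrelated query -> None
```

The payoff is that near-duplicate prompts skip the LLM call entirely, which is why colocating this lookup with an existing Postgres deployment is attractive for platform teams.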