The next step is being secure as well. The challenge is: if the LLM can run any possible query against the database, how do you make sure it doesn’t exfiltrate and leak information? We’ve built a technology we call parameterized secure views in the database itself, which lets you define the right security barriers and encode the security policies you need. The LLM can generate any query it wants, but with respect to the logged-in user, we will not let them see any information they are not supposed to see. We will also, on an information-theoretic basis, not leak information they should not have access to.
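The interview doesn’t spell out how parameterized secure views are implemented, but the core idea — the LLM writes arbitrary SQL while the session-bound policy caps what it can ever see — can be sketched roughly. The schema, table names, and gating logic below are all hypothetical illustrations, not the actual product mechanism:

```python
# Hypothetical sketch: LLM-generated SQL runs only against a per-session
# view that is pre-filtered to the logged-in user, so rows outside the
# policy are never visible regardless of what query the LLM writes.
import sqlite3

def open_session(user_id: int) -> sqlite3.Connection:
    """Create a per-user connection whose only queryable surface is a
    secure view bound to the authenticated user (toy schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, owner INTEGER, item TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(1, 1, "book"), (2, 2, "laptop"), (3, 1, "pen")])
    # The filter comes from the trusted session, never from the LLM's SQL;
    # user_id is an int from authentication, so interpolation is safe here.
    conn.execute(
        f"CREATE TEMP VIEW orders_secure AS "
        f"SELECT id, item FROM orders WHERE owner = {int(user_id)}"
    )
    return conn

def run_llm_query(conn: sqlite3.Connection, sql: str):
    """The LLM may write any SELECT it likes, but only against the view."""
    stripped = sql.lstrip().upper()
    assert stripped.startswith("SELECT") and "ORDERS_SECURE" in stripped
    return conn.execute(sql).fetchall()

conn = open_session(user_id=1)
print(run_llm_query(conn, "SELECT item FROM orders_secure"))
# → [('book',), ('pen',)] — user 1 sees only their own rows
```

The design point is that the policy predicate is attached server-side, before query generation, so no prompt injection or cleverly written SQL can widen the visible row set.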
Heller: I know you’ve spent a lot of time thinking about the future of databases and generative AI. Where are we headed?
Krishnamurthy: Part of my thinking here has evolved over the last couple of years, but for 50 years the world of databases, at least SQL databases, was all about producing exact results. I like to say databases had one job: store the data, don’t lose the data, and then when you ask a question, give the exact result. OK, maybe two jobs. It was all about exact results because we were dealing with structured data.

I think the biggest change happening right now is that we are no longer dealing just with structured data; we are also dealing with unstructured data. When you combine structured and unstructured data, the next step is that it’s not just about exact results but about the most relevant results. In this sense databases start to take on some of the capabilities of search engines, which are about relevance and ranking, and what becomes important is something like the precision-versus-recall trade-off from information retrieval systems.

But how do you make all of this happen? One key piece is vector indexing. In other words, you have structured data, which is in the database, but you also have other kinds of information: unstructured data and semi-structured data.
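The combination described here — an exact structured predicate plus relevance ranking over embeddings — can be sketched in miniature. The documents, fields, and embedding vectors below are invented for illustration; a real vector index would use an approximate-nearest-neighbor structure rather than a linear scan:

```python
# Toy sketch: exact filtering on structured fields, then relevance
# ranking by cosine similarity over (hypothetical) embedding vectors.
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Each record mixes structured data (id, year) with an embedding of
# its unstructured content (vectors are made up for this example).
docs = [
    {"id": 1, "year": 2023, "vec": [0.9, 0.1, 0.0]},
    {"id": 2, "year": 2021, "vec": [0.1, 0.9, 0.0]},
    {"id": 3, "year": 2023, "vec": [0.8, 0.2, 0.1]},
]

def search(query_vec, year, k=2):
    # Exact result on the structured predicate...
    candidates = [d for d in docs if d["year"] == year]
    # ...most relevant results on the unstructured side.
    return sorted(candidates,
                  key=lambda d: cosine(query_vec, d["vec"]),
                  reverse=True)[:k]

print([d["id"] for d in search([1.0, 0.0, 0.0], year=2023)])
# → [1, 3]: doc 2 is filtered out exactly; docs 1 and 3 are ranked by similarity
```

This is the shift the answer describes: the `year` filter behaves like a classic database predicate with an exact answer, while the cosine ranking behaves like a search engine, where precision and recall, not exactness, are the measures of quality.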



