You can use the same query tools to search vector indexes alongside the rest of your data, so you can search by similarity or by exact match. This approach is similar to how large-scale search engines work, and it helps find and rank results from large semistructured data sets, for example, relevant reviews on an e-commerce site. Fabric requires a vector policy for each Cosmos DB container, which defines the data type, dimensionality, and underlying distance function used to search for similar vectors. Search technologies like DiskANN are designed for high-dimensional data: the index supports vectors with as many as 4,096 dimensions, and it needs at least 1,000 vectors in the container to quantize them accurately.
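To make the vector policy and the mix of exact-match and similarity search more concrete, here is a minimal Python sketch. The two dictionaries follow the JSON shape Cosmos DB documents for vector embedding and index policies as I understand it. The /embedding path, the productId field, the three-dimensional vectors, and the search_reviews helper are all illustrative assumptions, and the ranking runs locally as a stand-in for what the service would do.

```python
import math

# Container vector policy. The path and the tiny dimension count are
# illustrative assumptions; real embedding models emit far larger vectors.
vector_embedding_policy = {
    "vectorEmbeddings": [
        {
            "path": "/embedding",
            "dataType": "float32",
            "distanceFunction": "cosine",
            "dimensions": 3,
        }
    ]
}

# Matching vector index definition for the same path.
indexing_policy = {"vectorIndexes": [{"path": "/embedding", "type": "diskANN"}]}

# A query of this general shape pairs an exact-match filter with similarity
# ranking. It is shown for reference only and is not executed here.
QUERY = (
    "SELECT TOP 2 c.id, VectorDistance(c.embedding, @queryVector) AS score "
    "FROM c WHERE c.productId = @productId "
    "ORDER BY VectorDistance(c.embedding, @queryVector)"
)


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search_reviews(items, product_id, query_vector, top=2):
    """Local stand-in for QUERY: filter by exact match, then rank by similarity."""
    dims = vector_embedding_policy["vectorEmbeddings"][0]["dimensions"]
    if len(query_vector) != dims:
        raise ValueError(f"expected {dims} dimensions, got {len(query_vector)}")
    matches = [item for item in items if item["productId"] == product_id]
    matches.sort(
        key=lambda item: cosine_similarity(item["embedding"], query_vector),
        reverse=True,
    )
    return [item["id"] for item in matches[:top]]


# Hypothetical review documents with precomputed embeddings.
reviews = [
    {"id": "r1", "productId": "p1", "embedding": [0.9, 0.1, 0.0]},
    {"id": "r2", "productId": "p1", "embedding": [0.1, 0.9, 0.0]},
    {"id": "r3", "productId": "p2", "embedding": [1.0, 0.0, 0.0]},
    {"id": "r4", "productId": "p1", "embedding": [0.7, 0.3, 0.0]},
]

print(search_reviews(reviews, "p1", [1.0, 0.0, 0.0]))
```

The review for product p2 is the closest vector to the query, but the exact-match filter removes it before ranking, which is the point of being able to combine both kinds of search in one query.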
Querying Cosmos DB in Fabric
When you query data stored in Cosmos DB through Fabric’s OneLake, you’re working with a mirrored copy of your Cosmos DB data. As you store data, it’s copied across in the Delta Parquet format used in Fabric, allowing you to use any of the supported query tools, including Power BI Desktop for ad hoc analysis. Queries here can run across all your operational data, not just Cosmos DB, treating it as a unified whole, while applications that need that data can still take advantage of Cosmos DB’s feature set.
This also allows you to take advantage of other Fabric features with your Cosmos DB data. For example, you can quickly add embeddings and a vector index to your data so it can serve as grounding data for an AI application based on retrieval-augmented generation (RAG).
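The grounding step in a RAG application can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how Fabric generates embeddings: toy_embed is a hypothetical stand-in for a real embedding model (it only counts words from a small made-up vocabulary), and the review documents and prompt wording are invented for the example.

```python
import math

# Hypothetical toy vocabulary; a real embedding model replaces all of this.
VOCAB = ["battery", "screen", "shipping", "price"]


def toy_embed(text):
    """Stand-in embedding: counts how often each vocabulary word appears."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def add_embeddings(docs):
    """Return copies of the documents with an 'embedding' field added."""
    return [{**doc, "embedding": toy_embed(doc["text"])} for doc in docs]


def build_grounded_prompt(question, docs, top=2):
    """Pick the most similar documents and place them in the prompt as context."""
    query_vector = toy_embed(question)
    ranked = sorted(
        docs, key=lambda doc: cosine(doc["embedding"], query_vector), reverse=True
    )
    context = "\n".join(f"- {doc['text']}" for doc in ranked[:top])
    return f"Answer using only these reviews:\n{context}\n\nQuestion: {question}"


docs = add_embeddings(
    [
        {"id": "r1", "text": "The battery lasts two days"},
        {"id": "r2", "text": "Shipping was slow but the price was fair"},
        {"id": "r3", "text": "Bright screen and solid battery life"},
    ]
)

prompt = build_grounded_prompt("How long does the battery last", docs)
print(prompt)
```

In a real deployment, the ranking step would be a vector query against the indexed container rather than an in-memory sort, and the resulting prompt would be passed to a language model.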