Voxel51
Similarity search for quality control
Once you find one problematic annotation, similarity search becomes a powerful tool for finding related errors. Click on a mislabeled sample to retrieve the most similar images and check whether they share the same systematic labeling problem.
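The check described above can be sketched in plain Python. This is a minimal sketch, not FiftyOne's implementation: it assumes each sample already has an embedding vector plus a ground-truth and a predicted label, and the names `cosine`, `neighbors_with_same_error`, and the toy data are hypothetical. It ranks samples by cosine similarity to the flagged one and keeps the neighbors whose (ground truth, prediction) pair matches the flagged sample's, since a repeated disagreement pattern among look-alike images hints at a systematic labeling problem rather than a one-off slip.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def neighbors_with_same_error(query_id, embeddings, gt, pred, k=10):
    """Return those of the k nearest neighbors of query_id that repeat its label disagreement.

    embeddings: dict of sample id -> vector (hypothetical precomputed embeddings)
    gt, pred:   dicts of sample id -> ground-truth / predicted label
    """
    q = embeddings[query_id]
    # Rank every other sample by similarity to the flagged sample, keep the top k
    ranked = sorted(
        (sid for sid in embeddings if sid != query_id),
        key=lambda sid: cosine(q, embeddings[sid]),
        reverse=True,
    )[:k]
    pattern = (gt[query_id], pred[query_id])
    # Keep only neighbors showing the same (ground truth, prediction) pair
    return [sid for sid in ranked if (gt[sid], pred[sid]) == pattern]

# Toy data: "a" and "b" look alike and share a cat -> dog mix-up;
# "c" looks alike but is labeled correctly; "d" looks different
embeddings = {"a": [1.0, 0.1], "b": [0.9, 0.2], "c": [0.95, 0.05], "d": [0.0, 1.0]}
gt = {"a": "cat", "b": "cat", "c": "dog", "d": "cat"}
pred = {"a": "dog", "b": "dog", "c": "dog", "d": "dog"}
print(neighbors_with_same_error("a", embeddings, gt, pred, k=2))  # → ['b']
```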
FiftyOne’s similarity search turns “find more like this” from a tedious manual hunt into an immediate query. Index your dataset once, then retrieve visually similar samples either by pointing and clicking in the App or programmatically.
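Conceptually, “index once, query many” means paying the embedding and normalization cost up front so that each later query is a cheap comparison. The class below is a hypothetical brute-force sketch in plain Python, roughly the kind of exact search a small in-memory index can perform; it is not FiftyOne's code, and `BruteForceIndex`, `query`, and `query_by_id` are illustrative names.

```python
import math

class BruteForceIndex:
    """Hypothetical in-memory similarity index: normalize once, query many times."""

    def __init__(self, embeddings):
        # One-time "indexing" cost: store unit-length copies of every vector
        self.ids = list(embeddings)
        self.vectors = [self._normalize(embeddings[i]) for i in self.ids]

    @staticmethod
    def _normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v] if n else [float(x) for x in v]

    def query(self, vector, k=10, exclude=None):
        """Return the ids of the k most similar indexed vectors (cosine similarity)."""
        q = self._normalize(vector)
        # With unit-length vectors, cosine similarity reduces to a dot product
        scored = [
            (sum(a * b for a, b in zip(q, v)), i)
            for i, v in zip(self.ids, self.vectors)
            if i != exclude
        ]
        scored.sort(key=lambda t: t[0], reverse=True)
        return [i for _, i in scored[:k]]

    def query_by_id(self, sample_id, k=10):
        """'Find more like this' for a sample that is already in the index."""
        v = self.vectors[self.ids.index(sample_id)]
        return self.query(v, k=k, exclude=sample_id)

index = BruteForceIndex({"a": [1, 0], "b": [0.8, 0.6], "c": [0, 1]})
print(index.query_by_id("a", k=2))  # → ['b', 'c']
```

The same index answers both kinds of query: by an existing sample id (the point-and-click case) or by an arbitrary vector (the programmatic case).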
```python
import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz

# Load dataset
dataset = foz.load_zoo_dataset("quickstart")

# Index images by similarity using CLIP embeddings
fob.compute_similarity(
    dataset,
    model="clip-vit-base32-torch",
    brain_key="img_sim",
)

# Populate the "mistakenness" field by comparing the model's
# predictions against the ground-truth annotations
fob.compute_mistakenness(dataset, "predictions", label_field="ground_truth")

# Sort by most likely to contain annotation mistakes
mistake_view = dataset.sort_by("mistakenness", reverse=True)

# Query the top-ranked sample and find the 10 most similar images
query_id = mistake_view.first().id
similar_view = dataset.sort_by_similarity(query_id, k=10, brain_key="img_sim")

# Launch the App on the similar samples; point-and-click
# similarity search is also available from within the App
session = fo.launch_app(similar_view)
```
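The query step relies on a ranking that puts the most suspicious sample first. The sketch below illustrates that idea with a deliberately simplified, hypothetical score (`mistakenness_proxy` is an illustrative name and is NOT FiftyOne's mistakenness formula): a confident prediction that disagrees with the annotation is treated as the most suspicious. It also takes the top-ranked sample directly, so the query is deterministic rather than a random pick.

```python
def mistakenness_proxy(gt_label, pred_label, confidence):
    """Hypothetical proxy score in [0, 1]; NOT FiftyOne's actual formula.

    A confident prediction that disagrees with the annotation scores highest;
    a prediction that agrees with the annotation scores zero.
    """
    return confidence if pred_label != gt_label else 0.0

samples = [
    {"id": "s1", "gt": "cat", "pred": "cat", "conf": 0.97},
    {"id": "s2", "gt": "cat", "pred": "dog", "conf": 0.91},
    {"id": "s3", "gt": "car", "pred": "truck", "conf": 0.55},
]

# Sort most suspicious first, mirroring sort_by("mistakenness", reverse=True)
ranked = sorted(
    samples,
    key=lambda s: mistakenness_proxy(s["gt"], s["pred"], s["conf"]),
    reverse=True,
)
query_id = ranked[0]["id"]  # always the top-ranked sample
print([s["id"] for s in ranked])  # → ['s2', 's3', 's1']
```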
Key capabilities include visual search through the App interface, object-level similarity indexing for detection patches, and scalable backends: the default sklearn backend can be swapped for Qdrant, Pinecone, or another vector database in production.
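Object-level indexing and pluggable backends can be sketched together: patch-level search stores one embedding per detection rather than per image, and a backend interface lets the same calling code target a different vector store. This is a hypothetical plain-Python sketch; `VectorBackend`, `InMemoryBackend`, `index_patches`, and the `(sample_id, detection_id)` key are illustrative names, not FiftyOne's API. In FiftyOne itself, these choices are made through arguments to `compute_similarity` (such as `patches_field` and `backend`); see the FiftyOne Brain documentation for the supported options.

```python
import math
from typing import Dict, List, Protocol, Sequence, Tuple

PatchKey = Tuple[str, str]  # (sample_id, detection_id): hypothetical patch identifier

def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class VectorBackend(Protocol):
    """Hypothetical interface any vector store must satisfy."""
    def add(self, key: PatchKey, vector: Sequence[float]) -> None: ...
    def search(self, vector: Sequence[float], k: int) -> List[PatchKey]: ...

class InMemoryBackend:
    """Exact cosine search over a dict; a stand-in for a real vector database."""
    def __init__(self) -> None:
        self._store: Dict[PatchKey, List[float]] = {}

    def add(self, key: PatchKey, vector: Sequence[float]) -> None:
        self._store[key] = list(vector)

    def search(self, vector: Sequence[float], k: int) -> List[PatchKey]:
        ranked = sorted(
            self._store,
            key=lambda key: _cosine(vector, self._store[key]),
            reverse=True,
        )
        return ranked[:k]

def index_patches(backend: VectorBackend, patches: Dict[PatchKey, Sequence[float]]) -> None:
    """Index one embedding per detection patch, whatever backend is used."""
    for key, vec in patches.items():
        backend.add(key, vec)

backend = InMemoryBackend()
index_patches(backend, {
    ("img1", "det1"): [1.0, 0.0],
    ("img1", "det2"): [0.0, 1.0],
    ("img2", "det1"): [0.9, 0.1],
})
print(backend.search([1.0, 0.0], k=2))  # → [('img1', 'det1'), ('img2', 'det1')]
```

Replacing `InMemoryBackend` with a class that wraps a real vector database client would leave `index_patches` unchanged, which is the point of keeping the backend behind an interface.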



