A 7-Billion Document Odyssey: How Hybrid AI Solutions Drive Scale and Efficiency

A knowledge management customer needed to embed seven billion library records as part of a large-scale information retrieval solution. They already owned NVIDIA GPUs for AI inference and wanted to use this on-premises hardware rather than move everything to the cloud. However, they weren’t sure whether their existing infrastructure could handle such a massive embedding job, particularly because no public benchmarks were available for their specific GPU generation. The customer needed a reliable way to test and validate their hardware’s capabilities and to choose the most suitable text embedding model for the task.

To address this challenge, Quest1 partnered with deepset and NVIDIA on a methodical benchmarking exercise. Using published NVIDIA benchmarks as a reference point and tailoring the tests to the customer’s specific hardware, the team evaluated multiple embedding models on real-world performance metrics. This process let them determine whether indexing billions of documents on-premises was feasible. Because every test was aligned with the customer’s infrastructure and security requirements, the results showed whether an on-premises implementation was truly viable. Drawing on deepset’s expertise in LLM-based solutions, including Retrieval Augmented Generation (RAG), Intelligent Document Processing (IDP), and agentic applications, the team made sure the solution was fully optimized and ready for future expansion, such as a cloud-to-ground or hybrid approach.
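At its core, the feasibility question is simple arithmetic: given a measured per-GPU embedding throughput, how long would seven billion documents take, and does that fit the project timeline? The sketch below is a hypothetical back-of-envelope estimator, not the team’s actual method. The function names, the 0.8 utilization factor, the GPU count, the time budget, and the throughput figures are all illustrative assumptions, not results from this benchmark.

```python
# Back-of-envelope feasibility check for a bulk embedding job.
# All throughput figures here are hypothetical placeholders, NOT results
# from the benchmark described in this case study.

SECONDS_PER_DAY = 86_400


def estimate_days(num_docs: int, docs_per_sec_per_gpu: float,
                  num_gpus: int, utilization: float = 0.8) -> float:
    """Estimate wall-clock days needed to embed num_docs documents.

    utilization discounts the benchmarked peak rate for I/O stalls,
    batching gaps, and restarts (an assumed factor; tune it per site).
    """
    if num_docs < 0 or docs_per_sec_per_gpu <= 0 or num_gpus <= 0:
        raise ValueError("num_docs must be >= 0; rate and GPU count > 0")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    effective_rate = docs_per_sec_per_gpu * num_gpus * utilization
    return num_docs / effective_rate / SECONDS_PER_DAY


def compare_models(num_docs: int, measured_rates: dict,
                   num_gpus: int, budget_days: float) -> dict:
    """Return {model_name: (estimated_days, fits_budget)} per model."""
    results = {}
    for model, rate in measured_rates.items():
        days = estimate_days(num_docs, rate, num_gpus)
        results[model] = (days, days <= budget_days)
    return results


if __name__ == "__main__":
    # Hypothetical per-GPU throughputs (docs/s) for two unnamed models.
    rates = {"model_a": 2_000.0, "model_b": 500.0}
    report = compare_models(7_000_000_000, rates, num_gpus=8, budget_days=14)
    for name, (days, ok) in report.items():
        print(f"{name}: {days:.1f} days, within budget: {ok}")
```

The key input is the throughput measured on the customer’s own GPUs for each candidate model; substituting vendor figures from a different GPU generation would undermine the estimate, which is why hardware-specific benchmarking mattered here.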

Ultimately, the benchmarking gave the customer clear, quantifiable evidence that their existing on-premises infrastructure could handle the seven-billion-document indexing task. As a result, they did not need major hardware upgrades, which saved them considerable expense. The confirmation also boosted the customer’s confidence in running large-scale AI workloads locally, and they came away with a tailored recommendation for the most efficient embedding model. Beyond delivering high performance at a lower cost, the approach positioned the customer to pursue a flexible, hybrid AI strategy in the future, one that could combine the control of on-premises solutions with the scalability of the cloud. By working with Quest1 and NVIDIA, deepset helped the customer build a robust, future-proof setup that can evolve with their growing AI needs.
