Finding things in a collection is one problem.
Storing a collection so that it can be searched is another.
And the latter can become a very big problem if you have to store “3,307,998,701 web pages”, as Google does.
Google File System: technical paper by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The paper explains Google’s custom scalable cluster file system, which stores the company’s gigantic database of the entire Web across thousands of low-cost PCs.
From Google Weblog.
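To make “storing a database across thousands of low-cost PCs” a little more concrete, here is a toy Python sketch of the bookkeeping such a file system needs. A file is cut into fixed-size chunks, and each chunk is copied to several machines. A metadata server remembers where every chunk lives, so a reader can turn a byte offset into a list of machines to ask. The 64 MB chunk size and the default of three replicas come from the GFS paper. The `Master` class, its `create` and `locate` methods, the server names, and the round-robin placement are hypothetical simplifications. This is not Google’s implementation.

```python
from dataclasses import dataclass, field

# Values described in the GFS paper; everything else here is a toy model.
CHUNK_SIZE = 64 * 1024 * 1024  # fixed chunk size: 64 MB
REPLICAS = 3                   # default number of copies of each chunk


@dataclass
class Master:
    """Hypothetical metadata server: maps file names to chunk locations."""
    servers: list                               # names of storage machines
    table: dict = field(default_factory=dict)   # file -> [(index, [servers])]
    next_server: int = 0                        # round-robin cursor

    def __post_init__(self):
        if len(self.servers) < REPLICAS:
            raise ValueError("need at least REPLICAS distinct servers")

    def create(self, name, size):
        """Split a file of `size` bytes into chunks and place each chunk's replicas."""
        n_chunks = max(1, -(-size // CHUNK_SIZE))  # ceiling division, at least one chunk
        chunks = []
        for index in range(n_chunks):
            # Simplified round-robin placement on REPLICAS consecutive servers;
            # the real system also weighs disk usage and spreads replicas across racks.
            placed = [self.servers[(self.next_server + k) % len(self.servers)]
                      for k in range(REPLICAS)]
            self.next_server = (self.next_server + 1) % len(self.servers)
            chunks.append((index, placed))
        self.table[name] = chunks
        return chunks

    def locate(self, name, offset):
        """Translate a byte offset into (chunk index, servers holding that chunk)."""
        return self.table[name][offset // CHUNK_SIZE]


if __name__ == "__main__":
    master = Master(["cs1", "cs2", "cs3", "cs4"])
    master.create("/web/crawl-0", 200 * 1024 * 1024)         # 200 MB -> 4 chunks
    print(master.locate("/web/crawl-0", 130 * 1024 * 1024))  # (2, ['cs3', 'cs4', 'cs1'])
```

Because each chunk lives on several cheap machines, losing any one PC does not lose data. Keeping only the small location table on the metadata server lets clients fetch the bulk data directly from the storage machines.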