Abstract | The promises inherent in users coming together to form data
sharing network communities, bring to the foreground new problems formulated
over such dynamic, ever growing, computing, storage, and networking
infrastructures. A key open challenge is to harness these highly
distributed resources toward the development of an ultra scalable, efficient
search engine. From a technical viewpoint, any acceptable solution
must fully exploit all available resources dictating the removal of any
centralized points of control, which can also readily lead to performance
bottlenecks and reliability/availability problems. Equally importantly,
however, a highly distributed solution can also facilitate pluralism in informing
users about internet content, which is crucial in order to preclude
the formation of information-resource monopolies and the biased visibility
of content from economically-powerful sources. To meet these challenges,
the work described here puts forward MINERVAâï申ï申, a novel search
engine architecture, designed for scalability and efficiency. MINERVAâï申ï申
encompasses a suite of novel algorithms, including algorithms for creating
data networks of interest, placing data on network nodes, load balancing,
top-k algorithms for retrieving data at query time, and replication algorithms
for expediting top-k query processing. We have implemented the
proposed architecture and we report on our extensive experiments with
real-world, web-crawled, and synthetic data and queries, showcasing the
scalability and efficiency traits of MINERVAâï申ï申. |