A longitudinal analysis of search engine index size
Publication year
2015
Publisher
Istanbul, Turkey : Boğaziçi University
ISBN
9789755183817
In
Salah, A. A.; Tonta, Y.; Salah, A. A. A. (ed.), Proceedings of ISSI 2015 Istanbul: 15th International Society of Scientometrics and Informetrics Conference, pp. 71-82
Annotation
15th International Society of Scientometrics and Informetrics Conference (ISSI-2015), 29 June 2015
Publication type
Article in monograph or in proceedings
Editor(s)
Salah, A. A.
Tonta, Y.
Salah, A. A. A.
Sugimoto, C.
Al, U.
Organization
Communication and Information Sciences (Communicatie- en informatiewetenschappen)
Languages used
English (eng)
Book title
Salah, A. A.; Tonta, Y.; Salah, A. A. A. (ed.), Proceedings of ISSI 2015 Istanbul: 15th International Society of Scientometrics and Informetrics Conference
Page start
p. 71
Page end
p. 82
Subject
Language & Speech Technology; Language in Society; Nederlab
Abstract
One factor that determines the quality of a Web search engine is the size and quality of its index. Besides influencing the quality of search results, the size of the indexed Web can also tell us which parts of the WWW are directly accessible to the everyday user. We propose a novel method of estimating the size of a Web search engine's index by extrapolating from the document frequencies of words observed in a large static corpus of Web pages. In addition, we provide a unique longitudinal perspective on the size of Google's and Bing's indexes over more than eight years, from March 2006 until January 2015. We find that index size estimates for these two search engines vary dramatically over time, with Google generally possessing a larger index than Bing. This result raises doubts about the reliability of previous one-off estimates of the size of the indexable Web. We find that much of this variability can be explained by changes in the indexing and ranking infrastructure of Google and Bing, as well as by the distributed nature of Web search engines. This casts further doubt on whether Web search engines can be used reliably for cross-sectional Webometric studies.
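The extrapolation idea in the abstract can be illustrated with a minimal sketch. It assumes that a word occurring in a given fraction of the static corpus's documents occurs in roughly the same fraction of the engine's indexed documents, so each word yields the estimate hits × corpus_size / corpus_df. The function name, its inputs, and the use of the median to combine per-word estimates are hypothetical choices for illustration, not the authors' exact procedure.

```python
from statistics import median


def estimate_index_size(corpus_df, corpus_size, hit_counts):
    """Hypothetical sketch: extrapolate index size from per-word document frequencies.

    corpus_df:   word -> number of documents in the static corpus containing the word
    corpus_size: total number of documents in the static corpus
    hit_counts:  word -> hit count reported by the search engine for that word
    """
    estimates = []
    for word, hits in hit_counts.items():
        df = corpus_df.get(word, 0)
        if df == 0:
            continue  # a word absent from the corpus gives no usable ratio
        # Assumption: hits / index_size == df / corpus_size, solved for index_size.
        estimates.append(hits * corpus_size / df)
    if not estimates:
        raise ValueError("no word occurs in both the corpus and the hit counts")
    # Assumption: the median damps outlier words; the paper may aggregate differently.
    return median(estimates)


# Illustrative, made-up numbers (not data from the paper).
corpus_df = {"the": 500_000, "search": 20_000, "index": 10_000}
hits = {"the": 25_000_000_000, "search": 1_000_000_000, "index": 400_000_000}
print(estimate_index_size(corpus_df, 1_000_000, hits))
```

Because a single engine's hit counts can fluctuate from query to query, combining many words with a robust statistic is one reasonable way to keep one noisy word from dominating the estimate.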