Google Scholar to overshadow them all? Comparing the sizes of 12 academic search engines and bibliographic databases
Author: Michael Gusenbauer
Notes: An interesting study that uses a search-term strategy to estimate the size of databases. I wonder whether something analogous could be used to estimate the size of sets from partial data. The author runs a range of queries, from single letters and numbers through to common words, and takes the hit counts as an indicator of each database's scale. Google Scholar comes out as the largest in terms of search results, at an estimated 389 million records.
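A minimal sketch of the idea, for my own reference: run many broad queries and keep the largest hit count as the size indicator. This is not the paper's code. The names `candidate_queries`, `estimate_size`, `toy_hit_count` and `TOY_INDEX` are hypothetical, the list of common words is an assumption, and the toy index stands in for a real search interface whose hit counts may themselves be approximate.

```python
import string
from typing import Callable, Iterable

def candidate_queries() -> list[str]:
    """Broad queries: single letters, single digits, and a few common words."""
    common_words = ["the", "of", "and", "a", "in"]  # assumed examples, not the paper's list
    return list(string.ascii_lowercase) + list(string.digits) + common_words

def estimate_size(hit_count: Callable[[str], int],
                  queries: Iterable[str]) -> tuple[str, int]:
    """Return the query with the most hits and its hit count (the size indicator)."""
    best_query, best_hits = "", 0
    for query in queries:
        hits = hit_count(query)
        if hits > best_hits:
            best_query, best_hits = query, hits
    return best_query, best_hits

# Toy stand-in for a database: three made-up record titles, for illustration only.
TOY_INDEX = [
    "a study of search engines",
    "the size of databases",
    "2018 scientometrics review",
]

def toy_hit_count(query: str) -> int:
    """Count records that contain the query as a whole word."""
    return sum(1 for record in TOY_INDEX if query in record.split())

if __name__ == "__main__":
    print(estimate_size(toy_hit_count, candidate_queries()))
```

On the toy index the best query still misses one record, which is the partial-data caveat: the maximum hit count is only as good as the broadest query the interface accepts.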
Abstract: Information on the size of academic search engines and bibliographic databases (ASEBDs) is often outdated or entirely unavailable. Hence, it is difficult to assess the scope of specific databases, such as Google Scholar. While scientometric studies have estimated ASEBD sizes before, the methods employed were able to compare only a few databases. Consequently, there is no up-to-date comparative information on the sizes of popular ASEBDs. This study aims to fill this blind spot by providing a comparative picture of 12 of the most commonly used ASEBDs. In doing so, we build on and refine previous scientometric research by counting query hit data as an indicator of the number of accessible records. Iterative query optimization makes it possible to identify a maximum number of hits for most ASEBDs. The results were validated in terms of their capacity to assess database size by comparing them with official information on database sizes or previous scientometric studies. The queries used here are replicable, so size information can be updated quickly. The findings provide first-time size estimates of ProQuest and EbscoHost and indicate that Google Scholar’s size might have been underestimated so far by more than 50%. By our estimation Google Scholar, with 389 million records, is currently the most comprehensive academic search engine.
Gusenbauer, Michael. 2018. “Google Scholar to Overshadow Them All? Comparing the Sizes of 12 Academic Search Engines and Bibliographic Databases.” Scientometrics, November. https://doi.org/10.1007/s11192-018-2958-5.