Google Scholar to overshadow them all? Comparing the sizes of 12 academic search engines and bibliographic databases | SpringerLink

Author: Michael Gusenbauer

Notes: An interesting study that uses a search-term strategy to examine the size of databases. It raises the question of whether something analogous could be used to estimate the size of sets from partial data. The author uses a range of searches, from single letters and numbers through to common words, to estimate the scale of various databases, and finds that Google Scholar is significantly larger in terms of search results.

Abstract: Information on the size of academic search engines and bibliographic databases (ASEBDs) is often outdated or entirely unavailable. Hence, it is difficult to assess the scope of specific databases, such as Google Scholar. While scientometric studies have estimated ASEBD sizes before, the methods employed were able to compare only a few databases. Consequently, there is no up-to-date comparative information on the sizes of popular ASEBDs. This study aims to fill this blind spot by providing a comparative picture of 12 of the most commonly used ASEBDs. In doing so, we build on and refine previous scientometric research by counting query hit data as an indicator of the number of accessible records. Iterative query optimization makes it possible to identify a maximum number of hits for most ASEBDs. The results were validated in terms of their capacity to assess database size by comparing them with official information on database sizes or previous scientometric studies. The queries used here are replicable, so size information can be updated quickly. The findings provide first-time size estimates of ProQuest and EbscoHost and indicate that Google Scholar’s size might have been underestimated so far by more than 50%. By our estimation Google Scholar, with 389 million records, is currently the most comprehensive academic search engine.
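The idea of using query hit counts as a size indicator can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's actual procedure: the function name `estimate_size_from_hits` and every hit count below are made up, and the sketch simply treats the largest hit count returned by any single query as a lower bound on the number of accessible records.

```python
def estimate_size_from_hits(hit_counts):
    """Return the query with the most hits and that hit count.

    Assumption: the largest hit count reported for any single query is
    treated as a lower-bound estimate of the database's size.
    """
    if not hit_counts:
        raise ValueError("need at least one query result")
    best_query = max(hit_counts, key=hit_counts.get)
    return best_query, hit_counts[best_query]


# Hypothetical hit counts for illustration only (not figures from the study).
example_hits = {"a": 1_200_000, "1": 950_000, "the": 1_850_000, "of": 1_700_000}
best_query, size_estimate = estimate_size_from_hits(example_hits)
print(best_query, size_estimate)
```

Repeating this over a growing list of queries, and keeping the best result so far, is one simple way to picture the iterative optimization the abstract mentions.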

Gusenbauer, Michael. 2018. “Google Scholar to Overshadow Them All? Comparing the Sizes of 12 Academic Search Engines and Bibliographic Databases.” Scientometrics, November.

Source: Google Scholar to overshadow them all? Comparing the sizes of 12 academic search engines and bibliographic databases | SpringerLink

Empirical analysis and classification of database errors in Scopus and Web of Science – ScienceDirect

Authors: Franceschini F, Maisano D & Mastrogiacomo L

Comment: This article studies various errors in the Scopus and Web of Science (WoS) databases, including citation indexing errors, missing links, missing DOIs and incorrect author names. A sample of the errors was checked manually. After classifying the errors, the authors found that their distribution differed markedly between Scopus and WoS.

Abstract: In the last decade, a growing number of studies focused on the qualitative/quantitative analysis of bibliometric-database errors. Most of these studies relied on the identification and (manual) examination of relatively limited samples of errors.

Using an automated procedure, we collected a large corpus of more than 10,000 errors in the two multidisciplinary databases Scopus and Web of Science (WoS), mainly including articles in the Engineering-Manufacturing field. Based on the manual examination of a portion (of about 10%) of these errors, this paper provides a preliminary analysis and classification, identifying similarities and differences between Scopus and WoS.

The analysis reveals interesting results, such as: (i) although Scopus seems more accurate than WoS, it tends to forget to index more papers, causing the loss of the relevant citations given/obtained, (ii) both databases have relatively serious problems in managing the so-called Online-First articles, and (iii) lack of correlation between databases, regarding the distribution of the errors in several error categories.

The description is supported by practical examples concerning a variety of errors in the Scopus and WoS databases.

Source: Empirical analysis and classification of database errors in Scopus and Web of Science – ScienceDirect

Dimensions: A competitor to Scopus and the Web of Science? – ScienceDirect

Author: Mike Thelwall

Comment: This article compares samples of data from Scopus with Dimensions. It finds that citation counts in Dimensions are in line with those obtained from Scopus. The slightly lower citation numbers suggest that the coverage of Dimensions may not be much greater than that of Scopus, although this is not explicitly checked.

Abstract: Dimensions is a partly free scholarly database launched by Digital Science in January 2018. Dimensions includes journal articles and citation counts, making it a potential new source of impact data. This article explores the value of Dimensions from an impact assessment perspective with an examination of Food Science research 2008–2018 and a random sample of 10,000 Scopus articles from 2012. The results include high correlations between citation counts from Scopus and Dimensions (0.96 by narrow field in 2012) as well as similar average counts. Almost all Scopus articles with DOIs were found in Dimensions (97% in 2012). Thus, the scholarly database component of Dimensions seems to be a plausible alternative to Scopus and the Web of Science for general citation analyses and for citation data in support of some types of research evaluations.
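The two checks reported in the abstract, the share of Scopus DOIs found in Dimensions and the correlation between citation counts, could be reproduced on one's own data roughly as follows. This is a hedged sketch: the helper names, the DOIs and the citation counts are all invented for illustration, and Pearson correlation is an assumption, since the abstract does not say which correlation coefficient was used.

```python
import math


def doi_match_rate(source_dois, target_dois):
    """Share of DOIs in the source set that also appear in the target set."""
    source = {d.lower() for d in source_dois}
    target = {d.lower() for d in target_dois}
    if not source:
        raise ValueError("source set is empty")
    return len(source & target) / len(source)


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical DOI -> citation count records (not data from the article).
scopus = {"10.1000/a": 10, "10.1000/b": 3, "10.1000/c": 0, "10.1000/d": 25}
dimensions = {"10.1000/a": 9, "10.1000/b": 3, "10.1000/d": 22}

match_rate = doi_match_rate(scopus, dimensions)
shared = sorted(set(scopus) & set(dimensions))
r = pearson([scopus[d] for d in shared], [dimensions[d] for d in shared])
print(match_rate, r)
```

Only articles present in both sources enter the correlation, so the match rate and the correlation answer different questions: coverage and agreement respectively.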

Source: Dimensions: A competitor to Scopus and the Web of Science? – ScienceDirect

Comprehensiveness of national bibliographic databases for social sciences and humanities: Findings from a European survey | Research Evaluation | Oxford Academic


Authors: Linda Sīle, Janne Pölönen, Gunnar Sivertsen, Raf Guns, Tim C E Engels, Pavel Arefiev, Marta Dušková, Lotte Faurbæk, András Holl, Emanuel Kulczycki, Bojan Macan, Gustaf Nelhans, Michal Petr, Marjeta Pisk, Sándor Soós, Jadranka Stojanovski, Ari Stone, Jaroslav Šušol, Ruth Teitelbaum


This article reviews in detail the collection of SSH research output in 13 national bibliographic databases across Europe, as potential alternative and more comprehensive sources for bibliometric analysis. Many are created and maintained for the purposes of national research funding and evaluation. The authors found some variation in collection criteria and research output types, as well as in terminology and language, across the databases. Some are publicly available and could be useful sources of data, although not necessarily of citation data. Links to 21 databases are included in Supplementary table 2. Funded through ENRESSH.


This article provides an overview of national bibliographic databases that include data on research output within social sciences and humanities (SSH) in Europe. We focus on the comprehensiveness of the database content. Compared to the data from commercial databases such as Web of Science and Scopus, data from national bibliographic databases (e.g. Flemish Academic Bibliographic Database for the SSH (VABB-SHW) in Belgium, Current Research Information System in Norway (CRISTIN)) are more comprehensive and may, therefore, be better fit for bibliometric analyses. Acknowledging this, several countries within Europe maintain national bibliographic databases; detailed and comparative information about their content, however, has been limited. In autumn 2016, we launched a survey to acquire an overview of national bibliographic databases for SSH in Europe and Israel. Surveying 41 countries (responses received from 39 countries), we identified 21 national bibliographic databases for SSH. Further, we acquired a more detailed description of 13 databases, with a focus on their comprehensiveness. Findings indicate that even though the content of national bibliographic databases is diverse, it is possible to delineate a subset that is similar across databases. At the same time, it is apparent that differences in national bibliographic databases are often bound to differences in country-specific arrangements. Considering this, we highlight implications to bibliometric analyses based on data from national bibliographic databases and outline several aspects that may be taken into account in the development of existing national bibliographic databases for SSH or the design of new ones.


Source: Comprehensiveness of national bibliographic databases for social sciences and humanities: Findings from a European survey | Research Evaluation | Oxford Academic

Charting Equity in Higher Education: Drawing the Global Access Map

Authors: Graeme Atherton, Constantino Dumangane, Geoff Whitty

London, UK: Pearson, 2016


This report focuses largely on student access to higher education globally. “Experts” in fifty countries were surveyed about HE access data, and the authors analysed available data, developing a Global Access Data map.

It discusses the difficulties in data collection and comparison across countries because of different practices, definitions and measurement of indicators and limitations in data availability beyond gender and SES. These factors thwarted the authors’ intentions to develop a Global Equity Index. In response they developed a Global Equity Data Charter for Higher Education. Includes useful data sources.

Web site summary:

We know the economic benefit to individuals and to communities of increased levels of Higher Education (HE) participation. We also know that participation in HE has been expanding steadily; we anticipate there will be half a billion students participating in postsecondary education by 2030. But what do existing data tell us about who is accessing HE, and who is currently missing out? Specifically, what do we know about equity in access to high quality HE? Knowing that we are best able to manage what we measure, are institutions, nations, and international organisations capturing HE access data by critical social indicators (such as SES, gender, disability, or geographic remoteness to name but a few)?

Charting Equity in Higher Education: Drawing the Global Access Map, is the newest entry into the Open Ideas at Pearson series of global thought leadership. In researching the piece, which was supported by Pearson and the University of Newcastle (Australia), the authors undertook:

  • a survey of current data collection practices in 50 countries,
  • a review of existing data sources, and
  • deep dive case studies on six key countries (United States, United Kingdom, South Africa, Australia, India, and Colombia).

In this short and sharp final report, the authors identify and discuss five key messages, based on their examination of the evidence:

  1. Existing data suggest inequalities in access to HE are pervasive, spanning countries around the world, regardless of size or wealth.
  2. There are significant limitations to the data, with little data being collected beyond gender and SES. Further, different countries and regions have their own dominant concerns as regards equality, grounded in social, economic and political history.
  3. Comparisons across countries are important but difficult because of the various ways social indicators are defined and measured.
  4. Access means more than entry and participation; it also means completion of a high quality programme.
  5. Political will and resources shape data collection.

The authors initiated this work in the hopes of developing a Global Equity Index. Current data, however, made the construction of a rigorous, credible index challenging. To move this area forward, the authors have issued a call to action in the form of a Global Equity Data Charter – a series of actions to be undertaken by institutions, nations, and international organisations to help Higher Education Institutions and governments understand and address inequalities in who benefits from HE.

Source: Charting Equity in Higher Education: Drawing the Global Access Map

Mapping Australian higher education 2018 | Grattan Institute



Andrew Norton and Ittima Cherastidtham, The Grattan Institute

A report on Australian higher education over a range of years that looks at policy and funding, student enrolment characteristics, the student experience, the HE workforce, research, employment outcomes and HE providers. There is some discussion of student equity and of the gender pay gap among graduates, but not specifically among HE staff. Figures are mostly for Australia overall, not by institution. However, it provides useful background and overview, along with some data sources.

Web page overview

The graduate gender pay gap in Australia is narrowing, with more women in paid work than ever before. Women’s earnings generally outpaced men’s over the past decade – but the pay gap remains large.

Female university graduates are now expected to earn 27 per cent less than men – or $750,000 less – over their career. Ten years earlier, the gap was 30 per cent.

The median-income female graduate from 2016 can expect to earn about $2 million over her career. Early-career female graduates from 2016 are earning about 4 per cent more (after allowing for inflation) than their counterparts from 2006. Early-career male graduates from 2016, by contrast, are earning about 3 per cent less than their counterparts from a decade earlier.

The driving force behind women’s earnings growth over the past decade is a big increase in the number of women with children staying in the workforce – up by nearly 10 percentage points among graduates aged 25-34, and 5 percentage points among graduates aged 35-44.

This is a policy success story. As paid maternity leave has become more widely available, more women are choosing to stay employed when they become mothers, rather than quitting the workforce. And this trend is expected to continue. As subsidies make childcare more affordable for women returning to work, more are doing so full-time.

Gender equality in the workforce is not yet a reality in Australia, but it is slowly getting closer.

More broadly, growth in professional jobs in Australia did not keep up with the growing number of graduates over the decade, and recent graduates are getting less financial benefit from their degrees than earlier graduates at the same point in their careers.

In early 2017, 28 per cent of recent graduates who were looking for full-time work were yet to find it four months from completion, up from 15 per cent in early 2008, before the global financial crisis.

Earnings either grew weakly or declined over the past decade for early-career graduates from all disciplines except education, nursing and medicine. A median-income male graduate in science, commerce or law earned less in 2016 than in 2006, although law graduates still have above-average incomes.

Although the labour market remains tough for young graduates, it has improved since its lowest point in 2014, reflecting recent growth in professional jobs.

Mapping Australian higher education 2018, the fifth in a series going back to 2012, shows that in 2016 a record 41 per cent of Australian 19-year-olds were enrolled in higher education institutions.

After a decade of rapid growth, domestic commencing bachelor-degree enrolments are now growing slowly and so higher education participation will plateau over the next few years.

International student enrolments are still booming, bringing in more than $9 billion in fee revenue in 2017. China and India are the largest source countries.

Australian public universities still receive more than half their cash flow from government grants or loans, but are becoming less reliant on government.

Public spending on research has fallen in recent years, although total research spending by universities is up slightly, to $11 billion in 2016.


Source: Mapping Australian higher education 2018 | Grattan Institute

Accuracy of affiliation information in Microsoft Academic: Implications for institutional level research evaluation

Authors: Ranjbar-Sahraei B.; Eck, N.J. van; Jong R. de

Comment: This is a summary of results for a poster presented at the STI 2018 Conference in Leiden. The work compares the research output recorded for Leiden University by Microsoft Academic (MA) and Web of Science (WoS). A first-level automated matching reveals differences between MA and WoS. A sample of 100 is then drawn from each of the disagreeing parts of the comparison. Manual checking of these samples found that MA contained affiliation errors.

Abstract: In this work, we study the accuracy of affiliation information in Microsoft Academic (MA). To conduct this study, we have considered the full set of publications assigned to Leiden University (LU) as provided by two different data sources: MA and Web of Science (WoS). The results of this study suggest that a considerable number of publications in MA have missing or wrong affiliation information.
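The matching-then-sampling workflow described in the comment can be pictured with plain set operations. The sketch below is an assumption-laden illustration rather than the authors' method: the function name `compare_and_sample`, the identifier format and the publication lists are hypothetical, and real matching would need to reconcile identifiers across the two sources first.

```python
import random


def compare_and_sample(ma_ids, wos_ids, sample_size=100, seed=0):
    """Split two publication sets into agreeing and disagreeing parts,
    then draw a random sample from each disagreeing part for manual checks."""
    ma, wos = set(ma_ids), set(wos_ids)
    parts = {
        "both": ma & wos,      # assigned to the institution in both sources
        "ma_only": ma - wos,   # assigned only in Microsoft Academic
        "wos_only": wos - ma,  # assigned only in Web of Science
    }
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    samples = {}
    for name in ("ma_only", "wos_only"):
        pool = sorted(parts[name])
        samples[name] = rng.sample(pool, min(sample_size, len(pool)))
    return parts, samples


# Hypothetical, already-matched publication identifiers.
ma_ids = [f"pub{i}" for i in range(0, 300)]
wos_ids = [f"pub{i}" for i in range(150, 500)]
parts, samples = compare_and_sample(ma_ids, wos_ids)
print({name: len(ids) for name, ids in parts.items()})
```

The sampled records in `samples["ma_only"]` and `samples["wos_only"]` are the ones that would then be inspected by hand to decide which source has the affiliation wrong.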

Source: Accuracy of affiliation information in Microsoft Academic: Implications for institutional level research evaluation

The History, Deployment, and Future of Institutional Repositories in Public Universities in South Africa – ScienceDirect

Author: Siviwe Bangani

Another interesting paper, this one about institutional repositories (IRs) in South Africa (SA). Web data was collected and interviews were conducted. A detailed history of IRs in SA is given. While many South African universities have signed various international declarations and initiatives on OA, they often don’t have an institutional policy on OA. Various factors (obstacles and enablers) are listed. The amount of funding is relatively low compared to other countries. Varying IR sizes, types of objects in IRs, multiple language support and related issues, and suggestions for development are presented and discussed.

This paper investigates the history, deployment, and content of institutional repositories (IRs) in public universities in South Africa. Some of the local, national and international drivers and enablers that ensure the establishment and survival of the institutional repositories are identified. Lastly, an attempt is made to determine the future of the IRs. Findings include that South African universities were among the first universities in the world to host IRs with the first IR established in 2000. The most prevalent and dominant content in South African public university collections are electronic theses and dissertations (ETDs). There are signs that this is changing as more libraries cover research outputs emanating from the universities. African languages are sparsely represented in IRs in South Africa. The majority of universities in the country signed the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities, and the Budapest Open Access Initiative. Many of them do not have their own open access policy. The driving factors include the decline in government subsidy, increase in journal subscriptions, depreciation of the South African currency, and addition of the Value Added Tax (VAT) of 14% on electronic resources by the South Africa taxman while the enabling factors include the international open access mandates, the Carnegie Foundation grants, and the National Research Foundation’s statement on open access.

Bangani S (2018) The History, Deployment, and Future of Institutional Repositories in Public Universities in South Africa. The Journal of Academic Librarianship 44(1): 39-51.

Source: The History, Deployment, and Future of Institutional Repositories in Public Universities in South Africa – ScienceDirect

Institutional Repositories in Chinese Open Access Development: Status, Progress, and Challenges – ScienceDirect

Authors: Jing Zhong & Shuyong Jiang

Comment: An interesting paper interrogating institutional repositories (IRs) in China. These IRs were accessed via ROAR, OpenDOAR, SouOA and CHAIR, though many URL links were broken. The article highlights the slow development of OA repositories in China and attributes this to the lack of policy and support at all levels. At the end, it mentions that in May 2014 the Chinese Academy of Sciences and the National Natural Science Foundation of China released an Open Access policy statement requiring that research papers they fund be made open access in IRs within 12 months of publication. It would be interesting to follow up on whether this has had any significant impact.

Open Access (OA) movement in China is developing with its own track and speed. Compared to its western counterparts, it moves slowly. However, it keeps growing. More significantly, it provides open and free resources not only to Chinese scholars, but also to those of China studies around the world. The premise is whether we can find them in an easy and effective fashion. This paper will describe the status of the OA movement in China with a focus on institutional repositories (IR) in Chinese universities and research institutes. We will explore different IR service modules and discuss their coverage, strengths, limitation, and most importantly implications to the East Asian Collection in the US.

Zhong J & Jiang S (2016) Institutional Repositories in Chinese Open Access Development: Status, Progress, and Challenges. The Journal of Academic Librarianship 42(6): 739-744.

Source: Institutional Repositories in Chinese Open Access Development: Status, Progress, and Challenges – ScienceDirect

Elsevier journals — some facts

Author: Timothy Gowers
Blogpost April 24, 2014

Comment: This long blog post discusses the author’s attempts, successful in many cases, to obtain the costs of Elsevier journal subscriptions at the UK Russell Group universities. It includes some amusing detailed correspondence with JISC and the universities. There is also related discussion of APCs and their impact on subscription costs, and of Elsevier costs at some US universities and in Brazil. The post and its comments also contain some useful data sources and related analysis.

Introduction: A little over two years ago, the Cost of Knowledge boycott of Elsevier journals began. Initially, it seemed to be highly successful, with the number of signatories rapidly reaching 10,000 and including some very high-profile researchers, and Elsevier making a number of concessions, such as dropping support for the Research Works Act and making papers over four years old from several mathematics journals freely available online. It has also contributed to an increased awareness of the issues related to high journal prices and the locking up of articles behind paywalls….

I have come to the conclusion that if it is not possible to bring about a rapid change to the current system, then the next best thing to do, which has the advantage of being a lot easier, is to obtain as much information as possible about it. Part of the problem with trying to explain what is wrong with the system is that there are many highly relevant factual questions to which we do not yet have reliable answers.

