Evidence of Open Access of scientific publications in Google Scholar: a large-scale analysis

Authors: Alberto Martín-Martín, Rodrigo Costas, Thed van Leeuwen, Emilio Delgado López-Cózar

Comment: This article used Google Scholar (GS) to find links to freely available full texts of articles and reviews (limited to those with DOIs) indexed in Web of Science for 2009 and 2014. A Python script queried GS (from a pool of off-campus IP addresses) for each DOI in the sample. Extracting the data from GS took 3 months. The sources that provided full texts were then classified using DOAJ (publishers), OpenDOAR and ROAR (repositories), and CrossRef (open licenses). This included manually checking about 1000 hosts. These classifications were combined to determine the OA status of each DOI. Data was processed in R. The results were summarised and compared across disciplines and countries. The numbers were similar to those of other recent large-scale studies on OA status that used similar data sets. The article also gives a good review of the literature on OA publication, licensing and copyright issues.
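
The classification step described in the comment can be illustrated with a minimal Python sketch. This is not the authors' script (their processing was done in R). The host sets, DOIs and function names below are hypothetical placeholders standing in for the DOAJ, OpenDOAR/ROAR and CrossRef lookups, and the logic is only an assumption about how per-link classifications might be combined per DOI.

```python
from collections import defaultdict

# Hypothetical lookup data; in a real study these would be built from
# DOAJ (publishers), OpenDOAR/ROAR (repositories) and CrossRef (licenses).
DOAJ_PUBLISHER_HOSTS = {"journal.example.org"}
REPOSITORY_HOSTS = {"repository.example.edu"}
OPEN_LICENSE_DOIS = {"10.1234/a"}


def classify_host(host):
    """Map the host of a full-text link to a coarse provider type."""
    if host in DOAJ_PUBLISHER_HOSTS:
        return "publisher"
    if host in REPOSITORY_HOSTS:
        return "repository"
    return "other"


def oa_status_by_doi(links):
    """Combine (doi, host) pairs into a set of provider types per DOI."""
    status = defaultdict(set)
    for doi, host in links:
        status[doi].add(classify_host(host))
        # Record an open license separately from the hosting source.
        if doi in OPEN_LICENSE_DOIS:
            status[doi].add("open_license")
    return dict(status)


links = [
    ("10.1234/a", "journal.example.org"),
    ("10.1234/a", "repository.example.edu"),
    ("10.1234/b", "files.example.com"),
]
result = oa_status_by_doi(links)
```

In this sketch a single DOI can belong to several categories at once, which is one reason per-category percentages in a study like this need not add up to the overall share of freely available documents.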

Abstract: This article uses Google Scholar (GS) as a source of data to analyse Open Access (OA) levels across all countries and fields of research. All articles and reviews with a DOI and published in 2009 or 2014 and covered by the three main citation indexes in the Web of Science (2,269,022 documents) were selected for study. The links to freely available versions of these documents displayed in GS were collected. To differentiate between more reliable (sustainable and legal) forms of access and less reliable ones, the data extracted from GS was combined with information available in DOAJ, CrossRef, OpenDOAR, and ROAR. This allowed us to distinguish the percentage of documents in our sample that are made OA by the publisher (23.1%, including Gold, Hybrid, Delayed, and Bronze OA) from those available as Green OA (17.6%), and those available from other sources (40.6%, mainly due to ResearchGate). The data shows an overall free availability of 54.6%, with important differences at the country and subject category levels. The data extracted from GS yielded very similar results to those found by other studies that analysed similar samples of documents, but employed different methods to find evidence of OA, thus suggesting a relative consistency among methods.

Martín-Martín, A., Costas, R., van Leeuwen, T., & Delgado López-Cózar, E. (2018). Evidence of Open Access of scientific publications in Google Scholar: a large-scale analysis. https://doi.org/10.17605/osf.io/k54uv

Source: Evidence of Open Access of scientific publications in Google Scholar: a large-scale analysis