How to measure scientific impact: investigating the structure of Web of Science publications

doi.org/10.1016/j.joi.2023.101379 

As the most common deliverable of scientific research, the article is the main carrier of new knowledge and information: it presents innovative findings, demonstrates unique contributions, and promotes openness and transparency in science. Given the ever-growing volume of publications, how to quantify an article’s scientific prestige so that it fairly reflects the article’s contribution to scientific progress has long been an important question. We develop an eigenvector centrality metric, the Article’s Scientific Prestige (ASP) index, based on recent advances in PageRank and optimisation, which addresses the biases of simpler citation-based metrics in a way that is computationally tractable with large-scale citation data.

Our approach is motivated by a specific application: measuring the scientific quality and importance of each individual article in the Web of Science (WoS) citation network. WoS is an online citation indexing platform that provides comprehensive citation data across 254 academic disciplines, spanning the natural sciences, technology, the social sciences, the humanities, the arts, and more. The WoS citation data contains 63,092,643 unique articles published in 65,045 journals, linked by 953,967,411 citations over the 40 years from 1981 to 2020.
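To make the idea of eigenvector centrality on a citation network concrete, here is a minimal power-iteration sketch in the spirit of PageRank on a toy graph. This is an illustration only: the function, damping factor, and graph below are assumptions for the example and do not reproduce the authors' ASP formulation.

```python
def pagerank(citations, damping=0.85, iters=100):
    """Power-iteration PageRank on a toy citation graph.

    citations: dict mapping each article to the list of articles it cites.
    Returns a dict of scores summing to 1.
    """
    nodes = set(citations)
    for refs in citations.values():
        nodes.update(refs)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Every node keeps a baseline (1 - damping) / n of "random" mass.
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            refs = citations.get(v, [])
            if refs:
                # A citation passes an equal share of v's score to each reference.
                share = damping * score[v] / len(refs)
                for dst in refs:
                    new[dst] += share
            else:
                # Dangling node (cites nothing): spread its mass uniformly.
                for u in nodes:
                    new[u] += damping * score[v] / n
        score = new
    return score


# Toy network: A and B both cite C; C cites nothing.
graph = {"A": ["C"], "B": ["C"], "C": []}
ranks = pagerank(graph)
```

In this toy graph, C receives citations from both A and B and therefore ends up with the highest score, which is the basic intuition behind citation-based eigenvector centrality.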

It is apparent that articles differ in scientific impact. The top 1% most influential papers are easy to spot: they introduce new terms and concepts and initiate research in new areas. The bottom 20–30% are also easy: they are never cited and thus have negligible impact. How to evaluate the remaining articles is challenging and remains an open question.

#Citations (citation count) and Journal Grade have long been used as indicators of how much attention an article has received in the scientific community. Our main question of interest is to what extent #Citations and Journal Grade help assess the importance of an individual article. Though both are widely adopted in all kinds of evaluation, one can easily find counterexamples where either fails. What is the relation between ASP and these two metrics, across disciplines and over time? And which metric is more reliable, and under which circumstances?

Our findings

  • ASP and #Citations rank the top 10% of articles similarly, though ASP is more sensitive to articles with fewer direct but deeper citations. For the remaining articles, ASP is not aligned with #Citations.
  • ASP and #Citations both follow a polynomial law with a long right tail; however, ASP is more concentrated and has far fewer extreme values.
  • Within each journal grade, the distribution of citations is again right-skewed and dominated by articles that are rarely or never cited. Articles should therefore not be evaluated by their journal’s grade.
  • Biology, Medicine, Geography and Science are the disciplines with the highest medians in both ASP and #Citations, but the spread in ASP is much narrower than in #Citations. There is little change over time.
  • There is little relation between ASP and #authors or #references.
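The right-skewness noted above can be illustrated numerically: in a heavy-tailed citation distribution, a handful of highly cited papers pulls the mean far above the median. A toy example with invented counts (not WoS data):

```python
from statistics import mean, median

# Toy, made-up citation counts -- not real Web of Science figures.
toy_citations = [0, 0, 0, 1, 1, 2, 3, 5, 8, 120]

avg = mean(toy_citations)    # pulled up by the single highly cited paper
mid = median(toy_citations)  # robust to the extreme value
```

Here the mean (14) is roughly ten times the median (1.5), which is why median-based, per-discipline comparisons are more informative than raw averages for such distributions.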

As we move towards an information-based society, there is a noticeable change in how the scientific impact of articles, scholars and universities is evaluated. While 20 years ago it was hard to obtain a comprehensive picture of scientific contributions at all, there are now large-scale citation data and advanced techniques to handle their complex dependence structure. This change motivates the research collaboration between the National University of Singapore (Singapore, https://www.math.nus.edu.sg), the Zuse Institute Berlin (Germany, https://www.zib.de) and the Institute of Statistical Mathematics (Japan, https://www.ism.ac.jp/index_e.html).

This collaborative research initiative has two goals:

  • to bring together advanced data science, statistics, machine learning and optimization with the practical knowledge of university libraries, in order to conduct theoretical and empirical research on the proper measurement of research, a critical factor in the knowledge economy.
  • to deepen collaboration between researchers at NUS, ZIB and ISM to a level at which research expertise and resources are integrated for ongoing research project initiatives.

Ying Chen (NUS), Keisuke Honda (ISM) and Thorsten Koch (ZIB)