Leaving the gold standard

Guest Post by Cameron Neylon

See also a briefing paper written by Cameron Neylon for Jisc on the Complexities of Citation.

Citations, we are told, are the gold standard in assessing the outputs of research. When any new measure or proxy is proposed the first question asked (although it is rarely answered with any rigour) is how this new measure correlates with the “gold standard of citations”. This is actually quite peculiar, not just because it raises the question of why citations came to gain such prominence, but also because the term “gold standard” is not without its own ambiguities.

http://bit.ly/2zPATuI *


The original meaning of “gold standard” referred to economic systems where the value of currency was pegged to that of the metal; either directly through the circulation of gold coins, or indirectly where a government would guarantee notes could be converted to gold at a fixed rate. Such systems failed repeatedly during the late 19th and early 20th centuries. Because they coupled money supply – the total available amount of government credit – to a fixed quantity of bullion in a bank, they were incapable of dealing with large-scale and rapid changes. The Gold Standard was largely dropped in the wake of World War II and totally abandoned by the 1970s.

But in common parlance “gold standard” means something quite different from this fixed point of reference: it refers to the best available. In the medical sciences the term is used for treatments or tests that are currently regarded as the best available. The term itself has been criticised over the years, but it is perhaps more ironic that this notion of “best available” directly contradicts the intent of the currency gold standard – that value is fixed to a single reference point for all time.

So are citations the best available measure, or the one that we should use as the basis for all comparisons? Or neither? For some time they were the only available quantitative measure of the performance of research outputs; the only other quantitative research indicators were naive measures of output productivity. Although records have long been kept of journal circulation in libraries – and one-time UK Science Minister David Willetts has often told the story of choosing to read the “most thumbed” issue of journals as a student – these forms of usage data were not collated and published in the same way as the Science Citation Index. Other measures, such as research income, reach, or even efforts to quantify influence or prestige in the community, have only been available for analysis relatively recently.

If the primacy of citations is largely a question of history, is there nonetheless a case to be made that citations are in some sense the best basis for evaluation? Is there something special about them? The short answer is no. A large body of theoretical and empirical work has looked at how citation-based measures correlate with other, more subjective, measures of performance. In many cases, at the aggregate level, those correlations or associations are quite good: as a proxy at the level of populations, citation-based indicators can be useful. But while much effort has been expended on seeking theories that connect individual practice to citation-based metrics, there is no basis for the claim that citations are in any way better (or, to be fair, any worse) than a range of other measures we might choose.

Actually, there are good reasons for thinking that no such theory can exist. Paul Wouters, developing ideas also worked on by Henry Small and Blaise Cronin, has carefully investigated the meaning that gets transmitted as authors add references, publishers format them into bibliographies, and indexes collect them to make databases of citations. He makes two important points. The first is that we should separate the in-text reference and bibliographic list – the things that authors create – from the citation database entry – the line in a database created by an index provider. The second is that, once we understand the distinction between these objects, we see clearly how the meaning behind the authors’ act is systematically – and necessarily – stripped out by the process. While we theorists may argue about the extent to which authors are seeking to assign credit when they reference, all of that meaning has to be stripped out if we want citation database entries to be objects that we can count. As an aside, the question of whether we should count them, let alone how, does not have an obvious answer.

It can seem like the research enterprise is changing at a bewildering rate. And the attraction of a gold standard, of whatever type, is stability. A constant point of reference, even one that may be a historical accident, has a definite appeal. But that stability is limited and it comes at a price. The Gold Standard helped keep economies stable when the world was a simple and predictable place. But such standards fail catastrophically in two specific cases.

The first failure is when the underlying basis of trade changes: when the places where work is done expand or shift, when new countries come into markets, or when the kinds of value being created change. Under these circumstances the basis of exchange changes and a gold standard can’t keep up. Just as with the globalisation of markets and value chains, the global expansion of research, and the changing nature of its application and outputs with the advent of the web, put any fixed standard of value under pressure.

A second form of crisis is a gold rush. Under normal circumstances a gold standard is supposed to constrain inflation. But when new reserves are discovered and mined, hyperinflation can follow. The continued exponential expansion of scholarly publishing has led to year-on-year inflation of indicators derived from citation databases. Actual work and value are devalued if we continue to cling to the idea of the citation as a constant gold standard against which to compare ourselves.

The idea of a gold standard is ambiguous to start with. In practice, indicators based on citation data are just one measure amongst many, neither the best available – whatever that might mean – nor an incontrovertible standard against which to compare every other possible measure. What emerges more than anything else from the work of the past few years on responsible metrics and indicators is the need to evaluate research work in its context.

There is not, and never has been, a “gold standard”. And even if there were, the economics suggests that it would be well past time to abandon it.

A briefing paper written for Jisc by Cameron Neylon – “The Complexities of Citation: How theory can support effective policy and implementation” – is available open access from the Jisc Repository.

Cameron Neylon


About the author: Cameron Neylon is an advocate for open access and Professor of Research Communications at the Centre for Culture and Technology at Curtin University. You can find out more about his work and get in touch with Cameron via his personal page Science in the Open.



*Featured image: “A real bag of gold” by cogdogblog@flickr, used under the terms of a Creative Commons Attribution license.

All citations are created equal

(Only some are more equal than others)

Today we introduce you to one of the Jisc-funded PhD. students working at the Knowledge Media Institute (KMi), which is part of The Open University and is located in Milton Keynes. David Pride is one of the team working on the joint Jisc/OU CORE project (COnnecting REpositories), which offers Open Access to over eight million research papers.
David Pride completed his MSc. in Computer Science (with distinction) at The University of Hertfordshire in 2016 before starting his PhD. at KMi in February of this year. David’s PhD. supervisor is Dr. Petr Knoth and his thesis topic is looking at web-scale research analytics for identifying high performance and trends in academic research. In short, this involves using state-of-the-art Text and Data Mining techniques to analyse datasets containing millions of academic papers to attempt to identify highly impactful and influential research.

http://bit.ly/2AilG21 *


At KMi, all PhD. students must complete a pilot project study within their first year. For his, David chose to review several previous studies that attempted to automatically categorise citations according to type, sentiment and influence. Current bibliometric methods, from the renowned Journal Impact Factor (JIF) to the h-index for individual authors, treat all citations equally. There is much empirical evidence that, because citations are all treated equally in this way, basic citation counts do not reflect the true picture of how a paper is being used. A piece of research may be highly cited because of its ground-breaking content or because it introduces a new methodology. However, it could also be highly cited because it is a survey paper that provides a rich background to a particular domain. Conversely, a paper may attract citations that refute or disagree with the original work. Whilst most citations are overtly neutral in sentiment, a small percentage are negative. Yet, currently, all these citations are treated equally.
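To make the contrast concrete, here is a minimal sketch, using invented data, of what a raw count hides. The type and sentiment labels below are hypothetical and are not the annotation scheme used in David’s study; the point is only that a single number collapses citations that a per-label breakdown keeps apart.

```python
from collections import Counter

# Hypothetical citations to a single paper. The type and sentiment labels are
# illustrative only; they are not the scheme used in David's study.
citations = [
    {"type": "methodology", "sentiment": "neutral"},
    {"type": "background", "sentiment": "neutral"},
    {"type": "background", "sentiment": "neutral"},
    {"type": "comparison", "sentiment": "negative"},
    {"type": "extension", "sentiment": "positive"},
]


def raw_count(cites):
    """What JIF- or h-index-style counting sees: every citation counts as 1."""
    return len(cites)


def breakdown(cites, attribute):
    """Number of citations carrying each label for one attribute, e.g. 'sentiment'."""
    return Counter(c[attribute] for c in cites)


print(raw_count(citations))  # 5
print(dict(breakdown(citations, "sentiment")))  # {'neutral': 3, 'negative': 1, 'positive': 1}
print(dict(breakdown(citations, "type")))
```

Both views start from the same five citations; only the breakdown shows that one of them disagrees with the cited work.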

David’s work is also focused on developing new metrics that can leverage the full content of an academic paper to evaluate its quality rather than relying on citation counts alone. He therefore continued the work of previous studies in using machine learning and natural language processing tools to automatically classify citations according to type and ‘influence’. Influence itself is an interesting concept. Here it refers to how influential the cited paper was on the citing paper, i.e. whether the citation was central to understanding the new work or was perfunctory, for example merely mentioned as part of the literature review. If information about how a paper is being cited were available to academics, researchers and reviewers, it would provide much richer insight than basic citation counts currently do.

Building on the work of Valenzuela et al. (2015) and Zhu et al. (2015), David developed a system to classify the citations in a paper as either incidental or influential. Despite several difficulties along the way, the results of the experiments were extremely positive overall, and the resulting short paper was presented at the TPDL (Theory and Practice of Digital Libraries) 2017 conference and published in Springer’s Lecture Notes in Computer Science. A full version of the paper was later accepted at ISSI 2017 (the conference of the International Society for Scientometrics and Informetrics), where David presented his results in Wuhan, China.
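The features and learner used in the published system are not reproduced here. As a rough illustration of what classifying citations as incidental or influential involves, the sketch below trains a tiny logistic regression with plain gradient descent on two hypothetical features per citation: how many times the reference is mentioned in the citing paper’s body text, and whether it is cited in the methods section. Both features and all training data are invented for this example, and the code uses only the standard library so it runs without dependencies.

```python
import math

# Invented training data: (features, label), label 1 = influential, 0 = incidental.
# Hypothetical features: [in-text mention count, 1 if cited in methods section else 0]
TRAIN = [
    ([1, 0], 0), ([1, 0], 0), ([2, 0], 0), ([1, 1], 0),
    ([4, 1], 1), ([5, 0], 1), ([3, 1], 1), ([6, 1], 1),
]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, x):
    """Estimated probability that a citation with features x is influential."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(data, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    n_features = len(data[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in data:
            err = predict_proba(weights, bias, x) - y  # derivative of log-loss w.r.t. the logit
            for i in range(n_features):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias


def classify(weights, bias, x, threshold=0.5):
    """Map the predicted probability to one of the two citation classes."""
    return "influential" if predict_proba(weights, bias, x) >= threshold else "incidental"


weights, bias = train(TRAIN)
print(classify(weights, bias, [5, 1]))  # mentioned five times, including in the methods
print(classify(weights, bias, [1, 0]))  # a single passing mention
```

A real system would use many more features, a held-out evaluation set and far more labelled examples; the sketch only shows the shape of the task as binary supervised classification.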

Moving forward, David intends to address one of the major gaps in this domain: the lack of a massive-scale, human-annotated dataset of citations for training classifiers for this task. It is believed that the results obtained previously could be significantly improved with a larger initial training set. Citation data is unbalanced by nature, with negative citations, for example, representing only about 4% of all citations. Training a classifier to identify these citations accurately requires a dataset large enough to contain sufficient examples of every class. A large-scale reference set of citations annotated according to type, sentiment and influence would be an extremely valuable asset for researchers working in this domain.
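The arithmetic behind the need for scale is straightforward. Taking the figure of roughly 4% negative citations, the sketch below estimates how many citations would have to be annotated to expect a given number of examples of a rare class. It also computes inverse-frequency (“balanced”) class weights, one common way of stopping a classifier from largely ignoring a minority class. The target of 1,000 examples and the neutral/positive split are hypothetical.

```python
import math


def citations_needed(target_examples, share_percent):
    """Expected number of citations to annotate in order to collect
    target_examples of a class making up share_percent of all citations."""
    return math.ceil(target_examples * 100 / share_percent)


def balanced_class_weights(shares_percent):
    """Inverse-frequency ('balanced') weights: w_c = 100 / (n_classes * share_c).
    Each class then contributes equally, on average, to a weighted training loss."""
    n_classes = len(shares_percent)
    return {c: 100 / (n_classes * s) for c, s in shares_percent.items()}


# The 4% figure for negative citations comes from the text;
# the neutral/positive split is a made-up example.
shares = {"negative": 4, "neutral": 86, "positive": 10}

print(citations_needed(1000, shares["negative"]))  # 25000
for label, weight in balanced_class_weights(shares).items():
    print(f"{label}: {weight:.2f}")
```

Collecting 1,000 negative examples at a 4% rate means annotating around 25,000 citations. Reweighting helps the classifier attend to the rare class (here each negative citation weighs more than 20 times as much as a neutral one), but it cannot make up for having too few distinct examples to learn from.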

In the coming months, David will also be researching the peer review process and how well this correlates with current methodologies for tracking research excellence. He has some interesting data he is currently looking at and we’re looking forward to seeing what he produces in 2018!

*Featured image: “measurement” by flui., used under the terms of a Creative Commons Attribution license.