We discussed the project findings in a meeting on 18th October with a group of researchers, HEI library/data management representatives and publishers.
The meeting included presentations from Mike Thelwall, U of Wolverhampton, on the project results; from Johanna McEntyre, EMBL-EBI, on mining data availability statements for GWAS data; from Marcus Munafò, U of Bristol, on sharing GWAS summary statistics from a researcher’s perspective; and from Victoria Moody, Jisc, on the wider context of tracking and recognising research data.
Preprint of the study
Final project report
Description of study and results
See below for a short summary that captures the main points of the discussion after the presentations on the day of the meeting. If you think there’s anything important we haven’t touched on, or have any further thoughts, feel free to add them via the comments section.
Summary of the discussion at the meeting
- While the overall percentage (10.6%) of articles reporting a complete set of GWAS summary statistics is low in the data availability study, both the study and the GWAS catalog show an increase in the sharing of GWAS summary statistics in the last few years. (The GWAS catalog provides manually curated data from all GWAS.)
- There are most likely many different mechanisms for sharing GWAS summary statistics. For example, some researchers share the data via the GWAS catalog but don’t link them to the corresponding articles (yet). Data availability statements (DAS) might not reflect that data is available: for example, authors might not mention data they hold in the statement because they are still working with it.
- The culture of sharing GWAS summary data and what is stated in the research articles does not necessarily match. Also, some parts of the community (e.g. working on research related to certain diseases) don’t share the data as much so again this very much depends on community norms. To get a fuller picture it would be interesting to compare the data used in/results of the data availability study with the GWAS summary statistics held in the GWAS catalog and to do a qualitative study about sharing practices of researchers working with this type of data. It is difficult to say how the results of the data availability study would compare to other data types. It could be ‘the best of a bad lot’.
- It’s becoming more and more standard for journals to require DAS. PLOS, for example, has had a mandate requiring standardised DAS to be included in publications since 2014. Author compliance with filling in statements usually depends on the strength of the journal data policy: some ‘require’ authors to fill them in, some ‘encourage’ it.
- There is clearly some room for editorial improvement and author guidance on how to fill in DAS to make them more useful. However, it is difficult to scale checks by editors so automating this process would be helpful. Tools to screen submissions would be useful to flag up that DAS are missing but identifying which data sets need to be available is more challenging.
- Funder mandates and journal data policies are good, but the research community needs to be on board too. Ideally, policies need to reinforce existing community norms. From a researcher’s perspective, the social norms around data sharing are more important than funder mandates, as the mandates usually aren’t enforced. The example of sharing GWAS summary statistics shows that as soon as the data is available, researchers find new, often unanticipated ways of reusing them. (The preprint of the study describes the data and how they could be reused in more detail.)
- The REF team at U of Bristol are interested in using the tool to extract DAS developed as part of the project to identify what proportion of their researchers’ outputs have DAS. This could be useful to enhance the environment component of the university’s submission to the REF.
- Other studies have also developed tools that can show which articles have DAS or that can find and classify them (e.g. ‘the citation advantage of linking publications to research data’ or ‘Data sharing in PLOS ONE’). These tools can find the statements, but it would be more valuable to dig deeper and try to find DOIs or other identifiers, and to focus on the aspects of DAS that matter to research communities. Also, the data could be perfectly described in a DAS but still not allow for a critical assessment of the research results, either because the analysis depends on proprietary software (commercial software is deeply embedded in the work of some disciplines) or because the data is not complete.
- The Belmont Forum, a consortium of funders, has developed a common data and digital output policy. This includes a template for DAS for Belmont Forum projects and also bespoke DMPs which then link back to terms and guidelines for filling in the DAS. It might be useful to consider the fields and terms used in DMPs when developing templates for DAS to make it easier to fill them in.
- How do we make DAS more machine readable? To be of value, the statements need to link to data (and software) and need to include community-recognised identifiers. Step (1) would be to standardise what they are called (e.g. data availability, data access or data sharing statements etc.). Journals would need to make them more findable, e.g. put them in front of the paywall. They would need to be used by all journals and they would need to be put in the same place.
- It would be useful if more journals could provide specific advice on completing data availability statements (not just ‘all the data is shared’), including e.g. more precise descriptions of the type of data.
- Researchers must jump through several hoops to get a paper published and to share the underlying data in a reusable format. Being able to demonstrate data management skills can help researchers with gaining a promotion. However, roles in institutions which require data management skills often fall under the ‘professional support’ rather than the ‘academic’ job categories, which shows that there are cultural problems that need to be addressed.
- Linking data (and code) to research articles doesn’t change the focus on the article as the primary research output and the issues around this. There are new initiatives such as Octopus which aim to change the way in which research is published.
- Badges for open research practices could be an incentive to shift practice.
- More ‘research on research’ is needed, e.g. on how data has been reused, showing which kind of new research could happen or has happened due to data being shared.
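
To make the points about screening submissions and finding identifiers more concrete, here is a minimal sketch in Python of what a first-pass check could look like. It is not the tool developed in the project. The heading variants are taken from the names mentioned above (data availability, data access, data sharing), while the function names, the regular expressions and the sample text with its placeholder DOI are assumptions made for illustration only.

```python
import re

# Assumed heading variants, based on the names mentioned in the discussion.
DAS_HEADING = re.compile(
    r"^[ \t]*data[ \t]+(?:availability|access|sharing)"
    r"(?:[ \t]+statement)?[ \t]*[:.]?[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)

# Simplified DOI pattern (an assumption; real DOIs can be more varied).
DOI = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def find_das(text):
    """Return the paragraph that follows the first DAS heading, or None."""
    match = DAS_HEADING.search(text)
    if match is None:
        return None
    rest = text[match.end():].lstrip("\n")
    paragraph = rest.split("\n\n", 1)[0].strip()
    return paragraph or None


def extract_dois(statement):
    """Return DOI-like strings found in a statement, minus trailing punctuation."""
    return [doi.rstrip(".,;)") for doi in DOI.findall(statement)]


def screen(text):
    """Flag whether a DAS was found and list any DOI-like identifiers in it."""
    das = find_das(text)
    return {"has_das": das is not None, "dois": extract_dois(das) if das else []}


# Made-up manuscript text with a placeholder DOI, for illustration only.
sample = (
    "Results\n\nWe found an effect.\n\n"
    "Data Availability Statement\n"
    "Summary statistics are available at https://doi.org/10.1234/example.5678.\n\n"
    "References\n"
)

print(screen(sample))
```

A check like this can only flag that a statement is missing or that it contains no identifier. As noted in the discussion, deciding which data sets ought to be available is the harder problem and is not something a pattern match can do.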