Aim of the study
The aim of the study was to assess:
1) the extent to which summary statistics of primary human genome-wide association studies (GWAS) are shared, as an example of data sharing in favourable circumstances in a particular discipline, and
2) whether such checks can be automated, as a step towards more general automatic discovery and quality control of shared data.
This type of data is particularly suitable for sharing because it is a relatively standardised research output, is straightforward to use in future studies (e.g., for secondary analysis), may already be stored in a standardised format for internal sharing within multi-site research projects, and is generated by a discipline that has strong existing norms regarding data sharing. A follow-up study developed a tool to extract data availability statements from the PMC open access collection and classify them according to key dimensions.
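To give a sense of what extracting a data availability statement can involve, here is a minimal Python sketch. It is not the project's tool. It assumes the article is available as JATS-style XML and that the statement sits in a `sec` or `notes` element whose `sec-type` or `notes-type` attribute contains `data-availability`; in practice, journals mark up these statements in several different ways. The sample article and the function name `extract_statements` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified article fragment (assumption for illustration).
SAMPLE_XML = """<article>
  <back>
    <sec sec-type="data-availability">
      <title>Data availability</title>
      <p>Summary statistics are available from the authors on request.</p>
    </sec>
  </back>
</article>"""


def extract_statements(xml_text):
    """Return the paragraph text of every element marked as a data availability statement."""
    root = ET.fromstring(xml_text)
    statements = []
    for element in root.iter():
        if element.tag not in ("sec", "notes"):
            continue
        # Assumed markup convention: the type attribute names the section's role.
        kind = element.get("sec-type") or element.get("notes-type") or ""
        if "data-availability" in kind:
            # Keep paragraph text only, so the section title is skipped.
            paragraphs = ["".join(p.itertext()).strip() for p in element.iter("p")]
            statements.append(" ".join(paragraphs))
    return statements


print(extract_statements(SAMPLE_XML))
```

An article without a marked-up statement simply yields an empty list, which is itself a useful signal when surveying a whole collection.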
Preprint summarising the study
Final project report
Project findings include:
- Only 10.6% of primary human GWAS either share or offer to share their summary statistics data in any form.
- In a field where data sharing statements are widely used, it is possible to extract information about whether data was shared.
- Data sharing statements are vague about what is shared, and there is no single standard or policy adopted by all journals in a specific field regarding what should be included in a data availability statement.
- Descriptions of the exact nature of the data available would help not only automation but also researchers scanning multiple articles to find relevant data for a new study.
- We conclude that if more journals required data sharing statements and either employed guidelines to ensure that the shared data was described in detail or provided virtual rewards for these activities, this would support the level of automated data discovery needed to monitor data sharing and systematically identify shared data for later re-use.
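The findings above can be illustrated with a small rule-based classifier. This is a sketch under stated assumptions, not the method used in the project: the category labels, the keyword patterns and the function name `classify` are all hypothetical. It shows how a statement that names a location or an access route can be sorted automatically, while a vague statement cannot.

```python
import re

# Hypothetical keyword rules (assumption); they are checked in order and the first match wins.
RULES = [
    ("on_request", re.compile(r"\b(up)?on (reasonable )?request\b", re.I)),
    ("shared", re.compile(r"https?://|\bdoi\b|\brepository\b|\baccession\b|\bsupplementary\b", re.I)),
    ("not_shared", re.compile(r"\bnot (publicly )?available\b|\bcannot be shared\b", re.I)),
]


def classify(statement):
    """Return a coarse sharing category for one data availability statement."""
    for label, pattern in RULES:
        if pattern.search(statement):
            return label
    # Nothing specific enough to decide what, if anything, was shared.
    return "unclear"


examples = [
    "Summary statistics are available from the authors on request.",
    "Summary statistics can be downloaded from https://example.org/gwas.",
    "The data are not publicly available due to privacy restrictions.",
    "Data are available.",
]
for text in examples:
    print(classify(text), "|", text)
```

The last example falls through to `unclear`: it says data are available but not which data or where, which is the kind of vagueness that limits both automated checks and manual searches for re-usable data.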
We discussed the project results in a meeting on 18th October with a group of researchers, HEI library/data management representatives, publishers and funders.
See a summary of the discussion at the meeting in the next blog post.