RDA COVID-19 Recommendations and guidelines

The objectives of the RDA COVID-19 Working Group (CWG) are:

  • to clearly define detailed guidelines on data sharing under the present COVID-19 circumstances, to help stakeholders follow best practices that maximise the efficiency of their work, and to act as a blueprint for future emergencies;
  • to develop guidelines for policymakers to maximise timely data sharing and appropriate responses in such health emergencies;
  • to address the interests of researchers, policymakers, funders, publishers, and providers of data sharing infrastructures.

Source: RDA COVID-19 Recommendations and guidelines 1st release - open for comments https://www.eoscsecretariat.eu/eosc-liaison-platform/post/rda-covid-19-recommendations-and-guidelines-1st-release-open-comments

FAIR and timely

“FAIR principles [mean] that data, software, models and other outputs should be Findable, Accessible, Interoperable and Reusable”

“A balance between achieving ‘perfectly’ FAIR outputs and timely sharing is necessary with the key goal of immediate and open sharing as a driver.”

Metadata

“While rich metadata is desirable, even a minimum set of key fields/descriptors is valuable” [RDA20]

“The use of common metadata standards, as adopted by one’s relevant discipline, as well as vocabularies, are highly recommended”

“metadata should describe the data as well as the terms under which it can be accessed and reused.”
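To make the idea of a “minimum set of key fields” that also records access and reuse terms more concrete, the sketch below checks a metadata record against a small list of required fields. The field names (`title`, `creator`, `identifier`, `publication_date`, `description`, `license`, `access_conditions`) are assumptions loosely modelled on common schemas such as DataCite; the RDA guidelines do not prescribe this exact set, and the record values are illustrative.

```python
from typing import Dict, List

# Hypothetical minimum field set; NOT prescribed by the RDA guidelines.
REQUIRED_FIELDS = [
    "title",
    "creator",
    "identifier",
    "publication_date",
    "description",
    "license",            # terms under which the data can be reused
    "access_conditions",  # terms under which the data can be accessed
]


def missing_fields(record: Dict[str, str]) -> List[str]:
    """Return the required fields that are absent or empty in a metadata record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f, "")).strip()]


if __name__ == "__main__":
    # Illustrative record only; all values are placeholders.
    record = {
        "title": "Example COVID-19 case counts",
        "creator": "Example Lab",
        "identifier": "",  # no persistent identifier assigned yet
        "publication_date": "2020-04-24",
        "description": "Daily case counts (illustrative record).",
        "license": "CC-BY-4.0",
        "access_conditions": "open",
    }
    print(missing_fields(record))  # ['identifier']
```

A check like this can run before deposit, so that a record is shared quickly with its key fields rather than held back until it is “perfect”.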

“Ideally, data and metadata should be exposed via machine readable endpoints (e.g. RDF, APIs)”

“Where there are restrictions on accessing or using datasets, metadata should be shared openly to enable discovery (e.g. CC0 or CC-BY licenses).”
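One way to expose metadata through a machine-readable endpoint is JSON-LD (an RDF serialization) using the schema.org `Dataset` type, which dataset search services can harvest. The sketch below assumes that vocabulary (the RDA text gives RDF and APIs only as examples) and shows the restricted-data case: the data carry their own license and access conditions, while the metadata record itself is released under CC0 via schema.org's `sdLicense` property. The function name and all example.org values are hypothetical placeholders.

```python
import json
from typing import Any, Dict


def dataset_jsonld(name: str, description: str, identifier: str,
                   data_license: str, access_note: str) -> str:
    """Build a schema.org Dataset description serialized as JSON-LD."""
    doc: Dict[str, Any] = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": identifier,
        "license": data_license,            # terms for reusing the data
        "conditionsOfAccess": access_note,  # how restricted data can be requested
        # License of the metadata record itself: open (CC0) so the dataset
        # stays discoverable even though access to the data is restricted.
        "sdLicense": "https://creativecommons.org/publicdomain/zero/1.0/",
    }
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    print(dataset_jsonld(
        name="Example restricted clinical dataset",
        description="Illustrative record; all values are placeholders.",
        identifier="https://example.org/dataset/123",
        data_license="https://example.org/data-use-agreement",
        access_note="Available on request via a data access committee.",
    ))
```

Embedding such a document in a landing page, or returning it from an API, keeps the discovery metadata openly reusable without opening the data themselves.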

Documentation

“Research outputs need to be documented, which includes documentation of methodologies used to define and construct data, data cleaning, data imputation, data provenance and so on”

“Software should provide documentation that describes at least the libraries, algorithms, assumptions and parameters used”
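As a minimal sketch of documenting “the libraries, algorithms, assumptions and parameters used”, the function below gathers them into one record that can be saved next to the outputs of an analysis run. The function `run_record`, its record layout, and the example model name and parameter values are hypothetical illustrations; library versions are read with the standard `importlib.metadata` module.

```python
import json
import platform
from importlib import metadata
from typing import Any, Dict, List


def run_record(libraries: List[str], algorithm: str,
               assumptions: List[str], parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Collect libraries, algorithm, assumptions and parameters for one analysis run."""
    versions: Dict[str, str] = {}
    for lib in libraries:
        try:
            versions[lib] = metadata.version(lib)
        except metadata.PackageNotFoundError:
            versions[lib] = "not installed"
    return {
        "python": platform.python_version(),
        "libraries": versions,
        "algorithm": algorithm,
        "assumptions": assumptions,
        "parameters": parameters,
    }


if __name__ == "__main__":
    # Hypothetical model description and illustrative parameter values.
    record = run_record(
        libraries=["numpy", "pandas"],
        algorithm="SEIR compartmental model (illustrative)",
        assumptions=["closed population", "homogeneous mixing"],
        parameters={"r0": 2.5, "incubation_days": 5.2},
    )
    print(json.dumps(record, indent=2))
```

Writing this record to a JSON file alongside each result makes it possible to tell later which library versions and settings produced it.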

“Equally, research context, methods used to collect data, and quality-assurance steps taken are important”

“When sharing datasets, other relevant outputs (or documents) should also be made available, such as codebooks, lab journals, or informed consent form templates”
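A codebook can also double as a check that every column in a shared table is documented. The sketch below assumes a simple codebook layout (variable name mapped to a description and a unit) invented for illustration; real projects may follow a discipline-specific data dictionary format instead. The sample rows are made up.

```python
import csv
import io
from typing import Dict, List

# Hypothetical codebook: one entry per variable expected in the dataset.
CODEBOOK: Dict[str, Dict[str, str]] = {
    "patient_id": {"description": "Pseudonymised participant identifier", "unit": ""},
    "age": {"description": "Age at enrolment", "unit": "years"},
    "onset_date": {"description": "Date of symptom onset", "unit": "ISO 8601 date"},
}


def undocumented_columns(csv_text: str, codebook: Dict[str, Dict[str, str]]) -> List[str]:
    """Return header columns of a CSV table that have no codebook entry."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return [col for col in header if col not in codebook]


if __name__ == "__main__":
    sample = "patient_id,age,onset_date,ward\nP001,54,2020-03-02,B\n"  # made-up row
    print(undocumented_columns(sample, CODEBOOK))  # ['ward']
```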

Use of Trustworthy Repositories

“To facilitate data quality control, timely sharing and sustained access, data should be deposited in trustworthy data repositories (TDRs)”

“Whenever possible, these should be trustworthy data repositories (TDRs) that have been certified, subject to rigorous governance, and committed to longer-term preservation of their data holdings.”

“As the first choice, widely used disciplinary repositories are recommended for maximum accessibility and assessability of the data, followed by general or institutional repositories”

“Using existing open repositories is better than starting new resources.”

“By providing persistent identifiers, demanding preferred formats, rich metadata, etc., certified trustworthy repositories already guarantee a baseline FAIRness of and sustained access to the data, as well as citation.”
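Persistent identifiers are what keep citations working when web addresses change. As a small illustration, the sketch below normalizes the common ways a DOI is written into a single resolvable https://doi.org/ URL. The function name and the set of accepted prefixes are assumptions for this example; the sample DOI is the one for [WI16].

```python
import re

# Accepted spellings: bare DOI, "doi:" prefix, or a doi.org / dx.doi.org URL.
_PREFIX = re.compile(r"^(?:doi:|https?://(?:dx\.)?doi\.org/)", re.IGNORECASE)


def doi_to_url(doi: str) -> str:
    """Return the canonical https://doi.org/ URL for a DOI string."""
    bare = _PREFIX.sub("", doi.strip())
    if not bare.startswith("10."):  # every DOI starts with the "10." directory code
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + bare


if __name__ == "__main__":
    for raw in ["10.1038/sdata.2016.18",
                "doi:10.1038/sdata.2016.18",
                "http://dx.doi.org/10.1038/sdata.2016.18"]:
        print(doi_to_url(raw))  # https://doi.org/10.1038/sdata.2016.18
```

Citing the resolver URL rather than a repository's current landing-page address means the link keeps working if the repository reorganises its site.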

Ethics and Privacy

“Access to individual participant data and trial documents should be as open as possible and as closed as necessary, to protect participant privacy and reduce the risk of data misuse”

What are the blocking factors for sharing data?

  • non-machine-readable data (e.g., PDF)
  • heterogeneous measurement standards
  • divergent metadata formats
  • lack of version control
  • fragmented datasets
  • delays in releasing data
  • non-standard definitions and reporting parameters
  • lack of metadata
  • unavailable or undocumented computer code
  • frequently changing web addresses
  • copyright and usage conditions
  • translation requirements
  • consents
  • approvals
  • legal restrictions
  • lack or no integration of clinical, eHealth, surveillance, and research systems within and across jurisdictions or providers
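Some of these blockers (lack of version control, fragmented datasets, silent changes between releases) can be reduced with very little tooling. As one illustration, not taken from the RDA text, the sketch below records a SHA-256 checksum for every file in a dataset folder and compares two such manifests to report what was added, removed or modified between releases. The manifest layout and function names are assumptions; trustworthy repositories typically compute their own fixity checksums as well.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> Dict[str, str]:
    """Map each file's path relative to `folder` to its SHA-256 digest."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, str]:
    """Classify files as 'added', 'removed' or 'modified' between two manifests."""
    changes: Dict[str, str] = {}
    for name in sorted(old.keys() | new.keys()):
        if name not in new:
            changes[name] = "removed"
        elif name not in old:
            changes[name] = "added"
        elif old[name] != new[name]:
            changes[name] = "modified"
    return changes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "cases.csv").write_text("date,cases\n2020-04-01,10\n")
        v1 = build_manifest(root)
        (root / "cases.csv").write_text("date,cases\n2020-04-01,12\n")
        (root / "deaths.csv").write_text("date,deaths\n")
        v2 = build_manifest(root)
        print(changed_files(v1, v2))  # {'cases.csv': 'modified', 'deaths.csv': 'added'}
```

Publishing the manifest with each release gives users a cheap way to confirm which version of the data they hold.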

Major difficulty

Lack of the contextual data needed to study how the disease evolves in sub-populations, for example:

  • healthy sub-populations that may be vulnerable to serious long-term effects after recovery, effects that remain unknown because the data are missing and attention is focused on deaths
  • age-specific vulnerabilities
  • disadvantaged sub-populations with limited health care
  • vulnerabilities evident in severe disease associated with comorbidities
  • vulnerabilities due to environmental conditions
  • vulnerabilities due to social and cultural norms
  • follow-up of sequelae and immunity

Glossary

FAIR
FAIR principles
Data, software, models and other outputs should be Findable, Accessible, Interoperable and Reusable [WI16]
FAIRER
FAIRER principles
Findable, Accessible, Interoperable, Reusable, Ethical, and Reproducible [RDA20]
OMICS
Omics are defined as data from cell and molecular biology [RDA20]
RDA
Research Data Alliance https://www.rd-alliance.org
SPIRIT
Standard Protocol Items: Recommendations for Interventional Trials
TDR
Trustworthy Data Repositories
Trustworthy Data Repositories (TDRs) are repositories that have been certified, subject to rigorous governance, and committed to longer-term preservation of their data holdings. [RDA20]

Citations

[RDA20] RDA COVID-19 Working Group. RDA COVID-19; recommendations and guidelines, 1st release, 24 April 2020. https://www.rd-alliance.org/system/files/RDA%20COVID-19%3B%20recommendations%20and%20guidelines%2C%201st%20release%2024%20April%202020.pdf
[WI16] Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18