Frequently Asked Questions

What WCM-supported data storage tools integrate with WIDRR?

The following data storage tools are WCM-supported and/or WCM-implemented. They comply with the new Data Retention policies and integrate with WIDRR.

What does the Cornell University Research Data Retention policy mean for WCM researchers?

Section 1.3.8 details the responsibilities of WCM faculty regarding collection and retention of research data and should be read carefully by WCM researchers.

Your main responsibilities include:

  • Data management plan: WCM faculty must create, abide by, and fund a data management plan that specifies where they will deposit data at the close-out of research. 
    • What is close-out of research? For funded research, close-out is whichever comes first of these two events:
      • The end of the grant or contract agreement OR
      • 60 days before the faculty member leaves the institution
  • Data retention: Faculty must enter the required metadata information and method description into the WCM Institutional Data Repository for Research (WIDRR). Please see the FAQ, “What kind of data do I need to deposit?” for further details.
    • When does the data retention need to occur? After publication or within three years after final project closeout of all funded or unfunded research.
    • How long does the data need to be retained? Primary data and supporting images must be available for at least six years after publication. If the data and images are used in a subsequent publication, or the original publication is cited in another publication or grant application by the same faculty member(s), the data must be available for an additional six years from the date of the most recent citation.
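
The date rules above can be sketched in a few lines of Python. This is an illustrative reading of the policy, not an official calculator: the function names are hypothetical, the deposit deadline assumes the "within three years after final project closeout" reading, and the retention end assumes the additional six years run from the most recent citation or reuse.

```python
from datetime import date, timedelta
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def close_out_date(grant_end: date, departure: Optional[date] = None) -> date:
    """Earlier of the grant end and 60 days before the faculty member leaves."""
    if departure is None:
        return grant_end
    return min(grant_end, departure - timedelta(days=60))


def deposit_deadline(close_out: date) -> date:
    """Assumption: latest deposit date is three years after close-out."""
    return add_years(close_out, 3)


def retention_end(publication: date, latest_citation: Optional[date] = None) -> date:
    """Six years after publication, or after the most recent citation or reuse."""
    anchor = publication if latest_citation is None else max(publication, latest_citation)
    return add_years(anchor, 6)
```

For example, a grant ending on 30 June 2025 with a planned departure on 1 March 2025 closes out 60 days before the departure, because that date comes first. Please confirm actual deadlines against the policy text or with the library.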

What kind of data do I need to deposit?

Faculty must enter a data catalog record (marked public or private) into the WCM Institutional Data Repository for Research (WIDRR). That record should include the location of the datasets (including raw data) and provide the methods file sufficient for replication and audit of the research. It is not mandatory to deposit the datasets into WIDRR if they are available in an existing data repository.

In addition, faculty should:

  1. Include a pointer link to any datasets that are in an existing data repository. A list of NIH-accepted repositories can be found here: https://www.nlm.nih.gov/NIHbmic/domain_specific_repositories.html
    1. WIDRR provides a drop-down list of data repositories, but you can also enter a repository manually. Please contact the library (https://library.weill.cornell.edu/ask-us) to add other data repositories to the drop-down options.
    2. If the datasets are stored in a WCM secure location (e.g., Box, OneDrive, network fileshare), please supply the accessible share link.
  2. Associate each dataset with one or more milestones and a project.
  3. Provide the methods file describing all the steps of the analysis, from the raw data input file through the published data output file. The methods file must specify all the software, parameters, and code used. Intermediate experiments and data that are not necessary for replication of the research do not need to be preserved.
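
As a pre-submission checklist, the requirements above could be modeled roughly as follows. This is only a sketch: `CatalogRecord`, its field names, and `missing_items` are hypothetical and do not reflect the actual WIDRR form or any WIDRR interface.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatalogRecord:
    """Hypothetical stand-in for a WIDRR data catalog record."""
    title: str
    visibility: str                # "public" or "private"
    dataset_locations: List[str]   # repository links, share links, or file paths
    methods_file: str              # describes steps from raw input to published output
    project: str
    milestones: List[str] = field(default_factory=list)


def missing_items(record: CatalogRecord) -> List[str]:
    """Return the checklist items the record does not yet satisfy."""
    problems = []
    if record.visibility not in ("public", "private"):
        problems.append("visibility must be 'public' or 'private'")
    if not record.dataset_locations:
        problems.append("at least one dataset location is required")
    if not record.methods_file:
        problems.append("a methods file is required")
    if not record.project:
        problems.append("the dataset must be associated with a project")
    if not record.milestones:
        problems.append("the dataset must be associated with at least one milestone")
    return problems
```

An empty list from `missing_items` means the record covers every item in this sketch; it does not guarantee that the record meets the policy.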

Examples of what types of data to provide:

Basic and translational science:

  • URL to dbGaP for BAM file(s)
  • Path to the WCM file share with proteomics data set
  • File of mass spectrometry output
  • Microscopy image files
  • Original and post-processed animal radiology images
  • Python script used to perform analysis
  • Text file describing steps for experiments
  • Copy of paper lab notebooks
  • Link to an electronic lab notebook (e.g., LabArchives, OneNote) containing the raw data

Clinical science:

  • Path to WCM OneDrive location containing clinical trials data
  • URL to NIH All of Us Research Program Researcher Workbench data set
  • File containing research-ready EHR data with protected health information (PHI)
  • REDCap de-identified data set, codebook, and/or case report form
  • SAS, STATA, and R files containing code (plus data files)
  • Word document detailing how to transform raw EHR data into research-ready data

How long does data need to be retained?

Primary data and supporting images must be available for at least six years after publication. If the data and images are used in a subsequent publication, or cited in a subsequent publication or grant application by faculty, then the data must be available for an additional six years.

What is raw data?

The WCM definition of raw data for retention purposes is:

“Data considered as raw data are any final file generated by instruments used to collect raw data prior to any additional filtration, data cleaning, or analysis work.”

WCM leaves it to the PI to determine the appropriate format for their raw data according to the standards in their field (i.e., the raw data format usually required by journals in the field).

Please contact the library (https://library.weill.cornell.edu/ask-us) with any further questions.

Should I archive the raw data for a paper on which I am a co-author but not the first or last author?

If you are collaborating with a study team external to WCM, including institutions located abroad, and this collaboration results in publication(s), you are responsible for archiving the raw data that pertains to your contributions to the publication(s) (e.g., if your contribution was a figure, then provide the raw data and methods file for that figure). The same logic applies to grant applications.
