Who is the Research Data Retention policy for?
This policy applies to members of the Cornell community based in Ithaca, at Weill Cornell Medicine (WCM), and at Cornell Tech.
Who should read the Research Data Retention Policy?
Anyone at Cornell University, WCM, or Cornell Tech who is involved in the design, conduct, or reporting of research at Cornell University.
What is the purpose of the Research Data Retention policy?
- This policy defines the shared responsibilities of Cornell University and Weill Cornell Medicine, together with Cornell researchers, in collecting, retaining, securing, accessing, publishing, and sharing research data.
- The policy’s main stipulation is that research data must be preserved in sufficient detail for an adequate period of time to comply with sponsor requirements, federal, state, and local regulations, and inquiries involving the research. This allows the University to respond to any questions about accuracy, authenticity, primacy, and compliance with laws and regulations governing the research.
What does the Cornell University Research Data Retention policy mean for WCM researchers?
Section 1.3.8 details the responsibilities of WCM faculty regarding the collection and retention of research data and should be read carefully by WCM researchers. Your main responsibilities include:
- Data management plan: WCM faculty must create, abide by, and fund a data management plan that specifies where they will deposit data at the close-out of research.
- What is close-out of research? For funded research, close-out is whichever of these two events comes first:
  - The end of the grant or contract agreement, or
  - 60 days before the faculty member leaves the institution
- Data deposit: Faculty must enter the required metadata and method description into the WCM Institutional Data Repository for Research (WIDRR). The data necessary for research replication and audit can be deposited in WIDRR itself, or in an accepted national repository with instructions for access deposited in WIDRR.
- When does deposit need to occur? After publication or within three years after final project closeout of all funded or unfunded research.
- How long does data need to be retained? Primary data and supporting images must be available for at least six years after publication. If the data and images are used in a subsequent publication, or if the original publication is cited in another publication or grant application by the same faculty member(s), they must be available for an additional six years from the date of the most recent citation.
What kind of data do I need to deposit?
Faculty must enter into the WCM Institutional Data Repository for Research (WIDRR), at minimum, the location of their datasets (including raw data), and provide a methods file sufficient for replication and audit of the research.
In addition, faculty should:
- Include a link to any datasets held in a WCM-designated external repository. A list of accepted repositories can be found here: https://www.nlm.nih.gov/NIHbmic/domain_specific_repositories.html. If your repository is absent from this list, please contact the Library.
- Associate each dataset with one or more milestones and a project.
- Provide a methods file describing all the steps of your analysis starting from the raw data input file up to the published data output file. Faculty should specify all the software, parameters, and code used in the methods file. Intermediate experiments and data that are not necessary for replication of the research do not need to be preserved.
Examples of what types of data to provide:
Basic and translational science
- URL to dbGaP for BAM file(s)
- Path to the WCM file share with proteomics data set
- File of mass spectrometry output
- Microscopy image files
- Original and post-processed animal radiology images
- Python script used to perform analysis
- Text file describing steps for experiments
- Copy of paper lab notebooks
- Link to your electronic lab notebook (e.g. LabArchives, OneNote) containing the raw data
Clinical science
- Path to WCM OneDrive location containing clinical trials data
- URL to NIH All of Us Research Program Researcher Workbench data set
- File containing research-ready EHR data with protected health information (PHI)
- REDCap de-identified data set, codebook, and/or case report form
- SAS, STATA, and R files containing code (plus data files)
- Word document detailing how to transform raw data into research-ready data
When does the deposit of data into WIDRR need to occur?
Data should be deposited into WIDRR upon any of these three research milestones:
- within six months of publication
- within three years of final project closeout of all funded or unfunded research (i.e., your grant ended)
- 60 days before the faculty member leaves their position at Weill Cornell Medicine
What is considered close-out of research?
- For funded research, close-out is whichever of these two events comes first:
- The end of the grant or contract agreement, or
- 60 days before the faculty member leaves the institution
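The "whichever comes first" rule is simple date arithmetic. The following is a minimal sketch for illustration, not an official WCM tool: the function name `close_out_date` and its parameters are hypothetical, and it assumes that a faculty member with no planned departure simply has no departure date.

```python
from datetime import date, timedelta
from typing import Optional


def close_out_date(agreement_end: date, departure: Optional[date] = None) -> date:
    """Return the close-out date for funded research.

    Close-out is the earlier of:
      - the end of the grant or contract agreement, and
      - 60 days before the faculty member leaves the institution.
    """
    if departure is None:
        # No planned departure (assumption): only the agreement end applies.
        return agreement_end
    return min(agreement_end, departure - timedelta(days=60))


# Example: grant ends June 30, 2025, but the faculty member leaves March 31, 2025.
print(close_out_date(date(2025, 6, 30), date(2025, 3, 31)))  # 2025-01-30
```

In this example the departure rule governs, because 60 days before March 31, 2025 falls earlier than the end of the grant.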
How long does data need to be retained?
Primary data and supporting images must be available for at least six years after publication. If the data and images are used in a subsequent publication, or cited in a subsequent publication or grant application by faculty, then the data must be available for an additional six years.
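As a worked illustration of the retention rule, the sketch below computes the earliest date on which the retention period could end. It is not an official calculator. The names `retain_until` and `add_years` are hypothetical, and the code assumes that the additional six years are counted from the most recent reuse or citation, as stated in the summary of faculty responsibilities earlier in this FAQ.

```python
from datetime import date
from typing import Iterable

RETENTION_YEARS = 6  # "at least six years" per the policy summary


def add_years(d: date, years: int) -> date:
    """Add whole years to a date, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Feb 29 in a leap year has no counterpart in a non-leap target year.
        return d.replace(year=d.year + years, month=2, day=28)


def retain_until(publication: date, later_uses: Iterable[date] = ()) -> date:
    """Return the earliest date the retention period can end.

    later_uses holds the dates of subsequent publications or grant
    applications that reuse or cite the data (assumed to restart the clock).
    """
    latest = max([publication, *later_uses])
    return add_years(latest, RETENTION_YEARS)


# Published May 1, 2020; cited again in a grant application on March 15, 2023.
print(retain_until(date(2020, 5, 1), [date(2023, 3, 15)]))  # 2029-03-15
```

Without any later reuse or citation, the same publication would need its data retained until at least May 1, 2026.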
What do we mean by raw data?
The WCM definition of raw data for retention purposes is the following:
- Raw data are the final files generated by the instruments used to collect the data, before any additional filtering, data cleaning, or analysis. If you have any questions about what qualifies as raw data, please contact a Samuel J. Wood Library data retention specialist.
WCM leaves it to the PI to determine the appropriate format for their raw data according to the standards of their field (i.e., the raw data format usually required by journals in the field).
Should I archive the raw data for a paper on which I am a co-author but not the first or last author?
If you are collaborating with researchers outside of the institution, including those at institutions located abroad, and this collaboration results in one or more published manuscripts, you are responsible for archiving the raw data that pertains to your contributions to the publication(s). The same logic applies to grant applications.
Who will have access to the research data in the repository?
Cornell University and WCM: Cornell and WCM have the right to access all research data generated under their auspices, supported by their funds, or conducted using their facilities. They have the right to take custody of research data.
WCM Faculty, PI: A WCM faculty member, PI, or other researcher who leaves the University may request a copy of research data for projects on which they have worked. Submit requests to the Senior Associate Dean of Research or the Senior Associate Dean of Clinical Research.
Section 1.3.10 contains full details on Research Data Access.
What does the Research Data Retention Policy say about publication?
- The Principal Investigator (PI) has the right and responsibility to ensure that research is accurately reported, as well as to select the vehicle most appropriate for publication.
- The PI must ensure that any figures, tables, images, data, or assertions included in the publication can be defended from the raw data, and that they are not manipulated.
Section 1.3.11 contains full details on Publication.
What are Cornell and WCM's rights and responsibilities in regard to research data?
Cornell and Weill Cornell Medicine own the research data and related property rights that arise from the activities of their researchers and others who use university resources. This includes resources provided through grants, contracts, awards, or gifts.
What are the Principal Investigator’s (PI) rights and responsibilities in regard to research data?
Within the limits set by the superseding authority of Cornell, agreements with collaborators, and any applicable terms of a sponsored agreement, the Principal Investigator (PI) has the right and authority to control the use of and access to any research data generated under their management, including data used in publications or presentations. The PI is responsible for maintaining and retaining the data and for ensuring their integrity. The full list of responsibilities is found in Section 1.3.6.
How do I access the WCM Data Retention & Export Control Attestation?
The WCM Data Retention & Export Control Attestation, a 20-minute course covering both new Cornell University policies, will be available as of July 11, 2022 in the Learning module of WBG (LMS). The Attestation records your understanding of and agreement with the research data policies.
How to access the WCM Data Retention & Export Control Attestation:
1. Navigate to the Weill Business Gateway (wbg.weill.cornell.edu) and click the Learning tile. Or use the link: http://sf-lms.weill.cornell.edu/
2. On the My Learning homepage, there is a Find Learning search box. Type WCM Data Retention & Export Control Attestation in the search box and click Go.
For full instructions, view this Knowledge Article: HowTo: Find a Course in SuccessFactors Learning Management System
If the Attestation has been assigned to you, follow these instructions: HowTo: Start a Course in the SuccessFactors Learning Management System
When do I have to take the WCM Data Retention & Export Control Attestation?
As of July 11, 2022, faculty have 60 days to take the course and attest that they abide by its policies. The Attestation should be completed no later than September 16, 2022.