Data Documentation and Metadata

Data Documentation

Data documentation ensures that any user can understand and interpret your data. It explains how your data were created, the context in which they were collected, the structure and contents of the data, and any manipulations applied to them.

What is important to document?

Context of data collection
Data collection methodology
Structure and organization of data files
Data validation and quality assurance
Data manipulations made during analysis, and conditions of use

Data-level documentation

Variable names and descriptions
Definition of codes and classification schemes
Codes of, and reasons for, missing values
Definitions of specialty terminology and acronyms
Algorithms used to transform data
File format and software used
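
As an illustration of data-level documentation, the elements listed above can be kept in a small, machine-readable codebook and rendered as text for a README.txt. This is only a sketch: the variable names, codes, and missing-value codes are hypothetical, and `render_codebook` is an illustrative helper, not part of any standard.

```python
# Hypothetical codebook for a small survey file; every name and code is illustrative.
CODEBOOK = {
    "age": {
        "description": "Age of respondent in years",
        "codes": {},
        "missing": {-99: "Not answered"},
    },
    "employment": {
        "description": "Employment status",
        "codes": {1: "Employed", 2: "Unemployed", 3: "Student"},
        "missing": {-99: "Not answered", -88: "Not applicable"},
    },
}


def render_codebook(codebook):
    """Return the codebook as plain text suitable for pasting into a README.txt."""
    lines = []
    for name, entry in codebook.items():
        # Variable name and description
        lines.append(f"{name}: {entry['description']}")
        # Definition of each code
        for code, label in entry["codes"].items():
            lines.append(f"    {code} = {label}")
        # Codes of, and reasons for, missing values
        for code, reason in entry["missing"].items():
            lines.append(f"    {code} = missing ({reason})")
    return "\n".join(lines)


print(render_codebook(CODEBOOK))
```

Keeping the codebook as structured data means the same definitions can feed both the human-readable documentation and any validation scripts.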

Example Readme Files

README.txt files are plain-text files in which researchers keep notes about their digital data files. The documentation they contain is easily and immediately understandable. They let you add notes about the organization and content of your digital files and folders, which helps other researchers or colleagues navigate the data. Ideally, a README.txt file is kept at the top level of a project folder and states the purpose of the project, a brief summary, contact details, and the general organization of the files. Think of it as the first page of your lab notebook.

All-purpose, structured README.txt: example 1 and example 2 use an Open Science Framework template
Comprehensive README.txt: Part of a survey dataset, this readme documents the data analysis process and explains each file and folder. The dataset includes license files. The dataset has also been published in Zenodo, capturing many of the metadata fields suggested in the Metadata section below.
Simple README.txt: For easy data management, add a simple README.txt to your project folders
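
A simple top-level README.txt of the kind described above could be assembled with a few lines of Python. The section labels (PROJECT, PURPOSE, CONTACT, FILES AND FOLDERS) and the sample values are assumptions chosen for illustration; they are not taken from the Open Science Framework template.

```python
def build_readme(title, purpose, contact, files):
    """Assemble README.txt text from a few basic project details.

    `files` maps a file or folder name to a one-line description.
    """
    lines = [
        f"PROJECT: {title}",
        f"PURPOSE: {purpose}",
        f"CONTACT: {contact}",
        "",
        "FILES AND FOLDERS",
    ]
    for name, description in files.items():
        lines.append(f"  {name} - {description}")
    return "\n".join(lines) + "\n"


# Placeholder values; replace them with details of your own project.
readme_text = build_readme(
    title="Example survey project",
    purpose="Placeholder summary of why the data were collected",
    contact="Name <name@example.org>",
    files={"data/": "raw and cleaned data files", "scripts/": "analysis code"},
)
print(readme_text)
```

The returned text can then be saved as README.txt at the top level of the project folder.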

Metadata

Metadata describes the origin, purpose, time, geographic location, creator, access, and terms of use of the data. Information in the metadata is used to retrieve and index data in a repository or archive, and enables citation of the data. Metadata can be harvested for data sharing through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
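
To give a sense of how harvesting works, an OAI-PMH request is an ordinary HTTP request whose query string carries a `verb` (for example `ListRecords`) and arguments such as `metadataPrefix`. The sketch below only builds the request URL and sends nothing. The repository address is hypothetical, and `oai_pmh_url` is an illustrative helper rather than part of any library.

```python
from urllib.parse import urlencode


def oai_pmh_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    params = {"verb": verb}
    # Skip arguments that were left unset
    params.update({key: value for key, value in arguments.items() if value is not None})
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "https://repository.example.org/oai"

# Ask for all records in unqualified Dublin Core (the oai_dc metadata format).
url = oai_pmh_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
print(url)
```

Because `from` is a reserved word in Python, a date-limited request would need to pass it as `**{"from": "2024-01-01"}`.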

There are a variety of metadata standards, usually for a particular file format or discipline. Some examples include the following:

An excellent guide to medical metadata for research data is the Johns Hopkins Guide to Documenting Research Data.

Consult these directories for comprehensive lists of discipline-specific metadata standards and tools.

The Wood Library can help you select the most appropriate metadata standard to use. Contact Wood Library.

When creating metadata, a best practice is to use a controlled vocabulary or the standard terminology of your discipline. Using a controlled vocabulary or an authority list makes your data easier to index and retrieve.
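
One way to apply a controlled vocabulary in practice is to check keywords against the authority list before recording them. The sketch below assumes a tiny, made-up vocabulary; a real project would load the terms published for its discipline.

```python
# Hypothetical authority list; substitute the vocabulary used in your discipline.
CONTROLLED_TERMS = {"hydrology", "soil science", "remote sensing"}


def check_keywords(keywords, vocabulary):
    """Split keywords into (accepted, rejected) after normalizing case and spacing."""
    accepted, rejected = [], []
    for keyword in keywords:
        normalized = " ".join(keyword.lower().split())
        if normalized in vocabulary:
            accepted.append(normalized)
        else:
            rejected.append(normalized)
    return accepted, rejected


accepted, rejected = check_keywords(
    ["Hydrology", "Remote  Sensing", "soils"], CONTROLLED_TERMS
)
print("accepted:", accepted)
print("needs review:", rejected)
```

Rejected terms are not necessarily wrong; they simply need to be mapped to an approved term or reviewed by hand.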

Consider keeping metadata records in a spreadsheet, CSV file, or tab-delimited file. Additional information to interpret the metadata, such as explanations of variables, codes, acronyms, abbreviations, or algorithms, should be included as accompanying documentation.
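
As a sketch of keeping metadata records in a CSV file, the example below writes one record per data file using Python's standard `csv` module. The column names follow the kinds of information metadata describes (creator, time, location, terms of use), but the exact field names and the sample record are assumptions for illustration only.

```python
import csv
import io

# Illustrative column names; adapt them to the metadata standard you choose.
FIELDS = ["title", "creator", "date", "location", "terms_of_use"]

# Made-up sample record.
records = [
    {
        "title": "Stream temperature readings, site A",
        "creator": "Example Researcher",
        "date": "2024-05-01",
        "location": "Example County",
        "terms_of_use": "CC BY 4.0",
    },
]


def records_to_csv(records, fieldnames):
    """Serialize metadata records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = records_to_csv(records, FIELDS)
print(csv_text)
```

The `csv` module quotes values that contain commas, such as the title above, so the file reads back correctly. Explanations of each column belong in the accompanying documentation.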

Suggested Metadata Elements

The Wood Library suggests the following metadata elements. In their simplest form, these can be included as part of a README.txt file. The Open Science Framework README.txt template contains a minimal set of elements.

[Image: suggested metadata elements]
