
Data Management Toolkit

Documentation & Metadata

Documentation


Documentation gives data context and meaning through the inclusion of metadata and supplementary files. Proper documentation should include descriptions of the research workflow, decisions made during the research process, and how the data has been manipulated.

The goal of data documentation is to give the data enough meaning to limit future uncertainty about it. Documentation is essential to the usability and longevity of data.


Metadata

Metadata is structured data that describes your dataset. There are both general-purpose and discipline-specific metadata standards that can be adapted for a research project. The Department of Defense created the Department of Defense Discovery Metadata Specification (DDMS) in 2003; it is a metadata standard loosely based on Dublin Core, a widely used metadata vocabulary. If possible, DoD projects should use DDMS. For projects outside of the DoD, the UK's Digital Curation Centre (DCC) maintains an up-to-date list of discipline-specific metadata standards.

Not all disciplines have their own metadata standards, and some projects will not fit neatly into an existing standard. If that's the case, a researcher can use a general-purpose metadata schema or create their own. A researcher who creates their own metadata schema should explain the reasons for doing so in their data management plan (DMP).

Generally, the following metadata is included in a metadata standard:

  • Title
  • Principal investigator(s)
  • Date created
  • Unique identifier
  • Subject
  • Funders
  • Rights
  • Access information
  • Description
  • Recommended citation
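
As a rough illustration, the elements above could be captured in a small machine-readable record kept alongside the data. The sketch below is not DDMS or any other formal standard: the field names, the `missing_fields` helper, and every value are hypothetical placeholders chosen for this example.

```python
import json

# Elements from the list above, written as hypothetical field names
# (these are not taken from DDMS, Dublin Core, or any formal standard).
REQUIRED_FIELDS = [
    "title", "principal_investigators", "date_created", "unique_identifier",
    "subject", "funders", "rights", "access_information", "description",
    "recommended_citation",
]


def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Placeholder values for illustration only.
record = {
    "title": "Example dataset title",
    "principal_investigators": ["A. Researcher"],
    "date_created": "2024-01-15",
    "unique_identifier": "example-id-0001",
    "subject": ["example keyword"],
    "funders": ["Example Funding Agency"],
    "rights": "Example license statement",
    "access_information": "",  # intentionally left blank
    "description": "One-paragraph summary of what the data contains.",
    "recommended_citation": "",  # intentionally left blank
}

print(json.dumps(record, indent=2))
print("Still to fill in:", missing_fields(record))
```

A check like this only confirms that each element has been filled in; whether the content is adequate for a given standard or funder is still a judgment for the researcher.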

When writing this portion, ask yourself:

What metadata is required to understand this data?

Not all data requires the same metadata. The complexity of metadata varies between projects, and no two will look the same. Think about your data. If you were a researcher searching for this data, what keywords would you use? What in the data would need explanation? Those are the areas you need to provide metadata for.

Will you use a pre-existing metadata standard?

As stated above, projects done in support of the DoD should use DDMS metadata when possible. If you are not using DDMS, explain what standard you are using and why, or why you have opted not to use a standard.


Supplementary Files

Supplementary files are separate from your data files, but deposited and saved with them.

 

Readme Files

Metadata can be saved as a readme file and kept with the data it describes. When writing a readme file, start early in your research project, use an outline, update the file throughout the project, and deposit the readme file along with your data.

For more information on writing a readme file, see Cornell University's guide to writing "readme" style metadata and readme file template.
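
One way to act on the advice to start early and work from an outline is to generate a readme skeleton at the beginning of the project and fill it in as the work progresses. The following is a minimal sketch: the section names, the `build_readme` function, and the title and date are assumptions made for this example, not a prescribed template.

```python
from datetime import date

# Hypothetical outline; adapt the section names to your project or template.
README_SECTIONS = [
    "General information",
    "Data and file overview",
    "Methodological information",
    "Data-specific information",
    "Sharing and access information",
]


def build_readme(title, sections=README_SECTIONS, updated=None):
    """Return a plain-text readme skeleton with one heading per section."""
    updated = updated or date.today().isoformat()
    lines = [f"README for: {title}", f"Last updated: {updated}", ""]
    for section in sections:
        # Each section gets an underlined heading and a TODO placeholder.
        lines += [section.upper(), "-" * len(section), "TODO", ""]
    return "\n".join(lines)


text = build_readme("Example dataset title", updated="2024-01-15")
print(text)
```

The resulting text can be saved as a plain-text file next to the data and revised throughout the project, with the "Last updated" line changed at each revision.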

 

Data Dictionaries

A data dictionary is a file that describes each element of data in a database or system. Many database and software engineering tools include built-in data dictionaries, which generate documentation automatically. If writing a data dictionary from scratch, use a template such as the data dictionary template from the U.S. Department of Agriculture Ag Data Commons.

Data dictionaries often include:

  • Names and definitions of data objects
  • Properties of data elements
  • Entity-relationship and system-level diagrams
  • Reference data
  • Missing data
  • Quality-indicator codes
  • Business rules
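
When no tool generates a data dictionary automatically, a first draft can be started from the data file itself. The sketch below covers only a few of the items listed above (element names, an example value, and a count of missing data); the sample data, the `-999` missing-value code, and the `draft_data_dictionary` function are hypothetical and stand in for whatever conventions a project actually uses.

```python
import csv
import io

# Hypothetical sample data; "-999" stands in for a missing-value code.
SAMPLE_CSV = """site_id,temp_c,quality_flag
A1,12.5,G
A2,-999,S
A3,14.1,G
"""

MISSING_CODE = "-999"  # assumed convention for this example


def draft_data_dictionary(csv_text, missing_code=MISSING_CODE):
    """Build one draft entry per column: name, example value, missing count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows]
        present = [v for v in values if v != missing_code]
        entries.append({
            "name": name,
            "definition": "TODO",  # to be written by the researcher
            "example": present[0] if present else "",
            "missing_count": len(values) - len(present),
        })
    return entries


for entry in draft_data_dictionary(SAMPLE_CSV):
    print(entry)
```

The definitions, quality-indicator codes, and business rules still have to be written by someone who knows the data; the draft only saves the work of listing the elements by hand.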