
Research data management: Data description

Research data

Describing and documenting data is essential to ensure that the researcher, and anyone else who may need to use the data, can make sense of them and understand how they were collected, processed and analysed.

Research data are any physical and/or digital materials that are collected, observed, or created in research activity for the purpose of analysis, to produce original research results or creative works.

Research data can be generated for different purposes and through different processes, and can be divided into different categories such as numerical, descriptive or visual. Moreover, data may be raw or analysed, experimental or observational, confidential or publicly accessible. Research data can include laboratory notebooks, field notebooks, primary research data, questionnaires, audiotapes, videotapes, models, photographs, films and test responses.


Example research project data summary

This is an example research project data summary from the data management plan for a fictitious research project.

This project involves the following data:

1. 50 physical datasheet records of underwater transect surveys
2. One digital record of transcribed survey data
3. Approximately 60 digital photographs of unidentified or ambiguous species taken with Canon PowerShot S100 cameras
4. One digital survey analysis file

Documenting research and metadata

Documentation is an essential component of research data management and allows researchers to make sense of data in the future. Providing sufficient information and standardised descriptions to highlight the key aspects of the research data will:

  • Establish context
  • Improve visibility
  • Potentially increase citations

Essential elements to consider when describing your data:

General overview

  • Title: Name of the dataset
  • Creator: Names and addresses of the organisations or people who created the data
  • Identifier: A unique number used to identify the data
  • Date: Key dates associated with the data, including project start and end dates, time period and other important dates. The preferred format is yyyy-mm-dd
  • Method: How the data were generated, equipment and software used (model and version numbers), formulae, algorithms, experimental protocols, and other things one might include in a lab notebook
  • Processing: How the data have been altered or processed (e.g. normalised)
  • Source: Citations to data derived from other sources, including location and access details of the source data
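
The general overview elements map naturally onto a structured record. The following is a minimal sketch in Python, assuming a hypothetical `DatasetDescription` class (not part of any metadata standard) that holds these elements and rejects any date not written in the preferred yyyy-mm-dd format. The example values are invented for the fictitious survey project.

```python
import re
from dataclasses import dataclass, field
from datetime import date

# Strict yyyy-mm-dd layout (the preferred date format above).
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


@dataclass
class DatasetDescription:
    """Hypothetical container for the 'General overview' elements."""
    title: str
    creators: list[str]
    identifier: str
    dates: dict[str, str] = field(default_factory=dict)  # label -> yyyy-mm-dd
    method: str = ""
    processing: str = ""
    sources: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        for label, value in self.dates.items():
            # The regex enforces the layout; fromisoformat then rejects
            # impossible calendar dates such as 2023-02-30.
            if not ISO_DATE.fullmatch(value):
                raise ValueError(f"{label!r} date {value!r} is not in yyyy-mm-dd format")
            try:
                date.fromisoformat(value)
            except ValueError:
                raise ValueError(f"{label!r} date {value!r} is not a valid date") from None


# Hypothetical values for the fictitious underwater survey project.
record = DatasetDescription(
    title="Underwater transect surveys (example)",
    creators=["Example Research Group"],
    identifier="EXAMPLE-0001",
    dates={"project start": "2023-01-15", "project end": "2023-12-15"},
)
```

Validating dates when the record is created catches entries such as 15/01/2023 before they reach shared documentation.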

Content description

  • Subjects: Keywords or phrases describing the subject or content of the data. These can also include Fields of Research (FoR) codes and Socio-Economic Objective (SEO) codes as defined by the Australian and New Zealand Standard Research Classification (ANZSRC)
  • Place: All applicable physical locations
  • Variable list: All variables in the data files, where applicable
  • Code list: Explanation of codes or abbreviations used in either the file names or the variables in the data files (e.g. '999 indicates a missing value in the data')
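
A code list only helps if the codes are applied consistently when the data are read back. As a minimal sketch, assuming numeric values held in a Python list and the example code 999 for a missing value, a hypothetical `decode_missing` helper could convert that code to Python's `None`:

```python
# Code taken from the data documentation's code list:
# '999 indicates a missing value in the data'.
MISSING_CODE = 999


def decode_missing(values, missing_code=MISSING_CODE):
    """Return a copy of `values` with the documented missing-value code
    replaced by None, so it cannot be mistaken for a real measurement."""
    return [None if v == missing_code else v for v in values]


depths_m = [12.5, 999, 8.0]  # hypothetical transect depths in metres
print(decode_missing(depths_m))  # [12.5, None, 8.0]
```

Recording the code in the code list, rather than only inside an analysis script, means anyone reusing the data can apply the same conversion.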

 

Technical description

  • File inventory: All files forming part of the dataset, including extensions (e.g. 'photo1023.jpeg', 'participant12.pdf')
  • File formats: Formats of the data, e.g. SPSS, HTML, PDF, JPEG
  • File structure: Organisation of the data file(s) and layout of the variables, where applicable
  • Version: Unique date/time stamp and identifier for each version
  • Necessary software: Names of any special-purpose software packages required to create, view, analyse, or otherwise use the data
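
Parts of a file inventory can be generated rather than typed by hand. The minimal sketch below assumes the dataset sits in a single folder and uses Python's standard `pathlib` module; the hypothetical `file_inventory` function lists each file with its extension, size and last-modified time as a UTC date/time stamp. A modification time is not a version identifier on its own, so it would still need to be paired with an explicit version label.

```python
from datetime import datetime, timezone
from pathlib import Path


def file_inventory(root):
    """List every file under `root` with its relative path, lower-cased
    extension, size in bytes and last-modified time (UTC)."""
    root = Path(root)
    rows = []
    # Sorting gives a stable order, which makes inventories easy to compare.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        stat = path.stat()
        modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
        rows.append({
            "file": path.relative_to(root).as_posix(),
            "extension": path.suffix.lower(),
            "bytes": stat.st_size,
            "modified_utc": modified.strftime("%Y-%m-%dT%H:%M:%SZ"),
        })
    return rows
```

For example, `file_inventory("dataset")` (where `dataset` is a hypothetical folder name) returns one dictionary per file, which could then be copied into a readme file or saved alongside the data.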

Access

  • Rights: Any known intellectual property rights, statutory rights, licences, or restrictions on use of the data
  • Access information: Where and how your data can be accessed by other researchers

 



Metadata

One element of data documentation is the description of the data, known as metadata. Metadata is often defined literally as 'data about data': information that describes an item's attributes in a standardised format, for example the author's name and title of a book in a library catalogue. Metadata can describe physical as well as digital items, and can take many forms, from free text (such as readme files) to standardised, structured, machine-readable content.

Metadata is key to ensuring that resources survive and remain accessible into the future. It can:

  • Aid resource discovery
  • Organise electronic resources
  • Promote interoperability
  • Provide digital identification
  • Support archiving and preservation

Storing documentation and metadata appropriately is just as important as storing the research data itself, because the metadata is what gives raw research data their descriptive meaning.

Metadata can be made available without sharing the dataset. More information on sharing can be found under Publication.

Some disciplines have their own metadata schemas, each with its own specified elements and structure.
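
As an illustration of a schema with specified elements, the minimal sketch below uses Python's standard `xml.etree.ElementTree` module to serialise a record with Dublin Core element names as machine-readable XML. The `<metadata>` wrapper element, the `dublin_core_record` function and the example values are hypothetical; repositories usually wrap Dublin Core in their own container format (for example the OAI-PMH `oai_dc` format), so check the target repository's requirements.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # emit dc: prefixes instead of ns0:

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields):
    """Serialise (element, value) pairs as XML in the dc: namespace.

    The <metadata> root element is a hypothetical wrapper, not part of
    Dublin Core itself.
    """
    root = ET.Element("metadata")
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not a Dublin Core element")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")
```

For example, `dublin_core_record([("title", "Underwater transect surveys (example)"), ("date", "2023-01-15")])` returns a single XML string containing `dc:title` and `dc:date` elements. Rejecting unknown element names is one simple way to keep a record within the schema.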

Common metadata standards by discipline

  • General: Dublin Core (DC); Metadata Object Description Schema (MODS); Metadata Encoding and Transmission Standard (METS)
  • Arts: Categories for the Description of Works of Art (CDWA); Visual Resources Association (VRA Core)
  • Astronomy: Astronomy Visualization Metadata (AVM)
  • Biology: Darwin Core
  • Ecology: Ecological Metadata Language (EML)
  • Geographic: Content Standard for Digital Geospatial Metadata (CSDGM)
  • Social sciences: Data Documentation Initiative (DDI)

The Edina Data Centre has created a video which explains metadata through a range of examples. 

Controlled vocabulary

A controlled vocabulary reflects agreement on the terminology used to label concepts. When research communities agree to use a common language for the concepts in datasets, the discovery, linking, understanding and reuse of research data improve. Research Vocabularies Australia (RVA) is one tool that makes it easy to find and use controlled vocabularies used in research; it also helps Australian research organisations publish, re-purpose, create, and manage their own controlled vocabularies.
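
The benefit of a controlled vocabulary is that variant spellings and synonyms resolve to one agreed term. The sketch below illustrates the idea with a tiny hypothetical vocabulary held in a Python dictionary; it loosely mirrors the preferred and alternative labels found in SKOS, but it does not query Research Vocabularies Australia or any real vocabulary service.

```python
# Hypothetical mini-vocabulary: preferred label -> alternative labels.
VOCABULARY = {
    "coral reefs": {"coral reef", "reef systems"},
    "underwater surveys": {"subsea surveys", "dive surveys"},
}


def to_preferred(keywords, vocabulary=VOCABULARY):
    """Map free-text keywords to preferred labels.

    Returns (matched, unmatched): matched preferred labels without
    duplicates, in first-seen order, and keywords with no match.
    """
    # Case-insensitive lookup from every label to its preferred form.
    lookup = {}
    for preferred, alternatives in vocabulary.items():
        lookup[preferred.casefold()] = preferred
        for alt in alternatives:
            lookup[alt.casefold()] = preferred

    matched, unmatched = [], []
    for kw in keywords:
        label = lookup.get(kw.strip().casefold())
        if label is None:
            unmatched.append(kw)
        elif label not in matched:
            matched.append(label)
    return matched, unmatched
```

Unmatched keywords are returned rather than dropped, so they can be reviewed and either mapped to an existing term or proposed as new ones.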

Describe the data

   Research Data Management by Janneke Staaks 

More resources