
Research Data Management: Data description

Research data

Describing and documenting data is essential in ensuring that the researcher, and others who may need to use the data, can make sense of the data and understand the processes that have been followed in the collection, processing, and analysis of the data.

Research data is any physical or digital material that is collected, observed, or created during research activity for the purpose of analysis to produce original research results or creative works.

Research data can be generated for different purposes and through different processes, and can be divided into different categories such as numerical, descriptive or visual. Moreover, data may be raw or analysed, experimental or observational, confidential or publicly accessible. Research data can include laboratory notebooks, field notebooks, primary research data, questionnaires, audiotapes, videotapes, models, photographs, films and test responses.

View research project data summary from example data management plan. 

Example research project data summary

This is an example research project data summary from the data management plan for a fictitious research project.

This project involves the following data:

1. 50 physical datasheet records of underwater transect surveys
2. One digital record of transcribed survey data
3. Approximately 60 digital photographs of unidentified or ambiguous species, taken with Canon PowerShot S100 cameras
4. One digital survey analysis file

Documenting research and metadata

Documentation is an essential component of research data management and allows researchers to make sense of data in the future. Providing sufficient information and standardised descriptions to highlight the key aspects of the research data will:

  • Establish context
  • Improve visibility
  • Potentially increase citations

Essential elements to consider when describing your data:

General overview

  • Name of the dataset
  • Names and addresses of the organisations or people who created the data, and their unique identifiers (ORCID, ResearcherID, etc.)
  • A unique number used to identify the data (DOI, Handle, IGSN)
  • Key dates associated with the data, including project start and end dates, the time period covered, and other important dates. The preferred format is yyyy-mm-dd
  • How the data was generated: equipment and software used (model and version numbers), formulae, algorithms, experimental protocols, and other things one might include in a lab notebook
  • How the data has been altered or processed (e.g. normalised)
  • Citations to data derived from other sources, including location and access details of the source data

Content description

  • Subjects: keywords or phrases describing the subject or content of the data. These can also include Field of Research codes and Socio-economic objective codes as defined by ANZSRC
  • All applicable physical locations
  • Variable list: all variables in the data files, where applicable
  • Code list: explanation of codes or abbreviations used in either the file names or the variables in the data files (e.g. '999 indicates a missing value in the data')

Technical description

  • File inventory: all files forming part of the dataset, including extensions (e.g. ‘photo1023.jpeg’, ‘participant12.pdf’)
  • File formats: formats of the data, e.g. SPSS, HTML, PDF, JPEG
  • File structure: organisation of the data file(s) and layout of the variables, where applicable
  • Unique date/time stamp and identifier for each version
  • Necessary software: names of any special-purpose software packages required to create, view, analyse, or otherwise use the data

  • Any known intellectual property rights, statutory rights, licenses, or restrictions on use of the data
  • Access information: where and how your data can be accessed by other researchers
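As a rough illustration of the code list and date elements above, the following Python sketch applies a documented missing-value code and checks that a date follows the yyyy-mm-dd format. The `CODE_LIST` dictionary, the `depth_m` variable and its values are hypothetical; only the '999 indicates a missing value' convention comes from the example above.

```python
from datetime import date

# Hypothetical code list: maps coded values to their documented meaning.
CODE_LIST = {999: "missing value"}


def decode(values, code_list=CODE_LIST):
    """Replace any coded value with None so it is not analysed as a real measurement."""
    return [None if v in code_list else v for v in values]


def is_iso_date(text):
    """Return True if text is a valid calendar date written as yyyy-mm-dd."""
    try:
        # Round-tripping rejects other ISO spellings such as "20230401".
        return date.fromisoformat(text).isoformat() == text
    except ValueError:
        return False


# Hypothetical variable from a survey file, with 999 marking missing readings.
depth_m = [12, 999, 7, 15, 999]
print(decode(depth_m))            # [12, None, 7, 15, None]
print(is_iso_date("2023-04-01"))  # True
print(is_iso_date("01/04/2023"))  # False
```

Recording the code list alongside the data means anyone reusing the files can apply the same decoding step rather than guessing what 999 means.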


View data organisation and structure from example data management plan.


One element of data documentation is the description of the data, known as metadata. Metadata is often defined literally as 'data about data': information that describes an item's attributes in a standardised format, e.g. the author's name and title of a book in a library catalogue. Metadata can describe physical as well as digital items, and it takes many forms, from free text (such as readme files) to standardised, structured, machine-readable content.

Metadata is key to ensuring that resources survive and remain accessible into the future. It helps to:

  • Aid resource discovery
  • Organise electronic resources
  • Promote interoperability
  • Provide digital identification
  • Support archiving and preservation

Appropriate storage of documentation and metadata is just as important as the storage of the research data itself, because metadata gives descriptive meaning to raw research data.

Metadata can be made available without sharing the dataset. More information on sharing can be found under Publication.

Some disciplines have their own metadata schemas, each with its own specified elements and structure.

View common metadata examples


Metadata standard

  • Dublin Core (DC)
  • Metadata Object Description Schema (MODS)
  • Metadata Encoding and Transmission Standard (METS)
  • Categories for the Description of Works of Art (CDWA)
  • Visual Resources Association (VRA Core)
  • Astronomy Visualization Metadata (AVM)
  • Darwin Core
  • Ecological Metadata Language (EML)
  • Content Standard for Digital Geospatial Metadata (CSDGM)
  • Data Documentation Initiative (DDI), for the social sciences
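To show what structured, machine-readable metadata can look like, here is a minimal Python sketch that builds a small record using Dublin Core element names with the standard library. The `metadata` wrapper element, the `dublin_core_record` helper and all field values are assumptions made for illustration; a real repository will define its own record structure and required elements.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a wrapper element with one dc:* child per (name, value) pair."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Hypothetical values for a fictitious survey dataset.
record = dublin_core_record([
    ("title", "Underwater transect survey data"),
    ("creator", "Example Researcher"),
    ("date", "2023-04-01"),
    ("format", "text/csv"),
    ("rights", "CC BY 4.0"),
])
print(ET.tostring(record, encoding="unicode"))
```

Because the element names come from a shared standard, a catalogue that understands Dublin Core can read the title, creator and date without knowing anything else about the project.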












The Edina Data Centre has created a video which explains metadata through a range of examples. 

Readme.txt files

A readme.txt file is a collection of very simple metadata provided alongside a dataset when researchers publish their data. It describes key details of the dataset just in case end users access the dataset without seeing or finding the metadata beforehand. Files like these help make published data more robust and improve the long-term usability of the data.
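A readme.txt can be as simple as a list of labelled lines. The Python sketch below writes one from a dictionary of metadata; the `build_readme` and `write_readme` helpers, the labels and the values are all hypothetical and are not the format the library generates.

```python
import tempfile
from pathlib import Path


def build_readme(metadata):
    """Format a dict of metadata as one 'Label: value' line per entry."""
    lines = [f"{label}: {value}" for label, value in metadata.items()]
    return "\n".join(lines) + "\n"


def write_readme(folder, metadata):
    """Write readme.txt into the given folder and return its path."""
    path = Path(folder) / "readme.txt"
    path.write_text(build_readme(metadata), encoding="utf-8")
    return path


# Hypothetical metadata for a fictitious dataset.
metadata = {
    "Title": "Underwater transect survey data",
    "Creator": "Example Researcher",
    "Dates": "2023-04-01 to 2023-06-30",
    "File formats": "CSV, JPEG",
    "Missing values": "999 indicates a missing value in the data",
}

# A temporary folder stands in for the dataset folder in this sketch.
with tempfile.TemporaryDirectory() as folder:
    readme_path = write_readme(folder, metadata)
    print(readme_path.read_text(encoding="utf-8"))
```

Keeping the readme as plain text in the same folder as the data means it travels with the files wherever they are copied.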


If you publish your data through Curtin, the library will automatically create a readme.txt file for you from the submitted information. If you are publishing in a subject or discipline specific repository and would like help creating a useful readme.txt, please email us at

Controlled vocabulary

A controlled vocabulary reflects agreement on the terminology used to label concepts. When research communities agree to use a common language for the concepts in datasets, the discovery, linking, understanding and reuse of research data are improved. Research Vocabularies Australia (RVA) is one of the tools that make it easy to find and use controlled vocabularies used in research. It also helps Australian research organisations publish, re-purpose, create, and manage their own controlled vocabularies.
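The idea can be sketched in a few lines of Python: free-text keywords are mapped to a single preferred term so that datasets described by different people end up with the same labels. The `VOCABULARY` contents and the `to_preferred_term` helper are hypothetical and are not drawn from RVA or any published vocabulary.

```python
# Hypothetical controlled vocabulary: preferred term -> accepted variants.
VOCABULARY = {
    "coral reef": {"coral reef", "coral reefs", "reef"},
    "seagrass": {"seagrass", "sea grass", "seagrasses"},
}


def to_preferred_term(label, vocabulary=VOCABULARY):
    """Return the preferred term for a free-text label, or None if it is not in the vocabulary."""
    # Lower-case and collapse repeated whitespace before matching.
    normalised = " ".join(label.lower().split())
    for preferred, variants in vocabulary.items():
        if normalised in variants:
            return preferred
    return None


keywords = ["Coral Reefs", "sea  grass", "kelp"]
print([to_preferred_term(k) for k in keywords])  # ['coral reef', 'seagrass', None]
```

A keyword that returns None is a prompt either to pick an existing term or to propose a new one to the community that maintains the vocabulary.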

Describe the data

   Research Data Management by Janneke Staaks 

More resources