Research data management

Benefits

All research data has value beyond the original project. By publishing your data and making it available to others, you can improve the impact of your research significantly and help make the data more findable, accessible and reusable.

Positive outcomes of data publication include:

Improving the body of knowledge in your discipline - all research is built on earlier understanding; by contributing to the growth of that understanding, you help your whole discipline or field of research improve.
Reliability - your research is more reliable with increased reproducibility; publication of research data is a key factor in improving reproducibility.
Citations - making the data underlying your publication available has been associated with higher article citation counts.
Professional connections - when other researchers are able to access, understand and reuse your data, and to contact you about your research, your professional networks will grow faster.

Publication to the Curtin Research Data Collection is provided to Curtin researchers by the library.

Curtin Research Data Collection
Information regarding data publication and the Curtin Research Data Collection, including how to publish your data.
Benefits of data sharing for you
Information from Springer Nature on the benefits for researchers from data publication.
Data Management Expert Guide - Archive & Publish
Information from the Consortium of European Social Science Data Archives (CESSDA) about publishing and sharing research datasets.

Open data recommendations and mandates

In order to maximise the benefit of funded research, an increasing number of funders are recommending or requiring that data arising from projects be openly published. By making the data or metadata available, the research findings become far more findable and reusable by other researchers. One example is the guidelines for ARC Grants, which strongly recommend open data publication.

In order to address concerns about research reproducibility and integrity, some journals and major publishers now require that the data supporting any articles published is made publicly available:

BioMed Central
Elsevier
Nature
PLOS
Springer

These data publication requirements can sometimes be waived if the data is sensitive or confidential, or if the data cannot be sufficiently anonymised.

If your funding agreement or publication plans require you to publish your data, please get in touch with the Research Data Team, who will be able to help you publish it to the Curtin Research Data Collection.

Curtin Research Data Collection
Information regarding data publication and the Curtin Research Data Collection.

Identifiable data

Because of the sensitive nature of some research, making the resulting datasets publicly available would risk identifying the research participants.

In these cases, the dataset should be changed before publication to remove any information that might lead to the unwanted identification of participants. This process is often called de-identification or anonymisation. By removing these identifying elements, a researcher can still reap the benefits of making their data available, while respecting the privacy of their research subjects.

Methods of anonymising include:

Aggregating locational or population data
Transcription redaction
Replacing respondent identifiers with generic identifiers

Methods such as these are outlined in the links below from the ARDC and the UK Data Service.
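
As a rough illustration of two of the methods listed above, the Python sketch below replaces respondent identifiers with generic ones and aggregates exact ages into bands. The field names (respondent_id, age), the "P001" label format and the sample records are all hypothetical and are not taken from the ARDC or UK Data Service guidance. Treat it as a starting point under those assumptions rather than a complete de-identification procedure, since removing direct identifiers alone may not be enough to prevent re-identification.

```python
def pseudonymise(records, id_field="respondent_id", prefix="P"):
    """Replace each distinct identifier with a generic one (P001, P002, ...).

    Returns the cleaned records and the mapping from original to generic
    identifiers. The field name and label format are illustrative assumptions.
    """
    mapping = {}
    cleaned = []
    for record in records:
        original = record[id_field]
        if original not in mapping:
            mapping[original] = f"{prefix}{len(mapping) + 1:03d}"
        new_record = dict(record)  # copy, so the input is left unchanged
        new_record[id_field] = mapping[original]
        cleaned.append(new_record)
    return cleaned, mapping


def age_band(age, width=10):
    """Aggregate an exact age into a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


# Hypothetical sample data, for illustration only.
responses = [
    {"respondent_id": "jane.smith@example.org", "age": 34},
    {"respondent_id": "bob.jones@example.org", "age": 47},
    {"respondent_id": "jane.smith@example.org", "age": 34},
]

cleaned, key = pseudonymise(responses)
for row in cleaned:
    row["age"] = age_band(row["age"])
```

The mapping returned as key links the generic identifiers back to real people, so it would need to be stored securely and separately from the published dataset, or destroyed.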

ARDC De-identification Guide
Information and links to Australian and international practical guidelines and resources on how to de-identify datasets.
Anonymisation
An excellent resource from the UK Data Service outlining methods of de-identifying qualitative and quantitative datasets.
Data Anonymisation (McGill University Workshop Series 2023)
An in-depth video series with slides demonstrating theoretical and practical approaches for qualitative and quantitative data anonymisation and de-identification.

Permissions and licensing

To make it easy for your published dataset to be reused appropriately, it’s critical to let other researchers know clearly what they can and can’t do with the dataset. These permissions are usually set out and clarified by applying a license to the published dataset.

Applying a standard, robust, well-defined license to a dataset means that anyone reusing that dataset can be confident about what they have permission to do and can act with legal certainty.

For licensing datasets, Curtin generally recommends using the Creative Commons Attribution License (CC BY). When the data is in the form of software/code, the BSD 3-Clause Software Licence is recommended. Both these licenses will allow anyone to reuse your dataset for any purpose, as long as they give you proper attribution and credit.

Research Data Rights Management Guide
Very useful information from the ARDC on licensing research datasets and related issues.
Creative Commons Attribution (CC BY) license details
Information on the CC BY license, recommended for publishing datasets.
The 3-Clause BSD License
Information on the 3-Clause BSD License, recommended for publishing software or code.

Persistent identifiers

One of the benefits of publishing your data through Curtin’s data publication service is that the data will be given a DOI (Digital Object Identifier). Persistent identifiers such as DOIs are used to identify a particular resource and avoid any ambiguity - you can be certain that if people are referring to an item with the same DOI, they are referring to the exact same resource. DOI links are easily created by adding the https://doi.org/ address before a DOI. If the digital location of the resource changes, the DOI can be updated to point to the new location.
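
As a small illustration of how a DOI link is formed, the Python sketch below adds the https://doi.org/ address before a DOI. The function name doi_to_url and the DOI shown are hypothetical placeholders, not a real Curtin dataset, and the sketch assumes the DOI contains no characters that need URL encoding.

```python
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi):
    """Build a DOI link by adding the resolver address before the DOI.

    Also accepts a DOI that already carries a 'doi:' label or a resolver
    address, so the prefix is never added twice.
    """
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    return DOI_RESOLVER + doi


# Placeholder DOI, for illustration only.
link = doi_to_url("10.1234/example-dataset")
```

Using the link form in citations and data availability statements means readers can follow it directly, and it keeps working if the dataset is later moved.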

If your dataset or research output meets the criteria listed in the link for the Curtin Research Data Collection below, you can obtain a DOI for free.

Another identifier that is becoming more common is the IGSN (International Geo Sample Number). IGSNs provide unique identifiers for physical samples and are currently used in mineral sampling and processing. More information about IGSNs can be found at the link below.

Researcher identifiers

Another commonly used identifier is the ORCID, which identifies an individual researcher. ORCIDs are also free to obtain, are widely used in publication and may sometimes be required in order to publish. They also help connect systems of measuring research impact to the correct researcher.

Curtin Research Data Collection
Information about the Curtin Research Data Collection.
IGSNs
The ARDC page for IGSNs (International Geo Sample Numbers).
ORCID and researcher identifiers
The Curtin Library guide on ORCID, including how to create one and how to link it to your research.

Embargoes

An embargo is a request by a researcher to delay the publication of their dataset until a specified time. Embargoes (or embargo periods) are most commonly used when researchers want to publish datasets but are currently unable to do so for reasons such as data sensitivity, impending publication plans or industry/funder agreements. This delay can help researchers who still wish to receive the benefits of publishing data when external factors currently prohibit its publication.

Additionally, researchers may choose to make the metadata for a dataset available immediately but only provide access to the actual dataset after the embargo period.

As applying an embargo restricts the accessibility of the data, researchers may be asked to provide funders or publishers with a case or justification for applying the embargo.