Make your data FAIR
This article contains the following sections:
- What are the FAIR Data Principles?
- Strategies to make data FAIR
- Findable
- Accessible
- Interoperable
- Reusable
- Further points to consider
- Enhancing the FAIRness of data requires planning
- FAIR enables a continuum of increasing reusability
- FAIR data and open data are not the same
- FAIR applies to both data and metadata
- FAIR is not a standard
- FAIR Data Checklist
What are the FAIR Data Principles?
The FAIR Data Principles were formally published in 2016, and act as a guideline for enhancing the reuse of research data. Enhancing reuse increases the value of data, strengthens the impact, transparency and reproducibility of the research, and accelerates knowledge discovery.
The term ‘FAIR’ is an acronym used to convey the importance of making data Findable, Accessible, Interoperable and Reusable by both humans and machines. Including machines in scope is important since humans increasingly rely on computational systems to efficiently retrieve and analyse data. By following the FAIR Principles, researchers can ensure that their data are well-described, easy to locate, and readily reusable in diverse contexts.
The FAIR Principles are widely endorsed by research communities, governments, funders and other stakeholders, including the G20, UKRI and UNESCO.
Strategies to make data FAIR
Findable
Findable means that data and its metadata can be easily discovered by humans and computers.
FAIR Principles specify:
- F1. (meta)data are assigned a globally unique and persistent identifier
- F2. data are described with rich metadata (defined by R1 below)
- F3. metadata clearly and explicitly include the identifier of the data it describes
- F4. (meta)data are registered or indexed in a searchable resource
Recommended practices:
- Deposit data in a trusted data repository that will assign a unique persistent identifier (e.g. a DOI) to the data, and ensure that the metadata is indexed in search services (e.g. Google, Scopus, Web of Science).
- Provide rich metadata that meaningfully describes the data. Metadata can be included in the published metadata record or in a supplementary file such as a README. The richer the metadata record, the more precisely the data can be found. It is difficult to generalise about a minimum ‘richness’ of metadata, but to illustrate, consider including a description of the data’s content and structure, and outlining the research context (e.g. its purpose, objectives, methodology and findings).
- Ensure that the metadata incorporates a citation for the data and includes the unique persistent identifier in that citation – some repositories do this for you.
- Provide a data access statement in associated publications to explain where and how the data can be accessed and any restrictions on accessing the data.
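As a rough illustration of the practices above, the following Python sketch checks a metadata record for the elements that the Findable principles call for. The field names (identifier, title, description, citation) and the example DOI are hypothetical and chosen for illustration only; real repositories define their own metadata schemas.

```python
# Hypothetical field names; real repositories define their own schemas.
REQUIRED_FIELDS = ("identifier", "title", "description", "citation")


def findability_gaps(record: dict) -> list[str]:
    """Return a list of human-readable gaps in a metadata record."""
    gaps = [f"missing field: {name}" for name in REQUIRED_FIELDS
            if not str(record.get(name, "")).strip()]
    identifier = str(record.get("identifier", "")).strip()
    citation = str(record.get("citation", ""))
    # F3: the metadata should explicitly include the identifier of the data.
    if identifier and citation and identifier not in citation:
        gaps.append("citation does not include the identifier")
    return gaps


record = {
    "identifier": "https://doi.org/10.1234/example-dataset",  # made-up DOI
    "title": "Example survey responses",
    "description": "Anonymised responses to a 2024 survey; one row per participant.",
    "citation": "Doe, J. (2024). Example survey responses. "
                "https://doi.org/10.1234/example-dataset",
}
print(findability_gaps(record))  # prints [] for this complete record
```

A check like this only confirms that the fields are present; whether the description is rich enough to be useful still needs human judgement.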
Accessible
Accessible means that once discovered, data and its metadata can be accessed or retrieved by humans and computers.
FAIR Principles specify:
- A1. (meta)data are retrievable by their identifier using a standardized communications protocol
- A1.1 the protocol is open, free, and universally implementable
- A1.2 the protocol allows for an authentication and authorization procedure, where necessary
- A2. metadata are accessible, even when the data are no longer available
Recommended practices:
- Ensure the data can be accessed or retrieved via its unique persistent identifier (e.g. a DOI) using an open, standard protocol (e.g. HTTP), which trusted data repositories typically support.
- If data cannot be openly shared (e.g. for privacy or confidentiality reasons):
  - choose trusted data repositories that support controlled or restricted access.
  - publish a metadata-only record (i.e. a description that does not include the data files) that explains how the data can be accessed and under what conditions.
- If data is retired or otherwise unavailable, ensure that the metadata remains openly accessible for discovery and transparency.
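To illustrate retrieval by identifier over an open, standard protocol, the sketch below builds an HTTP request that asks the doi.org resolver for machine-readable metadata. It uses only the Python standard library. The function names are hypothetical, and the Accept header relies on DOI content negotiation, which is an assumption here: support depends on the agency that registered the DOI, so check it for the DOI in question.

```python
import json
import urllib.request

DOI_RESOLVER = "https://doi.org/"


def build_metadata_request(doi: str) -> urllib.request.Request:
    """Build an HTTP request asking the DOI resolver for machine-readable metadata."""
    doi = doi.strip()
    # Accept either a bare DOI or a full https://doi.org/ URL.
    if doi.lower().startswith(DOI_RESOLVER):
        doi = doi[len(DOI_RESOLVER):]
    return urllib.request.Request(
        DOI_RESOLVER + doi,
        # Assumption: the registration agency supports content negotiation.
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )


def fetch_metadata(doi: str, timeout: float = 10.0) -> dict:
    """Send the request and parse the JSON response (needs network access)."""
    with urllib.request.urlopen(build_metadata_request(doi), timeout=timeout) as response:
        return json.load(response)


request = build_metadata_request("10.5281/zenodo.5111307")
print(request.full_url)  # https://doi.org/10.5281/zenodo.5111307
```

Calling fetch_metadata with the same DOI would send the request. It is kept separate so that the request can be inspected without network access.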
Interoperable
Interoperable means data can be integrated with other data, software and workflows. This requires data and metadata to be supplied in formats that can be easily used and interpreted by humans and computers.
FAIR Principles specify:
- I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
- I2. (meta)data use vocabularies that follow FAIR principles
- I3. (meta)data include qualified references to other (meta)data
Recommended practices:
- Use widely adopted and preferably open file formats wherever possible.
- Use established standards for metadata, where available. For listings of metadata standards, consult FAIRsharing.
- Use controlled vocabularies, taxonomies, thesauri, and ontologies, where possible. For listings of these tools, consult FAIRsharing.
- Use metadata to provide meaningful references to any related resources (e.g. publications, other data) and use unique persistent identifiers (e.g. DOIs) where available.
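The first and third practices above can be sketched in a few lines of Python: values are checked against a controlled vocabulary and then written to CSV, an open and widely supported format. The vocabulary, column names and sample rows are invented for illustration; in practice the terms would come from a community vocabulary or ontology, such as those listed on FAIRsharing.

```python
import csv
import io

# Hypothetical controlled vocabulary: column name -> permitted terms.
VOCABULARY = {"sample_type": {"blood", "saliva", "tissue"}}


def vocabulary_errors(rows: list[dict]) -> list[str]:
    """List every value that is not a term in the controlled vocabulary."""
    errors = []
    for number, row in enumerate(rows, start=1):
        for column, terms in VOCABULARY.items():
            value = row.get(column)
            if value not in terms:
                errors.append(f"row {number}: {column}={value!r} is not a permitted term")
    return errors


def to_csv(rows: list[dict]) -> str:
    """Serialise rows as CSV text, using the first row's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"sample_id": "S1", "sample_type": "blood"},
    {"sample_id": "S2", "sample_type": "Blood sample"},  # free text, not a vocabulary term
]
print(vocabulary_errors(rows))
print(to_csv(rows[:1]))
```

Here the second row is flagged because "Blood sample" is free text rather than a vocabulary term, which is the kind of inconsistency that makes data hard to combine with other datasets.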
Reusable
Reusable means that data and its metadata are well-described, enabling correct and appropriate reuse by humans and computers.
FAIR Principles specify:
- R1. (meta)data are richly described with a plurality of accurate and relevant attributes
- R1.1. (meta)data are released with a clear and accessible data usage license
- R1.2. (meta)data are associated with detailed provenance
- R1.3. (meta)data meet domain-relevant community standards
Recommended practices:
- Ensure that documentation and metadata provide a rich description of the data’s attributes (e.g. using a README file, data dictionary to explain variables, codebook to explain thematic codes) and share supporting resources (e.g. protocols, survey templates, consent templates) in order to enable correct interpretation and support the widest variety of reuse cases.
- Ensure that documentation and metadata provide detailed information about the provenance of the data, i.e. how it was created and processed, and by whom. As appropriate, provenance should include details of why, where, and when data were generated, together with information supporting confidence in and validity of the data. The richer the provenance, the more trustworthy the data, and the better others can assess whether the data is suitable for the intended reuse and what further processing might be needed to reuse it appropriately.
- Where available, use discipline-specific metadata standards and follow community norms for data archiving and sharing. For listings of metadata standards, consult FAIRsharing.
- Provide an appropriate license with the data to clarify its reuse conditions. Wherever possible, apply open licenses (e.g. Creative Commons licenses such as CC0, CC BY) to maximise reuse. Only use more restrictive licenses where justified, e.g. to protect legitimate commercial interests.
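One way to keep a license statement, provenance notes and a data dictionary together is to generate the README from a single structured description. The Python sketch below shows the idea. The function name, layout and example dataset are hypothetical, and a real README would normally follow any template that the chosen repository or discipline provides.

```python
def build_readme(title: str, license_name: str, provenance: list[str],
                 variables: dict[str, str]) -> str:
    """Assemble a plain-text README covering license, provenance and variables."""
    lines = [title, "", f"License: {license_name}", "", "Provenance:"]
    lines += [f"- {step}" for step in provenance]
    lines += ["", "Data dictionary:"]
    lines += [f"- {name}: {meaning}" for name, meaning in variables.items()]
    return "\n".join(lines) + "\n"


readme = build_readme(
    title="Example survey responses",  # hypothetical dataset
    license_name="CC BY 4.0",
    provenance=[
        "Collected by the project team via an online survey in 2024.",
        "Free-text answers removed to anonymise the data.",
    ],
    variables={
        "participant_id": "Random identifier, unique per participant",
        "age_band": "Age in ten-year bands, e.g. 30-39",
    },
)
print(readme)
```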
Further points to consider
Enhancing the FAIRness of data requires planning
To enhance the FAIRness of your data, it’s important to embed the recommended FAIR practices (outlined above) into data management activities from the outset. Writing a data management plan before the research starts will help you translate the FAIR principles into concrete, actionable steps. This also builds in FAIRness by design rather than as an afterthought, which costs less time and money than implementing measures later in the project lifecycle.
FAIR enables a continuum of increasing reusability
Data should not be considered as being only either ‘FAIR’ or ‘not FAIR’. Instead, data can be made incrementally more FAIR. Whilst making data ‘completely FAIR’ is a desirable outcome, it may be more pragmatic to be ‘FAIR enough’ for a particular purpose or use case.
FAIR data and open data are not the same
Data can be both FAIR and open, just one of these, or neither.
Open data is data that can be freely used, re-used and distributed by anyone. In contrast, it may be necessary to control access to data if there are legitimate reasons for protecting it (e.g. due to legal, ethical, contractual, or intellectual property constraints). Nonetheless, data with access restrictions can still be FAIR. This is achieved by ensuring that rich metadata and documentation are publicly available and explain the data, its provenance, and the specific conditions under which it can be accessed and re-used. This approach follows the guiding maxim that "data should be as open as possible and as closed as necessary".
A key difference between open data and FAIR data is that open data focuses on the unrestricted release of data, whereas FAIR ensures that data is organised and documented in such a way that it is easily accessible and reusable by others. Nonetheless, FAIR data and open data are not competing concepts but are different facets of effective data stewardship, with FAIR providing a flexible approach that supports openness while also addressing cases where data must remain restricted.
In essence, not all FAIR data is open, but for data to be truly open and reusable in a research context, it should be FAIR.
FAIR applies to both data and metadata
In the FAIR principles, “(meta)data” refers to both the data and the metadata (i.e. the information about that data). The provision of rich metadata is essential for FAIR since it allows data to be found and to be interpreted correctly and appropriately.
FAIR is not a standard
FAIR offers high-level, domain-independent guiding principles. This enables communities of practice to choose how to translate those principles into practical applications for their disciplinary contexts. To illustrate, several international organisations and projects (e.g. GO FAIR, FAIR-IMPACT, Research Data Alliance, ELIXIR, Research Software Alliance) are providing an overarching framework and coordination for FAIR data initiatives across various research domains.
FAIR Data Checklist
The following checklist summarises key points to help you consider how FAIR your data are, and what measures could be taken to improve their FAIRness:
Jones, S., & Grootveld, M. (2017). How FAIR are your data? https://doi.org/10.5281/zenodo.5111307