What is research data?
Defining research data
“Research data are the evidence that underpins the answer to the research question, and can be used to validate findings regardless of its form (e.g. print, digital, or physical).” (Concordat on Open Research Data)
Research data may be primary data, generated or collected by the researcher, or secondary data, derived from existing sources through further analysis.
Research data can also include the information needed to verify findings or replicate results. This covers computer code, methods, the instruments used, and essential interpretive and contextual information, for example specifications of variables.
In practice, research data can look very different depending on the field of study. They might include, for example, documents, spreadsheets, laboratory notebooks, fieldwork observations, diaries, questionnaires, transcripts, codebooks, audio and video recordings, physical samples, biological and medical specimens, artworks and archives.
Types of research data
Research data can be broadly grouped into five categories:
Observational data: Recorded in real time by observing activities or events. Because the moment of observation cannot be repeated, this data can be unique and irreplaceable. Examples include remote sensing data, species abundance surveys, ethnographies and archaeological samples.
Experimental data: Generated in controlled environments. This data is often reproducible, although reproducing it may be expensive. Examples include analyses of material properties, clinical trial data and measurements of crop yields under different levels of ozone exposure.
Simulation data: Generated by running models. The model and its metadata (data about the model, code, computing environment and input conditions) may be more important than the output it produces. Examples include climate and economic models.
Derived data: Generated by processing or combining existing data, often from many sources. Examples include compiled databases that aggregate data from multiple secondary sources, collections of digitised materials and corpora collected by text mining.
Reference data: Published and curated data, typically held as part of managed collections. Examples include national statistics archives, image archives, gene banks and crystal structure databases.