Across the research world there is agreement that the social science community must improve transparency and replicability in order to appraise, and promote trust in, publicly-funded research. Research rests on evidence, and evidence means data. Research funders and publishers of journal articles now expect researchers to make explicit, and to share, the data sources that underpin their findings.

Yet sharing data is often the last item on a busy researcher’s priority list, whether senior academic or PhD student, and shared data often suffer from being a ‘quick and dirty’ upload. Research data are uploaded to repositories around the world, run by specialist data centres, universities and journals, and almost every ‘data publisher’ checks the data it acquires in a different way. Data quality is not always rigorously assessed, partly because repository managers may lack the skills to appreciate disciplinary issues or the detail of the data.

Based on a detailed appreciation of what makes a high-quality dataset, of what checks can be made and of how errors might be flagged, this project aims to pass that expertise on to the research and data publishing communities through an easy-to-use tool that assesses quantitative data for known quality issues, and through associated training materials that make quality assessment of numeric data explicit. QAMyData will offer an easy-to-use tool/service that automatically detects some of the most common problems in numeric data and produces a ‘data health check’. Data can be resubmitted as many times as needed, with identified issues remedied each round, until a ‘clean bill of health’ is produced. Clean data receive a ‘clean bill of health’ certificate plus a high-quality codebook/data dictionary – both useful takeaways that demonstrate quality assurance for onward submission to a journal or data repository.
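To give a flavour of what such a ‘data health check’ might look for, the sketch below implements a few common checks in Python with pandas. The thresholds, sentinel codes, file name and report format are hypothetical illustrations, not the project’s actual algorithms or output.

```python
import pandas as pd

# Hypothetical thresholds for illustration; the tool's real checks may differ.
MAX_MISSING_SHARE = 0.25     # flag variables with more than 25% missing values
SENTINELS = {-9, -99, 999}   # numeric codes sometimes left undeclared as 'missing'

def health_check(df: pd.DataFrame) -> list[str]:
    """Run a few common numeric data quality checks; return a list of issues."""
    issues = []
    # Duplicate rows often point to a merge or data-entry error.
    dups = int(df.duplicated().sum())
    if dups:
        issues.append(f"{dups} duplicate row(s) found")
    for col in df.columns:
        s = df[col]
        # Excessive missingness.
        share = s.isna().mean()
        if share > MAX_MISSING_SHARE:
            issues.append(f"'{col}': {share:.0%} missing values")
        if pd.api.types.is_numeric_dtype(s):
            # Undeclared missing-value codes masquerading as real values.
            found = SENTINELS & set(s.dropna().unique())
            if found:
                issues.append(f"'{col}': possible missing-value codes {sorted(found)}")
            # Constant variables carry no information.
            if s.nunique(dropna=True) == 1:
                issues.append(f"'{col}': constant value, no variation")
    return issues

df = pd.read_csv("survey.csv")   # hypothetical input file
report = health_check(df)
print("\n".join(report) or "Clean bill of health")
```

The resubmission loop described above maps onto this sketch directly: remedy the flagged issues, rerun the check, and repeat until the report comes back empty.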

In summary, the tool will be useful to those who have to, or want to, share their research data, or who must reuse less-than-clean data. The associated training, delivered through the UK Data Service, AQMeN and NCRM, can help improve awareness of what makes high-quality data.

The proposed key outputs are:

  • A lightweight, open-source beta tool for quality assessment of research data. For the user, the system will create a report or ‘data health check’ that identifies the most common problems in the submitted data; a code book (a minimal sketch of codebook generation follows this list); and suggestions, including code, for fixing errors or cleaning the data.
  • A short report evaluating the tool, its algorithms and the upload process with researchers, teachers, students and data repositories, including partner international data archives, university data repositories, journal repositories and peer-review set-ups.
  • A practical walk-through guide and presentation on understanding and assessing data quality in data production and analysis, to support research methods training, repository ingest processes and peer review of data.
  • A limited advocacy and promotional campaign for the tools and materials, showing how to attain high data quality and supporting the transparency and replication agendas, piggybacking on the media outlets of NCRM, other repositories and journals.
  • A workshop and a webinar, via NCRM, on assessing data quality, as part of the AQMeN Data Science for Social Research workshops.
  • A workshop and a webinar, via UKDS, on the tool itself and its underpinning algorithms, aimed at repository managers and ingest workflow staff.
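As referenced in the first output above, a codebook/data dictionary can be derived mechanically from the data themselves. The sketch below shows one plausible way to do this in Python with pandas; the column summaries, file names and output format are illustrative assumptions, not the tool’s actual codebook specification.

```python
import pandas as pd

def make_codebook(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise each variable: type, missingness, distinct values and range."""
    rows = []
    for col in df.columns:
        s = df[col]
        numeric = pd.api.types.is_numeric_dtype(s)
        rows.append({
            "variable": col,
            "type": str(s.dtype),
            "n_missing": int(s.isna().sum()),
            "n_distinct": int(s.nunique(dropna=True)),
            # Min/max only make sense for numeric variables.
            "min": s.min() if numeric else None,
            "max": s.max() if numeric else None,
        })
    return pd.DataFrame(rows)

# Hypothetical usage: write the codebook alongside the data for submission.
codebook = make_codebook(pd.read_csv("survey.csv"))
codebook.to_csv("survey_codebook.csv", index=False)
```

A tabular codebook like this is a convenient takeaway: it travels with the dataset and gives a journal or repository a one-page overview of every variable.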

Our Advisory Board includes a number of high-profile data repositories and publishers, with whom we will work to scope and evaluate the tool and its functionality.


Principal Investigator: Louise Corti, UK Data Service

Co-Investigator: Vernon Gayle, AQMeN, University of Edinburgh

Funders: National Centre for Research Methods (NCRM), Economic and Social Research Council (ESRC)

Project dates: 8 January 2018 – 8 January 2019 (12 months)
