What is a README file?
A README file is a text document that provides a clear and concise description of all relevant details about a dataset: data collection, processing, analysis, and naming conventions.
It should be easy to write and easy to read, so that you and others can understand it in the future.
Best practices for your README file
- Create a README file at the beginning of your research process and update it consistently throughout your research. This will help you compile the final README file when your data is ready for deposit.
- Write your README file as a plain text file and avoid proprietary formats whenever possible. However, PDF is acceptable when formatting is important.
- Follow the scientific conventions for your discipline.
- Store a README file with each distinct dataset. It should explain your file naming convention and any abbreviations or codes you have used, and include any other necessary documentation.
- Locate the README file at the root of the project rather than in a sub-folder.
- Use blank lines or dashes to separate the document into paragraphs, and use bullet points or ordered lists instead of long paragraphs.
- The order of paragraphs and the section titles are not fixed; simply make the document understandable.
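Two of the practices above (a README at the root of the project, written as plain text) can be checked automatically. The following Python sketch is only an illustration: the function names and the set of extensions treated as plain text are assumptions made for this example, not part of any standard.

```python
from pathlib import Path

# Assumption for this example: extensions treated as plain text.
PLAIN_TEXT_SUFFIXES = {"", ".txt", ".md"}


def find_root_readme(project_root):
    """Return the README at the top level of project_root, or None.

    Only the root folder is inspected; a README inside a sub-folder
    is deliberately not counted.
    """
    root = Path(project_root)
    for entry in sorted(root.iterdir()):
        if entry.is_file() and entry.stem.lower() == "readme":
            return entry
    return None


def is_plain_text_readme(path):
    """True when the file uses one of the assumed plain-text extensions."""
    return Path(path).suffix.lower() in PLAIN_TEXT_SUFFIXES
```

A PDF README fails the second check even though PDF is acceptable when formatting is important, so treat that result as a prompt to double-check rather than as an error.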
Useful links
- Cornell University’s Research Data Management Service Group’s Readme writing guide and template
- Naming convention best practices from UK Data Service
- Massachusetts Institute of Technology examples and guidance on organizing files and naming conventions
- Harvard’s Research Data Management Readme file explanation and template
- Harvard Biomedical Data Management’s README File Checklist
- EPFL University’s README file best practices and template
What should your README file contain?
1. General information
- Title of the dataset
- Contact information for the researcher, principal investigator (PI), or data manager
- Date of data collection
- Funding sources or sponsorship that supported the research
2. Sharing and access information
- Licenses, restrictions, or limitations on reuse
- Links to publications that cite or use the data
3. Data and files overview
- A list of all files (or folders, as appropriate per dataset organization) contained in the dataset, with a brief description
- File naming guidelines, with examples
- A complete list of any codes/abbreviations used
- Column headings for tabular data
- If the dataset includes multiple files that relate to one another, the relationship between the files or a description of the file structure
4. Methodological information
- Description of methods for data collection or generation (include links or references to publications or other documentation containing experimental design or protocols used)
- Description of methods used for data processing (describe how the processed data were derived from the raw or collected data)
- Any software or instrument-specific information needed to understand or interpret the data
- Definitions of all variables, abbreviations, and codes, including missing-data codes and units of measurement
- Description of any quality-assurance procedures performed on the data
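As a starting point, the four sections above can be turned into a plain-text skeleton that you then fill in by hand. The Python sketch below is one possible way to do this, not a prescribed tool: the names `SECTIONS`, `list_files`, and `build_readme` are hypothetical, and the prompts are shortened paraphrases of the checklist. It also pre-fills the file list for section 3 by walking the dataset folder.

```python
from pathlib import Path

# Section titles and prompts paraphrased from the checklist above.
SECTIONS = {
    "1. General information": [
        "Title of the dataset",
        "Contact information",
        "Date of data collection",
        "Funding sources",
    ],
    "2. Sharing and access information": [
        "Licenses, restrictions, or limitations on reuse",
        "Links to publications that cite or use the data",
    ],
    "3. Data and files overview": [
        "File naming guidelines, with examples",
        "Codes and abbreviations used",
        "Column headings for tabular data",
        "Relationship between files",
    ],
    "4. Methodological information": [
        "Methods for data collection or generation",
        "Methods for data processing",
        "Software or instrument-specific information",
        "Definitions of variables, codes, and units",
        "Quality-assurance procedures",
    ],
}


def list_files(dataset_root):
    """Sorted relative paths of all files under dataset_root.

    Files named README (any extension, any folder) are left out.
    """
    root = Path(dataset_root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.stem.lower() != "readme"
    )


def build_readme(dataset_root):
    """Return the skeleton text; every prompt is left blank to fill in."""
    lines = []
    for title, prompts in SECTIONS.items():
        lines.append(title)
        lines.append("-" * len(title))  # dashes separate the sections
        for prompt in prompts:
            lines.append(f"{prompt}: ")
        if title.startswith("3."):
            lines.append("Files:")
            for name in list_files(dataset_root):
                lines.append(f"- {name}: ")
        lines.append("")  # blank line between sections
    return "\n".join(lines)
```

Calling `build_readme` on the dataset folder returns the text as a string, which can then be saved as a plain text file at the root of the project. The generated file list only gives names; the brief description of each file still has to be written by the person who knows the data.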