What is README?

A README is a guide to your dataset and is usually a plain text file to maximize its usability and long-term preservation potential. The purpose of a README is to help other researchers understand your dataset, its contents, provenance, licensing and how to interact with it. README files are generally named README or readme.txt and are included as a component of a dataset.

In short, a README is the portable, durable way to provide information to other researchers about how to use your dataset.

Looking for a cheat sheet? Check out our one-pager!

README or Metadata for repositories?

When you deposit your data in a data repository (e.g. Dataverse or FRDR), you are asked to provide metadata. A README complements but does not replace the repository's metadata.

The best practice is to record information in both the repository’s metadata and the README. The repository’s metadata will support findability within and between data repositories while the README is portable and continues to describe the dataset after it has been separated from its original context. In all cases, you should use any conventions appropriate to your discipline to record the information about your dataset.

Exercise 1

Please help us to make sense of a dataset.

Access a dataset:

Clark, Luke, 2019, “Role Reversal: The Influence of Slot Machine Gambling on Subsequent Alcohol Consumption”, Borealis, V1, UNF:6:zsehCAz4agntvPwDZF03OA== [fileUNF]

Download the data file “Gambling_Alcohol_Study” in the Original File Format. Examining the data, try to answer the following questions:

  1. Describe the different gambling conditions in this study.
  2. What does the variable named “ResultingBAC” mean?
  3. How was the data collected?

Please share your experience on Padlet.

How Do I Create a README?

The Content

Core elements of any README include:

  • Contact information for the researcher(s)
  • The use license for your data (unless that is included in a separate file)
  • Your data collection methods (protocols, sampling, instruments, coverage, etc.)
  • The structure of files
  • Naming conventions for files, if applicable
  • The sources you used
  • Your quality assurance work (data validation, checking)
  • Any data manipulations or modifications
  • Data confidentiality and permissions
  • The names of labels and variables and explanations of codes and classifications, i.e. a data dictionary or a codebook

Cornell University’s Guide to writing “readme” style metadata provides a thorough description of the content that you should consider including in your README file.
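As a rough illustration, the core elements listed above could be turned into a plain-text outline that you fill in as your project progresses. The sketch below is only one possible layout: the section names paraphrase the list above, and `build_readme_skeleton` and the `[to be completed]` placeholder are hypothetical names chosen for this example, not part of any official template.

```python
# Section names paraphrase the core elements listed above.
# The layout and the placeholder text are assumptions for illustration.
CORE_SECTIONS = [
    "Contact information",
    "License",
    "Data collection methods",
    "File structure",
    "File naming conventions",
    "Sources",
    "Quality assurance",
    "Data manipulations or modifications",
    "Confidentiality and permissions",
    "Data dictionary (variables, labels, codes)",
]


def build_readme_skeleton(dataset_title: str) -> str:
    """Return a plain-text README outline with one empty section per core element."""
    lines = [f"README for: {dataset_title}", ""]
    for section in CORE_SECTIONS:
        lines.append(section.upper())
        lines.append("[to be completed]")
        lines.append("")  # blank line between sections
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_readme_skeleton("Example dataset"))
```

Starting from an outline like this makes it harder to forget an element, but you should still adapt the sections to the conventions of your discipline.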

The Style

How you write your README is as important as the information you include. Always remember to be as clear as possible. The following are some best practices related to data documentation:

  • Don’t use jargon
  • Define terms and acronyms
  • Make it machine-readable (avoid special characters)

The Process

Document your work as you go, so you don’t lose track of any details. If you wait until the end of your project, you might already have lost or forgotten valuable information.

You can create a README using any text editor (e.g. TextEdit, Notepad++, Sublime Text) or word processor (e.g. Word, LibreOffice). Save your README as UTF-8 encoded text. Using plain text helps preserve your information because it relies on durable, open standards rather than proprietary formats. If you’re using GitHub, your README should be written using Markdown syntax.
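If you generate or update your README from a script rather than an editor, you can set the encoding explicitly instead of relying on your system default. The following is a minimal sketch using only the Python standard library; the function name `save_readme` and the default file name are assumptions made for this example.

```python
from pathlib import Path


def save_readme(text: str, path: str = "README.txt") -> Path:
    """Write the README as UTF-8 encoded plain text and return its path."""
    target = Path(path)
    # encoding="utf-8" avoids depending on the platform's default encoding;
    # newline="\n" keeps line endings the same on every operating system.
    with open(target, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(text)
    return target


if __name__ == "__main__":
    saved = save_readme("README for: Example dataset\n")
    print(f"Wrote {saved}")
```

Most text editors offer the same choice in their "Save As" dialog, so the script is only needed when the README is produced programmatically.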

Exercise 2

Now, let’s practice what we just learned. Please download this README template, choose one data project that you are working on right now, and spend 5-7 minutes filling in the template.

Pay special attention to the variables list. A dataset whose variables are not explained is not useful. How would your peers know what a variable named OxIntake13 means?
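One way to make sure no variable is left unexplained is to start the variables list from the column names in your data file. The sketch below assumes your data is a CSV file with a header row; the function name `data_dictionary_template`, the field labels, and the sample header are hypothetical. It only lists the variable names, and the descriptions still have to be written by you.

```python
import csv
import io


def data_dictionary_template(csv_text: str) -> str:
    """List every column of a CSV header with blank fields to fill in by hand."""
    # Only the first row (the header) is read; data rows are ignored.
    header = next(csv.reader(io.StringIO(csv_text)))
    lines = ["VARIABLE LIST", ""]
    for name in header:
        lines.append(f"Variable: {name}")
        lines.append("  Description: [to be completed]")
        lines.append("  Units or codes: [to be completed]")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical header; only the name OxIntake13 comes from the text above.
    sample = "ParticipantID,OxIntake13\n"
    print(data_dictionary_template(sample))
```

Paste the output into the variables section of your README and replace each placeholder with a definition, the unit of measurement, and the meaning of any codes.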


You are now ready to write up a good README file so other researchers can understand your dataset with no problems!