Problems with Spreadsheets
Spreadsheets are good for data entry, but in reality we tend to use spreadsheet programs for much more than data entry. We use them to create data tables for publications, to generate summary statistics, and make figures. Laying out spreadsheets in this way often adds some difficulty when we want to take our data from the spreadsheet and use it in another program. Additional white space, merged cells, colour and grids may aid readability but are not easily handled by other programs that take our spreadsheet as an input to further analysis.
Generating statistics and figures in spreadsheets should be done with caution. The graphical, drag and drop nature of spreadsheet programs means that it can be very difficult, if not impossible, to replicate your steps (much less retrace anyone else’s). This is particularly true if your stats or figures require complex calculations. Furthermore, when performing calculations in a spreadsheet, it’s easy to accidentally apply a slightly different formula to multiple adjacent cells. This often makes it difficult to demonstrate data quality and consistency in our analysis.
Even when we are aware of some of the limitations that data in spreadsheets presents, often we have inherited spreadsheets from another colleague or data provider. In these situations we cannot exercise any control in its construction or entry of the data within it. Nevertheless it is important to be aware of the limitations these data may present, and know how to assess if any problems are present and how to overcome them.
Using Spreadsheets for Data Entry and Cleaning
However, there are circumstances where you might want to use a spreadsheet program to produce “quick and dirty” calculations or figures, and some of these features can be used in data cleaning, prior to importation into a statistical analysis program. We will show you how to use some features of spreadsheet programs to check your data quality along the way and produce preliminary summary statistics.
In this lesson, we will assume that you are most likely using Excel as your primary spreadsheet program - there are other programs with similar functionality but Excel seems to be the most commonly used.