The main theme that should pervade your classes on both data analysis and probability is that elementary school students require real experiences with situations involving data and with situations involving chance. They will best learn about these concepts on an intuitive or informal level, without the rules and algorithms traditionally associated with them. Students need an opportunity to reflect on and interpret situations and to work at describing them. A graph or descriptive statistic must be intuitively meaningful to students; otherwise the production of graphs and statistics is nothing more than busy work. This calls for real data, collected and interpreted by students for the purpose of answering questions of interest to them.
Data Gathered to Answer Questions
Data is collected and analyzed for the purpose of answering questions about the population from which the data is gathered. This simple-sounding statement moves the focus of data analysis from "how to make graphs" to "what does the data tell us?" Analysis of graphs tells us about the "shape of the data." A secondary purpose is that of communicating information about the data to others. At the primary level students ask questions about themselves as a class. As they grow more aware of the world around them their explorations can grow to comparisons with other groups including group data found on the Web.
Attribute activities, which originally became popular as a means of developing logical reasoning skills, are included here but are restricted to those involving classification. Often we have an ill-structured collection of information and must make sense of it. Classification is the place to begin. I am aware that there are educators who continue to love attribute activities as logic or problem-solving endeavors. I too enjoy them but, alas, have found no research to support their effectiveness.
The book explores all of the usual data-graphing techniques that you are likely to run across. The less formal approaches of stem-and-leaf plots and line plots are well worth including, especially if teachers have not been exposed to those techniques in a content course. Not only are these methods easy to use, they are also easy to interpret and discuss. Note that box-and-whisker plots are treated as a description of the distribution of data rather than as a graph of the data.
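For instructors who want to show how little machinery a stem-and-leaf plot requires, here is a minimal Python sketch. It is my own illustration, not an activity from the book; the function name `stem_and_leaf` and the sample scores are hypothetical, and it assumes non-negative whole-number data with the tens digit as the stem.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf plot as strings.

    Assumes non-negative whole numbers; stem = tens, leaf = ones.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)

    rows = []
    # Include empty stems so gaps in the data remain visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows

# Hypothetical class test scores, invented for illustration.
scores = [72, 85, 91, 68, 85, 77, 93, 60]
for row in stem_and_leaf(scores):
    print(row)
```

Because every data value stays visible in the plot, the discussion can move straight to where the scores cluster and where the gaps are.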
Computer software, especially graphing packages and spreadsheets, and the graphing calculator are making the techniques of graphing less important than the interpretation. These devices allow teachers and students to explore different representations for a set of data to decide which approach best conveys the appropriate message. We need to do a better job of exploiting these approaches since they very clearly place the focus on interpretation rather than on graph construction techniques.
Only simple statistics that are easily understood are explored. Of these, the arithmetic mean is perhaps the most important. There is evidence that indicates students are quite capable of computing the mean (at least they do so when it is referred to as an average), but this only reflects the ability to add and divide. Research suggests that there is little understanding of what the statistic represents, the effect that 0 or a large outlier in the data set may have on the mean, or even that it in some way is representative of the data used to compute it (Bright and Hoeffner). Several activities are suggested to help with some of this misunderstanding. Two distinct meanings are developed: a leveling concept (which gives rise to the algorithm) and a balance concept that is the one used by statisticians. The latter is a feature of the Connected Math Project as well as the older Used Numbers series and is one I highly recommend you explore, at least with middle school teachers.
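The two meanings of the mean, and the effect of a 0 or an outlier, can be checked with a few lines of Python. This is only a sketch under my own assumptions: the helper names `level_out` and `balance_residual` and the data set are invented for illustration and do not come from the book's activities.

```python
def level_out(data):
    """Leveling view: pool all the values, then share them out equally."""
    return sum(data) / len(data)

def balance_residual(data, point):
    """Balance view: add up the signed distances of each value from `point`.

    The result is zero exactly when `point` is the mean.
    """
    return sum(x - point for x in data)

# Hypothetical data set, invented for illustration.
scores = [3, 5, 6, 6, 10]

print(level_out(scores))             # 6.0
print(balance_residual(scores, 6))   # 0: distances below and above cancel

# A zero is a real data value and pulls the mean down.
print(level_out(scores + [0]))       # 5.0

# A single large outlier drags the mean well above every other value.
print(level_out(scores + [60]))      # 15.0
```

Asking teachers to predict each result before it is computed tends to surface the misunderstandings described above.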
Distribution of Data
A theme throughout the chapter is that data sets have a "shape," a general characterization of the data as a whole. At the elementary level the shape of data is a subjective consideration. Students might, for example, notice that there are more tall bars at one end of the graph than at the other. Box-and-whisker plots provide a graphical way to represent the spread of data rather than the data itself. The range and quartile values are statistics that begin to describe the shape of data numerically. Think of these as precursors of the standard deviation statistic that is more appropriately developed at the high-school level.
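The five values behind a box-and-whisker plot can be computed as in the sketch below. Quartile conventions differ among textbooks and software; this version assumes the "median of each half" rule common in school materials, leaving the middle value out of both halves when the count is odd. The function name `five_number_summary` and the height data are hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return min, Q1, median, Q3, and max for a list of at least two numbers.

    Assumes the median-of-halves quartile rule; other rules give
    slightly different quartiles.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]              # values below the middle
    upper = values[half + (n % 2):]    # values above the middle
    return {
        "min": values[0],
        "q1": median(lower),
        "median": median(values),
        "q3": median(upper),
        "max": values[-1],
    }

# Hypothetical student heights in inches, invented for illustration.
heights = [48, 50, 51, 53, 54, 55, 58, 60, 62]
summary = five_number_summary(heights)
print(summary)
print("range:", summary["max"] - summary["min"])
print("interquartile range:", summary["q3"] - summary["q1"])
```

The range and interquartile range printed at the end are the numeric descriptions of spread that the box and whiskers show graphically.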
Scatter Plots and Best-Fit Lines
I've emphasized that data is gathered to answer questions. Many times, the question involves a search for possible relationships between two or more measures. When survey data is gathered to answer questions such as these, the resulting scatter plot may only suggest that a relationship exists. A mathematical description of such an inferred relationship is usually a line of best fit. In this section the concept of a best-fit line is explored. Both graphical and numeric methods are developed for determining the median-median line, an excellent early alternative to the typical regression line.
This section provides a very important connection to algebra. The use of coordinates to plot data and then the development of an equation to represent a relationship is an obvious use of algebraic ideas. Best-fit lines were mentioned in Chapter 15 in the discussion of finding functional relationships in the real world.
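A numeric version of the median-median line can be sketched in Python as follows. This follows one common formulation, and the details in the book may differ: sort the points by x, split them into three groups, take the median x and median y of each group as a summary point, use the two outer summary points for the slope, and average the three intercepts so the line shifts one third of the way toward the middle point. The function name `median_median_line`, the grouping rule for leftover points, and the sample data are my assumptions.

```python
from statistics import median

def median_median_line(points):
    """Return (slope, intercept) of the median-median line for (x, y) pairs."""
    pts = sorted(points)
    n = len(pts)
    if n < 3:
        raise ValueError("need at least three points")

    # Assumed grouping rule: one leftover point goes to the middle group,
    # two leftover points go to the two outer groups.
    base, extra = divmod(n, 3)
    outer = base + (1 if extra == 2 else 0)
    groups = [pts[:outer], pts[outer:n - outer], pts[n - outer:]]

    # One summary point per group: (median of x values, median of y values).
    summaries = [
        (median([x for x, _ in g]), median([y for _, y in g])) for g in groups
    ]
    (x_left, y_left), _, (x_right, y_right) = summaries
    if x_right == x_left:
        raise ValueError("outer groups share the same median x")

    slope = (y_right - y_left) / (x_right - x_left)
    # Averaging the three intercepts moves the line one third of the way
    # from the outer summary points toward the middle one.
    intercept = sum(y - slope * x for x, y in summaries) / 3
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1, invented for illustration.
data = [(x, 2 * x + 1) for x in range(1, 10)]
print(median_median_line(data))
```

Because the method relies on medians, a single unusual point has far less influence than it would on a least-squares regression line, which is part of what makes it a good early alternative.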