14. Quantitative Data Analysis
Dr. Rochelle Stevenson
🎯 Learning Objectives
- Understand the importance of levels of measurement in quantitative data analysis.
- Describe the utility of frequency distributions.
- Distinguish between descriptive and inferential statistics.
- Distinguish between the three measures of central tendency: mode, median and mean.
- Describe the utility of measures of dispersion.
- Describe the different graphical formats used to display univariate distributions.
- Explain the role quantitative data have played historically in justifying racist policies, including those targeting Indigenous peoples.
We are inundated with quantitative data on a daily basis; numbers are everywhere. Important social issues, such as intimate partner violence, pandemic epidemiology (including COVID-19), fear of crime, immigration, correlates of offending, and many other topics, are debated using numerical data in the public domain, on social media and in the news media. Numerical findings on these topics can be interpreted and used in inappropriate ways by laypersons as well as by people who represent themselves as “authorities” in these media. The challenge is that when people are not well-versed in how the scientific process operates, they may misunderstand what the statistics actually represent and incorrectly use research and numerical results as confirmation of their own perspectives or positions.
This chapter is a simplified overview of the process of quantitative data analysis for information gathered from experiments, surveys, content analysis, and other data represented by numbers. We start by discussing why understanding statistics is critical in everyday life, and we review some of the foundational concepts in statistics. We then discuss the process of data analysis, including data entry, data cleaning, and univariate analyses. This chapter is structured to highlight the process of data analysis, from the point at which we have completed collecting our data on some interesting social phenomenon, to presenting data in a graphical format. We conclude by considering the importance of including Indigenous peoples and their voices and perspectives in the collection, analysis, and interpretation of numerical data, especially when conducting research involving Indigenous peoples and their communities.
Overview of Terminology in Quantitative Data Analysis
Although operationalization itself is beyond the scope of this chapter, its final step, in which we assign numbers to the categories of our variables for the purpose of quantitative analysis, first requires an understanding of levels of measurement (LOMs) (see the full discussion in chapter 6a). Recall that LOMs refer to how the attributes of our variables relate to one another: Are they rank ordered? Do we know the distance between the ranks? Is there a true zero point? Or are they simply different categories? Determining the level of measurement of our variables involves an appraisal of how the categories in each variable are coded with numbers and how these numbers are used in statistical tests. LOMs are critical because they determine the range of legitimate mathematical operations that can be performed on specific variables of interest (see Fox et al., 2013; Gau, 2019; Bachman & Paternoster, 2004). LOMs determine:
- The legitimate statistics that can be interpreted for your variables.
- The types of descriptive/univariate statistics that can be reported and the types of inferential tests that can be undertaken.
In other words, determining the LOM of variables is always the starting point prior to running any statistical analysis. Please refer to Table 14.1 for a review of the four LOMs: nominal, ordinal, interval and ratio.
| Level of Measurement | Simplified Definition | Examples |
|---|---|---|
| Nominal | Numbers represent categories only. No ranking or meaningful distance between categories. | gender, ethnicity, colour |
| Ordinal | Numbers are categories with a natural ordering or ranking. No meaningful distance between categories. | education, opinion scales, socioeconomic status |
| Interval | Numbers are categories with a natural ordering and equal distance between the categories. Absence of a natural zero point. | IQ, temperature in degrees Celsius |
| Ratio | Numbers are categories with a natural ordering and equal distance between the categories. Presence of a natural zero point. | number of drinks in the past month, self-reported offences, final exam scores |
Overview of the Process of Quantitative Data Analysis
Once we have collected our quantitative data and are clear on the level of measurement of each variable, now what? Rather than calculate all the statistics by hand, we can use a statistical analysis program, such as the Statistical Package for the Social Sciences (SPSS), to run the analyses we need to address our research question. We can import our data directly into SPSS from online survey programs, such as Qualtrics and SurveyMonkey, or enter the data manually into the analysis program, which may be the case if we are using content analysis or experiments. But before we jump into the analysis, we need to define our variables and clean our data.
Say we are conducting a survey of university students asking about substance use. As part of the demographic information we collect, we ask about their gender in an open question, allowing respondents to type in their gender rather than choose from a preset list. Because respondents may type in any gender identity, part of defining our gender variable is going through the responses and placing them into categories so that we can run analyses with gender as an independent variable. We then assign a number to each category so that our statistical program can work with the variable (e.g., woman = 1; trans woman = 2; non-binary = 3; and so on), remembering that gender is a nominal variable, so the numbering of the categories does not represent any ordering or ranking.
Part of data cleaning is assessing the data we have collected for errors. For example, what if a respondent offered that their gender was “orange”? This would not be a valid response to the question about gender identity, so we would need to decide how to code it. We could code it as “no response,” as the person did not realistically answer the question asked, or we could interpret the response as a preference not to answer and code it that way. The point of data cleaning, including defining our variables, is that each respondent should have a value entered for each variable, even if that value is “missing data” for respondents who skipped that question.
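A minimal Python sketch of the coding step described above may help make it concrete. The category labels, the numeric codes, and the use of 99 as a missing-data code are hypothetical choices made for illustration; a real codebook would need many more categories and a careful human review of each typed-in response rather than exact text matching.

```python
# Hypothetical codebook: labels and codes are illustrative only.
# Gender is nominal, so these numbers are labels, not ranks.
GENDER_CODES = {
    "woman": 1,
    "trans woman": 2,
    "non-binary": 3,
}
MISSING = 99  # hypothetical code for "no valid response"

def code_gender(response):
    """Return the numeric code for a typed-in gender response."""
    if response is None:
        return MISSING  # respondent skipped the question
    cleaned = response.strip().lower()  # ignore stray spaces and capitals
    # Anything that fits no category (e.g., "orange") is coded as missing.
    return GENDER_CODES.get(cleaned, MISSING)

raw = ["Woman", " non-binary ", "orange", None, "trans woman"]
coded = [code_gender(r) for r in raw]
print(coded)  # [1, 3, 99, 99, 2]
```

Every respondent ends up with a value for the variable, even when that value is the missing-data code, which is the goal of data cleaning described above.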
After we confirm the accuracy of our data, we generally examine all our variables in summary tables called frequency distributions. These distributions display frequency counts (how many times each category of a particular variable appears in the data) so that researchers can easily interpret the data and identify any additional data entry errors not caught while entering or double-checking the data.
Let’s look at an example. Table 14.2 shows the frequency distribution of the variable of political affiliation in a study on perceptions of capital punishment in a sample of adults. We can see that there are seven missing values. Missing values can happen for several reasons. A survey respondent may skip a survey question because they didn’t want to answer the question. Or, the question (variable) may not be applicable to the respondent; for example, if a respondent replies that they do not have children, we would expect that they would skip a question about the ages of their children. Missing values may also be due to data entry errors. We can also see from Table 14.2 that there is one response that is not a legitimate value for our categories of political affiliation: the value 5. This is likely due to a data entry error. This frequency distribution enables us to identify and correct such errors before we start the data analysis process. Checking each variable for data accuracy using frequency distributions is a good practice, and programs like SPSS make it easy!
Question: What party do you most identify with?
| Response | Party | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | Conservative | 127 | 51.2 | 52.7 | 52.7 |
| | Liberal | 51 | 20.6 | 21.2 | 73.9 |
| | NDP | 45 | 18.1 | 18.7 | 92.5 |
| | No Party | 17 | 6.9 | 7.1 | 99.6 |
| | 5 | 1 | 0.4 | 0.4 | 100.0 |
| | Total | 241 | 97.2 | 100.0 | |
| Missing | System | 7 | 2.8 | | |
| Total | | 248 | 100.0 | | |
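SPSS builds this table for us, but it can help to see where each column comes from. The Python sketch below is an illustration, not SPSS's own procedure: it rebuilds the 248 responses from the counts in Table 14.2, assumes hypothetical numeric codes 1 to 4 for the four valid categories, and treats `None` as a missing value. "Percent" divides by all cases, "Valid Percent" divides by non-missing cases only, and "Cumulative Percent" is a running total of the valid percentages.

```python
from collections import Counter

# Hypothetical codebook: the study's actual codes are not reported, so 1-4 are assumed.
PARTY_LABELS = {1: "Conservative", 2: "Liberal", 3: "NDP", 4: "No Party"}

# Raw responses rebuilt from the counts in Table 14.2; None marks a missing value.
responses = [1] * 127 + [2] * 51 + [3] * 45 + [4] * 17 + [5] * 1 + [None] * 7

def frequency_table(values):
    """Return (value, frequency, percent, valid percent, cumulative percent) rows."""
    total = len(values)                            # all cases, including missing
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        freq = counts[value]
        percent = 100 * freq / total               # base includes missing cases
        valid_percent = 100 * freq / len(valid)    # base excludes missing cases
        cumulative += valid_percent                # running total of valid percent
        rows.append((value, freq, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows

for value, freq, pct, valid_pct, cum_pct in frequency_table(responses):
    # Codes that are not in the codebook are flagged as likely data entry errors.
    label = PARTY_LABELS.get(value, f"INVALID CODE {value}")
    print(f"{label:<16}{freq:>5}{pct:>7}{valid_pct:>7}{cum_pct:>7}")

print(f"Missing: {responses.count(None)} of {len(responses)} cases")
```

The stray value 5 has no label in the codebook, so it is printed as an invalid code: exactly the kind of data entry error a frequency distribution helps us catch before analysis.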
Statistically Analyzing and Presenting Quantitative Data
Now that our quantitative data have been cleaned, errors have been addressed, and data have been presented in the form of frequency distributions, we are ready to consider the types of statistical analyses that can be performed on our data. The purpose of quantitative data analysis is to uncover patterns and relationships in large aggregations of data so that we can describe the data in meaningful ways.
Two types of statistics can be derived from numerical data. Descriptive statistics are statistics that summarize one or more variables in our sample. We can discuss the percentages in the categories of our variables (frequency distributions), typical or average cases (measures of central tendency), and the dispersion, or spread, of our variables (measures of dispersion). Any conclusions based on descriptive analyses therefore apply only to the specific participants or elements we have studied.
Conclusions based on inferential statistics go beyond the sample to say something about the population from which the sample was drawn (see Fox et al., 2013; Gau, 2019; Schulenberg, 2016). Inferential statistics involve formal hypothesis-testing procedures in making these inferences. Inferential statistics include bivariate (i.e., two variables) and multivariate (i.e., more than two variables) analyses. See Table 14.3 for the key features of each. This chapter focuses on univariate statistics. For more information about bivariate or multivariate analyses, any text authored by Neil J. Salkind can be a good place to start.
| Descriptive Statistics | Inferential Statistics |
|---|---|
| focus on describing the sample only | use the sample to make evidence-based statements about the population |
| often focus on one variable at a time (univariate); can also focus on more than one variable at a time | focus on two variables (bivariate) or more than two variables (multivariate) |
| common univariate statistics include: frequency distributions, measures of central tendency (mode, median, mean), and measures of dispersion (range, variance, standard deviation) | common bivariate statistical analyses include: |
| | common multivariate analyses include: |
Descriptive Statistics: Univariate Analysis
Univariate analysis focuses on describing the patterns in one variable at a time. There are three categories of univariate statistics:
- Frequency distributions, which display frequencies and percentages;
- Measures of central tendency, which are the mean, median, and mode;
- Measures of dispersion, which include the range, standard deviation, and variance (see Gau, 2019; Maxfield & Babbie, 2018).
Frequency Distributions
The first way we can describe the variables in our data is by frequency, or how often each value of a variable occurs in the data. Frequency distribution tables are a tool for identifying these frequencies. Not only do these tables assist us with data cleaning and identifying errors, as discussed earlier, but they also provide information that allows us to discuss the categories of our variables in meaningful ways. For example, let’s examine the data presented in Table 14.4 below on participants’ subjective well-being.
Subjective well-being
| Response | Rating | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | Very dissatisfied (0) | 161 | 0.9 | 0.9 | 0.9 |
| | 1 | 71 | 0.4 | 0.4 | 1.4 |
| | 2 | 136 | 0.8 | 0.8 | 2.2 |
| | 3 | 243 | 1.4 | 1.4 | 3.6 |
| | 4 | 321 | 1.9 | 1.9 | 5.5 |
| | 5 | 1326 | 7.8 | 7.8 | 13.3 |
| | 6 | 1237 | 7.3 | 7.3 | 20.6 |
| | 7 | 3011 | 17.8 | 17.8 | 38.4 |
| | 8 | 4945 | 29.2 | 29.2 | 67.5 |
| | 9 | 2199 | 13.0 | 13.0 | 80.5 |
| | Very satisfied (10) | 3306 | 19.5 | 19.5 | 100.0 |
| | Total | 16956 | 100.0 | 100.0 | |
Note: Data drawn from the 2015 General Social Survey on Time Use
In the first column, we can see the categories for our “subjective well-being” variable. This ordinal-level variable ranges from a minimum value of 0, “very dissatisfied,” to a maximum of 10, “very satisfied.” The “frequency” column presents the number of participants in each category. We can see that 1,326 participants rated their subjective well-being at 5, meaning “neither satisfied nor dissatisfied.” The “percent” column gives the percentage of all respondents, including those with missing values, who fall in each category. In this sample, 7.8% of respondents rated their subjective well-being at 5. The “valid percent” column gives the percentage of respondents in each category after excluding respondents with missing values (the two columns are identical here because these data have no missing values). If our data have missing values, it is common to use the “valid percent” column when discussing the categories of a variable. The “cumulative percent” column is a running total: for each category, it adds that category’s valid percent to the valid percents of all lower categories. It is not often used but can be helpful when identifying the median (defined later in this chapter) and some other features of our data.
These frequency tables allow us to discuss specific features of our data. In the frequency distribution in Table 14.4, we can see that most respondents (79.4%) rated their subjective well-being as 7 or greater, meaning that respondents in this sample are generally satisfied with their own well-being. In addition, the most frequently reported satisfaction level is 8 (29.2%), and less than 6% of the sample (5.5%) rated their subjective well-being between 0 and 4, on the dissatisfied side of the scale.
After we examine our univariate frequency distributions, we can take a look at descriptive statistics for our sample data: measures of central tendency, which include the mean, median and mode, and measures of dispersion, which include the standard deviation, range, and variance (see Fox & Levin, 2013; Gau, 2019). The selection of the most appropriate descriptive statistics to use is dependent on the LOMs.
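One way to summarize this guidance is a small lookup from each level of measurement to the measures that can legitimately be reported. The helper below, `suitable_statistics`, is a hypothetical simplification written for illustration, not a complete rulebook: each level inherits the measures of the levels below it, and the last measure of central tendency in each list is the one this chapter describes as most appropriate for that level.

```python
# Hypothetical helper summarizing this chapter's guidance; a simplification, not a rulebook.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

def suitable_statistics(level):
    """Return (central tendency measures, dispersion measures) for a level of measurement.

    Each level inherits the measures of the levels below it; the last measure of
    central tendency listed is the one the chapter calls most appropriate.
    """
    rank = LEVELS.index(level)  # raises ValueError for an unrecognized level
    central = ["mode"]          # the mode is legitimate at every level
    dispersion = []
    if rank >= 1:               # ordinal or higher: categories can be ranked
        central.append("median")
        dispersion.append("range")
    if rank >= 2:               # interval or ratio: equal distances between values
        central.append("mean")
        dispersion.extend(["variance", "standard deviation"])
    return central, dispersion

for level in LEVELS:
    print(level, suitable_statistics(level))
```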
Measures of Central Tendency
Measures of central tendency describe the most common, typical, or “average” response in our data (Palys, 1997). The mode is the most frequently occurring category in a variable and is best used to represent or describe nominal-level data (Schulenberg, 2016). For example, Table 14.5 shows the variable of whether respondents feel like they are constantly under stress. Of the two categories (yes and no), the most frequently occurring category, or the mode, is “no” (11,794 respondents). We can then say that the majority of respondents (69.3%) do not feel like they are constantly under stress.
| Response | Category | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | Yes | 5221 | 30.7 | 30.7 | 30.7 |
| | No | 11794 | 69.3 | 69.3 | 100.0 |
| | Total | 17015 | 100.0 | 100.0 | |
Note: Data drawn from the 2015 General Social Survey on Time Use
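Finding the mode amounts to finding the category with the largest frequency. A minimal Python sketch using the counts in Table 14.5: it expands the counts into one record per respondent so that the standard library's `statistics.mode`, which also accepts non-numeric values such as these nominal categories, can be applied.

```python
import statistics

# Counts from Table 14.5 (feeling constantly under stress: yes/no)
stress_counts = {"Yes": 5221, "No": 11794}

# Expand the counts back into one answer per respondent
responses = [answer for answer, freq in stress_counts.items() for _ in range(freq)]

mode = statistics.mode(responses)  # works on nominal (non-numeric) values
share = 100 * responses.count(mode) / len(responses)
print(mode, round(share, 1))  # No 69.3
```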
The measure of central tendency most often used for ordinal data is the median. The median is the category that falls at the 50th percentile, or the case that divides the frequency distribution in half (see Schulenberg, 2016), as depicted in Figure 14.1. While we can also use the median to describe interval- and ratio-level data, it is the most appropriate measure for ordinal data: ordinal categories can be ordered from the lowest to the highest value, but the distances between them are not equal, so an average of the category codes (the mean) would not be meaningful.

When identifying the median, we first locate its position: the case in the middle of the distribution once all cases are ordered from lowest to highest (for N cases, position (N + 1) / 2). Once we find the position of the median, we identify the value (category) associated with that case (see Schulenberg, 2016). Consider the data in Table 14.4, which present the level of subjective well-being. Looking at the cumulative percent column, we can see that the 50% point falls among the respondents who rated their subjective well-being as 8: the cumulative percent is 38.4% up to a rating of 7 and reaches 67.5% once a rating of 8 is included. So, the median level of subjective well-being in this sample is 8, at the higher end of the satisfaction ratings.
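The cumulative-percent approach can be sketched in Python using the counts in Table 14.4, assuming the 11 categories are coded 0 to 10 as described for that table. The function name `median_category` is chosen for illustration; it returns the first category whose cumulative percent reaches 50%. Expanding the counts into individual ratings and calling `statistics.median` gives the same answer as a cross-check.

```python
import statistics

# Counts from Table 14.4, assuming categories coded 0 (very dissatisfied) to 10 (very satisfied)
counts = [161, 71, 136, 243, 321, 1326, 1237, 3011, 4945, 2199, 3306]

def median_category(counts):
    """Return the first category whose cumulative percent reaches 50%."""
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must contain at least one case")
    running = 0
    for category, freq in enumerate(counts):
        running += freq
        if 100 * running / total >= 50:  # cumulative percent at this category
            return category

print(median_category(counts))  # 8

# Cross-check: expand the counts into one rating per respondent
ratings = [category for category, freq in enumerate(counts) for _ in range(freq)]
print(statistics.median(ratings))  # 8.0
```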
The mean is the measure of central tendency best used to describe interval- or ratio-level variables (Schulenberg, 2016), and it is the one we are most familiar with. When you ask your professor for the class average on an exam, you are requesting the mean. The formula for calculating the mean is shown in Figure 14.2.
$$\bar{X} = \frac{\sum X}{N}$$
Figure 14.2: Formula for Calculating the Mean
To calculate the mean, we sum the exam scores (each X) for all students in the class and divide by the number of students in the class. When the data contain extremely low or high scores, referred to as outliers, the mean becomes a less accurate representation of interval- and ratio-level data. For example, if most students scored between 70% and 90% on their exam but several students scored 0%, the mean would be pulled down substantially. In cases where extreme scores skew the mean, it is helpful to also report the median to better describe our sample.
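A short Python sketch of the mean formula (sum every score, divide by the number of scores) and of the outlier problem just described. The exam scores are invented for illustration: eight students score between 72% and 90%, and then three hypothetical scores of 0% are added.

```python
import statistics

def mean_by_formula(xs):
    """Sum every score (X) and divide by the number of scores (N)."""
    return sum(xs) / len(xs)

# Hypothetical exam scores (%): most students scored between 70 and 90
scores = [72, 75, 78, 80, 82, 85, 88, 90]
# The same class with three hypothetical scores of 0% added as outliers
with_outliers = scores + [0, 0, 0]

for label, data in [("without outliers", scores), ("with outliers", with_outliers)]:
    print(f"{label}: mean={mean_by_formula(data):.2f}, median={statistics.median(data):.2f}")
# without outliers: mean=81.25, median=81.00
# with outliers: mean=59.09, median=78.00
```

The three zeros pull the mean down by more than 20 percentage points, while the median drops by only 3 points, which is why reporting the median alongside the mean helps when outliers are present.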
Figure 14.3 shows a graphical way to understand how the mean, median, and mode describe data. It is an example of perfectly symmetrical data with no outliers, which rarely happens in the real world. These data are symmetrical because each half of the distribution is a mirror image of the other; as a result, the mean, median, and mode are the same.

Figure 14.3: Symmetrical Distribution with the Same Measures of Central Tendency
Measures of Dispersion
Measures of dispersion are complementary to measures of central tendency (Palys, 1997). Beyond knowing the centre of the data, we often want to know how much the data vary around that centre, such as what the lowest and highest scores are. Consider Figure 14.4, where the blue distribution represents Class A’s exam scores (A) and the red distribution represents Class B’s exam scores (B). Although both distributions have the same mode, median, and mean, the blue distribution has much more dispersion, or spread, around its mean than the red distribution. This means that students in Class A had more high and low scores than students in Class B, whose scores clustered closer to the class average. Measures of dispersion allow us to describe how much our sample varies around its measure of central tendency. All measures of dispersion discussed here require at least ordinal-level variables.

Figure 14.4: Distributions with the Same Measures of Central Tendency and Different Dispersions
The most basic measure of dispersion is referred to as the range. The range represents the difference between the highest score and lowest score in our data. For example, if the lowest age in our sample of criminology students is 18 and the highest age is 24, our range is 6 years.
When we have data measured at the interval or ratio level, we can assess dispersion, or the “spread,” around our mean by using the variance and the standard deviation. While the details and formulas of these measures are beyond the scope of this text and will be explored further in your statistical analysis courses, it is important to reiterate here that the way we measure dispersion around the mathematical average, or mean, depends on the level of measurement of the data in question. The question ultimately becomes how different each “score” or participant is from the average score or participant. You will see variance-related concepts used in inferential tests in future courses.
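Although the formulas are left to later courses, a small Python sketch may make the idea concrete. It uses two invented classes of five exam scores that share the same mean, median, and mode (70), echoing the situation in Figure 14.4, and computes the range, variance, and standard deviation following the definitions in this chapter's key terms. The variance here divides by the number of scores (a population variance); statistical software often reports the sample variance instead, which divides by N − 1.

```python
import math
import statistics

# Hypothetical exam scores: same mean, median, and mode (70), different spread
class_a = [50, 70, 70, 70, 90]
class_b = [65, 70, 70, 70, 75]

def score_range(xs):
    """Range: highest observed value minus lowest observed value."""
    return max(xs) - min(xs)

def population_variance(xs):
    """Average of the squared deviations from the mean (in squared units)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def standard_deviation(xs):
    """Square root of the variance, back in the original units (exam points)."""
    return math.sqrt(population_variance(xs))

for name, scores in [("Class A", class_a), ("Class B", class_b)]:
    print(f"{name}: mean={statistics.mean(scores):.1f}, "
          f"median={statistics.median(scores):.1f}, mode={statistics.mode(scores)}, "
          f"range={score_range(scores)}, variance={population_variance(scores):.1f}, "
          f"SD={standard_deviation(scores):.2f}")
# Class A: mean=70.0, median=70.0, mode=70, range=40, variance=160.0, SD=12.65
# Class B: mean=70.0, median=70.0, mode=70, range=10, variance=10.0, SD=3.16
```

Python's `statistics.pvariance` and `statistics.pstdev` return the same values as the two hand-written functions; `statistics.variance` and `statistics.stdev` use the N − 1 divisor instead.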
Figure 14.5 shows positively skewed data and negatively skewed data. When we have a positively skewed variable, the mean will be higher than the median, and the median will be higher than the mode. This occurs because the mean is pulled higher by outliers on the positive side of the distribution (higher scores). When we have a negatively skewed variable, the mean will be lower than the median, and the median will be lower than the mode. This occurs because the mean is pulled lower by outliers on the negative side of the distribution (lower scores) (see Fox et al., 2013).


Figure 14.5: Asymmetrical Distributions: A Positively (a) and Negatively (b) Skewed Distribution
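The ordering of the mean, median, and mode described for skewed data can be checked with a small Python sketch. Both lists are invented for illustration: the first has a long tail toward high scores (one high outlier), and the second is its mirror image.

```python
import statistics

# Hypothetical scores: a long right tail (positive skew) and its mirror image (negative skew)
positive_skew = [1, 1, 1, 2, 2, 3, 4, 10]
negative_skew = [10, 10, 10, 9, 9, 8, 7, 1]

for label, data in [("positive", positive_skew), ("negative", negative_skew)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    mode = statistics.mode(data)
    print(f"{label} skew: mean={mean:.2f}, median={median:.2f}, mode={mode}")
# positive skew: mean=3.00, median=2.00, mode=1
# negative skew: mean=8.00, median=9.00, mode=10
```

In the first list, the single score of 10 pulls the mean above the median; in the mirror image, the single score of 1 pulls the mean below it.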
Presenting Quantitative Data
Presenting our univariate distributions in a graphical format instead of in tables often makes them easier for audiences to understand. Univariate charts and graphs are a powerful way to visualize and communicate data. Some basic options for univariate distributions include pie charts, bar charts, and line graphs (see Palys, 1997; Schulenberg, 2016).
Pie charts are generally used to represent variables that have a small number of categories, usually measured at the nominal or ordinal level. Figure 14.6 illustrates solved homicides in Canada in 2024, broken down by the relationship between the accused and the victim. As we can see from the pie chart, the most common relationship category is acquaintance (39%), followed by intimate relationship (19%). A pie chart allows us to quickly see how the categories are distributed.

Bar charts are generally used to represent nominal, ordinal, and grouped interval data with several categories. The example in Figure 14.7 is a bar chart of the number of homicide victims in Canada over a 5-year period, from 2019 to 2023. We can quickly see that the number of homicide victims increased each year until it peaked in 2022 with 882 victims. Like pie charts, bar charts are useful for visually comparing differences across categories.
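When a charting tool is not at hand, a rough text version of a bar chart can still show the shape of a distribution. This Python sketch is only an illustration (in practice we would use SPSS's charts or a plotting library); it draws one bar per category of the subjective well-being counts in Table 14.4, assuming categories coded 0 to 10, scaled so the largest category gets 40 characters. The function name and the bar width are arbitrary choices.

```python
# Counts from Table 14.4, assuming categories coded 0 (very dissatisfied) to 10 (very satisfied)
counts = [161, 71, 136, 243, 321, 1326, 1237, 3011, 4945, 2199, 3306]

def text_bar_chart(counts, width=40):
    """Return one text line per category, with bar length proportional to its count."""
    largest = max(counts)
    lines = []
    for category, freq in enumerate(counts):
        bar = "#" * round(width * freq / largest)  # the largest category gets `width` characters
        lines.append(f"{category:>2} | {bar} {freq}")
    return lines

print("\n".join(text_bar_chart(counts)))
```

Even this crude version makes the concentration of ratings at 7 and above, and the peak at 8, visible at a glance.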

Line graphs are used to represent variables measured at the interval or ratio level. The example in Figure 14.8 is a line graph that shows the homicide victimization rates in Canada by gender and Indigenous identity over a period of 13 years. Looking at the graph, we can quickly see that Indigenous women and girls are victims of homicide at a much higher rate than non-Indigenous women and that men and boys have the second-highest rate of homicide among the three groups.

Sharing stark findings, such as those presented in Figure 14.8, without culturally sensitive interpretation can perpetuate harmful stereotypes, erode trust in research, and allow misinterpretations to justify discriminatory policies. So, in addition to pointing out the disproportionately high rate of homicide victimization among Indigenous women and girls, we should situate these findings in a discussion of the systemic challenges faced by Indigenous peoples that may influence these rates. These challenges may include historical trauma, the forced removal of children to residential schools, and the reserve system. Systemic factors such as the discretionary decisions of law enforcement and prosecutors about how to respond to the victimization of Indigenous peoples should also be mentioned (see National Inquiry into Missing and Murdered Indigenous Women and Girls, 2019). The way in which we interpret quantitative data such as these as they pertain to Indigenous peoples is also addressed in the next section.
Including Indigenous Voices with the Use of Western Models of Science and Statistics
Western methods of science have been criticized as a tool of colonialism that has aided in perpetuating Western systems of dominance over Indigenous groups in Canada. Western science’s focus on aggregated data, its outsider perspective, and its view of expertise as the domain of the academic researcher have silenced other ways of knowing by treating them as unworthy of consideration in traditional science (Deckert, 2016). Historically, the scientific method has been used within imperial systems to justify the subordinate position of Indigenous peoples. Policies influenced by quantitative data have discriminated against and harmed these groups through overtly racist institutions, such as residential schools and the reserve system (Deckert, 2016), and have thus led to intergenerational trauma for Indigenous communities (McQuaid et al., 2017). Moreover, in current times, research on these communities has been used to justify other forms of governmental intervention, such as the removal of children from Indigenous parents and the disproportionate incarceration of Indigenous peoples (Deckert, 2016; Wilk et al., 2017). These actions have fostered resentment in Indigenous communities, which are over-researched and yet silenced within academic research (Morisano et al., 2018). Statistical data, a cornerstone of Western science, are often viewed as objective by both academics and the general population. Think about discussions you have had in class, with other students, and with friends and family about crime rates, alcohol and drug use, high-school graduation rates, and other social problems with regard to Indigenous peoples. Now think about how many of those conversations have incorporated Indigenous voices on these issues.
Bearing all this in mind, it is important that quantitative researchers exercise caution when undertaking research with Indigenous communities. It should be apparent from the information provided in previous quantitative and qualitative methods chapters that quantitative research is particularly susceptible to silencing the voices of Indigenous groups. This has led to a re-evaluation of the research process, particularly criminological research, involving Indigenous peoples (Morisano et al., 2018). More recent work on Indigenous knowledge has demonstrated the necessity and usefulness of including Indigenous voices in the applied sciences, including resource development and ecological management (see Ulluwishewa et al., 2008). The importance of Indigenous knowledge and perspectives has been acknowledged in ethical frameworks for research in Indigenous communities and for the collection, analysis, ownership, and dissemination of data from research on Indigenous persons and their communities (Morisano et al., 2018, 2024). As discussed at greater length in chapter 9 and chapter 12, participatory action research is an emerging framework for research undertaken in collaboration with vulnerable groups that includes Indigenous perspectives in all stages of the research. Moreover, participatory action research seeks to empower communities by involving Indigenous community members in all steps of the research process (e.g., from conceptualization to data collection and analysis) and in the dissemination of results (see Aldridge, 2015). Including Indigenous voices in quantitative criminological research is also critical because we are analyzing social phenomena. At a fundamental level, including Indigenous perspectives increases the validity of quantitative measures and the validity of the interpretations of the quantitative data gathered from, on, and with Indigenous peoples.
Conclusion
The overall goal of data analysis, regardless of the nature of those data, is to be able to condense large amounts of information and present it in a way that can be understood and communicated to an audience. In this chapter, we examine this process when the data are numerical. These numerical data are typically collected in studies employing a deductive approach using methods such as experiments or surveys, where there is often a large number of participants. While the focus of this chapter was on descriptive statistics – frequency distributions, measures of central tendency, and measures of dispersion – we also mention the role of inferential statistics in social science research. The ways we can effectively present these data in a graphical format are also reviewed; these visual representations of large amounts of data can be effective but also potentially misleading. The chapter concludes with emphasis on the importance of understanding how to collect, analyze and interpret numerical data accurately as these data, in their aggregate form, have a unique potential to be misused and to misinform audiences, a concern that is paramount when researching and reporting on the lived experiences of Indigenous peoples.
✅ Summary
- Quantitative data are a part of our everyday lives and are particularly susceptible to being misinterpreted or used in inappropriate ways. As such, a basic understanding of statistics is critical.
- Before we can begin to think about which quantitative analysis technique is most suitable for our data, we must first understand the level of measurement (LOM) of each of our variables. The LOM determines the legitimate univariate statistics we can calculate, and it influences the types of inferential statistics we can use later in the analysis process.
- Prior to analyzing our data, it is necessary to enter, clean and examine our data for errors.
- Descriptive statistics summarize our sample data.
- Univariate statistics include frequency distributions, measures of central tendency and measures of dispersion. The choice of the proper measure is determined by the level of measurement of the single variable.
🖊️ Key Terms
aggregate data (or aggregations of data): data from multiple sources that are summarized and compiled to examine trends or to conduct statistical analysis. They are the sum of their component parts or smaller units of data.
bar chart: a chart used to represent univariate distributions in a graphical format instead of a table. Bar charts are generally best to represent nominal, ordinal and grouped interval data with several categories.
bivariate statistics: statistical tests that assess the extent of the relationship between two variables.
descriptive statistics: statistics that focus on our sample data alone, such as frequency distributions, measures of central tendency and measures of dispersion for one or more variables in a study.
frequency distributions: a table that visually displays the distribution of one variable through aggregated counts and percentages of participants in each category of the variable. These distributions allow researchers to easily interpret the data and identify any potential data entry errors.
inferential statistics: statistics where we use our sample data to make inferences about the larger population from which our sample is drawn. These inferences are based on probability theory.
line graph: a chart used to represent univariate distributions in a graphical format instead of a table. Line graphs are generally best for representing interval- or ratio-level data.
mean: a measure of central tendency that represents the arithmetic average of an interval or ratio variable.
measures of central tendency: statistics that provide a measure of where participants cluster, the typical response or the average response in a variable. The three measures of central tendency are mode, median and mean.
measures of dispersion: statistics that provide information on how responses on a variable differ from or are dispersed around a measure of central tendency. They complement the measures of central tendency. The three measures of dispersion are range, variance and standard deviation.
median: a measure of central tendency that represents the midpoint in a distribution of an ordinal variable.
mode: a measure of central tendency that represents the most frequently occurring category of a nominal variable.
multivariate statistics: statistical tests that assess the extent of the relationship between three or more variables. Typically, we examine the impact of multiple independent variables on a single dependent variable.
outlier: an extremely high or low score in a distribution that can make the mean an inaccurate measure of central tendency when dealing with interval- or ratio-level data. Outliers skew data positively (extremely high scores) or negatively (extremely low scores).
pie chart: a chart used to represent univariate distributions in a graphical format instead of a table. Pie charts are generally best to represent a small number of categories at the nominal or ordinal level.
range: a measure of dispersion that represents the difference between the highest and lowest observed value in a variable.
standard deviation: a measure of dispersion that represents the square root of the average of the squared deviations from the mean. Taking the square root returns this value to the original units of the variable, so it can be interpreted.
univariate analysis: statistics on one variable in our sample, such as the mean and standard deviation.
variance: a measure of dispersion that represents the average of the squared deviations from the mean. This value is in squared units so it is difficult to interpret as a measure of dispersion.
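The descriptive statistics defined in this glossary can be illustrated with a short worked example. The following is a minimal sketch in Python, not a procedure used in this chapter: the data are invented (the number of police contacts reported by ten hypothetical survey respondents), and the helper names `frequency_distribution` and `describe` are our own. Variance is computed as the average of the squared deviations from the mean, as defined above, which matches Python's `statistics.pvariance`; note that `statistics.variance`, like many statistical packages, instead divides by n - 1 when estimating a population value from a sample.

```python
from collections import Counter
import math
import statistics

# Hypothetical data, invented for illustration: number of police contacts
# reported by 10 survey respondents (a ratio-level variable).
contacts = [0, 1, 1, 2, 2, 2, 3, 3, 4, 22]


def frequency_distribution(values):
    """Hypothetical helper: return (value, count, percent) rows, sorted by value."""
    counts = Counter(values)
    n = len(values)
    return [(value, count, 100 * count / n) for value, count in sorted(counts.items())]


def describe(values):
    """Hypothetical helper: central tendency and dispersion for one variable."""
    n = len(values)
    mean = sum(values) / n
    # Variance: the average of the squared deviations from the mean (divides by n).
    variance = sum((x - mean) ** 2 for x in values) / n
    return {
        "mode": statistics.mode(values),      # most frequently occurring value
        "median": statistics.median(values),  # midpoint of the sorted values
        "mean": mean,                         # arithmetic average
        "range": max(values) - min(values),   # highest minus lowest value
        "variance": variance,                 # expressed in squared units
        "std_dev": math.sqrt(variance),       # back in the original units
    }


if __name__ == "__main__":
    print("value  count  percent")
    for value, count, pct in frequency_distribution(contacts):
        print(f"{value:>5}  {count:>5}  {pct:>6.1f}%")
    print("with outlier:   ", describe(contacts))
    # The same data with the extreme score (22) removed.
    print("without outlier:", describe(contacts[:-1]))
```

With the single outlier (22) included, the mean is 4.0 while the median and mode are both 2; removing it brings the mean down to 2.0 and shrinks the range from 22 to 4. This illustrates why the median is often preferred for skewed interval- or ratio-level data.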
🧠 Chapter Review
Discussion Questions
- In your own words, describe how two distributions can have the same measures of central tendency but different measures of dispersion.
- Discuss why quantitative research is particularly susceptible to silencing the voices of Indigenous groups. Provide an example of a quantitative statistic you have read in the news or in another class that you feel misrepresented the lived experiences of Indigenous peoples in Canada, and explain what you might do to rectify this if you were reporting on the same social topic.
- What is one benefit and one limitation of using univariate graphs when trying to understand the distribution and central tendency of a single variable?
- Provide a hypothetical example of when an outlier should not be ignored as it provides meaningful information for your analysis.
References
Aldridge, J. (2015). Participatory research: Working with vulnerable groups in research and practice. Bristol University Press. https://doi.org/10.2307/j.ctt1t8933q
Bachman, R., & Paternoster, R. (2004). Statistics for criminology and criminal justice (2nd ed.). McGraw Hill.
Deckert, A. (2016). Criminologists, duct tape, and Indigenous peoples: Quantifying the use of silencing research methods. International Journal of Comparative and Applied Criminal Justice, 40(1), 43–62. https://doi.org/10.1080/01924036.2015.1044017
Fox, J., Levin, J., & Forde, D. R. (2013). Elementary statistics in criminal justice research (4th ed.). Prentice Hall.
Gau, J. M. (2019). Statistics for criminology and criminal justice (3rd ed.). Sage Publications.
Maxfield, M. G., & Babbie, E. R. (2017). Research decisions for criminal justice and criminology (8th ed.). Cengage Learning.
McQuaid, R. J., Bombay, A., McInnis, O. A., Humeny, C., Matheson, K., & Anisman, H. (2017). Suicide ideation and attempts among First Nations Peoples living on-reserve in Canada: The intergenerational and cumulative effects of Indian Residential Schools. The Canadian Journal of Psychiatry, 62(6), 422-430. https://doi.org/10.1177/0706743717702075
Morisano, D., Robinson, M., & Linklater, R. (2018). Conducting research with Indigenous peoples of Canada: Ethical considerations for the Centre for Addiction and Mental Health (CAMH) [PDF]. Centre for Addiction and Mental Health. https://www.camh.ca/-/media/files/conducting-research-indigenous-peoples-canada-pdf.pdf
Morisano, D., Robinson, M., Rush, B., & Linklater, R. (2024). Conducting research with Indigenous peoples in Canada: Ethical and policy considerations. Frontiers in Psychology, 14, 1-16. https://doi.org/10.3389/fpsyg.2023.1214121
National Inquiry into Missing and Murdered Indigenous Women and Girls. (2019). Reclaiming power and place: The final report of the National Inquiry into Missing and Murdered Indigenous Women and Girls [PDF]. The National Inquiry. https://www.mmiwg-ffada.ca/wp-content/uploads/2019/06/Final_Report_Vol_1a-1.pdf
Palys, T. S. (1997). Research decisions: Quantitative and qualitative perspectives (2nd ed.). Harcourt Canada.
Schulenberg, J. L. (2016). The dynamics of criminological research. Oxford University Press.
Statistics Canada. (2024, July 25). Number, rate and percentage changes in rates of homicide victims [Data set] (Table 35-10-0068-01). https://doi.org/10.25318/3510006801-eng
Statistics Canada. (2024, December 11). Homicide in Canada, 2023 [Infographic] (Catalogue no. 11-627-M). Canadian Centre for Justice and Community Safety Statistics. https://www150.statcan.gc.ca/n1/pub/11-627-m/11-627-m2024048-eng.htm
Ulluwishewa, R., Roskruge, N., Harmsworth, G., & Antaran, B. (2008). Indigenous knowledge for natural resource management: A comparative study of Maori in New Zealand and Dusun in Brunei Darussalam. GeoJournal, 73(4), 271-284. https://doi.org/10.1007/s10708-008-9198-9
Wilk, P., Maltby, A., & Cooke, M. (2017). Residential schools and the effects on Indigenous health and well-being in Canada: A scoping review. Public Health Reviews, 38(8), 1-23. https://doi.org/10.1186/s40985-017-0055-6