
14. Quantitative Data Analysis

Dr. Rochelle Stevenson

🎯 Learning Objectives

  • Understand the importance of levels of measurement in quantitative data analysis.
  • Describe the utility of frequency distributions.
  • Distinguish between descriptive and inferential statistics.
  • Distinguish between the three measures of central tendency: mode, median and mean.
  • Describe the utility of measures of dispersion.
  • Describe the different graphical formats used to display univariate distributions.
  • Explain the role quantitative data have played historically in justifying racist policies, including those targeting Indigenous peoples.

We are inundated with quantitative data on a daily basis – numbers are everywhere. We see important social issues being debated using numerical data in the public domain on social media and the news media, such as intimate partner violence, pandemic epidemiology (including COVID-19), fear of crime, immigration, correlates of offending, and many other important social topics. Numerical findings on these topics can be interpreted and used in inappropriate ways by laypersons as well as people who represent themselves as “authorities” in these media. The challenge is that when people are not well-versed in how the scientific process operates, they may incorrectly use research and numerical results as evidence for and therefore confirmation of their perspectives or positions based on a misunderstanding of what the statistics actually represent.

This chapter is a simplified overview of the process of quantitative data analysis for information gathered from experiments, surveys, content analysis, and other data represented by numbers. We start by discussing why understanding statistics is critical in everyday life, and we review some of the foundational concepts in statistics. We then discuss the process of data analysis, including data entry,  data cleaning, and univariate analyses. This chapter is structured to highlight the process of data analysis, from the point at which we have completed collecting our data on some interesting social phenomenon, to presenting data in a graphical format. We conclude by considering the importance of including Indigenous peoples and their voices and perspectives in the collection, analysis, and interpretation of numerical data, especially when conducting research involving Indigenous peoples and their communities.

 

Overview of Terminology in Quantitative Data Analysis

Although the process of operationalization is beyond the scope of this chapter, its final step, in which we assign numbers to the categories of our variables for the purpose of quantitative analysis, requires an understanding of levels of measurement (LOMs) (see the full discussion in chapter 6a). Recall that LOMs refer to how the attributes of our variables are related to one another: Are they simply different categories? Are they rank ordered? Do we know the distance between the ranks? Is there a true zero point? Determining the level of measurement of our variables involves an appraisal of how categories in each variable are coded with numbers and how these numbers are used in statistical tests. LOMs are critical because they determine the range of legitimate mathematical operations that can be performed on specific variables of interest (see Fox et al., 2013; Gau, 2019; Bachman & Paternoster, 2004). LOMs determine:

  1. The legitimate statistics that can be interpreted for your variables.
  2. The types of descriptive/univariate statistics that can be reported and the types of inferential tests that can be undertaken.

In other words, determining the LOM of variables is always the starting point prior to running any statistical analysis. Please refer to Table 14.1 for a review of the four LOMs: nominal, ordinal, interval and ratio.

📹 Stop and Take a Break!

Watch this video for a clear explanation of LOMs and their importance in the context of statistical analysis.

 

Table 14.1: Levels of Measurement Defined
Level of Measurement | Simplified Definition | Examples
Nominal | Numbers represent categories only. No ranking or meaningful distance between categories. | gender, ethnicity, colour
Ordinal | Numbers are categories with a natural ordering or ranking. No meaningful distance between categories. | education, opinion scales, socioeconomic status
Interval | Numbers are categories with a natural ordering and equal distance between the categories. Absence of a natural zero point. | IQ, temperature in degrees Celsius
Ratio | Numbers are categories with a natural ordering and equal distance between the categories. Presence of a natural zero point. | number of drinks in the past month, self-reported offences, final exam scores
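The link between a variable's LOM and the statistics that can legitimately be reported for it can be sketched as a small lookup. The Python below is an illustration only: the names `LOM`, `MINIMUM_LEVEL`, and `permitted_statistics` are hypothetical, and the mapping simply restates the pairings this chapter makes between each univariate statistic and the lowest level of measurement it needs.

```python
from enum import IntEnum


class LOM(IntEnum):
    """Levels of measurement, ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Lowest level of measurement each statistic needs, as described in this chapter.
MINIMUM_LEVEL = {
    "frequencies": LOM.NOMINAL,
    "mode": LOM.NOMINAL,
    "median": LOM.ORDINAL,
    "range": LOM.ORDINAL,
    "mean": LOM.INTERVAL,
    "variance": LOM.INTERVAL,
    "standard deviation": LOM.INTERVAL,
}


def permitted_statistics(level):
    """List the statistics that can be interpreted for a variable at `level`."""
    return [name for name, minimum in MINIMUM_LEVEL.items() if level >= minimum]


print(permitted_statistics(LOM.NOMINAL))
print(permitted_statistics(LOM.ORDINAL))
```

Higher levels inherit everything the lower levels allow, which is why the check is a simple "at least" comparison.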

 

🧠 Stop and Take a Break!

Test your knowledge by answering a few questions on what you have read so far.

 

Overview of the Process of Quantitative Data Analysis

Once we have collected our quantitative data and we are clear on the level of measurement of each variable, now what? Rather than calculate all the statistics by hand, we can use a statistical analysis program, such as the Statistical Package for the Social Sciences (SPSS), to run the analyses we need to address our research question. We can import our data directly into SPSS from online survey programs, such as Qualtrics and SurveyMonkey, or manually enter the data into the analysis program ourselves, which may be the case if we are using content analysis or experiments. But before we jump into the analysis, we need to define our variables and clean our data.

Say we are conducting a survey of university students asking about substance use. As part of the demographic information we collect, we ask about their gender in an open question, allowing the respondent to type in their gender rather than choose from a selected list. As respondents may type in any gender identity, part of defining our variable for gender is going through the responses and placing them into categories so that we can run analyses with gender as an independent variable. We would then assign a number to each category so that our computer program can work with the variable (e.g., woman = 1; transwoman = 2; non-binary = 3; and so on), remembering that gender is also a nominal variable and so the numbering of the categories does not represent any ordering or ranking. Part of data cleaning is assessing the data we have collected for errors. For example, what if a respondent offered that their gender was “orange”? This would not be a valid response to the question about gender identity, and so we would need to determine how we would code this response. We could code it as “no response,” as the person realistically did not respond to the question asked, or we could interpret the person’s response as preferring not to answer and code it that way. The point of data cleaning, including defining our variables, is that each respondent should have a value entered for each variable, even if that value is “missing data” for respondents who skipped that question.
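The coding step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a recommended procedure: the three codes come from the example in the text, while the missing-data code `99`, the function name `code_gender`, and the sample responses are hypothetical. Unrecognized answers are not silently discarded; they are flagged so that a researcher can decide how to code them.

```python
# Codes taken from the chapter's example; its "and so on" is left unfilled here.
GENDER_CODES = {"woman": 1, "transwoman": 2, "non-binary": 3}
NO_RESPONSE = 99  # hypothetical missing-data code, not from the chapter


def code_gender(raw):
    """Return (code, needs_review) for one typed-in response."""
    cleaned = (raw or "").strip().lower()
    if not cleaned:
        return NO_RESPONSE, False  # the question was skipped
    if cleaned in GENDER_CODES:
        return GENDER_CODES[cleaned], False
    # Unrecognized text: code provisionally as missing and flag for a human decision.
    return NO_RESPONSE, True


# Hypothetical raw responses, including the "orange" example from the text.
responses = ["Woman", " non-binary ", "orange", "", None]
coded = [code_gender(r) for r in responses]
print(coded)
```

Every respondent ends up with a value for the variable, which is the goal of data cleaning described above.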

After we confirm the accuracy of our data, we generally examine all our variables in summary tables called frequency distributions. These distributions display frequency counts (how many times each category of a particular variable appears in the data) so that researchers can easily interpret the data and identify any remaining data entry errors that were missed while entering or double-checking the data.

Let’s look at an example. Table 14.2 shows the frequency distribution of the variable of political affiliation in a study on perceptions of capital punishment in a sample of adults. We can see that there are seven missing values. Missing values can happen for several reasons. A survey respondent may skip a survey question because they didn’t want to answer the question. Or, the question (variable) may not be applicable to the respondent; for example, if a respondent replies that they do not have children, we would expect that they would skip a question about the ages of their children. Missing values may also be due to data entry errors. We can also see from Table 14.2 that there is one response that is not a legitimate value for our categories of political affiliation: the value 5. This is likely due to a data entry error. This frequency distribution enables us to identify and correct such errors before we start the data analysis process. Checking each variable for data accuracy using frequency distributions is a good practice, and programs like SPSS make it easy!

 

Table 14.2: Political Party Affiliation in a Sample of Adult Participants in a Study on Views on Capital Punishment, a Frequency Distribution with Data Entry Errors
Question: What party do you most identify with?
Response | Party | Frequency | Percent | Valid Percent | Cumulative Percent
Valid | Conservative | 127 | 51.2 | 52.7 | 52.7
Valid | Liberal | 51 | 20.6 | 21.2 | 73.9
Valid | NDP | 45 | 18.1 | 18.7 | 92.5
Valid | No Party | 17 | 6.9 | 7.1 | 99.6
Valid | 5 | 1 | 0.4 | 0.4 | 100.0
Valid | Total | 241 | 97.2 | 100.0 |
Missing | System | 7 | 2.8 | |
Total | | 248 | 100.0 | |
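The columns of a frequency distribution like Table 14.2 can be reproduced with a short Python sketch. The counts are taken from the table, but the numeric codes 1 to 4 for the four parties are an assumption made for illustration (the chapter does not state how the categories were coded), and `frequency_table` is a hypothetical helper rather than anything SPSS provides.

```python
from collections import Counter

# Assumed codebook: the chapter does not give the numeric codes for each party.
CODEBOOK = {1: "Conservative", 2: "Liberal", 3: "NDP", 4: "No Party"}


def frequency_table(responses, codebook):
    """Return (rows, n_missing, invalid_codes) for one coded variable."""
    valid = [r for r in responses if r is not None]
    n_total, n_valid = len(responses), len(valid)
    counts = Counter(valid)
    rows, running = [], 0
    for code in sorted(counts):
        running += counts[code]
        rows.append({
            "label": codebook.get(code, str(code)),
            "frequency": counts[code],
            # "Percent" uses all cases; "valid percent" drops missing cases.
            "percent": round(100 * counts[code] / n_total, 1),
            "valid_percent": round(100 * counts[code] / n_valid, 1),
            "cumulative_percent": round(100 * running / n_valid, 1),
        })
    invalid_codes = sorted(c for c in counts if c not in codebook)
    return rows, n_total - n_valid, invalid_codes


# Rebuild the raw data from the counts in Table 14.2 (None marks a missing value).
responses = [1] * 127 + [2] * 51 + [3] * 45 + [4] * 17 + [5] + [None] * 7
rows, n_missing, invalid_codes = frequency_table(responses, CODEBOOK)

for row in rows:
    print(row)
print("missing:", n_missing, "invalid codes:", invalid_codes)
```

The list of invalid codes is what lets us spot the stray value 5 before the analysis begins.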

Statistically Analyzing and Presenting Quantitative Data

Now that our quantitative data have been cleaned, errors have been addressed, and data have been presented in the form of frequency distributions, we are ready to consider the types of statistical analyses that can be performed on our data. The purpose of quantitative data analysis is to uncover patterns and relationships in large aggregations of data so that we can describe the data in meaningful ways.

Two types of statistics can be derived from numerical data. Descriptive statistics are statistics that describe one or more variables in our sample. We can discuss the percentages in categories of our variables (frequency distributions), typical or average cases (measures of central tendency), and the dispersion, or spread, of our variables (measures of dispersion). In other words, any conclusions based on descriptive analyses apply only to the specific participants or elements we have studied.

Conclusions based on inferential statistics go beyond the sample to say something about the population from which the sample was drawn (see Fox et al., 2013; Gau, 2019; Schulenberg, 2016). Inferential statistics involve formal hypothesis-testing procedures in making these inferences. Inferential statistics include bivariate (i.e., two variables) and multivariate (i.e., more than two variables) analyses. See Table 14.3 for the key features of each. This chapter focuses on univariate statistics. For more information about bivariate or multivariate analyses, any text authored by Neil J. Salkind can be a good place to start.

 

Table 14.3: Key Features of Descriptive and Inferential Statistics
Descriptive Statistics | Inferential Statistics
focus on describing the sample only | use the sample to make evidence-based statements about the population
often focus on one variable at a time (univariate); can also focus on more than one variable at a time | focus on two variables (bivariate) or more than two variables (multivariate)
common univariate statistics include: frequencies, mean, mode, median, range, standard deviation, variance | common bivariate statistical analyses include: correlation, t-tests, chi-square; common multivariate analyses include: analysis of variance (ANOVA), multivariate analysis of variance (MANOVA), linear regression

Descriptive Statistics: Univariate Analysis

Univariate analysis focuses on describing the patterns in one variable at a time. There are three categories of univariate statistics:

  1. Frequency distributions, which display frequencies and percentages;
  2. Measures of central tendency, which are the mean, median, and mode;
  3. Measures of dispersion, which include the range, standard deviation, and variance (see Gau, 2019; Maxfield & Babbie, 2018).

Frequency Distributions

The first way we can describe variables in our data is by frequency, or how often a value of a variable occurs in the data. Frequency distribution tables are a tool for identifying frequencies. Not only do these tables assist us with data cleaning and identifying errors, as discussed earlier, but they also provide information from which we can discuss the categories of our variables in meaningful ways. For example, let’s examine the data presented in Table 14.4 below, which looks at participants’ subjective well-being.

 

Table 14.4: Frequency Distribution of Subjective Well-Being
Subjective well-being
Response | Frequency | Percent | Valid Percent | Cumulative Percent
0 (Very dissatisfied) | 161 | 0.9 | 0.9 | 0.9
1 | 71 | 0.4 | 0.4 | 1.4
2 | 136 | 0.8 | 0.8 | 2.2
3 | 243 | 1.4 | 1.4 | 3.6
4 | 321 | 1.9 | 1.9 | 5.5
5 | 1326 | 7.8 | 7.8 | 13.3
6 | 1237 | 7.3 | 7.3 | 20.6
7 | 3011 | 17.8 | 17.8 | 38.4
8 | 4945 | 29.2 | 29.2 | 67.5
9 | 2199 | 13.0 | 13.0 | 80.5
10 (Very satisfied) | 3306 | 19.5 | 19.5 | 100.0
Total | 16956 | 100.0 | 100.0 |

Note: Data drawn from the 2015 General Social Survey on Time Use.

 

In the first column, we can see the categories for our “subjective well-being” variable. This ordinal-level variable ranges from a minimum value of 0, “very dissatisfied,” to a high of 10, “very satisfied.” The “frequency” column presents the number of participants in each category. We can see that 1,326 participants rated their subjective well-being at 5, meaning “neither satisfied nor dissatisfied.” The “percent” column provides the percentage of all respondents, including those with missing values, who are in each category. In this sample, 7.8% of respondents rated their subjective well-being at 5. The “valid percent” column provides the percentage of respondents in each category after respondents with missing values are excluded (the two columns are the same here because these data do not include missing values). If our data have missing values, it is common to use the “valid percent” column when discussing the categories of a variable. The “cumulative percent” column is a running total of the valid percentages, adding each category’s percentage to those of all the categories before it. It is not often used but can be helpful when identifying the median (defined later in this chapter) and some other features of our data.

These frequency tables allow us to discuss specific features of our data. In the frequency distribution in Table 14.4, we can see that the majority of respondents have rated their subjective well-being as 7 or greater, meaning that people are generally satisfied with their own well-being. In addition, we can see that the most frequently reported satisfaction level is 8 (29.2%), and less than 6% of the sample reported being dissatisfied with their subjective well-being.

After we examine our univariate frequency distributions, we can take a look at descriptive statistics for our sample data: measures of central tendency, which include the mean, median and mode, and measures of dispersion, which include the standard deviation, range, and variance (see Fox & Levin, 2013; Gau, 2019). The selection of the most appropriate descriptive statistics to use is dependent on the LOMs.

 

Measures of Central Tendency

Measures of central tendency describe the most common, typical or “average” response in our data (Palys, 1997). The mode is the most frequently occurring category in a variable and is best used to represent or describe nominal-level data (Schulenberg, 2016). For example, Table 14.5 shows the variable of whether respondents feel like they are constantly under stress. Of the two categories (yes and no), the most frequently occurring category, or the mode, is “no” (11,794 respondents). We can then say that the majority of respondents (69.3%) do not feel like they are constantly under stress.

 

 

Table 14.5: Perceptions of Feeling Constantly Under Stress, a Frequency Distribution
Response | Frequency | Percent | Valid Percent | Cumulative Percent
Yes | 5221 | 30.7 | 30.7 | 30.7
No | 11794 | 69.3 | 69.3 | 100.0
Total | 17015 | 100.0 | 100.0 |

Note: Data drawn from the 2015 General Social Survey on Time Use.

 

The measure of central tendency frequently used for ordinal data is called the median. The median is the category that falls on the 50th percentile or the case that divides the frequency distribution in half (see Schulenberg, 2016), as depicted in Figure 14.1. While we can also use the median to describe interval and ratio-level data, the median is the most appropriate for ordinal data because the data are ordered from the lowest to the highest values.

 

Figure 14.1: Conceptual Representation of the Median [Image description for Figure 14.1]

 

When identifying the median, we first locate its position, which is the case in the middle of the distribution once the cases are ordered from lowest to highest. Once we find the position of the median, we need to identify the category associated with that case (see Schulenberg, 2016). Consider the data in Table 14.4, which present the level of subjective well-being. Looking at the cumulative percent column, we can see that the median (or the 50% point) is located among the respondents who rated their subjective well-being as 8, because the cumulative percentage passes 50% in that category (rising from 38.4% to 67.5%). So, we can state that the median level of subjective well-being among the sample studied is at the higher end of the satisfaction rankings.
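The same reading of the cumulative percent column can be sketched in Python. The frequencies below are those reported in Table 14.4; the function names `median_category` and `modal_category` are hypothetical and chosen only for this illustration. The sketch walks up the ordered categories and stops at the first one where the running share of cases reaches 50%.

```python
# Frequencies from Table 14.4, keyed by rating (0 = very dissatisfied, 10 = very satisfied).
frequencies = {0: 161, 1: 71, 2: 136, 3: 243, 4: 321, 5: 1326,
               6: 1237, 7: 3011, 8: 4945, 9: 2199, 10: 3306}


def median_category(freqs):
    """Return the first ordered category whose cumulative share reaches 50%."""
    total = sum(freqs.values())
    running = 0
    for category in sorted(freqs):
        running += freqs[category]
        # Simplification: if the share hits exactly 50% with an even number of
        # cases, the median would fall between this category and the next one.
        if running / total >= 0.5:
            return category


def modal_category(freqs):
    """Return the category with the highest frequency (the mode)."""
    return max(freqs, key=freqs.get)


print("median rating:", median_category(frequencies))
print("modal rating:", modal_category(frequencies))
```

For these data the median and the mode fall in the same category, which will not always be the case.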

The mean is the measure of central tendency best used to describe interval- or ratio-level variables (Schulenberg, 2016) and the one we are most familiar with. When you ask your professor for the class average for your exam, you are requesting the mean. The formula for calculating the mean is shown in Figure 14.2.

 

    \begin{align*} \overline{x} = \frac{\Sigma x}{N} \end{align*}

Figure 14.2: Formula for Calculating the Mean

 

To calculate the mean, we sum the exam scores (each x) for all students in the class and divide by the number of students in the class (N). When we have a lot of extremely low or high scores, sometimes referred to as outliers, the mean becomes a less accurate representation of interval- and ratio-level data. For example, if most students scored between 70% and 90% on their exam but several students scored 0%, this would bring the mean down substantially. In cases where we have extreme scores that skew the mean, it is helpful to also provide the median to better describe our sample.
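The effect of outliers on the mean can be seen in a short Python sketch that uses the standard library's `statistics` module. The exam scores are invented for illustration and do not come from any real class.

```python
import statistics

# Hypothetical exam scores (percentages) for a small class.
scores = [72, 75, 78, 80, 82, 85, 88, 90]

# The same class with three students who scored 0%.
scores_with_zeros = scores + [0, 0, 0]

mean_before = statistics.mean(scores)
median_before = statistics.median(scores)
mean_after = statistics.mean(scores_with_zeros)
median_after = statistics.median(scores_with_zeros)

print(f"without zeros: mean = {mean_before:.2f}, median = {median_before:.2f}")
print(f"with zeros:    mean = {mean_after:.2f}, median = {median_after:.2f}")
```

In this made-up example the three zeros pull the mean down by more than 20 points, while the median moves by only 3, which is why reporting the median alongside the mean gives a fairer picture of the class.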

Figure 14.3 shows a graphical way to understand how the mean, median, and mode describe data. It is an example of perfectly symmetrical data with no outliers, which does not often happen in the real world. These data are symmetrical because the two halves of the distribution are mirror images of each other; as a result, the mean, median, and mode are the same.

 


Figure 14.3: Symmetrical Distribution with the Same Measures of Central Tendency

 

Measures of Dispersion

Measures of dispersion are complementary to measures of central tendency (Palys, 1997). In addition to knowing the centre of the data, we often want to know how much the data vary around that centre, such as by knowing what the lowest and highest scores are. Consider Figure 14.4, where the blue distribution represents Class A’s exam scores (A) and the red distribution represents Class B’s exam scores (B). Although both distributions have the same mode, median and mean, the blue distribution of exam scores has much more dispersion – or spread – around its mean than the red distribution. This means that students in Class A had more high and low scores as compared to the students in Class B, whose scores clustered closer to the class average. Measures of dispersion allow us to describe how much our sample varies around its measure of central tendency. All measures of dispersion discussed here require at least ordinal-level variables.

 


Figure 14.4: Distributions with the Same Measures of Central Tendency and Different Dispersions

 

The most basic measure of dispersion is referred to as the range. The range represents the difference between the highest score and lowest score in our data. For example, if the lowest age in our sample of criminology students is 18 and the highest age is 24, our range is 6 years.

When we have data that are measured at the interval or ratio level, we can assess dispersion – or the “spread” – around our mean by using the variance and the standard deviation. While the details and formulas of these measures are beyond the scope of this text and will be explored further in your statistical analysis courses later on, it is important here to reiterate that the way we measure the dispersion around the mathematical average, or mean, will depend on the level of measurement of the data in question. The question ultimately becomes how different each “score” or participant is from the average score or participant. You will see variance-related concepts used in inferential tests in future courses.
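Although the formulas are left for later courses, a brief Python sketch can show what these measures report. The two sets of exam scores are hypothetical and were chosen so that both classes share the same mean. The sketch uses `statistics.pvariance` and `statistics.pstdev`, which divide by the number of cases; the sample versions (`statistics.variance` and `statistics.stdev`) divide by one fewer and would give slightly larger values.

```python
import statistics

# Hypothetical exam scores: both classes have a mean of 70.
class_a = [50, 60, 70, 80, 90]   # widely spread scores
class_b = [66, 68, 70, 72, 74]   # scores clustered near the mean


def describe(scores):
    """Return the mean, range, variance, and standard deviation of the scores."""
    return {
        "mean": statistics.mean(scores),
        "range": max(scores) - min(scores),
        "variance": statistics.pvariance(scores),
        "std_dev": statistics.pstdev(scores),
    }


summary_a = describe(class_a)
summary_b = describe(class_b)
print("Class A:", summary_a)
print("Class B:", summary_b)
```

Both classes have the same centre, but every measure of dispersion is larger for Class A, mirroring the wider distribution in Figure 14.4.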

Figure 14.5 shows positively skewed data and negatively skewed data. When we have a positively skewed variable, the mean will be higher than the median, and the median will be higher than the mode. This occurs because the mean is pulled higher by outliers on the positive side of the distribution (higher scores). When we have a negatively skewed variable, the mean will be lower than the median, and the median will be lower than the mode. This occurs because the mean is pulled lower by outliers on the negative side of the distribution (lower scores) (see Fox et al., 2013).

 


Figure 14.5: Asymmetrical Distributions: A Positively (a) and Negatively (b) Skewed Distribution
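The ordering of the mean, median, and mode in skewed data can be checked with a small Python sketch. Both data sets are invented for illustration: the first has one extreme high score and the second has one extreme low score.

```python
import statistics

# Hypothetical scores with a single high outlier (positive skew).
positively_skewed = [1, 1, 1, 2, 2, 3, 4, 10]

# Hypothetical scores with a single low outlier (negative skew).
negatively_skewed = [1, 7, 8, 9, 9, 10, 10, 10]


def central_tendency(values):
    """Return (mode, median, mean) for a list of scores."""
    return statistics.mode(values), statistics.median(values), statistics.mean(values)


pos_mode, pos_median, pos_mean = central_tendency(positively_skewed)
neg_mode, neg_median, neg_mean = central_tendency(negatively_skewed)

print("positive skew:", pos_mode, pos_median, pos_mean)
print("negative skew:", neg_mode, neg_median, neg_mean)
```

In the first data set the outlier pulls the mean above the median and the mode; in the second, the order is reversed.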

Presenting Quantitative Data

It is often useful to audiences to present our univariate distributions in a graphical format instead of in tables. Univariate charts and graphs are a powerful way to visualize and communicate data. Some of these basic options for univariate distributions include pie charts, bar charts, and line graphs (see Palys, 1997; Schulenberg, 2016).

Pie charts are generally used to represent variables that have a small number of categories, usually measured at the nominal or ordinal level. Figure 14.6 illustrates solved homicides in Canada in 2024, broken down by the relationship between the accused and the victim.  As we can see from the pie chart, the most common relationship category is acquaintance (39%), followed by intimate relationship (19%).  A pie chart allows us to quickly see how the categories are distributed.

 

Figure 14.6: Solved Homicides in Canada, 2024, by type of accused-victim relationship. Statistics Canada. (2025, July 22). Number of victims of solved homicides, by type of accused-victim relationship [Data set] (Table 35-10-0073-01). https://doi.org/10.25318/3510007301-eng [Image description for Figure 14.6]

Bar charts are generally used to represent nominal, ordinal, and grouped interval data with several categories. The example in Figure 14.7 is a bar chart of the number of homicide victims in Canada over a 5-year period, from 2019 to 2023. We can quickly see that the number of homicide victims increased annually until peaking in 2022 with 882 victims. Like pie charts, bar charts are useful to visually compare differences across categories.

 

Figure 14.7: Homicide victimization in Canada, 2019-2023 [Image description for Figure 14.7]

Line graphs are used to represent variables measured at the interval or ratio level. The example in Figure 14.8 is a line graph that shows the homicide victimization rates in Canada by gender and Indigenous identity over a period of 13 years. Looking at the graph, we can quickly see that Indigenous women and girls are victims of homicide at a much higher rate than non-Indigenous women and that men and boys have the second-highest rate of homicide among the three groups.

Figure 14.8: Homicide victimization, by gender and Indigenous identity, Canada, 2009 to 2021 (rate per 100,000 population). Data Source: Statistics Canada. (2023). Canadian Centre for Justice and Community Safety Statistics, Homicide Survey. [Image description for Figure 14.8]

The sharing of stark findings, such as those presented in Figure 14.8, without culturally sensitive interpretation can perpetuate harmful stereotypes, erode trust in research, and allow misinterpretation to justify discriminatory policies. So, in addition to pointing out the disproportionately high rate at which Indigenous women and girls are victims of homicide, the findings should be couched in a discussion of systemic challenges faced by Indigenous peoples that may influence these rates. These challenges may include historical trauma, the forced removal of children to residential schools, and the reserve system. Additionally, systemic factors should be mentioned, such as the discretionary decisions of law enforcement and prosecutors in responding to the victimization of Indigenous peoples (see National Inquiry Into Missing and Murdered Indigenous Women and Girls, 2019). The way in which we interpret quantitative data such as these as they pertain to Indigenous peoples is also addressed in the next section.

 

🧠 Stop and Take a Break!

Test your knowledge by answering a few questions on what you have read so far.

 

 

Including Indigenous Voices with the Use of Western Models of Science and Statistics

Western methods of science have been criticized as a tool of colonialism that has aided in the perpetuation of Western systems of dominance over Indigenous groups in Canada. The focus of Western science on aggregated data, an outsider perspective, and expertise as the domain of the academic researcher has silenced other sources of knowing, treating them as unworthy of consideration in traditional science (Deckert, 2016). Historically, the scientific method has been used within imperial systems to justify the subordinate position of Indigenous peoples. Policies influenced by quantitative data have discriminated against and harmed these groups through overtly racist institutions, such as residential schools and the reserve system (Deckert, 2016) and thus have led to intergenerational trauma for Indigenous communities (McQuaid et al., 2017). Moreover, in current times, research on these communities has been used to justify other forms of governmental intervention, such as the removal of children from Indigenous parents and the disproportionate incarceration of Indigenous peoples (Deckert, 2016; Wilk et al., 2017). These actions have resulted in resentment from Indigenous communities because they are over-researched and yet silenced within academic research (Morisano et al., 2018). Statistical data, which are a cornerstone of Western science, are often viewed as objective by both academics and the general population. Think about discussions you have had in class, with other students and with friends and family about crime rates, alcohol and drug use, high-school graduation rates, and other social problems with regard to Indigenous peoples. Now think about how many of those conversations have incorporated Indigenous voices on these issues.

Bearing all this in mind, it is important that quantitative researchers exercise caution when undertaking research with Indigenous communities. It should be apparent from the information provided in previous quantitative and qualitative methods chapters that quantitative research is particularly susceptible to silencing voices of Indigenous groups. This has led to a re-evaluation of the research process, particularly criminological research, involving Indigenous peoples (Morisano et al., 2018). More recent work on Indigenous knowledge has demonstrated the necessity and usefulness of including Indigenous voices in the applied sciences, including resource development and ecological management (see Ulluwishewa et al., 2008). The importance of Indigenous knowledge and perspectives has been acknowledged in ethical frameworks for research in Indigenous communities, data collection and analysis and ownership and the dissemination of data from research on Indigenous persons and their communities (Morisano et al., 2018, 2024). As discussed at greater length in chapter 9 and chapter 12, participatory action research is an emerging framework for research undertaken in collaboration with vulnerable groups that includes Indigenous perspectives in all stages of the research. Moreover, participatory action research seeks to empower communities by involving Indigenous community members in all steps of the research process (e.g., conceptualization to data collection and analysis) and the dissemination of results (see Aldridge, 2015). It is also worth mentioning that it is critical to include Indigenous voices in quantitative research in criminology because we are analyzing social phenomena. At a fundamental level, including Indigenous perspectives increases the validity of quantitative measures and the validity of the interpretations of the quantitative data gathered from, on and with Indigenous peoples.

 

Conclusion

The overall goal of data analysis, regardless of the nature of those data, is to be able to condense large amounts of information and present it in a way that can be understood and communicated to an audience. In this chapter, we examined this process when the data are numerical. These numerical data are typically collected in studies employing a deductive approach using methods such as experiments or surveys, where there is often a large number of participants. While the focus of this chapter was on descriptive statistics – frequency distributions, measures of central tendency, and measures of dispersion – we also mentioned the role of inferential statistics in social science research. The ways we can effectively present these data in a graphical format were also reviewed; these visual representations of large amounts of data can be effective but also potentially misleading. The chapter concluded with emphasis on the importance of understanding how to collect, analyze and interpret numerical data accurately as these data, in their aggregate form, have a unique potential to be misused and to misinform audiences, a concern that is paramount when researching and reporting on the lived experiences of Indigenous peoples.

 

✅ Summary

  • Quantitative data are a part of our everyday lives and are particularly susceptible to being misinterpreted or used in inappropriate ways. As such, a basic understanding of statistics is critical.
  • Before we can begin to think about what quantitative analysis technique is most suitable for the data we have, we must first understand the level of measurement (LOM) of each of our variables. The LOM determines the legitimate univariate statistics we can calculate, and it influences the types of inferential statistics we can use later in the analysis process.
  • Prior to analyzing our data, it is necessary to enter, clean and examine our data for errors.
  • Descriptive statistics summarize our sample data.
  • Univariate statistics include frequency distributions, measures of central tendency and measures of dispersion. The choice of the proper measure is determined by the level of measurement of the single variable.

 

🖊️ Key Terms

aggregate data (or aggregations of data): data from multiple sources that are summarized and compiled to examine trends or to conduct statistical analysis. They are the sum of their component parts or smaller units of data.

bar chart: a chart used to represent univariate distributions in a graphical format instead of a table. Bar charts are generally best to represent nominal, ordinal and grouped interval data with several categories.

bivariate statistics: statistical tests that assess the extent of the relationship between two variables.

descriptive statistics: statistics that focus on our sample data alone, such as frequency distributions, measures of central tendency and measures of dispersion for one or more variables in a study.

frequency distribution: a table that visually displays the distribution of one variable through aggregated counts and percentages of participants in each category of the variable. These distributions allow researchers to easily interpret the data and identify any potential data entry errors.

inferential statistics: statistics that use our sample data to make inferences about the larger population from which our sample is drawn. They are grounded in probability theory.

line graph: a chart used to represent univariate distributions in a graphical format instead of a table. Line graphs are generally best for representing interval- or ratio-level data.

mean: a measure of central tendency that represents the arithmetic average of an interval or ratio variable.

measures of central tendency: statistics that provide a measure of where participants cluster, the typical response or the average response in a variable. The three measures of central tendency are mode, median and mean.

measures of dispersion: statistics that provide information on how responses on a variable differ from or are dispersed around a measure of central tendency. They complement the measure of central tendency. The three measures of dispersion are range, variance and standard deviation.

median: a measure of central tendency that represents the midpoint in a distribution of an ordinal variable.

mode: a measure of central tendency that represents the most frequently occurring category of a nominal variable.

multivariate statistics: statistical tests that assess the extent of the relationship between three or more variables. Typically, we examine the impact of multiple independent variables on a single dependent variable.

outlier: an extremely high or low score in a distribution that can make the mean an inaccurate measure of central tendency when dealing with interval- or ratio-level data. Outliers positively or negatively skew data.

pie chart: a chart used to represent univariate distributions in a graphical format instead of a table. Pie charts are generally best to represent a small number of categories at the nominal or ordinal level.

range: a measure of dispersion that represents the difference between the highest and lowest observed value in a variable.

standard deviation: a measure of dispersion that represents the square root of the average of the squared deviations from the mean. Taking the square root transforms the value back into the original units of the variable so it can be interpreted.

univariate analysis: statistics on one variable in our sample, such as the mean and standard deviation.

variance: a measure of dispersion that represents the average of the squared deviations from the mean. This value is in squared units so it is difficult to interpret as a measure of dispersion.
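The measures of central tendency and dispersion defined above can be computed with Python's standard `statistics` module. The sketch below is illustrative only: the two samples are invented, and it assumes the "average of the squared deviations" definition given here, which divides by the number of cases (`pvariance`, `pstdev`). Many textbooks and software packages instead divide by n − 1 for sample data (`statistics.variance`, `statistics.stdev`), so results may differ slightly from those produced elsewhere.

```python
import statistics

# Two hypothetical samples (invented for illustration), e.g. ages in years.
sample_a = [28, 29, 30, 30, 31, 32]
sample_b = [10, 20, 30, 30, 40, 50]

def describe(values):
    """Summarize central tendency and dispersion for interval/ratio data."""
    return {
        "mode": statistics.mode(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "range": max(values) - min(values),
        # pvariance divides by N: the average of the squared deviations.
        "variance": statistics.pvariance(values),
        # pstdev is the square root of pvariance, back in the original units.
        "std_dev": statistics.pstdev(values),
    }

summary_a = describe(sample_a)
summary_b = describe(sample_b)

for name, summary in (("A", summary_a), ("B", summary_b)):
    print(name, summary)
```

Both samples share a mode, median and mean of 30, yet the range is 4 for the first and 40 for the second. This is why a measure of dispersion is reported alongside a measure of central tendency.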

 

🧠 Chapter Review

 

Discussion Questions

  1. In your own words, describe how two distributions can have the same measures of central tendency but different measures of dispersion.
  2. Discuss why quantitative research is particularly susceptible to silencing voices of Indigenous groups. Provide an example of a quantitative statistic you have read in the news or in another class that you feel misrepresented the lived experiences of Indigenous peoples in Canada and what you might do to rectify this if you were reporting on the same social topic.
  3. What is one benefit and one limitation of using univariate graphs when trying to understand the distribution and central tendency of a single variable?
  4. Provide a hypothetical example of when an outlier should not be ignored as it provides meaningful information for your analysis.

 


References

Aldridge, J. (2015). Participatory research: Working with vulnerable groups in research and practice. Bristol University Press. https://doi.org/10.2307/j.ctt1t8933q

Bachman, R., & Paternoster, R. (2004). Statistics for criminology and criminal justice (2nd ed.). McGraw Hill.

Deckert, A. (2016). Criminologists, duct tape, and Indigenous peoples: Quantifying the use of silencing research methods. International Journal of Comparative and Applied Criminal Justice, 40(1), 43–62. https://doi.org/10.1080/01924036.2015.1044017

Fox, J., Levin, J., & Forde, D. R. (2013). Elementary statistics in criminal justice research (4th ed.). Prentice Hall.

Gau, J. M. (2019). Statistics for criminology and criminal justice (3rd ed.). Sage Publications.

Maxfield, M. G., & Babbie, E. R. (2017). Research decisions for criminal justice and criminology (8th ed.). Cengage Learning.

McQuaid, R. J., Bombay, A., McInnis, O. A., Humeny, C., Matheson, K., & Anisman, H. (2017). Suicide ideation and attempts among First Nations Peoples living on-reserve in Canada: The intergenerational and cumulative effects of Indian residential schools. The Canadian Journal of Psychiatry, 62(6), 422–430. https://doi.org/10.1177/0706743717702075

Morisano, D., Robinson, M., & Linklater, R. (2018). Conducting research with Indigenous peoples of Canada: Ethical considerations for the Centre for Addiction and Mental Health (CAMH) [PDF]. Centre for Addiction and Mental Health. https://www.camh.ca/-/media/files/conducting-research-indigenous-peoples-canada-pdf.pdf

Morisano, D., Robinson, M., Rush, B., & Linklater, R. (2024). Conducting research with Indigenous peoples in Canada: Ethical and policy considerations. Frontiers in Psychology, 14, 1–16. https://doi.org/10.3389/fpsyg.2023.1214121

National Inquiry into Missing and Murdered Indigenous Women and Girls. (2019). Reclaiming power and place: The final report of the national inquiry into missing and murdered Indigenous women and girls [PDF]. The National Inquiry. https://www.mmiwg-ffada.ca/wp-content/uploads/2019/06/Final_Report_Vol_1a-1.pdf

Palys, T. S. (1997). Research decisions: Quantitative and qualitative perspectives (2nd ed.). Harcourt Canada.

Schulenberg, J. L. (2016). The dynamics of criminological research. Oxford University Press.

Statistics Canada. (2024, July 25). Number, rate and percentage changes in rates of homicide victims [Data set] (Table 35-10-0068-01). https://doi.org/10.25318/3510006801-eng

Statistics Canada. (2024, December 11). Homicide in Canada, 2023 [Infographic] (Catalogue no. 11-627-M). Canadian Centre for Justice and Community Safety Statistics. https://www150.statcan.gc.ca/n1/pub/11-627-m/11-627-m2024048-eng.htm

Ulluwishewa, R., Roskruge, N., Harmsworth, G., & Antaran, B. (2008). Indigenous knowledge for natural resource management: A comparative study of Maori in New Zealand and Dusun in Brunei Darussalam. GeoJournal, 73(4), 271–284. https://doi.org/10.1007/s10708-008-9198-9

Wilk, P., Maltby, A., & Cooke, M. (2017). Residential schools and the effects on Indigenous health and well-being in Canada: A scoping review. Public Health Reviews, 38(8), 1–23. https://doi.org/10.1186/s40985-017-0055-6
