Descriptive measures in statistics: a complete guide and examples

  • Descriptive measures summarize large data sets using values of position (mean, median, mode) and dispersion (range, variance, standard deviation, coefficient of variation).
  • The mean, median, and mode describe the typical value from different perspectives, with the median and mode being more robust when there are extreme data or highly skewed distributions.
  • Measures of dispersion quantify the variability of data around the central position and are essential for correctly interpreting any mean or representative value.
  • Combining measures of position and dispersion allows for a better understanding of the behavior of a variable and provides a basis for data-driven decisions in academic, professional, and everyday contexts.

Descriptive measures are the foundation of descriptive statistics, and although they may sound very technical, we use them daily without realizing it. Every time you talk about "the average grade on an exam," say that "most people prefer...," or remark that "there's a big difference between salaries," you are using ideas related to these measures.

In any data analysis, whether in the social sciences, economics, health, or the day-to-day running of a business, we need tools that help us summarize, organize, and understand large amounts of information. That is exactly what descriptive measures do: they condense many numbers into a few easy-to-interpret values, letting us see at a glance what is happening with the variables we are studying.

What are descriptive measures and what are they used for?

When we talk about descriptive measures in statistics, we are referring to a set of numerical values calculated from the data that summarize different aspects of its behavior. These measures allow us to answer questions such as: What is the "typical" value? How are the data grouped? Is there much variability? How similar are the observations to each other?

To organize this task, descriptive measures are usually grouped into several categories: measures of position or central tendency, which indicate where the data "is located"; measures of dispersion, which tell us how far apart the values are from each other; and other complementary measures, such as relative coefficients or some measures of shape, which can help refine the analysis.

In many university notes and manuals, descriptive measures are accompanied by simple numerical examples, such as small sets of grades, salaries, or quantities sold. This is because the best way to understand what each measure contributes is to see how it is calculated and what specific information it gives us about a set of observations.

Measures of position or central tendency

Measures of position or central tendency tell us which value can be considered most representative of a data set. They give us an idea of where the observations are "concentrated," that is, what the typical or central value would be. The most commonly used are the mean, the median, and the mode, each with its own peculiarities and recommended uses.

These measures apply to both ungrouped data (lists of individual values) and data grouped in frequency tables. Many university statistics materials begin by explaining the simplest version, using a set of values X1, X2, …, Xn, and then generalize to more complex cases. The important thing is to understand what each measure represents and when it makes sense to use it.

Sample mean or average

The sample mean is probably the best-known measure of central tendency. If we have n data points X1, X2, …, Xn, the sample mean is their arithmetic mean, that is, the sum of all the values divided by the total number of observations. It is the familiar "average" that is mentioned when discussing grades, salaries, waiting times, or any other quantitative magnitude.

Formally, if we call the sample mean X̄, it is calculated as X̄ = (X1 + X2 + … + Xn) / n. Many notes present this definition in a concise form, but the essential point is that the mean distributes the "total quantity" evenly among all individuals. For example, if we add up the grades of a group of students and divide by the number of students, we obtain an average grade that indicates the overall performance of the group.
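As a minimal sketch of this formula, the function below adds up the values and divides by their count. The name `sample_mean` and the grades are made up for illustration; they do not come from any particular data set.

```python
def sample_mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical exam grades for a group of five students.
grades = [6.0, 7.5, 8.0, 5.5, 9.0]
average = sample_mean(grades)
print(average)
```

The total of 36 points is shared out evenly among the five students, which is exactly the "even distribution" reading of the mean.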

The sample mean has very useful properties, but also a significant drawback: it is very sensitive to extreme values. If one value in a data set is much larger or smaller than the rest, it will "pull" the mean up or down. Therefore, in situations with very pronounced outliers, the mean may not be the most appropriate measure to summarize the overall position.

Sample mode

The sample mode is the value that appears most frequently in the data set, that is, the one with the highest absolute frequency. Unlike the mean, it is not obtained through arithmetic operations, but by counting how many times each possible value is observed.

An important detail is that the mode may not exist or may not be unique. It is possible for all values to be distinct, with none repeated; in that case, the distribution is said to be amodal. It is also possible for two or more values to share the same maximum frequency; then we speak of bimodal or multimodal distributions, and several modes are considered simultaneously.

The mode is especially useful when we work with qualitative or categorical variables, where calculating a numerical average doesn't make sense. For example, if we want to know the majority preference among several response options, the mode tells us which category is chosen most frequently by the people surveyed.
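The counting procedure can be sketched as follows. The helper `sample_modes` is a hypothetical name, and returning an empty list when no value repeats is simply the "amodal" convention described above; other conventions exist.

```python
from collections import Counter


def sample_modes(values):
    """Return every value that reaches the highest frequency.

    Following the convention in the text, an empty list is returned when
    no value is repeated (the amodal case).
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # all values are distinct: no mode
    return [value for value, count in counts.items() if count == top]


# Hypothetical survey answers (a categorical variable).
answers = ["tea", "coffee", "coffee", "juice"]
print(sample_modes(answers))          # one mode
print(sample_modes([1, 1, 2, 2, 3]))  # two modes: bimodal
print(sample_modes([4, 5, 6]))        # no repeated value: amodal
```

Because the function only counts occurrences, it works equally well for numbers and for category labels.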

Sample median

The sample median is the value that occupies the central position when the data are ordered from smallest to largest. To obtain it, the observations are first sorted, and then the middle value is located. If the number of observations is odd, the median is exactly that central value; if it is even, it is defined as the average of the two central values.
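The odd and even cases can be written out directly. This is a minimal sketch; `sample_median` is an illustrative name and the inputs are made up.

```python
def sample_median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: a single central value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two central values


print(sample_median([9, 3, 7]))     # odd number of observations
print(sample_median([8, 3, 5, 9]))  # even number of observations
```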

In many simple examples, these steps are explained very visually: the data are listed, ordered, and then crossed out from the outside in until only the value in the center remains. This makes it clear that the median is the point that divides the sample into two halves: 50% of the data fall below it and the other 50% above.

An interesting detail is that, unlike the mean, the median is not strongly affected by extreme values. If we add a very large or very small value to an ordered set, the median may barely change, while the mean shifts significantly. Therefore, when outliers exist or the distribution is highly skewed, the median is often considered a more robust measure of central tendency.
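A small experiment makes this robustness concrete. The salaries below are hypothetical, and the sketch relies on `mean` and `median` from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical monthly salaries, then the same list with one extreme value added.
salaries = [1200, 1300, 1350, 1400, 1500]
with_outlier = salaries + [9000]

mean_before, median_before = mean(salaries), median(salaries)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(mean_before, median_before)  # both 1350
print(mean_after, median_after)    # mean jumps to 2625, median moves only to 1375
```

A single extreme salary nearly doubles the mean, while the median shifts by just 25.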

Example of mean, median, and mode

Teaching materials often include an example similar to this: the data 3, 5, 7, 7, 8, 9 are given, and the main measures of central tendency are requested. The mean is calculated by adding all the values and dividing by the total number of data points: (3 + 5 + 7 + 7 + 8 + 9) / 6 = 39 / 6 = 6.5. Thus, the sample mean is 6.5, which is the average value of the set.

Looking at the ordered list, we see that the central values are the third and fourth, which in this case are 7 and 7. The median is obtained by averaging these two values, which gives a sample median equal to 7. Since the two central values are the same, the median coincides exactly with that value, which separates the lower 50% of the data from the upper 50%.

Regarding the mode, simply look at which value appears most often. In this example, the number 7 appears twice, while the others appear only once. Therefore, the sample mode is also 7. In this data set, the median and mode happen to coincide, but this is not always the case.
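The three results of this example can be checked with Python's standard `statistics` module. This is only a verification sketch; the variable names are illustrative.

```python
from statistics import mean, median, mode

data = [3, 5, 7, 7, 8, 9]

example_mean = mean(data)      # 39 / 6
example_median = median(data)  # average of the 3rd and 4th ordered values
example_mode = mode(data)      # most frequent value

print(example_mean, example_median, example_mode)
```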

Simple numerical examples like this are very common in presentations of descriptive statistics and serve to reinforce the idea that each measure of position adds a distinct nuance. The mean describes the overall balance, the median the central position resistant to extreme values, and the mode the most frequent category or value.

Measures of dispersion

In addition to knowing the typical or central value, it is also important to know how scattered the data are around that position. A data set in which all the observations lie very close together is not the same as one with large differences between them. Measures of dispersion quantify precisely that variability, that is, the degree of dispersion of the data with respect to its central tendency.

In the most basic statistics notes, several measures of dispersion are presented: the range, the variance, the standard deviation, and the coefficient of variation. Each one provides a different way of seeing how much the values of a variable differ, and they are used in a complementary way to get a complete view of the behavior of the data.

Range

The range is probably the simplest measure of dispersion. It is defined as the difference between the maximum and minimum values of the data. If we order the observations from smallest to largest as X1 ≤ X2 ≤ … ≤ Xn, the range is calculated as R = Xn − X1. This formula frequently appears in teaching materials as the first approach to the idea of dispersion.

The range gives us immediate information about the total spread of the data: it indicates how many units separate the smallest value from the largest. However, it has a significant limitation: it depends only on these two extreme values and does not take into account how the intermediate data are distributed. Therefore, although useful as a quick indicator, it is usually complemented by other, more informative measures.
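The limitation is easy to see in a short sketch. The function name `data_range` and both data sets are made up for illustration.

```python
def data_range(values):
    """Range R: largest value minus smallest value."""
    return max(values) - min(values)


# Two hypothetical data sets with the same extremes but different interiors.
clustered = [0, 5, 5, 5, 10]   # most values sit in the middle
polarized = [0, 0, 5, 10, 10]  # most values sit at the extremes

range_clustered = data_range(clustered)
range_polarized = data_range(polarized)
print(range_clustered, range_polarized)  # identical ranges
```

Both sets have a range of 10 even though their values are arranged very differently, which is why the range alone is rarely enough.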

Variance

The variance is a measure of dispersion that takes into account all observations and is based on the squared differences from the mean. Intuitively, it measures how far, on average, the data deviate from the sample mean, with those distances squared. The greater the variance, the more dispersed the values are around the mean; the smaller the variance, the more concentrated they are.

In statistics notes, variance is presented with its formal definition, but when it comes to calculating it, a more convenient equivalent formula is usually used, which avoids having to work directly with all the squared differences. For sample data, variance is usually denoted s² and, although not all the theoretical details are covered at the introductory level, it is emphasized that it is essentially a mean of squared deviations from the mean.
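The two routes can be compared in a short sketch. It assumes the common sample convention of dividing by n − 1, which the text does not specify (some notes divide by n instead), and both function names are hypothetical.

```python
def variance_definition(values):
    """s² from the definition: sum of squared deviations from the mean, over n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)


def variance_shortcut(values):
    """Equivalent shortcut: (sum of squares - n * mean²) over n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    m = sum(values) / n
    return (sum(x * x for x in values) - n * m * m) / (n - 1)


data = [3, 5, 7, 7, 8, 9]
s2_definition = variance_definition(data)  # 23.5 / 5
s2_shortcut = variance_shortcut(data)      # (277 - 253.5) / 5
print(s2_definition, s2_shortcut)
```

Both give 4.7 for this data set (up to floating-point rounding). The shortcut is handy for hand calculation, but with very large values it can lose precision in floating-point arithmetic, so the definitional form is the safer choice in code.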

One aspect to consider is that the variance is expressed in squared units. For example, if the variable is measured in euros, the variance is measured in squared euros, which has no intuitive, direct interpretation. This is one of the reasons why the standard deviation is often preferred, as it returns the measurement to the original units.

Standard deviation

The standard deviation, also called the typical deviation, is the square root of the variance. Thus, if the sample variance is s², the standard deviation is s. Taking the square root returns us to the original units of the variable, making this measure much easier to interpret in practice.

In university teaching of statistics, it is emphasized that the standard deviation should not be confused with the standard error. Although the names are similar, the standard error is a concept from inferential statistics related to the variability of an estimator, while the standard deviation measures the dispersion of the data within a sample. This distinction is often stressed with warnings like "be careful not to confuse them" to avoid misunderstandings.

The standard deviation tells us, approximately, how far the data points are from the mean. In many distributions, a significant proportion of the observations lie within the interval between the mean minus one standard deviation and the mean plus one standard deviation. Therefore, this measure is fundamental when assessing the stability or variability of data in numerous contexts.
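As a rough illustration, the sketch below computes s with `statistics.stdev` (which uses the n − 1 convention) and counts how many observations fall within one standard deviation of the mean. The data are the small teaching example used earlier, and the share obtained applies only to this data set, not to distributions in general.

```python
from statistics import mean, stdev

data = [3, 5, 7, 7, 8, 9]

center = mean(data)
s = stdev(data)  # square root of the sample variance, in the data's own units

lower, upper = center - s, center + s
inside = [x for x in data if lower <= x <= upper]
share_inside = len(inside) / len(data)

print(inside, share_inside)
```

Here four of the six observations lie inside the interval from the mean minus s to the mean plus s.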

Coefficient of variation

The coefficient of variation is a measure of relative dispersion that relates the standard deviation to the mean. Although the exact expression may vary depending on the convention, it is usually defined as the ratio between the standard deviation and the mean, often multiplied by 100 to express it as a percentage. It is a useful tool when we want to compare the variability of variables that have very different averages or different units.

For example, if we compare salaries in two sectors with different scales, one sector might have a larger standard deviation in absolute terms but a lower coefficient of variation, indicating that, proportionally, its salaries are more concentrated around their mean. In this sense, the coefficient of variation is particularly useful for analyzing dispersion in relative terms, beyond the specific units of measurement.
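The salary comparison can be sketched with invented numbers. The function `coefficient_of_variation` is a hypothetical helper that uses the percentage convention (standard deviation divided by the mean, times 100), and both sectors are made up.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Standard deviation relative to the mean, expressed as a percentage."""
    m = mean(values)
    if m == 0:
        raise ValueError("the coefficient of variation is undefined when the mean is 0")
    return stdev(values) / m * 100


# Hypothetical monthly salaries in two sectors with different pay scales.
sector_a = [900, 1000, 1100]
sector_b = [4750, 5000, 5250]

sd_a, sd_b = stdev(sector_a), stdev(sector_b)
cv_a = coefficient_of_variation(sector_a)
cv_b = coefficient_of_variation(sector_b)

print(sd_a, sd_b)  # sector B has the larger standard deviation
print(cv_a, cv_b)  # but the smaller coefficient of variation
```

Sector B's standard deviation is 250 against 100 for sector A, yet its coefficient of variation is about 5% against about 10%, so its salaries are proportionally more concentrated around the mean.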

Relationship between measures of position and dispersion

In any minimally rigorous data analysis, it makes no sense to focus solely on the mean or the standard deviation. The usual approach is to combine measures of position and dispersion to obtain a richer view. For example, we can have two groups with the same mean but very different variances; in that case, although the central value is the same, the reality of each group is very different.
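Two invented groups show why the mean alone is not enough. The sketch uses `mean` and `variance` from the standard `statistics` module, where `variance` is the sample variance with an n − 1 denominator.

```python
from statistics import mean, variance

# Hypothetical scores for two groups.
group_a = [49, 50, 51]  # very homogeneous
group_b = [10, 50, 90]  # very heterogeneous

mean_a, mean_b = mean(group_a), mean(group_b)
var_a, var_b = variance(group_a), variance(group_b)

print(mean_a, mean_b)  # same central value
print(var_a, var_b)    # very different spread
```

Both groups average 50, but the variances are 1 and 1600, so reporting only the mean would hide almost everything that distinguishes them.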

Measures of position tell us around which values the data move, while measures of dispersion clarify how the data are distributed around that location. A high mean with little dispersion indicates a homogeneous group with high values; a similar mean with high dispersion reflects large differences between individual observations. Therefore, in practice, statistical tables and summaries usually include at least one measure of central tendency and one of variability.

The following structure frequently appears in the descriptive statistics materials of various universities: definition of the variable, frequency table, graphical representation, calculation of the mean, median, and mode, and then the range, variance, standard deviation, and coefficient of variation. This sequence of work reflects the idea that the descriptive measures form a coherent block that allows the data to be understood from several complementary angles.
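The numerical part of that sequence can be gathered into a single summary. This is only a sketch: `describe` is a hypothetical helper, the variance and standard deviation use the n − 1 convention of the standard `statistics` module, and `multimode` returns every value when none is repeated instead of reporting "no mode".

```python
from statistics import mean, median, multimode, stdev, variance


def describe(values):
    """Collect the measures of position and dispersion for a list of numbers."""
    avg = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return {
        "mean": avg,
        "median": median(values),
        "modes": multimode(values),
        "range": max(values) - min(values),
        "variance": variance(values),
        "std_dev": s,
        # Coefficient of variation as a percentage; left as None if the mean is 0.
        "cv_percent": s / avg * 100 if avg != 0 else None,
    }


summary = describe([3, 5, 7, 7, 8, 9])
for name, value in summary.items():
    print(name, value)
```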

Furthermore, as the subject is explored in greater depth, other related measures can be introduced, such as percentiles, quartiles, or measures of skewness and kurtosis, which broaden the descriptive analysis. However, the core concepts explained in the initial topics of statistics typically consist precisely of the measures of position and dispersion that we have discussed.

This entire set of tools has a very clear practical use: facilitating data-driven decision-making. Whether in a university, in public administration, or in a private company, knowing the position and variability of a key variable helps to correctly interpret the available information and avoid hasty conclusions.

Descriptive statistics, and within it descriptive measures, allow us to transform endless lists of numbers into a few manageable and intuitive indicators. Thanks to the mean, median, mode, range, variance, standard deviation, and relative coefficients, we can gain a very precise understanding of how data behaves, detect patterns, identify anomalies, and lay the groundwork for more advanced analyses if needed.
