What is Standard Deviation?
Standard deviation (SD) is a statistic that measures how dispersed the data points in a dataset are. It is denoted σ (lowercase Greek sigma) for a population and s for a sample. In essence, it is the square root of the variance, where the variance is the average squared difference between each data point and the dataset mean (average). Because of the square root, the standard deviation is expressed in the same units as the data.
A higher standard deviation indicates that the data points are spread more widely around the mean, implying more variability within the data. In contrast, a lower standard deviation indicates that the data points are clustered closer to the mean, implying less variability.
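To make the contrast concrete, here is a minimal sketch using Python's built-in statistics module. The two datasets (tight and wide) are made up for illustration: both have a mean of 50, but their spreads differ sharply.

```python
import statistics

# Two hypothetical datasets with the same mean but different spread.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

# pstdev treats each list as a complete population (divides by N).
tight_sd = statistics.pstdev(tight)
wide_sd = statistics.pstdev(wide)

print(f"tight: mean={statistics.mean(tight)}, SD={tight_sd:.2f}")
print(f"wide:  mean={statistics.mean(wide)}, SD={wide_sd:.2f}")
```

The means are identical, so the mean alone cannot tell the two datasets apart; the standard deviation (about 1.41 versus about 28.28) does.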
Why is Standard Deviation Important?
Standard deviation offers valuable insights for data analysis:
- Understanding data distribution: It goes beyond the mean to give a more complete picture of how data is spread around it.
- Statistical analysis: It is the square root of the variance and is used to calculate other statistical measures, such as z-scores, which are used in hypothesis testing and inferential statistics.
- Comparing datasets: Standard deviations can be used to compare the variability of data across multiple datasets.
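As one illustration of the statistical-analysis point, a z-score expresses how many standard deviations a value lies from the mean. The sketch below assumes the ten values used in this article's code examples form a complete population; the z_score helper is a hypothetical name introduced only for this example.

```python
import statistics

data = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7]

mean = statistics.mean(data)
sd = statistics.pstdev(data)  # assumption: data is the whole population


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z_scores = [z_score(x, mean, sd) for x in data]

for x, z in zip(data, z_scores):
    print(f"value={x:>2}  z={z:+.2f}")
```

Values above the mean get positive z-scores and values below it get negative ones, which makes observations from datasets with different scales comparable.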
How to Calculate Standard Deviation?
There are two formulas for calculating standard deviation, depending on whether you have data for the entire population or a sample:
Population Standard Deviation (σ):
This formula is used when you have data for all elements in a population.
σ = √(Σ(xi - μ)² / N)
Explanation of the Formula:
- Σ (capital sigma) denotes summation: add up the expression that follows it over all elements.
- xi (x-subscript-i) represents each individual value in the population data set.
- μ (mu) represents the population mean (calculated by summing the values in the population and dividing by the total number of elements in the population – N).
- N represents the total number of elements in the population.
Steps for Calculating Population Standard Deviation:
- Calculate the population mean (μ): Sum the values in your population data and divide by the number of elements (N).
- Compute squared deviations from the mean: For each data point (xi) in the population, subtract the population mean (μ) and square the result. This represents the squared deviation of that particular element from the mean.
- Sum the squared deviations: Add up the squared deviations calculated for all elements in the population.
- Calculate population variance (σ²): Divide the sum of squared deviations from the previous step by the total number of elements in the population (N). This gives you the average squared deviation from the mean, representing the population variance (σ²).
- Calculate population standard deviation (σ): Take the square root of the population variance (σ²) to obtain the population standard deviation (σ).
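As a worked illustration of these steps, consider a small made-up population of eight values: 2, 4, 4, 4, 5, 5, 7, 9. The numbers are an assumption chosen so that the arithmetic comes out evenly; they are not data from this article.

```latex
\begin{aligned}
\mu &= \frac{2+4+4+4+5+5+7+9}{8} = \frac{40}{8} = 5 \\
\sum_{i=1}^{N} (x_i - \mu)^2 &= 9+1+1+1+0+0+4+16 = 32 \\
\sigma^2 &= \frac{32}{8} = 4 \\
\sigma &= \sqrt{4} = 2
\end{aligned}
```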
Sample Standard Deviation (s):
This formula is used when you only have data from a subset (sample) of the population; the result is often used as an estimate of the population standard deviation.
s = √(Σ(xi - x̄)² / (n - 1))
Explanation of the Formula:
- Σ (capital sigma) denotes summation: add up the expression that follows it over all elements.
- xi (x-subscript-i) represents each individual value in the sample data set.
- x̄ (x-bar) represents the sample mean (average of the sample data).
- n represents the total number of elements in the sample.
Steps for Calculating Sample Standard Deviation:
- Calculate the sample mean (x̄): Sum the values in your sample and divide by the number of elements (n).
- Compute squared deviations from the mean: For each data point (xi) in the sample, subtract the sample mean (x̄) and square the result. This represents the squared deviation of that particular element from the mean.
- Sum the squared deviations: Add up the squared deviations calculated for all elements in the sample.
- Calculate sample variance (s²): Divide the sum of squared deviations from the previous step by the number of elements in the sample minus one (n – 1). This gives you the sample variance (s²), which is slightly larger than the plain average of the squared deviations.
- Calculate sample standard deviation (s): Take the square root of the sample variance (s²) to obtain the sample standard deviation (s).
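As a worked illustration of these steps, suppose the eight made-up values 2, 4, 4, 4, 5, 5, 7, 9 are a sample drawn from a larger population. The numbers are an assumption for illustration only; the final two results are rounded to three decimal places.

```latex
\begin{aligned}
\bar{x} &= \frac{2+4+4+4+5+5+7+9}{8} = \frac{40}{8} = 5 \\
\sum_{i=1}^{n} (x_i - \bar{x})^2 &= 9+1+1+1+0+0+4+16 = 32 \\
s^2 &= \frac{32}{8-1} = \frac{32}{7} \approx 4.571 \\
s &= \sqrt{\frac{32}{7}} \approx 2.138
\end{aligned}
```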
Population Standard Deviation:
import numpy as np
data = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7]
# Mean of all values
population_mean = sum(data) / len(data)
# Squared deviation of each value from the mean
squared_deviations = [(x - population_mean) ** 2 for x in data]
# Total of the squared deviations
sum_of_squared_deviations = sum(squared_deviations)
# Population variance: divide by N, the full population size
population_variance = sum_of_squared_deviations / len(data)
# Standard deviation: square root of the variance
population_std = np.sqrt(population_variance)
print("Population:")
print(f" Mean: {population_mean:.2f}")
print(f" Standard Deviation: {population_std:.2f}")
Output:
Population:
 Mean: 9.40
 Standard Deviation: 3.04
Explanation:
- Calculate Population Mean: We calculate the mean by adding all the data points and dividing by the number of data points (using sum and division).
- Compute Squared Deviations: We iterate through each data point (x) and subtract the population mean. Then, we square the difference to get the squared deviation from the mean. A list comprehension creates the list of squared deviations for all data points.
- Sum the Squared Deviations: We add up all the squared deviations from the previous step (using sum).
- Calculate Population Variance: We divide the sum of squared deviations by the number of data points (population size) to get the population variance.
- Calculate Population Standard Deviation: We take the square root of the population variance using numpy.sqrt.
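The same steps can be wrapped into a small reusable function. This is a sketch rather than a definitive implementation: the name population_std and the empty-input check are assumptions added for illustration, and it swaps numpy.sqrt for the standard library's math.sqrt so that it runs without NumPy.

```python
import math
import statistics


def population_std(values):
    """Population standard deviation (divides by N) of a non-empty list of numbers."""
    if not values:
        raise ValueError("population_std requires at least one value")
    mean = sum(values) / len(values)
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    return math.sqrt(variance)


data = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7]
result = population_std(data)

# Cross-check against the standard library's population standard deviation.
assert math.isclose(result, statistics.pstdev(data))
print(f"Population standard deviation: {result:.2f}")
```

For everyday use, statistics.pstdev gives the same result directly; writing the function out by hand is mainly useful for seeing each step.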
Sample Standard Deviation:
import numpy as np
data = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7]
# Mean of the sampled values
sample_mean = sum(data) / len(data)
# Squared deviation of each value from the sample mean
squared_deviations = [(x - sample_mean) ** 2 for x in data]
# Total of the squared deviations
sum_of_squared_deviations = sum(squared_deviations)
# Sample variance: divide by n - 1 instead of n
sample_variance = sum_of_squared_deviations / (len(data) - 1)
# Standard deviation: square root of the variance
sample_std = np.sqrt(sample_variance)
print("\nSample:")
print(f" Mean: {sample_mean:.2f}")
print(f" Standard Deviation: {sample_std:.2f}")
Output:
Sample:
 Mean: 9.40
 Standard Deviation: 3.20
Explanation (Sample):
The steps are similar to the population case, but with one key difference:
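- Unbiased Sample Variance: In the sample standard deviation calculation, we divide the sum of squared deviations by (n – 1) instead of n. Deviations measured from the sample mean are, on average, smaller than deviations from the true population mean, so dividing by n would underestimate the population variance. Dividing by (n – 1), known as Bessel's correction, removes this bias.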
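The effect of the (n – 1) divisor can be checked exactly on a tiny case. The sketch below assumes a made-up population of five values and enumerates every possible sample of size 2 drawn with replacement, then averages the two variance formulas over all of those samples.

```python
import itertools
import statistics

population = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical tiny population
true_variance = statistics.pvariance(population)

# Every ordered sample of size 2 drawn with replacement (5 * 5 = 25 samples).
samples = list(itertools.product(population, repeat=2))

# Average of the "divide by n" variance across all samples.
avg_divide_by_n = sum(statistics.pvariance(s) for s in samples) / len(samples)

# Average of the "divide by n - 1" variance across all samples.
avg_divide_by_n_minus_1 = sum(statistics.variance(s) for s in samples) / len(samples)

print(f"True population variance:   {true_variance:.2f}")
print(f"Average when dividing by n: {avg_divide_by_n:.2f}")
print(f"Average when dividing by n - 1: {avg_divide_by_n_minus_1:.2f}")
```

Averaged over all samples, dividing by n gives 1.0, half the true variance of 2.0, while dividing by (n – 1) recovers 2.0. Note that this property holds for the variance; the square root of the sample variance is still a slightly biased estimate of the population standard deviation.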
Note: Although calculating the population standard deviation might be feasible for smaller populations, for larger populations, obtaining data for all elements can be impractical. In such cases, sample standard deviation (using data from a representative sample) is a more practical approach to estimate the population standard deviation.
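In practice, both versions are available in Python's standard library, so the choice comes down to which function you call. The sketch below uses the article's ten values and assumes only the built-in statistics and math modules.

```python
import math
import statistics

data = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7]
n = len(data)

sample_std = statistics.stdev(data)       # sample formula: divides by n - 1
population_std = statistics.pstdev(data)  # population formula: divides by n

# The two results differ only by the factor sqrt(n / (n - 1)).
assert math.isclose(sample_std, population_std * math.sqrt(n / (n - 1)))

print(f"stdev  (n - 1): {sample_std:.2f}")
print(f"pstdev (n):     {population_std:.2f}")
```

Be aware that NumPy's np.std divides by n by default (ddof=0); passing ddof=1 gives the sample standard deviation.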
Reference: Variance | Brilliant Math & Science Wiki
Hi, I am Vishal Jaiswal. I have about a decade of experience working in MNCs like Genpact, Savista, and Ingenious. Currently, I am working at EXL as a senior quality analyst. Using my writing skills, I want to share the experience I have gained and help as many people as I can.