Standard deviation (SD) is a statistic that measures how dispersed the data points in a dataset are. It is denoted σ (sigma) for populations and s for samples. In essence, it is the square root of the average squared difference between each data point and the mean (average) of the dataset.
A higher standard deviation means the data points are spread over a wider range around the mean, implying more variability within the data. In contrast, a lower standard deviation indicates that the data points are clustered closer to the mean, implying less variability.
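To make the contrast concrete, here is a small sketch using Python's standard-library statistics module. The two datasets are made up for illustration: both have a mean of 50, but one is tightly clustered and the other is spread out.

```python
import statistics

# Two hypothetical datasets with the same mean (50) but different spread
clustered = [48, 49, 50, 51, 52]
spread_out = [10, 30, 50, 70, 90]

for name, values in [("clustered", clustered), ("spread_out", spread_out)]:
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    print(f"{name}: mean={mean:.1f}, sd={sd:.2f}")

# clustered: mean=50.0, sd=1.41
# spread_out: mean=50.0, sd=28.28
```

The means are identical, so only the standard deviation reveals how differently the two datasets behave.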
Standard deviation offers valuable insights for data analysis:
There are two formulas for calculating standard deviation, depending on whether you have data for the entire population or a sample:
This formula is used when you have data for all elements in a population.
σ = √(Σ(xi - μ)² / N)
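Explanation of the Formula:
σ is the population standard deviation.
Σ means "sum over all data points".
xi is each individual data point.
μ is the population mean.
N is the number of data points in the population.
Steps for Calculating Population Standard Deviation:
1. Calculate the population mean (μ).
2. Subtract the mean from each data point (xi - μ).
3. Square each difference.
4. Add up the squared differences.
5. Divide the sum by N to get the population variance.
6. Take the square root of the variance.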
This formula is used when you only have data from a subset (sample) of the population; it is often used as an estimate of the population standard deviation.
s = √(Σ(xi - x̄)² / (n - 1))
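Explanation of the Formula:
s is the sample standard deviation.
Σ means "sum over all data points in the sample".
xi is each individual data point.
x̄ is the sample mean.
n is the number of data points in the sample.
Steps for Calculating Standard Deviation (Sample):
1. Calculate the sample mean (x̄).
2. Subtract the mean from each data point (xi - x̄).
3. Square each difference.
4. Add up the squared differences.
5. Divide the sum by n - 1 to get the sample variance.
6. Take the square root of the variance.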
Population Standard Deviation:
import numpy as np
data = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7]
# Mean of the whole population
population_mean = sum(data) / len(data)
# Squared deviation of each data point from the mean
squared_deviations = [(x - population_mean) ** 2 for x in data]
sum_of_squared_deviations = sum(squared_deviations)
# Population variance: divide by the number of data points (N)
population_variance = sum_of_squared_deviations / len(data)
# Standard deviation is the square root of the variance
population_std = np.sqrt(population_variance)
print("Population:")
print(f" Mean: {population_mean:.2f}")
print(f" Standard Deviation: {population_std:.2f}")
Population:
Mean: 9.40
Standard Deviation: 3.04
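Explanation:
We calculate the population mean from the data (using sum and division).
We iterate through each data point (x) and subtract the population mean. Then, we square the difference to get the squared deviation from the mean. List comprehension is used to create a list of squared deviations for all data points.
We add up the squared deviations (using sum).
We divide that total by the number of data points to get the population variance.
Finally, we take the square root of the variance with numpy.sqrt to get the standard deviation. For a single value like this, math.sqrt from the standard library would work equally well.
Sample Standard Deviation: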
import numpy as np
data = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7]
# Mean of the sample
sample_mean = sum(data) / len(data)
# Squared deviation of each data point from the sample mean
squared_deviations = [(x - sample_mean) ** 2 for x in data]
sum_of_squared_deviations = sum(squared_deviations)
# Sample variance: divide by n - 1 instead of n
sample_variance = sum_of_squared_deviations / (len(data) - 1)
# Standard deviation is the square root of the variance
sample_std = np.sqrt(sample_variance)
print("\nSample:")
print(f" Mean: {sample_mean:.2f}")
print(f" Standard Deviation: {sample_std:.2f}")
Sample:
Mean: 9.40
Standard Deviation: 3.20
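As a cross-check on the two hand-rolled calculations, the same values can be obtained from Python's standard-library statistics module. This is a minimal sketch, assuming the same ten data points: pstdev treats the data as a whole population, and stdev treats it as a sample.

```python
import statistics

data = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7]

pop_sd = statistics.pstdev(data)    # divides by N
sample_sd = statistics.stdev(data)  # divides by n - 1

print(f"Population SD: {pop_sd:.2f}")  # Population SD: 3.04
print(f"Sample SD: {sample_sd:.2f}")   # Sample SD: 3.20
```

With NumPy, np.std(data) and np.std(data, ddof=1) give the same two numbers; np.std divides by N unless ddof is set.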
Explanation (Sample):
The steps are similar to the population case, but with one key difference: the sum of squared deviations is divided by n - 1 (len(data) - 1) instead of n.
Note: Although calculating the population standard deviation might be feasible for smaller populations, for larger populations, obtaining data for all elements can be impractical. In such cases, sample standard deviation (using data from a representative sample) is a more practical approach to estimate the population standard deviation.
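The reason for dividing by n - 1 when estimating from a sample can be illustrated with a small simulation. The sketch below runs under assumed settings that are not part of the lesson itself: a made-up population of the integers 1 to 100, samples of size 5 drawn with replacement, and a fixed random seed. It averages the variance estimate over many samples, once dividing by n and once by n - 1, and compares both with the true population variance.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

population = list(range(1, 101))  # hypothetical population
true_variance = statistics.pvariance(population)

n = 5            # sample size (assumed for illustration)
trials = 20_000  # number of samples to draw
divide_by_n = []
divide_by_n_minus_1 = []
for _ in range(trials):
    sample = random.choices(population, k=n)  # sampling with replacement
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
    divide_by_n.append(ss / n)
    divide_by_n_minus_1.append(ss / (n - 1))

avg_n = statistics.fmean(divide_by_n)
avg_n_minus_1 = statistics.fmean(divide_by_n_minus_1)
print(f"True population variance:          {true_variance:.1f}")
print(f"Average estimate, divide by n:     {avg_n:.1f}")
print(f"Average estimate, divide by n - 1: {avg_n_minus_1:.1f}")
```

Dividing by n tends to underestimate the spread, because the deviations are measured from the sample's own mean rather than the true population mean. Dividing by n - 1 corrects this for the variance. The exact numbers printed depend on the seed and the settings chosen.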
Reference: Variance | Brilliant Math & Science Wiki
Hi, I am Vishal Jaiswal. I have about a decade of experience working in MNCs such as Genpact, Savista, and Ingenious. Currently I am working at EXL as a senior quality analyst. Using my writing skills, I want to share the experience I have gained and help as many people as I can.