Standard deviation (SD) is a key statistic that measures how spread out the data points in a dataset are. It is written as σ (sigma) for a population and s for a sample. In essence, it is the square root of the average squared difference between each data point and the mean of the dataset; that average squared difference is itself called the variance.
A higher standard deviation means the data points are more spread out, implying more variability within the data. In contrast, a lower standard deviation indicates that the data points are clustered closer to the mean, implying less variability.
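To make the contrast concrete, here is a small sketch using Python's built-in statistics module. The two datasets are made up for illustration: both have a mean of 10, but one is tightly clustered and the other is widely spread. statistics.pstdev treats each list as a complete population.

```python
import statistics

# Two hypothetical datasets with the same mean (10) but different spread
tight = [9, 10, 10, 11]
wide = [2, 6, 14, 18]

results = {}
for name, values in (("tight", tight), ("wide", wide)):
    # pstdev treats each list as a complete population (divides by N)
    results[name] = (statistics.mean(values), statistics.pstdev(values))
    mean, sd = results[name]
    print(f"{name}: mean = {mean}, standard deviation = {sd:.2f}")
```

The means are identical, yet the standard deviation is about 0.71 for the tight set and about 6.32 for the wide one, which is exactly the difference in spread that the mean alone cannot reveal.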
Standard deviation offers valuable insights for data analysis:
There are two formulas for calculating standard deviation, depending on whether you have data for the entire population or a sample:
This formula is used when you have data for all elements in a population.
σ = √(Σ(xᵢ − μ)² / N)
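Explanation of the Formula: σ is the population standard deviation, Σ means "sum over all data points", xᵢ is each individual value, μ is the population mean, and N is the number of values in the population.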
Steps for Calculating Population Standard Deviation:
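As an illustration, here are the steps worked by hand for the ten values used in the Python example in this article (10, 8, 13, 9, 11, 14, 6, 4, 12, 7): compute the mean, square each deviation from it, sum the squares, divide by N, and take the square root.

```latex
\begin{aligned}
\mu &= \frac{10 + 8 + 13 + 9 + 11 + 14 + 6 + 4 + 12 + 7}{10} = \frac{94}{10} = 9.4 \\
\sum_{i=1}^{N} (x_i - \mu)^2 &= 0.36 + 1.96 + 12.96 + 0.16 + 2.56 + 21.16 + 11.56 + 29.16 + 6.76 + 5.76 = 92.4 \\
\sigma^2 &= \frac{92.4}{10} = 9.24 \\
\sigma &= \sqrt{9.24} \approx 3.04
\end{aligned}
```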
This formula is used when you only have data from a subset (sample) of the population, and it is commonly used to estimate the population standard deviation.
s = √(Σ(xᵢ − x̄)² / (n − 1))
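Explanation of the Formula: s is the sample standard deviation, xᵢ is each value in the sample, x̄ is the sample mean, and n is the number of values in the sample. Note that the sum of squared deviations is divided by n − 1 rather than n.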
Steps for Calculating Standard Deviation (Sample):
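Treating the same ten values as a sample, the steps are identical up to the sum of squared deviations; only the divisor changes (n − 1 = 9 instead of N = 10):

```latex
\begin{aligned}
\bar{x} &= \frac{94}{10} = 9.4 \\
\sum_{i=1}^{n} (x_i - \bar{x})^2 &= 92.4 \\
s^2 &= \frac{92.4}{10 - 1} = \frac{92.4}{9} \approx 10.27 \\
s &= \sqrt{\frac{92.4}{9}} \approx 3.20
\end{aligned}
```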
Population Standard Deviation:
import numpy as np

data = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7]
# Mean of all values in the population
population_mean = sum(data) / len(data)
# Squared deviation of each value from the mean
squared_deviations = [(x - population_mean) ** 2 for x in data]
sum_of_squared_deviations = sum(squared_deviations)
# Population variance divides by N (the number of values)
population_variance = sum_of_squared_deviations / len(data)
population_std = np.sqrt(population_variance)  # math.sqrt would work equally well here
print("Population:")
print(f" Mean: {population_mean:.2f}")
print(f" Standard Deviation: {population_std:.2f}")
Population:
Mean: 9.40
Standard Deviation: 3.04
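If you only need the result, Python's built-in statistics module can compute the same figures and serves as a check on the manual calculation. This sketch uses statistics.pvariance and statistics.pstdev, which both divide by N. (NumPy's np.std also divides by N by default, since its ddof parameter defaults to 0.)

```python
import statistics

data = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7]

# Both functions treat the data as the entire population (divide by N)
population_variance = statistics.pvariance(data)
population_std = statistics.pstdev(data)

print(f"Population variance: {population_variance:.2f}")
print(f"Population standard deviation: {population_std:.2f}")
```

Both values agree with the step-by-step version: a variance of 9.24 and a standard deviation of about 3.04.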
Explanation:
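1. We calculate the population mean by dividing the sum of the data (sum) by the number of data points (len).
2. For each data point (x), we subtract the population mean and then square the difference to get its squared deviation from the mean. A list comprehension builds the list of squared deviations for all data points.
3. We add up the squared deviations (sum).
4. We divide that total by the number of data points, N, to get the population variance.
5. We take the square root of the variance with numpy.sqrt to get the population standard deviation.
Sample Standard Deviation: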
import numpy as np

data = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7]
# Mean of the sample values
sample_mean = sum(data) / len(data)
# Squared deviation of each value from the sample mean
squared_deviations = [(x - sample_mean) ** 2 for x in data]
sum_of_squared_deviations = sum(squared_deviations)
# Sample variance divides by n - 1 instead of n
sample_variance = sum_of_squared_deviations / (len(data) - 1)
sample_std = np.sqrt(sample_variance)  # math.sqrt would work equally well here
print("\nSample:")
print(f" Mean: {sample_mean:.2f}")
print(f" Standard Deviation: {sample_std:.2f}")
Sample:
Mean: 9.40
Standard Deviation: 3.20
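A common pitfall when switching to a library is the divisor. statistics.stdev divides by n − 1, but np.std divides by N unless you pass ddof=1. This sketch computes all three so the difference is visible on the same data.

```python
import statistics

import numpy as np

data = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7]

# statistics.stdev treats the data as a sample (divides by n - 1)
stdlib_sample_std = statistics.stdev(data)
# np.std divides by N by default; ddof=1 switches the divisor to n - 1
numpy_sample_std = np.std(data, ddof=1)
# Leaving ddof at its default gives the population value instead
numpy_default_std = np.std(data)

print(f"statistics.stdev: {stdlib_sample_std:.2f}")
print(f"np.std(ddof=1):   {numpy_sample_std:.2f}")
print(f"np.std (default): {numpy_default_std:.2f}")
```

The first two lines should match the sample result (about 3.20), while the default np.std call reproduces the population result (about 3.04).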
Explanation (Sample):
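The steps are similar to the population case, but with one key difference: when calculating the variance, the sum of squared deviations is divided by n − 1 (len(data) - 1 in the code) instead of N. This adjustment, known as Bessel's correction, compensates for measuring deviations from the sample mean rather than the true population mean, which would otherwise make the variance estimate too small on average.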
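One way to see why n − 1 is used is to average the variance estimate over every possible sample. This sketch uses a tiny hypothetical population, [1, 2, 3, 4], and enumerates all 16 ordered samples of size 2 drawn with replacement, using exact fractions to avoid rounding noise. Averaged over all samples, dividing by n − 1 reproduces the population variance exactly, while dividing by n underestimates it. (This unbiasedness holds for the variance; taking the square root to get s still leaves a small bias in the standard deviation itself.)

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]  # hypothetical population for illustration


def mean(xs):
    # Exact mean as a Fraction
    return Fraction(sum(xs), len(xs))


def sum_sq_dev(xs):
    # Sum of squared deviations from the mean of xs
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


# True population variance (divide by N)
pop_var = sum_sq_dev(population) / len(population)

n = 2
# Every ordered sample of size n drawn with replacement (4 ** 2 = 16 samples)
samples = list(product(population, repeat=n))

# Average estimate when dividing by n versus by n - 1
avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print("population variance:    ", pop_var)
print("average, divide by n:   ", avg_div_n)
print("average, divide by n - 1:", avg_div_n_minus_1)
```

Here the population variance is 5/4; the divide-by-n average comes out at 5/8, and the divide-by-(n − 1) average matches 5/4.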
Note: Although calculating the population standard deviation might be feasible for smaller populations, for larger populations, obtaining data for all elements can be impractical. In such cases, sample standard deviation (using data from a representative sample) is a more practical approach to estimate the population standard deviation.
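The sketch below simulates that situation with made-up data: a hypothetical population of 10,000 values drawn from a normal distribution (mean 50, standard deviation 10), of which only a random sample of 100 is "measured". The seed is fixed so the run is repeatable; the exact numbers depend on the seed, but the sample standard deviation should land close to the population value.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 values from a normal distribution
# with mean 50 and standard deviation 10
population = [rng.gauss(50, 10) for _ in range(10_000)]

# In practice we might only be able to measure a random sample of 100 items
sample = rng.sample(population, 100)

population_sd = statistics.pstdev(population)  # uses every element, divides by N
sample_sd = statistics.stdev(sample)           # uses the sample only, divides by n - 1

print(f"Population SD (all 10,000 values): {population_sd:.2f}")
print(f"Sample SD (100 values):            {sample_sd:.2f}")
```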
Reference: Variance | Brilliant Math & Science Wiki
Hi, I am Vishal Jaiswal. I have about a decade of experience working in MNCs such as Genpact, Savista, and Ingenious, and I am currently working at EXL as a senior quality analyst. Through my writing, I want to share the experience I have gained and help as many people as I can.