Standard deviation (SD) is a crucial statistic that measures how dispersed data points are within a dataset. It is denoted σ (sigma) for populations and s for samples. In essence, it is the square root of the average squared difference between each data point and the dataset mean (average); that average squared difference is the variance.
A higher standard deviation indicates that the data points are spread further from the mean, implying more variability within the data. In contrast, a lower standard deviation indicates that the data points are clustered closer to the mean, implying less variability.
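To make the contrast concrete, here is a minimal sketch using Python's built-in statistics module. The two datasets (tight and wide) are hypothetical values chosen for illustration; they share the same mean but differ in spread.

```python
from statistics import mean, pstdev

# Two hypothetical datasets with the same mean but different spread
tight = [9, 10, 10, 10, 11]
wide = [2, 6, 10, 14, 18]

for name, values in (("tight", tight), ("wide", wide)):
    # pstdev treats the values as a complete population (divides by N)
    print(f"{name}: mean={mean(values):.2f}, sd={pstdev(values):.2f}")
```

Both datasets have a mean of 10, yet the second has a much larger standard deviation because its values lie further from that mean.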
Standard deviation offers valuable insights for data analysis.
There are two formulas for calculating standard deviation, depending on whether you have data for the entire population or a sample:
This formula is used when you have data for all elements in a population.
σ = √(Σ(xi - μ)² / N)
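Explanation of the Formula:
σ: the population standard deviation
Σ: the sum over all data points
xi: each individual data point
μ: the population mean
N: the number of data points in the population
Steps for Calculating Population Standard Deviation:
1. Calculate the population mean (μ) by adding all data points and dividing by N.
2. Subtract the mean from each data point and square the result.
3. Add up all the squared differences.
4. Divide the sum by N to get the population variance.
5. Take the square root of the variance to get the standard deviation.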
This formula is used when you only have data from a subset (sample) of the population. It is often used as an estimate of the population standard deviation.
s = √(Σ(xi - x̄)² / (n - 1))
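Explanation of the Formula:
s: the sample standard deviation
Σ: the sum over all data points in the sample
xi: each individual data point in the sample
x̄: the sample mean
n: the number of data points in the sample
Steps for Calculating Standard Deviation (Sample):
1. Calculate the sample mean (x̄) by adding all data points and dividing by n.
2. Subtract the mean from each data point and square the result.
3. Add up all the squared differences.
4. Divide the sum by n - 1 (not n) to get the sample variance.
5. Take the square root of the variance to get the standard deviation.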
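The role of the n - 1 divisor can be illustrated with a small enumeration. This is only a sketch under stated assumptions: the three-value population is hypothetical, and samples of size 2 are drawn with replacement so that every possible sample can be listed. It compares the average variance estimate when dividing by n with the average when dividing by n - 1.

```python
from itertools import product
from statistics import mean, pvariance, variance

# Hypothetical tiny population, chosen so every sample can be enumerated
population = [2, 4, 6]

# Every ordered sample of size 2, drawn with replacement (9 in total)
samples = list(product(population, repeat=2))

# Average the two candidate variance estimates over all samples
avg_div_n = mean(pvariance(s) for s in samples)          # divides by n
avg_div_n_minus_1 = mean(variance(s) for s in samples)   # divides by n - 1

print(f"True population variance:    {pvariance(population):.4f}")
print(f"Average estimate, / n:       {avg_div_n:.4f}")
print(f"Average estimate, / (n - 1): {avg_div_n_minus_1:.4f}")
```

In this setup, dividing by n underestimates the population variance on average, while dividing by n - 1 matches it. The sample standard deviation is the square root of that n - 1 variance.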
Population Standard Deviation:
import numpy as np

data = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7]
# Mean of all values in the population
population_mean = sum(data) / len(data)
# Squared deviation of each value from the mean
squared_deviations = [(x - population_mean) ** 2 for x in data]
sum_of_squared_deviations = sum(squared_deviations)
# Population variance divides by N
population_variance = sum_of_squared_deviations / len(data)
population_std = np.sqrt(population_variance)  # square root of the variance
print("Population:")
print(f" Mean: {population_mean:.2f}")
print(f" Standard Deviation: {population_std:.2f}")
Population:
Mean: 9.40
Standard Deviation: 3.04
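Explanation:
1. We calculate the population mean (using sum and division).
2. We iterate through each data point (x) and subtract the population mean. Then, we square the difference to get the squared deviation from the mean. A list comprehension is used to create a list of squared deviations for all data points.
3. We add up the squared deviations (using sum).
4. We divide that sum by the number of data points to get the population variance.
5. We take the square root of the variance with numpy.sqrt. For a single value like this, math.sqrt would work equally well.
Sample Standard Deviation: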
import numpy as np

data = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7]
# Mean of the values in the sample
sample_mean = sum(data) / len(data)
# Squared deviation of each value from the mean
squared_deviations = [(x - sample_mean) ** 2 for x in data]
sum_of_squared_deviations = sum(squared_deviations)
# Sample variance divides by n - 1 instead of n
sample_variance = sum_of_squared_deviations / (len(data) - 1)
sample_std = np.sqrt(sample_variance)  # square root of the variance
print("\nSample:")
print(f" Mean: {sample_mean:.2f}")
print(f" Standard Deviation: {sample_std:.2f}")
Sample:
Mean: 9.40
Standard Deviation: 3.20
Explanation (Sample):
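The steps are the same as in the population case, with one key difference: to get the sample variance, the sum of squared deviations is divided by n - 1 (len(data) - 1) instead of N (len(data)).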
Note: Although calculating the population standard deviation might be feasible for smaller populations, for larger populations, obtaining data for all elements can be impractical. In such cases, sample standard deviation (using data from a representative sample) is a more practical approach to estimate the population standard deviation.
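As a cross-check, the same two results can be reproduced with Python's built-in statistics module. This is a short sketch that reuses the example dataset from this article; pstdev divides by N and stdev divides by n - 1.

```python
import statistics

data = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7]

# Population standard deviation: divides by N
population_std = statistics.pstdev(data)

# Sample standard deviation: divides by n - 1
sample_std = statistics.stdev(data)

print(f"Population SD: {population_std:.2f}")  # 3.04
print(f"Sample SD:     {sample_std:.2f}")      # 3.20
```

If you prefer NumPy, np.std divides by N by default (ddof=0); passing ddof=1 gives the sample version.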
Hi, I am Vishal Jaiswal. I have about a decade of experience working in MNCs such as Genpact, Savista, and Ingenious. Currently, I am working at EXL as a senior quality analyst. Through my writing, I want to share the experience I have gained and help as many people as I can.