In statistics, sample variance (s²) measures how spread out the data points in a sample are. It is approximately the average squared distance of each data point from the sample mean (x̄); the sum of squared distances is divided by n - 1 rather than n. A higher variance indicates a wider spread of data points, while a lower variance means the data points are clustered closer to the mean.
Sample variance provides valuable insights into data variability within a sample. This information is beneficial for:
It’s important to distinguish between sample variance (s²) and population variance (σ²). Population variance reflects the variability within the entire population, while sample variance estimates the population variance using data from a subset (sample) of the population. Obtaining data for the entire population can be impractical or impossible in many real-world scenarios. Sample variance plays a significant role when complete population data isn’t available.
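In practice, the two quantities differ only in the denominator. A minimal sketch using Python's standard `statistics` module shows this: `pvariance` treats the data as the whole population and divides by N, while `variance` treats it as a sample and divides by n - 1. The five scores are illustrative values only.

```python
import statistics

# Illustrative data: five test scores (assumed for this sketch).
scores = [85, 72, 90, 88, 65]

# Treat the scores as the entire population: divide by N.
population_variance = statistics.pvariance(scores)

# Treat the scores as a sample from a larger population: divide by n - 1.
sample_variance = statistics.variance(scores)

print("Population variance:", population_variance)
print("Sample variance:", sample_variance)
```

For the same data, the sample variance is never smaller than the population variance, because the same sum of squared deviations is divided by a smaller number.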
Here’s the mathematical formula for calculating sample variance:
s² = Σ(xi - x̄)² / (n - 1)
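Explanation of the Formula:
- s²: the sample variance
- Σ: the sum over all data points in the sample
- xi: the i-th data point
- x̄: the sample mean
- n: the number of data points in the sample
- n - 1: the divisor used instead of n, so that s² is an unbiased estimate of the population variance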
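Steps for Calculating Sample Variance:
1. Calculate the sample mean (x̄).
2. Subtract the mean from each data point to get its deviation (xi - x̄).
3. Square each deviation.
4. Add up the squared deviations.
5. Divide the sum by n - 1.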
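The calculation can be sketched as a small reusable function that computes the mean, takes each deviation from it, squares and sums the deviations, and divides by n - 1. The function name `compute_sample_variance` and the three-value input are hypothetical, chosen only for illustration.

```python
def compute_sample_variance(values):
    """Return the sample variance of a sequence of numbers (n - 1 denominator)."""
    n = len(values)
    if n < 2:
        # With fewer than two points, n - 1 would be zero or negative.
        raise ValueError("sample variance needs at least two data points")
    mean = sum(values) / n                         # mean of the sample
    deviations = [x - mean for x in values]        # distance of each point from the mean
    squared_deviations = [d ** 2 for d in deviations]
    return sum(squared_deviations) / (n - 1)       # divide the sum by n - 1


# Illustrative input: mean is 4, squared deviations are 4, 0, 4.
result = compute_sample_variance([2, 4, 6])
print(result)
```

The guard for fewer than two values is a design choice of this sketch: with a single data point the divisor n - 1 is zero, so the sample variance is undefined.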
Consider a sample data set representing the test scores of 5 students: {85, 72, 90, 88, 65}.
x̄ = (85 + 72 + 90 + 88 + 65) / 5
x̄ = 400 / 5
x̄ = 80
Individual | Value (xi) | Deviation (xi – x̄) | Squared Deviation (xi – x̄)² |
---|---|---|---|
1 | 85 | 5 | 25 |
2 | 72 | -8 | 64 |
3 | 90 | 10 | 100 |
4 | 88 | 8 | 64 |
5 | 65 | -15 | 225 |
Sum of squared deviations: 25 + 64 + 100 + 64 + 225 = 478
s² = 478 / (5 - 1)
s² = 478 / 4
s² = 119.5
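Therefore, the sample variance (s²) for this example is 119.5, measured in squared score units. Its square root, the sample standard deviation, is about 10.9 points, so the test scores in this sample typically fall roughly 11 points away from the mean of 80.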
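As a cross-check, the table and the final result can be reproduced in a few lines of Python. The sketch below also takes the square root of the variance to get the sample standard deviation, which is expressed in the same units as the scores. Variable names are illustrative.

```python
import math

scores = [85, 72, 90, 88, 65]
mean = sum(scores) / len(scores)

# Rebuild the deviation table row by row.
print("Value | Deviation | Squared deviation")
sum_of_squares = 0.0
for x in scores:
    deviation = x - mean
    sum_of_squares += deviation ** 2
    print(f"{x} | {deviation:g} | {deviation ** 2:g}")

# Divide by n - 1 for the sample variance, then take the square root.
variance = sum_of_squares / (len(scores) - 1)
std_dev = math.sqrt(variance)

print("Sum of squared deviations:", sum_of_squares)
print("Sample variance:", variance)
print("Sample standard deviation:", round(std_dev, 2))
```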
Here’s a Python code example demonstrating how to calculate sample variance:
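```python
# Test scores of 5 students (replace with your actual data)
sample_data = [85, 72, 90, 88, 65]

# Sample mean: sum of the values divided by the number of values
sample_mean = sum(sample_data) / len(sample_data)

# Squared deviation of each value from the mean
squared_deviations = [(x - sample_mean) ** 2 for x in sample_data]

# Sample variance: sum of squared deviations divided by n - 1
sample_variance = sum(squared_deviations) / (len(sample_data) - 1)

print("Sample Mean:", sample_mean)
print("Sample Variance:", sample_variance)
```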
Explanation:
- A list named `sample_data` is defined, containing the test scores (replace with your actual data).
- `sample_mean` is calculated by summing the elements in `sample_data` and dividing by the total number of elements (using `len(sample_data)`).
- `squared_deviations = [(x - sample_mean) ** 2 for x in sample_data]` iterates through each element `x` in `sample_data`. It subtracts `sample_mean` from each element, squares the result using the exponent operator `** 2`, and creates a new list containing the squared deviations from the mean for all elements.
- `sample_variance` is calculated by summing the squared deviations in `squared_deviations` and dividing by the total number of elements in the sample minus one (`len(sample_data) - 1`). This adjustment (n - 1) is used to obtain an unbiased estimate of the population variance.
- `sample_mean` and `sample_variance` are printed.

Output:
Sample Mean: 80.0
Sample Variance: 119.5
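The effect of dividing by n - 1 can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not part of the original example: it draws many samples of size 5 from a standard normal distribution (true variance 1.0) and averages the two candidate estimates. The seed, sample size, and number of trials are arbitrary choices.

```python
import random

rng = random.Random(0)   # fixed seed so the run is repeatable
TRUE_VARIANCE = 1.0      # variance of the standard normal sampled below
SAMPLE_SIZE = 5
TRIALS = 20_000

sum_divide_by_n = 0.0
sum_divide_by_n_minus_1 = 0.0
for _ in range(TRIALS):
    sample = [rng.gauss(0.0, 1.0) for _ in range(SAMPLE_SIZE)]
    mean = sum(sample) / SAMPLE_SIZE
    sum_of_squares = sum((x - mean) ** 2 for x in sample)
    sum_divide_by_n += sum_of_squares / SAMPLE_SIZE
    sum_divide_by_n_minus_1 += sum_of_squares / (SAMPLE_SIZE - 1)

# Average each estimator over all trials.
avg_n = sum_divide_by_n / TRIALS
avg_n_minus_1 = sum_divide_by_n_minus_1 / TRIALS

print("True variance:", TRUE_VARIANCE)
print("Average estimate dividing by n:", round(avg_n, 3))
print("Average estimate dividing by n - 1:", round(avg_n_minus_1, 3))
```

Dividing by n should average close to (n - 1)/n times the true variance, which is 0.8 here, while dividing by n - 1 should average close to 1.0.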
Reference: Variance | Brilliant Math & Science Wiki
Hi, I am Vishal Jaiswal. I have about a decade of experience working in MNCs such as Genpact, Savista, and Ingenious. I currently work at EXL as a senior quality analyst. Through my writing, I want to share the experience I have gained and help as many people as I can.