What is Sample Variance?
In statistics, sample variance (s²) measures how spread out the data points in a sample are. It is approximately the average squared distance of each data point from the sample mean (x̄), with the sum of squared distances divided by n - 1 rather than n. A higher variance indicates a wider spread of data points, while a lower variance suggests the data points are clustered closer to the mean.
Why is this Important?
Sample variance provides valuable insights into data variability within a sample. This information is beneficial for:
- Understanding data distribution: Sample variance quantifies how widely the data is spread around the mean, which the mean alone cannot show.
- Statistical analysis: It serves as the foundation for calculating other statistical measures like sample standard deviation (which is the square root of sample variance).
- Comparing datasets from different samples: Sample variances can be used to compare the variability of data between different samples drawn from the same population.
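To make the points above concrete, here is a minimal sketch using Python's standard statistics module. The two samples are made-up values chosen for illustration (they do not come from any real data set); both have the same mean but very different spreads.

```python
import statistics

# Hypothetical samples, invented for illustration: same mean, different spread.
tight_sample = [48, 49, 50, 51, 52]
wide_sample = [10, 30, 50, 70, 90]

for name, sample in [("tight", tight_sample), ("wide", wide_sample)]:
    mean = statistics.mean(sample)
    variance = statistics.variance(sample)  # sample variance (divides by n - 1)
    std_dev = statistics.stdev(sample)      # square root of the sample variance
    print(f"{name}: mean={mean}, variance={variance}, std dev={std_dev:.2f}")
```

Both samples have a mean of 50, but their sample variances are 2.5 and 1000, so the mean alone would hide how different they are.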
Sample Variance vs. Population Variance (σ²)
It’s important to distinguish between sample variance (s²) and population variance (σ²). Population variance reflects the variability within the entire population, while sample variance estimates the population variance using data from a subset (sample) of the population. Obtaining data for the entire population can be impractical or impossible in many real-world scenarios. Sample variance plays a significant role when complete population data isn’t available.
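The idea that sample variance estimates population variance can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population (the integers 1 to 100), the sample size of 5, the number of repetitions, and the seed are all arbitrary choices, not part of the discussion above.

```python
import random
import statistics

# Hypothetical population, invented for illustration: the integers 1 to 100.
population = list(range(1, 101))
population_variance = statistics.pvariance(population)

rng = random.Random(42)  # fixed seed so the run is repeatable

# Draw many small samples (with replacement) and record each sample variance.
sample_variances = [
    statistics.variance(rng.choices(population, k=5))
    for _ in range(10_000)
]

average_estimate = statistics.mean(sample_variances)
print("Population variance:", population_variance)
print("Average of sample variances:", round(average_estimate, 2))
```

Any single sample variance can land far from the population value, but averaged over many samples the estimates should settle close to it.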
How to Calculate Sample Variance?
Here’s the mathematical formula for calculating sample variance:
s² = Σ(xi - x̄)² / (n - 1)
Explanation of the Formula:
- Σ (sigma) represents summation: the expression after it is added up over every element in the sample.
- xi (x-subscript-i) represents each individual value in the sample data set.
- x̄ (x-bar) represents the sample mean (calculated by summing the values in the sample and dividing by the number of elements in the sample – n).
- n represents the total number of elements in the sample.
Steps for Calculating Sample Variance:
- Calculate the sample mean (x̄): Sum the values in your sample and divide by the number of elements (n).
- Compute squared deviations from the mean: For each data point (xi) in the sample, subtract the sample mean (x̄) and square the result. This represents the squared deviation of that particular element from the mean.
- Sum the squared deviations: Add up the squared deviations calculated for all elements in the sample.
- Calculate the sample variance: Divide the sum of squared deviations from the previous step by the number of elements in the sample minus one (n - 1). The result is the sample variance (s²), which is close to the average squared deviation from the mean.
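The steps above can be sketched as one small reusable function. The name sample_variance and the error message are hypothetical choices made for this illustration. The guard reflects a consequence of the formula: with fewer than two values, n - 1 is zero (or negative), so the sample variance is undefined.

```python
def sample_variance(data):
    """Return the sample variance of data, dividing by n - 1 (illustrative helper)."""
    n = len(data)
    if n < 2:
        # With a single value, n - 1 is zero and the formula is undefined.
        raise ValueError("sample variance needs at least two data points")

    mean = sum(data) / n                                  # 1. sample mean
    squared_deviations = [(x - mean) ** 2 for x in data]  # 2. squared deviations
    total = sum(squared_deviations)                       # 3. sum of squared deviations
    return total / (n - 1)                                # 4. divide by n - 1


# Made-up values for a quick check: the mean is 5 and the squared deviations sum to 20.
print(sample_variance([4, 8, 6, 2]))
```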
Examples
Consider a sample data set representing the test scores of 5 students: {85, 72, 90, 88, 65}.
- Sample Mean (x̄):
x̄ = (85 + 72 + 90 + 88 + 65) / 5
x̄ = 400 / 5
x̄ = 80
- Squared Deviations from the Mean:
Individual | Value (xi) | Deviation (xi – x̄) | Squared Deviation (xi – x̄)² |
---|---|---|---|
1 | 85 | 5 | 25 |
2 | 72 | -8 | 64 |
3 | 90 | 10 | 100 |
4 | 88 | 8 | 64 |
5 | 65 | -15 | 225 |
- Sum of Squared Deviations:
25 + 64 + 100 + 64 + 225 = 478
- Sample Variance (s²):
s² = 478 / (5 - 1)
s² = 478 / 4
s² = 119.5
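Therefore, the sample variance (s²) for this example is 119.5. Because variance is expressed in squared units, it is easier to interpret through its square root: the sample standard deviation is about 10.93 points, so the scores typically fall roughly 11 points from the mean of 80.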
How to Calculate It in Python?
Here’s a Python code example demonstrating how to calculate sample variance:
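# Test scores from the worked example (replace with your own data)
sample_data = [85, 72, 90, 88, 65]
# Sample mean: sum of the values divided by the number of values
sample_mean = sum(sample_data) / len(sample_data)
# Squared deviation of each value from the mean
squared_deviations = [(x - sample_mean) ** 2 for x in sample_data]
# Sample variance: divide by n - 1, not n
sample_variance = sum(squared_deviations) / (len(sample_data) - 1)
print("Sample Mean:", sample_mean)
print("Sample Variance:", sample_variance)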
Explanation:
- Sample Data: This line defines a list sample_data containing the test scores (replace with your actual data).
- Sample Mean: The sample_mean is calculated by summing the elements in sample_data and dividing by the total number of elements (using len(sample_data)).
- Squared Deviations: The list comprehension squared_deviations = [(x - sample_mean) ** 2 for x in sample_data] iterates through each element x in sample_data. It subtracts sample_mean from each element, squares the result using the exponent ** 2, and creates a new list containing the squared deviations from the mean for all elements.
- Sample Variance: The sample_variance is calculated by summing the squared deviations in squared_deviations and dividing by the total number of elements in the sample minus one (len(sample_data) - 1). This adjustment (n - 1) is used to obtain an unbiased estimate of the population variance.
- Print Results: The calculated sample_mean and sample_variance are printed.
Output:
Sample Mean: 80.0
Sample Variance: 119.5
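As a cross-check on the manual calculation above, Python's standard statistics module can compute the same quantities. This is a minimal sketch, and the variable names are illustrative. statistics.variance divides by n - 1, while statistics.pvariance divides by n, so comparing the two shows the effect of the adjustment described in the explanation.

```python
import statistics

sample_data = [85, 72, 90, 88, 65]

# Sample variance: divides the sum of squared deviations by n - 1.
sample_variance = statistics.variance(sample_data)

# Population variance: divides the same sum by n, giving a smaller value.
population_style_variance = statistics.pvariance(sample_data)

print("statistics.variance:", sample_variance)
print("statistics.pvariance:", population_style_variance)
```

For these scores the first value matches the 119.5 computed by hand, and the second is 478 / 5 = 95.6. If you use NumPy instead, note that numpy.var divides by n unless you pass ddof=1.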
Reference: Variance | Brilliant Math & Science Wiki
Hi, I am Vishal Jaiswal. I have about a decade of experience working in MNCs like Genpact, Savista, and Ingenious. Currently I am working at EXL as a senior quality analyst. Using my writing skills, I want to share the experience I have gained and help as many people as I can.