In statistics, sample variance (s²) plays a crucial role in understanding how spread out data points are within a sample. It is approximately the average squared distance of each data point in the sample from the sample mean (x̄); the sum of squared distances is divided by n - 1 rather than n. A higher variance indicates a wider spread of data points, while a lower variance suggests the data points are clustered closer to the mean.
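To make the idea of spread concrete, here is a minimal sketch comparing two samples that share the same mean but differ in how far the values sit from it. The two lists are made-up illustrative data, and the sketch assumes Python's standard `statistics` module, whose `variance` function computes the sample variance.

```python
import statistics

# Two hypothetical samples with the same mean (80) but different spreads.
tight_scores = [78, 79, 80, 81, 82]
wide_scores = [60, 70, 80, 90, 100]

# statistics.variance computes the sample variance (divisor n - 1).
tight_variance = statistics.variance(tight_scores)
wide_variance = statistics.variance(wide_scores)

print("Tight sample variance:", tight_variance)
print("Wide sample variance:", wide_variance)
```

Both samples average 80, yet the wide sample's variance (250) is a hundred times the tight sample's (2.5), which is exactly the difference in spread that the mean alone hides.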
Sample variance provides valuable insights into data variability within a sample.
It’s important to distinguish between sample variance (s²) and population variance (σ²). Population variance reflects the variability within the entire population, while sample variance estimates the population variance using data from a subset (sample) of the population. Obtaining data for the entire population can be impractical or impossible in many real-world scenarios. Sample variance plays a significant role when complete population data isn’t available.
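The relationship between the two can be illustrated with a small simulation. This is only a sketch under stated assumptions: the "population" of 10,000 scores, its mean of 70 and standard deviation of 12, the random seed, and the number and size of the samples are all invented for illustration. It uses `statistics.pvariance` (divisor N) for the population and `statistics.variance` (divisor n - 1) for each sample.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical "population": 10,000 simulated test scores.
population = [random.gauss(70, 12) for _ in range(10_000)]
population_variance = statistics.pvariance(population)  # divisor N

# Draw many small samples (with replacement) and compute s² for each.
sample_size = 5
num_samples = 2_000
sample_variances = [
    statistics.variance(random.choices(population, k=sample_size))  # divisor n - 1
    for _ in range(num_samples)
]
average_sample_variance = statistics.mean(sample_variances)

print("Population variance:", round(population_variance, 2))
print("Average of sample variances:", round(average_sample_variance, 2))
```

Any single sample of five values can land far from the population variance, but averaged over many samples the estimates should settle close to it, which is what makes sample variance a usable stand-in when the full population is out of reach.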
Here’s the mathematical formula for calculating sample variance:
s² = Σ(xi - x̄)² / (n - 1)
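Explanation of the Formula:
- s² is the sample variance.
- Σ denotes the sum over all data points in the sample.
- xi is the i-th data point.
- x̄ is the sample mean.
- n is the number of data points in the sample; dividing by n - 1 instead of n gives an unbiased estimate of the population variance.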
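Steps for Calculating Sample Variance:
1. Calculate the sample mean (x̄).
2. Subtract the mean from each data point to get its deviation (xi - x̄).
3. Square each deviation.
4. Add up the squared deviations.
5. Divide the sum by n - 1.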
Consider a sample data set representing the test scores of 5 students: {85, 72, 90, 88, 65}.
x̄ = (85 + 72 + 90 + 88 + 65) / 5
x̄ = 400 / 5
x̄ = 80
Individual | Value (xi) | Deviation (xi – x̄) | Squared Deviation (xi – x̄)² |
---|---|---|---|
1 | 85 | 5 | 25 |
2 | 72 | -8 | 64 |
3 | 90 | 10 | 100 |
4 | 88 | 8 | 64 |
5 | 65 | -15 | 225 |
25 + 64 + 100 + 64 + 225 = 478
s² = 478 / (5 - 1)
s² = 478 / 4
s² = 119.5
Therefore, the sample variance (s²) for this example is 119.5 (in squared points), which corresponds to a standard deviation of about 10.9 points around the mean of 80.
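Since variance is expressed in squared units, its square root is often easier to interpret. The following is a minimal sketch, assuming the standard `math` and `statistics` modules, that converts the variance of the same five test scores back into the original units.

```python
import math
import statistics

sample_data = [85, 72, 90, 88, 65]

# Sample variance (divisor n - 1), expressed in squared points.
sample_variance = statistics.variance(sample_data)

# The square root brings the measure back to plain points.
sample_std_dev = math.sqrt(sample_variance)

print("Sample Variance:", sample_variance)
print("Sample Standard Deviation:", round(sample_std_dev, 2))
```

The result matches what `statistics.stdev(sample_data)` returns, so either route can be used.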
Here’s a Python code example demonstrating how to calculate sample variance:
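```python
# Test scores for the sample (replace with your actual data)
sample_data = [85, 72, 90, 88, 65]
# Sample mean: sum of the values divided by the number of values
sample_mean = sum(sample_data) / len(sample_data)
# Squared deviation of each value from the sample mean
squared_deviations = [(x - sample_mean) ** 2 for x in sample_data]
# Sample variance: sum of squared deviations divided by n - 1
sample_variance = sum(squared_deviations) / (len(sample_data) - 1)

print("Sample Mean:", sample_mean)
print("Sample Variance:", sample_variance)
```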
Explanation:
- `sample_data` is a list containing the test scores (replace with your actual data).
- `sample_mean` is calculated by summing the elements in `sample_data` and dividing by the total number of elements (using `len(sample_data)`).
- `squared_deviations = [(x - sample_mean) ** 2 for x in sample_data]` iterates through each element `x` in `sample_data`. It subtracts `sample_mean` from each element, squares the result using the exponent `** 2`, and creates a new list containing the squared deviations from the mean for all elements.
- `sample_variance` is calculated by summing the squared deviations in `squared_deviations` and dividing by the total number of elements in the sample minus one (`len(sample_data) - 1`). This adjustment (n - 1) is used to obtain an unbiased estimate of the population variance.
- `sample_mean` and `sample_variance` are printed.

Output:
Sample Mean: 80.0
Sample Variance: 119.5
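As a cross-check, the manual calculation can be compared against Python's standard library. This is a sketch, assuming the `statistics` module: `statistics.variance` divides by n - 1, while `statistics.pvariance` divides by n and is included only to show what the result looks like without the n - 1 adjustment.

```python
import statistics

sample_data = [85, 72, 90, 88, 65]

# Manual calculation, following the same steps as the code above.
sample_mean = sum(sample_data) / len(sample_data)
manual_variance = sum((x - sample_mean) ** 2 for x in sample_data) / (len(sample_data) - 1)

# Library equivalents: variance() divides by n - 1, pvariance() divides by n.
library_sample_variance = statistics.variance(sample_data)
library_population_variance = statistics.pvariance(sample_data)

print("Manual sample variance:", manual_variance)
print("statistics.variance:", library_sample_variance)
print("statistics.pvariance:", library_population_variance)
```

The manual result and `statistics.variance` agree at 119.5, while dividing by n gives the smaller value 478 / 5 = 95.6. If you work with NumPy instead, note that `np.var` divides by n by default; passing `ddof=1` switches it to the n - 1 divisor.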
Reference: Variance | Brilliant Math & Science Wiki
Hi, I am Vishal Jaiswal. I have about a decade of experience working in MNCs such as Genpact, Savista, and Ingenious. I currently work at EXL as a senior quality analyst. Through my writing, I want to share the experience I have gained and help as many people as I can.