Variance

Average squared distance from the mean · 분산

1. What Is Variance?

분산이란?

Variance measures how spread out a data set is around its mean. A small variance means the values huddle close to the average; a large variance means they scatter widely. Two data sets can share the same mean yet have completely different variances — one tightly clustered, the other dispersed.

Variance works by averaging the squared distances of each value from the mean. Squaring removes the sign (so positive and negative deviations don’t cancel) and is the basis for standard deviation, regression, and most statistical inference.

분산은 데이터가 평균을 중심으로 얼마나 퍼져 있는지를 측정합니다. 분산이 작으면 값들이 평균 가까이 모여 있고, 크면 넓게 흩어져 있습니다. 평균이 같아도 분산은 전혀 다를 수 있습니다. 분산은 각 값이 평균에서 떨어진 거리를 제곱해 평균 낸 값으로, 제곱은 부호를 없애(양·음 편차가 상쇄되지 않도록) 표준편차·회귀·추론의 토대가 됩니다.

2. The σ² Formula & a Worked Example

σ² 공식과 예제

The population variance is σ² = Σ(xᵢ − μ)² / n: find the mean, subtract it from each value, square, sum, and divide by n.

Worked example. Data set 2, 4, 4, 4, 5, 5, 7, 9 (n = 8):

Mean: μ = 40 / 8 = 5.
Squared deviations: (2−5)²=9, (4−5)²=1, 1, 1, (5−5)²=0, 0, (7−5)²=4, (9−5)²=16.
Sum of squares: 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32.
Divide by n: 32 / 8 = 4.

So the variance is σ² = 4.

모분산은 σ² = Σ(xᵢ − μ)²/n입니다. 예: 2,4,4,4,5,5,7,9 (n=8) → 평균 5, 제곱편차 9·1·1·1·0·0·4·16, 합 32, 32/8 = 4. 따라서 분산은 4입니다.

3. Population vs Sample: n or n − 1?

모집단 vs 표본: n인가 n−1인가

When the data are an entire population, divide by n (above: σ² = 4). When the data are a sample used to estimate a larger population’s variance, divide by n − 1 instead — Bessel’s correction. For the same data: s² = 32 / 7 ≈ 4.57.

Dividing by n − 1 compensates for having used the sample mean (not the true mean), which slightly under-counts spread; it makes s² an unbiased estimator. Spreadsheets expose both: VAR.P (÷n) and VAR.S (÷n−1); NumPy uses ddof=1 for the sample version.

데이터가 전체 모집단이면 n으로 나누고(위에서 σ²=4), 표본으로 모분산을 추정할 때는 n−1로 나눕니다(베셀 보정). 같은 데이터의 표본분산은 s² = 32/7 ≈ 4.57. n−1로 나누면 표본평균을 쓴 데서 생기는 과소추정이 보정되어 불편추정량이 됩니다. 스프레드시트의 VAR.P(÷n)·VAR.S(÷n−1), NumPy의 ddof=1이 이에 해당합니다.

4. Key Properties & Common Mistakes

주요 성질과 흔한 실수

Never negative. Squared terms make σ² ≥ 0; it equals 0 only when every value is identical.
Squared units. Variance is in the square of the data’s units, which is why we often report the standard deviation σ = √(σ²).
n vs n−1. Using ÷n on a sample under-estimates spread; pick the divisor that matches your intent.
Outlier-sensitive. Squaring amplifies far-from-mean values, so a single outlier can dominate the variance.

제곱항이라 σ² ≥ 0이며, 모든 값이 같을 때만 0입니다. 단위가 제곱이라 보통 표준편차 σ = √(σ²)로 보고합니다. 표본에 ÷n을 쓰면 산포를 과소추정하므로 의도에 맞는 분모를 고르세요. 제곱이 멀리 떨어진 값을 키우므로 이상치 하나가 분산을 좌우할 수 있습니다.

5. Frequently Asked Questions

자주 묻는 질문

What is the formula for variance? Population variance is σ² = Σ(xᵢ − μ)² / n: the average of the squared deviations from the mean.

Why divide by n − 1 for a sample? Bessel’s correction offsets the bias from using the sample mean in place of the unknown true mean, giving an unbiased estimate.

Why square the deviations? Squaring removes signs so deviations don’t cancel, emphasizes large deviations, and yields the differentiable form that powers regression and inference.

모분산은 σ² = Σ(xᵢ − μ)²/n으로 평균에서의 제곱편차의 평균입니다. 표본에서는 표본평균 사용에서 오는 편향을 없애기 위해 n−1로 나눕니다(베셀 보정). 편차를 제곱하는 이유는 부호 상쇄를 막고 큰 편차를 강조하며 미분 가능한 형태를 얻기 위함입니다.

Want a quick result? The Standard Deviation Calculator reports population and sample variance (and SD) for any data set.

Ready to practice? Drill variance and more on C:Stat, or review the full statistics reference and related pages: standard deviation, mean, and median.

빠른 계산은 표준편차 계산기로 모집단·표본 분산(및 표준편차)을 구할 수 있습니다. 실전 연습은 C:Stat에서, 전체 개념은 통계 레퍼런스와 표준편차, 평균, 중앙값 문서에서 이어집니다.

Practice statistics problems → C:Stat