
Statistics - Estimating Population Proportions


A population proportion is the share of a population that belongs to a particular category.

Confidence intervals are used to estimate population proportions.


Estimating Population Proportions

A statistic from a sample is used to estimate a parameter of the population.

The most likely value for a parameter is the point estimate.

Additionally, we can calculate a lower bound and an upper bound for the estimated parameter.

The margin of error is the distance from the point estimate to the lower bound and to the upper bound.

Together, the lower and upper bounds define a confidence interval.


Calculating a Confidence Interval

The following steps are used to calculate a confidence interval:

  1. Check the conditions
  2. Find the point estimate
  3. Decide the confidence level
  4. Calculate the margin of error
  5. Calculate the confidence interval

For example:

  • Population: Nobel Prize winners
  • Category: Born in the United States of America

We can take a sample and see how many of them were born in the US.

The sample data is used to estimate the share of all Nobel Prize winners born in the US.

By randomly selecting 30 Nobel Prize winners we could find that:

6 out of 30 Nobel Prize winners in the sample were born in the US

From this data we can calculate a confidence interval with the steps below.


1. Checking the Conditions

The conditions for calculating a confidence interval for a proportion are:

  • The sample is randomly selected
  • There are only two options:
    • Being in the category
    • Not being in the category
  • The sample needs at least:
    • 5 members in the category
    • 5 members not in the category

In our example, we randomly selected 30 people, and 6 of them were born in the US.

The rest were not born in the US, so there are 24 in the other category.

The conditions are fulfilled in this case.

Note: It is possible to calculate a confidence interval without having 5 members in each category, but special adjustments need to be made.
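
The sample size condition can be checked with a few lines of code. The following is a minimal sketch in Python; the function name meets_sample_size_condition and its minimum parameter are illustrative names chosen here, not part of any library. It only checks the counts, since random selection cannot be verified from the numbers alone.

```python
def meets_sample_size_condition(x, n, minimum=5):
    """Return True if both categories have at least `minimum` members.

    x is the number of sample members in the category,
    n is the total sample size.
    """
    in_category = x
    not_in_category = n - x
    return in_category >= minimum and not_in_category >= minimum


# The example sample: 6 born in the US, 24 not born in the US
print(meets_sample_size_condition(6, 30))  # True

# A hypothetical sample with only 2 members in the category
print(meets_sample_size_condition(2, 30))  # False
```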



2. Finding the Point Estimate

The point estimate is the sample proportion (\(\hat{p}\)).

The formula for calculating the sample proportion is the number of occurrences (\(x\)) divided by the sample size (\(n\)):

\(\displaystyle \hat{p} =\frac{x}{n}\)

In our example, 6 out of 30 were born in the US: \(x\) is 6, and \(n\) is 30.

So the point estimate for the proportion is:

\(\displaystyle \hat{p} = \frac{x}{n} = \frac{6}{30} = \underline{0.2} = 20\%\)

So 20% of the sample were born in the US.
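
When the sample is stored as individual observations rather than as a count, the same point estimate can be found by counting. The sketch below assumes a hypothetical encoding where each sample member is True if born in the US and False otherwise; the list is constructed to match the example (6 out of 30), it is not real data.

```python
# Hypothetical encoding of the example sample: True = born in the US
sample = [True] * 6 + [False] * 24

x = sum(sample)   # number of occurrences (True counts as 1)
n = len(sample)   # sample size
p_hat = x / n     # sample proportion, the point estimate

print(f"x = {x}, n = {n}, point estimate = {p_hat:.2f}")
```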


3. Deciding the Confidence Level

The confidence level is expressed with a percentage or a decimal number.

For example, if the confidence level is 95% or 0.95:

The remaining probability (\(\alpha\)) is then: 5%, or 1 - 0.95 = 0.05.

Commonly used confidence levels are:

  • 90% with \(\alpha\) = 0.1
  • 95% with \(\alpha\) = 0.05
  • 99% with \(\alpha\) = 0.01

Note: A 95% confidence level means that if we take many different samples and make a confidence interval from each:

About 95% of those confidence intervals will contain the true parameter (roughly 95 out of every 100, on average).
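
This interpretation can be illustrated with a simulation. The sketch below is not part of the original example: it assumes a made-up population with a known true proportion of 0.2, draws many random samples of a hypothetical size 500, builds a 95% confidence interval from each, and counts how often the interval contains the true proportion. It uses statistics.NormalDist from the Python standard library for the critical z-value, and the function names are illustrative.

```python
import math
import random
from statistics import NormalDist


def proportion_ci(x, n, confidence_level=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = x / n
    critical_z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin_of_error = critical_z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin_of_error, p_hat + margin_of_error


def coverage(true_p, n, repetitions, confidence_level=0.95, seed=1):
    """Fraction of simulated intervals that contain the true proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(repetitions):
        # Draw one random sample and count the members in the category
        x = sum(rng.random() < true_p for _ in range(n))
        lower, upper = proportion_ci(x, n, confidence_level)
        if lower <= true_p <= upper:
            hits += 1
    return hits / repetitions


share = coverage(true_p=0.2, n=500, repetitions=2000)
print(f"Share of intervals containing the true proportion: {share:.3f}")
```

The printed share should land close to 0.95, but it will not be exactly 0.95: the simulation is random, and the interval itself is an approximation.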

We use the standard normal distribution to find the margin of error for the confidence interval.

The remaining probability (\(\alpha\)) is divided in two so that half is in each tail area of the distribution.

The values on the z-value axis that separate the tail areas from the middle are called critical z-values.

Below are graphs of the standard normal distribution showing the tail areas (\(\alpha\)) for different confidence levels.

Standard Normal Distributions with two tail areas, with different sizes.
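
The critical z-values for the commonly used confidence levels can be computed as a quick check. This is a minimal sketch that uses statistics.NormalDist from the Python standard library; that choice is made here for a dependency-free example, and other tools such as SciPy give the same values.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

for confidence_level in (0.90, 0.95, 0.99):
    alpha = 1 - confidence_level
    # Half of alpha lies in each tail, so look up the 1 - alpha/2 quantile
    critical_z = standard_normal.inv_cdf(1 - alpha / 2)
    print(f"{confidence_level:.0%}: alpha = {alpha:.2f}, critical z = {critical_z:.3f}")

# 90%: alpha = 0.10, critical z = 1.645
# 95%: alpha = 0.05, critical z = 1.960
# 99%: alpha = 0.01, critical z = 2.576
```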


4. Calculating the Margin of Error

The margin of error is the distance from the point estimate to the lower bound and to the upper bound.

The margin of error (\(E\)) for a proportion is calculated with a critical z-value and the standard error:

\(\displaystyle E = Z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \)

The critical z-value \(Z_{\alpha/2} \) is calculated from the standard normal distribution and the confidence level.

The standard error \(\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \) is calculated from the point estimate (\(\hat{p}\)) and sample size (\(n\)).

In our example with 6 US-born Nobel Prize winners out of a sample of 30 the standard error is:

\(\displaystyle \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.2(1-0.2)}{30}} = \sqrt{\frac{0.2 \cdot 0.8}{30}} = \sqrt{\frac{0.16}{30}} = \sqrt{0.00533..} \approx \underline{0.073}\)
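
The formula also shows how the standard error depends on the sample size. The sketch below keeps the sample proportion at 0.2 and compares the example's sample size of 30 with two hypothetical larger sizes (120 and 480, chosen only for illustration). The standard_error function is an illustrative helper, not a library function.

```python
import math


def standard_error(p_hat, n):
    """Standard error of a sample proportion."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


for n in (30, 120, 480):
    print(f"n = {n:>3}: standard error = {standard_error(0.2, n):.4f}")

# n =  30: standard error = 0.0730
# n = 120: standard error = 0.0365
# n = 480: standard error = 0.0183
```

Because \(n\) sits under the square root, the sample size has to be four times larger to cut the standard error in half.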

If we choose 95% as the confidence level, the \(\alpha\) is 0.05.

So we need to find the critical z-value \(Z_{0.05/2} = Z_{0.025}\)

The critical z-value can be found using a Z-table or with a programming language function:

Example

With Python, use the norm.ppf() function from the SciPy stats library to find the Z-value for \(\alpha\)/2 = 0.025

import scipy.stats as stats
print(stats.norm.ppf(1-0.025))

Example

With R use the built-in qnorm() function to find the Z-value for an \(\alpha\)/2 = 0.025

qnorm(1-0.025)

Using either method we can find that the critical Z-value \( Z_{\alpha/2} \) is \(\approx \underline{1.96} \)

The standard error \(\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\) was \( \approx \underline{0.073}\)

So the margin of error (\(E\)) is:

\(\displaystyle E = Z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \approx 1.96 \cdot 0.073 \approx \underline{0.143}\)


5. Calculating the Confidence Interval

The lower and upper bounds of the confidence interval are found by subtracting the margin of error (\(E\)) from the point estimate (\(\hat{p}\)) and adding it to the point estimate.

In our example, the point estimate was 0.2 and the margin of error was 0.143, so:

The lower bound is:

\(\hat{p} - E = 0.2 - 0.143 = \underline{0.057} \)

The upper bound is:

\(\hat{p} + E = 0.2 + 0.143 = \underline{0.343} \)

The confidence interval is:

\([0.057, 0.343]\) or \([5.7 \%, 34.3 \%]\)

And we can summarize the confidence interval by stating:

The 95% confidence interval for the proportion of Nobel Prize winners born in the US is between 5.7% and 34.3%
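
The choice of confidence level changes the width of the interval. The sketch below repeats the five steps for the same sample (6 out of 30) at the three commonly used confidence levels. The function name proportion_confidence_interval is illustrative, and statistics.NormalDist from the Python standard library is used here for the critical z-value.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(x, n, confidence_level):
    """Return (lower_bound, upper_bound) for a population proportion."""
    p_hat = x / n                                      # point estimate
    alpha = 1 - confidence_level
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)   # critical z-value
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin_of_error = critical_z * standard_error
    return p_hat - margin_of_error, p_hat + margin_of_error


for level in (0.90, 0.95, 0.99):
    lower, upper = proportion_confidence_interval(6, 30, level)
    print(f"{level:.0%}: [{lower:.3f}, {upper:.3f}]")

# 90%: [0.080, 0.320]
# 95%: [0.057, 0.343]
# 99%: [0.012, 0.388]
```

A higher confidence level gives a larger critical z-value, so the interval becomes wider.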


Calculating a Confidence Interval with Programming

A confidence interval can be calculated with many programming languages.

Using software and programming to calculate statistics is more common for bigger sets of data, as calculating manually becomes difficult.

Example

With Python, use the scipy and math libraries to calculate the confidence interval for an estimated proportion.

Here, the sample size is 30 and the number of occurrences is 6.

import scipy.stats as stats
import math

# Specify sample occurrences (x), sample size (n) and confidence level
x = 6
n = 30
confidence_level = 0.95

# Calculate the point estimate, alpha, the critical z-value, the standard error, and the margin of error
point_estimate = x/n
alpha = (1-confidence_level)
critical_z = stats.norm.ppf(1-alpha/2)
standard_error = math.sqrt((point_estimate*(1-point_estimate)/n))
margin_of_error = critical_z * standard_error

# Calculate the lower and upper bound of the confidence interval
lower_bound = point_estimate - margin_of_error
upper_bound = point_estimate + margin_of_error

# Print the results
print("Point Estimate: {:.3f}".format(point_estimate))
print("Critical Z-value: {:.3f}".format(critical_z))
print("Margin of Error: {:.3f}".format(margin_of_error))
print("Confidence Interval: [{:.3f},{:.3f}]".format(lower_bound,upper_bound))
print("The {:.1%} confidence interval for the population proportion is:".format(confidence_level))
print("between {:.3f} and {:.3f}".format(lower_bound,upper_bound))

Example

With R, use the built-in math and statistics functions to calculate the confidence interval for an estimated proportion.

Here, the sample size is 30 and the number of occurrences is 6.

# Specify sample occurrences (x), sample size (n) and confidence level
x = 6
n = 30
confidence_level = 0.95

# Calculate the point estimate, alpha, the critical z-value, the standard error, and the margin of error
point_estimate = x/n
alpha = (1-confidence_level)
critical_z = qnorm(1-alpha/2)
standard_error = sqrt(point_estimate*(1-point_estimate)/n)
margin_of_error = critical_z * standard_error

# Calculate the lower and upper bound of the confidence interval
lower_bound = point_estimate - margin_of_error
upper_bound = point_estimate + margin_of_error

# Print the results
sprintf("Point Estimate: %0.3f", point_estimate)
sprintf("Critical Z-value: %0.3f", critical_z)
sprintf("Margin of Error: %0.3f", margin_of_error)
sprintf("Confidence Interval: [%0.3f,%0.3f]", lower_bound, upper_bound)
sprintf("The %0.1f%% confidence interval for the population proportion is:", confidence_level*100)
sprintf("between %0.3f and %0.3f", lower_bound, upper_bound)

Copyright 1999-2024 by Refsnes Data. All Rights Reserved. W3Schools is Powered by W3.CSS.