Note: The actual exam reference sheet does not label individual formulas in this section — they appear without names.
| Sample Mean | \(\displaystyle\bar{x}=\frac{\displaystyle\sum x_i}{n}\) | ▼ |
What it means
Add up all the data values and divide by how many there are. The mean is the balancing point of the distribution. It is sensitive to outliers — a single extreme value can pull it far from the center of the bulk of the data.
Worked Example
Scores: 72, 85, 90, 88, 75 (\(n=5\))\(\bar{x}=\dfrac{72+85+90+88+75}{5}=\dfrac{410}{5}=\) 82 | ||
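The worked example above can be checked in a couple of lines of Python (standard library only):

```python
# Sample mean of the example's scores: add them up, divide by n.
scores = [72, 85, 90, 88, 75]
x_bar = sum(scores) / len(scores)
print(x_bar)  # 82.0
```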
| Sample Standard Deviation | \(\displaystyle s_x=\sqrt{\frac{\displaystyle\sum(x_i-\bar{x})^2}{n-1}}\) | ▼ |
What it means
Measures how spread out the data are around \(\bar{x}\). We divide by \(n-1\) rather than \(n\) so that \(s_x^2\) is an unbiased estimate of the population variance \(\sigma^2\). Larger \(s_x\) means more variability in the data.
Worked Example
Values: 2, 4, 4, 4, 5, 5, 7, 9 (\(n=8,\;\bar{x}=5\))\(\sum(x_i-\bar{x})^2 = 9{+}1{+}1{+}1{+}0{+}0{+}4{+}16 = 32\) \(s_x = \sqrt{32/7}\approx\) 2.14 | ||
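The same computation in Python, cross-checked against the standard library's `statistics.stdev` (which also uses the \(n-1\) denominator):

```python
import math
from statistics import stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]
x_bar = sum(data) / len(data)             # 5.0
ss = sum((x - x_bar) ** 2 for x in data)  # sum of squared deviations: 32.0
s_x = math.sqrt(ss / (len(data) - 1))     # sqrt(32/7)
assert abs(s_x - stdev(data)) < 1e-9      # agrees with the library routine
print(round(s_x, 2))  # 2.14
```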
| Least-Squares Regression Line | \(\hat{y} = a + bx\) | ▼ |
What it means
\(\hat{y}\) ("y-hat") is the predicted response for a given \(x\). The line minimizes the sum of squared vertical residuals \(\sum(y_i-\hat{y}_i)^2\). \(b\) is the slope and \(a\) is the \(y\)-intercept. Use \(\hat{y}\) only for interpolation within the range of the data.
Worked Example
Predicting test score from hours studied: \(\hat{y}=50+8x\)For \(x=4\) hours: \(\hat{y}=50+8(4)=\) 82 points | ||
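The prediction step is a one-liner; a minimal sketch using the example's fitted coefficients (\(a=50\), \(b=8\)):

```python
# Least-squares prediction y-hat = a + b*x, with the example's coefficients.
def y_hat(x, a=50.0, b=8.0):
    return a + b * x

print(y_hat(4))  # 82.0
```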
| Slope of the LSRL | \(\displaystyle b=r\frac{s_y}{s_x}\) | ▼ |
What it means
The slope is the correlation coefficient scaled by the ratio of the standard deviations. For every 1-unit increase in \(x\), \(\hat{y}\) changes by \(b\) units. The sign of \(b\) always matches the sign of \(r\).
Worked Example
\(r=0.85,\;s_y=10,\;s_x=4\)\(b=0.85\times\tfrac{10}{4}=0.85\times 2.5=\) 2.125 Each extra unit of \(x\) predicts +2.125 units of \(\hat{y}\). | ||
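The slope formula translates directly to code:

```python
# Slope of the LSRL from r and the two standard deviations: b = r * s_y / s_x.
def lsrl_slope(r, s_y, s_x):
    return r * s_y / s_x

b = lsrl_slope(0.85, 10, 4)
print(round(b, 3))  # 2.125
```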
| Correlation Coefficient | \(\displaystyle r=\frac{1}{n-1}\sum\!\left(\frac{x_i-\bar{x}}{s_x}\right)\!\!\left(\frac{y_i-\bar{y}}{s_y}\right)\) | ▼ |
What it means
Measures the strength and direction of a linear association. Always \(-1\le r\le 1\). Values near \(\pm 1\) are strong; near 0 are weak. \(r\) only measures linear association — a perfectly curved relationship can have \(r\approx 0\).
Worked Example (conceptual)
When tall people also tend to be heavier, both \(z\)-scores are positive together → their products are positive → \(r\) is positive. A calculator gives the exact value. Example result: \(r=+0.87\) (strong positive linear association).
| ||
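The formula — an average (with \(n-1\)) of products of \(z\)-scores — can be implemented directly. A quick sketch, checked on perfectly linear data, where \(r\) must equal 1:

```python
import math

def correlation(xs, ys):
    """r as the (n-1)-averaged product of z-scores, as in the formula above."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (n - 1)

# Perfectly linear data (y = 2x) gives r = 1.
print(round(correlation([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0
```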
| \(P(A\cup B)=P(A)+P(B)-P(A\cap B)\) | ▼ |
What it means
Probability that \(A\) or \(B\) (or both) occur. We subtract the overlap \(P(A\cap B)\) to avoid double-counting it. If \(A\) and \(B\) are mutually exclusive, \(P(A\cap B)=0\) and the rule simplifies to \(P(A)+P(B)\).
Worked Example
Standard deck: \(P(\heartsuit)=\tfrac{13}{52}\), \(P(\text{face})=\tfrac{12}{52}\), \(P(\heartsuit\cap\text{face})=\tfrac{3}{52}\)\(P(\heartsuit\cup\text{face})=\tfrac{13+12-3}{52}=\tfrac{22}{52}\approx\) 0.423 | |
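The deck arithmetic, spelled out:

```python
# Addition rule on the deck example: hearts, face cards, and their overlap
# (the three face cards that are also hearts).
p_hearts, p_faces, p_both = 13 / 52, 12 / 52, 3 / 52
p_union = p_hearts + p_faces - p_both
print(round(p_union, 3))  # 0.423
```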
| \(\displaystyle P(A\mid B)=\frac{P(A\cap B)}{P(B)}\) | ▼ |
What it means
The probability of \(A\) given that \(B\) has occurred. We restrict the sample space to outcomes where \(B\) happened, then find what fraction also include \(A\). Events are independent when \(P(A\mid B)=P(A)\).
Worked Example
\(P(\text{studied})=0.90\), \(P(\text{pass}\cap\text{studied})=0.63\)\(P(\text{pass}\mid\text{studied})=0.63/0.90=\) 0.70 | |
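The same division in code:

```python
# Conditional probability: P(A | B) = P(A and B) / P(B).
def p_given(p_a_and_b, p_b):
    return p_a_and_b / p_b

print(round(p_given(0.63, 0.90), 2))  # 0.7
```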
| Probability Distribution | Mean | Standard Deviation | |
|---|---|---|---|
|
Discrete random variable, \(X\)
\(\mu_X=E(X)={\displaystyle\sum} x_i\cdot P(x_i)\) | \(\sigma_X=\sqrt{{\displaystyle\sum}(x_i-\mu_X)^2\cdot P(x_i)}\) | ▼ |
Mean — what it means
The expected value: a probability-weighted average of all possible outcomes. Multiply each value by its probability and sum. Think of it as the long-run average if the random process were repeated many, many times.
Worked Example — Fair Die
\(X=\) face value; each \(P(x)=\tfrac{1}{6}\)\(\mu_X=\tfrac{1}{6}(1{+}2{+}3{+}4{+}5{+}6)=\tfrac{21}{6}=\) 3.5 \(\sigma_X=\sqrt{\tfrac{1}{6}[(1{-}3.5)^2+\cdots+(6{-}3.5)^2]}\approx\) 1.71 | |||
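Both formulas for a discrete random variable, applied to the fair-die example:

```python
import math

def mean_sd(dist):
    """Mean and SD of a discrete RV; dist is a list of (value, probability)."""
    mu = sum(x * p for x, p in dist)
    var = sum((x - mu) ** 2 * p for x, p in dist)
    return mu, math.sqrt(var)

die = [(x, 1 / 6) for x in range(1, 7)]
mu, sigma = mean_sd(die)
print(round(mu, 1), round(sigma, 2))  # 3.5 1.71
```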
|
If \(X\) has a binomial distribution with parameters \(n\) and \(p\), then: \(P(X=x)=\dbinom{n}{x}p^x(1-p)^{n-x}\) \(x=0,1,2,\ldots,n\) |
\(np\) | \(\sqrt{np(1-p)}\) | ▼ |
What it means
Use when you have \(n\) independent trials, each with probability \(p\) of success, and you want \(P(\text{exactly }x\text{ successes})\). \(\binom{n}{x}\) counts the arrangements; \(p^x(1-p)^{n-x}\) is the probability of each arrangement. Conditions: Binary, Independent, Number fixed, Same \(p\).
Worked Example
5 T/F questions; random guess (\(p=0.5\)). \(P(X=3)\)?\(\binom{5}{3}(0.5)^3(0.5)^2=10\times 0.125\times 0.25=\) 0.3125 \(\mu=5(0.5)=2.5\); \(\sigma=\sqrt{5(.5)(.5)}\approx 1.12\) | |||
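The binomial pmf, mean, and SD in code (`math.comb` gives \(\binom{n}{x}\)):

```python
from math import comb, sqrt

def binom_pmf(x, n, p):
    """P(X = x) for a binomial with parameters n and p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

print(binom_pmf(3, 5, 0.5))                     # 0.3125
print(5 * 0.5, round(sqrt(5 * 0.5 * 0.5), 2))   # 2.5 1.12
```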
|
If \(X\) has a geometric distribution with parameter \(p\), then: \(P(X=x)=(1-p)^{x-1}p\) \(x=1,2,3,\ldots\) |
\(\dfrac{1}{p}\) | \(\dfrac{\sqrt{1-p}}{p}\) | ▼ |
What it means
Use when you repeat independent trials until the first success. \(X\) = the trial number of the first success. There is no fixed \(n\) — you stop when you succeed. \((1-p)^{x-1}\) is the probability of \(x-1\) failures first.
Worked Example
Free-throw shooter, \(p=0.70\). \(P(\text{first make on 3rd attempt})\)?\(P(X=3)=(0.30)^2(0.70)=0.09\times 0.70=\) 0.063 On average: \(\mu=1/0.70\approx\) 1.43 attempts | |||
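The geometric pmf and mean for the free-throw example:

```python
# Geometric distribution: x - 1 failures, then the first success.
def geom_pmf(x, p):
    return (1 - p) ** (x - 1) * p

print(round(geom_pmf(3, 0.70), 3))  # 0.063
print(round(1 / 0.70, 2))           # 1.43 (expected number of attempts)
```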
| Standardized Test Statistic | \(\displaystyle\text{test statistic}=\frac{\text{statistic}-\text{parameter}}{\text{standard error of the statistic}}\) | ▼ |
What it means
The skeleton of every significance test. Measures how many standard errors the sample statistic lies from the hypothesized parameter value. A large absolute value means the data are unlikely under \(H_0\), providing evidence against it.
Worked Example
\(H_0:\mu=100\). Sample: \(\bar{x}=107\), \(\text{SE}=3.5\)\(t=(107-100)/3.5=7/3.5=\) 2.0 The sample mean is 2 SEs above \(\mu_0=100\). | ||
| Confidence Interval | \(\text{statistic}\pm(\text{critical value})(\text{SE of statistic})\) | ▼ |
What it means
An interval of plausible values for the true parameter. The critical value (\(z^*\) or \(t^*\)) controls the confidence level. Common \(z^*\) values: 1.645 (90%), 1.960 (95%), 2.576 (99%). The margin of error is \(\text{critical value}\times\text{SE}\).
Worked Example — 95% CI for \(\mu\)
\(\bar{x}=52,\;\text{SE}=2.4,\;z^*=1.960\)\(52\pm 1.960(2.4)=52\pm 4.70=\) (47.30, 56.70) We are 95% confident the true mean is in this interval. | ||
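The statistic ± margin-of-error template in code:

```python
# Confidence interval: statistic +/- (critical value) * SE.
def conf_int(stat, crit, se):
    moe = crit * se
    return stat - moe, stat + moe

lo, hi = conf_int(52, 1.960, 2.4)
print(round(lo, 2), round(hi, 2))  # 47.3 56.7
```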
| Chi-Square Statistic | \(\displaystyle\chi^2=\sum\frac{(\text{observed}-\text{expected})^2}{\text{expected}}\) | ▼ |
What it means
Measures how far observed counts deviate from expected counts under \(H_0\). Each cell contributes one term; larger \(\chi^2\) = more evidence against \(H_0\). Used for goodness-of-fit and tests of independence/homogeneity. Always \(\chi^2\ge 0\); the test is always right-tailed.
Worked Example
Fair die, 60 rolls. Expected: 10 per face. Observed: 8,12,9,11,7,13\(\chi^2=\tfrac{4}{10}+\tfrac{4}{10}+\tfrac{1}{10}+\tfrac{1}{10}+\tfrac{9}{10}+\tfrac{9}{10}=\) 2.8 (df=5; fail to reject \(H_0\)) | ||
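The cell-by-cell sum for the die example:

```python
# Chi-square: sum of (observed - expected)^2 / expected over all cells.
def chi_square(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

obs = [8, 12, 9, 11, 7, 13]
exp = [10] * 6  # 60 rolls of a fair die: 10 expected per face
print(round(chi_square(obs, exp), 1))  # 2.8
```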
| Random Variable | Parameters of Sampling Distribution | Standard Error* of Sample Statistic | |
|---|---|---|---|
| One population: \(\hat{p}\) |
\(\mu_{\hat{p}}=p\) \(\sigma_{\hat{p}}=\sqrt{\dfrac{p(1-p)}{n}}\) |
\(s_{\hat{p}}=\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\) | ▼ |
What it means
For a significance test, use \(\sigma_{\hat{p}}\) with \(p_0\) from \(H_0\) (you're assuming \(p\) is known). For a confidence interval, use \(s_{\hat{p}}\) with \(\hat{p}\) in place of \(p\) (you don't know \(p\), so you estimate it).
Worked Example — 95% CI
\(n=400,\;\hat{p}=0.62\)\(s_{\hat{p}}=\sqrt{0.62(0.38)/400}\approx 0.0242\) \(0.62\pm 1.96(0.0242)=\) (0.572, 0.668) | |||
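The one-proportion interval end to end:

```python
import math

def prop_ci(p_hat, n, z=1.96):
    """95% CI for p (by default) using the estimated SE with p-hat."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    moe = z * se
    return round(p_hat - moe, 3), round(p_hat + moe, 3)

print(prop_ci(0.62, 400))  # (0.572, 0.668)
```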
| Two populations: \(\hat{p}_1-\hat{p}_2\) |
\(\mu=p_1-p_2\) \(\sigma=\sqrt{\dfrac{p_1(1-p_1)}{n_1}+\dfrac{p_2(1-p_2)}{n_2}}\) |
CI: \(\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\) Test (\(p_1=p_2\) assumed): \(\sqrt{\hat{p}_c(1-\hat{p}_c)\!\left(\tfrac{1}{n_1}+\tfrac{1}{n_2}\right)}\) where \(\hat{p}_c=\dfrac{X_1+X_2}{n_1+n_2}\) |
▼ |
What it means
CI: use each sample's \(\hat{p}\) separately — you are not assuming equal proportions. Test: under \(H_0: p_1=p_2\), pool the two samples to estimate the common proportion \(\hat{p}_c\), and use it in both variance terms.
Worked Example — Pooled Test
\(X_1=30,\,n_1=100;\;X_2=45,\,n_2=150\)\(\hat{p}_c=75/250=0.30\) \(\text{SE}=\sqrt{0.30(0.70)(1/100+1/150)}\approx\) 0.059 | |||
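The pooled standard error from the counts:

```python
import math

def pooled_se(x1, n1, x2, n2):
    """SE for a two-proportion test, pooling under H0: p1 = p2."""
    p_c = (x1 + x2) / (n1 + n2)
    return math.sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))

print(round(pooled_se(30, 100, 45, 150), 3))  # 0.059
```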
| Random Variable | Parameters of Sampling Distribution | Standard Error* of Sample Statistic | |
|---|---|---|---|
| One population: \(\bar{x}\) |
\(\mu_{\bar{x}}=\mu\) \(\sigma_{\bar{x}}=\dfrac{\sigma}{\sqrt{n}}\) |
\(s_{\bar{x}}=\dfrac{s}{\sqrt{n}}\) | ▼ |
What it means
The sampling distribution of \(\bar{x}\) is centered at \(\mu\) with SE \(=s/\sqrt{n}\). Larger \(n\) → smaller SE → more precise estimate. The Central Limit Theorem guarantees \(\bar{x}\) is approximately Normal for large \(n\), regardless of population shape.
Worked Example — One-Sample \(t\)-test
\(n=25,\;\bar{x}=78,\;s=12\)\(\text{SE}=12/\sqrt{25}=12/5=2.4\) Test \(H_0:\mu=75\): \(t=(78-75)/2.4=\) 1.25 (df=24) | |||
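The one-sample \(t\)-statistic from summary values:

```python
import math

def one_sample_t(x_bar, mu0, s, n):
    """t = (x-bar - mu0) / (s / sqrt(n))."""
    se = s / math.sqrt(n)
    return (x_bar - mu0) / se

print(round(one_sample_t(78, 75, 12, 25), 2))  # 1.25
```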
| Two populations: \(\bar{x}_1-\bar{x}_2\) |
\(\mu=\mu_1-\mu_2\) \(\sigma=\sqrt{\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}}\) |
\(s_{\bar{x}_1-\bar{x}_2}=\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}\) | ▼ |
What it means
Combines variability from two independent samples. Because the samples are independent, variances add: \(s_1^2/n_1+s_2^2/n_2\). Take the square root for the SE. Used in two-sample \(t\)-tests and confidence intervals for \(\mu_1-\mu_2\).
Worked Example
A: \(\bar{x}_1=84,\,s_1=10,\,n_1=30\) | B: \(\bar{x}_2=79,\,s_2=8,\,n_2=40\)\(\text{SE}=\sqrt{100/30+64/40}=\sqrt{4.93}\approx 2.22\) \(t=(84-79)/2.22\approx\) 2.25 | |||
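The two-sample SE and \(t\)-statistic from the summary values:

```python
import math

def two_sample_t(x1, s1, n1, x2, s2, n2):
    """Unpooled two-sample t: variances of independent samples add."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (x1 - x2) / se, se

t, se = two_sample_t(84, 10, 30, 79, 8, 40)
print(round(se, 2), round(t, 2))  # 2.22 2.25
```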
| Random Variable | Parameters of Sampling Distribution | Standard Error* of Sample Statistic | |
|---|---|---|---|
|
For slope: \(b\) |
\(\mu_b=\beta\) \(\sigma_b=\dfrac{\sigma}{\sigma_x\sqrt{n}}\)
\(s_b=\dfrac{s}{s_x\sqrt{n-1}}\) where \(\;s=\sqrt{\dfrac{\displaystyle\sum(y_i-\hat{y}_i)^2}{n-2}}\) and \(\;s_x=\sqrt{\dfrac{\displaystyle\sum(x_i-\bar{x})^2}{n-1}}\)
▼ |
What it means
\(s\) is the residual standard error — how much the \(y\)-values scatter around the regression line. More spread in \(x\) (larger \(s_x\)) → more precise slope estimate → smaller \(s_b\). Test \(H_0:\beta=0\) using \(t=b/s_b\) with df \(=n-2\).
Worked Example
\(n=20,\;b=2.5,\;s=4.1,\;s_x=3.0\)\(s_b=4.1/(3.0\sqrt{19})=4.1/13.08\approx 0.3135\) \(t=2.5/0.3135\approx\) 7.97 (df=18; strong evidence \(\beta\neq 0\)) | |||
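The slope-SE arithmetic in code, with the \(\sqrt{n-1}\) denominator (since \(s_x\) is computed with \(n-1\), \(s_x\sqrt{n-1}=\sqrt{\sum(x_i-\bar{x})^2}\)):

```python
import math

def slope_t(b, s, s_x, n):
    """t for H0: beta = 0, using s_b = s / (s_x * sqrt(n - 1))."""
    s_b = s / (s_x * math.sqrt(n - 1))
    return b / s_b, s_b

t, s_b = slope_t(2.5, 4.1, 3.0, 20)
print(round(s_b, 4), round(t, 2))  # 0.3135 7.97
```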
*Standard deviation is a measurement of variability from the theoretical population. Standard error is the estimate of the standard deviation. If the standard deviation of the statistic is assumed to be known, then the standard deviation should be used instead of the standard error.
Values are exact for listed df rows. When your df falls between rows, the nearest listed row gives an approximation — use a calculator for precision.