Just as individuals may differ from one another in phenotype because they have different genotypes, because they developed in different environments, or both, relatives may resemble one another more than they resemble other members of the population because they have similar genotypes, because they developed in similar environments, or both. In an experimental situation, we typically try to randomize individuals across environments. If we are successful, then any tendency for relatives to resemble one another more than non-relatives must be due to similarities in their genotypes.

Using this insight, we can develop a statistical technique that
allows us to determine how much of the variance among individuals in
phenotype is a result of genetic variance and how much is due to
environmental variance. *Remember*, we can only ask
about how much of the variability is due to genetic differences, and we
can only do so *in a particular environment* and
*with a particular set of genotypes*, and we can
only do it when we *randomize genotypes across
environments*.

The basic approach to the analysis is either to use a linear regression of offspring phenotype on parental phenotype, which, as we’ll see, estimates \(h^2_N\), or to use a nested analysis of variance. One of the most complete designs is a full-sib, half-sib design in which each male sires offspring from several dams but each dam mates with only one sire.

The offspring of a single dam are full-sibs (they are nested within
dams). Differences among the offspring of dams indicate that there are
differences in maternal “genotype” in the trait being measured.^{1}

The offspring of different dams mated to a single sire are half-sibs.
Differences among the offspring of sires indicate that there are
differences in paternal “genotype” in the trait being measured.^{2}

As we’ll see, this design has the advantage that it allows both additive and dominance components of the genetic variance to be estimated. It has the additional advantage that we don’t have to assume that the distribution of environments in the offspring generation is the same as it was in the parental generation. To use the regression approach to estimate heritability, we have to assume that the distribution of environmental effects is the same in parental and offspring generations.

OK, so I’ve given you the basic idea. Where does it come from, and
how does it work? Funny you should ask. The whole approach is based on
calculations of the degree to which different relatives resemble one
another. For these purposes we’re going to continue our focus on
phenotypes influenced by one locus with two alleles, and we’ll do the
calculations in detail only for half-sib families. We start with
something that may look vaguely familiar.^{3}
Take a look at Table 1.

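Table 1: Offspring genotype frequencies in half-sib families, by maternal genotype.

| Maternal genotype | Frequency | Offspring \(A_1A_1\) | Offspring \(A_1A_2\) | Offspring \(A_2A_2\) |
|---|---|---|---|---|
| \(A_1A_1\) | \(p^2\) | \(p\) | \(q\) | 0 |
| \(A_1A_2\) | \(2pq\) | \(\frac{p}{2}\) | \(\frac{1}{2}\) | \(\frac{q}{2}\) |
| \(A_2A_2\) | \(q^2\) | 0 | \(p\) | \(q\) |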

Note that the probabilities in Table 1 are
appropriate *only* if the progeny are from half-sib
families. If the progeny are from full-sib families, we must specify the
frequency of each of the nine possible matings (keeping track of the
genotype of both mother and father) and the offspring that each will
produce.^{4}
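As a minimal sketch of where the entries in Table 1 come from, the following Python function builds the table for any allele frequency. It assumes that sires are drawn at random from the population, so the paternal gamete carries \(A_1\) with probability \(p\). The function name `half_sib_offspring_table` is hypothetical and is used only for illustration.

```python
def half_sib_offspring_table(p):
    """Offspring genotype frequencies by maternal genotype (Table 1).

    Assumption: sires are a random sample of the population, so the
    paternal gamete is A1 with probability p and A2 with probability q.
    """
    q = 1.0 - p
    # Probability that the maternal gamete carries A1
    maternal_a1 = {"A1A1": 1.0, "A1A2": 0.5, "A2A2": 0.0}
    maternal_freq = {"A1A1": p * p, "A1A2": 2 * p * q, "A2A2": q * q}
    table = {}
    for genotype, m in maternal_a1.items():
        table[genotype] = {
            "frequency": maternal_freq[genotype],
            "A1A1": m * p,                  # A1 from dam, A1 from sire
            "A1A2": m * q + (1 - m) * p,    # one A1 and one A2
            "A2A2": (1 - m) * q,            # A2 from dam, A2 from sire
        }
    return table


for genotype, row in half_sib_offspring_table(0.4).items():
    print(genotype, {key: round(value, 2) for key, value in row.items()})
```

Each row of offspring frequencies sums to one, which is a quick check that no offspring class has been missed.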

Let \(p_{xy}\) be the probability
that random variable \(X\) takes the
value \(x\) and random variable \(Y\) takes the value \(y\). Then the covariance between \(X\) and \(Y\) is: \[\mbox{Cov}(X,Y) = \sum p_{xy}(x - \mu_x)(y -
\mu_y) \quad ,\] where \(\mu_x\)
is the mean of \(X\) and \(\mu_y\) is the mean of \(Y\). The covariance between two random
variables is a measure of how much they vary together (covary). If the
covariance is large and positive, they tend to vary in the same way.
Positive deviations from the mean in one are associated with positive
deviations from the mean in the other, and negative deviations are
similarly associated. If the covariance is large and negative, they tend
to vary in opposite ways. Positive deviations from the mean in one
variable are associated with *negative* deviations
in the other, and vice versa. If the covariance is small, it means there
isn’t a strong tendency for deviations from the mean in one variable to
be associated with deviations in the other.
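To make the definition concrete, here is a small Python sketch that computes the covariance directly from a joint probability table. The three toy distributions are hypothetical and are not data from these notes. They are chosen only to show a positive, a negative, and a zero covariance.

```python
def covariance(joint):
    """Covariance of two discrete random variables.

    `joint` maps each (x, y) pair to its probability p_xy.
    """
    mu_x = sum(p * x for (x, y), p in joint.items())
    mu_y = sum(p * y for (x, y), p in joint.items())
    return sum(p * (x - mu_x) * (y - mu_y) for (x, y), p in joint.items())


# Hypothetical toy distributions on {0, 1} x {0, 1}
together = {(0, 0): 0.4, (1, 1): 0.4, (0, 1): 0.1, (1, 0): 0.1}
opposite = {(0, 1): 0.4, (1, 0): 0.4, (0, 0): 0.1, (1, 1): 0.1}
unrelated = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

print(covariance(together))   # positive: deviations share a sign
print(covariance(opposite))   # negative: deviations have opposite signs
print(covariance(unrelated))  # zero: no association
```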

Here’s how we can calculate the covariance between half-siblings:
First, imagine selecting a huge number of half-sib pairs at random. The
phenotype of the first half-sib in the pair is a random variable (call
it \(S_1\)), as is the phenotype of the
second (call it \(S_2\)). The mean of
\(S_1\) is just the mean phenotype in
*all* the progeny taken together, \(\bar x\). Similarly, the mean of \(S_2\) is just \(\bar x\).^{5} Now
with one locus, two alleles we have three possible phenotypes: \(x_{11}\) (corresponding to the genotype
\(A_1A_1\)), \(x_{12}\) (corresponding to the genotype
\(A_1A_2\)), and \(x_{22}\) (corresponding to the genotype
\(A_2A_2\)). So all we need to do to
calculate the covariance between half-sibs is to write down all possible
pairs of phenotypes and the frequency with which they will occur in our
sample of randomly chosen half-sibs based on the frequencies in Table 1 above and the frequency of
maternal genotypes. It’s actually a bit easier to keep track of it all
if we write down the frequency of each maternal genotype and the
frequency with which each possible phenotypic combination will occur in
her progeny. \[\begin{aligned}
\mbox{Cov}(S_1,S_2) &=& p^2[p^2(x_{11} - {\bar x})^2 +
2pq(x_{11} - {\bar x})
(x_{12} - {\bar x})
+ q^2(x_{12} - {\bar x})^2]
\\
&&+ 2pq[{1 \over 4}p^2(x_{11} - {\bar x})^2
+ {1 \over 2}p(x_{11} - {\bar x})(x_{12} - {\bar x})
+ {1 \over 2}pq(x_{11} - {\bar x})(x_{22} - {\bar x})
\\
&&\ \ + {1 \over 4}(x_{12} - {\bar x})^2
+ {1 \over 2}q(x_{12} - {\bar x})(x_{22} - {\bar x})
+ {1 \over 4}q^2(x_{22} - {\bar x})^2] \\
&&+ q^2[p^2(x_{12} - {\bar x})^2 + 2pq(x_{12} -
{\bar
x})(x_{22} - {\bar x})
+ q^2(x_{22} - {\bar
x})^2] \\
&=&\ p^2[p(x_{11} - {\bar x}) + q(x_{12} - {\bar x})]^2 \\
&&+ 2pq[{1 \over 2}p(x_{11} - {\bar x}) +
{1 \over 2}q(x_{12} - {\bar x}) +
{1 \over 2}p(x_{12} - {\bar x}) +
{1 \over 2}q(x_{22} - {\bar x})]^2 \\
&&+ q^2[p(x_{12} - {\bar x}) + q(x_{22} - {\bar x})]^2 \\
&=&\ p^2[px_{11} + qx_{12} - {\bar x}]^2 \\
&&+ 2pq\left[{1 \over 2}(px_{11} + qx_{12} - {\bar x}) +
{1 \over 2}(px_{12} + qx_{22} - {\bar x})\right]^2 \\
&&+ q^2[px_{12} + qx_{22} - {\bar x}]^2 \\
&=&\ p^2\left[\alpha_1 - {{\bar x} \over 2}\right]^2
+ 2pq\left[{1 \over 2}(\alpha_1 - {{\bar x} \over 2}) +
{1 \over 2}(\alpha_2 - {{\bar x} \over 2})\right]^2
+ q^2\left[\alpha_2 - {{\bar x} \over 2}\right]^2 \\
&=&\ p^2\left[{1 \over 2}(2\alpha_1 - {\bar x})\right]^2
+ 2pq\left[{1 \over 2}(\alpha_1 + \alpha_2 - {\bar x})\right]^2
+ q^2\left[{1 \over 2}(2\alpha_2 - {\bar x})\right]^2 \\
&=& \left({1 \over 4}\right)
\left[p^2(2\alpha_1 - {\bar x})^2
+ 2pq[(\alpha_1+\alpha_2 - {\bar x})]^2
+ q^2(2\alpha_2 - {\bar x})^2\right] \\
&=& \left({1 \over 4}\right)V_a
\end{aligned}\]
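The algebra above can be checked numerically. The sketch below evaluates the first line of the derivation (a sum over dams and over ordered pairs of offspring genotypes) and the last line (one quarter of \(V_a\)) for randomly chosen allele frequencies and phenotypes. It assumes \(\alpha_1 = px_{11} + qx_{12} - \bar x/2\) and \(\alpha_2 = px_{12} + qx_{22} - \bar x/2\), which is what the substitution in the middle of the derivation implies. The function name is hypothetical.

```python
import random


def half_sib_cov_and_va(p, x11, x12, x22):
    """Return (Cov(S1, S2), V_a) for one locus with two alleles."""
    q = 1.0 - p
    x = [x11, x12, x22]
    xbar = p * p * x11 + 2 * p * q * x12 + q * q * x22
    # (maternal frequency, offspring genotype probabilities) from Table 1
    dams = [
        (p * p, [p, q, 0.0]),
        (2 * p * q, [p / 2, 0.5, q / 2]),
        (q * q, [0.0, p, q]),
    ]
    # First line of the derivation: sum over dams and ordered sib pairs
    cov = sum(
        freq * probs[i] * probs[j] * (x[i] - xbar) * (x[j] - xbar)
        for freq, probs in dams
        for i in range(3)
        for j in range(3)
    )
    # Last line: V_a in terms of alpha_1 and alpha_2 (assumed definitions)
    alpha1 = p * x11 + q * x12 - xbar / 2
    alpha2 = p * x12 + q * x22 - xbar / 2
    v_a = (
        p * p * (2 * alpha1 - xbar) ** 2
        + 2 * p * q * (alpha1 + alpha2 - xbar) ** 2
        + q * q * (2 * alpha2 - xbar) ** 2
    )
    return cov, v_a


random.seed(1)
for _ in range(5):
    p = random.random()
    phenotypes = [random.uniform(0, 100) for _ in range(3)]
    cov, v_a = half_sib_cov_and_va(p, *phenotypes)
    print(round(cov, 4), round(v_a / 4, 4))  # the two columns should agree
```

A numerical check like this is not a proof, but it is a cheap way to catch a dropped factor of one-half when working through the algebra by hand.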

Now we’ll return to an example we saw earlier (Table 2). It is the same set of genotypes and phenotypes we used when we calculated additive and dominance components of variance. Let’s assume that \(p = 0.4\). We know from our earlier calculations that \[\begin{aligned} \bar x &=& 54.4 \\ V_a &=& 1505.28 \\ V_d &=& 207.36 \quad . \end{aligned}\] We can also calculate the numerical version of Table 1, which you’ll find in Table 3.

Table 2: Genotypes and phenotypes for the numerical example.

| Genotype | \(A_1A_1\) | \(A_1A_2\) | \(A_2A_2\) |
|---|---|---|---|
| Phenotype | 100 | 80 | 0 |

Table 3: Numerical version of Table 1 with \(p = 0.4\).

| Maternal genotype | Frequency | Offspring \(A_1A_1\) | Offspring \(A_1A_2\) | Offspring \(A_2A_2\) |
|---|---|---|---|---|
| \(A_1A_1\) | 0.16 | 0.4 | 0.6 | 0.0 |
| \(A_1A_2\) | 0.48 | 0.2 | 0.5 | 0.3 |
| \(A_2A_2\) | 0.36 | 0.0 | 0.4 | 0.6 |

So now we can follow the same approach we did before and calculate the numerical value of the covariance between half-sibs in this example: \[\begin{aligned} \mbox{Cov}(S_1,S_2) &=&\ [(0.4)^2(0.16) + (0.2)^2(0.48)](100 - 54.4)^2 \\ && + [(0.6)^2(0.16) + (0.5)^2(0.48) + (0.4)^2(0.36)] (80 - 54.4)^2 \\ && + [(0.3)^2(0.48) + (0.6)^2(0.36)](0 - 54.4)^2 \\ && + 2[(0.4)(0.6)(0.16) + (0.2)(0.5)(0.48)](100 - 54.4)(80 - 54.4) \\ && + 2(0.2)(0.3)(0.48)(100 - 54.4)(0 - 54.4) \\ && + 2[(0.5)(0.3)(0.48) + (0.4)(0.6)(0.36)](80 - 54.4)(0 - 54.4) \\ &=&\ 376.32 \\ &=&\ \left({1 \over 4}\right)1505.28 \quad . \end{aligned}\]
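The same number can be reproduced in a few lines of Python. This sketch uses a shortcut rather than listing every phenotype pair: because two half-sibs from the same dam are independent draws from her offspring distribution, the sum over pairs within a dam collapses to the square of her offspring's mean deviation from \(\bar x\). The variable names are hypothetical.

```python
phenotype = {"A1A1": 100.0, "A1A2": 80.0, "A2A2": 0.0}
xbar = 54.4

# Table 3: maternal genotype -> (frequency, offspring genotype probabilities)
table3 = {
    "A1A1": (0.16, {"A1A1": 0.4, "A1A2": 0.6, "A2A2": 0.0}),
    "A1A2": (0.48, {"A1A1": 0.2, "A1A2": 0.5, "A2A2": 0.3}),
    "A2A2": (0.36, {"A1A1": 0.0, "A1A2": 0.4, "A2A2": 0.6}),
}

cov_hs = 0.0
for freq, offspring in table3.values():
    # Mean deviation from xbar among the offspring of this kind of dam
    family_dev = sum(
        prob * (phenotype[g] - xbar) for g, prob in offspring.items()
    )
    cov_hs += freq * family_dev ** 2

print(round(cov_hs, 2))       # 376.32
print(round(1505.28 / 4, 2))  # 376.32, i.e. V_a / 4
```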

Well, if we can do this sort of calculation for half-sibs, you can probably guess that it’s also possible to do it for other relatives. I won’t go through all of the calculations, but the results for common forms of relationship are summarized in Table 4.

Table 4: Genetic covariances among relatives.

| Relationship | Covariance |
|---|---|
| MZ twins (\(\mbox{Cov}_{MZ}\)) | \(V_a + V_d\) |
| Parent-offspring (\(\mbox{Cov}_{PO}\))\(^1\) | \(\left(\frac{1}{2}\right)V_a\) |
| Full sibs (\(\mbox{Cov}_{FS}\)) | \(\left(\frac{1}{2}\right)V_a + \left(\frac{1}{4}\right)V_d\) |
| Half sibs (\(\mbox{Cov}_{HS}\)) | \(\left(\frac{1}{4}\right)V_a\) |

\(^1\)One parent or mid-parent.
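As an illustration of the full-sib entry, the sketch below enumerates the nine possible matings for the one-locus example (\(p = 0.4\), phenotypes 100, 80, and 0) and compares the result with \(\left(\frac{1}{2}\right)V_a + \left(\frac{1}{4}\right)V_d\). It assumes random mating and no environmental effects, and both function names are hypothetical.

```python
from itertools import product


def relative_covariances(v_a, v_d):
    """Expected covariances among relatives, as listed in Table 4."""
    return {
        "MZ twins": v_a + v_d,
        "Parent-offspring": v_a / 2,
        "Full sibs": v_a / 2 + v_d / 4,
        "Half sibs": v_a / 4,
    }


def full_sib_covariance(p, x11, x12, x22):
    """Full-sib covariance by enumerating the nine dam x sire matings.

    Assumptions: random mating, Mendelian segregation, and phenotype
    determined entirely by genotype.
    """
    q = 1.0 - p
    x = (x11, x12, x22)
    freq = (p * p, 2 * p * q, q * q)
    a1_gamete = (1.0, 0.5, 0.0)  # P(gamete carries A1) for A1A1, A1A2, A2A2
    xbar = sum(f * v for f, v in zip(freq, x))
    cov = 0.0
    for dam, sire in product(range(3), repeat=2):
        m, s = a1_gamete[dam], a1_gamete[sire]
        offspring = (m * s, m * (1 - s) + (1 - m) * s, (1 - m) * (1 - s))
        family_dev = sum(pr * (v - xbar) for pr, v in zip(offspring, x))
        cov += freq[dam] * freq[sire] * family_dev ** 2
    return cov


expected = relative_covariances(1505.28, 207.36)
for relationship, value in expected.items():
    print(relationship, round(value, 2))
print("Full sibs by enumeration", round(full_sib_covariance(0.4, 100, 80, 0), 2))
```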

Galton introduced the term *regression* to
describe the inheritance of height in humans. He noted that there is a
tendency for adult offspring of tall parents to be tall and of short
parents to be short, but he also noted that offspring tended to be less
extreme than the parents.^{6} He described this as a “regression
to mediocrity,” and statisticians adopted the term to describe a
standard technique for describing the functional relationship between
two variables.

Measure the parents. Regress the offspring phenotype on: (1) the phenotype of one parent or (2) the mean of the parental phenotypes. In either case, the covariance between the parental phenotype and the offspring phenotype is \(\left({1 \over 2}\right)V_a\). Now the coefficient for the regression of offspring on one parent, \(b_{P \rightarrow O}\), is \[\begin{aligned} b_{P \rightarrow O} &=& \frac{\mbox{Cov}_{PO}}{\mbox{Var}(P)} \\ &=& {\left({1 \over 2}\right)V_a \over V_p} \\ &=& \left({1 \over 2}\right)h^2_N \quad . \end{aligned}\] In short, the slope of the regression line is equal to one-half the narrow-sense heritability.

In the regression of offspring on mid-parent value, if the phenotypes of mates are uncorrelated (random mating) and the phenotypic variance is the same in both sexes, \[\begin{aligned} \mbox{Var}(MP) &=& \mbox{Var}\left(\frac{M+F}{2}\right) \\ &=& \frac{1}{4} \mbox{Var}(M+F) \\ &=& \frac{1}{4} \left(\mbox{Var}(M) + \mbox{Var}(F)\right) \\ &=& \frac{1}{4} \left(2V_p\right) \\ &=& \frac{1}{2} V_p \quad . \end{aligned}\] Thus, \(b_{MP \rightarrow O} = \frac{1}{2}V_a/\frac{1}{2}V_p = h^2_N\). In short, the slope of the regression of offspring on mid-parent value is equal to the narrow-sense heritability.
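For the one-locus example, the two regression slopes can be computed exactly rather than estimated from data. The sketch below does so, assuming random mating and an environmental variance `v_e` that is independent of genotype and not shared between parent and offspring. The function name and the `v_e` parameter are hypothetical additions for illustration.

```python
def regression_slopes(p, x11, x12, x22, v_e=0.0):
    """Exact slopes of offspring on one parent and on mid-parent value."""
    q = 1.0 - p
    x = (x11, x12, x22)
    freq = (p * p, 2 * p * q, q * q)
    xbar = sum(f * v for f, v in zip(freq, x))
    # Offspring genotype probabilities given one parent's genotype (Table 1)
    offspring = ((p, q, 0.0), (p / 2, 0.5, q / 2), (0.0, p, q))
    # Cov_PO: parental deviation times expected offspring deviation
    cov_po = sum(
        f * (xp - xbar) * sum(pr * (xo - xbar) for pr, xo in zip(probs, x))
        for f, xp, probs in zip(freq, x, offspring)
    )
    # Phenotypic variance: genetic variance plus assumed environmental variance
    v_p = sum(f * (v - xbar) ** 2 for f, v in zip(freq, x)) + v_e
    b_one_parent = cov_po / v_p
    b_mid_parent = cov_po / (v_p / 2)  # same covariance, Var(MP) = V_p / 2
    return b_one_parent, b_mid_parent


b_po, b_mp = regression_slopes(0.4, 100, 80, 0)
print(round(b_po, 4), round(b_mp, 4))
print(round(1505.28 / 1712.64, 4))  # V_a / (V_a + V_d) when v_e = 0
```

With `v_e = 0` the mid-parent slope equals \(V_a/(V_a + V_d)\). Increasing `v_e` lowers both slopes, as expected when more of the phenotypic variance is environmental.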

Mate a number of males (sires) with a number of females (dams). Each sire is mated to more than one dam, but each dam mates only with one sire. Do an analysis of variance on the phenotype in the progeny, treating sire and dam as main effects. The result is shown in Table 5.

Table 5: Analysis of variance for the full-sib, half-sib design.

| Source | d.f. | Mean square | Composition of mean square |
|---|---|---|---|
| Among sires | \(s-1\) | \(MS_S\) | \(\sigma^2_W + k\sigma^2_D + dk\sigma^2_S\) |
| Among dams (within sires) | \(s(d-1)\) | \(MS_D\) | \(\sigma^2_W + k\sigma^2_D\) |
| Within progenies | \(sd(k-1)\) | \(MS_W\) | \(\sigma^2_W\) |

Here \(s\) is the number of sires, \(d\) is the number of dams per sire, and \(k\) is the number of offspring per dam.

Now we need some way to relate the variance components (\(\sigma^2_W\), \(\sigma^2_D\), and \(\sigma^2_S\)) to \(V_a\), \(V_d\), and \(V_e\).^{7} How do we do that? Well,
\[V_p = \sigma^2_T = \sigma^2_S + \sigma^2_D
+ \sigma^2_W \quad .\] \(\sigma^2_S\) estimates the variance among
the means of the half-sib families fathered by each of the different
sires or, equivalently, the covariance among half-sibs.^{8}
\[\begin{aligned}
\sigma^2_S &=& \mbox{Cov}_{HS} \\
&=& \left(\frac{1}{4}\right)V_a \quad .
\end{aligned}\] Now consider the within progeny component of the
variance, \(\sigma^2_W\). In general,
it can be shown that *any* among group variance
component is equal to the covariance among the members within the
groups.^{9} Thus, a within group component of
the variance is equal to the total variance minus the covariance within
groups. In this case, \[\begin{aligned}
\sigma^2_W &=& V_p - \mbox{Cov}_{FS} \\
&=& V_a + V_d + V_e - \left[\left(\frac{1}{2}\right)V_a +
\left(\frac{1}{4}\right)V_d
\right] \\
&=& \left(\frac{1}{2}\right)V_a
+ \left({3 \over 4}\right)V_d
+ V_e \quad .
\end{aligned}\] There remains only \(\sigma^2_D\). Now \(\sigma^2_W = V_p - \mbox{Cov}_{FS}\), \(\sigma^2_S = \mbox{Cov}_{HS}\), and \(\sigma^2_T = V_p\). Thus, \[\begin{aligned}
\sigma^2_D &=& \sigma^2_T - \sigma^2_S - \sigma^2_W \\
&=& V_p - \mbox{Cov}_{HS} - (V_p - \mbox{Cov}_{FS})
\\
&=& \mbox{Cov}_{FS} - \mbox{Cov}_{HS} \\
&=& \left[
\left(\frac{1}{2}\right)V_a + \left(\frac{1}{4}\right)V_d
\right]
- \left(\frac{1}{4}\right)V_a \\
&=& \left(\frac{1}{4}\right)V_a +
\left(\frac{1}{4}\right)V_d \quad .
\end{aligned}\] So if we rearrange these equations, we can
express the genetic components of the phenotypic variance, the
*causal* components of variance, as simple functions
of the *observational* components of variance: \[\begin{aligned}
V_a &=& 4\sigma^2_S \\
V_d &=& 4(\sigma^2_D - \sigma^2_S) \\
V_e &=& \sigma^2_W - 3\sigma^2_D + \sigma^2_S \quad .
\end{aligned}\] Furthermore, the narrow-sense heritability is
given by \[h^2_N =
\frac{4\sigma^2_S}{\sigma^2_S + \sigma^2_D + \sigma^2_W} \quad
.\]
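The relationships between the observational and causal components are easy to get wrong by a factor of four, so a round-trip check is useful. The sketch below converts causal components to observational ones and back, using the one-locus example values for \(V_a\) and \(V_d\) together with a hypothetical \(V_e = 500\). The function names are illustrative only.

```python
def observational_components(v_a, v_d, v_e):
    """Sire, dam, and within-progeny components implied by V_a, V_d, V_e."""
    sigma2_s = v_a / 4
    sigma2_d = v_a / 4 + v_d / 4
    sigma2_w = v_a / 2 + 3 * v_d / 4 + v_e
    return sigma2_s, sigma2_d, sigma2_w


def causal_components(sigma2_s, sigma2_d, sigma2_w):
    """Invert the relationships to recover V_a, V_d, V_e, and h^2_N."""
    v_a = 4 * sigma2_s
    v_d = 4 * (sigma2_d - sigma2_s)
    v_e = sigma2_w - 3 * sigma2_d + sigma2_s
    v_p = sigma2_s + sigma2_d + sigma2_w
    return {"V_a": v_a, "V_d": v_d, "V_e": v_e, "V_p": v_p, "h2_N": v_a / v_p}


# V_e = 500 is a made-up value chosen only to exercise the formulas
sigmas = observational_components(1505.28, 207.36, 500.0)
recovered = causal_components(*sigmas)
for name, value in recovered.items():
    print(name, round(value, 4))
```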

The analysis involves 719 offspring from 74 sires and 192 dams, each
with one litter. The offspring were spread over 4 generations, and the
analysis is performed as a nested ANOVA with the genetic analysis nested
*within* generations. An additional complication is
that the design was unbalanced, i.e., unequal numbers of progeny were
measured in each sibship. As a result the degrees of freedom don’t work
out to be quite as simple as what I showed you.^{10}
The results are summarized in Table 6.

Table 6: Analysis of variance for the example.

| Source | d.f. | Mean square | Composition of mean square |
|---|---|---|---|
| Among sires | 70 | 17.10 | \(\sigma^2_W + k'\sigma^2_D + dk'\sigma^2_S\) |
| Among dams (within sires) | 118 | 10.79 | \(\sigma^2_W + k\sigma^2_D\) |
| Within progenies | 527 | 2.19 | \(\sigma^2_W\) |

Here \(d = 2.33\), \(k = 3.48\), and \(k' = 4.16\).

Using the expressions for the composition of the mean square we obtain \[\begin{aligned} \sigma^2_W &=& MS_W \\ &=& 2.19 \\ \sigma^2_D &=& \left({1 \over k}\right)(MS_D - \sigma^2_W) \\ &=& 2.47 \\ \sigma^2_S &=& \left({1 \over dk'}\right)(MS_S - \sigma^2_W - k'\sigma^2_D) \\ &=& 0.48 \quad . \end{aligned}\] Thus, \[\begin{aligned} V_p &=& 5.14 \\ V_a &=& 1.92 \\ V_d + V_e &=& 3.22 \\ V_d &=& 0.00 \hbox{ to } 1.64 \\ V_e &=& 1.58 \hbox{ to } 3.22 \end{aligned}\]
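The arithmetic can be retraced with a short Python sketch. One detail matters: the values reported above are obtained by rounding each variance component to two decimal places before using it in the next step, so the code does the same. Carrying full precision throughout gives a slightly smaller \(V_a\) (about 1.91). The function name is hypothetical.

```python
def variance_components(ms_s, ms_d, ms_w, d, k, k_prime):
    """Observational components from the mean squares in Table 6.

    Each component is rounded to two decimals before reuse, which is
    an assumption made to match the rounded values quoted in the text.
    """
    sigma2_w = ms_w
    sigma2_d = round((ms_d - sigma2_w) / k, 2)
    sigma2_s = round(
        (ms_s - sigma2_w - k_prime * sigma2_d) / (d * k_prime), 2
    )
    return sigma2_s, sigma2_d, sigma2_w


sigma2_s, sigma2_d, sigma2_w = variance_components(
    ms_s=17.10, ms_d=10.79, ms_w=2.19, d=2.33, k=3.48, k_prime=4.16
)
v_p = sigma2_s + sigma2_d + sigma2_w
v_a = 4 * sigma2_s

print(sigma2_w, sigma2_d, sigma2_s)  # 2.19 2.47 0.48
print(round(v_p, 2))                 # 5.14
print(round(v_a, 2))                 # 1.92
print(round(v_p - v_a, 2))           # 3.22, i.e. V_d + V_e
```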

Why didn’t I give a definite number for \(V_d\) after my big spiel above about how we
can estimate it from a full-sib, half-sib crossing design? Two reasons. First, if
you plug the estimates for \(\sigma^2_D\) and \(\sigma^2_S\) into the formulas above for
\(V_d\) and \(V_e\) you get \(V_d = 7.96\) and \(V_e = -4.74\), which is clearly impossible:
\(V_d\) cannot exceed \(V_p = 5.14\), and \(V_e\) cannot be negative because it’s a
variance. Second, the experimental design confounds two sources of
resemblance among full siblings: (1) genetic covariance and (2)
environmental covariance. The full-sib families were all raised by the
same mother in the same pen. Hence, we don’t know to what extent their
resemblance is due to a common natal environment.^{11}
If we assume \(V_d = 0\), we can
estimate the amount of variance accounted for by exposure to a common
natal environment, \(V_{Ec} =
1.99\), and by environmental variation within sibships, \(V_{Ew} =
1.23\).^{12} Similarly, if we assume \(V_{Ew} = 0\), then \(V_d = 1.64\) and \(V_{Ec} = 1.58\). In any case, we can
estimate the narrow sense heritability as \[\begin{aligned}
h^2_N &=& \left({1.92 \over 5.14}\right) \\
&=& 0.37 \quad .
\end{aligned}\]
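The numbers in this paragraph can be reproduced with the sketch below. It assumes that a common natal environment adds \(V_{Ec}\) to the dam component and that \(V_{Ew}\) replaces \(V_e\) in the within-progeny component, i.e. \(\sigma^2_D = \frac{1}{4}V_a + \frac{1}{4}V_d + V_{Ec}\) and \(\sigma^2_W = \frac{1}{2}V_a + \frac{3}{4}V_d + V_{Ew}\). That model is an assumption adopted here because it recovers the values quoted above. The function names are hypothetical.

```python
sigma2_s, sigma2_d, sigma2_w = 0.48, 2.47, 2.19
v_a = 4 * sigma2_s

# Naive estimates that ignore a common natal environment
naive_v_d = 4 * (sigma2_d - sigma2_s)
naive_v_e = sigma2_w - 3 * sigma2_d + sigma2_s
print(round(naive_v_d, 2), round(naive_v_e, 2))  # 7.96 -4.74


def assume_no_dominance():
    """Set V_d = 0 and solve the assumed model for V_Ec and V_Ew."""
    v_ec = sigma2_d - v_a / 4
    v_ew = sigma2_w - v_a / 2
    return {"V_d": 0.0, "V_Ec": v_ec, "V_Ew": v_ew}


def assume_no_within_family_environment():
    """Set V_Ew = 0 and solve the assumed model for V_d and V_Ec."""
    v_d = (sigma2_w - v_a / 2) * 4 / 3
    v_ec = sigma2_d - v_a / 4 - v_d / 4
    return {"V_d": v_d, "V_Ec": v_ec, "V_Ew": 0.0}


for scenario in (assume_no_dominance, assume_no_within_family_environment):
    print(scenario.__name__, {k: round(v, 2) for k, v in scenario().items()})
```

The two scenarios bracket the possibilities: \(V_d\) lies somewhere between 0 and 1.64, and the total environmental variance \(V_{Ec} + V_{Ew}\) lies between 1.58 and 3.22, which matches the ranges reported earlier.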

These notes are licensed under the Creative Commons Attribution License. To view a copy of this license, visit or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.