
Markov Chains

Read the handout ``Finite Markov Chains.''

Definition 3.1   A stochastic matrix is a matrix with nonnegative entries and row sums of $ 1$. A doubly stochastic matrix is a matrix $ A$ such that $ A$ and $ A^{tr}$ (transpose) are both stochastic. A stochastic matrix is ergodic if it (precisely, the digraph of its nonzero entries) is strongly connected and aperiodic.
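The defining properties are easy to test numerically. A minimal sketch (the matrix $T$ below is a hypothetical example, not from the notes; numpy assumed):

```python
import numpy as np

# A hypothetical 3-state example: the walk on a triangle.
T = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])

def is_stochastic(A):
    """Nonnegative entries, each row summing to 1."""
    return bool((A >= 0).all()) and np.allclose(A.sum(axis=1), 1.0)

def is_doubly_stochastic(A):
    """Both A and its transpose are stochastic."""
    return is_stochastic(A) and is_stochastic(A.T)

assert is_stochastic(T)
assert is_doubly_stochastic(T)   # columns also sum to 1
```

Checking ergodicity (strong connectivity and aperiodicity of the digraph of nonzero entries) takes more work and is omitted here.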


\begin{exercise}
An eigenvalue of a stochastic matrix has norm at most $1$.
\end{exercise}


\begin{exercise}
If a stochastic matrix is strongly connected with period $d$ and $\omega$ is a
$d$-th root of unity, then whenever $\lambda$ is an eigenvalue, so is $\omega\lambda$.
\end{exercise}


\begin{exx}
If a stochastic matrix is ergodic, then every eigenvalue other than $1$ has absolute value strictly less
than $1$. (This is almost equivalent to the following theorem.)
\end{exx}

Theorem 3.2 (Perron-Frobenius)   If $ T$ is an ergodic stochastic matrix, then $ T^\infty=\lim_{k\to\infty}T^k$ exists.

Observation 3.3   Since $ TT^\infty=T^\infty$, every column $ x$ of $ T^\infty$ satisfies $ Tx=x$; since the eigenvalue $ 1$ has geometric multiplicity $ 1$, each such column is a multiple of the all $ 1$s vector. Thus all rows of $ T^\infty$ are equal; the common row is the unique stationary distribution $ \pi$.
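The observation is easy to see numerically: raising an ergodic $T$ to a high power produces a matrix whose rows all agree, and that common row is stationary. A sketch on a hypothetical ergodic chain (numpy assumed):

```python
import numpy as np

# A hypothetical ergodic stochastic matrix (strongly connected, aperiodic).
T = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.4, 0.1, 0.5]])

# By Perron-Frobenius, T^k converges; k = 100 is far past convergence here.
Tk = np.linalg.matrix_power(T, 100)

# All rows of T^infty agree: each row is the stationary distribution pi.
pi = Tk[0]
assert np.allclose(Tk, np.tile(pi, (3, 1)))

# pi is stationary (pi T = pi) and is a probability distribution.
assert np.allclose(pi @ T, pi)
assert np.isclose(pi.sum(), 1.0)
```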


\begin{exercise}
If a Markov chain is ergodic, then for every initial distribution $q_0$,
$\lim_{k\to\infty}q_0T^k=\pi$, the stationary distribution.
\end{exercise}


\begin{exercise}
If $T$\ is ergodic and doubly stochastic, then the stationary distribution
is uniform.
\end{exercise}

One way to guarantee that $ T$ is doubly stochastic is to require $ T=T^{tr}$. The transition matrix of a random walk on a Cayley graph is always doubly stochastic. Taking the generating set $ S$ to be closed under inverses makes $ T=T^{tr}$, but we do that only so that we can invoke the Landau-Odlyzko theorem.

Theorem 3.4 (Landau-Odlyzko)   A random walk on a regular undirected graph of degree $ \Delta$, diameter $ d$ and $ n$ vertices has an eigenvalue gap

$\displaystyle \gamma=1-\max_{\lambda \ne 1}\vert\lambda\vert\ge\frac{c}{n\Delta d}$

for some absolute constant $ c>0$.

The eigenvalue gap tells us about the rate of convergence to the stationary distribution. This is true in general, but easier to see when $ T$ is symmetric. Then the spectral theorem gives us an orthonormal basis $ e_1,\ldots,e_n$ of eigenvectors for $ T$, with $ e_iT=\lambda_i e_i$. We can choose $ e_1=(1,\ldots,1)/\sqrt n$, the eigenvector for $ \lambda_1=1$. Then for $ C$ the rotation that sends these eigenvectors to the standard basis, $ C^{-1}TC$ is diagonal, with the eigenvalues as its entries. This ``separation of coordinates'' makes raising the matrix to powers easy: $ C^{-1}T^kC=(C^{-1}TC)^k$, a diagonal matrix with entries $ \lambda_i^k$. Since $ \vert\lambda_i\vert<1$ for $ i\ge 2$, this decays exponentially to the diagonal matrix with entries $ (1,0,\dots,0)$. Conjugating by the inverse of $ C$ then gives $ T^\infty$.
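The diagonalization step can be sketched numerically. Below, a hypothetical symmetric (hence doubly stochastic) chain, the lazy walk on a triangle; `eigh` supplies the orthonormal eigenbasis promised by the spectral theorem (numpy assumed):

```python
import numpy as np

# A hypothetical symmetric transition matrix: lazy random walk on a triangle.
T = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])

# Spectral theorem: eigh returns eigenvalues and an orthonormal eigenbasis.
lam, C = np.linalg.eigh(T)          # columns of C are eigenvectors

# Conjugating by C diagonalizes T, with the eigenvalues as entries.
D = C.T @ T @ C
assert np.allclose(D, np.diag(lam))

# Raising to the k-th power just raises the diagonal entries: lambda_i^k.
k = 50
assert np.allclose(np.linalg.matrix_power(T, k), C @ np.diag(lam**k) @ C.T)
```

Since $C$ is orthogonal, $C^{-1}=C^{tr}$, which is why `C.T` plays the role of $C^{-1}$ above.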

To measure convergence of $ T^k$ to $ T^\infty$, we need a measure of the size of a matrix and apply it to $ T^k-T^\infty$, which conjugated by $ C$ gives a diagonal matrix with entries $ (0,\lambda_2,\ldots,\lambda_n)$.

Definition 3.5   The operator norm or matrix norm $ \vert A\vert$ of a matrix $ A$ is $ {\displaystyle \max_{x \ne 0}} \frac{\Vert xA\Vert}{\Vert x\Vert}$, where the norm $ \Vert x\Vert$ of vectors is the $ \ell^2$-norm: $ \sqrt{\sum x_i^2}$. The Frobenius norm $ \Vert A\Vert _F$ is the $ \ell^2$ norm on the entries: $ \sqrt{\sum a_{ij}^2}$.
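Both norms are available in numpy: with the $\ell^2$ vector norm, the operator norm is the largest singular value. A sketch on a hypothetical matrix, also checking the inequalities $\Vert A\Vert\le\Vert A\Vert_F\le\sqrt n\,\Vert A\Vert$ and their sharpness:

```python
import numpy as np

# Hypothetical example matrix.
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
n = A.shape[0]

op_norm   = np.linalg.norm(A, 2)      # operator norm = largest singular value
frob_norm = np.linalg.norm(A, 'fro')  # sqrt of the sum of squared entries

# ||A|| <= ||A||_F <= sqrt(n) ||A||.
assert op_norm <= frob_norm <= np.sqrt(n) * op_norm

# Sharpness: the all-1s matrix J has ||J|| = ||J||_F = n, while the
# identity has ||I|| = 1 and ||I||_F = sqrt(n).
J, I = np.ones((n, n)), np.eye(n)
assert np.isclose(np.linalg.norm(J, 2), np.linalg.norm(J, 'fro'))
assert np.isclose(np.linalg.norm(I, 'fro'), np.sqrt(n) * np.linalg.norm(I, 2))
```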


\begin{exercise}
$\Vert A\Vert\le \Vert A\Vert _F\le \sqrt n\,\Vert A\Vert$. The all $1$s
matrix and the identity matrix show that these are sharp.
\end{exercise}


\begin{exercise}
If $A=A^{tr}$\ then $\Vert A\Vert=\max_{\lambda}\vert\lambda\vert$, the maximum
over the eigenvalues $\lambda$ of $A$. \ {\em Hint.} Use the
spectral theorem.
\end{exercise}


\begin{exercise}
If $C$\ is an orthogonal matrix, $\Vert AC\Vert=\Vert CA\Vert=\Vert A\Vert$.
\end{exercise}

Thus $ \Vert T^k-T^\infty\Vert=\Vert C^{-1}(T^k-T^\infty)C\Vert=\max_{\lambda\ne1}\vert\lambda\vert^k=
(1-\gamma)^k<e^{-\gamma k}$. So for $ \Vert T^k-T^\infty\Vert<\varepsilon$ it suffices to have $ k\ge \frac{1}{\gamma}\ln\frac{1}{\varepsilon}$.
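The sufficiency of $k\ge\frac1\gamma\ln\frac1\varepsilon$ can be checked numerically. A sketch on the hypothetical lazy triangle walk, where $T^\infty$ is the uniform matrix since $T$ is doubly stochastic (numpy assumed):

```python
import numpy as np

# Hypothetical symmetric, doubly stochastic chain: lazy walk on a triangle.
T = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
n = T.shape[0]
T_inf = np.full((n, n), 1.0 / n)    # doubly stochastic => pi is uniform

# Eigenvalue gap: 1 minus the largest |lambda| over eigenvalues != 1.
lam = np.linalg.eigvalsh(T)
gamma = 1.0 - max(abs(l) for l in lam if not np.isclose(l, 1.0))

# k >= (1/gamma) ln(1/eps) should force ||T^k - T^inf|| < eps.
eps = 1e-6
k = int(np.ceil(np.log(1.0 / eps) / gamma))
deviation = np.linalg.norm(np.linalg.matrix_power(T, k) - T_inf, 2)
assert deviation < eps
```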

Now we need to relate this bound on the operator norm to what we care about: the deviation of the distribution from uniform, $ \sum_{j=1}^n\left\vert p_{ij}^{(k)}-\frac1n\right\vert$.

Set $ \varepsilon=\frac{\delta}n$ and $ A=T^k-T^\infty$. If $ \Vert A\Vert<\varepsilon$, then $ \Vert A\Vert _F^2<n\varepsilon^2$, so for all $ i$ and $ j$, $ \vert A_{ij}\vert\le\Vert A\Vert _F<\sqrt n\,\varepsilon<n\varepsilon=\delta$. So to be within $ \delta$ of the uniform distribution, we need $ \Vert A\Vert<\varepsilon=\frac{\delta}n$, which we can achieve with $ k>\frac{1}{\gamma}\ln\frac 1\varepsilon=O\left(\frac{\ln n}{\gamma}\right)$ (since $ \delta$ is a constant). By Landau-Odlyzko, $ \gamma>\frac{c}{N\Delta d}>cn^{-7}$ ($ N=n^3$, $ \Delta\le n$, $ d\le n^3$), so we can let $ k$ be $ O(n^7\log n)$, which leads to a similar diameter, and we have proved

Theorem 3.6   If $ S_n=\langle S\rangle$ and one of the generators has degree less than $ .3n$ then the diameter of the Cayley graph is $ O(n^7\log^cn)$.

Theorem 3.7   Every chain of subgroups in $ S_n$ has length less than $ 2n$.


\begin{exercise}
Use the above theorem to show that a minimal (irredundant) set of generators of
$S_n$\ has fewer than $2n$\ elements and thus $\Delta<2n$.
\end{exercise}


Varsha Dani 2003-08-15