Algebraic Values of Transcendental Functions at Algebraic Points

In honor of Pi Day 2023, I’d like to discuss Hilbert’s 7th Problem, which in an oversimplified (and rather vague) form asks: under what circumstances can a transcendental function take algebraic values at algebraic points?

The connection with $\pi$ is that Lindemann proved in 1882 that the transcendental function $f(z) = e^z$ takes transcendental values at every nonzero algebraic number. Since $e^{\pi i} = -1$ by Euler’s formula, this proves that $\pi i$, and hence $\pi$ itself, is transcendental. In light of this theorem, it is natural to wonder what if anything is special here about the function $f(z) = e^z$ and the point $z=0$.

One thing that’s special about $z=0$ is that if $\alpha \neq 0$ is algebraic and $e^\alpha$ is also algebraic, then both $n\alpha$ and $e^{n \alpha}$ are algebraic for all $n \in {\mathbb Z}$, and these numbers are all distinct. So one might be led to speculate that if $f$ is a transcendental entire function then there are only finitely many algebraic numbers $\alpha$ for which $f(\alpha)$ is also algebraic.

Unfortunately, as Hilbert knew, this is completely false. For example, the function $f(z) = e^{2\pi iz}$ is transcendental but it takes the rational value 1 at every integer. In 1886, Weierstrass gave an example of a transcendental entire function that takes rational values at all rational numbers; later, in 1895, Stäckel showed that there is a transcendental entire function that takes rational values at all algebraic points. However, the functions of Weierstrass and Stäckel are in some sense “pathological”; they have large growth rates and do not occur “in nature”. The challenge is to make this intuitive feeling more precise, and also to distinguish $e^z$ from $e^{2\pi iz}$.

One thing that is special about $e^z$, which is not shared by any of the other functions mentioned in the previous paragraph, is that it satisfies a linear differential equation with rational coefficients (namely $f'(z) = f(z)$). The existence of such a (not necessarily linear) differential equation turns out to be the key idea needed to generalize Lindemann’s theorem in a substantial way.

Another fruitful generalization is to rephrase our original question as an unlikely intersection problem: given two algebraically independent entire functions $f_1(z)$ and $f_2(z)$ satisfying suitable hypotheses, can we conclude that there are only finitely many complex numbers $\alpha$ such that $f_1(\alpha)$ and $f_2(\alpha)$ are simultaneously algebraic? This generalizes our original question by letting $f_1(z) = z$ and $f_2(z) = f(z)$.

One reason this is a fruitful generalization is that it contains, as a special case, the main question that Hilbert chose to focus on in his 7th problem:

Question (Hilbert, 1900): If $\alpha,\beta$ are algebraic numbers with $\alpha \neq 0,1$ and $\beta$ irrational, is it true that $\alpha^{\beta}$ is transcendental?

To see the connection, let $f_1(z) = e^z$ and $f_2(z) = e^{\beta z}$. If $\alpha^\beta$ is algebraic, then $f_1(w)$ and $f_2(w)$ are both algebraic for the infinitely many complex numbers $w = \log \alpha, 2\log \alpha, 3 \log \alpha, \ldots$

(For the record, Hilbert also formulated his question as the following equivalent problem in Euclidean geometry: In an isosceles triangle, if the ratio of the base angle to the angle at the vertex is algebraic but not rational, does this imply that the ratio between the base and side is transcendental? Also, according to Siegel, Hilbert often speculated that a solution to this problem would materialize later than a proof of the Riemann Hypothesis or Fermat’s Last Theorem. It was solved independently by Gelfond and Schneider in 1934.)

It remains to find suitable hypotheses on $f_1$ and $f_2$ under which we can actually prove such a finiteness theorem. This is the content of the main result we’ll be discussing in this blog, the Schneider-Lang theorem.

The Schneider-Lang theorem

The Schneider-Lang theorem not only has an elegant formulation, its proof is — at least to my taste — more motivated and intuitive than many other approaches to the transcendence of $\pi$. It also contains ideas, such as the construction of a suitable auxiliary polynomial using Siegel’s Lemma, which play a very important role in modern transcendence theory.

Note: I will assume some familiarity with complex analysis in the remainder of this post, as well as some basic facts from algebraic number theory. In the exposition that follows I have benefitted particularly from the book “Transcendental Numbers” by M.R. Murty and P. Rath. Other expositions of the proof can be found in Appendix 1 of S. Lang’s book “Algebra” and in this Master’s thesis.

In order to state the result, we first need a definition from complex analysis.

Definition: An entire function $f$ is said to be of finite order if there exists $\rho > 0$ such that $\log | f(z) | \ll R^\rho$ whenever $|z| \leq R$. The infimum of all such $\rho$ is called the order of $f$. A meromorphic function is said to have order at most $\rho$ if it is the quotient of two entire functions of order at most $\rho$.

Example: The function $e^z$ has order 1.
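As a rough numerical illustration of the definition (a sketch, not part of any proof): for a function of the form $f = e^{p(z)}$ with $p$ a polynomial, we have $\log |f(z)| = {\rm Re}\, p(z)$, and the maximum over the disc $|z| \leq R$ is attained on the circle $|z| = R$. So the order can be estimated by sampling that circle. The helper name `estimate_order` and the sample count are hypothetical choices made for this example.

```python
import cmath
import math

def estimate_order(p, R, samples=3600):
    """Estimate the order of f = exp(p) from the growth of log|f| on |z| = R.

    Uses log|exp(p(z))| = Re p(z), so nothing overflows for large R.
    """
    max_log_abs = max(
        p(R * cmath.exp(2j * math.pi * t / samples)).real
        for t in range(samples)
    )
    # If max log|f| grows like R^rho, then log(max log|f|) / log R tends to rho.
    return math.log(max_log_abs) / math.log(R)

order_exp = estimate_order(lambda z: z, 1e6)        # f(z) = e^z
order_gauss = estimate_order(lambda z: z * z, 1e6)  # f(z) = e^{z^2}
print(order_exp, order_gauss)
```

The two estimates come out close to 1 and 2, matching the fact that $e^{p(z)}$ has order equal to the degree of $p$.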

Theorem (Schneider-Lang): Let $f_1,\ldots,f_d$ be meromorphic functions of finite order, and assume that $f_1$ and $f_2$ are algebraically independent over $\overline{\mathbb Q}$. Suppose furthermore that the derivatives $f_1',\ldots,f_d'$ belong to the ring $K[f_1,\ldots,f_d]$ for some number field $K$. Then there are at most finitely many $w \in {\mathbb C}$ such that $f_i(w) \in K$ for all $i=1,\ldots,d$.

More precisely, the proof can be optimized to show that if $f_1$ and $f_2$ have order $\rho_1$ and $\rho_2$, respectively, then the number of such $w$ is bounded by $(\rho_1 + \rho_2) [K:{\mathbb Q}]$. This bound is sharp in the case where $K = {\mathbb Q}$, $f_1(z) = z$, and $f_2(z) = e^{z(z-1)(z-2)\cdots(z-k+1)}$.
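As a quick sanity check of the sharpness example (a sketch; the helper `f2` and the choice $k = 4$ are only for illustration): $f_1(z) = z$ has order 0 and $f_2$ has order $k$, so the bound is $(0 + k) \cdot 1 = k$, and each of the $k$ points $w = 0, 1, \ldots, k-1$ satisfies $f_2(w) = e^0 = 1$.

```python
import math

def f2(w, k):
    """Evaluate exp(w (w-1) ... (w-k+1))."""
    return math.exp(math.prod(w - t for t in range(k)))

k = 4
# The product in the exponent vanishes at w = 0, ..., k-1, so f2 equals 1 there.
rational_points = [w for w in range(k) if f2(w, k) == 1.0]
print(rational_points)
```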

To see that the Schneider-Lang theorem implies the transcendence of $\pi$, we deduce from it the following:

Corollary (Hermite-Lindemann): Let $\alpha$ be a nonzero algebraic number. Then $e^{\alpha}$ is transcendental.

Proof: Suppose not, and let $K = {\mathbb Q}(\alpha,e^{\alpha})$. Let $f_1(z) = z$ and $f_2(z) = e^{\alpha z}$. The conditions of the Schneider-Lang theorem are satisfied, since growth rate considerations show that $f_1$ and $f_2$ are algebraically independent. However, $f_1(w),f_2(w) \in K$ for all natural numbers $w$, a contradiction. Q.E.D.

The Schneider-Lang theorem also implies the Gelfond-Schneider theorem, which provides a positive answer to Hilbert’s 7th problem:

Corollary (Gelfond-Schneider): Let $\alpha,\beta$ be algebraic numbers with $\alpha \neq 0,1$ and $\beta$ irrational. Then $\alpha^{\beta}$ is transcendental.

Proof: Suppose not, and let $K = {\mathbb Q}(\alpha,\beta,\alpha^{\beta})$. Let $f_1(z) = e^z$ and $f_2(z) = e^{\beta z}$. Because $\beta$ is irrational, $f_1$ and $f_2$ are algebraically independent, so the conditions of the Schneider-Lang theorem are satisfied. However, $f_1(w), f_2(w) \in K$ for all $w = \log \alpha, 2\log \alpha, 3 \log \alpha, \ldots$, a contradiction. Q.E.D.

This shows, for example, that $2^{\sqrt{2}}$, $e^{\pi}$, and $i^i$ are all transcendental.
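To see why the last two numbers fit the shape $\alpha^\beta$, note that with the principal branch of the logarithm, $e^{\pi} = (-1)^{-i}$ (take $\alpha = -1$, $\beta = -i$) and $i^i = e^{-\pi/2}$ (take $\alpha = \beta = i$). The following is a small floating-point illustration of these identities, not a proof of anything; the variable names are chosen just for this example.

```python
import cmath
import math

# e^pi as a value of (-1)^(-i): log(-1) = pi*i, so exp(-i * pi * i) = exp(pi).
e_pi = cmath.exp(-1j * cmath.log(-1))

# i^i with log(i) = pi*i/2 equals exp(-pi/2), which is a real number.
i_to_i = cmath.exp(1j * cmath.log(1j))

two_to_sqrt2 = 2 ** math.sqrt(2)

print(e_pi, i_to_i, two_to_sqrt2)
```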

Example: The applications of Schneider-Lang we’ve given so far involve two functions that satisfy first-order linear differential equations. For an application of the more general formulation of the theorem, consider the Weierstrass $\wp$-function associated to a complex lattice $\Lambda$. It is a meromorphic function on ${\mathbb C}$ with poles only at the points of $\Lambda$, and it satisfies a non-linear first-order differential equation of the form $(f')^2 = 4f^3 - g_2 f - g_3$ for certain complex numbers $g_2,g_3$ associated to $\Lambda$. The Schneider-Lang theorem can be used to prove that if $g_2,g_3$ are algebraic then $\wp(\alpha)$ is transcendental for all algebraic numbers $\alpha \not\in \Lambda$. To see this, one supposes that $\wp(\alpha)$ is algebraic and takes the number field $K = {\mathbb Q}(\alpha, g_2, g_3, \wp(\alpha))$ and the functions $f_1(z) = z, f_2(z) = \wp(z)$, and $f_3(z) = \wp'(z),$ which can be shown to have order at most 2. Moreover, $f_1$ and $f_2$ are algebraically independent. If $\alpha \not\in \Lambda$, the addition formula for the Weierstrass $\wp$-function shows that $\wp(n\alpha) \in K$ whenever $n \in {\mathbb Z}$ and $n\alpha \not\in \Lambda$. This contradicts the Schneider-Lang theorem (see Chapter 10 of Murty-Rath for details).

Sketch of the proof

Although the general version of the Schneider-Lang theorem given above is useful for applications such as transcendence of special values of the Weierstrass $\wp$-function, for the applications to Hermite-Lindemann and Gelfond-Schneider we only needed the following simplified version:

Theorem (Schneider-Lang; simplified form): Let $f,g$ be algebraically independent entire functions of order at most 1. Suppose furthermore that the derivatives $f',g'$ belong to the ring $K[f,g]$ for some number field $K$. Then there are at most finitely many $w \in {\mathbb C}$ such that both $f(w),g(w) \in K$.

We focus on the proof of this statement, which contains all of the most important ideas from the general case. Our proof will show that if $w_1,\ldots,w_m$ are distinct complex numbers with $f(w_j),g(w_j) \in K$ for $j=1,\ldots,m$, then $m \leq 4 [K:{\mathbb Q}]$.

We denote by ${\mathcal O}_K$ the ring of algebraic integers belonging to the number field $K$.

Step 1: (Auxiliary function) Construct a polynomial $F(z) = \sum_{i,j=1}^r a_{ij} f(z)^i g(z)^j$ in $f$ and $g$ with $a_{ij} \in {\mathcal O}_K$ not all zero such that $F(z)$ vanishes to order at least $n$ at each $w_j$, where $n$ is some large integer. With a judicious choice of $r$, we can ensure that the algebraic integers $a_{ij}$ are not too “large” by using a famous consequence of the Pigeonhole Principle known as Siegel’s Lemma. (Specifically, we choose $r \approx (2mn)^{1/2}$.)

Step 2: (Extrapolation) Assuming that $m > 4 [K:{\mathbb Q}]$ and that $F$ vanishes to order at least $s \geq n$ at each of $w_1,\ldots,w_m$, show that in fact $F$ vanishes to order at least $s+1$ at each $w_j$.

Since an entire function which vanishes to infinite order at some $w \in {\mathbb C}$ must be identically zero, we conclude from Step 2 that $F \equiv 0$. By the choice of $F$ in Step 1, this shows that $f$ and $g$ are algebraically dependent, a contradiction.

Admittedly, our description of Step 2 was quite vague, so let’s break it into a couple of more detailed sub-steps:

Step 2a (Liouville inequality): If $F$ vanishes to order exactly $s \geq n$ at some $w_j$, say $w_1$, then $\alpha := F^{(s)}(w_1)$ is a non-zero algebraic number. Moreover, because of the way we constructed $F$ and the differential equations satisfied by $f$ and $g$, we can give upper bounds for the degree of $\alpha$, the “denominator” of $\alpha$ (the least positive integer $D$ such that $D\alpha$ is an algebraic integer), and the absolute values of all the complex conjugates of $\alpha$. Since the norm of a nonzero algebraic integer is a nonzero rational integer, the absolute value of the norm must be at least 1. This gives a nontrivial lower bound for $|\alpha|$ in terms of the above data.

Step 2b (Maximum modulus principle): Since $F$ vanishes to order at least $s$ at each $w_j$, the function $G(z) = \frac{F(z)}{\prod_{\ell = 1}^m (z - w_\ell)^s}$ is entire. The maximum modulus principle thus implies that $|G(w_1)|$ is bounded above by its maximum value on any circle of radius $R>0$ around $w_1$. Choosing $R$ carefully (specifically, we choose $R = s^{1/2}$) and using our assumption that $f$ and $g$ have order at most 1, we obtain an upper bound for $|\alpha| = |F^{(s)}(w_1)|$ which contradicts the lower bound from Step 2a if $m > 4 [K:{\mathbb Q}]$ and $n$ is sufficiently large.

A more detailed sketch

Here is a more quantitative sketch of the proof.

Suppose $\alpha$ is a nonzero algebraic number of degree $d$, and let $\alpha_1,\ldots,\alpha_d \in {\mathbb C}$ be its conjugates, i.e., the roots of the minimal polynomial $P_\alpha$ of $\alpha$ over ${\mathbb Q}$. Define the denominator ${\rm den}(\alpha)$ of $\alpha$ to be the least positive integer $D$ such that $D \alpha$ is an algebraic integer. (If $P_\alpha$ is scaled to have coprime integer coefficients with leading coefficient $a$, then $a\alpha$ is an algebraic integer, so ${\rm den}(\alpha)$ divides $|a|$.) Define the size $H(\alpha)$ of $\alpha$ to be $H(\alpha) = \max \{ |\alpha_1|,\ldots,|\alpha_d| \}$.

If $P$ is a polynomial with algebraic coefficients, we write ${\rm den}(P)$ (resp. $H(P)$) for the LCM of the denominators of the coefficients of $P$ (resp. the maximum of the sizes of the coefficients of $P$).

We will need the very simple:

Lemma (Liouville Inequality): If $\alpha \in {\mathbb C}$ is a nonzero algebraic number of degree $d$, then

$|\alpha| \geq \frac{1}{{\rm den}(\alpha)^d \cdot H(\alpha)^{d-1}}.$

Proof: Since ${\rm den}(\alpha) \cdot \alpha$ is an algebraic integer, the product of all of its conjugates is a nonzero rational integer, hence has absolute value at least 1. Bounding each of the $d-1$ conjugates of $\alpha$ other than $\alpha$ itself by $H(\alpha)$ yields the desired inequality. Q.E.D.
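Here is a small numerical check of the inequality on one concrete number, $\alpha = 1/\sqrt{2}$, chosen for illustration. Its minimal polynomial is $2x^2 - 1$, so $d = 2$, and $2\alpha = \sqrt{2}$ is an algebraic integer while $\alpha$ is not, so ${\rm den}(\alpha) = 2$. The variable names are made up for this sketch.

```python
import math

# alpha = 1/sqrt(2) is a root of 2x^2 - 1; its conjugates are +-1/sqrt(2).
sqrt2 = math.sqrt(2)
conjugates = [1 / sqrt2, -1 / sqrt2]
alpha = conjugates[0]
d = len(conjugates)

# 2*alpha = sqrt(2) is an algebraic integer and alpha itself is not,
# so den(alpha) = 2 (which is also the leading coefficient of 2x^2 - 1).
den = 2
size = max(abs(c) for c in conjugates)  # H(alpha)

liouville_bound = 1 / (den**d * size ** (d - 1))
print(abs(alpha), liouville_bound)
```

Here the bound works out to $\sqrt{2}/4$, comfortably below $|\alpha| = \sqrt{2}/2$.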

We will also need the following two results. We write $D = \frac{d}{dz}$ for the usual differentiation operator.

Lemma (Derivative Lemma): With $f,g, w_1,\ldots,w_m$ and $K$ as in the statement of the Schneider-Lang theorem, there exists a constant $C > 0$ with the following property. If $P \in K[x,y]$ is a polynomial of degree $r$ with coefficients in $K$ and $F = P(f,g)$, then for all positive integers $k$ we have $H(D^k F(w_\ell)) \leq H(P) r^k k! C^{k+r}$ and ${\rm den}(D^k F(w_\ell)) \leq {\rm den}(P)C^{k+r}$ for all $\ell = 1,\ldots,m$.

For a proof, see Appendix 1, Lemma 3 in S. Lang’s “Algebra”. Lang’s argument uses the notion of a derivation on a commutative ring, together with induction on $k$. The details are a bit too cumbersome for this post, but it should seem plausible that we can obtain bounds for $H(D^k F(w_\ell))$ and ${\rm den}(D^k F(w_\ell))$ using the fact that the ring $K[f,g]$ is closed under taking derivatives, together with the generalized Leibniz formula

$D^k (f^i g^j)(w_\ell) = \sum_{t=0}^k \binom{k}{t} D^t(f^i)(w_\ell)D^{k-t}(g^j)(w_\ell).$
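To make the Leibniz formula concrete, take the pair $f = e^z$, $g = e^{\beta z}$ from Hilbert’s question, with $\beta = \sqrt{2}$ as an illustrative choice. Then $D^t(f^i) = i^t f^i$ and $D^t(g^j) = (j\beta)^t g^j$, so every derivative of $f^i g^j$ is a multiple of $f^i g^j$ by an element of ${\mathbb Z}[\sqrt{2}]$. The sketch below computes that multiplier exactly via the Leibniz sum and compares it with $(i + j\sqrt{2})^k$; the helper names are hypothetical.

```python
from math import comb

# Exact arithmetic in Z[sqrt(2)]: the pair (a, b) stands for a + b*sqrt(2).
def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def mul(x, y):
    return (x[0] * y[0] + 2 * x[1] * y[1], x[0] * y[1] + x[1] * y[0])

def power(x, n):
    result = (1, 0)
    for _ in range(n):
        result = mul(result, x)
    return result

def leibniz_coefficient(i, j, k):
    """Coefficient c with D^k(f^i g^j) = c * f^i g^j, via the Leibniz sum.

    Here f = e^z and g = e^{sqrt(2) z}, so D^t(f^i) = i^t f^i and
    D^t(g^j) = (j*sqrt(2))^t g^j.
    """
    total = (0, 0)
    for t in range(k + 1):
        term = mul(power((i, 0), t), power((0, j), k - t))
        total = add(total, mul((comb(k, t), 0), term))
    return total

# Differentiating e^{(i + j*sqrt(2)) z} directly gives (i + j*sqrt(2))^k.
direct = power((2, 3), 5)
via_leibniz = leibniz_coefficient(2, 3, 5)
print(direct, via_leibniz)
```

In this special case the Leibniz formula is just the binomial theorem, which is why the sizes of the derivatives grow at a controlled rate in $k$.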

Lemma (Siegel’s Lemma): Let $K$ be a number field. There exists a constant $C > 0$ depending only on $K$ with the following property. For $1 \leq i \leq r$ and $1 \leq j \leq n$, with $n > r$, let $\alpha_{ij} \in K$ be algebraic numbers of size at most $A$ and denominator at most $B$. Then the system of $r$ homogeneous linear equations $\sum_{j=1}^n \alpha_{ij} x_j = 0$ has a nonzero solution $(x_1,\ldots,x_n) \in {\mathcal O}_K^n$ satisfying $H(x_j) \leq C^{\frac{n}{n-r}} (nAB)^{\frac{r}{n-r}}$ for all $j = 1,\ldots,n$.

For a proof, see Appendix 1, Lemma 2 in S. Lang’s “Algebra”. (Lang assumes that the $\alpha_{ij}$ are algebraic integers, but the general case follows easily by clearing denominators in each of the linear equations.) Here we explain the proof of Siegel’s Lemma in the special case where the coefficients of the linear forms in question belong to ${\mathbb Z}$. The general case can be reduced to this one by writing everything in terms of an integral basis for $K/{\mathbb Q}$.

Proof over ${\mathbb Z}$: Let $M = (a_{ij})$ be the associated matrix, thought of as a map from ${\mathbb R}^n$ into ${\mathbb R}^r$ which takes the lattice ${\mathbb Z}^n$ into ${\mathbb Z}^r$. Let $H \geq 1$ and let $B(H)$ be the set of vectors in ${\mathbb Z}^n$ with coordinates of size at most $H$ in absolute value. Then $M$ maps $B(H)$, which has size $(2H + 1)^n$, into $B(nAH)$, which has size $(2nAH + 1)^r$. If

$(2nAH + 1)^r < (2H)^n,$

then by the Pigeonhole Principle there will be two distinct vectors in $B(H)$ mapping to the same point. The difference of these two vectors gives a solution in $B(2H)$ to the homogeneous system $Mx = 0$. Choosing $H = (2nA)^{\frac{r}{n-r}}$ gives the desired result in a sharpened form. Q.E.D.
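The pigeonhole argument can be run literally on a tiny system. The following brute-force sketch enumerates the box $B(H)$, looks for two vectors with the same image, and returns their difference. It is only feasible for very small $n$ and $H$, and the function name `siegel_solution` and the example matrix are assumptions made for illustration.

```python
import math
from itertools import product

def siegel_solution(M, H):
    """Find a nonzero integer solution of M x = 0 by the pigeonhole argument.

    Enumerates the box B(H) = {x in Z^n : |x_i| <= H}, records the image
    M x of each vector, and returns the difference of the first two vectors
    found with the same image. Returns None if no collision occurs in B(H).
    """
    n = len(M[0])
    seen = {}
    for x in product(range(-H, H + 1), repeat=n):
        image = tuple(sum(a * xi for a, xi in zip(row, x)) for row in M)
        if image in seen:
            y = seen[image]
            return tuple(xi - yi for xi, yi in zip(x, y))
        seen[image] = x
    return None

# Hypothetical example: r = 1 equation in n = 3 unknowns, entries bounded by A = 3.
M = [[3, -2, 1]]
r, n, A = 1, 3, 3
H = math.ceil((2 * n * A) ** (r / (n - r)))  # round (2nA)^(r/(n-r)) up to an integer
solution = siegel_solution(M, H)
print(H, solution)
```

With this choice of $H$ the inequality $(2nAH + 1)^r < (2H)^n$ holds, so a collision is guaranteed and the returned vector has all coordinates at most $2H$ in absolute value.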

We now return to our quantitative sketch of the simplified version of Schneider-Lang.

For notational convenience, we define $h(\alpha) = \log {\rm den}(\alpha) + \log H(\alpha)$ if $\alpha \neq 0$ and $h(0)=0$.

We wish to find $a_{ij} \in {\mathcal O}_K$, not all zero, such that $F(z) = \sum_{i,j=1}^r a_{ij} f(z)^i g(z)^j$ satisfies $D^k F(w_\ell) = 0$ for $1 \leq \ell \leq m$ and $0 \leq k \leq n-1$, where $r,n$ are parameters to be determined later. This amounts to solving the following linear system of $mn$ equations in $r^2$ unknowns:

$\sum_{i,j=1}^r a_{ij} D^k(f^i g^j)(w_\ell) = 0.$

Using the generalized Leibniz formula, one sees that the numbers $D^k(f^i g^j)(w_\ell)$ all belong to $K$, and the Derivative Lemma gives

$h(D^k F(w_\ell)) \leq 2k \log k + O(k)$

for all $\ell$. If we choose $r,n$ such that $r^2 \approx 2mn$ (i.e., we have roughly twice as many unknowns as equations), Siegel’s Lemma shows that we can find such $a_{ij}$ with $h(a_{ij}) \leq n\log n + O(n+r)$.

Since $f$ and $g$ are algebraically independent over ${\mathbb Q}$, and hence over $K$, the function $F$ is not identically zero. Let $s$ be the smallest integer such that all derivatives of $F$ up to order $s-1$ vanish at the points $w_1,\ldots,w_m$ but $D^s F$ does not vanish at some $w_{\ell}$, say $w_1$. Then $s \geq n$ by construction, and the Derivative Lemma gives

$h(D^s F(w_1)) \leq 2s \log s + O(s).$

Since $D^s F(w_1) \neq 0$, the Liouville inequality gives

$-\log |D^s F(w_1)| \leq 2[K:{\mathbb Q}]s \log s + s \log s + O(s).$

However, the assumption that $f$ and $g$ have order at most 1 allows us to give a lower bound for this quantity:

Claim: If $s \gg 0$ then $-\log |D^s F(w_1)| \geq (m/2) s \log s + s \log s + O(rs^{1/2}).$

Since $r = O(n^{1/2})$ and $s \geq n$, we obtain a contradiction if $m > 4 [K:{\mathbb Q}]$ and $n \gg 0.$ This finishes the proof of the Schneider-Lang theorem, modulo the claim.
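To spell out the comparison, using only the two displayed bounds for $-\log |D^s F(w_1)|$ as stated above: combining the Liouville upper bound with the Claim gives

$(m/2) s \log s + s \log s + O(rs^{1/2}) \leq 2[K:{\mathbb Q}] s \log s + s \log s + O(s),$

and cancelling the common term $s \log s$ leaves

$\left( \frac{m}{2} - 2[K:{\mathbb Q}] \right) s \log s \leq O(s) + O(rs^{1/2}).$

Since $r = O(n^{1/2})$ and $s \geq n$, we have $rs^{1/2} = O(s)$, so the right-hand side is $O(s)$. If $m > 4[K:{\mathbb Q}]$, the left-hand side is a positive multiple of $s \log s$, which eventually exceeds any fixed multiple of $s$. This is impossible once $n$, and hence $s$, is large enough.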

To prove the claim, note that since $F$ vanishes to order at least $s$ at each $w_j$, the function $G(z) = F(z) \cdot \frac{\prod_{\ell = 2}^m (w_1 - w_\ell)^s}{\prod_{\ell = 1}^m (z - w_\ell)^s}$ is entire. Using the fact that $| f(z) |, |g(z)| \ll C^R$ for $|z| \leq R$ and some constant $C>0$ by hypothesis, the Maximum Modulus Principle applied to $G(z)$ on the disc of radius $R = s^{1/2}$ around $w_1$ implies that

$|G(w_1)| \ll C^{r R} R^{-ms}.$

But a simple computation shows that $|G(w_1)| = \frac{|D^s F(w_1)|}{s!}$, and the result follows. Q.E.D.

Concluding Remarks

(1) For the quantitative version of the Schneider-Lang theorem mentioned above, giving the sharp bound of $(\rho_1 + \rho_2) [K:{\mathbb Q}]$ for the number of exceptional points, see e.g. M. Waldschmidt’s book “Nombres Transcendants”.

(2) The modular $j$-function, which is holomorphic on the complex upper-half plane ${\mathbb H}$, takes algebraic values at all imaginary quadratic numbers. Conversely, Schneider showed that if $\tau \in {\mathbb H}$ is algebraic but not imaginary quadratic then $j(\tau)$ is transcendental. This can be deduced from the Schneider-Lang theorem by way of the Weierstrass $\wp$-function; see Chapter 15 of Murty-Rath.

(3) It is interesting and useful to replace algebraic points by algebraic subvarieties in many of the above considerations. For example, given a transcendental map $f : X \to Y$ between complex algebraic varieties defined over $\overline{\mathbb Q}$, one can study the algebraic subvarieties $V$ of $X,$ defined over $\overline{\mathbb Q}$, such that $f(V)$ is also an algebraic subvariety defined over $\overline{\mathbb Q}$. Theorems along such lines are known as Ax-Schanuel type theorems, and they have played an influential role in recent developments in Diophantine geometry. We mention, for example, the proof of the André-Oort conjecture, a uniform version of the Mordell-Lang conjecture, and the relative Manin-Mumford conjecture.

Happy Pi Day!