
Fundamental theorem of symmetric functions and an application

Symmetric polynomials

Let R be a commutative ring and R[x_1, x_2, …, x_n] the polynomial ring in n indeterminates over R. A polynomial f in it is said to be a symmetric polynomial (or symmetric function) of x_1, …, x_n if it is unchanged by every permutation of the indeterminates. Consider the polynomials:

$$
\begin{aligned}
e_1(x_1,\ldots,x_n) &= x_1 + x_2 + \cdots + x_n\\
e_2(x_1,\ldots,x_n) &= x_1 x_2 + x_1 x_3 + \cdots + x_1 x_n + x_2 x_3 + \cdots + x_{n-1} x_n\\
e_i(x_1,\ldots,x_n) &= \sum_{1 \le j_1 < j_2 < \cdots < j_i \le n} x_{j_1} \cdots x_{j_i}\\
e_n(x_1,\ldots,x_n) &= x_1 x_2 \cdots x_n
\end{aligned}
$$

Each e_i(x_1, …, x_n), and also the polynomial f(t, x_1, …, x_n) introduced below, is a symmetric function of x_1, …, x_n. The e_i are called the elementary symmetric polynomials of x_1, …, x_n. Sums, products and scalar multiples of symmetric polynomials are again symmetric polynomials. A polynomial g(e_1, …, e_n) in the e_i becomes a symmetric polynomial in x_1, …, x_n when the e_i are written out in terms of the x_i. The fundamental theorem on symmetric functions is the converse of this statement: every symmetric polynomial can be expressed in this way, and uniquely so. It is stated and proved in this article.

Introduce one more indeterminate t and consider the polynomial

$$
f(t,x_1,\ldots,x_n) = (t - x_1)(t - x_2) \cdots (t - x_n) = t^n - e_1(x_1,\ldots,x_n)\, t^{n-1} + \cdots + (-1)^n e_n(x_1,\ldots,x_n),
$$

so that

$$
f(t,x_1,\ldots,x_n) = \sum_{i=0}^{n} (-1)^i\, e_i(x_1,\ldots,x_n)\, t^{n-i}, \qquad e_0 := 1.
$$

Note that the elementary symmetric polynomials thus give general expressions for the coefficients of a polynomial in terms of its roots. The fundamental theorem (once proved) can then be read as saying that any symmetric function of the roots of a polynomial can be expressed in terms of its coefficients, without explicitly computing the roots.
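
For instance, for a monic quadratic with roots x_1, x_2, the symmetric function x_1^2 + x_2^2 of the roots is read off from the coefficients:

$$
t^2 - e_1 t + e_2 = (t - x_1)(t - x_2), \qquad x_1^2 + x_2^2 = (x_1 + x_2)^2 - 2 x_1 x_2 = e_1^2 - 2 e_2.
$$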

Note that the discussion so far is valid for any ring R: one may, for example, first work in the ring Z[t, x_1, …, x_n] and then tensor with R. A more sophisticated way to set up the context is to construct the ring of symmetric functions as a kind of "power series ring", the inverse limit, as n varies, of the rings of symmetric polynomials in n variables over R. Details of such a construction are available, e.g., in MacDonald (1979) [MD1979]. The exposition here follows van der Waerden (1930) [vdW1930] and Lang (1993) [La1993].

Every polynomial e_i is a homogeneous polynomial of degree i in x_1, …, x_n.

Consider the homomorphism from R[x_1, …, x_n] to itself sending x_1 ↦ e_1, …, x_n ↦ e_n. Given a polynomial g(x_1, …, x_n), one gets the symmetric polynomial g(e_1, …, e_n), considered as a polynomial in x_1, …, x_n. A monomial a·e_1^{μ_1} ⋯ e_n^{μ_n} of g(e_1, …, e_n), where a is a scalar, becomes a homogeneous polynomial of degree μ_1 + 2μ_2 + ⋯ + nμ_n in x_1, …, x_n. The sum μ_1 + 2μ_2 + ⋯ + nμ_n is said to be the weight of the monomial a·e_1^{μ_1} ⋯ e_n^{μ_n}, and the weight of a polynomial may be defined as the largest among the weights of its monomial summands. A polynomial g of weight k as above thus yields a symmetric polynomial in the x_i of degree at most k.
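
For example, the monomial a·e_1 e_2^2 has weight 1·1 + 2·2 = 5; already for n = 2,

$$
e_1 e_2^2 = (x_1 + x_2)(x_1 x_2)^2 = x_1^3 x_2^2 + x_1^2 x_2^3,
$$

a homogeneous polynomial of degree 5 in x_1, x_2.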

Fundamental Theorem of Symmetric Functions

Let R be a commutative ring and R[x_1, x_2, …, x_n] the polynomial ring in n indeterminates over R. A symmetric polynomial f(x_1, …, x_n) of degree k can be expressed as a polynomial in the elementary symmetric polynomials. This representation is unique.

For the proof, we follow the presentation of van der Waerden (1930) [vdW1930]. First we establish that any symmetric polynomial can be expressed as a polynomial in the elementary symmetric polynomials.

The proof is constructive and uses a lexicographic ordering of the monomials, in which a term x_1^{μ_1} ⋯ x_n^{μ_n} is greater than x_1^{ν_1} ⋯ x_n^{ν_n} if the first nonvanishing difference μ_i - ν_i is positive. The monomials of f are written in decreasing lexicographic order; given a particular monomial, all the monomials obtained from it by permuting the x_i are considered together with it, and the whole group is abbreviated by its lexicographically greatest member under a summation sign, as in:

$$
\sum a\, x_1^{\mu_1} \cdots x_n^{\mu_n}.
$$

Let k be the degree of f and let the first term in the lexicographic ordering be a x_1^{μ_1} ⋯ x_n^{μ_n}, so that μ_1 + μ_2 + ⋯ + μ_n ≤ k; since f is symmetric, the exponents of this leading term satisfy μ_1 ≥ μ_2 ≥ ⋯ ≥ μ_n. Now the product

$$
a\, e_1^{\mu_1 - \mu_2}\, e_2^{\mu_2 - \mu_3} \cdots e_n^{\mu_n}
$$

has, when expanded as a polynomial in the x_i and ordered lexicographically, the same initial term a x_1^{μ_1} ⋯ x_n^{μ_n}. Its weight equals

$$
(\mu_1 - \mu_2) + 2(\mu_2 - \mu_3) + 3(\mu_3 - \mu_4) + \cdots + n \mu_n = \mu_1 + \mu_2 + \cdots + \mu_n \le k.
$$

This product may be subtracted from f, and in the resulting polynomial the monomial with indices μ_1, …, μ_n (together with its permutations) has been cancelled, so that its lexicographically leading term is strictly smaller. To this difference the procedure may be applied inductively. The subtraction does not increase the degree of the polynomial, and for each given degree there are only finitely many possible systems of indices. Since at each step we move strictly further along the lexicographic ordering, the inductive procedure terminates after finitely many steps, and f is expressed as a polynomial in the elementary symmetric polynomials.
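
The reduction just described is easy to carry out mechanically. The following is a minimal sketch in Python using SymPy; the helper names `elementary_symmetric` and `decompose` are ours, not library functions, and the input is assumed to be a genuinely symmetric polynomial (otherwise the loop need not terminate).

```python
import sympy as sp

def elementary_symmetric(xs):
    """Return [e1, ..., en], expanded in the variables xs, read off from (t - x1)...(t - xn)."""
    t = sp.Dummy('t')
    coeffs = sp.Poly(sp.expand(sp.prod([t - x for x in xs])), t).all_coeffs()
    # all_coeffs() lists coefficients from t^n down to t^0; the t^(n-i) coefficient is (-1)^i e_i.
    return [(-1)**i * coeffs[i] for i in range(1, len(xs) + 1)]

def decompose(f, xs):
    """Express a symmetric polynomial f in the variables xs as a polynomial in
    formal symbols e1, ..., en, by the lexicographic reduction described above."""
    n = len(xs)
    es = elementary_symmetric(xs)
    Es = sp.symbols(f'e1:{n + 1}')          # formal e1, ..., en
    result = sp.Integer(0)
    f = sp.expand(f)
    while f != 0:
        # leading term a * x1^mu1 * ... * xn^mun in lexicographic order
        mono, a = sp.Poly(f, *xs).terms(order='lex')[0]
        # exponents mu_i - mu_{i+1} of e_i (with mu_{n+1} = 0)
        exps = [mono[i] - (mono[i + 1] if i + 1 < n else 0) for i in range(n)]
        result += a * sp.prod([E**k for E, k in zip(Es, exps)])
        f = sp.expand(f - a * sp.prod([e**k for e, k in zip(es, exps)]))
    return result

x, y, z = sp.symbols('x y z')
print(decompose(x**2 + y**2 + z**2, [x, y, z]))                    # e1**2 - 2*e2
print(decompose((x - y)**2 * (y - z)**2 * (z - x)**2, [x, y, z]))  # discriminant of a monic cubic, in e1, e2, e3
```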

To show uniqueness of the representation: if two different polynomials f_1 and f_2 in the e_i represented the same symmetric function, their (nonzero) difference g = f_1 - f_2, evaluated on the e_i, would yield 0. Therefore it is enough to show that if a polynomial g(x_1, …, x_n) ≠ 0, then g(e_1, …, e_n) ≠ 0. Given such a nonzero g, each of its monomial summands may be written in the form a e_1^{μ_1 - μ_2} e_2^{μ_2 - μ_3} ⋯ e_n^{μ_n} with μ_1 ≥ μ_2 ≥ ⋯ ≥ μ_n. Among all systems (μ_1, …, μ_n) of indices occurring with nonzero coefficient a, there is one that comes first (i.e. is greatest) in the lexicographic ordering. Replacing the e_i by their expressions in the x_i and expanding, the monomial with this greatest system of indices has the lexicographically leading term

$$
a\, x_1^{\mu_1} x_2^{\mu_2} \cdots x_n^{\mu_n},
$$

and this term cannot be cancelled by any of the terms arising from the other monomials, all of which are smaller in the lexicographic ordering; thus g(e_1, …, e_n) is nonzero. Therefore the uniqueness of the representation is proved.

QED


Weber (1895) [We1895] cites proofs of this theorem by Waring [1], Gauss [2] and Cauchy [3].

Let the symmetric group S_n act on R[x_1, x_2, …, x_n] by sending x_i to x_{σ(i)} for a permutation σ in S_n. The fundamental theorem then says that the subring fixed under this action is the subring generated by e_1, …, e_n, and that there are no relations among the generators e_i; that is to say, the homomorphism determined by x_i ↦ e_i is an isomorphism from R[x_1, x_2, …, x_n] onto the subring of symmetric polynomials.

As the ring of symmetric polynomials in n variables and the polynomial ring in n variables are thus isomorphic, one may substitute for the x_i particular values in R, or in another appropriate ring, and any relation between symmetric polynomials is preserved under the evaluation. In particular, considering f as in the factorization above, symmetric functions of the roots of a polynomial are expressible as polynomials in the coefficients of f.
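
As a small illustration, here is the sum of the squares of the roots of a cubic computed both ways; a minimal sketch using SymPy, with arbitrary sample coefficients p = 2, q = 5.

```python
import sympy as sp

t = sp.symbols('t')
p, q = 2, 5                       # arbitrary sample coefficients
f = t**3 + p*t + q                # here e1 = 0, e2 = p, e3 = -q

# Via the fundamental theorem: x1^2 + x2^2 + x3^2 = e1^2 - 2*e2.
via_coefficients = 0**2 - 2*p

# The same quantity computed the long way, from numerical approximations of the roots.
via_roots = sum(r**2 for r in sp.Poly(f, t).nroots())

print(via_coefficients, sp.re(via_roots))   # both are -4 (up to numerical round-off)
```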

Discriminant of a polynomial

Consider the polynomial f = a_n x^n + a_{n-1} x^{n-1} + ⋯ + a_0 with roots x_i, and the difference product

$$
\prod_{i<j} (x_i - x_j),
$$

which is an alternating function of the x_i; that is to say, it changes sign under any odd permutation of the x_i. Its vanishing indicates that f has at least one root of multiplicity greater than 1. Its square (for monic f) is called the discriminant:

$$
\Delta = \prod_{i<j} (x_i - x_j)^2,
$$

and it is a symmetric function of the x_i. By the preceding, the discriminant is expressible as a polynomial function of the coefficients of f. Given a more general polynomial f(t) = a_0 + a_1 t + ⋯ + a_n t^n, where a_n is not necessarily 1, with roots x_i, the discriminant may be defined as:

$$
\Delta = a_n^{2(n-1)} \prod_{i<j} (x_i - x_j)^2 = (-1)^{n(n-1)/2}\, a_n^{2(n-1)} \prod_{i \ne j} (x_i - x_j).
$$

The difference product ∏_{i<j} (x_j - x_i) is a Vandermonde determinant:

$$
\begin{vmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^{n-1}\\
1 & x_2 & x_2^2 & \cdots & x_2^{n-1}\\
1 & x_3 & x_3^2 & \cdots & x_3^{n-1}\\
\vdots & & & & \vdots\\
1 & x_n & x_n^2 & \cdots & x_n^{n-1}
\end{vmatrix}
$$

This agrees with the discriminant b^2 - 4ac familiar from the quadratic case a x^2 + b x + c. For cubics, considered for example in the theory of elliptic curves, such as a x^3 + b x^2 + c x + d, the discriminant is b^2 c^2 - 4 a c^3 - 4 b^3 d - 27 a^2 d^2 + 18 a b c d, which for the polynomial x^3 + a x + b simplifies to -(4 a^3 + 27 b^2). A method for computing discriminants from the coefficients of f is given in the section on the resultant and the discriminant below. These statements may be verified either by hand using that method, or with a computer algebra system.
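
For instance, a quick check with SymPy's built-in `discriminant` (the term ordering in the printed output may differ):

```python
import sympy as sp

x, a, b, c, d = sp.symbols('x a b c d')

# Quadratic a*x^2 + b*x + c: equals b**2 - 4*a*c
print(sp.discriminant(a*x**2 + b*x + c, x))

# General cubic a*x^3 + b*x^2 + c*x + d:
# equals b**2*c**2 - 4*a*c**3 - 4*b**3*d - 27*a**2*d**2 + 18*a*b*c*d
print(sp.expand(sp.discriminant(a*x**3 + b*x**2 + c*x + d, x)))

# Depressed cubic x^3 + a*x + b: equals -4*a**3 - 27*b**2
print(sp.discriminant(x**3 + a*x + b, x))
```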

The discriminant is a homogeneous polynomial in the roots; it is therefore also homogeneous with respect to the coefficients.

The discriminant is given by a determinantal expression in polynomials from R[x]; it may therefore also be thought of as an element living in the exterior algebra on R[x].

Resultant of two polynomials

Consider two polynomials in R[x] at once, viz:

$$
\begin{aligned}
f(x) &= a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0\\
g(x) &= b_m x^m + b_{m-1} x^{m-1} + \cdots + b_0.
\end{aligned}
$$

We wish to find conditions for f(x) and g(x) to have a common divisor. We follow the approach of van der Waerden (1930) [vdW1930].

In the above it may in general happen that a_n or b_m vanishes; we call n and m the respective formal degrees of the polynomials and a_n and b_m the formal leading coefficients. For the present we assume that a_n and b_m do not simultaneously vanish.

[U. F. D.]: Unique Factorization Domain

Lemma 1. f(x) and g(x) have a nonconstant common divisor ϕ(x) only if an equation of the form

$$
h(x)\, f(x) = k(x)\, g(x)
$$

exists, where h(x) is of degree at most m - 1 and k(x) is of degree at most n - 1, and h, k do not simultaneously equal 0. If R is a field F, the above condition is also sufficient.

Proof.

If φ(x) is a common factor of f(x) and g(x), then one may decompose f(x) = φ(x) k(x) and g(x) = φ(x) h(x), with deg k ≤ n - 1 and deg h ≤ m - 1; these h(x) and k(x) satisfy h(x) f(x) = h(x) φ(x) k(x) = k(x) g(x), which is the required equation.

Now we show sufficiency, starting from h(x) f(x) = k(x) g(x) with h, k not both zero. If R is a field F, then we have unique factorization into irreducibles in the polynomial ring F[x]. Without loss of generality let a_n ≠ 0 (otherwise exchange the roles of f and g), so that deg f = n. If k = 0, then h f = 0 and, f being nonzero, h = 0 as well, contrary to assumption; hence k ≠ 0. Every prime factor of f(x), with its multiplicity, must divide the product k(x) g(x). These factors cannot all divide k(x), for then f(x) would divide k(x), which is impossible since k is nonzero of degree at most n - 1 while f has degree n. Therefore at least one prime factor of f(x) divides g(x), and f and g have a nonconstant common divisor. Q. E. D.

Now put

$$
\begin{aligned}
h(x) &= c_{m-1} x^{m-1} + c_{m-2} x^{m-2} + \cdots + c_0\\
k(x) &= d_{n-1} x^{n-1} + d_{n-2} x^{n-2} + \cdots + d_0,
\end{aligned}
$$

where the highest-degree coefficients may also be zero.

Writing out h(x) f(x) = k(x) g(x) and comparing the coefficients of each power of x, we get n + m homogeneous linear equations in the n + m quantities c_i and d_j:

$$
\begin{aligned}
c_0 a_0 &= d_0 b_0\\
c_0 a_1 + c_1 a_0 &= d_0 b_1 + d_1 b_0\\
c_0 a_2 + c_1 a_1 + c_2 a_0 &= d_0 b_2 + d_1 b_1 + d_2 b_0\\
&\;\;\vdots\\
c_{m-2} a_n + c_{m-1} a_{n-1} &= d_{n-2} b_m + d_{n-1} b_{m-1}\\
c_{m-1} a_n &= d_{n-1} b_m.
\end{aligned}
$$

For a nontrivial solution to exist, the determinant of the coefficient matrix, the Sylvester matrix, must vanish:

$$
R = \begin{vmatrix}
a_n & a_{n-1} & \cdots & a_0 & & &\\
 & a_n & a_{n-1} & \cdots & a_0 & &\\
 & & \ddots & & & \ddots &\\
 & & & a_n & a_{n-1} & \cdots & a_0\\
b_m & b_{m-1} & \cdots & b_0 & & &\\
 & b_m & b_{m-1} & \cdots & b_0 & &\\
 & & \ddots & & & \ddots &\\
 & & & b_m & b_{m-1} & \cdots & b_0
\end{vmatrix}
$$

Here, to avoid minus signs in the determinant, the c_i and the -d_j were regarded as the unknowns, so that the terms involving the d_j in the system above could be moved to the left-hand side; reading the equations from the highest power of x downward then yields a homogeneous system of n + m linear equations in n + m unknowns whose coefficient matrix is, up to transposition, the (n + m) × (n + m) matrix displayed, and a nontrivial solution exists exactly when its determinant is zero.

[gcd]: greatest common divisor

The quantity R is called the resultant of f and g. By the preceding, its vanishing is a necessary and sufficient condition (over a field) for f and g to have a nonconstant common factor. For efficient computation of the determinant of a matrix with coefficients in a field, one strategy is to use the row operations mentioned in the [article on determinants]({filename}determinants.md): one may multiply a row by a scalar and add or subtract it from another, or even apply a variant of the Euclidean algorithm to compute the “gcd” of the rows involved, repeat the procedure with more rows, and finally use the Leibniz expansion. For ease of computation, computer algebra systems such as PARI/GP have built-in functions to compute resultants, and these may be used.
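
For illustration, here is a minimal sketch in Python with SymPy that builds the Sylvester matrix exactly as displayed above and compares its determinant with SymPy's built-in `resultant`; the helper `sylvester` and the sample polynomials are ours.

```python
import sympy as sp

def sylvester(f, g, x):
    """Build the (n+m) x (n+m) Sylvester matrix of f and g, with shifted rows of
    descending coefficients as in the determinant displayed above."""
    a = sp.Poly(f, x).all_coeffs()        # [a_n, ..., a_0]
    b = sp.Poly(g, x).all_coeffs()        # [b_m, ..., b_0]
    n, m = len(a) - 1, len(b) - 1
    M = sp.zeros(n + m, n + m)
    for i in range(m):                    # m shifted rows of a's
        for j, coeff in enumerate(a):
            M[i, i + j] = coeff
    for i in range(n):                    # n shifted rows of b's
        for j, coeff in enumerate(b):
            M[m + i, i + j] = coeff
    return M

x = sp.symbols('x')
f = 2*x**3 - 3*x + 1
g = x**2 + 1
M = sylvester(f, g, x)
print(M.det(), sp.resultant(f, g, x))     # both print 26
```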

The determinant contains the “principal term” a_n^m b_0^n, the product of the entries along the main diagonal. Since the first column contains only the entries a_n and b_m, the determinant vanishes identically when a_n = b_m = 0. There are precisely m rows with entries a_n, …, a_1, a_0 and n rows with entries b_m, …, b_1, b_0. The resultant is a homogeneous polynomial of degree m in the a_μ and of degree n in the b_ν.

In the above, it was assumed that at least one of a_n, b_m was nonzero. To deal with the case when both are zero, consider the homogeneous polynomials

$$
\begin{aligned}
F(x,y) &= a_n x^n + a_{n-1} x^{n-1} y + \cdots + a_0 y^n\\
G(x,y) &= b_m x^m + b_{m-1} x^{m-1} y + \cdots + b_0 y^m.
\end{aligned}
$$

Any factorization f(x) = (p_r x^r + p_{r-1} x^{r-1} + ⋯ + p_0)(q_s x^s + q_{s-1} x^{s-1} + ⋯ + q_0) yields a factorization F(x,y) = (p_r x^r + p_{r-1} x^{r-1} y + ⋯ + p_0 y^r)(q_s x^s + q_{s-1} x^{s-1} y + ⋯ + q_0 y^s) and vice versa; factors of f may be obtained from those of F by setting y = 1. The same argument may be made for g and G. Every common factor of f and g gives a common factor of F and G, and vice versa. The whole process above of taking the determinant of the Sylvester matrix may be repeated. The case a_n = b_m = 0 corresponds to the case in which y divides both F and G, that is, to a nonconstant common factor that is a pure power of y. Thus the resultant criterion may be reformulated, using homogeneous polynomials, as: the resultant is zero if and only if F and G have a nonconstant homogeneous common factor.

Resultant as a symmetric function

Suppose that the two polynomials f,g above can be factored completely into linear factors:

$$
\begin{aligned}
f(x) &= a_n (x - x_1) \cdots (x - x_n)\\
g(x) &= b_m (x - y_1) \cdots (x - y_m).
\end{aligned}
$$

Then the coefficients a_μ of f(x) are, up to sign, a_n times the elementary symmetric functions of the roots x_i. Likewise the coefficients b_ν of g(x) are, up to sign, b_m times the elementary symmetric functions of the roots y_j. The resultant R is homogeneous of degree m in the a_μ and of degree n in the b_ν, and it is unchanged by any permutation of the x_i or of the y_j.

Since R is homogeneous of degree m in the a_μ and of degree n in the b_ν, substituting the above expressions for the coefficients shows that R equals a_n^m b_m^n times a polynomial in the x_i and y_j that is symmetric in each set of variables; by the fundamental theorem, applied twice in succession (once to the x_i and once to the y_j), such a polynomial is in turn expressible through the coefficients. The following more explicit argument identifies this polynomial.

Suppose the x_i and y_j are indeterminates. For some i and j set x_i = y_j. Then R(f, g) = 0, since f and g then have the common factor (x - x_i). Therefore (x_i - y_j) divides R(f, g), viewed as a polynomial in the x_i and y_j, and likewise for each pair i, j. Since these factors are pairwise coprime, the product of all the (x_i - y_j) divides R(f, g).

Consider

$$
g(x) = b_m \prod_{j} (x - y_j)
$$

and form the product

$$
\prod_i g(x_i) = b_m^{\,n} \prod_i \prod_j (x_i - y_j).
$$

Similarly, from

$$
f(x) = a_n \prod_{i} (x - x_i)
$$

form the product

$$
\prod_j f(y_j) = a_n^{\,m} \prod_j \prod_i (y_j - x_i) = (-1)^{mn}\, a_n^{\,m} \prod_i \prod_j (x_i - y_j).
$$

It is convenient to define

$$
S = a_n^{\,m} b_m^{\,n} \prod_{i,j} (x_i - y_j) = a_n^{\,m} \prod_i g(x_i) = (-1)^{mn}\, b_m^{\,n} \prod_j f(y_j).
$$

Now S, like R, is homogeneous of degree m in the a_μ and of degree n in the b_ν, and is unchanged by permutations of the x_i or of the y_j. Since the product ∏_{i,j}(x_i - y_j) divides R, and both R and S have degree at most mn in the x_i (and in the y_j), R and S can differ only by a scalar factor. But both contain the “principal term” a_n^m b_0^n with coefficient 1: in R it is the product along the main diagonal of the Sylvester determinant, and in S it arises by choosing the constant term b_0 in each factor g(x_i). Therefore the scalar factor must be 1 and R = S.
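
A small sanity check of R = a_n^m b_m^n ∏(x_i - y_j) with SymPy, for n = 3, m = 2 and symbolic roots (the symbol names are ours):

```python
import sympy as sp

x, a, b = sp.symbols('x a b')                          # a = a_n, b = b_m
x1, x2, x3, y1, y2 = sp.symbols('x1 x2 x3 y1 y2')      # roots as indeterminates

f = sp.expand(a * (x - x1) * (x - x2) * (x - x3))      # n = 3
g = sp.expand(b * (x - y1) * (x - y2))                 # m = 2

R = sp.resultant(f, g, x)
S = a**2 * b**3 * sp.prod([(xi - yj) for xi in (x1, x2, x3) for yj in (y1, y2)])

print(sp.expand(R - S))    # 0
```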

The representation of R in terms of the a_μ and b_ν is unique by the uniqueness part of the fundamental theorem on symmetric functions. Therefore the foregoing remains valid even if f(x) and g(x) do not split into linear factors.

Moreover, the computation of R as the determinant of the Sylvester matrix is valid over any ring, as a determinant involves only addition, subtraction and multiplication. Regarding an arbitrary field of definition, we may additionally prove the absolute irreducibility of R; that is to say, R is an irreducible polynomial in the a_μ and b_ν over any field of definition. To see this, assume that R factors as R = R_1 R_2. Each irreducible factor (x_i - y_j) of R divides R_1 or R_2; but as R_1 and R_2 are polynomial expressions in the a_μ and b_ν, they are also symmetric functions of the roots x_i and y_j, so if one factor (x_i - y_j) divides R_1 then they all do. Hence the whole product ∏_{i,j}(x_i - y_j) divides one of the factors, say R_1, and R_2 must then divide a_n^m b_m^n. But looking at the determinant of the Sylvester matrix, one sees that R is divisible neither by a_n nor by b_m. Therefore R_2 is a constant and a nontrivial factorization of R is impossible.

Resultant and discriminant

Consider the polynomial

$$
f(x) = a_n x^n + \cdots + a_0 = a_n (x - x_1) \cdots (x - x_n)
$$

and its derivative f′(x). The resultant R(f, f′) is

$$
R(f, f') = a_n^{\,n-1} \prod_i f'(x_i).
$$

By the product rule for differentiation:

$$
\begin{aligned}
f'(x) &= a_n \sum_i (x - x_1) \cdots (x - x_{i-1})(x - x_{i+1}) \cdots (x - x_n)\\
f'(x_i) &= a_n (x_i - x_1) \cdots (x_i - x_{i-1})(x_i - x_{i+1}) \cdots (x_i - x_n).
\end{aligned}
$$

Using this in the expression for R(f, f′) above,

$$
R(f, f') = a_n^{\,2n-1} \prod_{i \ne j} (x_i - x_j),
$$

and if Δ denotes the discriminant,

$$
R(f, f') = (-1)^{n(n-1)/2}\, a_n\, \Delta(f).
$$

Thus the discriminant may be computed, up to the factor (-1)^{n(n-1)/2} a_n, from the determinant of the Sylvester matrix for R(f, f′):

$$
\begin{vmatrix}
a_n & a_{n-1} & a_{n-2} & \cdots & a_1 & a_0 & & &\\
 & a_n & a_{n-1} & a_{n-2} & \cdots & a_1 & a_0 & &\\
 & & \ddots & & & & & \ddots &\\
 & & & a_n & a_{n-1} & a_{n-2} & \cdots & a_1 & a_0\\
n a_n & (n-1) a_{n-1} & (n-2) a_{n-2} & \cdots & a_1 & & & &\\
 & n a_n & (n-1) a_{n-1} & (n-2) a_{n-2} & \cdots & a_1 & & &\\
 & & \ddots & & & & \ddots & &\\
 & & & n a_n & (n-1) a_{n-1} & (n-2) a_{n-2} & \cdots & a_1 &
\end{vmatrix}
$$
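
A quick check of the relation, for a general cubic, using SymPy's `resultant` and `discriminant` (symbol names are ours):

```python
import sympy as sp

x, a, b, c, d = sp.symbols('x a b c d')
f = a*x**3 + b*x**2 + c*x + d                       # n = 3, leading coefficient a

lhs = sp.resultant(f, sp.diff(f, x), x)
rhs = (-1)**(3*2//2) * a * sp.discriminant(f, x)    # (-1)^{n(n-1)/2} * a_n * Delta

print(sp.expand(lhs - rhs))    # 0
```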

Bibliography


[vdW1930]

van der Waerden, B. L., Moderne Algebra, 1930; Algebra I {% amazon 0387406247 %} & II {% amazon 0387406255 %}, Springer, 1991.

[Bo1992]

Bosch, S., Algebra, Springer, Heidelberg, 1992.

[La1993]

Lang, S., Algebra {% amazon 1461265517 %}, Addison-Wesley, 1993; Rev. 3rd ed., Graduate Texts in Mathematics 211, Springer, 2002.

[Pr2001]

Prasolov, V. V., Polynomials {% amazon 3642039790 %}, Springer, 2001.

[MD1979]

MacDonald, I. G., Symmetric Functions and Hall Polynomials {% amazon 0198739125 %}, Oxford Mathematical Monographs, 1979.

[We1895]

Weber, H., Lehrbuch der Algebra, Vieweg, 1895.