Matroids over Hyperfields, Part I

In this post and its sequel, I’d like to explain a new perspective on matroid theory which was introduced in this recent preprint which I wrote with Nathan Bowler.  The main observation is that by working with algebraic structures called hyperfields, in which addition is allowed to be multi-valued, linear subspaces, matroids, valuated matroids, and oriented matroids become special cases of a single general concept.  In the process of explaining this observation, I also hope to advertise how natural hyperfields are, for example in the context of tropical geometry.

Hyperstructures

The notion of an algebraic structure in which addition is allowed to be multi-valued goes back to Frédéric Marty, who introduced hypergroups in 1934.  Later on, in the mid-1950’s, Marc Krasner developed the theory of hyperrings and hyperfields in the context of approximating non-Archimedean fields, and in the 1990’s Murray Marshall explored connections to the theory of real spectra and spaces of orderings.  For the most part, however, the theory of hyperstructures was largely ignored by the mathematical community at large until Connes and Consani started advocating its potential utility in connection with $F_1$-geometry and the Riemann hypothesis.  There now seems to be a reappraisal of sorts going on within the math community of the “bias” against multi-valued operations.  As Oleg Viro puts it in this paper:

Krasner, Marshall, Connes and Consani and the author came to hyperfields for different reasons, motivated by different mathematical problems, but we came to the same conclusion: the hyperrings and hyperfields are great, very useful and very underdeveloped in the mathematical literature… Probably, the main obstacle for hyperfields to become a mainstream notion is that a multivalued operation does not fit to the tradition of set-theoretic terminology, which forces to avoid multivalued maps at any cost. I believe the taboo on multivalued maps has no real ground, and eventually will be removed.  Hyperfields, as well as multigroups, hyperrings and multirings, are legitimate algebraic objects related in many ways to the classical core of mathematics… I believe hyperfields are to displace the tropical semifield in the tropical geometry. They suit the role better. In particular, with hyperfields the varieties are defined by equations, as in other branches of algebraic geometry.

I will attempt to provide my own justification for the naturalness and utility of hyperfields by explaining how they unify in a conceptually very satisfying way various flavors of matroids.

Examples of hyperfields

Roughly speaking, a hyperfield is an algebraic structure satisfying similar axioms to a field, but where addition is allowed to be multi-valued.  Like fields, hyperfields come equipped with an additive identity element called zero and a negation map $x \mapsto -x$.  However, one does not require that the hypersum of $x$ and $-x$ is zero, only that zero is contained in the hypersum. Before giving a precise axiomatic definition, let us give some motivating examples.

1. (Fields) Any field $K$ is in particular a hyperfield.
2. (The Krasner hyperfield) The Krasner hyperfield ${\mathbb K}$ records the arithmetic of “zero” versus “nonzero”.  Specifically, define ${\mathbb K} = \{ 0,1 \}$ together with the binary operation $0 \odot 0 = 0, 0 \odot 1 = 0, 1 \odot 1 = 1$ and the binary hyperoperation $0 \boxplus 0 = \{ 0 \}, 0 \boxplus 1 = \{ 1 \}, 1 \boxplus 1 = \{ 0,1 \}$.  If we think of $1$ as representing “nonzero”, the hyperaddition rules just say that zero plus zero is zero and zero plus something nonzero is always nonzero, but the sum of two things which are nonzero could be either zero or nonzero.  The negation operator is the identity map, since negative zero is zero and the negative of something nonzero is again nonzero.
3. (The hyperfield of signs) The hyperfield of signs ${\mathbb S}$ records the arithmetic of “zero”, “positive”, and “negative”, represented by the symbols $0, 1, -1$, respectively.  The product $x \odot y$ is defined in the obvious way for $x,y \in {\mathbb S} := \{ 0, 1, -1 \}$.  Addition is also defined in the “obvious” way except we have $1 \boxplus -1 = \{ 0, 1, -1 \}$ since the sum of a positive number and a negative number could be either zero, positive, or negative.  The negation operator takes $0$ to itself and interchanges $1$ and $-1$.
4. (The tropical hyperfield) The tropical hyperfield ${\mathbb T}$ records the arithmetic of valuations.  More precisely, if $v : K \to {\mathbb T} := {\mathbb R} \cup \{ \infty \}$ is the valuation map on a valued field $K$ with value group ${\mathbb R}$, the hypersum $a \boxplus b$ consists of all possible values of  $v(\alpha+\beta)$ where $\alpha,\beta$ are elements of $K$ with $v(\alpha)=a$ and $v(\beta)=b$.  (Note that the axioms for a valuation tell us that $v(\alpha \cdot\beta) = a + b$.) Concretely, this means that we should define $a \odot b := a + b$ (with the evident conventions when one of $a,b$ equals $\infty$), and we define $a \boxplus b := {\rm min}(a,b)$ if $a \neq b$ and $a \boxplus a := \{ z \in {\mathbb T} \; : \; z \geq a \}$.  The multiplicative identity element is $0$ and the additive identity element is $\infty$.  The negation operator is the identity map, since the unique value of $b$ such that $\infty \in a \boxplus b$ is $b = a$.

All of these examples are special cases of the following construction due to Krasner.  Let $A$ be a multiplicative semigroup, let $K$ be a field, and let $\phi : K \twoheadrightarrow A$ be a surjective homomorphism of multiplicative semigroups such that $\phi^{-1}(0)=\{ 0 \}$. The induced hyperfield structure on $A$ is defined by the hyperaddition law
$x \boxplus y := \phi(\phi^{-1}(x) + \phi^{-1}(y))$ and the negation rule $-x := \phi(-1) \cdot x$.

The above examples all have an additional property not shared by all hyperfields: they are doubly distributive (see below for the definition).

Definition of hyperrings and hyperfields

A hyperoperation on a set $S$ is a map $\boxplus$ from $S \times S$ to the collection of non-empty subsets of $S$.

If $A,B$ are non-empty subsets of $S$, we define
$A \boxplus B := \bigcup_{a \in A, b \in B} (a \boxplus b)$
and we say that $\boxplus$ is associative if $a \boxplus (b \boxplus c) = (a \boxplus b) \boxplus c$ for all $a,b,c \in S$.

A (commutative) hyperring is a set $R$ together with:

1. A commutative and associative binary operation $\odot$
2. A commutative and associative binary hyperoperation $\boxplus$
3. Distinguished elements $0,1 \in R$

such that:

1. $(R, \odot, 1)$ is a commutative monoid.
2. $0 \odot x = x \odot 0 = 0$ and $0 \boxplus x = x \boxplus 0 = \{ x \}$ for all $x \in R.$
3. For every $x \in R$ there is a unique element of $R$ (denoted $-x$) such that $0 \in x\boxplus -x.$
4. $a \odot (x \boxplus y) = (a \odot x) \boxplus (a \odot y)$ for all $a,x,y \in R.$

A hyperring $F$ is called a hyperfield if $0 \neq 1$ and every non-zero element of $F$ has a multiplicative inverse.

There are examples of hyperfields which do not arise from Krasner’s construction given above, but they rarely appear in practice.

A hyperfield $F$ is called doubly distributive if it satisfies $(a \boxplus b)\odot (c \boxplus d) = (a\odot c) \boxplus (a \odot c) \boxplus (b \odot c) \boxplus (b\odot d)$ for all $a,b,c,d \in F$.

Zero sets of polynomials

Given an associative hyperoperation $\boxplus$ on a set $S$, we define the hypersum $x_1 \boxplus \cdots \boxplus x_m$ of $x_1,\ldots,x_m$ for $m \geq 2$ recursively by the formula
$x_1 \boxplus \cdots \boxplus x_m := \bigcup_{x' \in x_2 \boxplus \cdots \boxplus x_m} x_1 \boxplus x'.$

If $F$ is a hyperfield and $g(x_1,\ldots,x_k) = \sum a_I \underline{x}^I$ is a polynomial in $k$ variables with coefficients $a_I \in F$, the zero set of $g$ is
$Z(g) := \{ \underline{z} \in F^k \; : \; 0 \in \boxplus a_I \underline{z}^I \}.$

For example, setting $g(x_1,\ldots,x_k) = x_1 + \cdots + x_k$ with $k \geq 2$, we have:

1. If $x_1,\ldots,x_k \in {\mathbb K}$, then $0 \in x_1 \boxplus \cdots \boxplus x_k$ iff $\{ i \; | \; x_i = 1 \}$ does not have exactly one element.
2. If $x_1,\ldots,x_k \in {\mathbb S}$, then $0 \in x_1 \boxplus \cdots \boxplus x_k$ if and only if all $x_i = 0$ or the nonzero $x_i$‘s are not all equal.
3. If $x_1,\ldots,x_k \in {\mathbb T}$, then $\infty \in x_1 \boxplus \cdots \boxplus x_k$ if and only if the minimum of the $x_i$ occurs (at least) twice.

The third of these examples is particularly interesting in the context of tropical algebra / geometry.  Indeed, the zero set $Z(g)$ of a polynomial with coefficients in ${\mathbb T}$, in the above hyperfield sense, is what tropical geometers usually call the bend locus of $g$: the set of $\underline{z} \in {\mathbb T}^k$ such that the maximum of the terms $a_I \underline{z}^I$  is achieved at least twice.  The bend locus appears completely naturally in the context of hyperfields, whereas it has to be introduced artificially in the context of “min-plus algebra”.

Grassmann-Plücker relations and functions

With an eye toward generalizing the notion of “linear subspace” from fields to hyperfields, we first recall how one can encode an $r$-dimensional linear subspace of $K^m$, where $K$ is a field, as a certain kind of alternating multilinear function.

Let $W$ be an $r$-dimensional linear subspace of $K^m$, and let $w_1,\ldots,w_r$ be a basis for $W$.  Write the vectors $w_i$ as the rows of an $r \times m$ matrix $A$ with coefficients in $K$.  Let $E = \{ 1,\ldots,m \}$, and for each $i \in E$ let $v_i$ be the $i^{\rm th}$ column of $A$.  For $(i_1,\ldots,i_r) \in E^r$, let $\varphi_A (i_1,\ldots,i_r)$ be the determinant of the $r \times r$ matrix whose $i^{\rm th}$ column is $v_i$.  The function $\varphi_A : E^r \to K$ is nonzero (since $A$ has rank $r$), alternating, and if we replace $w_1,\ldots,w_r$ by another basis $w_1',\ldots,w_r'$ for $W$ and $A$ by the corresponding matrix $A'$, the corresponding functions satisfy $\varphi_A (i_1,\ldots,i_r) = c \cdot \varphi_{A'} (i_1,\ldots,i_r)$ for some nonzero $c \in K$.  So the projective equivalence class of $\varphi_W$ of the function $\varphi_A$ depends only on the subspace $W$.  In addition to being alternating, the function $\varphi := \varphi_A$ satisfies the following somewhat less obvious collection of algebraic relations:

(Grassmann-Plücker relations) For any two subsets $\{ x_1,\ldots,x_{r+1} \}$ and $\{ y_1,\ldots,y_{r-1} \}$ of $E$,
$\sum_{k=1}^{r+1} (-1)^k \varphi(x_1,x_2,\ldots,\hat{x}_k,\ldots,x_{r+1}) \cdot \varphi(x_k,y_1,\ldots,y_{r-1}) = 0.$

Conversely, it is a well-known classical fact that any nonzero alternating function $\varphi : E^r \to K$ satisfying the Grassmann-Plücker relations is of the form $\varphi_A$ for some $r \times m$ matrix $A$ or rank $r$, and two functions $\varphi_A$ and $\varphi_{A'}$ are projectively equivalent if and only if the row spaces of $A$ and $A'$ are equal.  (This is usually phrased in terms of Plücker vectors and the Grassmannian variety $G(r,m)$, but the two formulations are easily seen to be equivalent.)

Grassmann-Plücker functions over hyperfields

Now let $F$ be a hyperfield.  A function $\varphi : E^r \to F$ is called a Grassmann-Plücker function if it is nonzero, alternating (meaning that $\varphi(x_1,\ldots,x_i, \ldots, x_j, \ldots, x_r)=-\varphi(x_1,\ldots,x_j, \ldots, x_i, \ldots, x_r)$ and $\varphi(x_1,\ldots, x_r) = 0$ if $x_i = x_j$ for some $i \neq j$), and it satisfies the following generalization of the Grassmann-Plücker relations:

(GP) For any two subsets $\{ x_1,\ldots,x_{r+1} \}$ and $\{ y_1,\ldots,y_{r-1} \}$ of $E$,
$0 \in \boxplus_{k=1}^{r+1} (-1)^k \varphi(x_1,x_2,\ldots,\hat{x}_k,\ldots,x_{r+1}) \odot \varphi(x_k,y_1,\ldots,y_{r-1}).$

Note that the negation operation $x \mapsto -x$ on $F$ is used both to make sense of what it means to be alternating and in writing down (GP).

As noted above, if $F=K$ is a field then a projective equivalence class of Grassmann-Plücker functions $\varphi : E^r \to K$ is the same thing as an $r$-dimensional subspace of $K^m$.

We now come to the first big punchline of our story:

Proposition 1: If $F = {\mathbb K}$ is the Krasner hyperfield, a projective equivalence class of Grassmann-Plücker functions $\varphi : E^r \to {\mathbb K}$ is the same thing as a rank $r$ matroid on $E$.

Indeed, it follows from Example 1 in the section “Zero sets of polynomials” that if $\varphi : E^r \to {\mathbb K}$ is a nonzero alternating function and ${\mathbf B}_\varphi$ denotes the set of $r$-element subsets $\{ e_1,\ldots, e_r \}$ of $E$ such that $\varphi(e_1,\ldots,e_r) \neq 0$, then ${\mathbf B} := {\mathbf B}_\varphi$ satisfies the Grassmann-Plücker relations (GP) if and only if

(BE) Given $B,B' \in {\mathbf B}$ and $b \in B \backslash B'$, there exists $b' \in B' \backslash B$ such that $(B \cup \{ b' \}) \backslash \{ b \} \in {\mathbf B}$.

But condition (BE) is precisely the Basis Exchange property for matroids!  In other words, ${\mathbf B}_\varphi$ is the set of bases of a rank $r$ matroid $M_{\varphi}$ on $E$.  Conversely, given such a matroid $M$, we can define $\varphi_M : E^r \to {\mathbb K}$ by setting $\varphi_M(e_1,\ldots,e_r) = 1$ if $\{ e_1,\ldots,e_r \}$ is a basis of $M$ and $\varphi_M(e_1,\ldots,e_r) = 0$ if not.  By the exchange property (BE), the function $\varphi_M(e_1,\ldots,e_r)$ will satisfy (GP).  Since two nonzero functions are projectively equivalent over ${\mathbb K}$ if and only if they are equal, the Proposition now follows easily.

Motivated by Proposition 1, if $F$ is a doubly distributive hyperfield we define an $F$-matroid (or matroid over $F$) of rank $r$ on $E$ to be a projective equivalence class of Grassmann-Plücker functions $\varphi : E^r \to F$.  Thus matroids over fields are the same as linear subspaces, and matroids over the Krasner hyperfield are the same as matroids in the usual sense.

We have the following additional interesting examples of matroids over hyperfields:

Proposition 2: If $F = {\mathbb S}$ is the hyperfield of signs, a rank $r$ matroid over ${\mathbb S}$ is the same thing as a rank $r$ oriented matroid on $E$.

Proposition 3: If $F = {\mathbb T}$ is the tropical hyperfield, a rank $r$ matroid over ${\mathbb T}$ is the same thing as a rank $r$ valuated matroid on $E$.

We previously discussed valuated matroids (also known as tropical linear spaces) in this blog post.  For more on oriented matroids, see for example this AMS blog post.  In the theory of oriented matroids, Grassmann-Plücker functions are called chirotopes.

In a sequel post, we will discuss duality theory for matroids over hyperfields as well as a cryptomorphic characterization of matroids over hyperfields in terms of circuits.  We will also say a bit more about the algebra and geometry of hyperrings.

Note added January 2017: In a forthcoming update to my paper with Nathan Bowler, we will explain in detail why the double distributivity condition appears in the above definition of $F$-matroid above.