*By Art Duval, Contributing Editor, University of Texas at El Paso*

How is \(0^0\) defined? On one hand, we say \(x^0 = 1\) for all positive \(x\); on the other hand, we say \(0^y = 0\) for all positive \(y\). The French language has the Académie française to decide its arcane details. There is no equivalent for mathematics, so there is *no one* deciding once and for all what \(0^0\) equals, or if it even equals anything at all. But that doesn’t matter. While some definitions are so well-established (e.g., “polynomial”, “circle”, “prime number”, etc.) that altering them only causes confusion, in many situations we can define terms as we please, as long as we are clear and consistent.

Don’t get me wrong; the notion of mathematics as proceeding in a never-ending sequence of “definition-theorem-proof” is essential to our understanding of it, and to its rigorous foundations. My mathematical experience has trained me to ask, “What are the definitions?” before answering questions in (and sometimes out of) mathematics. Yet, while we tell students that the definition needs to come before the proof of the theorem, what students apparently hear is that the definition needs to come before the idea, as opposed to the definition coming from the idea.

**Why definitions?**

What is a definition anyway? Or rather, what gets defined? We could make a special name for the function that maps \(x\) to \(5x^{17} - 29x^2 + 42\), but we don’t. On the other hand, we give the name “sine function” to \(\sin(x)\), the ratio of the length of the side opposite an angle with measure \(x\) to the length of the hypotenuse of a right triangle. We give a name to the sine function, even though it takes much longer to describe than \(5x^{17} - 29x^2 + 42\); in fact, we give it a name in part precisely *because* it takes longer to describe. If we need to refer to \(5x^{17} - 29x^2 + 42\), it’s not that hard, but we do not want to have to write down that definition of sine every time we use it in a statement or problem. We give definitions to ideas for two related reasons:

**Brevity:** It’s clearly easier to write “\(\sin(x)\)” instead of the huge sentence above. Further, packing this idea into a single word helps make it easier to chunk ideas in an even longer statement, such as a trigonometric identity.

**Repetition:** If we have to use the same idea more than once, then giving it a compact name increases the efficiency described above that much more. Sometimes an idea repeats just locally, within a single argument or discussion, and then we might temporarily give it a name; for instance, when finding the maximum value of \(x e^{-x}\), we would write \(f(x)=x e^{-x}\), so we could then write \(0 =f'(x)\), but we are only using \(f\) this way in this one problem. On the other hand, the ideas that show up over and over again, in many different contexts, such as \(\sin(x)\) or “vector space”, get names that stick.
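
To see how much work the temporary name \(f\) saves, here is the short calculation it makes possible (a sketch, using only the product rule):

\[
f'(x) = e^{-x} - x e^{-x} = (1-x)\,e^{-x}, \qquad f'(x) = 0 \iff x = 1, \qquad f(1) = \frac{1}{e}.
\]

Since \(f'(x) > 0\) for \(x < 1\) and \(f'(x) < 0\) for \(x > 1\), the maximum value is \(1/e\). Once the problem is finished, the name \(f\) is free to mean something else.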

This raises the question, “Why do certain ideas, or combinations of conditions, repeat?” Consider “vector space”. The idea of \(R^n\) is clear enough, but of all its properties, why focus on the simple rules satisfied by vector addition and scalar multiplication?

First, because several additional examples have been found that satisfy these rules, such as the vector space of continuous functions, the vector space of polynomials, and the vector space of polynomials of degree at most 5. Second, because once the key properties that make up the definition are identified, we may find that the proofs only depend on those key properties: The Fundamental Theorem of Linear Algebra, for instance, is true for arbitrary finite-dimensional vector spaces, so we don’t need a separate proof for \(R^n\), for polynomials of degree at most 5, etc. (Purists may argue that all finite-dimensional vector spaces of the same dimension are isomorphic, but this isomorphism is defined in terms of vector addition and scalar multiplication, just reinforcing the significance of those operations.)

**Choices**

But there are often still choices to be made. Must a vector space include the zero vector, or could it be empty? (Is the empty set a vector space?) For that matter, since vectors are often described as being determined by “a direction and a magnitude” and the zero vector has no direction, is the zero vector even a vector? The answers are that the empty set is not a vector space and that the zero vector is a vector, but why? The zero vector is a vector, because it is so helpful for a vector space to be a group under addition, which requires an identity element. (I know: this only takes us back to why groups are defined the way they are. Let’s just take this as a piece of evidence for why groups are an important definition.)

As for the empty vector space, there’s nothing inherently wrong with it, except perhaps for the need for a zero vector as discussed above. (This also takes us back to why groups are not allowed to be empty. Let’s stick to vector spaces for now.) But how would we define the dimension of an empty vector space? How would we define the sum of the empty vector space with another vector space? And then, even if we do make those definitions, how do we reconcile them with this identity?

\[
\dim (A+B) = \dim A + \dim B - \dim (A \cap B)
\]
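
To make the difficulty concrete, here is one way the bookkeeping could go wrong. This is only a sketch under assumptions of my own choosing: write \(E\) for the hypothetical empty vector space, suppose \(\dim E\) is some finite number, and define \(A + B\) as the set of all sums \(a + b\) with \(a \in A\) and \(b \in B\). Then for any vector space \(V\), both \(E + V\) and \(E \cap V\) are empty, so the identity would read

\[
\dim E = \dim E + \dim V - \dim E, \qquad \text{that is,} \qquad \dim V = \dim E .
\]

That would force every vector space to have the same dimension as \(E\), which fails as soon as we compare \(R^1\) and \(R^2\). Other choices for these definitions might avoid this particular contradiction, but they would each need their own exceptions.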

This example shows that, even though we cannot write the proof of a theorem until all the relevant definitions are stated, we do often look ahead at the theorem before settling on the fine points of the definition. At research-level mathematics, we might even modify our definitions substantially to make our theorems stronger, or to deal with potential counterexamples. (For more details on this, read Imre Lakatos’ classic *Proofs and Refutations* [1].) I will stick to smaller cases where we adjust definitions mostly just to make the theorems easier to state.

**More examples**

Why is 1 considered to be neither prime nor composite? When you first learn this, it may seem silly. The definition of prime is so simple and elegant (an integer \(n\) is prime if its only factors are 1 and \(n\)), and 1 seems to fit that definition just fine. Why make an exception? The answer lies in the Fundamental Theorem of Arithmetic, that every integer greater than 1 has a unique factorization into primes. Well, except of course that we could change the order of the factors around; for instance, it makes sense to consider \(17 \times 23\) to be the same factorization as \(23 \times 17\). And also we need to leave out any factors of 1, otherwise we might consider \(17 \times 23\), \(1 \times 17 \times 23\), \(1 \times 1 \times 17 \times 23\), … to all be different factorizations. If we take a little extra effort at the definition, and rule out 1 as a prime number, then the theorem becomes more elegant to state.
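
The padding problem is easy to see in a few lines of code. The following is a minimal sketch, not a serious factoring routine: the function name `prime_factors` and the trial-division approach are my own illustrative choices.

```python
from math import prod


def prime_factors(n: int) -> list[int]:
    """Prime factors of n >= 2 in nondecreasing order, by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        # Whatever remains after dividing out small factors is itself prime.
        factors.append(n)
    return factors


factors = prime_factors(391)  # 391 = 17 * 23

# If 1 counted as a prime, each padded list below would be a different
# "prime factorization" of 391, and uniqueness would be lost.
padded = [[1] * k + factors for k in range(3)]
assert all(prod(p) == 391 for p in padded)
```

Because the search starts at `d = 2`, the factor 1 never appears in the output, so the list for each integer is determined once we agree to sort it. That one design decision is the code's version of ruling out 1 as a prime.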

Is a square also a rectangle? In other words, should we define rectangle to include the possibility that the rectangle is a square, or exclude that possibility? When children first learn about shapes, it’s easier to simply categorize shapes, so a shape could be either a rectangle or a square, but not both. But when writing a careful definition of rectangle, it takes more work to exclude the case of a square than to simply allow it. Similarly, theorems about rectangles are easier to state if we don’t have to exclude the special cases where the rectangle happens to be a square: “Two different diameters of a circle are the diagonals of a rectangle” is more elegant than “Two different diameters of a circle are the diagonals of a rectangle, unless the diameters are perpendicular, in which case they are the diagonals of a square.”

Is 0 a natural number? It doesn’t really matter; just pick an answer, be consistent, and move on. It’s even better if we can use non-ambiguous language instead, such as “positive integers” or “non-negative integers.” To be sure, mathematics is picky, but let’s not be picky about the wrong things.

Finally, what about \(0^0\)? If you just look at limits, you’d be ready to declare that this expression is undefined (the limit of \(x^y\) as \(x\) and \(y\) approach 0 does not exist, even just considering \(x \geq 0\) and \(y \geq 0\)). And that’s fine. But in combinatorics, where I work, setting \(0^0 =1\) makes the binomial theorem (\((x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}\)) work in more cases (for instance when \(y=0\)). And so we simply *declare* \(0^0=1\), at least in combinatorics, even though it might remain undefined in other settings.
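
Here is a small check of the \(y = 0\) case, offered as a sketch rather than a proof. The function name `binomial_sum` is just for illustration; the check relies on the fact that Python’s integer power evaluates `0**0` to 1, which matches the combinatorial convention.

```python
from math import comb


def binomial_sum(x: int, y: int, n: int) -> int:
    """Right-hand side of the binomial theorem: sum of C(n, k) * x^k * y^(n-k)."""
    return sum(comb(n, k) * x**k * y ** (n - k) for k in range(n + 1))


# With y = 0, every term with k < n vanishes, and the k = n term is x^n * 0^0.
# The identity therefore holds exactly when 0^0 is taken to be 1.
for n in range(5):
    assert binomial_sum(3, 0, n) == (3 + 0) ** n
```

If \(0^0\) were undefined, or had some other value, the last term of the sum would fail to equal \(x^n\), and the theorem would need a special clause for \(y = 0\).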

(See here for a list of other “ambiguities” in mathematical definitions.)

In each of these examples, there is a human choice about how to exactly state the definition. This is a great freedom. But, to alter a popular phrase, with great freedom comes great responsibility. If you declare \(0^0\) is a value *other* than 1, now you are limiting, not expanding, the applicability of the binomial theorem. And if you want to declare that \(\frac{1}{0}\) has *any* numerical value, you will have to sacrifice at least some of the field axioms in your new number system.
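
The cost of defining \(\frac{1}{0}\) can be seen in a two-line argument. As a sketch, suppose \(\frac{1}{0} = c\) is meant as a multiplicative inverse of 0, so that \(0 \cdot c = 1\), and suppose the field axioms all still hold. Distributivity gives

\[
0 \cdot c = (0 + 0) \cdot c = 0 \cdot c + 0 \cdot c, \qquad \text{so} \qquad 0 \cdot c = 0, \qquad \text{and hence} \qquad 1 = 0 .
\]

Then every element \(a\) satisfies \(a = a \cdot 1 = a \cdot 0 = 0\), so the whole number system collapses to a single element. To keep more than one number, something has to give: distributivity, or the meaning of \(\frac{1}{0}\) as an inverse.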

**In the classroom**

The issues that arise in developing precise mathematical definitions are well known to mathematicians, but we generally don’t share them with our students enough. If we stop hiding this story from our students, then they will see that mathematics is a human endeavor, and that mathematical subjects are not handed down to us from on high. This can be one factor in convincing students that mathematics, even advanced mathematics, is something they can do, that it is not just reserved for other people. And even students who already “get it” will not be turned off: we should not abandon definition-theorem-proof; we can just pay more attention to sharing why each of our definitions is written the way it is. If students know where a definition comes from, what motivated it, and why we made the choices we did, they may have a better chance of making sense of the idea instead of memorizing the string of words or symbols. (See also my earlier blog post, A Call for More Context.)

An anecdote that Keith Devlin tells, near the end of a blog post about mathematical thinking, illustrates the power of crafting the right definition. To summarize much too briefly, his task was to “look at ways that reasoning and decision making are influenced by the context in which the data arises” in a national security setting. His first step was to “write down as precise a mathematical definition as possible of what a *context* is.” When he presented his work to government bigwigs, they never got past his first slide, with that definition, because the entire room spent the whole time discussing that one definition; later he was told “That one slide justified having you on the project.”

We might not have the luxury of spending an entire hour discussing a single definition, but we can still let students in on the secret that the definitions are up to us, and that writing them well can make all the difference.

**References**

[1] Lakatos, Imre. *Proofs and refutations. The logic of mathematical discovery.* Edited by John Worrall and Elie Zahar. Cambridge University Press, Cambridge-New York-Melbourne, 1976.