Making Your Own Sense

Gerry-mean-dering

David Butler — Sat, 12 Aug 2023 06:43:13 +0000

A recent video from Howie Hua showed how if you split a collection of numbers into equal-sized groups, then find the mean of each group, then find the mean of those means, it turns out this final answer is the same as the mean of the original collection. He was careful to say it usually does not work if the groups were different sizes. Which got me to wondering: just how much of an effect on the final mean-of-means can you have by splitting a collection of numbers into different-sized groups?

This is how I asked the question:

You have the whole numbers from 1 to 12.
You split them into however many groups you like of whatever sizes you like, so every number is used exactly once.

For example, you could split them like this:
{2,5},{8},{1,6,10,11},{3,4,7,9,12}.

Now find the mean of each of these groups.

In the example, those means are
(2+5)/2=3.5, 8, (1+6+10+11)/4=7, (3+4+7+9+12)/5=7.

Then find the mean of those means.

In the example, this mean-of-means is
(3.5+8+7+7)/4=6.375.
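
If you would like to check calculations like this one by computer, here is a minimal sketch in Python. The function name mean_of_means is only an illustrative choice, and exact fractions are used so that rounding cannot creep in.

```python
from fractions import Fraction

def mean_of_means(groups):
    """Return the mean of the group means, as an exact fraction."""
    means = [Fraction(sum(g), len(g)) for g in groups]
    return sum(means) / len(means)

# The example split of the numbers 1 to 12 used above.
example = [{2, 5}, {8}, {1, 6, 10, 11}, {3, 4, 7, 9, 12}]
print(mean_of_means(example))         # 51/8
print(float(mean_of_means(example)))  # 6.375
```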

What is the smallest you can make the final mean-of-means, by choosing your groups just right?

Having played with this question online for a day, and in person with others at two sessions of One Hundred Factorial, it has now become one of my favourite problems. It has been particularly fascinating watching other people think about the problem. It seems to defy instincts to begin with, but as people try things out, their instincts develop and they quite quickly have a conjecture about the smallest possible answer. Proving this really is the smallest possible answer was actually quite difficult, though I have to say I really enjoyed the process. The ideas that produced the proof came from multiple people, including me, Lyron Winderbaum, Alex Mackay and James Fitton-Gum.

I’m going to include a proof here, because I love the ideas in it so much, and because I happen to find a certain pleasure in writing up a proof.

While writing it, I was struck anew by how different the process of coming up with a proof is from writing one. You can know a thing is true, and know all the reasons why, and be able to explain to someone why, but it’s not the same thing at all as a formal written proof. A formal written proof is a thing all its own, which requires its own special thought processes different from just coming up with or explaining the ideas in a proof to someone live. For a start, a formal proof usually does not run by just giving suggestive examples like an explanation can. You have to define everything carefully and choose variables and terminology, and choose an efficient and logical order for the statements you make, which usually takes reordering along the way. I happen to like this process, even though it can be hard work.

Anyway, the proof is done and I have written it below. It is slightly less formal than the traditional formal proof, but I like it. If you don’t want to see a proof right now, please look away!

In all that follows, assume that n is a fixed natural number.

Definition 0:
Consider a collection of disjoint sets of numbers. When you calculate the mean of each set, and then calculate the mean of those means, the final result is called the mean-of-means of the collection of sets.

For example, the mean-of-means of the collection of sets
{2,5},{8},{1,6,10,11},{3,4,7,9,12}
is
6.375.

Of particular interest here are the collections of sets that are a partition of the set {1,…,n}. A partition of a set is a collection of disjoint subsets which together contain all the elements of the original set. We are interested in which partition of {1,…,n} has the lowest mean-of-means.

Since the set {1,…,n} is finite, it has only finitely many possible partitions, and so there are only finitely many means-of-means. Therefore there must actually be a lowest mean-of-means.
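
Because there are only finitely many partitions, a computer can simply try them all when n is small. The following is a rough sketch of that brute-force search, not part of the proof. The helper names partitions and lowest_mean_of_means are made up for illustration, and the search becomes impractical quickly as n grows.

```python
from fractions import Fraction

def partitions(items):
    """Yield every partition of the list `items` as a list of lists."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing subset in turn...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ...or give it a subset of its own.
        yield [[first]] + smaller

def mean_of_means(groups):
    means = [Fraction(sum(g), len(g)) for g in groups]
    return sum(means) / len(means)

def lowest_mean_of_means(n):
    """Search every partition of {1,...,n}; only practical for small n."""
    return min(mean_of_means(p) for p in partitions(list(range(1, n + 1))))

print(lowest_mean_of_means(6))  # 5/2
```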

Lemma 1:
Consider a collection of natural numbers that add to n. Of all partitions of the set {1,…,n} into subsets of those sizes, choose one with the lowest mean-of-means. In such a partition, if two numbers are in different-sized subsets, then the larger number is in the larger subset.

For example, if you partition {1,…,12} into subsets of sizes 2,3,3,4, then, out of all the ways to arrange the numbers into sets of those sizes, a way to do it to get the smallest mean-of-means is this:
{1,2},{3,4,5},{6,7,8},{9,10,11,12}.
A different arrangement of the numbers into the same sized subsets is:
{1,2},{3,9,5},{6,7,8},{4,10,11,12}.
And that has a bigger mean-of-means.

In the first case, the mean-of-means is
1/4( 1/2 (1+2) + 1/3 (3+4+5) + 1/3 (6+7+8) + 1/4 (9+10+11+12) )
= 1/4( 1/2+2/2+3/3+4/3+5/3+6/3+7/3+8/3+9/4+10/4+11/4+12/4)
In the second case, the mean-of-means is
1/4( 1/2 (1+2) + 1/3 (3+9+5) + 1/3 (6+7+8) + 1/4 (4+10+11+12) )
= 1/4( 1/2+2/2+3/3+9/3+5/3+6/3+7/3+8/3+4/4+10/4+11/4+12/4)

Comparing these calculations, everything is the same except that these two terms in the first expression
4/3+9/4
become these two terms in the second
9/3+4/4.
Changing that 4/3 to 9/3 makes the total go up by 5/3. Changing that 9/4 to 4/4 makes the total go down by 5/4. But 5/3 is more than 5/4 and so the total has gone up overall.

This is the argument I’ll use in the proof, though I’ll go the other way, from a partition with the numbers in the wrong order to one with the numbers in the right order.

Proof of Lemma 1:
Suppose a partition of {1,…,n} into k subsets has the number p in a subset of size t and the number q>p in a different subset of size s, where t>s. That is, the smaller number p is in the larger subset.

The contribution of p and q to the mean-of-means is 1/k (p/t+q/s).

If everything in the partition is kept the same except that p and q are switched, then all numbers will make the same contribution to the mean-of-means except for p and q, whose contribution is now 1/k (p/s+q/t).

So, the change in the mean-of-means is
1/k (p/s+q/t-p/t-q/s)
= 1/k (q/t-p/t -q/s+p/s)
= 1/k ((q-p)/t-(q-p)/s)
= 1/k (q-p)(1/t-1/s)

Now t>s, so 1/t<1/s.
This means 1/t-1/s is negative, and since (q-p) is positive, the change in the mean-of-means is negative.
That is, the mean-of-means has decreased.

Therefore, if the partition is to produce the lowest mean-of-means, then for every pair of numbers in subsets of different sizes, the greater number must be in the larger subset. Otherwise the mean-of-means could be reduced.
End of proof!
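
The formula for the change in the mean-of-means can be checked against the example given before the proof. This is just a sanity check under the same notation (k, p, q, t, s as in the proof); the variable names wrong_order and right_order are illustrative.

```python
from fractions import Fraction

def mean_of_means(groups):
    means = [Fraction(sum(g), len(g)) for g in groups]
    return sum(means) / len(means)

wrong_order = [{1, 2}, {3, 9, 5}, {6, 7, 8}, {4, 10, 11, 12}]
right_order = [{1, 2}, {3, 4, 5}, {6, 7, 8}, {9, 10, 11, 12}]

# In wrong_order, p=4 sits in the subset of size t=4
# and q=9 sits in the subset of size s=3. There are k=4 subsets.
k, p, q, t, s = 4, 4, 9, 4, 3
predicted_change = Fraction(1, k) * (q - p) * (Fraction(1, t) - Fraction(1, s))
actual_change = mean_of_means(right_order) - mean_of_means(wrong_order)

print(predicted_change, actual_change)  # -5/48 -5/48
```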

Note that technically this proof does not show that there have to be consecutive numbers in each subset. For example,
{1,2},{3,5,8},{4,6,7},{9,10,11,12}
has the same mean-of-means as
{1,2},{3,4,5},{6,7,8},{9,10,11,12}.
But it doesn’t have a properly lower mean-of-means, so that’s ok.

Lemma 2:
Consider a collection of natural numbers that add to n. Of all partitions of the set {1,…,n} into subsets of those sizes, there is one with lowest mean-of-means where every subset has consecutive numbers.

Proof of Lemma 2:
Suppose a specific partition does have the lowest mean-of-means and consider the subsets of size s. Let p be the smallest number in all the subsets of size s and let q be the largest number in all the subsets of size s.

Consider a number c with p ≤ c ≤ q. The number c cannot be in a subset larger than s, or it would have to be larger than q, by Lemma 1. The number c cannot be in a subset smaller than s, or it would have to be smaller than p, also by Lemma 1. So c must be in a subset of size s. So, every number from p to q is in a subset of size s.

If there were only one subset of size s, this shows it contains consecutive numbers. If there is more than one subset of size s, consider again the number c in one of these sets.

If there are k subsets in total, then the contribution of c to the mean-of-means is c/(sk) regardless of which subset of size s it is in. So, the numbers in the subsets of size s can be rearranged into other collections of subsets of size s without changing the mean-of-means. Therefore, it is possible to arrange them with consecutive numbers in each subset of size s.
End of proof!!
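
The key fact in this proof is that each number c contributes c/(sk), whichever subset of size s it sits in. A short sketch can confirm this on the two partitions compared just before Lemma 2. The helper name contributions_total is hypothetical.

```python
from fractions import Fraction

def mean_of_means(groups):
    means = [Fraction(sum(g), len(g)) for g in groups]
    return sum(means) / len(means)

def contributions_total(groups):
    """Add up c/(s*k) for each number c in a subset of size s, with k subsets."""
    k = len(groups)
    return sum(Fraction(c, len(g) * k) for g in groups for c in g)

mixed = [{1, 2}, {3, 5, 8}, {4, 6, 7}, {9, 10, 11, 12}]
consecutive = [{1, 2}, {3, 4, 5}, {6, 7, 8}, {9, 10, 11, 12}]

print(mean_of_means(mixed) == mean_of_means(consecutive))  # True
print(contributions_total(mixed) == mean_of_means(mixed))  # True
```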

The next thing I want to prove uses a move like the following.
Start with these two sets
{3,4,5},{6,7,8}.
Their means are 4 and 7.
Move the 5 from one set to the other to get the new sets
{3,4},{5,6,7,8}.
Their means are 3.5 and 6.5.

By moving the largest number in one set to become the smallest number in the other, both means have been reduced.

It is always true that if you remove the largest number from a set (that has more than one element), the mean reduces, and it is always true that if you include an extra number in a set smaller than all the ones already there, the mean reduces, but I just feel a need to prove them to myself.

Lemma 3:
Suppose a set of numbers S has more than one element, and one of them is a number p larger than all the others. Then the set S\{p} has a lower mean than S.

Proof of Lemma 3:
Let s be the size of S and the mean of S be σ.

The s-1 numbers in S other than p are all less than p. Therefore their total is less than (s-1)p.
So the total of S is less than (s-1)p+p=sp.
Hence the mean of S is less than sp/s=p. That is, σ<p.

The total of S is sσ, so the total of S\{p} is sσ-p.
Since p>σ, that implies sσ-p<sσ-σ=(s-1)σ.

The mean of S\{p} is
(sσ−p)/(s-1)
<(s-1)σ/(s-1)
=σ.
That is, the mean of S\{p} is less than the mean of S.
End of proof!

Lemma 4:
Suppose T is a set of numbers and p is a number less than every number in T. Then the mean of {p}UT is less than the mean of T.

Proof of Lemma 4:
Let the size of T be t and the mean of T be τ. All of the t numbers in T are greater than p, so the total of T is greater than tp. Therefore the mean of T is greater than tp/t=p. That is, τ>p.

The total of T is tτ, so the total of {p}UT is tτ+p. Since p<τ, this implies
tτ+p< tτ+τ=(t+1)τ.

The mean of {p}UT is
(tτ+p)/(t+1)
<(t+1)τ/(t+1)
=τ.
That is, the mean of {p}UT is less than the mean of T.
End of proof!
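
Lemmas 3 and 4 together describe the move shown before Lemma 3. Here is that move as a small sketch, with the helper name mean chosen for illustration.

```python
from fractions import Fraction

def mean(numbers):
    return Fraction(sum(numbers), len(numbers))

S, T = {3, 4, 5}, {6, 7, 8}
p = max(S)                       # the largest number in S
new_S, new_T = S - {p}, T | {p}  # move p from S to T

print(mean(S), mean(new_S))  # 4 7/2
print(mean(T), mean(new_T))  # 7 13/2
```

Both means go down, which is exactly what the two lemmas say should happen.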

Now I am ready to prove a different way to reduce the mean-of-means.

Lemma 5:
Let k be a natural number with k≤n. Of all the partitions of {1,…,n} into k subsets, the one with the lowest mean-of-means has the numbers k to n all together in the same subset, and all numbers less than k each in their own subset of size 1.

For example, when you split {1,…,12} into 4 subsets, the lowest mean-of-means belongs to {1},{2},{3},{4,…,12}.

Proof of Lemma 5:
Suppose {1,…,n} is partitioned into k subsets in such a way that there is a subset S with more than one number in it, and there is another subset T all of whose numbers are greater than those in S.

Let p be the largest number in S. Create a new partition by replacing S with S\{p}, replacing T with {p}UT, and leaving all other subsets unchanged. That is, move the number p from S to T.

Since the largest number p has been removed, the mean of S\{p} must be lower than that for S, by Lemma 3. Since a smaller number has been included, the mean of {p}UT must be lower than that for T, by Lemma 4.

Consider the list of k means from the sets in the new partition and compare to the list of k means from the sets in the old partition. All the means are the same except two of them, which are lower in the new partition. Therefore the total of the k means is lower in the new partition compared to the old partition. Since both lists have k means and the total has reduced, the mean-of-means must have reduced also.

Therefore, if there are any subsets with more than one number, whose numbers are below all the numbers in another subset, the mean-of-means is not the lowest possible for partitions with k subsets.

Suppose we do have the partition with k subsets that has the lowest mean-of-means out of all partitions with k subsets. It is necessarily the lowest mean-of-means for all partitions with its particular list of subset sizes.

By Lemma 1, in such a partition, all the biggest numbers are in the biggest subsets. If there are any other subsets, they must have exactly one number each, otherwise the mean-of-means could be reduced by the procedure above.

If there is more than one largest subset, the numbers in them can be rearranged without changing the mean-of-means, as shown in the proof of Lemma 2, so that the set containing n has consecutive numbers. In that case, this partition could not have the lowest mean-of-means for partitions with k subsets as it could again be reduced by the procedure above.

Since there are k subsets in total and all but one of them has size 1, there are k-1 subsets of size 1. The remaining subset contains the remaining largest numbers.
End of proof!
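
For small n, Lemma 5 can be compared with a brute-force search over every partition into k subsets. The sketch below does this for n=8. It is a check, not a proof, and the helper names (partitions, lemma5_partition, best_with_k_subsets) are invented for this example.

```python
from fractions import Fraction

def partitions(items):
    """Yield every partition of the list `items` as a list of lists."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        yield [[first]] + smaller

def mean_of_means(groups):
    means = [Fraction(sum(g), len(g)) for g in groups]
    return sum(means) / len(means)

def lemma5_partition(n, k):
    """The partition {1},...,{k-1},{k,...,n} as a list of lists."""
    return [[i] for i in range(1, k)] + [list(range(k, n + 1))]

def best_with_k_subsets(n, k):
    """Lowest mean-of-means over all partitions of {1,...,n} into k subsets."""
    candidates = (p for p in partitions(list(range(1, n + 1))) if len(p) == k)
    return min(mean_of_means(p) for p in candidates)

n = 8
for k in range(1, n + 1):
    assert best_with_k_subsets(n, k) == mean_of_means(lemma5_partition(n, k))
print("Lemma 5 agrees with brute force for n =", n)
```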

It’s worth noting at this point that the arguments above can actually work for any collection of distinct numbers, not just {1,…,n}. You can even modify them to deal with repeated numbers in the list.

I also realise that you could probably argue this in fewer lemmas, or different lemmas, but I just loved some of the arguments there so much I couldn’t bring myself to whittle it down and risk losing them. Sometimes the most efficient proof is not the most instructive or inspiring.

The final step, which does depend on the actual set being {1,…,n}, is to find the optimum number of sets in the partition. To get my first answer for that, I used some calculus. But before I used calculus, I had to figure out a function for the lowest mean-of-means based on the number of subsets.

Lemma 6:
Let k be a natural number and consider the partition of the set {1,…,n} which has the numbers k to n all together in the same subset, and all numbers less than k each in their own subset of size 1. That is, the partition {1},…,{k-1},{k,…,n}.
The mean-of-means of this partition is 1/2 (k +n/k).

Proof of Lemma 6:
The sum of the numbers 1,…,k-1 is 1/2 (k-1)k.
The mean of the set {k,…,n} is (k+n)/2.
So the total of the means of the subsets in the partition {1},…,{k-1},{k,…,n} is
1/2 (k-1)k+(k+n)/2
= 1/2 ( (k-1)k+k+n)
= 1/2 ( k²-k+k+n)
= 1/2 (k²+n)

Therefore the mean-of-means of the partition is
1/k ∙1/2 (k²+n)
= 1/2 (k+n/k)
End of Proof!
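
The formula in Lemma 6 is easy to test directly against the definition. A minimal sketch, with lemma6_formula as an illustrative name:

```python
from fractions import Fraction

def mean_of_means(groups):
    means = [Fraction(sum(g), len(g)) for g in groups]
    return sum(means) / len(means)

def lemma6_formula(n, k):
    """The claimed value 1/2 (k + n/k), as an exact fraction."""
    return Fraction(1, 2) * (k + Fraction(n, k))

# Compare the formula with a direct calculation for every k up to n.
for n in range(1, 31):
    for k in range(1, n + 1):
        partition = [[i] for i in range(1, k)] + [list(range(k, n + 1))]
        assert mean_of_means(partition) == lemma6_formula(n, k)

print(lemma6_formula(12, 4))  # 7/2
```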

Now I am ready to begin finding the number of subsets that has the lowest mean-of-means.

Lemma 7:
For each natural number k, consider the partition of the set {1,…,n} which has the numbers k to n all together in the same subset, and all numbers less than k each in their own subset of size 1. That is, the partition {1},…,{k-1},{k,…,n}.
The value of k with the lowest mean-of-means is either √n if it is an integer or it is one of the integers on either side of √n.

For example, when you partition {1,…,16}, the lowest mean-of-means happens with √16=4 subsets arranged as {1},{2},{3},{4,…,16}. When you partition {1,…,12}, since √12 is between 3 and 4, the lowest mean-of-means happens with one of either 3 groups or 4 groups.

Proof of Lemma 7:
By Lemma 6, the mean-of-means of such a partition is 1/2 (k+n/k)

Consider the function m(k) = 1/2 (k+n/k) for real number k>0. The minimum of this function will happen when its derivative is 0.

dm/dk= 1/2 (1-n/k²) =0
1=n/k²
k²=n
k=√n (since k is positive)

When k<√n, n/k²>1 so dm/dk=1/2 (1-n/k²) is negative.
When k>√n, n/k²<1 so dm/dk=1/2 (1-n/k²) is positive.
Thus as k increases, the function decreases until k=√n and then increases again.

If √n is itself an integer, then the minimum possible value of m occurs when k=√n.

If √n is not an integer, then the minimum value of m for integer k will occur at one of the nearest integers to √n.
End of proof!
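
To see the decrease-then-increase pattern at whole-number values of k, here is a small sketch for n=12. The function name m follows the proof, and using exact fractions is just an implementation choice.

```python
from fractions import Fraction

def m(k, n):
    """The Lemma 6 formula 1/2 (k + n/k), as an exact fraction."""
    return Fraction(1, 2) * (k + Fraction(n, k))

n = 12
values = [m(k, n) for k in range(1, 7)]
print([str(v) for v in values])  # ['13/2', '4', '7/2', '7/2', '37/10', '4']
```

The values fall until k=3, and the lowest ones sit at the integers on either side of √12≈3.46.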

The important question is, if √n is not an integer, which of the two integers on either side is the correct choice for the number of groups? Looking at the list of answers for several values of n, it appears that the correct k is actually the nearest integer to √n. That is, to find the correct k, round off √n in the usual way.

I wasn’t able to use derivatives to prove this, so I went back and re-proved the previous lemma in a different way that let me figure it out. This meant I should probably have removed Lemma 7 from the argument, but I really liked the argument I made, so I have left it in.

Lemma 8:
For each natural number k, consider the partition of the set {1,…,n} which has the numbers k to n all together in the same subset, and all numbers less than k each in their own subset of size 1. That is, the partition {1},…,{k-1},{k,…,n}.
The value of k with the lowest mean-of-means is ⌊√n⌉, the nearest integer to √n.

For example, when you partition {1,…,12}, since √12≈3.46 which rounds to 3, the lowest mean-of-means happens with 3 groups.

Proof of Lemma 8:
By Lemma 6, the mean-of-means for this partition is 1/2 (k+n/k).
Let m(k) = 1/2 (k+n/k) for natural numbers k.

Given a natural number k, consider how much the value of m changes between k and the next number:
m(k+1)-m(k)
= 1/2 ( (k+1)+n/(k+1) ) - 1/2 (k+n/k)
= 1/2 ( k+1 + n/(k+1) - k - n/k)
= 1/2 (1 + n/(k+1)-n/k)
= 1/2 (1+n(1/(k+1)-1/k) )
= 1/2 (1-n/(k(k+1)))

This is negative if k(k+1)<n, zero if k(k+1)=n, and positive if k(k+1)>n. So the next value of m(k) is lower until k(k+1) ≥ n, and then after that, the next value is the same or higher.

That means the value of k with the lowest value of m(k) is the first natural value of k with k(k+1)≥n.

k(k+1) ≥ n
(k+1/2-1/2)(k+1/2+1/2) ≥ n
(k+1/2)²-(1/2)² ≥ n
(k+1/2)²-1/4 ≥ n
(k+1/2)² ≥ n +1/4
k+1/2 ≥ √(n+1/4)
k ≥ √(n+1/4) -1/2

So we are looking for the first integer equal to or greater than √(n+1/4) -1/2. Let r= √(n+1/4) -1/2. Then we are looking for ⌈r⌉, the ceiling of r.

Note that the ceiling of r is at least r, which will give a lower bound for ⌈r⌉. For the upper bound, we will use the fact that ⌈r⌉ is a whole number.

First notice r=√(n+1/4)-1/2 > √n-1/2
Now ⌈r⌉ ≥ r > √n-1/2
So ⌈r⌉>√n-1/2.

For the upper bound, write c=⌈r⌉ and recall that c is the first natural number with c(c+1)≥n. The number before c does not satisfy this inequality, so
(c-1)c<n
(If c=1, this just says 0<n.)
Both (c-1)c and n are whole numbers, so
c²-c ≤ n-1
c²-c+1/4 ≤ n-3/4
(c-1/2)² < n
c-1/2 < √n
So ⌈r⌉<√n+1/2.

Therefore √n-1/2<⌈r⌉<√n+1/2.

The integer ⌈r⌉ is within 1/2 of √n, which means it must be the nearest integer to √n.
That is, ⌈r⌉= ⌊√n⌉.
That is, the value of k with the lowest mean-of-means is ⌊√n⌉.
End of proof!
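
Lemma 8 can also be checked numerically. The sketch below compares three things for each n up to 300: the first k with k(k+1)≥n, the nearest integer to √n, and the k found by trying every value. The function names are illustrative, and the rounding is done with whole-number arithmetic to avoid floating-point trouble.

```python
from fractions import Fraction
from math import isqrt

def m(k, n):
    return Fraction(1, 2) * (k + Fraction(n, k))

def first_k(n):
    """First natural number k with k(k+1) >= n."""
    k = 1
    while k * (k + 1) < n:
        k += 1
    return k

def nearest_integer_to_sqrt(n):
    """Round sqrt(n) to the nearest integer using only whole numbers."""
    r = isqrt(n)  # floor of sqrt(n)
    # sqrt(n) > r + 1/2 exactly when n > r*r + r + 1/4, that is, n > r*r + r.
    return r + 1 if n - r * r > r else r

for n in range(1, 301):
    k = first_k(n)
    assert k == nearest_integer_to_sqrt(n)
    assert m(k, n) == min(m(j, n) for j in range(1, n + 1))
print("Lemma 8 checked for n up to 300")
```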

Later, I came up with a different proof of Lemma 8, but I enjoyed the inequality reasoning and algebra from the proof I first made so much I didn’t want to remove it. So here is the alternative as a bonus.

Alternative Proof of Lemma 8:
By Lemma 6, the mean-of-means for this partition is 1/2 (k+n/k).
Let m(k) = 1/2 (k+n/k) for natural numbers k.

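As in the previous proof, the change in the value of m between k and the next number is
m(k+1)-m(k) = 1/2 (1-n/(k(k+1)))

This is negative if k(k+1)<n, zero if k(k+1)=n, and positive if k(k+1)>n. So the next value of m(k) is lower until k(k+1) ≥ n, and then after that, the next value is the same or higher.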

That means the value of k with the lowest value of m(k) is the first natural value of k with k(k+1)≥n.
(This is the same as the previous proof up until here.)

Suppose k is the first natural number with k(k+1)≥n.
Then k²+k≥n
k≥n-k²
n-k²≤k

So, if n is above k² then it is within k of k².

Now (k+1)² = k²+2k+1
= k²+k+k+1
≥ n+k+1
So (k+1)² - n ≥ k+1

Therefore, n is below (k+1)² and the nearest it can be to (k+1)² is k+1.
Hence n is closer to k² than to (k+1)².

If k is the first natural number with k(k+1)≥n, then k(k-1)<n.
k²-k < n
k²-k ≤ n-1 (since both k²-k and n are natural numbers)
k²-k+1 ≤ n
k²-n ≤ k-1

So if n is below k², then it is within k-1 of k².

Now (k-1)² = k²-2k+1
= k²-k-k+1
≤ n-1-k+1
= n-k
So (k-1)²-n ≤ -k
So n-(k-1)² ≥ k.

So n is above (k-1)² and the nearest it can be to (k-1)² is k.
Hence n is closer to k² than to (k-1)².

Therefore, k² is the nearest perfect square to n, and so k is the nearest whole number to √n.
End of proof!

The common part of the two proofs above has something interesting worth noting. When n=k(k+1), then m(k+1)-m(k)=0, so that means the two nearest values of k to √n actually both give the same value for the mean-of-means. For example, the number 12 is 3∙(3+1) so actually the lowest mean-of-means for n=12 happens both for 3 groups and 4 groups.
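
A quick sketch shows this tie for the first few numbers of the form k(k+1). The function m is the Lemma 6 formula, written with exact fractions.

```python
from fractions import Fraction

def m(k, n):
    return Fraction(1, 2) * (k + Fraction(n, k))

for k in range(1, 6):
    n = k * (k + 1)
    print(n, m(k, n), m(k + 1, n))
# 2 3/2 3/2
# 6 5/2 5/2
# 12 7/2 7/2
# 20 9/2 9/2
# 30 11/2 11/2
```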

Now that we know what the partition with the lowest mean-of-means is, all that remains is to find the actual lowest value for the mean-of-means.

Lemma 9:
Let k= ⌊√n⌉ and let d = n-k². That is, let d be the signed distance from the nearest perfect square to n.
Consider the partition of the set {1,…,n} which has the numbers k to n all together in the same subset, with all numbers less than k each in their own subset of size 1. That is, the partition {1},…,{k-1},{k,…,n}.
The mean-of-means of this partition is k+d/(2k).

For example, the nearest perfect square to 12 is 9=3², and the distance from 9 to 12 is 3. So the mean-of-means of the partition {1},{2},{3,…,12} is 3+3/6 = 3.5.
For another example, the nearest perfect square to 21 is 25=5², and the distance from 25 to 21 is -4. So the mean-of-means of the partition {1},{2},{3},{4},{5,…,21} is 5-4/10=4.6.

Proof of Lemma 9:
By Lemma 6, the mean-of-means of the partition {1},…,{k-1},{k,…,n} is
m=1/2 (k+n/k)
= 1/2 (k²+n)/k
= (k²+n)/(2k)
= (k²+k²+n-k²)/(2k)
= (2k²+d)/(2k)
= k + d/(2k)
End of proof!
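
Here is Lemma 9 as a short sketch, reproducing the two examples given with the lemma. The names nearest_sqrt and lemma9_value are hypothetical, and the rounding uses whole-number arithmetic rather than floating point.

```python
from fractions import Fraction
from math import isqrt

def nearest_sqrt(n):
    """The nearest integer to sqrt(n), found with whole-number arithmetic."""
    r = isqrt(n)  # floor of sqrt(n)
    return r + 1 if n - r * r > r else r

def lemma9_value(n):
    """k + d/(2k), with k the nearest integer to sqrt(n) and d = n - k*k."""
    k = nearest_sqrt(n)
    d = n - k * k  # signed distance from the nearest perfect square to n
    return k + Fraction(d, 2 * k)

print(lemma9_value(12))  # 7/2
print(lemma9_value(21))  # 23/5
```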

It is worth noting that if n is itself a perfect square, then the k in Lemma 9 is exactly √n, and the distance d is 0, so the mean-of-means of the partition {1},…,{k-1},{k,…,n} is exactly k=√n.

Now that the final answers have been found, I will put it all together into a theorem.

Theorem 10:
Out of all partitions of {1,…,n} into any number of subsets, the lowest mean-of-means possible occurs when the number of subsets is k=⌊√n⌉, the nearest integer to the square root of n.
The lowest mean-of-means occurs for the partition which has the numbers k to n all together in the same subset, and all numbers less than k each in their own subset of size 1. That is, the partition is {1},…,{k-1},{k,…,n}.
If d = n-k², the signed distance from the nearest perfect square to n, then the lowest mean-of-means is k+d/(2k).

Proof of Theorem 10:
The proof is given by all the previous lemmas. I will recap here.

Lemma 1: Out of all partitions with a specific list of subset sizes, the ones with the lowest mean-of-means have larger numbers in larger subsets.

Lemma 2: Out of all partitions with a specific list of subset sizes, there is one with the lowest mean-of-means that has consecutive numbers in every subset (proven using Lemma 1).

At this point, we know how to choose a partition to get the lowest mean-of-means for any specific list of subset sizes. The next step is to find the lowest mean-of-means out of all partitions with a specific number of subsets.

Lemma 3: Removing the largest number from a set reduces its mean.

Lemma 4: Including a new smallest number in a set reduces its mean.

Lemma 5: Out of all partitions with a specific number k of subsets, the one with the lowest mean-of-means has the numbers k to n all together in the same subset, and all numbers less than k each in their own subset of size 1 (proven using Lemmas 2, 3 and 4). That is, the partition is {1},…,{k-1},{k,…,n}.

At this point, we have found a specific kind of partition which has the lowest mean-of-means for all the partitions using each specific number of subsets. The overall lowest mean-of-means must belong to one of these few partitions.

Lemma 6: The partition {1},…,{k-1},{k,…,n} has mean-of-means 1/2 (k+n/k).

Lemma 7: Of all partitions of the form {1},…,{k-1},{k,…,n}, the lowest mean-of-means happens when k is either √n if it is an integer or it is one of the integers on either side of √n (proved using Lemma 6).

Lemma 8: Of all partitions of the form {1},…,{k-1},{k,…,n}, the lowest mean-of-means happens when k=⌊√n⌉, the nearest integer to the square root of n (proved using Lemma 6).

This now means that the lowest mean-of-means possible for all partitions happens for this specific kind of partition with this specific number of subsets.

Lemma 9: For the partition {1},…,{k-1},{k,…,n}, with k=⌊√n⌉, the mean-of-means is k+d/(2k), where d=n-k², the signed distance from the nearest perfect square to n.

Since this specific partition with this specific number of subsets is the one with the lowest mean-of-means, this is indeed the way to calculate the lowest mean-of-means.
End of proof!!

A list of examples to round it out.

The lowest mean-of-means for partitions of {1,…,4}
happens with 2 sets {1},{2,3,4}
and the lowest mean-of-means is 2.
The lowest mean-of-means for partitions of {1,…,5}
happens with 2 sets {1},{2,3,4,5}
and the lowest mean-of-means is 2+1/4=2.25.
The lowest mean-of-means for partitions of {1,…,6}
happens with 2 sets {1},{2,…,6} and 3 sets {1},{2},{3,…,6}
and the lowest mean-of-means is 2+2/4=3-3/6=2.5.
The lowest mean-of-means for partitions of {1,…,7}
happens with 3 sets {1},{2},{3,…,7}
and the lowest mean-of-means is 3-2/6≈2.67.
The lowest mean-of-means for partitions of {1,…,8}
happens with 3 sets {1},{2},{3,…,8}
and the lowest mean-of-means is 3-1/6≈2.83.
The lowest mean-of-means for partitions of {1,…,9}
happens with 3 sets {1},{2},{3,…,9}
and the lowest mean-of-means is 3.
The lowest mean-of-means for partitions of {1,…,10}
happens with 3 sets {1},{2},{3,…,10}
and the lowest mean-of-means is 3+1/6≈3.17.
The lowest mean-of-means for partitions of {1,…,11}
happens with 3 sets {1},{2},{3,…,11}
and the lowest mean-of-means is 3+2/6≈3.33.
The lowest mean-of-means for partitions of {1,…,12}
happens with 3 sets {1},{2},{3,…,12} and 4 sets {1},{2},{3},{4,…,12}
and the lowest mean-of-means is 3+3/6=4-4/8=3.5.
The lowest mean-of-means for partitions of {1,…,13}
happens with 4 sets {1},{2},{3},{4,…,13}
and the lowest mean-of-means is 4-3/8=3.625.
The lowest mean-of-means for partitions of {1,…,14}
happens with 4 sets {1},{2},{3},{4,…,14}
and the lowest mean-of-means is 4-2/8=3.75.
The lowest mean-of-means for partitions of {1,…,15}
happens with 4 sets {1},{2},{3},{4,…,15}
and the lowest mean-of-means is 4-1/8=3.875.
The lowest mean-of-means for partitions of {1,…,16}
happens with 4 sets {1},{2},{3},{4,…,16}
and the lowest mean-of-means is 4.
The lowest mean-of-means for partitions of {1,…,17}
happens with 4 sets {1},{2},{3},{4,…,17}
and the lowest mean-of-means is 4+1/8=4.125.
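
A list like the one above can be regenerated with a few lines of code. This sketch relies on Lemma 5 and Lemma 6, so it only searches over the number of subsets k rather than over every partition. The name best_ks is made up for the example.

```python
from fractions import Fraction

def m(k, n):
    """Mean-of-means of {1},...,{k-1},{k,...,n}, from Lemma 6."""
    return Fraction(1, 2) * (k + Fraction(n, k))

def best_ks(n):
    """All k from 1 to n giving the lowest value of m(k, n), plus that value."""
    lowest = min(m(k, n) for k in range(1, n + 1))
    return [k for k in range(1, n + 1) if m(k, n) == lowest], lowest

for n in range(4, 18):
    ks, lowest = best_ks(n)
    print(n, ks, lowest, float(lowest))
```

For example, the line printed for n=12 is 12 [3, 4] 7/2 3.5, showing the tie between 3 sets and 4 sets.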

That’s enough. Thanks for reading, and I hope you enjoyed it.

Making the lie true

David Butler — Thu, 13 Jul 2023 02:25:48 +0000

We at my university regularly sell quite a big lie.

At Open Day and the Ingenuity STEM Showcase and any number of outreach activities, students do puzzles and play with construction toys and walk around with ropes and draw curves on balloons. Whether we say it explicitly or not, there is a message there that says: here at this University, maths is fun. This is a lie.

Maths at university is not fun. There are hours of video content to watch where the presentation is basically slides or handwritten examples. The classes are presentations, possibly with little quizzes breaking them up, or they consist of doing maths problems similar to the relentless weekly quizzes and assignments. Pictures are rare, making sense by manipulating something with your hands is much much rarer, making sense by moving your body is non-existent. The chances to chase your curiosity are few. The chances to have your own thinking validated and celebrated are fewer. It is very far removed from the experience of university maths the prospective students get when they visit us.

We are lying to our prospective students. The experience they have of university maths at our events is a lie.

I do understand that learning does not have to be “fun”, and expecting it to be so all the time is unreasonable and unhealthy. I also understand that ordinary everyday problem-solving and figuring out can feel fun. I understand as well that play, which is essential to learning deeply, is not the same thing as fun. But there is no denying that the activities we do with prospective students are indeed fun, and that experience is not what it will be like at university.

Do I want to change the activities we do with prospective students to look as boring as life will be at uni? Of course not. But there is another way to not lie, and it’s to make your lie true.

One way I make the lie true is to provide One Hundred Factorial, a weekly games, art and puzzle session where students can experience mathematical play without having to be assessed on it. The sorts of things that happen as a one-off at outreach events happen every week at One Hundred Factorial, and I think it would be a good thing to tell prospective students that this exists. (Writing this blog post is partly to help myself pluck up the courage to suggest to the academics in Maths here that they can do so.)

Another way is to actually include some of the features of your outreach activities in your teaching. I’ve seen the maths academics do an awesome job of running engaging activities and helping students feel like their efforts are meaningful and valued. They’re good at it. What I want to say to them is this: Perhaps you can actually include some whole-body movement or physical models in your university classes, or at very least in your videos. Perhaps you can actually have some free exploration of new ideas without having to immediately write an assignment about it. Perhaps you can keep the idea of celebrating students’ mathematical thought in the very front of your mind more often when they are doing everyday maths problems or answering questions in the lecture. Even just a little more of any of these things might make university maths a little more like the outreach activities you do so well.

The experience prospective students have in your outreach activities doesn’t have to be a lie. You can make the lie true.

Introducing Digit Disguises with a small game

David Butler — Sat, 08 Jul 2023 04:41:38 +0000

Because [reasons], my game Digit Disguises has been on my mind recently, and reading the original blog post from 2019, I suddenly realised I had never shared my ideas on how to introduce the game to a whole class at once. This blog post fixes that. To keep in the spirit of it, I have not put a link to the original blog post about the game yet, and will introduce it here the way I would in a class. It reads like a recipe, but of course you as the teacher can make your own professional decisions about what to do. I just find that approximately this plan works and want to share it.

Tell the class that you’re going to be playing a game called Digit Disguises in small groups, but first we’ll play a smaller version as a class.
Choose one volunteer to be the One Who Knows by whatever method you like best. You might like to choose someone who would get a boost from succeeding publicly. In my experience, even someone not confident with algebra will totally understand what to do as the One Who Knows, and will get a kick out of being the one with the secret knowledge.
Give the One Who Knows a sheet with A, B, C, D and spaces next to them. Show everyone what the sheet looks like before you hand it over to the One Who Knows. There is a printable version of the sheet at this link.
Ask the One Who Knows to secretly write the numbers 1, 2, 3, 4 next to the letters A, B, C, D in any order they like, but maybe mix them up a bit. Tell them to show nobody, not even you.
(I have chosen the numbers 1, 2, 3, 4 for a reason. I deliberately did not include 0 because I think it’s good for people to have the success of coming up with the strategies around 0 for themselves later during a real game. Also I need four different numbers so that there is just enough to have to tease apart the logic in this game.)
Tell everyone that the One Who Knows has disguised the numbers 1, 2, 3, 4 as letters and their goal is to collectively figure out which number is disguised as which letter.
The way they’re going to do this is to ask the One Who Knows to do a calculation using two different letters and +, -, × or ÷. Then the One Who Knows will tell us all what letter is the answer.
It’s important that the One Who Knows never says a number, only a letter.
To start them off, ask the One Who Knows for A+B, and remind them that they are not allowed to say a number out loud. There’s a 2/3 chance that the answer won’t be one of the letters, and they’ll ask you what to do. You can say they’ve already done it: if it’s not one of the letters, then just say it’s not one of the letters.
(If it doesn’t come up here, you can deal with the “not a letter” thing whenever it first comes up.)
Regardless of what happens with that first question, write down the question and the response on the board/screen/document camera, and then ask the class if they can say what that means for which number is disguised as which letter.
Now ask the class to suggest things to ask, and get the One Who Knows to respond. Write the questions and the answers up as you go. Each time, ask everyone what the response means for what you know about which number is disguised as which letter. You can also ask the person who suggested the calculation why they suggested what they did.
Note: There’s no need to push them too hard on the “what does it mean”. Just one idea is enough each time. However I do think it’s important to make sure they realise at some point that a response of “not a letter” is not actually failure but can give them information.
Note: It is very likely for someone to suggest early on something like A÷A. Tell the One Who Knows not to answer and ask the person why they suggested that, to which they are likely to say that the answer is 1, and then they’ll know what letter 1 is. Celebrate their thinking, because it’s a really important thing about numbers that they’re using there. But then say that actually the rules of the game say you have to use two different letters. In fact, if nobody does suggest a move like A÷A, then it might be worth asking at some point why the rules say you have to use two different letters, and then someone will possibly suggest this as a reason.
At some point, you will have figured out which number is disguised as which letter, and at that point, you should ask the class if they are sure. It’s a good idea to ask someone to go through the reasoning so far that got you to the point you are at, and then ask again if they’re sure. When they are, ask the One Who Knows to reveal the sheet and tell everyone if the class is right.
Celebrate the win, and thank the One Who Knows for their help.
Now it’s time to play the game yourselves, but there will be some differences…
- This time, you won’t just be disguising 1 to 4, but the digits from 0 to 9. How might that be different?
- This time, instead of everyone and the One Who Knows, it will be two teams. Both teams will disguise the digits as letters, and both will take turns asking questions to figure out the other team’s disguises. How might that be different?
- This time, when you think you know the other team’s letters and numbers, you can ask them all by saying which letter is which number, but you’ll only get one chance. If you’re right for all of them, you’ll win. If you’re wrong for any of them, you’ll lose. How might that be different?
Split them into teams and put the teams into pairs by whatever method you like best.
Give them the Digit Disguises game board handout.
Remind them of the rules.
- Take turns asking questions.
- Your questions are calculations with two different letters.
- Respond with the letter that the answer is disguised as, or say “not a letter”.
- Nobody ever says a number …
- …until the very last turn when you guess them all.
- Feel free to write down whatever you want as you go to help you figure it out.
- The handout for the game has all the instructions if you forget.
Let them play and circulate to hear their awesome thinking and point it out.
Stop them after everyone has had a few turns but before anyone has finished the game to have a discussion about how the game is going so far, and to highlight some great thinking you’ve seen. It’s important to remember you don’t have to wait until the end of the game to get some good learning out!
Whenever you choose to end, have a debrief. You may want to ask the class if they have any questions about the game they want to investigate or discuss. Some possible questions are listed in the original Digit Disguises blog post. But basically it’s up to you and your class what you do with the game now.
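
If you want to check the 2/3 chance quoted earlier for the opening A+B question, a small enumeration will do it. This is only a sketch in Python, and the function name is made up for illustration rather than being part of the game. It goes through every way the One Who Knows could assign 1, 2, 3, 4 to A, B, C, D and counts how often A+B is not one of the disguised numbers.

```python
from fractions import Fraction
from itertools import permutations

DIGITS = (1, 2, 3, 4)


def chance_sum_is_not_a_letter():
    """Chance that A+B is not one of the disguised numbers 1, 2, 3, 4.

    Hypothetical helper for illustration only.
    """
    total = 0
    misses = 0
    # Each permutation is one way of writing 1, 2, 3, 4 next to A, B, C, D.
    for a, b, c, d in permutations(DIGITS):
        total += 1
        # "Not a letter" means the sum is not a number that has a disguise.
        if a + b not in DIGITS:
            misses += 1
    return Fraction(misses, total)


print(chance_sum_is_not_a_letter())
```

The same loop could be changed to count A×B or A−B if you want to know how often other opening questions get a “not a letter” response.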

I hope this is helpful to you. It will certainly help future me when I next introduce Digit Disguises to a group.

PS: Special thanks go to the participants in my 2021 MASA conference session who lived through the first version of this, and to Michaela Epstein, who I discussed it with while originally planning that conference session.

Why mathematical induction is hard

David Butler — Tue, 06 Jun 2023 00:01:04 +0000

Students find mathematical induction hard, and there is a complex interplay of reasons why. Some years ago I wrote an answer on the Maths Education Stack Exchange describing these and it’s still something I come back to regularly. I’ve decided to post it here too.

Some reasons why students find mathematical induction difficult.

These come from a combination of reading various research articles and my own experience helping students in the Maths Learning Centre.

Many students don’t know what proof is.
Many students don’t realise it’s actually about statements.
Many students don’t have experience in manipulating inequalities and divisibilities.
Many students are philosophically uncomfortable with it.
Many students find analogies unhelpful.
Many students don’t know what to focus on to come up with something to prove.

Below I give further explanation of each point.

1. Many students don’t know what proof is

For many students, the problem with induction proofs is wrapped up in their general problem with proofs: they just don’t know what a proof is or why you need one.

Most students starting out in formal maths understand that a proof convinces someone that something is true, but they use the same reasoning that convinces them that everyday things are true: empirical reasoning. That is, in everyday life they can use several pertinent examples to convince themselves something is true, so they do the same with maths. If you ask them to prove that, say, all prime numbers beyond 3 are one more or one less than a multiple of 6, they’ll say “7 = 6+1, 11 = 12-1, 13 = 12+1, 17 = 18-1, so yes it’s true”. They transfer this approach to mathematical induction too [2].

After some experience with maths teachers, students become aware that this is unacceptable. For many it is literally unacceptable in the sense that “my maths teacher doesn’t accept this sort of thing”. So, they learn to apply other methodologies, such as manipulating symbols [2], and think that now this constitutes proof, because it is what is convincing to their teachers. They will attempt to apply symbol manipulations to an induction problem, which often won’t work, because the logical structure of induction is different to their previous experience.

It’s probably a good idea to get the students to discuss what constitutes a proof, and bring out ideas of why it’s necessary to use certain ways of arguing in maths.

2. Many students don’t realise it’s actually about statements

You can define mathematical induction as being sure the statement “true for n=1” is the truth, and being able to transform the statement of “true for n=k” into the statement “true for n=k+1”. As such, it’s actually something you do to statements, rather than objects or numbers per se. That means it’s much closer to propositional calculus than, say, a geometry or algebra proof. If your students have learned any propositional calculus, this has a chance of helping them [1] (though in my experience, sometimes it gets in the way).

In any case, knowing it’s about statements helps them to structure their proof appropriately because they know it hangs upon certain statements in certain places. This is something many students don’t realise about proofs in general, and I find it really helps them to focus their attention when they prove almost anything.

3. Many students don’t have experience in manipulating inequalities and divisibilities

Whether students are attacking induction proofs operationally or with an understanding of how the idea actually works, they often don’t have the operational skill with the types of algebraic manipulations required [3]. Standard fare for induction proofs are inequalities and divisibilities, and most students I meet simply don’t know how to reason with these, even if they’ve done the highest level of maths at school.

For example, consider a sequence of working like this:
(n+1)²
=n²+2n+1
>2n+2n+1
≥2n+6+1, since n≥3
>2n+2
=2(n+1)
Most students don’t understand that each equality/inequality symbol only refers to the step immediately before it, so that the final statement is actually (n+1)²>2(n+1). They also don’t realise you can replace the “6+1” with “2” because you want to make a statement that one number is bigger than another. Moreover, they don’t have well-developed instincts for what they should choose to do next to advance towards the goal.
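
One way to make the chain above more explicit is to write the reason for each symbol beside it. The layout below is a sketch in LaTeX. It assumes the statement being proved is n² > 2n for n ≥ 3, which is inferred from the final line rather than stated in the working itself.

```latex
\begin{align*}
(n+1)^2 &= n^2 + 2n + 1 && \text{expand the square} \\
        &> 2n + 2n + 1  && \text{since } n^2 > 2n \text{ (assumed induction hypothesis)} \\
        &\ge 2n + 6 + 1 && \text{since } n \ge 3 \text{ gives } 2n \ge 6 \\
        &> 2n + 2       && \text{since } 6 + 1 > 2 \\
        &= 2(n+1)       && \text{factorise}
\end{align*}
```

Each symbol compares a line only with the line directly before it, and because the chain contains at least one strict inequality, the overall conclusion is (n+1)² > 2(n+1).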

Divisibility proofs generally involve rewriting things so that they have the divisor out the front, both for the k and the k+1 case. Many students don’t realise this is what divisibility means, and also have trouble seeing how to split up the expression to sub in the induction hypothesis.
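
As an illustration of “having the divisor out the front”, here is a sketch of the induction step for a hypothetical claim that is not from the original answer: that 4ⁿ − 1 is divisible by 3. The integer m is an assumed name for the quotient in the n=k case.

```latex
\begin{align*}
\text{Suppose } 4^k - 1 &= 3m \text{ for some integer } m. \\
4^{k+1} - 1 &= 4 \cdot 4^k - 1 \\
            &= 4(4^k - 1) + 4 - 1 && \text{split so the } k \text{ case appears} \\
            &= 4(3m) + 3          && \text{sub in the induction hypothesis} \\
            &= 3(4m + 1)          && \text{divisor out the front}
\end{align*}
```

The second line of the working is the step many students struggle to see: the expression is deliberately rewritten so that the n=k expression shows up inside it.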

If you show any examples of doing a proof by induction in these situations, you’re going to have to be extremely explicit about your ordinary algebraic reasoning at every step, and how you made the decision of what to do on every line, so that you can help them develop these skills.

4. Many students are philosophically uncomfortable with it

Some students, no matter how many analogies of dominoes or ladders you describe, will still just feel that it’s somehow all a bit too unbelievable. You start by assuming it’s true, and prove something with an unknown value of n, and then somehow the thing is proved for all values of n? It seems a bit too much like Baron Munchausen pulling himself up by his own hair. A common phrase the students utter is “it feels like magic”. They can’t even consider giving it a go with an actual proof assignment question when they feel so philosophically uncomfortable with it.

Partly this is due to them not understanding that you actually aren’t assuming it’s true for any specific value of n at all. You’re doing a thought experiment of what would happen if it was true for n=k [3]. I use the phrase “suppose true when n=k” rather than “assume true when n=k” to emphasise this.

Even if they do realise this, some students still want to somehow prove to themselves that it’s ok to just say “therefore it’s true for all n”. A mathematically mature student will often understand things by proving to themselves that they work (such as deriving the quadratic formula). However, you can’t derive mathematical induction. It’s an axiom. Telling them this removes the burden of proof and they can feel slightly better about using it. Alternatively, you could prove it from the well-ordering of the natural numbers, and that might make them feel better if they think the well-ordering is a more obvious sort of axiom to use.

5. Many students find analogies unhelpful

Everyone always advocates using the domino analogy for teaching induction, but there are problems with this. One is that some students simply will have never seen dominoes set up in a line to knock down! If you’re going to use this analogy, bring actual dominoes!

A second problem is that if you are not very specific about exactly what in the domino analogy corresponds to what in the induction proof, students will develop unhelpful ideas about induction. This of course goes for any analogy you might use, such as climbing a ladder or a game of whispers [4].

Worse than this is the problem that no amount of very salient analogies will actually help them to attack an actual problem when faced with it [5]. Analogies are really only good for helping them understand how induction works in a general way. You’re going to have to give them examples and experiences of the act of coming up with a proof because that is a separate skill from describing how induction works in a general way.

6. Many students don’t know what to focus on to come up with something to prove

Students can actually become quite successful in solving your standard identity, inequality and divisibility induction proofs. But anything other than this leaves them completely stumped. Mostly this is because they have learned some standard classes of problems and how to deal with them. (Many students actually think this is what mathematics is — a list of problems and how to solve each one!)

When faced with something entirely different (eg the Tower of Hanoi, or a visual pattern), they don’t know what to focus on. First, you need to help them find where the natural number is that describes the possible situations [3]. Induction only works on natural numbers, and is the gold standard of proving things for all natural numbers, so you really need to look for a natural number! But it has to describe the situation well.

After this, some authors suggest getting them to focus on a process pattern generalisation [2,3]. That is, figure out how you would move from size 1 to size 2, from size 2 to size 3 etc, and see if you can figure out if the process of moving from one size to the next follows a pattern. This pattern is the shape of your proof of “true for n=k implies true for n=k+1”.
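
For the Tower of Hanoi, the process of moving from one size to the next can be written down directly as a recursive procedure. The sketch below is in Python, with a made-up function name and peg labels. It is one possible way of expressing the process pattern, not something taken from the cited papers.

```python
def hanoi_moves(n, source="A", spare="B", target="C"):
    """Return the list of moves that shifts n discs from source to target.

    Hypothetical helper for illustration; each move is a (from_peg, to_peg) pair.
    """
    if n == 0:
        return []
    # The step from size n-1 to size n:
    # move the top n-1 discs out of the way onto the spare peg,
    # move the largest disc to the target,
    # then move the n-1 discs from the spare peg onto the largest disc.
    return (
        hanoi_moves(n - 1, source, target, spare)
        + [(source, target)]
        + hanoi_moves(n - 1, spare, source, target)
    )


for n in range(1, 6):
    print(n, len(hanoi_moves(n)))
```

The three-part return statement is the pattern to focus on: a solution for n discs is built from two solutions for n−1 discs plus one extra move, and that is the shape of the “true for n=k implies true for n=k+1” argument about the number of moves.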

An unhelpful thing to focus on is verifying that the formula works for each number in turn. This is just reinforcing their empirical reasoning, which won’t help them figure out how to prove! [2]

References:

[1] Dubinsky, E. (1986) Teaching Mathematical Induction I, Journal of Mathematical Behavior, 5, 305-317

[2] Harel, G. (2002) The Development of Mathematical Induction as a Proof Scheme: A DNR-Based Instruction, in Campbell, S. & Zazkis, R. (Eds.) Learning and Teaching Number Theory: Research in Cognition and Instruction, Greenwood Publishing Group

[3] Palla, M., Potari, D. & Spyrou, P. (2012) Secondary school students’ understanding of mathematical induction: structural characteristics and the process of proof construction, International Journal of Science and Mathematics Education, 10, 1023-1045

[4] Ron, G. & Dreyfus, T. (2004) The use of models in teaching proof by mathematical induction, in Proceedings of the 28th Conference of the International Group for the Psychology of Mathematics Education, 113-120

[5] Segal, J. (1998) Learners’ difficulties with induction proofs, International Journal of Mathematical Education in Science and Technology, 29, 159-177

Space to enter

David Butler — Wed, 01 Mar 2023 21:26:46 +0000

This is a photo of the entrance to my Maths Learning Centre. What do you notice?

There are many many things to notice in that photo, and if you ever want to ask me about any of them, please do. Today, the thing I want to focus your attention on is the empty space right at the front as you walk in. Every so often someone asks me why I leave that space empty and I don’t put an extra table there, and there are a couple of very good reasons why.

First, the space isn’t empty: it has the floor graph in it. The floor graph used to be at the back between the % and the 3 on the wall, but one day I realised that I could have a bit of extra space for it if I put it in the entrance. I also hoped it might send the message to people arriving not only that maths is a thing that happens here, but also that we do things a little differently to your regular university maths classroom. One day I will write about the floor graph to tell you all about how we use it, but there is one purpose I want to tell you today: the floor graph helps us to break students out of staring forlornly at their page or screen. The open space on the floor graph gives a sense of physical freedom, which can translate to a sense of mental freedom.

The openness of the floor graph space was the main reason I moved it to the entrance, actually, because it makes the space easier to enter, for several different kinds of students:

We regularly get tours of new students or prospective students coming past the MLC, and with an open space in the entrance, we can bring those tour groups right into the MLC, rather than standing outside and pointing. The experience is so much realer if you can come right in and stand surrounded by the art and whiteboards. They can remember that we asked them to come all the way in. Without the space in the entrance, we’re just pointing from outside and they miss so much.

Students who are familiar with the MLC stand on the floor graph for a moment when they arrive and look around to find a good table, or other students they recognise. The empty space allows them to take a moment to make a choice, and to prepare themselves for working in the space.

Students who are not familiar with the MLC have a place to stand looking lost. (We tutors even call it the “lost soul zone”.) When there was a table in the entrance, people wouldn’t want to be too close to the students already studying there, so newcomers would do their unsure dithering stance outside where we couldn’t see them, and more often than not they would just leave without us ever knowing. The emptiness of the space now means that they can come in without feeling like they’re encroaching on the work of the students and staff already in the room. Just like our regulars, they can prepare themselves for asking for help while actually being in the room. And since they’re in the room, we can see them and go up to them to ask what they are looking for.

Without an empty space in the entrance, we would not be able to welcome as many new students to the MLC as we do. The emptiness is important to provide space for the complex process of deciding whether and how to engage with us. I am so happy I managed to create the space to enter.

Book Reading: You’re Not Listening

David Butler — Mon, 23 Jan 2023 00:07:55 +0000

This blog post is about the book You’re Not Listening by Kate Murphy, and in particular my reactions to it from a teacher’s perspective.

First, I want to apologise to Chelsea Avard for borrowing the book from her little student leadership library and holding onto it for a whole year while I got round to reading it and then got round to writing about it. Thanks for your patience and thanks in advance for forgiving me for the slightly battered state the book will be when I return it.

Second, I want to say what an excellent book it is. It has a lot to say about what listening is and is not and how it makes a difference to us in everyday life. It’s full of vivid stories to illustrate and lovely turns of phrase, and it is very clear that Murphy researched it extensively. I would recommend anyone to read it.

Ok, now on to the actual purpose of the blog post. The book is not a teaching book, and it doesn’t even mention teachers as far as I recall, but I am a teacher and I can’t help reading it from that perspective. As a teacher who mostly works one-on-one with students, what Murphy had to tell me felt particularly relevant. And yet I’ve had trouble organising it into a cohesive blog post. I have decided to give up and just list some things I learned. Most of them are about teaching. Some of them are just life lessons. I may or may not distinguish them.

Thoughts from the book

Your brain thinks faster than people speak. Use it to listen better.

In Chapter 6, Murphy clearly describes exactly how conversations feel to me, which is that the person is talking, but my mind is buzzing all around thinking about all sorts of things. Almost all of those things are related to what the other person is saying — memories their stories brought up, how I feel about the way they’re talking, what I am going to say next — but only a small amount is actually about the other person’s meaning.

Murphy asserts this is totally normal because your brain thinks at least twice as fast as the other person speaks. Reading that, I was so glad that I wasn’t some sort of freak. However, she recognised that the problem with this normal state is that you can get wrapped up in your thoughts and miss what the other person is saying. Also, she points out that a lot of what people say isn’t contained in the words they speak, but in their silences and their body language and the things they look at.

What she suggests is to use all that extra mental bandwidth to listen better. Instead of letting your mind wander or worrying about what to say next, direct your attention to all those extra things that aren’t audible that might tell you more about what they mean.

This spoke to me powerfully as a teacher, particularly the bit about missing important things when you are planning what to say. In the MLC with a student, my mind can be very much occupied with what explanation or example I’m going to give the student in response to what they’re saying. But in worrying about this while they’re talking, I am missing important things. I’m missing what they’re telling me about how they feel about their maths. I’m missing the pauses that tell me when they’re figuring things out or struggling to articulate their thoughts. I could be a much more responsive teacher if I used my mental powers to think about their meaning while they are talking rather than think about mine.

You don’t have to respond right away.

The big problem in my line of work with the advice above, to use your brain’s bandwidth to focus even more on listening and watching, is that you really do feel like it’s your responsibility to respond with useful explanations or advice as soon as the student stops talking. The desire to plan your explanation while they are still talking isn’t just your mind wandering; it’s motivated by a fear that you need to be immediately helpful and clever.

However, Murphy argues that actually silences are really important in conversation. Space between speakers allows everyone time to process. And most importantly, it’s a sign of respect for the speaker that the listener takes time to process. She relates stories of many different cultures around the world where they have an expectation of silence in conversation as a sign of respect. I know students have said to me they appreciate me telling them “just let me think about that for a minute” because then they know I’m working on something good.

I’ve heard this advice about silence before, for example in Making Number Talks Matter, and also in a recent UX Research Methods training. But not until reading Murphy’s book did I connect it to the idea of giving yourself the space to really listen in the first place. Knowing that it’s ok and actually respectful to think quietly before responding to students means that you are free from the expectation to have something ready and you can actually focus on listening interpretively. It has taken a huge worry off my shoulders as a teacher that I wasn’t even aware I was worrying about before.

Assumptions stop you listening.

Murphy has a whole chapter titled “I know what you’re going to say: assumptions as earplugs”, and that title really does sum it up extremely well. It just shows her skill as a writer that she can pack so much meaning into just a title.

The idea is that when you assume things about people, it stops you listening to them clearly, and sometimes stops you listening at all. If you literally think you know what someone will say, then you think you have no need to hear it.

The most shocking thing Murphy does is explain how finishing someone else’s sentences is proof that you’re not actually listening. Yes, two people being able to finish each other’s sentences is often used as proof that they are in sync, but she relates stories from marriage counselors who noted that couples who are really familiar with each other just assume the other person doesn’t need to be told things or doesn’t need to tell them things, and so they never find out important stuff.

This idea that assumptions stop you listening really gave me pause, because I have seen it first-hand in my own work. A student asks me a question about Question 3 on their assignment, and I assume they are struggling in the same place I would struggle, so I tell them how to deal with that part, even though it might not be the place they need help at all. A student is studying engineering, so I assume they don’t want to know the theory behind the maths and I only tell them procedures, even though they might actually want to make sense of it. A student asks a question about the topic at hand and I just respond to the key words and the question I assume they probably have, but their actual question is about quite a different thing. A student is slow to answer my question, so I assume they don’t know and I answer for them, but actually they are just taking time to figure out how to say what they want to say.

There are so many times when I assume what a student wants or needs rather than listening to their actual words and demeanour or seeking more information. I really need to turn off my assumptions and find out more about the actual situation before responding.

People won’t tell you things if they don’t think you want to hear them.

There were a couple of quotes close together that really spoke deeply to me.

“Researchers at the University of Utah found that when talking to inattentive listeners, speakers remembered less information and were less articulate in the information they conveyed. Conversely, they found that attentive listeners elicited more information, relevant detail, and elaboration from speakers, even when the listeners didn’t ask any questions. So if you’re barely listening to someone because you think that person is boring or not worth your time, you will actually make it so.”

“Think of how you, yourself, might tell different people different things. It doesn’t necessarily have to do with the type of relationship you have with them or the degree of closeness. You might have once told a stranger something you hadn’t told anyone else. What you tell, and how much you tell, depends on how you perceive the listener at that moment. And if someone is listening superficially, listening to find fault, or only listening to jump in with an opinion, then you’re unlikely to make any kind of meaningful disclosure and vice versa.”

The things that spoke most deeply to me were the things that can derail the students I work with saying anything useful:

Not worth my time
Phew this one is hard. All those second- and third-year students who I have said are second and third priority after first-year students, what does knowing they’re not my priority do for them sharing things that are important? But they still have deeply important things about studying and learning maths to discuss, even if I don’t know their content.
Listening superficially
I can do this so easily. I can wait until they say a key word and then launch into an explanation in response, and totally ignore the things they’re saying about how they feel about maths and their experiences studying it, or just ignore how what they say tells me about how much they already understand.
Listening to find fault
How many students’ experience of maths in the past is that someone is waiting for them to be wrong? Knowing this, how might they shut themselves up if the first experience with me is telling them something they did wrong?
Listening to jump in with an opinion
Phew this is so easy to do as a teacher. They’ve come for help, so I feel the need to provide it as soon as possible, so I wait until they say something I could comment on and make the comment. But they may not want a comment on that thing and they are unlikely to give me what they really want because they know I’m not really listening.

I believe that to be the best teacher and support worker I can be, I need to know the students’ context and know what they already know, but if I really do believe that, then I need to get out of the way so that they will actually tell me what I need to know in order to help them.

Other people’s thoughts really are more interesting than anything you have pre-prepared to say.

The previous section was about how the way you (don’t) listen affects how much people tell you, and I made it all about how I need information to help people the best I can. Except the thing is, I think people would prefer someone who actually was interested in what they have to say for its own sake, rather than just as a means to an end. I have been working on fostering the belief that all students have existing thoughts worth listening to, and this book confirmed how true and important this belief is.

This quote says it clearly:

The most valuable lesson I’ve learned as a journalist is that everybody is interesting if you ask the right questions. If someone is dull or uninteresting, it’s on you.

Also it’s just a vibe running through the whole book: it’s not just that it’s good to listen to people, but that people are worth listening to. Listening can be hard work, but it is worth the work, because everyone has something interesting to say, something you can learn from.

It’s not about you: shifting versus supporting.

Murphy spends a lot of time talking about how much time we spend talking and not listening, even when we think we’re listening. She describes two main types of response a listener makes: the shift response and the support response. When you make a shift response, you shift the focus to yourself; when you make a support response, you keep the focus on the other person, supporting them to continue and to share what’s important to them. This is one of her examples:

Sue: I watched this really good documentary about turtles last night.
Bob: I’m not big on documentaries. I’m more of an action-film kind of guy. (shift response)

Sue: I watched this really good documentary about turtles last night.
Bob: Turtles? How did you happen to see that? Are you into turtles? (support response)

I have to say I am repeatedly guilty of the shift response. Just now I responded to someone telling me about something that happened to them with a story about my own experience, when I could have so easily asked them more about theirs. Not to say you should never talk about your own story in a conversation, since of course a conversation moves both ways, but you miss out on so much if you do it too early.

Support responses tend to be questions, where you seek more information from the speaker, but there are lots of questions that are still shift responses. There are ones where you set up what you want them to say such as, “Wouldn’t you agree that …?”, and the ones which really just describe your own thoughts or shift the conversation to an entirely new topic that was on your mind already. I found it appropriate that the example of this cited in the book was from an academic at a conference.

In my work in the MLC, I am guilty of shift responses so often, especially when the conversation turns to study skills or experience in classes. I always end up just saying what my experience is, when I could learn so much more by finding out about the student’s. You could argue that the students need to know they’re not alone in their frustration, and you could argue that they might appreciate knowing how someone else dealt with similar situations. However, listening just a little longer will also let them know their feelings are valid, and will tell you whether they need advice at all or just want to vent. (Not to mention that my experience as a student is now 20 years old, and might not be as relevant as I feel it is.)

Also I am suddenly reminded of the distinction between focusing and funneling questions: a focusing question while a student is doing problem-solving helps the student focus on the relevant details at hand so they can use them in their problem-solving, whereas a funneling question pushes them towards a path you the teacher have in your head. This is very similar to the concept of the shift and the support response. Both shift responses and funneling responses are about you, whereas support responses and focusing responses are about them.

The listening itself is what helps people.

The final thing that stood out to me is that people don’t usually want or need your advice. The very act of listening supportively helps people to achieve clarity and sort out what they want to do. Here’s a useful couple of quotes:

Being aware of someone’s troubles does not mean you need to fix them. People usually aren’t looking for solutions from you anyway; they just want a sounding board. Moreover, you shut people down when you start telling them what they should do or how they should feel. … Your answer to someone else’s deepest difficulties merely reflects what you would do if you were that person, which you are not.

The solutions to problems are often already within people, and just by listening, you help them access how best to handle things, now and also in the future. … If you jump in to fix, advise, correct, or distract, you are communicating that the other person doesn’t have the ability to handle the situation.

This is a really hard thing to hear because we are so used to providing advice as the way to help people. It’s particularly hard to hear for someone in my line of work, where people are literally talking to me because they do actually want help. But if I really do believe that people have thoughts of their own and I really do believe that all people are capable of figuring stuff out, then the best way to show that belief is to listen to them. And if I’m honest, whenever I’ve let them, they really do surprise both me and themselves with what they figure out on their own.

Conclusion

This book was full of such interesting and compelling stuff. I’d recommend it to anyone to read. I think I have listening on my mind even more than before after reading it, both in my life and in my work in the MLC. To sum up, here are my titles from above again, to hold onto for the future:

Your brain thinks faster than people speak. Use it to listen better.
You don’t have to respond right away.
Assumptions stop you listening.
People won’t tell you things if they don’t think you want to hear them.
Other people’s thoughts really are more interesting than anything you have pre-prepared to say.
It’s not about you: shifting versus supporting.
The listening itself is what helps people.

Seeing them all together really highlights to me how much they interact with each other, and how much they all hang off the idea that other people have a lot to say that is worth hearing. Thanks for reading.

Four levels of listening

David Butler — Thu, 22 Sep 2022 22:38:37 +0000

Introduction

Listening is one of the most important aspects — no, scratch that — the most important aspect of my work in the Maths Learning Centre.

It is not obvious to people starting out tutoring in the MLC that this should be the case. To a beginning tutor, it seems that it’s their job to explain things to the students, and to show them how to do stuff. But even if the actual goal was to explain, you can be much surer which explanation to give the student if you first listen to their current understanding. More importantly, you can never improve as a teacher unless at some point you listen to the students to see how well your explanation has gone.

But how do you go about doing the business of listening? This blog post is about my interpretation of a framework that describes different levels of listening for the purposes of teaching, which I read about in two papers:

1. Davis, B (1997) Listening for Differences: An Evolving Conception of Mathematics Teaching, Journal for Research in Mathematics Education, 28, 355-376

2. Yackel, E, Stephan, M, Rasmussen, C and Underwood, D (2003) Didactising: Continuing the work of Leen Streefland, Educational Studies in Mathematics, 54, 101-126

I spoke at a conference about this framework some years ago, and I have been meaning to write about it ever since. I am finally actually writing about it now (and you are reading it). My thinking has evolved a little since then, so you get the updated and extended version.

The papers

Davis 1997

In the first paper, Davis tells us about how he and schoolteacher Wendy reflected on the types of listening Wendy did in her classroom, and how they were related to her beliefs about what mathematics is and what the teacher’s role is in helping students learn it. It is a truly fascinating and powerful paper and I recommend everyone read it.

Davis notes from previous research that “the quality of student articulations seemed to be as closely related to teachers’ modes of attending as to their teaching styles”, which is a very deep observation. Before, I said that at the very least a teacher needs to listen in order to figure out what to do next, but this says even more that the way you listen may change the very things the students say. Davis goes on to give three vignettes from Wendy’s classroom to display three types of listening.

1. Evaluative listening

When a teacher is listening evaluatively, their reason for listening is to evaluate the correctness of what the student is saying. Ultimately, they are “listening for something in particular, rather than listening to the speaker”. The vignette describes a whole-class discussion where student responses were dismissed until the exact right one was finally accepted. Even right responses that were perceived to be in the wrong form were dismissed. This reminded me of so many times when I had been a frustrated student in such class discussions (and several times I had been a teacher leading one).

I had never quite put my finger on why this felt frustrating until reading this quote from the paper: “No one is attending to the answer in a way that will make a difference to the course of subsequent events”. The teacher in such discussions is waiting for the right response in order to continue on their pre-planned course. As Davis says of Wendy’s vignette, it was “a teaching sequence that seemed impervious to student input”. It sounds harsh, but Davis was more forgiving than that. He noted that Wendy was indeed seeking information from the students. She could see that the students were or were not able to give the responses she hoped for, and she could see that her lesson was more or less successful based on how quickly students could use her explanations to produce the right answers. The listening was doing exactly what she wanted it to do: evaluating.

2. Interpretive Listening

When a teacher is listening interpretively, their reason for listening is to interpret what ideas are actually happening inside the students’ minds. They are still usually seeking to bring students to the understanding they perceive as the correct one, but now they do it through figuring out how to talk about ideas in shared ways that move students forwards in their thinking.

Davis notes that teaching sequences with an interpretive listening stance need to have materials that “serve as a commonplace for learners to talk about ideas, enabling the process of re-presentation and revision”. For example, in the vignette, Wendy used two-coloured chips to help her students talk about adding and subtracting negative numbers. In my own teaching in the MLC, drawings or play dough often play this role.

3. Hermeneutic Listening

When a teacher is listening hermeneutically, they are listening not only to interpret what their students are thinking, but also to understand how their own thinking relates to that, and how the group as a whole understands. This is my description of it, anyway. Davis has several long paragraphs discussing philosophical and theoretical standpoints, which are a bit heavy (though his style makes it much lighter than I’m sure it could have been). The two main takeaways for me are that understanding isn’t only something that lives in one person, but lives in the shared communication of many, and that teachers listen not just to help students grow their understanding but also to change the teacher’s own understanding. A relevant quote: “Instead of seeking to prod learners toward particular predetermined understandings, Wendy seems to have engaged, along with her students, in the process of revising her own knowledge of mathematics.”

Davis notes that this type of listening seems to go hand in hand with a teacher’s conception of what mathematics itself is. You have to be prepared to believe that mathematics concepts have multiple valid ways to understand and describe them, and that mathematics is at least in part a construction of a community, all of whom (including novices) have a part to play in the construction. Otherwise you won’t be ready to listen in this way.

A final comment on the terminology… The word “hermeneutic”, no matter how often I look up definitions, still remains more-or-less meaningless to me. It seems to refer to a type of inquiry that in itself seems difficult to describe and has different meanings in different disciplines, so I can’t borrow meaning from whatever it means elsewhere to make it meaningful in this context, like Davis seems to have done for himself. This makes it hard for me to hold onto the framework.

Yackel et al 2003

In the second paper, the authors are thinking about how teachers structure and restructure their instruction, a process they call “didactising” after Leen Streefland. The reason the paper is here in a post about listening is that the way that you get information about what needs reworking in your instruction is to listen to the students.

I have used the word instruction, as opposed to teaching, because it’s the word the authors used. And they really do seem to be thinking about instruction, in the sense of a sequence of explanations and activities you do with students. The main theme of the paper is about how listening to students helps design these instructional sequences, which I do not question the importance of. It’s just that the overall feeling I get is that students aren’t quite real people but sources of data, and that making good instructional sequences is a good in and of itself, as opposed to being something for the students. I’ve been a bit too dramatic there, and it’s not really as bad as I’ve made it sound, but still my feeling is that it dances a little too far from viewing the students as people.

Anyway, the most useful thing in the paper for me was a new terminology for what Davis called hermeneutic listening; these authors call it “generative listening”.

3. Generative listening

These authors decide to use the word generative rather than hermeneutic because it’s easier to process for their purposes. They say, “Listening in this way can generate or transform one’s own mathematical understandings and it can generate a new space of instructional activities.” While Davis was more focused on the way that hermeneutic listening changes the listener and the community’s understandings, these authors are more focused on the way generative listening generates new instructional activities. I’m happy to have both in my life. I think it’s important to recognise that teachers still have to decide what to do each day and that listening can help them make those decisions!

I’ll finish off with three questions the authors list to help people focus on generative listening, which really do bring it back to the students as people at the last moment: “How does student thinking suggest alternative ways of thinking about particular mathematical ideas? How does student thinking suggest what mathematical ideas are experientially real for them? How can the instructional sequence be redesigned to capitalize on the fresh points of view that students offer?”

Some thoughts

From these two papers, we have a framework with three levels of listening: evaluative, interpretive, and generative. The authors of those papers focused a lot on the mindset of the teacher, and how this makes a difference to how you attend to what the students are doing and saying. Davis talks a bit about the kinds of questions people ask when they have those mindsets. But it occurred to me that even if you have a particular mindset, if you ask the wrong questions, you still won’t get the information you need. So yes, the kind of question you ask is evidence for the kind of listening you hope to do, but the kind of question you ask can also dictate the kind of listening you have to do, because you will only get certain kinds of responses.

For example, if you ask a yes-or-no question (eg Is this a subspace or not?) or a direct question about factual information (eg What is the definition of subspace?), you are unlikely to get much information about what a student is thinking, even if that’s what you hoped for. You will have no choice but to simply evaluate their response.

And there is one question that is famous for giving you no choice for what to listen to: “Does that make sense?” If you ask this of a whole class, students will usually give no response at all. If you ask it of a single student, they’ll say “yeah ok”. So basically it tells you nothing at all: to ask this question is to give no opportunity for you to listen. So actually there is another lower level of listening: not listening.

And if we’re talking about not listening, then there is something worse than asking “Does that make sense?”, which at least shows you think things ought to make sense, and theoretically has a chance of a student saying “no” and so giving you some information to work with. What’s worse is asking no questions of any kind. It is amazing how often a maths teacher, even one-on-one, will speak continuously for half an hour with no opportunity for the student to say anything. I always feel such a sense of shock and shame when I realise I’ve done this and that I have absolutely no idea how the student is going.

I think sometimes the impulse to talk continuously comes from a belief that it’s your main job to provide explanations, and sometimes it comes from believing in the power of an explanation you’ve worked hard to perfect. However, even if maths teaching were transmission, that process can’t possibly be perfect, and so you really do need to check in every so often! As Davis says, “Implicit in the act of questioning is a certain lack of faith in the transmission process.” I think everyone needs to have that certain lack of faith.

So it’s good not to have total faith in the power of a single explanation. But what should you have faith in? I think you need to have faith that students actually are thinking. Implicit in the interpretive listening stance is the assumption that there is something to listen to. You have to believe that students have ideas if you seek to interpret them. If you don’t believe they do have ideas already, then of course you don’t seek to listen to them. For me, this is a huge part of working in the MLC that changes the whole approach. The next level above this is to believe that students have ideas that can change your own, which is where generative listening lives.

My version of the framework

So, finally, this is my interpretation of the listening framework of Davis (with the third level renamed by Yackel et al). There are a lot more aspects to this, such as the nature of the teacher’s role, but this version helps me think about what I am doing with students on the fly. You can download a handout PDF version of the framework if you want.

LEVEL 0: NOT LISTENING

Goal:

Tell what the teacher thinks is important
Give clear explanations

Types of questions:

Not asking questions
“Does that make sense?”

Beliefs:

Faith in the power of the teacher’s explanation
Students are waiting for your ideas

LEVEL 1: EVALUATIVE LISTENING

Goal:

Judge student responses against a standard
Get a specific response so you can continue the plan

Types of questions:

Yes/no questions
Direct questions about raw information
Results of calculations

Beliefs:

The teacher’s explanation is not perfect
Students are waiting for your ideas

LEVEL 2: INTERPRETIVE LISTENING

Goal:

Decipher the sense that students are making
Understand student thinking
Create a shared language to describe thinking

Types of questions:

Open-ended questions about thinking or process

Beliefs:

Students are reasoning
Student ideas are worth listening to

LEVEL 3: GENERATIVE LISTENING

Goal:

Jointly explore ideas
Discover new ways to think about or to learn concepts

Types of questions:

Open-ended questions about thinking or process
What-if questions and I-wonder questions

Beliefs:

Students are reasoning
Student ideas are worth listening to
Student ideas can change yours

Final thoughts

I have deliberately numbered the types of listening and called them levels, because I wanted to explicitly say to myself that some are higher than others. However, I don’t want to say that you should never seek to provide clear explanations and never listen evaluatively. Of course you should explain things when you need to, and of course there are times when you need to know students can do things in a standard way. And I also don’t want to say you should spend all your time listening generatively. That would be exhausting for everyone. It’s just that the types of listening definitely do progress in how student ideas shape what happens, and it is definitely a good thing for students to feel that what they think and do makes a difference to the outcome.

What I want is to always be open to the opportunity of finding out how students think and possibly having it change the way I think. I also know that while beliefs definitely guide actions, it also works the other way too. If I spend all my time talking, I may come to believe implicitly that the students have nothing to say. If I spend all my time evaluating against a standard, I may come to believe implicitly that the students have nothing wonderful to say. I need to actively work in opportunities to listen at the higher levels, so that I never go too long without them.

In my daily work, where I spend most of my time one-on-one with students, this is even more important. Because when you’re right there next to the student, what a waste it would be to never hear the wonderful things they have to share, or to never make something wonderful together.

Other(ing) Explanations

David Butler — Fri, 09 Sep 2022 04:44:23 +0000

Most people who teach mathematics are aware that it’s useful to have alternative explanations for concepts, and useful to have different ways to approach problems. Given enough time, you are guaranteed to come across students for whom the standard explanation isn’t working today (as long as you give students a chance to tell you about their understanding).

Having worked with thousands of students one-on-one, I have tried quite a few alternative explanations and methods for many things. Sometimes they’re whole different approaches; sometimes they’re just little tweaks. Sometimes they are just a different order of the sentences you might otherwise say; sometimes they use physical manipulatives like the floor graph or play dough. Many teachers, like me, have such a bank of alternatives.

The problem is… Well, you can see it already in the way I’ve talked about these explanations: I have called them “alternative”, as opposed to “standard”. They are different, unusual, other. And the students know they are. A student who always has to have the other explanation can come to feel that they themselves are other.

A prime example of this is when the “dumb class” use physical toys to learn, whereas the “smart class” only uses symbols. (I use “dumb class” and “smart class” because that’s what the kids call them. Don’t fool yourself into believing that they don’t.) If you set up this sort of dichotomy, then any child who ever has to use the physical tool to help them understand knows they are stupid.

Another example is when mathematicians do not provide pictures when showing how to work out problems, and only provide them when someone doesn’t understand the text version. Students come to think that pictures are only for the “dumb kids” who aren’t capable of understanding the text alone, and they try to avoid drawing them, even if they could solve a problem ten times faster with one.

Obviously if the first explanation you try doesn’t help a student, then you do need to try another one – I never want people to stop providing alternatives!

But perhaps the explanation you use as the standard one doesn’t have to be the standard. Perhaps the other one you usually save for second might work as the first explanation for all the people the standard one works for, and also a few more. Each new explanation needs a bit of consideration to decide if maybe it can supplant the one you usually use first. At the very least, when you hear or think of an alternative explanation, don’t say, “I will keep that in mind for my struggling kids.”

Even better, perhaps we should more often just provide more than one explanation to begin with, rather than just one. No explanation can possibly work for all possible students, and even the “smart kids” will benefit from having more than one way to think about something. So maybe we can avoid othering people by simply giving more options from the outset. For example, to stop students feeling like they’re a “dumb kid” when you draw pictures, you can just draw pictures for everyone a lot more of the time.

So please, do seek out and try other explanations, but make sure you are careful for them not to become othering explanations.

Arbitrary mnemonics

David Butler — Thu, 01 Sep 2022 22:45:42 +0000

A mnemonic is a mental trick to help you remember things. People use them all the time for all sorts of things, like the traditional colours of the rainbow (ROY G BIV), the order of the letters in the English alphabet (a song to the tune of Twinkle Twinkle Little Star), the order of operations (BODMAS or PEMDAS), which months have 31 days (“30 days hath September…” or your knuckles), and which kind of camel has one or two humps (Dromedary starts with D which has one hump; Bactrian starts with B which has two humps).

The purpose of a mnemonic is to connect something that is hard to remember to something that is easier to remember. If you can remember the mnemonic and the connection, then you can remember the thing. They are especially useful for things that are arbitrary, where there is no obvious or no particular reason why they are the way they are (such as the number of days in each month).

However, there are a lot of things that most people don’t need mnemonics to remember, and it seems to me they tend to be the things that make sense to them — things that are already connected to other things in an obvious or natural way. Indeed, the very connectedness of things to each other is what causes the sensation of understanding. You feel you understand things when they are highly connected to other things, and you often don’t have to try to remember things that you understand.

So, a mnemonic helps you remember arbitrary things, and un-arbitrary things often don’t need much assistance to remember because they make sense.

What happens if you advocate that learners use a mnemonic for something that is understandable? I think that it sends a signal to learners that the thing is arbitrary — because they know implicitly that arbitrary things are what mnemonics are for — and since it’s arbitrary, they shouldn’t attempt to understand it. So they don’t try. They just try to remember.

For example, to remember which of sin(.), cos(.) and tan(.) are positive for angles in which quadrants, many people use the mnemonic All Stops To Central (or something similar), to remember it’s all of them in Q1, only sin(.) in Q2, only tan(.) in Q3 and only cos(.) in Q4. But I have met so many learners who have not the slightest clue why this is the truth, and don’t even expect there to be a reason. The fact that it’s a mnemonic signals to them there is nothing to understand. On the other hand, when you remind them that sin(.) is the y-coordinate of the matching point on the unit circle, and the y-coordinate is positive in the top half of the circle, you can see the light go on and the sigh of relief that they don’t have to try to remember any more.
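The unit-circle reasoning can be checked directly. The following is only a small Python sketch, not anything from a textbook: the function name `positive_functions` and the choice of one sample angle per quadrant are my own assumptions. It reads sin as the y-coordinate and cos as the x-coordinate of the point on the unit circle, and tan as their ratio, and reports which are positive.

```python
import math

def positive_functions(angle_degrees):
    """Return the names of sin, cos, tan that are positive at this angle."""
    theta = math.radians(angle_degrees)
    x, y = math.cos(theta), math.sin(theta)  # point on the unit circle
    values = {"sin": y, "cos": x, "tan": y / x}
    return [name for name, value in values.items() if value > 0]

# One sample angle in the middle of each quadrant (an arbitrary choice).
for quadrant, angle in enumerate((45, 135, 225, 315), start=1):
    print(f"Q{quadrant}: {positive_functions(angle)}")
```

The signs fall out of the coordinates alone, which is the point: nothing here needs to be memorised.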

So my advice is just to be careful with mnemonics. I would recommend not introducing them too early. Help your learners try to make sense of things as much as they can, and when there are a few spots left that are arbitrary and they have trouble remembering them, then you can introduce a mnemonic to help remember. Otherwise, you may signal to them that what they are learning is arbitrary and they shouldn’t attempt to understand it.

The line joining two complex points using i-arrows

David Butler — Mon, 08 Aug 2022 03:33:19 +0000

Reminder about i-arrows

Nearly two weeks ago, I first wrote about the i-arrow visualisation of the points in the complex plane. Here’s a reminder of how they work:

Every point with complex coordinates is represented as an arrow (which I call an “i-arrow”) from one place to another on top of the Cartesian plane.

Real points are dots on the Cartesian plane, the same as they have always been.
The complex point (p+si,q+ti) is represented as an i-arrow, which is an arrow that is based at the point (p,q) and extends along the vector (s,t) to have its arrowhead at the point (p+s,q+t).

In the picture below, there are three examples of i-arrows.

The complex point (1+4i,2+i) has been drawn as an i-arrow. Its base is at the point (1,2) and its arrowhead is at the point (1+4,2+1)=(5,3).
The complex point (7,2i) has been drawn as an i-arrow. Its base is at the point (7,0) and its arrowhead is at the point (7,0+2)=(7,2).
The complex point (12-2i,3) has been drawn as an i-arrow. Its base is at the point (12,3) and its arrowhead is at the point (12-2,3)=(10,3).
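For anyone who wants to compute these, here is a minimal Python sketch of the rule. The function name `i_arrow` and the choice to store each coordinate as a Python complex number are my own assumptions for illustration.

```python
def i_arrow(x: complex, y: complex):
    """Return (base, arrowhead) of the i-arrow for the complex point (x, y).

    The base uses the real parts of the coordinates; the arrow is the
    vector of imaginary parts, so the arrowhead is base + arrow.
    """
    base = (x.real, y.real)
    arrowhead = (x.real + x.imag, y.real + y.imag)
    return base, arrowhead

# The three example points from the list above.
examples = [(1 + 4j, 2 + 1j), (7 + 0j, 2j), (12 - 2j, 3 + 0j)]
for x, y in examples:
    base, arrowhead = i_arrow(x, y)
    print(f"({x}, {y}): base {base}, arrowhead {arrowhead}")
```

A real point has zero imaginary parts, so its base and arrowhead coincide and it is just a dot.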

I want to know where the line joining two complex points is

In an earlier blog post, I investigated the complex points on various kinds of lines. I came to a complete understanding of the complex points on a real line with an equation like ax+by=c for a, b, c real. (They are the i-arrows drawn from one point on the real line to another.) I also came to a complete understanding of the complex points on an unreal line with real slope with an equation like ax+by=C for a, b real and C unreal. (They are the i-arrows drawn between two parallel real lines.) But I was still not sure about a lot of things to do with a line of unreal slope with an equation like Ax+By=C for at least one of A, B unreal and C any complex number.

The thing that really bothered me was this: given any two points in the complex plane, it should be possible to find the unique line that joins them. That is, I want to be able to take two points, and find the other points on the line that joins them. That’s how lines should work. I know I could find the equation of the line and use that equation to find new points, but that is really unsatisfying to me. I don’t just want to fiddle around with algebra, I want a geometrical way to find the other points. I want to take a pair of i-arrows, or an i-arrow and a real point, and be able to find all the other i-arrows on the complex line that joins them using some sort of construction.

Pairs of points on a line of real slope

In that first blog post about lines and the one after it, I had made some small headway on this problem. For some very special pairs of points, I can see all of the other complex points on the line joining them. They are the situations where the two points create a line of real slope, because lines of real slope already have very simple constructions for the i-arrows of their complex points.

A real point and an i-arrow aligned with it

The real line aligned with an i-arrow is the unique real line it is on, so if an i-arrow is pointing at or pointing away from a real point, then the two are both on that unique real line the i-arrow is on.

Two i-arrows pointing along the same line

If two i-arrows both lie along the same real line, then this is the only real line either of them is on and so it is the line that joins them.

Two i-arrows sharing a base

If two i-arrows share a base, then they are on an unreal line of real slope. Draw the line joining their arrowheads, and a line parallel to that through the shared base, and the i-arrows you want are all the arrows from the base line to the arrowhead line.

Two i-arrows sharing an arrowhead

If two i-arrows share an arrowhead, they are also on an unreal line of real slope. Draw the line joining the bases, and a line parallel to that through the shared arrowhead, and the i-arrows you want are all the arrows from the base line to the arrowhead line.

Two i-arrows where the line joining the bases is parallel to the line joining the arrowheads

The above two situations are actually special cases of a more general situation where the line joining the bases of the two i-arrows is parallel to the line joining their arrowheads. In that case, they must be on an unreal line with that slope. All the other points on that line are the i-arrows drawn from the base line to the arrowhead line.
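The parallel condition is easy to test with a 2D cross product. This is a sketch under my own naming (`cross`, `bases_parallel_to_heads`) and with made-up sample arrows; each i-arrow is given as a base point and an arrow vector.

```python
def cross(u, v):
    """2D cross product; zero exactly when u and v are parallel (or one is zero)."""
    return u[0] * v[1] - u[1] * v[0]

def bases_parallel_to_heads(P, a, Q, b, tol=1e-12):
    """Check whether the line joining the bases P, Q is parallel to the
    line joining the arrowheads P+a, Q+b."""
    base_dir = (Q[0] - P[0], Q[1] - P[1])
    head_dir = (Q[0] + b[0] - P[0] - a[0], Q[1] + b[1] - P[1] - a[1])
    return abs(cross(base_dir, head_dir)) <= tol

# Shared base: the base direction is the zero vector, so the test passes.
print(bases_parallel_to_heads((0, 0), (1, 0), (0, 0), (0, 1)))
# Bases on y=0 and arrowheads on y=1: parallel lines.
print(bases_parallel_to_heads((0, 0), (1, 1), (2, 0), (3, 1)))
# A generic pair, where the two lines are not parallel.
print(bases_parallel_to_heads((1, 2), (4, 1), (7, 0), (0, 2)))
```

The shared-base and shared-arrowhead situations pass automatically because one of the two direction vectors is zero.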

A new way to work with complex points

What I really hope for is some sort of construction that tells me the complex points on the line of unreal slope joining any two points. Up until now, I’ve been able to take either two i-arrows, or a real point and an i-arrow, and use them to find some of the other points on the line joining them. (These were described in the second blog post about unreal lines.) This week, I suddenly realised that the reason I was only able to make some of the points was that I was only combining the points together in some of the possible ways, and this led to me trying a new way to work with the complex points.

At the top, I talk about the complex point (p+si,q+ti) having an i-arrow based at (p,q) and extending along the vector (s,t). There’s nothing stopping me, if I’m using complex number arithmetic, from literally writing the point using vector notation as (p,q)+(s,t)i. And if I do that, I could give those two parts names of their own. I’ll call (p,q) the point P and (s,t) the vector a, so that the complex point (p,q)+(s,t)i becomes P+ai. This allows me to do algebra in a new way.

In the regular real Cartesian plane, all the points on the line joining point P and point Q are of the form P+m(Q-P) for real numbers m. In this calculation, Q-P is the vector from P to Q, and m(Q-P) is the same vector but stretched or shrunk by the factor m, so that P+m(Q-P) is the point found by starting at P and moving some multiple of the journey from P to Q, to arrive at some point on the line defined by P and Q.

There is nothing stopping us doing the exact same thing with complex points.

Consider the complex points P+ai and Q+bi. Then all the complex points on the line joining them are given by P+ai + (m+μi)[(Q+bi) – (P+ai)] for some complex number m+μi. Doing some algebra on this point…

P+ai + (m+μi)[(Q+bi) – (P+ai)]
= P+ai + (m+μi)[Q+bi – P-ai]
= P+ai + (m+μi)[Q- P + bi –ai]
= P+ai + (m+μi)[(Q- P) + (b –a)i]
= P+ai + m(Q- P) + m(b –a)i +μ(Q- P)i – μ(b –a)
= P + m(Q-P) – μ(b –a) + [a + μ(Q- P) + m(b – a)]i

The i-arrow for this point has base at P +m(Q-P) – μ(b –a)
arrow a + μ(Q- P) + m(b – a)
and arrowhead at P + m(Q-P) – μ(b –a) + a + μ(Q- P) + m(b – a).

Notice how the new base and arrow are the old base and arrow for P+ai adjusted by some multiples of the vectors Q-P and b–a. This is already hinting at a geometrical way of figuring out new points on the line from the original two.
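As a sanity check on the algebra, the expanded base and arrow can be compared with plain complex-number arithmetic. This is a sketch only: the function names and the sample values for P, a, Q, b, m and μ (written `mu`) are assumptions chosen for illustration.

```python
def line_point(z1, z2, lam):
    """Point z1 + lam*(z2 - z1), where z1 and z2 are pairs (x, y) of complex numbers."""
    return tuple(c1 + lam * (c2 - c1) for c1, c2 in zip(z1, z2))

def base_and_arrow(P, a, Q, b, m, mu):
    """Base and arrow from the expanded formula:
    base  = P + m(Q-P) - mu(b-a)
    arrow = a + mu(Q-P) + m(b-a)
    """
    base = tuple(p + m * (q - p) - mu * (t - s) for p, s, q, t in zip(P, a, Q, b))
    arrow = tuple(s + mu * (q - p) + m * (t - s) for p, s, q, t in zip(P, a, Q, b))
    return base, arrow

# Sample data: the i-arrows P+ai and Q+bi, and the multiplier m + mu*i.
P, a = (1.0, 2.0), (4.0, 1.0)
Q, b = (7.0, 0.0), (0.0, 2.0)
m, mu = 0.5, -1.5

z1 = tuple(complex(p, s) for p, s in zip(P, a))
z2 = tuple(complex(q, t) for q, t in zip(Q, b))
x, y = line_point(z1, z2, complex(m, mu))
base, arrow = base_and_arrow(P, a, Q, b, m, mu)

print("direct :", (x.real, y.real), (x.imag, y.imag))
print("formula:", base, arrow)
```

The real parts of the directly computed point should match the base, and the imaginary parts should match the arrow.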

An i-arrow and a real point create parallel i-arrows

Suppose we have a real point R and an unreal point Q+bi. (That is, the point P+ai in the formula above is actually the real point R, so that P=R and a=0.)

Then the other points on the line become
R + m(Q-R) – μb + [μ(Q- R) + mb]i which has i-arrow with
base R + m(Q-R) – μb,
arrow μ(Q- R) + mb,
and arrowhead R + m(Q-R) – μb + μ(Q- R) + mb.

If μ=0, then this becomes
R + m(Q-R) + mbi, which has i-arrow with
base R + m(Q-R),
arrow mb,
and arrowhead R + m(Q-R) + mb.

The point R+m(Q-R) is a point somewhere on the line joining R and Q.
The vector mb is parallel to b.
The point R+m(Q-R)+mb = R+ m(Q+b – R) is a point somewhere on the line joining R and Q+b.
So to find a new point on this line, draw the line joining R and the base Q and choose a point on this line for the base of the i-arrow, then move parallel to b until you reach the line joining R and the arrowhead Q+b and this is your i-arrow.
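
The three observations above can be checked numerically. This is a minimal Python sketch under assumed example values; the names parallel_arrow, cross and sub are hypothetical helpers, and a zero cross product is used as the test for two vectors being parallel.

```python
def parallel_arrow(R, Q, b, m):
    """i-arrow on the line through real R and Q+bi for a real multiplier m."""
    base = (R[0] + m * (Q[0] - R[0]), R[1] + m * (Q[1] - R[1]))
    arrow = (m * b[0], m * b[1])
    head = (base[0] + arrow[0], base[1] + arrow[1])
    return base, arrow, head

def cross(u, v):
    """Zero exactly when the 2D vectors u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

def sub(u, v):
    return (u[0] - v[0], u[1] - v[1])

R, Q, b = (1, 1), (5, 2), (1, 3)    # arbitrary example values
Qb = (Q[0] + b[0], Q[1] + b[1])     # arrowhead of the original i-arrow

for m in (-1, 0.5, 3):
    base, arrow, head = parallel_arrow(R, Q, b, m)
    print(m, base, arrow, head,
          cross(sub(base, R), sub(Q, R)) == 0,    # base on the line R to Q
          cross(arrow, b) == 0,                   # arrow parallel to b
          cross(sub(head, R), sub(Qb, R)) == 0)   # head on the line R to Q+b
```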

This neatly confirms the previous work I did, but as I said before, they are only some of the complex points on this complex line. In fact, they are the ones formed by adding a real multiple of the journey from R to Q+bi onto R. What if we don’t use a real multiple?

An i-arrow and a real point create spirals and spiderwebs

In the situation above with real point R and unreal point Q+bi, all the complex points on the line joining them were given by
R + m(Q-R) – μb + [μ(Q- R) + mb]i which has i-arrow with
base R + m(Q-R) – μb,
arrow μ(Q- R) + mb,
and arrowhead R + m(Q-R) – μb + μ(Q- R) + mb.

Look what happens when μ=-1 and m=1, so that the number m+μi = 1-i. The complex point is:
R + (Q-R) +b + [-(Q- R) + b]i = Q + b + [R-Q + b]i, which has i-arrow with
base Q + b,
arrow R – Q + b,
and arrowhead R + 2b.

How ridiculously simple is THAT?!

If I take a real point R and a complex point Q+bi, then I can create a new complex point on this line whose base is at the arrowhead of the i-arrow I already have. And to figure out where the new arrowhead is, I just go twice the length of the i-arrow I already have starting at the real point.

But wait. I could repeat this process as long as I like to create spirals like those ones I made at the end of investigating lines in a finite plane. Only this time, there’s no cumbersome fiddling around with algebra. It’s a clean and simple geometric process. So gorgeous!

I can even do this process backwards to make the spiral go in the other direction. The line from R to the existing arrowhead is twice the previous arrow, so halve it and put that pointing at the base of the existing i-arrow. And so I can make a spiral going inwards.
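
The outward and inward spiral steps can be written as two small functions. This is a minimal Python sketch, with hypothetical function names and arbitrary example values: spiral_out puts the new base at the old arrowhead and the new arrowhead at R plus twice the old arrow, and spiral_in undoes that by halving the vector from R to the arrowhead.

```python
def spiral_out(R, base, arrow):
    """Next i-arrow outwards: starts at the old arrowhead, ends at R + 2*arrow."""
    head = (base[0] + arrow[0], base[1] + arrow[1])
    new_head = (R[0] + 2 * arrow[0], R[1] + 2 * arrow[1])
    return head, (new_head[0] - head[0], new_head[1] - head[1])

def spiral_in(R, base, arrow):
    """Previous i-arrow inwards: half of (arrowhead - R), ending at the old base."""
    head = (base[0] + arrow[0], base[1] + arrow[1])
    prev_arrow = ((head[0] - R[0]) / 2, (head[1] - R[1]) / 2)
    prev_base = (base[0] - prev_arrow[0], base[1] - prev_arrow[1])
    return prev_base, prev_arrow

R, Q, b = (1, 1), (5, 2), (1, 3)      # arbitrary example values
out_base, out_arrow = spiral_out(R, Q, b)
back_base, back_arrow = spiral_in(R, out_base, out_arrow)
print(out_base, out_arrow)     # base Q+b and arrow R-Q+b
print(back_base, back_arrow)   # stepping back in recovers Q and b
```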

At first, I thought this would give me all different directions for arrows, but actually it doesn’t seem like it does. It looks like every fourth arrow points in the same direction. I wonder if I can prove that?

The first arrow goes from Q to Q+b and is the vector b.
The next arrow goes from Q+b to R+2b and is the vector (R+2b) – (Q+b) = (R – Q) + b.
The next arrow goes from R+2b to R+2((R – Q) + b) = R + 2(R – Q) + 2b, and is the vector (R+2(R – Q) + 2b) – (R+2b) = 2(R – Q).
The next arrow goes from R + 2(R – Q) + 2b to R + 2(2(R – Q)) = R + 4(R-Q), and is the vector (R + 4(R-Q)) – (R + 2(R-Q) + 2b) = 2(R – Q) – 2b.
The next arrow goes from R + 4(R-Q) to R + 2(2(R – Q) – 2b) = R + 4(R-Q) – 4b, and is the vector (R + 4(R-Q) – 4b) – (R + 4(R-Q)) = –4b.

So yes, every fourth arrow is parallel to the one four steps before it, and not only that, is exactly four times the length. The sign is negative, though, so it points the opposite way, and it is every eighth arrow that points in exactly the same direction. That was completely unexpected. How weird is it that every possible complex line does this?!
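
One way to see why this happens: each spiral step corresponds to multiplying the complex multiplier by 1-i, and (1-i)^4 = -4 while (1-i)^8 = 16. Here is a minimal Python sketch that iterates the spiral step and compares arrows; the function name spiral_out and the example values are hypothetical.

```python
def spiral_out(R, base, arrow):
    """Next i-arrow outwards: starts at the old arrowhead, ends at R + 2*arrow."""
    head = (base[0] + arrow[0], base[1] + arrow[1])
    new_head = (R[0] + 2 * arrow[0], R[1] + 2 * arrow[1])
    return head, (new_head[0] - head[0], new_head[1] - head[1])

R, Q, b = (1, 1), (5, 2), (1, 3)   # arbitrary example values

base, arrow = Q, b
arrows = [arrow]
for _ in range(8):
    base, arrow = spiral_out(R, base, arrow)
    arrows.append(arrow)

# Multiply 1-i by itself step by step to see the multiplier pattern.
z = complex(1, 0)
powers = []
for _ in range(8):
    z *= complex(1, -1)
    powers.append(z)

print(arrows[0], arrows[4], arrows[8])
print(powers[3], powers[7])
```

In this run the fifth arrow is -4 times the first and the ninth is 16 times the first, matching the fourth and eighth powers of 1-i.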

Anyway, I can combine the spirals with the propagation of parallel i-arrows to create a spiderweb starting with any i-arrow and a real point.

It’s still not every i-arrow on this complex line, but it sure is pretty!

But wait! Making the spiderweb is even simpler than making the spiral. According to the calculations I did earlier, given R and Q+bi, I can construct the complex point R+2b+2(R-Q)i. Even better, I can follow parallel i-arrows inwards to get the complex point R+b+(R-Q)i. To be absolutely sure this works, look at the original algebra for new points on the line joining R and Q+bi:

R + m(Q-R) – μb + [μ(Q- R) + mb]i which has i-arrow with
base R + m(Q-R) – μb,
arrow μ(Q- R) + mb,
and arrowhead R + m(Q-R) – μb + μ(Q- R) + mb.

If m=0 and μ=-1 (corresponding to the number m+μi being -i), then the point is
R +b – (Q- R)i = R+b+(R-Q)i which has i-arrow with
base R+b,
arrow R-Q,
and arrowhead R+b+(R-Q).

This is a remarkably simple construction to produce that spiderweb from real point R and complex point Q+bi.

Copy vector b and place two of them one after the other starting at R. Draw the i-arrow from Q+b to this new point R+2b. This is a second i-arrow on the line.
Copy vector R-Q and place it at R+b. This is a third i-arrow on the line.
Copy vector R-Q and place two of them one after the other starting at R. Draw the i-arrow from R+b+(R-Q) to this new point R+2(R-Q). This is a fourth i-arrow on the line.
The lines through R and the bases of these four i-arrows define eight regions of the plane where the i-arrows are parallel to these four original i-arrows and have base on one line and arrowhead on the next.
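
The four starting i-arrows can be computed directly from this recipe and compared against the complex multipliers 1, 1-i, -i and -1-i. This is a minimal Python sketch; spiderweb_start, via_multiplier and the small vector helpers are hypothetical names, and the example values are arbitrary.

```python
def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def sub(u, v):
    return (u[0] - v[0], u[1] - v[1])

def scale(k, u):
    return (k * u[0], k * u[1])

def spiderweb_start(R, Q, b):
    """(base, arrowhead) pairs for the four starting i-arrows."""
    d = sub(R, Q)          # the vector R-Q
    Rb = add(R, b)
    return [
        (Q, add(Q, b)),                        # the given i-arrow
        (add(Q, b), add(R, scale(2, b))),      # second: from Q+b to R+2b
        (Rb, add(Rb, d)),                      # third: R-Q placed at R+b
        (add(Rb, d), add(R, scale(2, d))),     # fourth: ends at R+2(R-Q)
    ]

def via_multiplier(R, Q, b, z):
    """(base, arrowhead) of R + z*((Q+bi) - R) by direct complex arithmetic."""
    coords = [complex(R[k]) + z * (complex(Q[k], b[k]) - complex(R[k]))
              for k in range(2)]
    base = (coords[0].real, coords[1].real)
    head = (base[0] + coords[0].imag, base[1] + coords[1].imag)
    return base, head

R, Q, b = (1, 1), (5, 2), (1, 3)   # arbitrary example values
multipliers = [1, 1 - 1j, -1j, -1 - 1j]
arrows = spiderweb_start(R, Q, b)
checks = [via_multiplier(R, Q, b, z) == arrow
          for z, arrow in zip(multipliers, arrows)]
print(arrows)
print(checks)
```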

A real point helps to find i-arrows based along an i-arrow

Even though the spiderweb of i-arrows shown above doesn’t tell you where every i-arrow is, we have at least covered the entire Cartesian plane with i-arrows. Given any point in the plane, even though we don’t have the i-arrow based at that point, we do have an i-arrow passing through it. I think this is enough information to find the i-arrow based there.

Let R be a real point and Q+bi be an unreal point. As calculated before, all the complex points on the line joining them are given by
R + m(Q-R) – μb + [μ(Q- R) + mb]i which has i-arrow with
base R + m(Q-R) – μb,
arrow μ(Q- R) + mb,
and arrowhead R + m(Q-R) – μb + μ(Q- R) + mb.

If I want the base P to be on the actual line segment that is the drawing of the i-arrow, that means the base has to be Q+ρb for some number ρ between 0 and 1.
That is, R + m(Q-R) – μb = Q+ρb.
I can make that happen by setting m=1 and μ=-ρ.
This makes the arrow -ρ(Q- R) + 1b = b + ρ(R-Q),
and the arrowhead Q+ρb + b + ρ(R-Q) = Q+b + ρ(R-Q+b).

The vector R-Q+b is the vector for the i-arrow with base Q+b! That is, the next i-arrow outwards in the spiral. Which means that when the base of an i-arrow is on another i-arrow, then the arrowhead is on the next i-arrow in the spiral. Even more, if the base is proportion ρ along the first i-arrow, then the arrowhead is proportion ρ along the second.
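
The proportion claim can be checked with a few values of ρ. This is a minimal Python sketch with hypothetical function names and arbitrary example values: arrow_on_arrow uses m=1 and μ=-ρ from the calculation above, and along_next_arrow walks a proportion ρ along the next spiral arrow from Q+b to R+2b.

```python
def arrow_on_arrow(R, Q, b, rho):
    """i-arrow whose base is a proportion rho along the i-arrow from Q to Q+b."""
    base = (Q[0] + rho * b[0], Q[1] + rho * b[1])
    arrow = (b[0] + rho * (R[0] - Q[0]), b[1] + rho * (R[1] - Q[1]))
    head = (base[0] + arrow[0], base[1] + arrow[1])
    return base, arrow, head

def along_next_arrow(R, Q, b, rho):
    """Point a proportion rho along the next spiral arrow, from Q+b to R+2b."""
    start = (Q[0] + b[0], Q[1] + b[1])
    step = (R[0] - Q[0] + b[0], R[1] - Q[1] + b[1])
    return (start[0] + rho * step[0], start[1] + rho * step[1])

R, Q, b = (1, 1), (5, 2), (1, 3)   # arbitrary example values

for rho in (0, 0.25, 0.5, 1):
    base, arrow, head = arrow_on_arrow(R, Q, b, rho)
    print(rho, base, arrow, head, head == along_next_arrow(R, Q, b, rho))
```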

The arrow itself being b + ρ(R-Q) gives me a geometrical construction for finding the arrowhead. The vector b is the first i-arrow, and the vector R-Q is the vector from Q to R. So, to find the arrowhead (A in the above diagram), I go another journey of vector b from P, and then go parallel to the vector from Q to R until I meet the next i-arrow in the spiral.

I think this is really cool, because it also gives an alternative way to find the next i-arrow in the spiral from any i-arrow Q+bi and the real point R. Go another copy of the vector b from Q+b, then go a copy of the vector R-Q from there, and this is the arrowhead of the next arrow in the spiral.

Oh my. I just realised this makes a whole new way to construct the starting arrows for the spiderweb using R and Q+bi.

Start at Q+b and draw the i-arrow that is the vector b+(R-Q).
From the arrowhead of this i-arrow, go –b (which takes you halfway to R). This is the base of the next i-arrow.
Draw the i-arrow that is the vector R-Q.
From the arrowhead of this i-arrow, draw the i-arrow that is the vector (R-Q) – b.

I’m not sure if this is any better than the previous one, but it certainly has a different feel for me, and it might feel nicer in the moment when I need it.

What all of this means is that given a real point R and a complex point Q+bi, I can now find the i-arrow for any complex point on the line joining these two points based at any point P in the Cartesian plane.

Construct the four starting i-arrows and spoke lines of the spiderweb using R and Q+bi.
Draw an i-arrow from the spiderweb through P, parallel to the appropriate base i-arrow.
Draw the next i-arrow in the spiral from this one just drawn.
Construct the i-arrow based at P with arrowhead on the i-arrow just drawn.
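
For comparison with the geometric construction, the same i-arrow can be found algebraically by solving R + m(Q-R) – μb = P for m and μ. This is a minimal Python sketch and not the construction itself: it swaps in Cramer's rule for the drawing steps, the function name arrow_based_at is hypothetical, and it assumes integer coordinates so that Fraction gives exact answers.

```python
from fractions import Fraction

def cross(u, v):
    return u[0] * v[1] - u[1] * v[0]

def arrow_based_at(R, Q, b, P):
    """Solve R + m(Q-R) - mu*b = P, then return m, mu and the arrow at P."""
    u = (Q[0] - R[0], Q[1] - R[1])    # the vector Q-R
    w = (P[0] - R[0], P[1] - R[1])    # the vector P-R
    det = cross(u, b)
    if det == 0:
        raise ValueError("b is parallel to Q-R, so the bases do not cover the plane")
    m = Fraction(cross(w, b), det)
    mu = Fraction(-cross(u, w), det)
    arrow = (mu * u[0] + m * b[0], mu * u[1] + m * b[1])   # mu(Q-R) + mb
    return m, mu, arrow

R, Q, b = (1, 1), (5, 2), (1, 3)   # arbitrary example values
P = (0, 0)

m, mu, arrow = arrow_based_at(R, Q, b, P)
base = (R[0] + m * (Q[0] - R[0]) - mu * b[0],
        R[1] + m * (Q[1] - R[1]) - mu * b[1])
print(m, mu, arrow, base)
```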

It might not be the most simple of constructions, but I think it has a certain charm. I actually love making parts of the spiderweb until you get to the point you want.

So now I have a geometrical process that can find every other complex point on the line joining a real point and a complex point. I am so so happy.

The line joining two i-arrows arranged head to tail

The last part in this journey is to find the line joining two general complex points. If I can find the real point on this line, then the construction above will find all the other points too.

At the very least I can find the real point on a line joining two i-arrows arranged head to tail.

If Q+bi and P+ai are arranged so that the arrowhead of Q+bi’s i-arrow is the base of P+ai’s i-arrow, then they are two steps on the same spiral and it ought to be true that the real point on the line joining them is -2b away from the arrowhead of P+ai’s i-arrow.

Just to be sure, let me do the calculations…

The points on the line joining P+ai and Q+bi are of the form:
P + m(Q-P) – μ(b –a) + [a + μ(Q- P) + m(b – a)]i.
This point’s i-arrow has base at P +m(Q-P) – μ(b –a),
arrow a + μ(Q- P) + m(b – a),
and arrowhead at P + m(Q-P) – μ(b –a) + a + μ(Q- P) + m(b – a).

If the base of P+ai is the arrowhead of Q+bi, then that means P = Q+b and b=P-Q. So the point becomes:
P + m(-b) – μ(b –a) + [a + μ(-b) + m(b – a)]i = P – mb – μ(b –a) + [a – μb + m(b – a)]i
whose i-arrow has base at P – mb– μ(b –a),
arrow a – μb + m(b – a),
and arrowhead at P – mb – μ(b –a) + a – μb + m(b – a).

If this point is real, then
a – μb + m(b – a) = 0
a – μb + mb – ma = 0
(1-m)a + (m-μ)b = 0

If a and b are parallel, then they are on a real line together, so I assume they’re not parallel, which means they’re independent vectors, so the only way to produce the zero vector as a linear combination is for both coefficients to be zero.

Therefore 1-m=0 and m-μ=0, which means m=1 and μ=1.
So the base (which is now the real point’s location) is P-b – (b – a) = P+a–2b, exactly where it should be. Nice.
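
A numerical check that the real point sits -2b away from the arrowhead of P+ai, as predicted earlier. This is a minimal Python sketch with hypothetical function names and arbitrary example values; it uses the multiplier 1+i (that is, m=1 and μ=1) and confirms that the resulting point has no imaginary part.

```python
def real_point_head_to_tail(Q, b, a):
    """Real point on the line through Q+bi and P+ai, where P = Q+b."""
    P = (Q[0] + b[0], Q[1] + b[1])
    return (P[0] + a[0] - 2 * b[0], P[1] + a[1] - 2 * b[1])

def point_on_line(P, a, Q, b, z):
    """Coordinates of (P+ai) + z*((Q+bi) - (P+ai)) as Python complex numbers."""
    return tuple(complex(P[k], a[k])
                 + z * (complex(Q[k], b[k]) - complex(P[k], a[k]))
                 for k in range(2))

Q, b = (5, 2), (1, 3)    # arbitrary example values
a = (-3, 2)
P = (Q[0] + b[0], Q[1] + b[1])   # head to tail: P is the arrowhead of Q+bi

R = real_point_head_to_tail(Q, b, a)
check = point_on_line(P, a, Q, b, complex(1, 1))
print(R, check)
```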

But what if the two i-arrows aren’t arranged head to tail?

Two i-arrows create two i-arrows arranged head to tail

I am now left with the most general possible arrangement of two i-arrows. Two i-arrows not arranged head to tail, and not in such a way that the line joining their arrowheads is parallel to the line joining their bases. I have actually discovered in an earlier blog post that when I join these two complex points to make a line, then the i-arrows on the line whose bases are on the line joining their bases also have their arrowheads on the line joining their arrowheads. Let me confirm that with this new calculation style…

The points on the line joining P+ai and Q+bi are of the form:
P + m(Q-P) – μ(b –a) + [a + μ(Q- P) + m(b – a)]i.
This point’s i-arrow has base at P +m(Q-P) – μ(b – a),
arrow a + μ(Q- P) + m(b – a),
and arrowhead at P + m(Q-P) – μ(b – a) + a + μ(Q- P) + m(b – a).

If this point has its base on the line joining P and Q, then the base has to be P+ρ(Q-P) for some real number ρ. The base is actually P +m(Q-P) – μ(b –a), so we need μ=0. That means the complex point is now
P + m(Q-P) + [a + m(b – a)]i,
which has an i-arrow with base at P +m(Q-P),
arrow a + m(b – a),
and arrowhead at P + m(Q-P) + a + m(b – a)
= P+a+m(Q-P+b – a)
= P+a+m((Q+b) – (P+a)).

This is a point on the line joining P+a and Q+b, which are the arrowheads of the original two points. Even more, if you move the base a multiple m of the journey from P to Q, then you also move the arrowhead the same multiple of the journey from P+a to Q+b, which is exactly what I found last time.

The formula for the arrow suggests a construction to find the arrowhead given the base point too. The arrow is a + m(b – a), which is the vector a, plus a multiple of (b – a). If I draw the vectors a and b at the base point, then the vector (b – a) is the vector from the end of a to the end of b. Following this vector will bring me to the arrowhead of the i-arrow. But I know where that arrowhead lies: on the line joining the arrowheads of the original two i-arrows! So I don’t need to calculate how long to go, I only need to follow that arrow to the arrowhead line.

I like this construction. I like how it uses the two original arrows sitting at the base point we want to find the arrowhead we want. That seems to me to be how it really ought to work.

It should also be possible to work backwards from a specific arrowhead on the line joining the original two arrowheads to find where the matching base point is. I have to follow the vector a + m(b – a) backwards, which means following the vector –a – m(b – a) = –a + m(-b -(-a)). But this vector can be found by placing the vectors –a and –b at the starting point and following the line joining their vector endpoints. I know the base point is on the line joining the original two base points, so again I don’t have to calculate anything, only follow the line to where it meets the base point line.

But wait. The intersection of the base line and arrowhead line is capable of being both a base point and an arrowhead, which means I can draw a pair of i-arrows there arranged head to tail! And if I can do that, then I can find the real point on this complex line!

So, given two i-arrows for complex points P+ai and Q+bi, I can find another pair of i-arrows on their line arranged head to tail like this:

Join the two base points to make a line, and the two arrowheads to make a line. Find where these two lines meet. Call this point S.
At S, draw the vectors a and b and join their endpoints. Call the point where this line meets the arrowhead line C.
At S, draw the vectors –a and –b and join their endpoints. Call the point where this line meets the base line D.
Then the two i-arrows from S to C and from D to S are both on the line joining P+ai and Q+bi, and they are arranged head to tail.

From these two i-arrows arranged head-to-tail, I can find the real point on this complex line, and using that real point and any one of the i-arrows so far, I can create all the points on this complex line.

I wouldn’t be surprised if there was a more direct way to find the real point from the original i-arrows without going through the head-to-tail pair, but for now I am content. I do know a way, and it’s enough for me for now.
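
The whole route from two general i-arrows to the real point can be traced numerically. This is a minimal Python sketch, not a definitive implementation: the names intersect and real_point are hypothetical, the coordinates are assumed to be integers so that Fraction stays exact, and D is taken on the base line so that it can serve as the base of the i-arrow from D to S. The example points were chosen to lie on a line whose real point is known to be (1, 1).

```python
from fractions import Fraction

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def sub(u, v):
    return (u[0] - v[0], u[1] - v[1])

def cross(u, v):
    return u[0] * v[1] - u[1] * v[0]

def intersect(A1, A2, B1, B2):
    """Intersection of line A1A2 with line B1B2 (assumed not parallel)."""
    dA, dB = sub(A2, A1), sub(B2, B1)
    t = Fraction(cross(sub(B1, A1), dB), cross(dA, dB))
    return (A1[0] + t * dA[0], A1[1] + t * dA[1])

def real_point(P, a, Q, b):
    """Follow the head-to-tail construction for the line through P+ai and Q+bi."""
    heads = (add(P, a), add(Q, b))
    S = intersect(P, Q, *heads)                    # base line meets arrowhead line
    C = intersect(add(S, a), add(S, b), *heads)    # arrowhead of the arrow based at S
    D = intersect(sub(S, a), sub(S, b), P, Q)      # base of the arrow ending at S
    first = sub(S, D)                              # the i-arrow from D to S
    R = (C[0] - 2 * first[0], C[1] - 2 * first[1]) # second arrowhead minus 2*(first arrow)
    return S, C, D, R

P, a = (5, 2), (1, 3)    # example i-arrows, neither head to tail nor parallel
Q, b = (8, 0), (6, 7)

S, C, D, R = real_point(P, a, Q, b)
print(S, C, D, R)
```

If the base line and arrowhead line are parallel, intersect divides by zero, which matches the case excluded in the text.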

Conclusion

I have now found geometrical constructions that allow me to find any complex point on a line joining two complex points, whether the original points are real and real, real and unreal, or unreal and unreal. There are some lovely special cases with particularly simple constructions. I think my favourite bit is that spiderweb that neatly partitions the plane into eight slices where all the i-arrows are parallel.

I think I can finally let this go for a bit, and think about other things. It’s been an amazing ride. If you’re here at the end, I hope you’ve enjoyed it. These are all the other posts in this blog series, if you want to find them.

Where the complex points are: i-arrows
The complex points on a line using i-arrows
Further updates on the complex points on an unreal line using i-arrows
The complex points on a line in finite geometry using i-arrows
The complex points on a parabola using i-arrows
The complex points on real circles using i-arrows
The complex points on unreal circles using i-arrows
The line joining two complex points using i-arrows (YOU ARE HERE).