A logarithmic party trick
Every so often, I’m confronted with the logarithm. If you’re not familiar, the mathematical soup for a logarithm of base is:
The spoken version of the definition doesn’t feel any more intuitive. It feels like the definition is always inverted – and that’s because it is! As of writing this, the version on Wikipedia is:
Given a positive real number such that , the logarithm of a positive real number x with respect to base b is the exponent by which b must be raised to yield x. In other words, the logarithm of x to base b is the unique real number y such that .1
As people, we work with decimals2. That is, numbers with base-10. I’d assert that it’s an unfortunate base. Hexadecimal would have been far more useful, but we’ve got ten fingers. It was almost guaranteed to become the dominant choice.
So, I’m not here to discuss log laws, but rather a neat party trick if you work in any way with data, data structures or algorithms. These all share the habit of involving (specifically base-2) logarithms.
Example - Binary search tree
Imagine you are given a binary search tree (BST) and told that it has “thousands of nodes”. That means it must be strictly less than a million nodes. You are asked to estimate how many comparisons you would need to make, worst case, to find a value in this BST. Why did they give you the bound in decimal? Are they sadists? More likely, it’s just because we all like round decimal numbers. If they said it was bounded by , you could just keep halving the number until you have . The number of halves is your answer. In this case, it’s . It’s actually halves plus one, so . But wasn’t it satisfying to believe it might have been ten?
Probing further on this, notice how (which is ) is very close to .
(which is ) is very close to .
This progression continues up until a trillion () with less than 10% difference. This difference does eventually become too big, but it’s only over 50% at . I don’t work with numbers near that size, and I don’t suspect I ever will.
So it seems like we have a formula where is the number of groups of three zeroes, such that our number, , is:
We could have defined , but we want a party trick out of this, not work. You can just eyeball or count the groups of three zeroes. So, back to the original example, if we have a BST bounded by a million, we know we have:
This is useful, because a binary search takes up to comparisons to find a value contained within. In other words:
So we expect roughly comparisons.
Example - Data storage
Let’s say you think data structures and algorithms are the worst. And the above example was not compelling. You create bits, directly, in a hot forge. Why use a tree when fetching from main memory is so expensive? Ridiculous3.
Well, there are cases for using exact-sized memory. In C/C++, you have bit-fields, and Zig supports arbitrary bit-width integers out of the box. Want a u14? Sure.
If you are given some number, and you want to determine the number of bits needed, you know the approximation we’ve devised above at least bounds the decimal value described. So if someone says you need to store something up to a value of , you know how to get to a million. It’s about . Again, taking the (base-2) log of this number is useful. It gives us the bits required to store the original number. Now, in this case, our number is a quarter of that so we just take off two bits (halving the maximum number twice). Great, we need a u18, since our number is going to be at least . You may be off by with this trick for larger numbers, but it’s a really quick way to work with the two seemingly incompatible number bases whenever a logarithm underlies the problem.
Generally, for this example, you’ll want to choose the closest number to your number that evenly divides into a thousand and then halve or double your way to the final bit count4.
Footnotes
-
Well, we do now. People have used hexadecimals, vigesimals and likely other bases in the past. ↩
-
Pretend that you can’t represent a binary tree inside an array for this joke. ↩
-
Numbers like 32,768 () are especially evil because it takes the same amount of doubling/halving regardless of your choice for “number of groups of three zeroes”. ↩