Every so often, I’m confronted with the logarithm. If you’re not familiar, the mathematical soup for a logarithm of base $b$ is:

\begin{align*}
\log_b(x) &= y \\
b^y &= x
\end{align*}

The spoken version of the definition doesn’t feel any more intuitive. It feels like the definition is always inverted – and that’s because it is! As of writing this, the version on Wikipedia is:

Given a positive real number $b$ such that $b \ne 1$, the logarithm of a positive real number $x$ with respect to base $b$ is the exponent by which $b$ must be raised to yield $x$. In other words, the logarithm of $x$ to base $b$ is the unique real number $y$ such that $b^y = x$.¹

It’s soup all the way down.
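Concretely, with base 2 (my own example, but any base works the same way):

\begin{align*}
\log_2(8) &= 3 \\
2^3 &= 8
\end{align*}

Both lines say the same thing; the log just asks the question in reverse.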

As people, we work with decimals². That is, numbers in base 10. I’d assert that it’s an unfortunate base. Hexadecimal would have been far more useful, but we’ve got ten fingers, so base 10 was almost guaranteed to become the dominant choice.

So, I’m not here to discuss log laws, but rather a neat party trick if you work in any way with data, data structures or algorithms. These all share the habit of involving (specifically base-2) logarithms.

Example - Binary search tree

Imagine you are given a binary search tree (BST) and told that it has “thousands of nodes”. That means it must be strictly less than a million nodes. You are asked to estimate how many comparisons you would need to make, worst case, to find a value in this BST. Why did they give you the bound in decimal? Are they sadists? More likely, it’s just because we all like round decimal numbers. If they said it was bounded by 1024, you could just keep halving the number until you have 1. The number of halves is your answer. In this case, it’s 10. It’s actually halves plus one, so 11. But wasn’t it satisfying to believe it might have been ten?
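If you’d rather verify the halving trick than trust it, a few lines of C do the job (a minimal sketch; `halvings` is just a name I’ve picked for this post):

```c
#include <stdio.h>

/* Count how many times n can be halved before reaching 1.
   This is floor(log2(n)) -- the "number of halves" above. */
static unsigned halvings(unsigned long long n) {
    unsigned count = 0;
    while (n > 1) {
        n /= 2;
        count++;
    }
    return count;
}

int main(void) {
    printf("%u\n", halvings(1024)); /* prints 10 */
    return 0;
}
```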

Probing further on this, notice how 1024 (which is $2^{10}$) is very close to 1000.

1,048,576 (which is $2^{20}$) is very close to 1,000,000.

This progression continues up until a trillion ($\approx 2^{40}$) with less than 10% difference. This difference does eventually become too big, but it’s only over 50% at $2^{180}$. I don’t work with numbers near that size, and I don’t suspect I ever will.
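You can watch the drift for yourself with a quick sketch (doubles are plenty precise here, since we only care about the ratio):

```c
#include <stdio.h>
#include <math.h>

int main(void) {
    /* Compare 2^(10t) against 10^(3t), for t groups of three zeroes. */
    for (int t = 1; t <= 18; t++) {
        double ratio = pow(2.0, 10.0 * t) / pow(10.0, 3.0 * t);
        printf("t=%2d  2^%d / 10^%d = %.3f\n", t, 10 * t, 3 * t, ratio);
    }
    return 0;
}
```

The ratio starts at 1.024 and compounds: about 1.100 at $2^{40}$ and roughly 1.532 by $2^{180}$.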

So it seems like we have a formula where $t$ is the number of groups of three zeroes, such that our number, $n$, is:

\begin{align*}
n = 2^{10t}
\end{align*}

We could have defined $t = \frac{\log_{10}(n)}{3}$, but we want a party trick out of this, not work. You can just eyeball or count the groups of three zeroes. So, back to the original example, if we have a BST bounded by a million, we know we have:

\begin{align*}
1,000,000 &\approx 2^{10(2)} \\
&\approx 2^{20}
\end{align*}

This is useful, because a binary search takes up to $\lceil\log_2(n)\rceil$ comparisons to find a value contained within. In other words:

\begin{align*}
\log_2(1,000,000) &\approx \log_2(2^{20}) \\
&\approx 20
\end{align*}

So we expect roughly 20 comparisons.
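If you’d rather not take my word for it, here’s a throwaway check in C (a sketch; since a sorted array could just hold its own indices, I skip storing it and compare against positions directly):

```c
#include <stdio.h>

int main(void) {
    /* Binary search over a conceptual sorted array a[i] == i,
       counting one three-way comparison per iteration. */
    long lo = 0, hi = 999999, target = 999999, comparisons = 0;
    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2;
        comparisons++;
        if (mid == target) break;
        if (mid < target) lo = mid + 1;
        else hi = mid - 1;
    }
    printf("%ld comparisons\n", comparisons); /* prints 20 */
    return 0;
}
```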

Example - Data storage

Let’s say you think data structures and algorithms are the worst. And the above example was not compelling. You create bits, directly, in a hot forge. Why use a tree when fetching from main memory is so expensive? Ridiculous³.

Well, there are cases for using exact-sized memory. In C/C++, you have bit-fields, and Zig supports arbitrary bit-width integers out of the box. Want a u14? Sure.
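In C, that might look like this (an illustrative sketch; the struct and its field widths are made up):

```c
#include <stdio.h>

/* Bit-fields let you ask for exact widths. 14 bits holds 0..16383,
   roughly the same range as Zig's u14. */
struct reading {
    unsigned int value : 14;
    unsigned int flags : 2;
};

int main(void) {
    struct reading r = { .value = 12345, .flags = 3 };
    printf("value=%u flags=%u sizeof=%zu\n", r.value, r.flags, sizeof r);
    return 0;
}
```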

If you are given some number and want to determine the number of bits needed to store it, the approximation we’ve devised above at least bounds the decimal value described. So if someone says you need to store something up to a value of 250,000, you know how to get to a million: it’s about $2^{20}$. Again, taking the (base-2) log of this number is useful. It gives us the bits required to store the original number. Now, in this case, our number is a quarter of that, so we just take off two bits (halving the maximum number twice). Great, we need a u18, since our number never exceeds 250,000 and a u18 holds values up to $2^{18} - 1$, which is 262,143. You may be off by $\pm 1$ with this trick for larger numbers, but it’s a really quick way to work with the two seemingly incompatible number bases whenever a logarithm underlies the problem.
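And to check the trick against the exact answer (a sketch; `bits_needed` is my name, not a standard function):

```c
#include <stdio.h>

/* Smallest bit width that can represent every value in [0, max]. */
static unsigned bits_needed(unsigned long long max) {
    unsigned bits = 0;
    while (max > 0) {
        max /= 2;
        bits++;
    }
    return bits;
}

int main(void) {
    printf("%u\n", bits_needed(250000));  /* prints 18 */
    printf("%u\n", bits_needed(1000000)); /* prints 20 */
    return 0;
}
```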

Generally, for this example, you’ll want to choose the closest number to yours that divides evenly into a thousand, and then halve or double your way to the final bit count⁴.

Footnotes

  1. https://en.wikipedia.org/wiki/Logarithm#Definition 

  2. Well, we do now. People have used sexagesimals, vigesimals and likely other bases in the past. 

  3. Pretend that you can’t represent a binary tree inside an array for this joke. 

  4. Numbers like 32,768 ($2^{15}$) are especially evil because it takes the same amount of doubling/halving regardless of your choice for “number of groups of three zeroes”.