I used logarithms a lot in school while studying math and computer science, but as I got older I forgot a lot about the why. Logarithms show up all the time in machine learning, so I wanted a refresher on why we use them.

**Mathematics**

The logarithm is the inverse of exponentiation. Since 2^4 = 16, we have log2(16) = 4. It asks: starting with a base number x (2 in our example), to what power must it be raised to reach 16? The answer is 4. So in algebra it serves as the inverse of exponentiation, just as we often need the inverse of addition (subtraction) and multiplication (division). The base-10 logarithm is most often used in science and engineering, while the natural logarithm, with base e (2.718…), is most often used in mathematics and physics.
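A quick sanity check of this inverse relationship, as a minimal sketch using Python's built-in `math` module:

```python
import math

# Exponentiation: raise the base 2 to the power 4.
power = 2 ** 4            # 16

# The logarithm undoes it: what power of 2 gives 16?
exponent = math.log2(16)  # 4.0

print(power, exponent)    # 16 4.0
```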

When looking at data, especially in graphs, a few extreme data points can wreck the scale: either the axis stretches so far that the other points are squashed together, or the extremes don't fit on the graph at all, and we can't compare the data points. If the data has these extremes, it can be passed through a logarithm function to compress the range. The relationships will still be there, and all the points fit on the same scale. Check out the charts below: the left shows the raw volumes of the celestial bodies in our solar system, and the right shows the same volumes after a logarithm transformation.

Now we can compare them!
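The same transformation can be sketched in a few lines of Python. The volume figures below are rough, rounded approximations used purely for illustration:

```python
import math

# Approximate volumes in cubic kilometres (rounded, illustrative values).
volumes = {
    "Moon":    2.2e10,
    "Earth":   1.08e12,
    "Jupiter": 1.43e15,
    "Sun":     1.41e18,
}

# Raw values span about eight orders of magnitude; their base-10 logs
# all land between roughly 10 and 18, easy to plot on one axis.
for body, v in volumes.items():
    print(f"{body:>8}: raw={v:.2e}  log10={math.log10(v):.1f}")
```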

**Computation**

Computers have a fixed limit on the size of numbers they can represent. A floating-point type, for example, can only hold values within a certain range of magnitudes. If a computation exceeds the upper limit it overflows (typically producing infinity), and if it drops below the smallest representable magnitude it underflows to zero, essentially destroying the computation. Logs are often used to convert those numbers into safer ranges that a computer can work with without overflowing or underflowing.
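A classic case is multiplying many small probabilities: the product underflows to zero in ordinary floating point, while summing logarithms instead keeps the value in a safe range. A minimal Python sketch:

```python
import math

# One hundred small probabilities of 1e-5 each.
probs = [1e-5] * 100

# Multiplying them directly underflows: 1e-500 is far below
# the smallest positive float, so the result collapses to 0.0.
product = 1.0
for p in probs:
    product *= p
print(product)  # 0.0

# Summing their logs instead stays well within float range.
log_product = sum(math.log(p) for p in probs)
print(log_product)  # about -1151.3, i.e. the log of 1e-500
```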

When measuring algorithmic complexity (how long an algorithm is expected to run as its input grows), logarithms are often used.

The reason they show up so often in complexity analysis is divide-and-conquer style algorithms. Many search algorithms work by repeatedly splitting the data in half and then processing it, so the algorithm works on increasingly smaller datasets; halving n items repeatedly takes about log2(n) steps.
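Binary search is the textbook example: each step halves the remaining range, so a sorted list of a million items is searched in at most about log2(1,000,000) ≈ 20 steps. A small Python sketch, with a step counter added purely for illustration:

```python
import math

def binary_search(items, target):
    """Find target in a sorted list, counting how many halving steps it takes."""
    lo, hi, steps = 0, len(items) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2      # split the remaining range in half
        if items[mid] == target:
            return mid, steps
        if items[mid] < target:
            lo = mid + 1          # keep the upper half
        else:
            hi = mid - 1          # keep the lower half
    return -1, steps

data = list(range(1_000_000))
index, steps = binary_search(data, 314_159)
print(index, steps, math.ceil(math.log2(len(data))))
```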

**In Machine Learning**

In machine learning it's a similar case: sometimes values are so extreme that they bias the learning algorithm too much. If we convert them to a log scale, the relationships are still kept and the machine learning algorithm can find and learn the other important relationships in the data.

Another way to look at it is that many machine learning algorithms work better on linearly separable or linearly related data. If a logarithm transformation adds linearity to the data, you may be able to use a wider range of machine learning algorithms; regression models, for example, work better on linear data.
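To see how a log transform adds linearity, take a made-up exponential relationship y = 3 · 2^x (a hypothetical example, not from the source). In log space it becomes a straight line, because log2(y) = log2(3) + x:

```python
import math

# Hypothetical exponential relationship: y = 3 * 2**x.
xs = list(range(10))
ys = [3 * 2 ** x for x in xs]

# In log space the curve becomes a straight line with slope 1:
# consecutive differences of log2(y) are all constant.
log_ys = [math.log2(y) for y in ys]
diffs = [b - a for a, b in zip(log_ys, log_ys[1:])]
print(diffs)
```

A plain linear regression would fit the log-transformed points perfectly, while it would fit the raw exponential curve badly.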

Sometimes log-style transformations are used as squashing/activation functions in artificial neural networks, flattening values into certain ranges. Check out this highlighted table:
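The table itself isn't reproduced here, but a canonical squashing function is the logistic sigmoid, which maps any real input into the range (0, 1); this is a standard example, not necessarily the one the table highlights. A minimal sketch:

```python
import math

def sigmoid(x):
    """Logistic sigmoid: squashes any real number into the open range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Extreme inputs get flattened toward 0 or 1; moderate inputs stay in between.
for x in (-100, -1, 0, 1, 100):
    print(x, round(sigmoid(x), 4))
```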

So logarithms are often used to massage numbers to make computation and understanding of data easier.

**References**

https://en.wikipedia.org/wiki/Logarithm

http://proceedings.mlr.press/v15/glorot11a/glorot11a.pdf

https://www.quora.com/Why-squashing-function-is-important-in-neural-network