Processing Music: The Basic Problem

Nick, Apr 21, 2020, 3:51:56 PM

One common way to represent the frequency content of music is a spectrogram. A spectrogram is a graph with time on the x axis, frequency on the y axis, and color showing the magnitude. You might be wondering how it's computed, but that's a topic for a different article.

A standard one is posted below:

Mel Spectrogram

Here you can clearly see the fundamental pitch (usually the lowest bar on the graph, and also the most intensely colored), and it's pretty clear that the partials above it sit at integer multiples of the fundamental. It also looks like there are vertical columns, and we should turn our attention to these.
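To make the "integer multiples" point concrete, here's a minimal NumPy sketch. It doesn't use the spectrogram above. It synthesizes a tone and looks at its spectrum instead. The function name `partial_frequencies`, the 220 Hz fundamental, the 8000 Hz sample rate, and the 1/k amplitude falloff are all assumptions picked for illustration, not values taken from the image.

```python
import numpy as np

def partial_frequencies(f0=220.0, n_partials=4, sr=8000, duration=1.0):
    """Synthesize a harmonic tone and return its strongest frequencies in Hz.

    All defaults are illustrative assumptions, not values from the article.
    """
    t = np.arange(int(sr * duration)) / sr
    # Fundamental plus partials at integer multiples, each quieter than the last.
    signal = sum(np.sin(2 * np.pi * f0 * k * t) / k
                 for k in range(1, n_partials + 1))
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1 / sr)
    # Keep the n_partials bins with the largest magnitude, lowest frequency first.
    strongest = np.argsort(spectrum)[-n_partials:]
    return np.sort(freqs[strongest])

peaks = partial_frequencies()
print(peaks / peaks[0])  # ratio of each peak to the fundamental
```

Real recordings are messier than this (noise, vibrato, slightly stretched partials), so treat it as a picture of the ideal case.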

If there were a way to extract the vertical columns, we could identify pitches. At least that's the goal. As we'll see in future articles, we'll need a more nuanced approach.

Let's assume we only care about the bottom line of the graph, the fundamental (this is a reasonable first guess, but it's wrong). If that's the case, we can segment the image based on the color magnitude along that line and get a rough vertical segmentation.
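As a rough sketch of that idea, here's one way it could look in plain NumPy. This is not the technique from the next article. It's just a simple threshold on the low-frequency rows. The function name `segment_columns`, the choice of the first four rows as the "bottom line", and the 50% relative threshold are all assumptions, and the spectrogram is assumed to be a 2D magnitude array with row 0 as the lowest frequency.

```python
import numpy as np

def segment_columns(spec, band_rows=slice(0, 4), rel_threshold=0.5):
    """Return (start, end) column ranges where the low band is "on".

    spec: 2D array of magnitudes, shape (frequency, time), row 0 = lowest
    frequency (an assumption; image arrays are often flipped).
    The end index is exclusive, like a Python slice.
    """
    # Strongest magnitude in the low band for each time column.
    energy = spec[band_rows].max(axis=0)
    # A column is active if it's above a fraction of the loudest column.
    active = energy > rel_threshold * energy.max()
    # Pad with False so every run has both a rising and a falling edge.
    padded = np.concatenate(([False], active, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))

# Toy "spectrogram": two notes separated by silence.
toy = np.zeros((16, 12))
toy[1, 2:5] = 1.0
toy[2, 7:10] = 0.8
print(segment_columns(toy))
```

A single global threshold like this is fragile: quiet notes fall below it and notes that run together never separate.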

In the next article, I'll show you some segmentation techniques using OpenCV, numpy, and matplotlib. I'll also show you why this approach ultimately fails.