9
$\begingroup$

I'm working on a piano tuning program and part of it requires real-time pitch detection. Here is the scheme I have so far which works to some degree but could probably use some refinement.

I'm capturing mono, 44.1 kHz, 16-bit PCM audio in chunks of 2^14 samples. I combine the last 4 chunks into a buffer of length 2^16, apply a Hann window to the buffer, and run an FFT on it. Then I bucketize the FFT results at two resolutions. First, I bucketize into 200 buckets and run the HPS (harmonic product spectrum) pitch detection algorithm at this granularity. I don't need an exact frequency here; I just want to get close. Then I bucketize into 12000 buckets, which gives me 1 cent resolution from 10 Hz to 10 kHz. Once I know an approximate frequency from the 200-bin HPS step, I search that range of the 12000-bin version for a peak to get a more exact frequency.
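For reference, the numbers implied by this setup can be worked out directly. The snippet below is only a sketch of that arithmetic; the helper names (`bin_spacing_hz`, `cents_between`, `bin_width_in_cents`) are made up for illustration and are not part of the program described here.

```python
import math

def bin_spacing_hz(sample_rate, fft_size):
    """Spacing between adjacent FFT bins, in Hz."""
    return sample_rate / fft_size

def cents_between(f_low, f_high):
    """Interval from f_low up to f_high in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f_high / f_low)

def bin_width_in_cents(freq_hz, spacing_hz):
    """Width in cents of one FFT bin whose lower edge is at freq_hz."""
    return cents_between(freq_hz, freq_hz + spacing_hz)

SAMPLE_RATE = 44100      # Hz, as stated in the question
FFT_SIZE = 2 ** 16       # four chunks of 2^14 samples

spacing = bin_spacing_hz(SAMPLE_RATE, FFT_SIZE)
buffer_seconds = FFT_SIZE / SAMPLE_RATE

print(f"bin spacing: {spacing:.3f} Hz, buffer length: {buffer_seconds:.2f} s")
print(f"10 Hz to 10 kHz spans {cents_between(10, 10000):.0f} cents")
for f in (27.5, 440.0):  # A0 and A4
    width = bin_width_in_cents(f, spacing)
    print(f"one FFT bin at {f} Hz is about {width:.1f} cents wide")
```

Two things stand out. The 2^16 buffer holds roughly 1.49 s of audio, so it takes about that long before the buffer contains only the newly struck note. Also, one FFT bin is about 0.67 Hz wide, which is roughly 42 cents at 27.5 Hz but under 3 cents at 440 Hz, so in the bass many adjacent 1-cent buckets are fed by the same FFT bin.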

This seems to work okay for the notes in the middle of the keyboard. For the low notes, the note is misidentified for about 1.5 s, usually as the 2nd or 3rd partial of the real note, and only then identified correctly.

In all of the spectral plots I created to see what is going on, the peaks are wider than I would expect. This width is visually somewhat consistent between the 200-bin and 12000-bin cases. I would have expected the peaks to be narrower in the 200-bin case.

Signal processing is new to me, so there may be problems I wouldn't think to ask about. In terms of specific questions: Are the sample sizes sufficient for this task? Is Hann the right choice of window? Should I also smooth the data before the FFT? How sensitive is HPS to the number of bins? I was thinking that if I used a lot of bins, inharmonicity might keep the partials from lining up with the fundamental under the HPS algorithm's simple approach of dividing by 2, 3, 4, etc.
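To make the "dividing by 2, 3, 4" step concrete, here is a minimal sketch of a textbook harmonic product spectrum applied directly to FFT bins. It is not the bucketized version described above; the function name `hps_pitch` and its parameters are assumptions chosen for illustration.

```python
import numpy as np

def hps_pitch(samples, sample_rate, num_harmonics=4, f_min=20.0):
    """Estimate the fundamental with a basic harmonic product spectrum.

    Returns the center frequency (Hz) of the winning FFT bin, so the
    result is only as fine as the FFT bin spacing.
    """
    windowed = samples * np.hanning(len(samples))
    spectrum = np.abs(np.fft.rfft(windowed))

    # Only bins whose num_harmonics-th multiple still exists can be scored.
    usable = len(spectrum) // num_harmonics
    product = spectrum[:usable].copy()
    for h in range(2, num_harmonics + 1):
        # Downsample by h: element k of spectrum[::h] is original bin k*h,
        # so the h-th harmonic lines up with its candidate fundamental.
        product *= spectrum[::h][:usable]

    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    lo = int(np.searchsorted(freqs, f_min))  # skip DC and sub-audio bins
    peak = lo + int(np.argmax(product[lo:usable]))
    return freqs[peak]

if __name__ == "__main__":
    sr = 44100
    t = np.arange(2 ** 14) / sr
    # Synthetic, perfectly harmonic test tone (a real piano string is not).
    tone = sum(np.sin(2 * np.pi * 220.0 * h * t) / h for h in range(1, 6))
    print(hps_pitch(tone, sr))
```

Because the downsampling assumes partials at exact integer multiples, an inharmonic partial can land in a neighboring bin when the bins are narrow, which is the sensitivity to bin count asked about above.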

$\endgroup$
3
  • $\begingroup$ Would the constant-Q transform be of any use for this application? wellesley.edu/Physics/brown/pubs/cq1stPaper.pdf $\endgroup$
    – Atul Ingle
    Jan 3, 2013 at 4:50
  • $\begingroup$ Just curious: what kind of hardware device do you use for signal intake, a regular mic? $\endgroup$
    – amphibient
    Jan 4, 2013 at 19:09
  • $\begingroup$ I'm using a Samson CO1U microphone. $\endgroup$
    – DrTodd13
    Jan 5, 2013 at 17:19

5 Answers 5

6
$\begingroup$

Similar to this thread:

Is there an algorithm for finding a frequency without DFT or FFT?

FFT isn't a particularly efficient way of building a tuner. Better (and cheaper) methods include auto-correlation, phase-locked loops, delay-locked loops, etc.

One example is to track local maxima and minima to roughly home in on the fundamental frequency, and then use a local oscillator and phase-locked loop to track this frequency precisely. This can track a moving fundamental during tuning quickly, continuously, and with great accuracy, even if the frequency is low and the fundamental is weak.

$\endgroup$
1
  • $\begingroup$ Well, I was using FFT elsewhere in the program for inharmonicity measurements and partial matching computations. So, it was just easiest for me to re-use it for this purpose as well. I'm not too concerned with cheap, but if phase-locked loops are better I'll check them out. Given that this isn't my area of expertise, implementing some of these things can seem impenetrable. $\endgroup$
    – DrTodd13
    Jan 3, 2013 at 3:36
2
$\begingroup$

A search for 'piano tuning software' or similar items will yield a large number of hits – some good, some not so good.

Every type of musical instrument has unique acoustic/physical/environmental characteristics that affect its sound. And it can get complicated, as thousands of books and research papers would suggest (eg: tonality, attack/decay characteristics, inharmonicity, etc.).

Pitch detection is itself a wide-ranging field. The following is but a tiny fraction of what's available: overview article 1, stack exchange post, and overview article 2.

As for your specific questions: 1) Your sample size seems like overkill. Depending on SNR and waveform stability, other methods (some of them FFT-based) can give high frequency accuracy from fewer cycles. With such a long sample time you may also be capturing the attack and decay. 2) Any window other than rectangular will broaden the main lobe of each peak in the frequency domain, but that doesn't mean you shouldn't use one; Hann seems common with HPS, from what I've seen. 3) As noted in the first link above, HPS doesn't work very well at low frequencies, and inharmonicity will affect you on the lower strings. As for your overall method, without writing a lot of pages, I can only say that I would do it differently, depending on the frequency range and harmonics I was dealing with.

$\endgroup$
2
  • $\begingroup$ Some of the cepstrum variants look interesting to try so I will start with that. Perhaps I should discard the portion of the samples that correspond to the "attack"/hammer strike. Does anybody happen to know how long it takes the note to reach a somewhat steady state or there is a way of characterizing the initial state so I can filter it? $\endgroup$
    – DrTodd13
    Jan 3, 2013 at 17:03
  • $\begingroup$ The info is out there, but it may be difficult to find. Look, for example, at this thread from the 'Piano World' forum on 'attack' characteristics: pianoworld.com/forum/ubbthreads.php/topics/1125286/What%20is $\endgroup$ Jan 4, 2013 at 5:15
2
$\begingroup$

Another answer suggests PLL. I think you should stay away from PLL: most literature on pitch tracking focuses on auto-correlation (search for "YIN Pitch Tracking" -- YIN is a modern pitch tracking algorithm based on auto-correlation) and FFT. I believe PLL is more suited to tracking tiny fluctuations in frequency, like with radio.

Auto-correlation is a good place to start. It is fast, efficient and accurate. However, there are tricks for making FFT very accurate and fast (most techniques that use the FFT just look at the magnitude, but you can also use the phase information), so if you are familiar with FFTs you can use that technique as well.
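As a rough illustration of the auto-correlation approach, here is a minimal sketch. It is a plain auto-correlation peak picker, not YIN, and the function name `autocorr_pitch` and its default limits are assumptions for the example only.

```python
import numpy as np

def autocorr_pitch(samples, sample_rate, f_min=25.0, f_max=5000.0):
    """Estimate pitch from the strongest auto-correlation peak.

    Returns a frequency in Hz, or None if no usable peak is found.
    """
    x = samples - np.mean(samples)
    n = len(x)

    # Auto-correlation via FFT, zero-padded to 2n to avoid circular wrap.
    spectrum = np.fft.rfft(x, 2 * n)
    acf = np.fft.irfft(spectrum * np.conj(spectrum))[:n]

    # Skip the lobe around lag 0: start at the first negative value.
    negative = np.flatnonzero(acf < 0)
    if negative.size == 0:
        return None
    lag_min = max(int(sample_rate / f_max), int(negative[0]))
    lag_max = min(int(sample_rate / f_min), n - 2)
    if lag_min >= lag_max:
        return None

    # This (biased) auto-correlation shrinks with lag, so the first
    # period usually beats its multiples.
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))

    # Parabolic interpolation around the peak for sub-sample resolution.
    a, b, c = acf[lag - 1], acf[lag], acf[lag + 1]
    denom = a - 2.0 * b + c
    offset = 0.5 * (a - c) / denom if denom != 0 else 0.0
    return sample_rate / (lag + offset)

if __name__ == "__main__":
    sr = 44100
    t = np.arange(8192) / sr
    tone = sum(np.sin(2 * np.pi * 110.0 * h * t) / h for h in range(1, 6))
    print(autocorr_pitch(tone, sr))
```

Note that this works on well under a quarter of a second of audio in the demo, which is one reason auto-correlation is attractive for low notes.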

If you do use either of these techniques, I suggest prefiltering with a lowpass to reduce harmonics and focus on the fundamental. With the FFT, you can, instead, or in addition, use tricks like looking at the first local maximum.

This might be a good starting point for filtering and so on. It will also give you some tips about avoiding doing too much work, and it links to source code: http://blog.bjornroche.com/2012/07/frequency-detection-using-fft-aka-pitch.html

This book has sections that explain both YIN and FFT using phase information: http://www.amazon.com/DAFX-Digital-Udo-ouml-lzer/dp/0470665998

Finally, you'll have to understand the specifics of pianos. I'm not sure if the tuner itself needs to do anything special WRT, eg, stretched tuning, or if that's left up to the person tuning the piano, but you'll need to at least understand that stuff. Another poster suggested looking at out of tune harmonics, but the main issue is to identify and tune the fundamental, so the harmonics being out of tune should not matter as long as you properly identify the fundamental.

$\endgroup$
3
  • $\begingroup$ Nice information, thanks! What got me interested was a paper called "Entropy-based Tuning of Musical Instruments." It uses a measure of entropy to compute a tuning for a given piano based on the piano's specific inharmonicities. I was trying to first duplicate the results from the paper and then go from there. Once/if that is successful, then I can use what this post is about to tune the piano to the computed tuning. When you talk about using FFT phase, is that an output that is typically discarded that I could use, or something internal? I'm using someone else's FFT package. $\endgroup$
    – DrTodd13
    Jan 3, 2013 at 17:53
  • $\begingroup$ I'm not familiar with those techniques (it sounds interesting though). I would start with standard techniques before moving into that domain. Although I would pick a standard technique that most closely resembles the advanced technique you want to emulate. $\endgroup$ Jan 3, 2013 at 19:10
  • $\begingroup$ FFT outputs are usually in real and imaginary parts. You can translate this into magnitude and phase in the usual way ( real+imaginary and mag&phase are both valid representations of complex numbers). How to use this for pitch tracking is subtle -- you'll have to read the DAFX book for the deets. $\endgroup$ Jan 3, 2013 at 19:11
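To illustrate the real/imaginary to magnitude/phase conversion mentioned in the comment above, here is a small NumPy sketch. The second function shows one common way to use phase, a phase-vocoder style estimate from two overlapping frames; this is an assumption about what "using the phase" might look like and not necessarily the method the DAFX book describes. Both function names are made up for the example.

```python
import numpy as np

def magnitude_and_phase(samples):
    """Split an FFT result into magnitude and phase (radians)."""
    spectrum = np.fft.rfft(samples)   # complex values: real + 1j * imag
    magnitude = np.abs(spectrum)      # sqrt(real^2 + imag^2)
    phase = np.angle(spectrum)        # atan2(imag, real)
    return magnitude, phase

def refine_frequency(samples, sample_rate, fft_size, hop, bin_index):
    """Refine a bin's frequency from its phase advance between two frames.

    Needs len(samples) >= fft_size + hop. Only valid while the true
    frequency is within fft_size / (2 * hop) bins of bin_index.
    """
    window = np.hanning(fft_size)
    frame_a = np.fft.rfft(samples[:fft_size] * window)
    frame_b = np.fft.rfft(samples[hop:hop + fft_size] * window)

    # Phase advance expected if the tone sat exactly on the bin center.
    expected = 2.0 * np.pi * bin_index * hop / fft_size
    delta = np.angle(frame_b[bin_index]) - np.angle(frame_a[bin_index]) - expected
    delta = (delta + np.pi) % (2.0 * np.pi) - np.pi  # wrap to [-pi, pi)

    # Convert the leftover phase into a fractional bin offset.
    fractional_bin = bin_index + delta * fft_size / (2.0 * np.pi * hop)
    return fractional_bin * sample_rate / fft_size

if __name__ == "__main__":
    sr, n, hop = 44100, 4096, 1024
    t = np.arange(n + hop) / sr
    tone = np.sin(2 * np.pi * 441.3 * t)
    mag, _ = magnitude_and_phase(tone[:n] * np.hanning(n))
    k = int(np.argmax(mag))
    print(k * sr / n, refine_frequency(tone, sr, n, hop, k))
```

The demo prints the coarse bin-center frequency next to the phase-refined one, so the improvement over magnitude-only peak picking is visible for a steady tone.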
2
$\begingroup$

The wide peaks you see may be the result of physical phenomena, not a signal processing artifact. In general, a narrow peak in an FFT result represents an unmodulated sinusoid that is exactly periodic in the time-domain window. But piano string vibrations are not that stationary. They evolve over time, creating a noticeable modulation.

Several effects contribute: multiple piano strings per note exchange energy through the soundboard; the total vibration energy decays over time; the vibration modes may be slightly inharmonic to start with; the exact frequency of vibration of each mode (harmonic) may change as its amplitude decays, due to non-zero string stiffness and diameter; and each harmonic may decay at a different rate, etc.

You may have to decide which of these multiple modulations you want to call "the pitch" (books on audiology may help), and find a method to better track it inside the FFT's "wide peak".

$\endgroup$
1
$\begingroup$

With the lower notes of pianos, especially uprights, the spectrum tends to be stretched out (the distance between the fundamental and the first overtone is a little more than an octave, and so on). This is what gives pianos their percussive sound: the lowest note on older uprights will often sound more like a thud than a note, and as I understand it this is why cheap uprights have their particular percussive honky-tonk sound. Because of this, good piano tuners (the people, not the algorithms) will tune the lower notes by their lower overtones rather than by the fundamental; the human ear tends to focus on the interaction of the lower overtones for these notes. The stretching of the harmonic series could also be the cause of the wider-than-expected peaks in the spectrum.
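The stretching described here is usually modeled with the standard stiff-string formula, where B is the string's inharmonicity coefficient and f_0 is the fundamental the string would have with no stiffness. The value of B used in the example is purely illustrative, not a measurement of any particular piano.

```latex
f_n = n f_0 \sqrt{1 + B n^2}, \qquad n = 1, 2, 3, \dots

\frac{f_2}{f_1} = 2 \sqrt{\frac{1 + 4B}{1 + B}} > 2 \quad \text{for } B > 0

\text{Example with } B = 3 \times 10^{-4}: \quad
\frac{f_2}{f_1} \approx 2.0009 \;(\approx 0.8 \text{ cents wide of an octave}), \qquad
\frac{f_8}{8 f_1} = \sqrt{\frac{1 + 64B}{1 + B}} \approx 1.0094 \;(\approx 16 \text{ cents sharp})
```

Under this model the offset grows with the square of the partial number, so it matters most when comparing higher partials against integer multiples of the fundamental.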

$\endgroup$
1
  • $\begingroup$ Well, inharmonicity doesn't equal variability. Inharmonicity would cause the peak to be at a different point but I don't see why it would make the peak wider. Perhaps peak width is due to inherent signal variance over time and so shortening the sample period would reduce the variance? $\endgroup$
    – DrTodd13
    Jan 3, 2013 at 17:01
