FFT algorithm: what goes IN / comes OUT? (re: real-time pitch detection)

2023-09-11 03:58:17 Author: 水晶糖

I am attempting to extract pitch data from an audio stream. From what I can see, it looks as though FFT is the best algorithm to use.

Rather than digging straight into the math, could someone help me understand what this FFT algorithm does?

Please don't say something obvious like 'FFT extracts frequency data from a raw signal.' I need the next level of detail.

What do I pass in, and what do I get out?

Once I understand the interface clearly, this will help me to understand the implementation.

I take it I need to pass in an audio buffer and tell it how many bytes to use for each computation (say, the most recent 1024 bytes from this buffer), and maybe I need to specify the range of pitches I want it to detect. Now, what is it going to pass back? An array of frequency bins? What are these?

I have found a C++ algorithm to use (if only I can understand it).

Performous extracts pitch from the microphone. Also the code is open source. Here is a description of what the algorithm does, from the guy that coded it.

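1. PCM input (with buffering)
2. FFT (1024 samples at a time, removing 200 samples from the front of the buffer afterwards)
3. Reassignment method (against the previous FFT, which was 200 samples earlier)
4. Filtering of peaks (this part could be done better, or even left out)
5. Combination of peaks into sets of harmonics (we call the combination a tone)
6. Temporal filtering of tones (updating the set of tones detected earlier, instead of just using the newly detected ones)
7. Picking the best vocal tone (frequency range, weighting; the harmonic array could be used as well, but I don't think we do)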
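The buffering in steps 1 and 2 can be sketched roughly as follows. This is not Performous code: `FrameBuffer` and its member names are hypothetical, and only the two numbers (1024-sample frames, 200 samples dropped after each FFT) come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical helper (not from Performous): collects PCM samples and
// hands out overlapping frames for the FFT.
class FrameBuffer {
public:
    FrameBuffer(std::size_t frameSize, std::size_t hopSize)
        : frameSize_(frameSize), hopSize_(hopSize) {
        assert(hopSize_ > 0 && hopSize_ <= frameSize_);
    }

    // Append newly captured samples to the back of the buffer.
    void push(const std::vector<float>& samples) {
        buffer_.insert(buffer_.end(), samples.begin(), samples.end());
    }

    // If a full frame is available, copy it into `frame`, drop hopSize
    // samples from the front, and return true; otherwise return false.
    bool pop(std::vector<float>& frame) {
        if (buffer_.size() < frameSize_) return false;
        frame.assign(buffer_.begin(),
                     buffer_.begin() + static_cast<std::ptrdiff_t>(frameSize_));
        buffer_.erase(buffer_.begin(),
                      buffer_.begin() + static_cast<std::ptrdiff_t>(hopSize_));
        return true;
    }

private:
    std::size_t frameSize_;
    std::size_t hopSize_;
    std::deque<float> buffer_;
};
```

With `FrameBuffer fb(1024, 200)`, consecutive frames overlap by 824 samples, so each FFT sees mostly the same audio as the previous one, shifted by 200 samples.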

But could someone help me understand how this works? What is it that is getting sent from the FFT to the Reassignment method?

Recommended answer

There is an element of choice here. The most straightforward version to implement takes 2^n complex numbers in and gives 2^n complex numbers out, so maybe you should start with that.
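To make that interface concrete, here is a minimal sketch of a recursive radix-2 FFT with exactly that shape: a vector of 2^n complex numbers in, a vector of 2^n complex numbers out. The names `fft`, `Complex` and `binFrequency` are made up for illustration, and the code favours readability over speed.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Recursive radix-2 Cooley-Tukey FFT: 2^n complex values in, 2^n out.
std::vector<Complex> fft(const std::vector<Complex>& in) {
    const std::size_t n = in.size();
    assert(n > 0 && (n & (n - 1)) == 0);  // length must be a power of two
    if (n == 1) return in;

    // Split into even- and odd-indexed samples and transform each half.
    std::vector<Complex> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = in[2 * i];
        odd[i] = in[2 * i + 1];
    }
    const std::vector<Complex> e = fft(even);
    const std::vector<Complex> o = fft(odd);

    // Combine the halves using the twiddle factors exp(-2*pi*i*k/n).
    const double pi = std::acos(-1.0);
    std::vector<Complex> out(n);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const double angle = -2.0 * pi * static_cast<double>(k) / static_cast<double>(n);
        const Complex t = std::polar(1.0, angle) * o[k];
        out[k] = e[k] + t;
        out[k + n / 2] = e[k] - t;
    }
    return out;
}

// Centre frequency in Hz of bin k for an n-point FFT at the given sample rate.
double binFrequency(std::size_t k, std::size_t n, double sampleRate) {
    return static_cast<double>(k) * sampleRate / static_cast<double>(n);
}
```

For audio you would put each sample in the real part and leave the imaginary part at zero. Output element k is one "frequency bin": its magnitude says how much of frequency `binFrequency(k, n, sampleRate)` is present, and its argument gives the phase. Assuming, for example, 1024 samples at 44100 Hz, the bins are about 43 Hz apart, and for real input the bins above n/2 mirror the lower half.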

In the special case of a DCT (discrete cosine transform), what typically goes in is 2^n samples (often floats), and what comes out is 2^n values, often floats too. A DCT is closely related to an FFT, but it takes only real values and analyses the function in terms of cosines.
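As an illustration of the real-in, real-out shape, here is a naive sketch of one common variant, the unnormalised DCT-II. The choice of DCT-II and the name `dct2` are assumptions for this example, and the direct O(N^2) sum is only meant to show the interface, not to be fast.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Naive O(N^2) unnormalised DCT-II:
//   X[k] = sum over i of x[i] * cos(pi / N * (i + 0.5) * k)
// N real values in, N real values out.
std::vector<double> dct2(const std::vector<double>& x) {
    const std::size_t n = x.size();
    assert(n > 0);
    const double pi = std::acos(-1.0);
    std::vector<double> out(n, 0.0);
    for (std::size_t k = 0; k < n; ++k) {
        for (std::size_t i = 0; i < n; ++i) {
            out[k] += x[i] * std::cos(pi / static_cast<double>(n) *
                                      (static_cast<double>(i) + 0.5) *
                                      static_cast<double>(k));
        }
    }
    return out;
}
```

A constant input ends up entirely in `out[0]`, since every other cosine sums to zero over the frame.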

It is smart (but commonly skipped) to define a struct to handle the complex values. Traditionally FFTs are done in-place, but it works fine if you don't.

It can be useful to instantiate a class that contains a work buffer for the FFT (if you don't want to do the FFT in-place), and reuse that for several FFTs.
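One way such a class could look is sketched below. The class name `Fft` and its `transform` method are hypothetical; it uses `std::complex<double>` in place of a hand-written struct, keeps a work buffer and a twiddle-factor table as members, and runs an iterative radix-2 FFT in place on the work buffer so the caller's input is left untouched.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using Complex = std::complex<double>;

// Hypothetical reusable FFT object: owns its work buffer and twiddle table.
class Fft {
public:
    explicit Fft(std::size_t n) : work_(n), twiddle_(n / 2) {
        assert(n >= 2 && (n & (n - 1)) == 0);  // size must be a power of two
        const double pi = std::acos(-1.0);
        for (std::size_t k = 0; k < n / 2; ++k) {
            const double angle = -2.0 * pi * static_cast<double>(k) / static_cast<double>(n);
            twiddle_[k] = std::polar(1.0, angle);
        }
    }

    // Transforms `in` without modifying it. The result lives in the internal
    // work buffer and is overwritten by the next call.
    const std::vector<Complex>& transform(const std::vector<Complex>& in) {
        const std::size_t n = work_.size();
        assert(in.size() == n);
        work_ = in;  // same size as before, so typically no reallocation

        // Bit-reversal permutation.
        for (std::size_t i = 1, j = 0; i < n; ++i) {
            std::size_t bit = n >> 1;
            for (; j & bit; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) std::swap(work_[i], work_[j]);
        }

        // Butterfly passes, done in place on the work buffer.
        for (std::size_t len = 2; len <= n; len <<= 1) {
            const std::size_t step = n / len;
            for (std::size_t start = 0; start < n; start += len) {
                for (std::size_t k = 0; k < len / 2; ++k) {
                    const Complex u = work_[start + k];
                    const Complex v = work_[start + k + len / 2] * twiddle_[k * step];
                    work_[start + k] = u + v;
                    work_[start + k + len / 2] = u - v;
                }
            }
        }
        return work_;
    }

private:
    std::vector<Complex> work_;     // reused for every transform
    std::vector<Complex> twiddle_;  // exp(-2*pi*i*k/n) for k < n/2
};
```

For real-time use you would construct one `Fft` of the frame size up front and call `transform` once per frame, copying out whatever you need before the next call.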