Find an audio sample in an audio file (spectrogram already exists)

2023-09-04 02:28:59 Author: 血染疆场

I am trying to achieve the following:

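Using Skype, call my mailbox (works).
Enter the password and tell the mailbox that I want to record a new welcome message (works).
Now the mailbox tells me to record the new welcome message after the beep.
I want to wait for the beep and then play the new message (doesn't work).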

How I tried to achieve the last point:

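Create a spectrogram using an FFT and a sliding window (works).
Create a fingerprint for the beep.
Search for that fingerprint in the audio coming from Skype.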
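For readers following along, the first step (FFT over a sliding window) might look roughly like the sketch below. It assumes NumPy; the function name `spectrogram`, the Hann window, the frame and hop sizes, and the synthetic 1 kHz tone are all illustrative choices, not taken from the question.

```python
import numpy as np

def spectrogram(samples, frame_len=1024, hop=512):
    """Magnitude spectrogram: one row per window, one column per frequency bin."""
    samples = np.asarray(samples, dtype=float)
    window = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    n_frames = 1 + (len(samples) - frame_len) // hop
    frames = np.stack([
        samples[i * hop:i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))

# Synthetic stand-in for the beep (assumption): 0.5 s of a 1 kHz tone at 8 kHz.
rate = 8000
t = np.arange(rate // 2) / rate
beep = np.sin(2 * np.pi * 1000 * t)

spec = spectrogram(beep)
peak_bin = int(spec.mean(axis=0).argmax())   # strongest bin on average
peak_hz = peak_bin * rate / 1024             # bin index -> frequency in Hz
```

Because the values are floating-point magnitudes that depend on where each window happens to fall, two spectrograms of the same sound will rarely be bit-identical, which is consistent with the mismatch described in the question.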

The problem I am facing is this: the FFT results for the Skype audio and for the reference beep are not identical in a digital sense. They are similar but not the same, even though the beep was extracted from a recording of the Skype audio. The following picture shows the spectrogram of the beep from the Skype audio on the left and the spectrogram of the reference beep on the right. As you can see, they are very similar, but not the same...

I don't know how to continue from here. Should I average it, i.e. divide the spectrogram into columns and rows and compare the averages of those cells, as described here? I am not sure this is the best way, because the author already states that it doesn't work very well with short audio samples, and the beep is less than a second long...

Any hints on how to proceed?

Recommended answer

You should determine the peak frequency and the duration, and possibly a minimum power at that frequency over that duration (RMS being the simplest measure).
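A minimal sketch of that idea, assuming NumPy: scan fixed-size frames, and report the first run of frames whose peak frequency is close to the expected beep frequency and whose RMS is above a floor for long enough. The function name `find_tone`, every threshold, and the synthetic test signal are assumptions for illustration; the real beep frequency would have to be read off the spectrogram.

```python
import numpy as np

def find_tone(samples, rate, target_hz, tol_hz=30.0, min_rms=0.1,
              min_dur=0.2, frame_len=512):
    """Return (start_s, end_s) of the first run of frames whose peak frequency
    is within tol_hz of target_hz and whose RMS is at least min_rms, lasting
    at least min_dur seconds; None if no such run exists."""
    samples = np.asarray(samples, dtype=float)
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / rate)
    n_frames = len(samples) // frame_len
    run_start = None
    for i in range(n_frames + 1):  # one extra iteration closes a trailing run
        hit = False
        if i < n_frames:
            frame = samples[i * frame_len:(i + 1) * frame_len]
            rms = np.sqrt(np.mean(frame ** 2))
            peak_hz = freqs[np.abs(np.fft.rfft(frame * window)).argmax()]
            hit = rms >= min_rms and abs(peak_hz - target_hz) <= tol_hz
        if hit and run_start is None:
            run_start = i
        elif not hit and run_start is not None:
            start_s = run_start * frame_len / rate
            end_s = i * frame_len / rate
            if end_s - start_s >= min_dur:
                return start_s, end_s
            run_start = None  # run too short, keep looking
    return None

# Synthetic test signal (assumption): silence, a 0.5 s beep at 1 kHz, silence.
rate = 8000
t = np.arange(int(0.5 * rate)) / rate
signal = np.concatenate([
    np.zeros(rate),
    0.8 * np.sin(2 * np.pi * 1000 * t),
    np.zeros(rate // 2),
])
result = find_tone(signal, rate, target_hz=1000)
```

The reported times are only accurate to about one frame, which is usually enough to know when the beep has ended.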

This should be easy enough to measure. To make things even more clever (but probably completely unnecessary for this simple matching task), you could assert the absence of other peaks during the window of the beep.
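One way to express the "no other peaks" check, sketched with NumPy: require that most of a frame's spectral energy sits near the expected frequency. The name `is_single_tone`, the 50 Hz band, and the 90% share are assumed values, not something the answer specifies.

```python
import numpy as np

def is_single_tone(frame, rate, target_hz, width_hz=50.0, min_share=0.9):
    """True if at least min_share of the frame's spectral energy lies within
    width_hz of target_hz, i.e. no other significant peak is present."""
    frame = np.asarray(frame, dtype=float)
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / rate)
    total = power.sum()
    if total == 0.0:
        return False  # silent frame: nothing to match
    near = power[np.abs(freqs - target_hz) <= width_hz].sum()
    return bool(near / total >= min_share)

# Illustrative frames (assumption): a pure 1 kHz tone, and the same tone
# mixed with an equally loud 2.5 kHz tone.
rate = 8000
t = np.arange(1024) / rate
pure = np.sin(2 * np.pi * 1000 * t)
mixed = pure + np.sin(2 * np.pi * 2500 * t)
```

Here `is_single_tone(pure, rate, 1000)` holds while `is_single_tone(mixed, rate, 1000)` does not, since half of the mixed frame's energy sits at 2.5 kHz.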

To compare a complete audio fragment, you'll want to use a convolution algorithm. I suggest using a ready-made library implementation instead of rolling your own.
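For locating a fragment, the convolution is typically applied as a cross-correlation, which is the same operation with the template reversed in time. The sketch below uses NumPy's `np.convolve` and normalizes the result so the score does not depend on volume. The function name `best_match`, the noise level, and the synthetic tone are assumptions for illustration.

```python
import numpy as np

def best_match(recording, template):
    """Return (offset, score): the sample offset where template correlates
    best with recording, and the normalized correlation there (at most 1)."""
    recording = np.asarray(recording, dtype=float)
    template = np.asarray(template, dtype=float)
    # Cross-correlation equals convolution with the time-reversed template.
    corr = np.convolve(recording, template[::-1], mode="valid")
    # Energy of each template-length window of the recording.
    window_energy = np.convolve(recording ** 2, np.ones(len(template)),
                                mode="valid")
    norm = np.sqrt(window_energy * np.sum(template ** 2))
    score = np.divide(corr, norm, out=np.zeros_like(corr), where=norm > 1e-12)
    offset = int(np.argmax(score))
    return offset, float(score[offset])

# Synthetic data (assumption): faint noise with a 0.1 s, 1 kHz tone at sample 3000.
rate = 8000
rng = np.random.default_rng(0)
recording = 0.01 * rng.standard_normal(rate)
template = np.sin(2 * np.pi * 1000 * np.arange(800) / rate)
recording[3000:3800] += template

offset, score = best_match(recording, template)
```

A score near 1 means a close match even when the samples are not bit-identical. For a pure tone the score also peaks at offsets one period apart, so the offset is best treated as approximate.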

The most common fast convolution algorithms use fast Fourier transform (FFT) algorithms via the circular convolution theorem. Specifically, the circular convolution of two finite-length sequences is found by taking an FFT of each sequence, multiplying pointwise, and then performing an inverse FFT. Convolutions of the type defined above are then efficiently implemented using that technique in conjunction with zero-extension and/or discarding portions of the output. Other fast convolution algorithms, such as the Schönhage–Strassen algorithm, use fast Fourier transforms in other rings.
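The passage above can be illustrated in a few lines of NumPy: zero-extend both sequences to the length of the linear convolution, multiply their FFTs pointwise, and invert. The function name `fft_convolve` and the sample values are illustrative only.

```python
import numpy as np

def fft_convolve(a, b):
    """Linear convolution of two real 1-D sequences via the FFT."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(a) + len(b) - 1  # length of the linear convolution
    # Zero-extend both inputs to n so circular wrap-around cannot occur.
    spectrum = np.fft.rfft(a, n) * np.fft.rfft(b, n)
    return np.fft.irfft(spectrum, n)

a = [1.0, 2.0, 3.0]
b = [0.0, 1.0, 0.5]
direct = np.convolve(a, b)   # direct summation, for comparison
fast = fft_convolve(a, b)    # agrees with direct up to rounding error
```

For short sequences the direct form is fine. The FFT route pays off when both the recording and the template are long.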

Wikipedia lists http://freeverb3.sourceforge.net as an open source candidate.

Edit: Added link to API tutorial page: http://freeverb3.sourceforge.net/tutorial_lib.shtml

http://en.wikipedia.org/wiki/Finite_impulse_response

http://dspguru.com/dsp/faqs/fir

Existing packages with relevant tools on Debian:

brutefir - a software convolution engine
jconvolver - Convolution reverb Engine for JACK
libzita-convolver2 - C++ library implementing a real-time convolution matrix
teem-apps - Tools to process and visualize scientific data and images - command line tools
teem-doc - Tools to process and visualize scientific data and images - documentation
libteem1 - Tools to process and visualize scientific data and images - runtime
yorick-yeti - utility plugin for the Yorick language