Matching two audio files

2023-09-04 03:18:12 Author: Gentleman

I want to record a dog bark, save the file, and compare it with several files containing different types of bark (warning bark, crying bark, etc.).

How can I do that comparison in order to get a match? What process should I follow for this type of app?

Thanks for any tips.

Recommended answer

There is no simple answer to your problem. However, for starters, you might look into how audio fingerprinting works. This paper, written by the creators of Shazam, is an excellent start:

http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf

I'm not sure how well that approach would work for dog barking, but there are some concepts there that might prove useful.
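To make the fingerprinting idea concrete, here is a heavily simplified sketch, not the algorithm from the paper. It assumes NumPy is available, keeps only the single strongest frequency bin per frame (real fingerprinting systems keep many peaks and also align matches in time), and every function name and parameter below is made up for illustration.

```python
import numpy as np

def spectrogram(signal, frame_size=1024, hop=512):
    """Magnitude spectrum of overlapping, Hann-windowed frames."""
    window = np.hanning(frame_size)
    frames = [signal[i:i + frame_size] * window
              for i in range(0, len(signal) - frame_size + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

def peak_bins(spec):
    """Simplification: keep only the strongest frequency bin of each frame."""
    return np.argmax(spec, axis=1)

def fingerprints(signal, fan_out=3):
    """Hash each peak together with the next few peaks and their time gap."""
    peaks = peak_bins(spectrogram(signal))
    hashes = set()
    for t, f1 in enumerate(peaks):
        for dt in range(1, fan_out + 1):
            if t + dt < len(peaks):
                hashes.add((int(f1), int(peaks[t + dt]), dt))
    return hashes

def similarity(a, b):
    """Share of hashes the two recordings have in common (0.0 to 1.0)."""
    fa, fb = fingerprints(a), fingerprints(b)
    return len(fa & fb) / max(1, len(fa | fb))
```

Calling `similarity(recording, reference)` for each reference file gives a rough score per bark type. Because this kind of matching is designed to find the same recording again, two different barks of the same type may still score low, which is consistent with the doubt expressed above.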

Another thing to look into is how the FFT works. Here's a tutorial with code that I wrote for pitch tracking, which is one way to use the FFT. In your case, you are looking more at how the tone and pitch interact with the formant structure of a given dog. So the parameters you'll want to derive might include the fundamental pitch (which, alone, might be enough to distinguish whining from other kinds of barks), and the ratio of the fundamental pitch to the higher harmonics, which would help identify how aggressive the bark is (I'm guessing a bit here):

http://blog.bjornroche.com/2012/07/frequency-detection-using-fft-aka-pitch.html
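As a rough illustration of those two parameters, the sketch below estimates a pitch and a harmonic-to-fundamental energy ratio from one FFT. It is not the code from the tutorial. It assumes NumPy, treats the strongest spectral peak inside a guessed search band as the fundamental (which can be wrong for real barks), and the function name, band limits, and harmonic count are all hypothetical choices.

```python
import numpy as np

def spectrum_features(signal, sample_rate, fmin=100.0, fmax=2000.0):
    """Return (estimated pitch in Hz, harmonic energy / fundamental energy)."""
    n = len(signal)
    mags = np.abs(np.fft.rfft(signal * np.hanning(n)))
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)

    # Assumption: the strongest peak within [fmin, fmax] is the fundamental.
    band = (freqs >= fmin) & (freqs <= fmax)
    f0 = float(freqs[np.argmax(np.where(band, mags, 0.0))])

    def energy_near(freq, width=3):
        """Sum of squared magnitudes in a few bins around freq."""
        idx = int(round(freq * n / sample_rate))
        lo, hi = max(idx - width, 0), min(idx + width + 1, len(mags))
        return float(np.sum(mags[lo:hi] ** 2))

    fundamental = energy_near(f0)
    # Harmonics 2 to 5, skipping any that fall above the Nyquist frequency.
    harmonics = sum(energy_near(k * f0) for k in range(2, 6)
                    if k * f0 < sample_rate / 2)
    return f0, harmonics / fundamental
```

For a real recording you would probably run this on short frames around each bark rather than on the whole file, and average or track the values over time.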

Finally, you might want to do some research into basic speech recognition and speech processing, as there will be some overlap. Wikipedia will probably be enough to get you started.

Edit: Oh, also, once you've identified some parameters to use for comparison, you'll need a way to compare the parameters of your recording against the parameters of each sound in your database. I don't think the techniques in the Shazam article will work for that. One thing you could try is logistic regression. There are other options, but this is probably the simplest.
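Here is a minimal sketch of the logistic regression idea, written with plain NumPy gradient descent so nothing else is required. The feature values and labels are invented purely for illustration (pitch in Hz and a harmonic ratio, with 0 meaning "whine" and 1 meaning "warning bark"), and the function names are hypothetical. It only separates two classes; more bark types would need something like one model per type.

```python
import numpy as np

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit a two-class logistic regression by batch gradient descent."""
    mean, std = X.mean(axis=0), X.std(axis=0) + 1e-12
    Xs = (X - mean) / std              # put features on a comparable scale
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xs @ w + b)))   # predicted probabilities
        grad = p - y                               # gradient of the log loss
        w -= lr * (Xs.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b, mean, std

def predict_proba(model, features):
    """Probability that a feature vector belongs to class 1."""
    w, b, mean, std = model
    z = ((np.asarray(features, dtype=float) - mean) / std) @ w + b
    return float(1.0 / (1.0 + np.exp(-z)))

# Made-up training data: [pitch_hz, harmonic_ratio] per labelled recording.
X = np.array([[900, 0.10], [950, 0.15], [1000, 0.12],
              [300, 0.80], [350, 0.90], [280, 0.70]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1], dtype=float)
model = train_logistic(X, y)
```

With real data, `predict_proba(model, [pitch, ratio])` for a new recording gives a value near 1 when it resembles the class-1 examples and near 0 otherwise.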