Comparing SIFT features stored in a MySQL database
Tags: features, database, SIFT, MySQL

2023-09-11 22:47:22 Author: 锁哥

I'm currently extending an image library used to categorize images, and I want to find duplicate images, transformed images, and images that contain or are contained in other images. I have tested the SIFT implementation from OpenCV and it works very well, but it would be rather slow across many images. To speed it up, I thought I could extract the features and save them in a database, since a lot of other image-related metadata is already held there.

What would be the fastest way to compare the features of a new image to the features in the database? Usually the comparison is done by calculating the Euclidean distance using kd-trees or FLANN, or with the Pyramid Match Kernel, which I found in another thread here on SO but haven't looked into much yet.

Since I don't know of a way to save and search a kd-tree in a database efficiently, I currently see only three options:

* Let MySQL calculate the Euclidean distance to every feature in the database, although I'm sure that will take an unreasonable amount of time for more than a few images.
* Load the entire dataset into memory at the beginning and build the kd-tree(s). This would probably be fast, but very memory intensive. Plus, all the data would need to be transferred from the database.
* Save the generated trees in the database and load all of them. This would be the fastest method, but it would also generate a lot of traffic, since with each new image the kd-trees would have to be rebuilt and sent to the server.
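To make the in-memory option concrete, here is a minimal sketch of the matching step, assuming all descriptors have already been loaded from the database as `(image_id, descriptor)` pairs. It uses a brute-force scan instead of a kd-tree to keep it short, and it adds a nearest/second-nearest ratio test with a 0.8 threshold; both the ratio test and the names `match_descriptors` and `vote` are illustrative assumptions, not something specified in the question.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(query, database, ratio=0.8):
    """Return (query_index, image_id) pairs that pass the ratio test.

    `database` is a list of (image_id, descriptor) tuples held in memory.
    The ratio threshold is an assumed value, not taken from the question.
    """
    matches = []
    for qi, q in enumerate(query):
        # Distance to every stored descriptor, smallest first (brute force).
        ranked = sorted((euclidean(q, d), image_id) for image_id, d in database)
        if len(ranked) < 2:
            continue
        (best, best_id), (second, _) = ranked[0], ranked[1]
        # Keep the match only if it is clearly better than the runner-up.
        if best < ratio * second:
            matches.append((qi, best_id))
    return matches


def vote(matches):
    """Count matched descriptors per image id."""
    counts = {}
    for _, image_id in matches:
        counts[image_id] = counts.get(image_id, 0) + 1
    return counts
```

The images with the most votes would be the duplicate or containment candidates. The scan is linear in the number of stored descriptors per query descriptor, which is exactly the cost a kd-tree or FLANN index is meant to reduce.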

I'm using the SIFT implementation of OpenCV, but I'm not dead set on it. If there is a feature extractor more suitable for this task (and roughly as robust), I'd be glad if someone could suggest one.

Recommended answer

So I basically did something very similar to this a few years ago. The algorithm you want to look into was proposed a few years ago by David Nister; the paper is "Scalable Recognition with a Vocabulary Tree". They pretty much have an exact solution to your problem that can scale to millions of images.

Here is a link to the abstract; you can find a download link by googling the title. http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1641018

The basic idea is to build a tree with a hierarchical k-means algorithm to model the features, and then leverage the sparse distribution of features in that tree to quickly find your nearest neighbors... or something like that; it's been a few years since I worked on it. You can find a PowerPoint presentation on the author's webpage here: http://www.vis.uky.edu/~dnister/Publications/publications.html
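A rough sketch of the hierarchical k-means idea follows. It is not the paper's implementation: the branching factor, depth, plain Lloyd iterations, and all names (`build_tree`, `quantize`, `Node`) are assumptions made for illustration. Each leaf acts as one "visual word", and quantizing a descriptor only costs `branch` distance computations per level.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, rng, iters=10):
    """Plain Lloyd's k-means; returns (centers, clusters)."""
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


class Node:
    def __init__(self, center):
        self.center = center
        self.children = []
        self.word_id = None  # set on leaves only


def build_tree(points, branch, depth, rng, counter, center=None):
    """Recursively split `points` with k-means; leaves get consecutive word ids."""
    node = Node(center)
    if depth == 0 or len(points) < branch:
        node.word_id = counter[0]
        counter[0] += 1
        return node
    centers, clusters = kmeans(points, branch, rng)
    for c, cluster in zip(centers, clusters):
        if cluster:  # skip empty clusters
            node.children.append(
                build_tree(cluster, branch, depth - 1, rng, counter, c))
    return node


def quantize(root, descriptor):
    """Walk down the tree, following the nearest child center at each level."""
    node = root
    while node.children:
        node = min(node.children, key=lambda c: sq_dist(descriptor, c.center))
    return node.word_id
```

For example, `root = build_tree(descriptors, branch=10, depth=3, rng=random.Random(0), counter=[0])` would give up to 1000 visual words, and `quantize(root, d)` maps a new descriptor `d` to one of them; those particular values are placeholders, not recommendations from the paper.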

A few other notes:

I wouldn't bother with the pyramid match kernel; it's really more for improving object recognition than for duplicate/transformed image detection.

I would not store any of this feature stuff in an SQL database. Depending on your application, it is sometimes more effective to compute your features on the fly, since their size can exceed the original image size when computed densely. Histograms of features or pointers to nodes in a vocabulary tree are much more efficient.
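As an illustration of why histograms over tree nodes are compact, here is a small sketch of an inverted index keyed by visual word. It assumes each descriptor has already been quantized to an integer word id; the `InvertedIndex` class and the unnormalized TF-IDF style score are simplifications chosen for this example and are not claimed to match the scoring used in the paper.

```python
import math
from collections import Counter, defaultdict


class InvertedIndex:
    """Maps each visual word to the images that contain it."""

    def __init__(self):
        self.postings = defaultdict(dict)  # word_id -> {image_id: count}
        self.image_ids = set()

    def add(self, image_id, word_ids):
        """Store the word histogram of one image."""
        self.image_ids.add(image_id)
        for word, count in Counter(word_ids).items():
            self.postings[word][image_id] = count

    def query(self, word_ids):
        """Rank stored images by an IDF-weighted histogram overlap."""
        n = len(self.image_ids)
        scores = defaultdict(float)
        for word, q_count in Counter(word_ids).items():
            docs = self.postings.get(word)
            if not docs:
                continue  # word never seen in the indexed images
            # Words that occur in few images are more informative.
            idf = math.log(n / len(docs))
            for image_id, d_count in docs.items():
                scores[image_id] += q_count * d_count * idf * idf
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Only the images that share at least one word with the query are touched, so each image costs a handful of integers per word instead of a full set of 128-dimensional descriptors.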

SQL databases are not designed for doing massive floating-point vector calculations. You can store things in your database, but don't use it as a tool for computation. I tried this once with SQLite and it ended very badly.
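If you do keep descriptors in a database, one way to follow the "store, don't compute" advice is to treat each vector as an opaque blob and do all the math in application code. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL; the table name, column names, and float32 packing are assumptions made for the example.

```python
import sqlite3
import struct


def pack(descriptor):
    """Serialize a descriptor as little-endian float32 bytes."""
    return struct.pack(f"<{len(descriptor)}f", *descriptor)


def unpack(blob):
    """Inverse of pack(): bytes back to a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def create_schema(conn):
    # Hypothetical schema: one row per descriptor, vector stored as a BLOB.
    conn.execute(
        "CREATE TABLE descriptors (image_id TEXT NOT NULL, vec BLOB NOT NULL)")


def save(conn, image_id, descriptors):
    """Insert all descriptors of one image."""
    conn.executemany(
        "INSERT INTO descriptors (image_id, vec) VALUES (?, ?)",
        [(image_id, pack(d)) for d in descriptors])
    conn.commit()


def load_all(conn):
    """Fetch every descriptor once; distance math then happens in memory."""
    rows = conn.execute(
        "SELECT image_id, vec FROM descriptors ORDER BY rowid")
    return [(image_id, unpack(blob)) for image_id, blob in rows]
```

The database only ever sees inserts and a plain `SELECT`; it never evaluates a distance.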

If you decide to implement this, read the paper in detail and keep a copy handy while implementing it, as there are many minor details that are very important to making the algorithm work efficiently.