How to recognize UI elements in an image?

2023-09-11 23:23:15 Author: 来看看有没有

I am trying to make an automator tool and am experimenting with a type of recording that takes screenshots and records user input. The idea is for the user to take a snapshot and highlight a square on it around, say, the "Submit" button. During playback, the program would take a screenshot of the open window and find the coordinates of the button by searching for the snapshot. So I need an algorithm that searches an image for an exact (or very close) copy of the button image. The algorithms I've found so far compare the likeness of whole images but cannot locate a subimage, and object recognition algorithms seem over the top, considering that the "object" I'm trying to find will be a near-perfect match. Any ideas?

Recommended answer

What you need is an efficient feature extraction method. This will depend on what you're looking for, but let's assume you're looking for the Send button in this image:

One of the characteristic features of this button is that it includes a pair of parallel line segments at the top and bottom. The same applies to the two text input fields, but for the button, the vertical distance between the two segments is exactly 17 pixels.

This is what you get if you take the pixel-wise maximum of the source image and a copy of itself shifted vertically by 17 pixels:
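As a rough illustration of that step, here is a minimal sketch in plain Python. It assumes the screenshot has already been converted to a greyscale list of rows (0 = black, 255 = white); the function name `shifted_max` and that data layout are my own choices for the example, not something from a particular library.

```python
def shifted_max(img, dy=17):
    """Pixel-wise maximum of img and a copy of itself shifted up by dy rows.

    img is a list of rows of grey levels (0 = black, 255 = white).
    A pixel in the result is dark only if both img[y][x] and
    img[y + dy][x] are dark, so two dark horizontal edges that are
    exactly dy pixels apart collapse into a single dark line.
    """
    height = len(img)
    return [
        [max(a, b) for a, b in zip(img[y], img[y + dy])]
        for y in range(height - dy)
    ]
```

The result has `dy` fewer rows than the input, and a dark line found at row `y` corresponds to a button whose top edge is at row `y` and whose bottom edge is at row `y + dy`. Edges with any other spacing, such as those of the text fields, are wiped out by the maximum.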

The Send button now appears as a solid horizontal line. You can detect this quite easily by thresholding the image and looking for an unbroken sequence of black pixels. Just for reference, here's what I obtained after applying a 10px horizontal motion blur and thresholding at a grey level of 128:
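The blur, threshold and run search could look roughly like the sketch below. It uses a simple horizontal box average as a stand-in for the motion blur, and the `min_length` parameter (how long a dark run must be to count as a candidate) is a hypothetical value you would tune to the width of your button. The input is assumed to be a greyscale list of rows, for example the output of the shifted-maximum step.

```python
def horizontal_blur(row, width=10):
    """Average each pixel with its neighbours along the row (box blur)."""
    n = len(row)
    out = []
    for x in range(n):
        lo = max(0, x - width // 2)
        hi = min(n, lo + width)
        window = row[lo:hi]
        out.append(sum(window) / len(window))
    return out


def dark_runs(img, threshold=128, min_length=20, blur_width=10):
    """Return (row, start_x, length) for each long unbroken dark run.

    A pixel counts as dark if its blurred grey level is below threshold.
    The blur suppresses isolated dark specks that are much shorter than
    blur_width, so only line-like structures survive.
    """
    runs = []
    for y, row in enumerate(img):
        blurred = horizontal_blur(row, blur_width)
        start = None
        # The trailing 255 is a sentinel that closes a run ending at the edge.
        for x, value in enumerate(blurred + [255]):
            if value < threshold:
                if start is None:
                    start = x
            elif start is not None:
                if x - start >= min_length:
                    runs.append((y, start, x - start))
                start = None
    return runs
```

Each returned tuple gives a candidate position: the row and starting column of the line, plus its length, which should be close to the button's width. Note that the blur shifts the detected ends of a run by a pixel or so.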

This process will identify candidate positions quite quickly. You can then subject these locations to stronger techniques like 2D convolution and OCR without too much loss of performance.
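Since the button in the question is expected to be a near-perfect match, one simple way to verify the candidates is to compare the recorded snapshot directly against the screenshot at each candidate position. The sketch below uses a mean absolute difference rather than a true 2D convolution or OCR, so treat it as a simpler substitute. The names `mean_abs_diff` and `best_candidate` and the `max_diff` tolerance are assumptions made for the example.

```python
def mean_abs_diff(screen, template, top, left):
    """Average absolute grey-level difference between template and the
    same-sized patch of screen whose top-left corner is (top, left)."""
    total = 0
    for ty, trow in enumerate(template):
        srow = screen[top + ty]
        for tx, tval in enumerate(trow):
            total += abs(srow[left + tx] - tval)
    return total / (len(template) * len(template[0]))


def best_candidate(screen, template, candidates, max_diff=10.0):
    """Return the (top, left) candidate that matches template best,
    or None if no candidate differs by max_diff or less on average."""
    th, tw = len(template), len(template[0])
    best, best_score = None, max_diff
    for top, left in candidates:
        # Skip candidates whose patch would fall outside the screenshot.
        if top < 0 or left < 0:
            continue
        if top + th > len(screen) or left + tw > len(screen[0]):
            continue
        score = mean_abs_diff(screen, template, top, left)
        if score <= best_score:
            best, best_score = (top, left), score
    return best
```

Because only a handful of candidate positions are checked instead of every pixel in the screenshot, the per-pixel comparison stays cheap even in plain Python.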