Spatial query in AWS SimpleDB (spatial, AWS, SimpleDB)

2023-09-11 11:01:32 Author: 南风知我意

What do people suggest as efficient ways of doing a spatial query in Amazon Web Services SimpleDB?

By spatial query I mean finding objects within a given radius of a latitude and longitude.

Recommended answer

SimpleDB doesn't currently offer any built-in spatial search operations, but that doesn't mean it can't be done. There are several methods of implementing geospatial searches in non-geospatially aware databases such as SimpleDB, and all of them center on the same idea: use the database to retrieve a rough first selection based on a geospatial bounding box, then filter the returned data in your application using a more accurate algorithm such as the Haversine formula.
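The two-stage idea can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names, the `lat`/`lon` item keys, and the mean Earth radius of 6371 km are assumptions made for the example, and the bounding box ignores the poles and the antimeridian.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def bounding_box(lat, lon, radius_km):
    """Return (min_lat, max_lat, min_lon, max_lon) enclosing the search circle."""
    dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    # A degree of longitude shrinks with cos(latitude); breaks down at the poles.
    dlon = math.degrees(radius_km / (EARTH_RADIUS_KM * math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(items, lat, lon, radius_km):
    """Second stage: keep only the rough candidates that are truly inside the radius."""
    return [it for it in items
            if haversine_km(lat, lon, it["lat"], it["lon"]) <= radius_km]
```

The database only has to answer the cheap rectangular question; the exact circular test runs in the application over the much smaller candidate set.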

You could store the latitude and longitude as (zero-padded and normalized) numeric attributes and then perform a double range query (lat >= minLat and lat <= maxLat and lon >= minLon and lon <= maxLon), but since neither of these predicates is selective (each predicate matches a lot of items) it's not ideal (see Tuning Queries).
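Because SimpleDB compares attribute values as strings, "zero-padded and normalized" could look something like the sketch below. The offsets (90 and 180), the six decimal places, and the function names are assumptions chosen for illustration; the only requirement is that string order matches numeric order.

```python
def encode_coordinate(value, offset, decimals=6, width=9):
    """Shift the value to be non-negative, scale it to an integer, and zero-pad it."""
    scaled = round((value + offset) * 10 ** decimals)
    return str(scaled).zfill(width)


def encode_lat(lat):
    # Latitude spans -90..90, so adding 90 maps it to 0..180.
    return encode_coordinate(lat, 90.0)


def encode_lon(lon):
    # Longitude spans -180..180, so adding 180 maps it to 0..360.
    return encode_coordinate(lon, 180.0)
```

The same encoding has to be applied to minLat, maxLat, minLon and maxLon before they are placed in the range predicates.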

A better way would be using GeoHashes.

Geohashes offer properties like arbitrary precision, similar prefixes for nearby positions, and the possibility of gradually removing characters from the end of the code to reduce its size (and gradually lose precision).

As a practical example, the Geohash 6gkzwgjzn820 decodes to the coordinates -25.382708 and -49.265506, while the Geohash 6gkzwgjz will decode to -25.383 and -49.266, and if we take a similar position in the same region, such as -25.427 and -49.315, we can see it being encoded as 6gkzmg1w (note the similar prefix).

From http://geohash.org/site/tips.html
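For readers who have not seen the encoding, here is a minimal sketch of the usual geohash algorithm: longitude and latitude bits are interleaved, starting with longitude, and every five bits become one base-32 character. The function name is hypothetical; in practice an existing geohash library would normally be used instead.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=12):
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count, use_lon = 0, 0, True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)
```

Each extra character halves the cell a few more times, which is why truncating a geohash from the right yields a larger cell that still contains the original point.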

With your item positions stored as GeoHashes you could use the like operator to search for a bounding box (where GeoHash like '6gkzmg1w%'), but since the like operator is expensive (see Comparison Operators) a better way would be to denormalize the data by storing each GeoHash prefix level (how many depends on your required search precision) as a separate attribute (GeoHash6, GeoHash8, etc.) and then use a simple equality predicate (where GeoHash8 = '6gkzmg1w').
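A small sketch of that denormalization step, assuming the full geohash has already been computed. The helper name and the default prefix levels are illustrative only; the attribute names follow the GeoHash6/GeoHash8 pattern used above.

```python
def geohash_prefix_attributes(geohash, levels=(4, 6, 8)):
    """Build one attribute per prefix level, plus the full geohash itself."""
    attributes = {"GeoHash": geohash}
    for n in levels:
        if n <= len(geohash):  # skip levels longer than the stored hash
            attributes[f"GeoHash{n}"] = geohash[:n]
    return attributes


# Attributes that would be written alongside the item's other data.
item_attributes = geohash_prefix_attributes("6gkzwgjzn820")
```

The cost is a few extra attributes per item at write time, in exchange for equality lookups at query time.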

Now on to the downside of GeoHashes. Since you can't assume that a GeoHash cell is centered on your search box, you have to search all neighboring prefixes as well. The process is excellently described by geohash-js:

Geohash also has the property that as the number of digits decreases (from the right), accuracy degrades. This property can be used to do bounding box searches, as points near to one another will share similar Geohash prefixes.

However, because a given point may appear at the edge of a given Geohash bounding box, it is necessary to generate a list of Geohash values in order to perform a true proximity search around a point. Because the Geohash algorithm uses a base-32 numbering system, it is possible to derive the Geohash values surrounding any other given Geohash value using a simple lookup table.

So, for example, 1600 Pennsylvania Avenue, Washington DC resolves to: 38.897, -77.036

Using the geohash algorithm, this latitude and longitude is converted to: dqcjqcp84c6e

A simple bounding box around this point could be described by truncating this geohash to: dqcjqc

However, 'dqcjqcp84c6e' is not centered inside 'dqcjqc', and searching within 'dqcjqc' may miss some desired targets.

So instead, we can use the mathematical properties of the Geohash to quickly calculate the neighbors of 'dqcjqc'; we find that they are: 'dqcjqf','dqcjqb','dqcjr1','dqcjq9','dqcjqd','dqcjr4','dqcjr0','dqcjq8'

This gives us a bounding box around 'dqcjqcp84c6e' roughly 2km x 1.5km and allows for a database search on just 9 keys: SELECT * FROM table WHERE LEFT(geohash,6) IN ('dqcjqc', 'dqcjqf','dqcjqb','dqcjr1','dqcjq9','dqcjqd','dqcjr4','dqcjr0','dqcjq8');
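The neighbor calculation can be sketched without the lookup table that geohash-js uses. The version below swaps in a simpler (and slower) technique: decode the cell to its bounding box, step one cell width or height in each direction from the center, and re-encode. All function names are hypothetical, and cells touching the poles simply get fewer neighbors.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def encode(lat, lon, precision):
    """Standard geohash encoding: interleaved bits, longitude first."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    out, bits, use_lon = [], 0, True
    for i in range(precision * 5):
        rng, val = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        bit = 1 if val >= mid else 0
        rng[0 if bit else 1] = mid  # keep the half that contains the value
        bits = (bits << 1) | bit
        use_lon = not use_lon
        if i % 5 == 4:
            out.append(BASE32[bits])
            bits = 0
    return "".join(out)


def decode_bbox(geohash):
    """Return (lat_lo, lat_hi, lon_lo, lon_hi) of the cell a geohash names."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    use_lon = True
    for ch in geohash:
        idx = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon_rng if use_lon else lat_rng
            mid = (rng[0] + rng[1]) / 2
            if (idx >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return lat_rng[0], lat_rng[1], lon_rng[0], lon_rng[1]


def neighbors(geohash):
    """Return the (up to eight) cells surrounding the given geohash cell."""
    lat_lo, lat_hi, lon_lo, lon_hi = decode_bbox(geohash)
    lat_c, lon_c = (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2
    height, width = lat_hi - lat_lo, lon_hi - lon_lo
    result = []
    for dlat in (1, 0, -1):
        for dlon in (-1, 0, 1):
            if dlat == 0 and dlon == 0:
                continue
            lat = lat_c + dlat * height
            if not -90.0 <= lat <= 90.0:
                continue  # no cell beyond the poles
            lon = (lon_c + dlon * width + 180.0) % 360.0 - 180.0  # wrap at the antimeridian
            result.append(encode(lat, lon, len(geohash)))
    return result


search_keys = ["dqcjqc"] + neighbors("dqcjqc")
```

For 'dqcjqc' this should yield the same eight neighbors quoted above, giving the nine keys for the search.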

Translated to a SimpleDB query, that would be where GeoHash6 in('dqcjqc', 'dqcjqf', 'dqcjqb', 'dqcjr1', 'dqcjq9', 'dqcjqd', 'dqcjr4', 'dqcjr0', 'dqcjq8'), after which you do your Haversine filtering on the results so that you only keep the items that are within your search radius.
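Building that select expression from a list of geohash keys might look like the sketch below. The domain name `places` and the helper names are assumptions for illustration; the quoting follows the SimpleDB convention of doubling a quote character to escape it.

```python
def quote_value(value):
    """Quote an attribute value, escaping single quotes by doubling them."""
    return "'" + value.replace("'", "''") + "'"


def build_geohash_select(domain, attribute, geohashes):
    """Build a select expression matching any of the given geohash prefixes."""
    values = ", ".join(quote_value(g) for g in geohashes)
    safe_domain = domain.replace("`", "``")  # backticks quote the domain name
    return f"select * from `{safe_domain}` where {attribute} in({values})"


expression = build_geohash_select("places", "GeoHash6", ["dqcjqc", "dqcjqf", "dqcjqb"])
```

The resulting string would be passed to whichever SimpleDB client is in use, and the returned items then go through the Haversine filter.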