How to divide by a float number in NEON intrinsics

2023-09-06 02:00:59 Author: D调卡农的忧伤

First of all, sorry that my English is not perfect, but I will try to explain my problem as well as I can.

A little background: I'm processing an image four pixels at a time, on ARMv7, for an Android application.

I want to divide a float32x4_t vector by another vector, but the numbers in the divisor are variable, roughly between 0.7 and 3.85. The only way I have found to divide is a right shift, but that only works for divisors of the form 2^n, so I would appreciate any help with this.

Also, I'm new to this, so any comment is welcome. If you need more information about what I'm doing, I will do my best to answer any question.

OK, I will try to explain it with an example:

How can I perform this operation with NEON intrinsics?

float32x4_t a = {25.3,34.1,11.0,25.1};
float32x4_t b = {1.2,3.5,2.5,2.0};
// something like this
float32x4_t resultado = a/b; // {21.08,9.74,4.4,12.55}

Just remember that b changes between iterations, because all of this runs inside a for loop that alters the vector b.

Recommended answer

The NEON instruction set on ARMv7 (the target here) does not have a floating-point divide.

If you know a priori that your values are not poorly scaled, and you do not require correct rounding (this is almost certainly the case if you're doing image processing), then you can use a reciprocal estimate, refinement steps, and a multiply instead of a divide:

// get an initial estimate of 1/b.
float32x4_t reciprocal = vrecpeq_f32(b);

// use a couple of Newton-Raphson steps to refine the estimate.  Depending on your
// application's accuracy requirements, you may be able to get away with only
// one refinement (instead of the two used here).  Be sure to test!
reciprocal = vmulq_f32(vrecpsq_f32(b, reciprocal), reciprocal);
reciprocal = vmulq_f32(vrecpsq_f32(b, reciprocal), reciprocal);

// and finally, compute a/b = a*(1/b)
float32x4_t result = vmulq_f32(a,reciprocal);
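
To see why one or two refinement steps are usually enough, here is a minimal scalar sketch in portable C that models the same sequence for a single lane. It is an illustration under stated assumptions, not the hardware algorithm: recps_model mirrors what vrecps computes (2 - b*x), while recpe_model, divide_model and max_rel_error are hypothetical helper names, and the initial estimate is simply the exact reciprocal perturbed by a relative error of 2^-7 to stand in for a coarse estimate of roughly 8 bits.

```c
#include <assert.h>

/* Scalar model of vrecps: for inputs b and x it yields 2 - b*x. */
float recps_model(float b, float x) {
    return 2.0f - b * x;
}

/* Hypothetical stand-in for vrecpe (NOT the hardware algorithm): the exact
   reciprocal, deliberately off by a relative error of 2^-7, to mimic a
   coarse initial estimate. */
float recpe_model(float b) {
    assert(b != 0.0f);
    return (1.0f / b) * (1.0f - 1.0f / 128.0f);
}

/* a/b via estimate, `steps` Newton-Raphson refinements, and one multiply. */
float divide_model(float a, float b, int steps) {
    float x = recpe_model(b);
    for (int i = 0; i < steps; ++i) {
        x = x * recps_model(b, x);   /* x <- x * (2 - b*x) */
    }
    return a * x;
}

/* Largest relative error against a/b over the question's sample vectors. */
float max_rel_error(int steps) {
    const float a[4] = {25.3f, 34.1f, 11.0f, 25.1f};
    const float b[4] = {1.2f, 3.5f, 2.5f, 2.0f};
    float worst = 0.0f;
    for (int i = 0; i < 4; ++i) {
        float exact = a[i] / b[i];
        float err = (divide_model(a[i], b[i], steps) - exact) / exact;
        if (err < 0.0f) err = -err;
        if (err > worst) worst = err;
    }
    return worst;
}
```

Each step x <- x * (2 - b*x) squares the relative error of the estimate: if x = (1 - e)/b, the refined value is (1 - e^2)/b. In this model the error therefore drops from about 2^-7 to about 2^-14 after one step, and after the second step it is limited by single-precision rounding rather than by the estimate. Calling max_rel_error(0), max_rel_error(1) and max_rel_error(2) shows that progression on the vectors from the question; on real hardware the starting accuracy may differ, so measure with your own data.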