Why is floating-point division slow?

2023-09-10 23:29:51 Author: 寻一夜情管饭

What are the steps in the algorithm for floating-point division?

Why is it slower than, say, multiplication?

Is it done the same way we do division by hand? That is, by repeatedly dividing by the divisor, subtracting the result to obtain a remainder, aligning the number again, and continuing until the remainder is less than a particular value?

Also, why do we gain performance if, instead of doing

a = b / c

we do

d = 1 / c
a = b * d
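To make the comparison concrete, here is a minimal sketch of the two variants applied to a whole array, where the reciprocal trick usually pays off because one division is shared by many multiplies. The function names `scale_by_division` and `scale_by_reciprocal` are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One division per element. */
void scale_by_division(float *out, const float *in, size_t n, float c)
{
    assert(c != 0.0f);
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] / c;
}

/* One division up front, then one multiply per element.
   The result can differ from in[i] / c in the last bit, because
   1/c is rounded before it is used in the multiply. */
void scale_by_reciprocal(float *out, const float *in, size_t n, float c)
{
    assert(c != 0.0f);
    const float d = 1.0f / c;
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * d;
}
```

Because of that rounding difference, a compiler generally will not make this substitution on its own unless it is explicitly allowed to relax floating-point semantics.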

Edit: Basically I was asking because someone asked me to distribute a value among contenders based on assigned weights. I did all of this in integers and was later asked to convert it to float, which caused a slowdown in performance. I was just interested in knowing how C or C++ performs these operations in a way that would cause the slowness.

Recommended answer

From a hardware point of view, division is an iterative algorithm, and the time it takes is proportional to the number of bits. The fastest dividers currently around use the radix-4 algorithm, which generates 4 bits of result per iteration (strictly, a single radix-4 digit carries 2 bits; 4 bits per step corresponds to radix 16). For a 32-bit divide you need at least 8 steps.
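The iterative nature is easiest to see in the simplest variant. The following is a minimal sketch of radix-2 shift-and-subtract (restoring) division on unsigned integers, not the radix-4 scheme mentioned above; a floating-point divider applies the same kind of loop to the mantissas. The name `divide_restoring` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Radix-2 restoring division: produces one quotient bit per iteration.
   Each iteration needs the remainder left by the previous one, so the
   32 steps cannot run in parallel. */
uint32_t divide_restoring(uint32_t dividend, uint32_t divisor,
                          uint32_t *remainder)
{
    assert(divisor != 0);
    uint64_t rem = 0;   /* 64 bits so the shift below cannot overflow */
    uint32_t quot = 0;

    for (int bit = 31; bit >= 0; --bit) {
        /* Bring down the next bit of the dividend. */
        rem = (rem << 1) | ((dividend >> bit) & 1u);
        quot <<= 1;
        if (rem >= divisor) {   /* trial subtraction succeeds */
            rem -= divisor;
            quot |= 1u;
        }
    }
    if (remainder)
        *remainder = (uint32_t)rem;
    return quot;
}
```

A higher-radix divider runs the same kind of loop but retires several quotient bits per pass, which reduces the number of iterations without removing the dependency between them.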

Multiplication can be done in parallel to a certain degree. Without going into detail, you can break up a large multiplication into several smaller, independent ones. These multiplications can in turn be broken down until you are at bit level, or you can stop earlier and use a small lookup table in hardware. This makes the multiplication hardware heavy from a silicon real-estate point of view, but very fast as well. It's the classic size/speed tradeoff.

You need log2(n) steps to combine the results computed in parallel, so a 32-bit multiply needs 5 logical steps (if you go down to the minimum). Fortunately, these 5 steps are a good deal simpler than the division steps (they are just additions). That means in practice multiplies are even faster.
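The structure described here can be imitated in software. This is a rough sketch, assuming the decomposition goes all the way down to single bits: 32 independent partial products are formed and then summed pairwise in a tree. The name `multiply_tree` is hypothetical, and real multipliers typically use carry-save adders rather than a full addition at every level.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply via 32 one-bit partial products combined in an adder tree. */
uint64_t multiply_tree(uint32_t a, uint32_t b)
{
    uint64_t partial[32];

    /* Each partial product depends only on a and one bit of b,
       so in hardware all 32 can be produced at the same time. */
    for (int i = 0; i < 32; ++i)
        partial[i] = ((b >> i) & 1u) ? ((uint64_t)a << i) : 0;

    /* Pairwise sums: 32 -> 16 -> 8 -> 4 -> 2 -> 1 values.
       The additions within one level are independent of each other. */
    int levels = 0;
    for (int count = 32; count > 1; count /= 2) {
        for (int i = 0; i < count / 2; ++i)
            partial[i] = partial[2 * i] + partial[2 * i + 1];
        ++levels;
    }
    assert(levels == 5);   /* log2(32) combining steps */
    return partial[0];
}
```

In C the inner loops still run one after another; the point is only that no addition within a level waits on another, unlike the steps of a division.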
