性能问题与欧拉问题和递归的Int64的类型递归、问题、性能、类型

2023-09-07 22:35:23 作者:楽しい一日

目前,我正在学习Haskell使用项目欧拉问题,我的游乐场。 我震惊我的Haskell程序是如何缓慢竟然是相比同类 方案其他语言编写的。我想知道如果我forseen的东西,或者如果这是什么样的性能惩罚一个人使用哈斯克尔时所期望的。

I'm currently learning Haskell using the project Euler problems as my playground. I was astound by how slow my Haskell programs turned out to be compared to similar programs written in other languages. I'm wondering if I've forseen something, or if this is the kind of performance penalties one has to expect when using Haskell.

在下面的程序由问题331的启发,但我张贴,所以我不破坏任何其他人之前,已经改变了它。它计算绘制在2 ^ 30×2 ^ 30网格的离散圆的弧长。这是一个简单的尾递归执行情况和我确保弧长的累积变量跟踪的更新是严格的。然而,这需要近一年半分钟才能完成(与-O标志与GHC编译)。

The following program in inspired by Problem 331, but I've changed it before posting so I don't spoil anything for other people. It computes the arc length of a discrete circle drawn on a 2^30 x 2^30 grid. It is a simple tail recursive implementation and I make sure that the updates of the accumulation variable keeping track of the arc length is strict. Yet it takes almost one and a half minute to complete (compiled with the -O flag with ghc).

import Data.Int

arcLength :: Int64->Int64
arcLength n = arcLength' 0 (n-1) 0 0 where
    arcLength' x y norm2 acc
        | x > y = acc
        | norm2 < 0 = arcLength' (x + 1) y (norm2 + 2*x +1) acc
        | norm2 > 2*(n-1) = arcLength' (x - 1) (y-1) (norm2 - 2*(x + y) + 2) acc
        | otherwise = arcLength' (x + 1) y (norm2 + 2*x + 1) $! (acc + 1)

main = print $ arcLength (2^30)

下面是Java中的一个相应的实施。它需要大约4.5秒的时间完成。

Here is a corresponding implementation in Java. It takes about 4.5 seconds to complete.

public class ArcLength {
public static void main(String args[]) {
    long n = 1 << 30;
    long x = 0;
    long y = n-1;
    long acc = 0;
    long norm2 = 0;
    long time = System.currentTimeMillis();

    while(x <= y) {
        if (norm2 < 0) {
            norm2 += 2*x + 1;
            x++;
        } else if (norm2 > 2*(n-1)) {
            norm2 += 2 - 2*(x+y);
            x--;
            y--;
        } else {
            norm2 += 2*x + 1;
            x++;
            acc++;
        }
    }

    time = System.currentTimeMillis() - time;
    System.err.println(acc);
    System.err.println(time);
}

}

编辑:在注释中的讨论,我做了索姆修改的哈斯克尔code和做了一些性能测试。首先,我改变了N至2 ^ 29,以避免溢出。然后,我试过6种不同的版本:!用的Int64或诠释,并且可以选择NORM2或两者NORM2和ACC在申报前的刘海弧长XY NORM2 ACC 。所有与编译

After the discussions in the comments I made som modifications in the Haskell code and did some performance tests. First I changed n to 2^29 to avoid overflows. Then I tried 6 different version: With Int64 or Int and with bangs before either norm2 or both and norm2 and acc in the declaration arcLength' x y !norm2 !acc. All are compiled with

ghc -O3 -prof -rtsopts -fforce-recomp -XBangPatterns arctest.hs

下面是结果:

(Int !norm2 !acc)
total time  =        3.00 secs   (150 ticks @ 20 ms)
total alloc =       2,892 bytes  (excludes profiling overheads)

(Int norm2 !acc)
total time  =        3.56 secs   (178 ticks @ 20 ms)
total alloc =       2,892 bytes  (excludes profiling overheads)

(Int norm2 acc)
total time  =        3.56 secs   (178 ticks @ 20 ms)
total alloc =       2,892 bytes  (excludes profiling overheads)

(Int64 norm2 acc)
arctest.exe: out of memory

(Int64 norm2 !acc)
total time  =       48.46 secs   (2423 ticks @ 20 ms)
total alloc = 26,246,173,228 bytes  (excludes profiling overheads)

(Int64 !norm2 !acc)
total time  =       31.46 secs   (1573 ticks @ 20 ms)
total alloc =       3,032 bytes  (excludes profiling overheads)

我使用GHC 7.0.2在64位的Windows 7(Haskell的平台的二进制分发)。根据该意见,并没有在其他配置编译时会出现问题。这使我想到了Int64的类型是在Windows版本坏了。

I'm using GHC 7.0.2 under a 64-bit Windows 7 (The Haskell platform binary distribution). According to the comments, the problem does not occur when compiling under other configurations. This makes me think that the Int64 type is broken in the Windows release.

推荐答案

嗯,我安装了一个新的哈斯克尔平台7.0.3,并得到大致有以下核心程序( -ddump-SIMPL ):

Hm, I installed a fresh Haskell platform with 7.0.3, and get roughly the following core for your program (-ddump-simpl):

Main.$warcLength' =
  \ (ww_s1my :: GHC.Prim.Int64#) (ww1_s1mC :: GHC.Prim.Int64#)
    (ww2_s1mG :: GHC.Prim.Int64#) (ww3_s1mK :: GHC.Prim.Int64#) ->
    case {__pkg_ccall ghc-prim hs_gtInt64 [...]
           ww_s1my ww1_s1mC GHC.Prim.realWorld#
[...]

所以GHC已经意识到,它可以解开你的整数,这是很好的。但这种 hs_getInt64 调用看起来很像的C调用。纵观汇编输出( -ddump-ASM ),我们看到的东西,如:

So GHC has realized that it can unpack your integers, which is good. But this hs_getInt64 call looks suspiciously like a C call. Looking at the assembler output (-ddump-asm), we see stuff like:

pushl %eax
movl 76(%esp),%eax
pushl %eax
call _hs_gtInt64
addl $16,%esp

因此​​,这看起来很像在 Int64的每一个操作都被转换为在后台一个成熟的调用C。这是的慢的,很明显。

So this looks very much like every operation on the Int64 get turned into a full-blown C call in the backend. Which is slow, obviously.

本的来源$ C ​​$ C GHC.IntWord64 似乎验证:在32位版本的(如目前随平台之一),你将通过FFI接口只仿真。

The source code of GHC.IntWord64 seems to verify that: In a 32-bit build (like the one currently shipped with the platform), you will have only emulation via the FFI interface.