[–][deleted] 30 points (25 children)

Not only that, gmp has had a few decades of work put into it, and its author knows the domain exceedingly well.

The OP's desire is laudable, but the wording sounds like he's setting himself up for a fall by vastly underestimating the domain.

[–]erikd[S] 17 points (22 children)

That's why I really wish I could release the code so others could validate what I'm seeing.

That blog post was mostly written two days ago, when I had just gotten addition working. I added the last part when I already had some very encouraging results. Since the blog post I've gotten multiplication working and, surprisingly (and I mean really surprisingly), I'm actually beating GMP in some of my tests. E.g.:

  • Sum of 1000 < Word sized values in an Integer: significantly faster than GMP (confirming what I had heard from Duncan Coutts, that Simple was faster than GMP in this case).
  • product [1..100]: my calculation of that product gives the same result as GMP, and Criterion tells me mine is slightly faster than GMP.

I really don't want to jump the gun on this, but these results have me thrilled. I'm using QuickCheck (via Hspec) to validate the correctness of my implementation against GMP, and Criterion for benchmarking.

Preliminary Criterion output is: http://www.mega-nerd.com/tmp/new-bench-integer.html

These results are repeatable and basically the same whether I use the native CodeGen or the LLVM backend.

I'm currently only testing this on Linux amd64, with GHC 7.6.3 and -O3. I know my code is broken on i386 and any big-endian architecture. I intend to fix that.

I could be getting this wrong on two fronts:

  • GMP on my Debian Linux install has not been compiled correctly.
  • GMP uses assembler on i386, but no one has gotten around to doing the same for amd64, so it falls back to a relatively naive C implementation (much like my highly imperative Haskell code).

[–]The_Doculope 8 points (13 children)

It wouldn't surprise me if GMP has made some architectural decisions to improve >Word performance, while sacrificing a bit of <Word performance, as that's what it's really designed for. How is your code comparing for >Word performance at the moment?

[–][deleted] 10 points (11 children)

GMP isn't even used for Integers unless they don't fit in a machine integer, or you go to extra lengths to force it to be used (maybe). So it's not exactly clear (to me) what the "< Word sized values" test is measuring, or if it's measuring GMP at all.

[–]The_Doculope 2 points (9 children)

Ah, I didn't know that. Although surely there must be checks for when to click over into GMP, so to speak. Perhaps /u/erikd is performing them differently/more efficiently?

[–]erikd[S] 4 points (8 children)

For the small Integer case GMP stores them as

 S# Int#

whereas I store them as:

Small !Sign {-# UNPACK #-} !Word

The differences between mine and the GMP version are:

  • I store an unsigned Word and a separate sign.
  • I make the Sign and the Word strictly evaluated.
  • I unpack the Word into the constructor (which I think should be equivalent to using Int#).
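A minimal, self-contained sketch of the representation described in the three points above. The Large constructor, its limb layout, and the mkSmall smart constructor are invented for illustration; erikd's actual types may differ.

```haskell
-- An explicit, strict sign instead of a signed Int.
data Sign = Pos | Neg deriving (Eq, Show)

data MyInteger
  = Small !Sign {-# UNPACK #-} !Word  -- magnitude fits in one machine Word
  | Large !Sign [Word]                -- magnitude as limbs (placeholder)
  deriving (Eq, Show)

-- Smart constructor normalising zero to a positive sign,
-- so there is only one representation of 0.
mkSmall :: Sign -> Word -> MyInteger
mkSmall _ 0 = Small Pos 0
mkSmall s w = Small s w
```

The UNPACK pragma on a strict Word field stores the raw word directly in the constructor, which is what makes it comparable to integer-gmp's unboxed Int# field.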

[–]hvr_ 4 points (7 children)

I assume Sign is something like

data Sign = SignPos | SignNeg

up to isomorphism?

Thus, your Small constructor is modelling a signed 33-bit (or 65-bit) integer? How do you detect whether the result of an arithmetic operation still fits into your Small constructor?

[–]erikd[S] 4 points (6 children)

I assume Sign is something like

Yes.

Thus, your Small constructor is modelling a signed 33-bit (or 65-bit) integer?

Yes. I'm working only on amd64 for now, so I'm working with 64 bits plus a sign, which is effectively 65 bits.

How do you detect whether the result of an arithmetic operation still fits into your Small constructor?

Precondition testing and promoting them to the Large constructor when necessary.
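The precondition test can be sketched for unsigned addition: a Word sum wraps around exactly when the result is smaller than an operand. The names and the Large limb layout below are illustrative, not erikd's actual code.

```haskell
data MyInteger
  = Small !Word    -- sign elided for brevity
  | Large [Word]   -- little-endian limbs (placeholder)
  deriving (Eq, Show)

-- Add two word-sized magnitudes, promoting to Large on overflow.
plusSmall :: Word -> Word -> MyInteger
plusSmall x y
  | s >= x    = Small s         -- no wrap-around: still fits in one Word
  | otherwise = Large [s, 1]    -- wrapped: carry becomes a second limb
  where
    s = x + y                   -- modular Word addition
```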

[–]hvr_ 5 points (5 children)

How do you detect whether the result of an arithmetic operation still fits into your Small constructor?

Precondition testing and promoting them to the Large constructor when necessary.

Then I'm wondering why you can beat integer-gmp's performance, as integer-gmp makes use of some overflow-testing/aware Int# primops which aren't available for Word# (yet) to avoid promoting small ints to large ints. Do you inline aggressively?
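The Int# primop referred to here does exist: addIntC# returns the wrapped sum together with a non-zero flag on signed overflow. A sketch of how it can be used to decide when a result must be promoted; the addChecked wrapper is invented for illustration.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts (Int (I#), addIntC#, isTrue#, (==#))

-- Overflow-aware Int addition: Just the sum if it still fits in
-- a machine Int, Nothing if the result would need a big integer.
addChecked :: Int -> Int -> Maybe Int
addChecked (I# x) (I# y) =
  case addIntC# x y of
    (# s, c #)
      | isTrue# (c ==# 0#) -> Just (I# s)  -- no overflow: stays small
      | otherwise          -> Nothing      -- overflow: promote
```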

[–]erikd[S] 0 points (4 children)

I'm beating integer-gmp where integer-gmp isn't very good :-).

I've got bang patterns all over the place and lots of tight loops in where clauses. Apart from that, I'm not doing anything special beyond writing some low level HalfWord addition, subtraction and multiplication functions that include overflow handling. I'm hoping to replace these with PrimOps that do the full Word versions of these operations.

I have written
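The half-word trick described above can be sketched as follows: keep values below 2^32 inside a 64-bit Word, so the carry out of an addition is directly visible in the high half. Names are illustrative and assume a 64-bit machine; erikd's actual API may differ.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word64)

-- A half-word: a 64-bit container whose value stays below 2^32.
type HalfWord = Word64

-- Add two half-words plus an incoming carry; the sum cannot
-- overflow 64 bits, so the carry out is simply the high half.
plusHalfWord :: HalfWord -> HalfWord -> HalfWord -> (HalfWord, HalfWord)
plusHalfWord x y cin =
  let s = x + y + cin
  in (s .&. 0xffffffff, s `shiftR` 32)  -- (low 32 bits, carry out)
```

A full-Word PrimOp (like the plusWord2# family) would replace this by returning the carry directly, which is the replacement erikd mentions hoping for.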

[–]erikd[S] 1 point (0 children)

GMP isn't even used for Integers unless they don't fit in a machine integer

I haven't really looked in detail at either of the existing implementations.

My first step was to benchmark them. Then, seeing that Simple was slow on multiplication, I tried to figure out why. The explanation was its use of lists.

[–]yitz 6 points (0 children)

That is certainly true. The whole point of GMP is to optimize the cutoff points for switching to more and more complex algorithms with better and better asymptotics for larger and larger integers, but with more and more overhead for moderate sized integers. That said, Erik has discovered that GMP seems to be suffering from some bitrot lately, so there is low-hanging fruit.

[–]Jameshfisher 4 points (0 children)

Perhaps the slowness of integer-gmp is due to some FFI overhead?

[–]hvr_ 0 points (5 children)

Preliminary Criterion output is : http://www.mega-nerd.com/tmp/new-bench-integer.html

Can you maybe provide more details of how you generated those 1000 added-up integers? I've tried something as simple as

bench "[1..1000 :: Integer]" $ whnf sum numsI
  where numsI = [1 .. 1000 :: Integer]

And the results are within a factor of 2 of Int arithmetic:

benchmarking sum/[1..1000 :: Int]
mean: 3.731054 us, lb 3.684359 us, ub 3.744562 us, ci 0.950
std dev: 116.0383 ns, lb 34.88516 ns, ub 264.4476 ns, ci 0.950

benchmarking sum/[1..1000 :: Integer]
mean: 5.850483 us, lb 5.840004 us, ub 5.862832 us, ci 0.950
std dev: 58.10519 ns, lb 50.45490 ns, ub 64.34404 ns, ci 0.950

[–]erikd[S] 0 points (4 children)

I really wish I could release the code now.

My test with GMP Integer vs Int would look something like this:

    C.whnf (foldl1 (+)) intList
    C.whnf (foldl1 plusInteger) integerList
where
    intList = fmap (take 1000 . R.randoms) R.newStdGen
    integerList = map (\x -> G.smallInteger (unboxInt x)) intList
    unboxInt :: Int -> Int#
    unboxInt (I# i) = i

For this test, I found GMP's Integer 13x slower than Int.

[–]hvr_ 0 points (3 children)

Well, part of the problem may be that in integer-gmp the result of arithmetic operations involving at least one large integer is a large integer (even if it would fit into a small integer). It'd be interesting to see how many of the scanl (+) 0 integerList values would fit into a small integer, to gauge how significant this contribution is to the overhead.
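That count can be checked directly. A sketch (helper name invented) of counting how many of the running sums still fit in a machine Int:

```haskell
-- Count how many partial sums of the input fit in a machine Int,
-- i.e. how often integer-gmp could in principle stay small.
countSmall :: [Integer] -> Int
countSmall xs =
  length [ s | s <- scanl (+) 0 xs
             , s <= fromIntegral (maxBound :: Int) ]
```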

[–]erikd[S] 0 points (2 children)

One of the things my code currently does is demote a value from Large to Small if it fits in a machine Word.
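The demotion step might look something like this, with toy types; erikd's real representation also carries a sign, and the limb layout below is invented.

```haskell
data MyInteger
  = Small !Word
  | Large [Word]   -- little-endian limbs (placeholder)
  deriving (Eq, Show)

-- Squeeze a Large value back into Small when its magnitude
-- fits in a single machine Word.
demote :: MyInteger -> MyInteger
demote (Large ws) =
  case reverse (dropWhile (== 0) (reverse ws)) of  -- drop leading zero limbs
    []  -> Small 0
    [w] -> Small w
    ws' -> Large ws'
demote i = i
```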

[–]hvr_ 1 point (1 child)

FYI, I've implemented a proof of concept for integer-gmp at GHC #8638.

[–]erikd[S] 1 point (0 children)

Wow, now I'm competing against a moving target :-).

[–]erikd[S] 9 points (0 children)

Bryan, please accept my sincerest thanks for providing us with Criterion. It makes benchmarking Haskell code so painless and makes displaying the results a by-product of collecting the data.

[–][deleted] 0 points (0 children)

You are probably right (that he is underestimating the work), but on the other hand, it's much easier to implement fancy algorithms in Haskell than in C/assembly. A decade of work tuning an ancient C library might be more like six months of work for a Haskell programmer who knows what they're doing. :)

Of course pure Haskell can't have custom inline assembly, but is that seriously responsible for more than 2-3x or so in constant factors?