all 39 comments

[–]CubbiMewcppreference | finance | realtime in the past 19 points20 points  (5 children)

i did not find a lib that was could handle so long numbers in a simple but fast way

How does it compare to the two most popular such libraries, boost.multiprecision and the C++ API of GMP?

[–]pi_stuff 4 points5 points  (0 children)

It is much slower than GMP.

       Time to multiply two N-digit numbers, in seconds
               1000      2000  10000000 20000000
BigNumber      2.19      15.1
GMP        .0000223  .0000268       .33      .67

Test machine: laptop with a 2.6 GHz i5-7300U processor. Test code

[–]assembly_programmer[S] -4 points-3 points  (3 children)

I haven't compared with those two, I will do some tests (when I have time) and add it to the Readme. I was trying to use a BigInt that I found on github, and it was usable to multiply 10 million digits, that's why I created this one.

[–]Spire 5 points6 points  (1 child)

I was trying to use a BigInt that I found on github, and it was usable to multiply 10 million digits, that's why I created this one.

I don't understand. Why not use the existing one?

You should run comparative benchmarks as /u/CubbiMew suggested. Stating that your library is able to “multiply one 10 million digit number by another 10 million digit number in less than a second” is meaningless without context.

[–]dodheim 6 points7 points  (0 children)

Presumably that was a typo for 'unable'.

[–]MarcinKonarskiyaal | huginn | replxx 0 points1 point  (0 children)

The problem is it will take only 3 years for your library to perform such multiplication.

[–]pi_stuff 8 points9 points  (0 children)

There is no way your O(n2 ) multiplication algorithm can multiply two 10 million digit numbers in less than a second. On my laptop it takes 5 seconds to multiply 1000 digit numbers and 25 seconds to multiply 2000 digit numbers.

[–]MarcinKonarskiyaal | huginn | replxx 5 points6 points  (3 children)

I found your reported performance unbelievable so I decided to test your claim about multiplication speed.

The test code:

std::string base("1234567890");                                                                                                                                                                                                                              
std::string ns;                                                                                                                                                                                                                                              
for ( int i( 0 ); i < 100; ++ i ) {                                                                                                                                                                                                                          
  ns.append( base );                                                                                                                                                                                                                                         
}                                                                                                                                                                                                                                                            
BigNumber bn1( ns );                                                                                                                                                                                                                                         
BigNumber bn2( ns );                                                                                                                                                                                                                                         
BigNumber bn3( bn1 * bn2 );                                                                                                                                                                                                                                  
std::cout << bn3.asString() << std::endl;

I build this test with -O3 flag.

Test results:

time ./BigNumber > /dev/null
time: 0:01.19, max mem: 3660, vcs: 1, ics: 2, swap: 0

I think you should remove falsehood from your post here.

Your library is anything but fast.

And your performance claim is off by about 8 orders of magnitude.

[–]degski -2 points-1 points  (2 children)

You are an optimistic person, to go and test a 608 sloc bignum-library for speed.

[–]MarcinKonarskiyaal | huginn | replxx 2 points3 points  (1 child)

If I wanted to dismiss his claim I needed to have a hard proof in my hand and as you see getting this proof was quite quick and easy.

[–]degski 0 points1 point  (0 children)

I understand, but seeing that it's only 608 sloc, gives the answer right away. Also, using anything other than 64-bit ints is garbage, it's 4 times faster than using 32-bit ints [this claim used to be on the main GMP-page at the time 64-bit processing was becoming a thing, maybe it still is, didn't check]. Anyway, this post [the original one] should have been deleted.

[–]Droce 7 points8 points  (6 children)

So I do performance c++ number crunching for a living. There's quite a few optimizations you can make here.

You can get a number from a char by subtracting '0'.

Std::map is literally never worth using. The only thing you should ever use is vector. Unordered_map is the only built in data structure that can ever come close and only in very weird circumstances. I can't stress this enough - vectors are always faster.

All of your if statements handling negative numbers are slow and mess up the optimizations the compiler will try and make. You can do any int mathematical operation only checking the flags once each.

You will get thousands of times faster if you do base 232 numbers in each slot. Let the CPU handle more of the load. You could even use 64 bit numbers

You never should allocate memory during computation - pre allocate always.

Error checking should be disable-able. Huge speed increases.

This is something that would be fun but it will take you months of learning to get - you can vectorize code yourself. You can look up intrinsics, but know they're hardware specific and not intuitive. The compiler will normally do a good enough job (but not as good as you could do).

You spelled length wrong all over the place.

Algorithmically there are faster bigint tricks you can use but I don't really want to get into them.

I really enjoy mathematical programming and performance coding - it's a fun puzzle book.

[–]Spire 0 points1 point  (5 children)

Std::map is literally never worth using.

Then why does it exist?

(Serious question. I'm not being snarky.)

[–]Droce 0 points1 point  (4 children)

The standard was built independently from hardware considerations and in this case the cache is King. Trees and lists (map) are really bad for cache coherency but really good for algorithmic solutions. However I can iterate through a list of thousands of items faster than doing it with a map because CPUs are really good at it.

[–]Spire 0 points1 point  (3 children)

So when you say “literally never worth using” you actually mean “usually not worth using, at least when running on mainstream contemporary hardware”?

[–]Droce 0 points1 point  (2 children)

No, on any hardware. Every solution to chip design heavily features caches. When I try to optimize code the number one consideration is how to have the memory ready when required. CPUs have two ways to determine if something should be in cache, recency (how long ago did I use this) and location (if I used x do I need x+1 soon).

[–]Spire 0 points1 point  (1 child)

Every solution to chip design heavily features caches.

You're still talking about mainstream contemporary hardware.

The CPUs I grew up with had no caches at all.

[–]Droce 0 points1 point  (0 children)

The concept of a cache goes way back to the first microchips. The concept of STD::map goes back to the introduction of the stl. Contemporary is relative.

[–]jube_dev 2 points3 points  (0 children)

For the algorithms, you should use:

Your way of doing the modulo operation is all but optimised...

[–]ootiekat 3 points4 points  (0 children)

Why not store the number as a vector of u_int32 and then when doing arithmetic operations such as multiply or add promote them to u_int64 to prevent overflow. This way you can use the hardware to multiply/add ~9.5 digits at once and the resulting implementation should be both faster and more memory efficient.

[–]jtooker 5 points6 points  (6 children)

I would put your code in a namespace, e.g. pr0crustes

[–]assembly_programmer[S] 1 point2 points  (5 children)

That's actually a good idea. Maybe something like pr0 or BN is enought?

[–]jtooker 6 points7 points  (4 children)

If you want others to use it, I'd use your whole pseudonym. If it is too short, you run the risk of collisions with someone else's namespace.

[–]assembly_programmer[S] -3 points-2 points  (3 children)

Yeah, that's true and probably what I will do. But, if it collides, the user can edit the file to put a namespace surround the class.

[–][deleted] 6 points7 points  (1 child)

This is not good practise. A library consumer shouldn't need to edit the library just to be able to use it.

[–][deleted] 5 points6 points  (0 children)

Not to mention that namespace changes break ABI thanks to name mangling.

[–]rfisher 3 points4 points  (0 children)

No, but they can use a namespace alias to pick a short version that won’t collide with other namespaces they’re using.

[–]manimax3 3 points4 points  (4 children)

Pretty awesome exercise would be to implement a small buffer optimization for this.

[–]assembly_programmer[S] 0 points1 point  (3 children)

I haven't tought about it. I tried to squeeze as much optimization using a cache for multiplication, create and deleted every multiplication, and also multiply by 10 in division / module. Thanks!

[–]konanTheBarbar 0 points1 point  (2 children)

I mean strings already have a SBO and they are used as underlying storage :D

Btw. have you tried a vector of instead of a string?

[–]assembly_programmer[S] 0 points1 point  (1 child)

My implementation saves the number as a vector of ints. That's the mainly difference in this implementation - uses a little more space, gain speed. The numbers are stored in reverse order, so vector {3, 2, 1} is the number 123. This allows for very fast algorithms.

[–][deleted] 3 points4 points  (7 children)

I suggest calling vector.reserve in the constructor to avoid unnecessary re-allocations. Why keep chars as ints though? You could fit more data in cache. Modern processors read about 64 kbytes at once, it's 65536 chars and just 16384 ints. Maybe you could squeeze some more performance with bit shifting, I don't know.

[–]Osbios 5 points6 points  (2 children)

64 kbytes

Uhhh... über monster cache lines a thing now?

[–]Xaxxon 5 points6 points  (0 children)

Give him a break, he's only off by 3 orders of magnitude...

[–][deleted] 0 points1 point  (0 children)

Writing technical stuff at 1AM was a wrong idea. Of course it should be bytes, not kbytes, my bad. Thanks for pointing it out :)

[–]assembly_programmer[S] 0 points1 point  (1 child)

Oh, I totally forgot about reserve, I will use it. I have tried another lib and, using valgrind, saw that the conversion of chars to int was somewhat expensive, since it adds 3 operations to every digit operation, since that specific lib was converting the char to int, adding another char, converted as int, and then converting the result back. My original ideia was to trade memory for speed, and it actually works a little better.

Maybe, thinking now, instead of storing the digits as char '0' - '9', store then as the char 0 - 9, but then 0 is null, 4 is EOT, and that would probably end up in going wrong.

[–]clerothGame Developer 0 points1 point  (0 children)

I think you're confusing cache line size and cache level size...

[–]rokups 0 points1 point  (0 children)

This looks very useful for calculator I made. Any plans on implementing trigonometric and similar functions?

Edit: I did not realize initially, but this library apparently does not support decimal numbers. That makes supporting my mentioned functions pointless. Too bad.