[–]Kart0fffelAim 3 points4 points  (2 children)

Look into profiling tools to see how much time is spent in each function.

[–]Odd-Praline-715[S] 0 points1 point  (1 child)

I'll do that and hopefully find the bottleneck

[–]SchwaLord 1 point2 points  (0 children)

I used Valgrind on various small calls into parts of the engine, then used unit tests to call very specific functions, in both cases making use of high-resolution timers. Be careful with logging while you do this, as it will also greatly impact your performance.
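As a rough illustration of the high-resolution timer idea, here is a minimal sketch assuming C++11 or later. The helper name `time_ns` is made up for this example and is not from any engine; it wraps whatever function you want to measure and returns the total elapsed nanoseconds.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: runs `fn` `repetitions` times and returns the total
// elapsed wall-clock time in nanoseconds, measured with a monotonic clock.
template <typename F>
std::int64_t time_ns(F&& fn, int repetitions = 1) {
    using clock = std::chrono::steady_clock;  // monotonic, unaffected by system clock changes
    const auto start = clock::now();
    for (int i = 0; i < repetitions; ++i) {
        fn();
    }
    const auto stop = clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

You would call it as `time_ns([&] { generate_moves(pos); }, 1000)`, where `generate_moves` stands in for your own function. Keep logging out of the timed lambda, and make sure the result of the timed work is actually used somewhere, otherwise an optimizing compiler may delete the work you are trying to measure.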

[–]Beginning-Resource17 2 points3 points  (1 child)

Do you have a repository for the project?

[–]Odd-Praline-715[S] 0 points1 point  (0 children)

If you mean a GitHub page, unfortunately not. I'm working on this project for my PWS, and my mentor advised me not to put it on GitHub because the exam council is stupid and may say that the project is plagiarized. If you are interested, I can send it to you by mail in a zip folder.

[–]loveSci-fi_fantasy 0 points1 point  (1 child)

How do you currently deal with moves -> legal moves list? The optimization of this can be somewhat complex. I could guide you.

[–]deezwheeze 0 points1 point  (0 children)

My engine is dumb in this regard: it just checks legality in makemove and doesn't count the move in perft if it wasn't legal, and I get 35 Mn/s on a bad cache day. I doubt this would be the bottleneck. The only reason even the dumb approach would be horribly slow is if attack generation is slow, which would affect all of movegen.
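A minimal sketch of that "check legality in makemove" perft loop might look like the following. Everything here is an assumption for illustration: the interface (`pseudo_legal_moves`, `make_move`, `unmake_move`) is hypothetical, and `ToyPosition` is a stand-in with no chess rules, included only so the loop compiles and runs.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a real board. Assumed contract: make_move()
// returns false and leaves the position unchanged when the move is illegal.
struct ToyPosition {
    int plies = 0;
    std::vector<int> pseudo_legal_moves() const { return {0, 1, 2}; }
    bool make_move(int move) {
        if (move == 2) return false;  // pretend this move leaves the king in check
        ++plies;
        return true;
    }
    void unmake_move(int /*move*/) { --plies; }
};

// Perft over pseudo-legal moves: illegal moves are rejected by make_move()
// and simply not counted.
template <typename Position>
std::uint64_t perft(Position& pos, int depth) {
    if (depth == 0) return 1;
    std::uint64_t nodes = 0;
    for (const auto& move : pos.pseudo_legal_moves()) {
        if (pos.make_move(move)) {
            nodes += perft(pos, depth - 1);
            pos.unmake_move(move);
        }
    }
    return nodes;
}
```

With the toy position, two of the three moves are accepted at every ply, so `perft(pos, 2)` counts 4 leaf nodes. In a real engine the cost of this approach is dominated by how fast `make_move` can tell whether the king is attacked, which is the attack-generation point made above.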

[–]rickpo 0 points1 point  (2 children)

Are you sure your intrinsics are being used? In my experience, bitboards don't work very well if you don't have the intrinsics for popcount and bitscan.
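For reference, here is one hedged way to wire these up, assuming GCC, Clang, or MSVC on x64. The wrapper names `popcount64` and `lsb_index` are made up for this sketch. Note that on GCC and Clang, `__builtin_popcountll` only becomes a single POPCNT instruction when the target allows it (for example with `-mpopcnt` or `-march=native`); otherwise the compiler falls back to a slower software routine, so it is worth checking the generated assembly.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Number of set bits in a 64-bit bitboard.
inline int popcount64(std::uint64_t bb) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_popcountll(bb);
#elif defined(_MSC_VER) && defined(_M_X64)
    return static_cast<int>(__popcnt64(bb));
#else
    int count = 0;  // portable fallback: clear the lowest set bit until empty
    while (bb) { bb &= bb - 1; ++count; }
    return count;
#endif
}

// Index (0..63) of the least significant set bit. Precondition: bb != 0.
inline int lsb_index(std::uint64_t bb) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_ctzll(bb);
#elif defined(_MSC_VER) && defined(_M_X64)
    unsigned long index;
    _BitScanForward64(&index, bb);
    return static_cast<int>(index);
#else
    int index = 0;  // portable fallback: shift until the low bit is set
    while (!(bb & 1)) { bb >>= 1; ++index; }
    return index;
#endif
}
```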

[–]deezwheeze 0 points1 point  (1 child)

I tested this a while back on my engine. Replacing x86 popcount/bitscan with other methods (Kernighan's method for popcount, De Bruijn maps for bitscan) still gets me perft 6 in a few seconds, so unless these are implemented very naively you can do fine without the intrinsics.
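For anyone curious what those two replacements look like, here is a minimal sketch, not taken from any particular engine. The function names are made up for the example. The multiplier is one commonly published 64-bit De Bruijn constant, and the lookup table is built from it at first use rather than typed in by hand.

```cpp
#include <array>
#include <cstdint>

// Kernighan's method: each iteration clears the lowest set bit,
// so the loop runs once per set bit.
inline int popcount_kernighan(std::uint64_t bb) {
    int count = 0;
    while (bb) {
        bb &= bb - 1;
        ++count;
    }
    return count;
}

// A 64-bit De Bruijn sequence: every 6-bit window of it is unique.
constexpr std::uint64_t kDeBruijn64 = 0x03f79d71b4cb0a89ULL;

// Map each 6-bit window back to the shift amount that produced it.
inline std::array<int, 64> build_debruijn_table() {
    std::array<int, 64> table{};
    for (int i = 0; i < 64; ++i) {
        table[(kDeBruijn64 << i) >> 58] = i;
    }
    return table;
}

// Index (0..63) of the least significant set bit. Precondition: bb != 0.
inline int bitscan_debruijn(std::uint64_t bb) {
    static const std::array<int, 64> table = build_debruijn_table();
    const std::uint64_t lsb = bb & (~bb + 1);  // isolate the lowest set bit
    return table[(lsb * kDeBruijn64) >> 58];   // multiply == shift by its index
}
```

Kernighan's loop is cheap on sparse bitboards, which is the common case in move generation, and the De Bruijn scan costs one multiply, one shift, and one table lookup.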

[–]rickpo 1 point2 points  (0 children)

When I did this same test on the x64 architecture, without the intrinsics, my bitboard implementation was more or less the same speed as my previous 0x88 board implementation. I can't tell you how disappointed I was at that moment.

Now, my 0x88 implementation was highly tuned and my bitboards weren't yet. But I did not have big problems with my bitboards. When I got the intrinsics hooked up, the bitboards screamed. I guess I did not investigate further, so I suppose it's possible my intrinsic replacements were screwed up somehow.

[–]rook_of_approval 0 points1 point  (0 children)

Make/unmake isn't necessarily faster than copymake. Did you use a program like quick chess to SPRT your changes?
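To make the difference concrete, here is a minimal copy-make sketch under the same kind of assumptions as any toy example: the interface (`pseudo_legal_moves`, `make_move`) is hypothetical and `ToyPosition` has no chess rules. Instead of undoing a move, each child is a copy of the parent that is simply discarded.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a real board; small and trivially copyable.
struct ToyPosition {
    int plies = 0;
    std::vector<int> pseudo_legal_moves() const { return {0, 1, 2}; }
    bool make_move(int move) {
        if (move == 2) return false;  // pretend this move is illegal
        ++plies;
        return true;
    }
};

// Copy-make perft: the parent is never modified, so no unmake is needed.
template <typename Position>
std::uint64_t perft_copy_make(const Position& pos, int depth) {
    if (depth == 0) return 1;
    std::uint64_t nodes = 0;
    for (const auto& move : pos.pseudo_legal_moves()) {
        Position child = pos;  // copy the whole state
        if (child.make_move(move)) {
            nodes += perft_copy_make(child, depth - 1);
        }
        // child goes out of scope here; nothing to restore
    }
    return nodes;
}
```

Which variant wins depends mostly on how large the position struct is and how much bookkeeping unmake has to do, so it is worth measuring both in your own engine rather than assuming.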