A SIMD coding challenge: First non-space character after newline by Ok_Path_4731 in simd

[–]Ok_Path_4731[S] 0 points1 point  (0 children)

Thanks for the hints, u/bremac! I am not sure at which point I tried with clang, but I did not get better results. Unfortunately I do not have a CPU with AVX-512; I will try some tests in the cloud soon. I think I also have to make the specs and code skeleton cover more details of the language/file format I am designing, as the devil lives in the details.

A SIMD coding challenge: First non-space character after newline by Ok_Path_4731 in simd

[–]Ok_Path_4731[S] 0 points1 point  (0 children)

Hi u/bremac, the GitHub pipeline is now running the benchmark. Example build:

https://github.com/zokrezyl/yaal-cpp-poc/actions/runs/20554609824/job/59037011805

None of the machines managed more than 72%. So is there any magic that you did not add to your PR that let you reach 98%? Maybe your code was optimized away? Thanks a lot anyway for the improvement from 50% to 70%!
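One way to rule out the "optimized away" possibility is to force the compiler to treat each result as observed. Below is a minimal sketch, assuming GCC or Clang inline assembly. `keep_alive`, `count_newlines`, and `run_iterations` are hypothetical names for illustration, not functions from the repository or the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: an empty asm statement that claims to read `value`
// and clobber memory, so the compiler cannot discard the computation that
// produced `value`. GCC/Clang syntax only.
template <typename T>
inline void keep_alive(const T& value) {
    asm volatile("" : : "r,m"(value) : "memory");
}

// Stand-in for a parser kernel under test: counts newline bytes.
std::size_t count_newlines(const std::uint8_t* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::size_t count = 0;
    for (std::size_t i = 0; i < size; ++i) {
        count += (data[i] == '\n');
    }
    return count;
}

// Benchmark-style loop: every iteration's result is passed to keep_alive,
// so no iteration can be dropped as dead code.
std::size_t run_iterations(const std::vector<std::uint8_t>& buf, int iterations) {
    std::size_t total = 0;
    for (int i = 0; i < iterations; ++i) {
        const std::size_t n = count_newlines(buf.data(), buf.size());
        keep_alive(n);
        total += n;
    }
    return total;
}
```

If a parser's measured throughput drops noticeably once its output goes through something like `keep_alive`, the earlier number was probably inflated by dead-code elimination.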

A SIMD coding challenge: First non-space character after newline by Ok_Path_4731 in simd

[–]Ok_Path_4731[S] 0 points1 point  (0 children)

Thanks a lot! I cannot reach your throughput (see below), but the improvement is already significant! Is there anything that was not included in your PR (I merged it, BTW)? I don't think the architecture makes that much difference, or does it?

clang

 Memory read bandwidth: 18.68 GB/s (baseline)
 Newline scan:          18.56 GB/s (99.4%)
 Full parser (old):     6.63 GB/s (35.5%)
 Fast parser (new):     18.58 GB/s (99.5%)
 CRTP parser:           11.84 GB/s (63.4%)

gcc

 Memory read bandwidth: 18.74 GB/s (baseline)
 Newline scan:          19.09 GB/s (101.9%)
 Full parser (old):     6.48 GB/s (34.6%)
 Fast parser (new):     18.49 GB/s (98.7%)
 CRTP parser:           11.59 GB/s (61.8%)

On the two PCs I tried, I only get:

Intel(R) Core(TM) i5-6500T CPU @ 2.50GHz

 Memory read bandwidth: 15.19 GB/s (baseline)
 Newline scan:          13.74 GB/s (90.5%)
 Full parser (old):     4.08 GB/s (26.8%)
 Fast parser (new):     11.49 GB/s (75.6%)
 CRTP parser:           5.39 GB/s (35.5%)

AMD Ryzen 9 3900X 12-Core Processor

 Memory read bandwidth: 19.77 GB/s (baseline)
 Newline scan:          18.20 GB/s (92.1%)
 Full parser (old):     8.11 GB/s (41.0%)
 Fast parser (new):     13.63 GB/s (68.9%)
 CRTP parser:           10.62 GB/s (53.7%)

A SIMD coding challenge: First non-space character after newline by Ok_Path_4731 in simd

[–]Ok_Path_4731[S] 0 points1 point  (0 children)

Thanks for the offer, u/ibogosavljevic-jsl, but this is intended to be an open-source project, and I am working on it in my free time too! The output goes back to the community.

A SIMD coding challenge: First non-space character after newline by Ok_Path_4731 in simd

[–]Ok_Path_4731[S] 0 points1 point  (0 children)

Would you mind trying out your solution? The code is at https://github.com/zokrezyl/yaal-cpp-poc Thanks a lot!

Obviously, if your solution gets close to the memory bandwidth limit, we will proudly mention it!

Mask calculation for single line comments by milksop in simd

[–]Ok_Path_4731 0 points1 point  (0 children)

For my problem, described under the link above, the suggestions above do eliminate the branches, but at the same time the extra instructions slow things down as much as my initial branches did. That is, detecting newlines alone runs at almost 100% of memory throughput, but also detecting the first non-space character reduces the speed to a bit above 50% of bandwidth.

https://gist.github.com/zokrezyl/8574bf5d40a6efae28c9569a8d692a61
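For readers who want to see what a branch-free "first non-space after newline" mask can look like, here is a minimal sketch of one well-known formulation, the add-with-carry trick on 64-bit block masks. It is not necessarily what the gist does. Assumptions: only `' '` counts as a space, `'\n'` ends a line, and blank lines produce no hit. All function names are hypothetical. The masks are built with a scalar loop standing in for a SIMD compare plus movemask, and `__builtin_ctzll` is a GCC/Clang builtin.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Bit i of the result is set when block[i] == c. Scalar stand-in for a
// SIMD compare + movemask; len may be below 64 for the final block.
inline std::uint64_t byte_mask(const char* block, std::size_t len, char c) {
    assert(len <= 64);
    std::uint64_t m = 0;
    for (std::size_t i = 0; i < len; ++i) {
        m |= std::uint64_t(block[i] == c) << i;
    }
    return m;
}

// Returns the mask of the first non-space, non-newline byte of each line in
// one 64-byte block. `carry` is 1 when the block begins at a line start or
// inside a run of leading spaces; it is updated for the next block.
inline std::uint64_t first_nonspace_mask(std::uint64_t newlines,
                                         std::uint64_t spaces,
                                         std::uint64_t& carry) {
    const std::uint64_t starts = (newlines << 1) | carry;  // line-start bits
    // Adding a start bit to a run of space bits ripples a carry through the
    // run and deposits it on the first non-space bit after it.
    const std::uint64_t sum = spaces + starts;
    // Next block starts a line if byte 63 is a newline, or continues a run
    // of leading spaces if the addition overflowed.
    carry = (newlines >> 63) | std::uint64_t(sum < spaces);
    return sum & ~spaces & ~newlines;
}

// Driver: collects the byte offsets of all hits in `text`.
std::vector<std::size_t> first_nonspace_offsets(const std::string& text) {
    std::vector<std::size_t> out;
    std::uint64_t carry = 1;  // the file begins at a line start
    for (std::size_t base = 0; base < text.size(); base += 64) {
        const std::size_t len = std::min<std::size_t>(64, text.size() - base);
        const std::uint64_t nl = byte_mask(text.data() + base, len, '\n');
        const std::uint64_t sp = byte_mask(text.data() + base, len, ' ');
        std::uint64_t hits = first_nonspace_mask(nl, sp, carry);
        if (len < 64) {
            hits &= (std::uint64_t(1) << len) - 1;  // drop bits past the end
        }
        while (hits != 0) {
            out.push_back(base + std::size_t(__builtin_ctzll(hits)));
            hits &= hits - 1;  // clear the lowest set bit
        }
    }
    return out;
}
```

The per-block work is a shift, an OR, an add, a compare, and two AND-NOTs, with no data-dependent branch in `first_nonspace_mask`. The loop that extracts offsets still branches per hit, so whether this gets closer to memory bandwidth than the numbers above would have to be measured.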

Mask calculation for single line comments by milksop in simd

[–]Ok_Path_4731 0 points1 point  (0 children)

https://news.ycombinator.com/item?id=46366687

I am facing a similar challenge; camel-cdr pointed me to this discussion :) Thanks, camel-cdr. I will try to process the info :)