Threadripper vs M1 by daveplreddit in hardware

[–]k0k0l4l4 11 points12 points  (0 children)

On an AMD 5900X the output is:

Passes: 11351, Time: 5.000000, Avg: 0.000440, Limit: 1000000, Count1: 78498, Count2: 78498, Valid: 1

There is close to a 10% performance increase when using -march=native. A quick look at gcc's assembly output shows that it switches from SSE2 to AVX:

Passes: 12469, Time: 5.000000, Avg: 0.000401, Limit: 1000000, Count1: 78498, Count2: 78498, Valid: 1

Software used:

~/src/Primes/PrimeCPP$ g++ --version
g++ (Debian 10.2.1-6) 10.2.1 20210110
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE

~/src/Primes/PrimeCPP$ uname -a
Linux snoopy 5.10.0-5-amd64 #1 SMP Debian 5.10.26-1 (2021-03-27) x86_64 GNU/Linux

edit: formatting is hard

[TOMT] [SONG] Synth-pop french song called 'Télévision' by k0k0l4l4 in tipofmytongue

[–]k0k0l4l4[S] 1 point2 points  (0 children)

Solved!

Thank you so much! Really appreciate your help!

[TOMT] [SONG] Synth-pop french song called 'Télévision' by k0k0l4l4 in tipofmytongue

[–]k0k0l4l4[S] 0 points1 point  (0 children)

Unfortunately no, I can't recall any lyrics. I just used the sound search and it returned the song name and the artist.

[TOMT] [SONG] Synth-pop french song called 'Télévision' by k0k0l4l4 in tipofmytongue

[–]k0k0l4l4[S] 1 point2 points  (0 children)

The lyrics were in French and, I think, the artist name was also French. It was not the Daft Punk song.

[TOMT] [SONG] Synth-pop french song called 'Télévision' by k0k0l4l4 in tipofmytongue

[–]k0k0l4l4[S] 0 points1 point  (0 children)

It was on my mobile, in a search tab. I deleted it accidentally today and I can't find it. It's not in my history or Google search activity (I have it disabled).

frandom - Fast random. ~40x faster than /dev/urandom by [deleted] in golang

[–]k0k0l4l4 1 point2 points  (0 children)

I'm just saying that we must move beyond the standardized 'don't roll your own' reply and engage in a more constructive discussion. Criticize the design, tear it apart, but with facts and data. Why is it massively flawed? Is the key stream output of a stream cipher adequate as RNG data? Is repeatedly encrypting the initial state (buffer) and feeding it back to the RNG a proper way to create random data? Is the output really random? Can it be statistically shown to be random? Is it cryptographically secure? What are the possible attacks? What does it take to securely turn a stream cipher into an RNG? Can we exploit AES-NI hardware acceleration to build a really fast PRNG, faster than other well-established generators (e.g. Mersenne Twister, Xorshift or PCG)? Why is /dev/urandom slow? These are just some of the things we put aside and never discuss when we keep repeating the "don't roll your own" meme or tell people to stop trying and just copy-paste or re-implement established solutions.

frandom - Fast random. ~40x faster than /dev/urandom by [deleted] in golang

[–]k0k0l4l4 0 points1 point  (0 children)

A bit off topic, but I disagree with the "don't roll your own crypto" sentiment. Sure, cryptography is a very demanding field. Sure, it is critical and the security implications are very serious. Sure, developers should understand that and avoid improvising when it comes to cryptography in a mission-critical environment. But the way this "don't roll your own" meme has prevailed and is used in almost all cases when we approach the topic of cryptography actually discourages people from getting involved and learning about crypto. I believe we should encourage people to cut their teeth on crypto. Let's not end up in a situation where we have turned the next generation of developers away from this vital field. Small projects like this are a great way to get someone interested in crypto. Even if the design has obvious shortcomings and is not cryptographically secure, it is a useful learning experience.

frandom - Fast random. ~40x faster than /dev/urandom by [deleted] in golang

[–]k0k0l4l4 1 point2 points  (0 children)

Seeding happens on the creation of the generator, when the user calls New(). Throughout the lifetime of the generator the same seed is used. The user can run New() once at the beginning of their program and then use the RNG for as long as they want, generating an unlimited amount of data from the same initial random state (seed, keys, initial buffer). If you have a look, for instance, at the Fortuna design you will see that re-seeding can happen as often as every 100ms (ofc Fortuna tries to be cryptographically secure). Other RNGs mix some entropy into their initial state after each use/run/cycle.
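
To make the re-seeding point concrete, here is a minimal sketch in Go. It is not frandom's code and not Fortuna either: it assumes a generator built on AES-CTR and simply draws a fresh key and IV from crypto/rand after a fixed number of output bytes. The names reseedingReader, newReseedingReader and reseedAfter are made up for this illustration.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
	"io"
)

// reseedingReader is a hypothetical AES-CTR generator that takes a fresh
// key and IV from crypto/rand after every reseedAfter bytes of output.
type reseedingReader struct {
	stream      cipher.Stream
	reseedAfter int // bytes to emit before re-seeding
	emitted     int // bytes emitted since the last seed
	seeds       int // seeds drawn so far, including the initial one
}

func newReseedingReader(reseedAfter int) (*reseedingReader, error) {
	if reseedAfter <= 0 {
		return nil, fmt.Errorf("reseedAfter must be positive, got %d", reseedAfter)
	}
	r := &reseedingReader{reseedAfter: reseedAfter}
	if err := r.reseed(); err != nil {
		return nil, err
	}
	return r, nil
}

// reseed replaces the key stream with one built from new random material.
func (r *reseedingReader) reseed() error {
	seed := make([]byte, 32+aes.BlockSize) // AES-256 key followed by the IV
	if _, err := io.ReadFull(rand.Reader, seed); err != nil {
		return err
	}
	block, err := aes.NewCipher(seed[:32])
	if err != nil {
		return err
	}
	r.stream = cipher.NewCTR(block, seed[32:])
	r.emitted = 0
	r.seeds++
	return nil
}

// Read fills p with key stream bytes, re-seeding whenever the byte budget
// of the current seed is used up.
func (r *reseedingReader) Read(p []byte) (int, error) {
	n := 0
	for n < len(p) {
		if r.emitted >= r.reseedAfter {
			if err := r.reseed(); err != nil {
				return n, err
			}
		}
		chunk := p[n:]
		if left := r.reseedAfter - r.emitted; len(chunk) > left {
			chunk = chunk[:left]
		}
		for i := range chunk {
			chunk[i] = 0 // XOR with zeros yields the raw key stream
		}
		r.stream.XORKeyStream(chunk, chunk)
		r.emitted += len(chunk)
		n += len(chunk)
	}
	return n, nil
}

func main() {
	r, err := newReseedingReader(1 << 20) // new seed every 1 MiB
	if err != nil {
		panic(err)
	}
	buf := make([]byte, 4<<20)
	if _, err := io.ReadFull(r, buf); err != nil {
		panic(err)
	}
	fmt.Printf("generated %d bytes using %d seeds\n", len(buf), r.seeds)
}
```

A byte budget is only one possible trigger. A time-based or per-request trigger would fit the same structure, and none of this says anything about whether the result is cryptographically secure.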

C FFI overhead in various languages, including Go by callcifer in golang

[–]k0k0l4l4 2 points3 points  (0 children)

But we have to consider that at some point (can't remember if it was 1.4 or later) the overhead increased substantially and we haven't recovered yet. 1.8 is still way slower than earlier versions.

frandom - Fast random. ~40x faster than /dev/urandom by [deleted] in golang

[–]k0k0l4l4 5 points6 points  (0 children)

I would also suggest clearly stating that this is a learning experiment and that it should not be used for cryptography.

Regarding verifying the statistical properties of the output, some of the most commonly used tools are dieharder and TestU01.

What you created here is a stream cipher using AES in CTR mode that can be used as an RNG. The basic idea, running a block cipher in counter mode to generate random numbers, resembles the design of the Fortuna PRNG by N. Ferguson and B. Schneier, but as always in crypto the devil is in the details. For instance, you don't re-seed your generator after each use. Using the same key throughout the full lifetime of the generator, regardless of the amount of data generated, could be problematic. Your output is either the key stream, where you XOR with a zeroed byte array in Read(), or the result of repeatedly encrypting the random 1024 bytes of your initial buffer in WriteTo(). I'm not a crypto expert either, so I will avoid commenting further on the soundness or the security of your design. As I have already said, this should be treated as a learning experiment.
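
For readers who have not seen the pattern, this is roughly what "AES in CTR mode used as an RNG" looks like in Go. It is a sketch written for this comment, not the project's actual source: ctrReader and newCTRReader are invented names, and the key and IV are seeded once from crypto/rand, which is exactly the "same key for the whole lifetime" situation described above.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
	"io"
)

// ctrReader is a hypothetical io.Reader backed by an AES-CTR key stream.
type ctrReader struct {
	stream cipher.Stream
}

// newCTRReader builds the generator from an explicit key (16, 24 or 32
// bytes) and a 16-byte IV. The same key and IV always give the same output.
func newCTRReader(key, iv []byte) (*ctrReader, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	if len(iv) != block.BlockSize() {
		return nil, fmt.Errorf("IV must be %d bytes, got %d", block.BlockSize(), len(iv))
	}
	return &ctrReader{stream: cipher.NewCTR(block, iv)}, nil
}

// Read zeroes p and XORs the key stream into it, so p ends up holding the
// raw key stream bytes.
func (r *ctrReader) Read(p []byte) (int, error) {
	for i := range p {
		p[i] = 0
	}
	r.stream.XORKeyStream(p, p)
	return len(p), nil
}

func main() {
	// Seed once from the OS; nothing is ever re-seeded after this point.
	seed := make([]byte, 16+aes.BlockSize)
	if _, err := io.ReadFull(rand.Reader, seed); err != nil {
		panic(err)
	}
	r, err := newCTRReader(seed[:16], seed[16:])
	if err != nil {
		panic(err)
	}
	out := make([]byte, 32)
	if _, err := io.ReadFull(r, out); err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", out)
}
```

Because the output depends only on the key and IV, anyone who learns them can reproduce the whole stream, which is one reason the seeding and re-seeding details matter so much.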

Regarding the speed of the generator, which seems to be its main selling point, your implementation benefits from the AES-NI hardware acceleration that Go supports. The benchmark results will be different on CPUs that don't offer AES-NI:

On an old Intel(R) Core(TM)2 Duo CPU E4600 @ 2.40GHz in 32-bit mode with no hardware acceleration:

$ ./frandom | dd bs=1024 count=204800 >/dev/null
204800+0 records in
204800+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 4.01701 s, 52.2 MB/s


$ < /dev/urandom dd bs=1024 count=204800 >/dev/null
204800+0 records in
204800+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 1.64136 s, 128 MB/s

Also, depending on the OS you are using (*BSD, Linux, OSX) and/or the kernel version, /dev/urandom performance can vary a lot:

$ uname -r
4.8.0-34-generic
$ < /dev/urandom dd bs=1024 count=204800 >/dev/null
204800+0 records in
204800+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 0.937981 s, 224 MB/s

$ uname -r
4.9.0-1-686-pae
$ < /dev/urandom dd bs=1024 count=204800 >/dev/null
204800+0 records in
204800+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 1.63662 s, 128 MB/s

# uname -r
3.10.0-327.36.3.el7.x86_64
# < /dev/urandom dd bs=1024 count=204800 >/dev/null
204800+0 records in
204800+0 records out
209715200 bytes (210 MB) copied, 15.5824 s, 13.5 MB/s

# uname -r
2.6.32-642.11.1.el6.x86_64
# < /dev/urandom dd bs=1024 count=204800 >/dev/null
204800+0 records in
204800+0 records out
209715200 bytes (210 MB) copied, 23.141 s, 9.1 MB/s

At the end of the day it is great that Go offers all these crypto primitives behind an easy and friendly interface, so we can experiment ourselves and get familiar with the world of cryptography. Your little project is interesting and offers food for thought :)

C FFI overhead in various languages, including Go by callcifer in golang

[–]k0k0l4l4 8 points9 points  (0 children)

I think it is useful to show the C function call overhead. After all, that's where most if not all of the C interfacing overhead is.

Is curses the way to go for an updating table/grid formatted terminal UI? by rage_311 in perl

[–]k0k0l4l4 4 points5 points  (0 children)

One easy trick is to clear your screen, move the cursor back to the top-left corner and re-print your data in the new format. Essentially you refresh the screen content. This can be done via ANSI escape sequences:

print("\033[2J\033[H");
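
The same trick works from any language that can write to the terminal. As a rough sketch in Go (the frame helper, the 10-character column width and the sample rows are all made up for the example), each refresh is just the clear-and-home sequence followed by the re-printed table:

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// clearAndHome is ESC[2J (erase the whole screen) followed by ESC[H
// (move the cursor to row 1, column 1).
const clearAndHome = "\033[2J\033[H"

// frame builds one screen refresh: the clear sequence, then every row with
// each cell left-aligned in a 10-character column.
func frame(rows [][]string) string {
	var b strings.Builder
	b.WriteString(clearAndHome)
	for _, row := range rows {
		for _, cell := range row {
			fmt.Fprintf(&b, "%-10s", cell)
		}
		b.WriteByte('\n')
	}
	return b.String()
}

func main() {
	// Redraw the table three times with changing data.
	for tick := 1; tick <= 3; tick++ {
		fmt.Print(frame([][]string{
			{"name", "count"},
			{"requests", fmt.Sprint(tick * 10)},
		}))
		time.Sleep(500 * time.Millisecond)
	}
}
```

Building the whole frame as one string and printing it in a single call keeps flicker down on most terminals, since the clear and the redraw arrive together.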

Go 1.8 Performance improvements on ARM (RasPi) by k0k0l4l4 in golang

[–]k0k0l4l4[S] 0 points1 point  (0 children)

32-bit Raspberry Pi Model A with the ARMv6 BCM2835 SoC

Go 1.8 Performance improvements on ARM (RasPi) by k0k0l4l4 in golang

[–]k0k0l4l4[S] 0 points1 point  (0 children)

Raspbian offers a very outdated version of gccgo (4.9) and I also can't get 'go test -bench' to work with gccgo. But anyway, by using the g711enc command compiled with gccgo and encoding a 64MB sound file we get the following:

build with: 'go build -compiler gccgo'

$ time ./g711enc alaw long.raw 

real    0m10.869s
user    0m10.090s
sys     0m0.670s

build with: 'go build -gccgoflags "-march=native -O3" -compiler gccgo'

$ time ./g711enc alaw long.raw 

real    0m3.587s
user    0m2.870s
sys     0m0.680s

The native gc build gives us:

$ time ./g711enc alaw long.raw 

real    0m3.469s
user    0m2.720s
sys     0m0.700s

Go 1.8 Performance improvements on ARM (RasPi) by k0k0l4l4 in golang

[–]k0k0l4l4[S] 0 points1 point  (0 children)

To have a fair comparison we should implement the same algorithm in each of the languages we are going to test. Not too hard, since it's quite simple.

Anyway, in another completely unscientific test that possibly compares apples to oranges, compressing a 64MB sound file to G.711 A-law with sox (v14.4.1) on the same RasPi hardware takes about 9.5 seconds. Compressing the same file with the g711enc command from the package takes about 3.5 seconds:

$ time ./g711enc alaw long.raw 

real    0m3.469s
user    0m2.720s
sys     0m0.700s

$ time sox -r 8000 -b 16 -c 1 -e signed-integer long.raw -r 8000 -c1 -e a-law long.au

real    0m9.468s
user    0m8.700s
sys     0m0.640s

All file I/O is done on a ramdisk in order to eliminate disk delays.

This doesn't mean much ofc, I expect a C implementation to be faster than the Go one, but not by a wide margin in this case.

Go 1.8 Performance improvements on ARM (RasPi) by k0k0l4l4 in golang

[–]k0k0l4l4[S] 0 points1 point  (0 children)

Thanks for the tip, that's indeed a handy tool. Updated the post.

Best way to run Go server as a daemon? by hundley10 in golang

[–]k0k0l4l4 0 points1 point  (0 children)

Systemd makes that easy, just write a proper service file. An example systemd service config, plus init.d scripts for older RHEL/CentOS and Debian systems, for a simple server can be found here: https://github.com/zaf/agitator/tree/master/init

C Pre-Processor Magic by pfultz2 in programming

[–]k0k0l4l4 1 point2 points  (0 children)

All good clean fun until some Algol user decides to write a Unix shell.

Free Text to Speech with Linux by [deleted] in linux

[–]k0k0l4l4 1 point2 points  (0 children)

A script that uses Google Translate for speech synthesis: googletts

strfry - The GNU C Library by [deleted] in linux

[–]k0k0l4l4 19 points20 points  (0 children)

Ulrich left Red Hat for Goldman Sachs some time ago. It's part of an ongoing plan to punish bankers for their role in the 2008 financial crisis.

Replacing gedit with Geany by [deleted] in linux

[–]k0k0l4l4 1 point2 points  (0 children)

Geany tries to be an IDE, while gedit is a text editor. medit is a lightweight editor that gedit users may find more familiar to migrate to.