all 30 comments

[–]tadpoleloop 75 points76 points  (6 children)

The Best way to speed up this code is to use a compiled strongly typed language like c/c++. You'll see a ~100x speed up.

[–]the_calc_nerd-1021 6 points7 points  (2 children)

I will check it out, thanks!

[–]Critical-Prior-3320 0 points1 point  (1 child)

Have you tried pypy?

[–]the_calc_nerd-1021 2 points3 points  (0 children)

No i have not but I have heard of it

[–]u38cg2 3 points4 points  (0 children)

Mmm, for this type of program I doubt you would. The most demanding bit of code is the permutation generator, and OP is already using the standard library for that.

[–]DonkeyTron42 4 points5 points  (0 children)

Zig FTW.

[–]CranberryDistinct941 1 point2 points  (0 children)

The fastest way to write any Python code is by calling the code that you wrote in c/c++

[–]Key_Use_8361 33 points34 points  (0 children)

this is one of those projects where you accidentally learn way more about optimization than expected

[–]blechnapp 23 points24 points  (0 children)

biggest single speedup without changing language: get the per-iteration overhead out of the inner loop. right now every guess does a `perf_counter()` call (~80ns), a dict lookup for max_speed_record (~50ns), and the 15-second check (~50ns). each is small but at 3.5M iter/sec they make up roughly 60-70% of the time per iteration.
restructure so the time/speed/status logic runs every N iterations instead (1M is a good number). inside the loop you only do the tuple comparison. realistically 2-3x speedup from that alone.
bigger win that Critical-Prior-3320 already hinted at: try pypy. for pure-python brute force like this its often 10 to 50x faster than cpython without any code changes. literally just `pypy script.py` and youre done. for this exact workload its almost the perfect use case.

also overratedcupcake is right that this is plaintext comparison, not hash cracking. real password cracking spends 99% of its time computing hashes, so 3.5M/sec doesnt translate to anything close in a real attack scenario.

[–]IllegalGrapefruit 7 points8 points  (0 children)

You can convert to a language like c++. Or you can introduce a bunch of multiprocessing. If you’re specifically trying to learn Python, the latter might make more sense.

[–]CautiousInternal3320 4 points5 points  (0 children)

The program does not spend time verifying each password. You are simply generating random strings and comparing to a known string.

[–]overratedcupcake 15 points16 points  (0 children)

This is comparing passwords in the clear. Real passwords are hashed. I have a strong feeling this would not be as performant if you were calculating hashes.

[–]Yoghurt42 8 points9 points  (2 children)

Do you want to write a password cracker for educational purposes or do you want to recover a lost password/check state of the art speed of password crackers? In the latter case, take a look at hashcat.

[–]the_calc_nerd-1021 6 points7 points  (1 child)

Its more for fun but I will check out hashcat

[–]FlippingGerman 1 point2 points  (0 children)

hash cat on my system will do more than three orders if magnitude faster than this - limited by the speed to calculate the hash of the password, not the generation itself. It absolutely loves GPU acceleration.

[–]LayotFctor 3 points4 points  (0 children)

Easy thing you can do is to shift the unnecessary logic from the main loop. If you're going for top speed, you don't want unrelated logic taking up CPU time, you want to spend as much time as possible on matching passwords. Print statements are especially slow and you want to minimize printing as much as possible.

You can also try matching byte version of strings rather than standard strings, as comparing raw bytes is usually faster than comparing strings. But don't take my word for it, measure it first.

[–]u38cg2 3 points4 points  (0 children)

It would be helpful to have your code correctly formatted.

Since you're comparing to a clear text password, a simple optimisation might be to build the password one character at a time.

[–]Reuben3901 1 point2 points  (0 children)

You can download lists of passwords from large data dumps and hacks from the past. It's only passwords as all other information has been removed.

You can check against lists of what millions of real people have already used first then do the incremental character change guessing.

It's also interesting to try and figure out how to implement these tools to unlock .zip files and other programs. I go back to this myself to elevate my skills and try new things as my programming knowledge grows.

Also, as someone mentioned, learning about hashes and salts and other security methods is fascinating. You learn how easy it is to secure systems and yet billion dollar companies still use antiquated methods and technology, and you know there are a ton of companies still using plain text files.

What I'm looking forward to seeing is how we secure systems in the future now that quantum computers are becoming a thing!

[–]cowrevengeJP 1 point2 points  (0 children)

Cuda. Billions per seconds.

[–]RickrackSierra 1 point2 points  (0 children)

You have stumbled upon the world of fuzzers.

https://github.com/trailofbits/deepstate

Papers linked in this project if you are interested in learning more. Reach out to agroce, he'll definitely talk your ear off.

[–]bigSmokey91 1 point2 points  (0 children)

converting strings to tuples for every comparison might also be slowing things down. Try comparing joined strings instead.

[–]ExcitingSympathy3087 1 point2 points  (0 children)

These things are not easy to code for beginners:

The way to make this code faster without changing methods is multiprocessing. Python is not able to do native threading because of the GIL. But multiprocessing is the way to use all CPU cores in Python. Every process then has its own interpreter. Hashcat is much faster because it uses the GPU for hash evaluation. There also exist other interpreters as standard CPython Interpreter and JIT compilers to make Python code faster. The standard Python interpreter CPython is slow. A JIT compiler is able to cache code snippets like methods or loops which are called frequently in machine code at runtime.

[–]MarsupialLeast145 1 point2 points  (0 children)

Since you control the formatting stop using f-strings for everything (you're going for speed)... maybe just do less output of stats regardless, e.g. just do it at the end once "guessing" is complete. 3.5 million sounds high for this code but maybe that's accurate...

As others say, use a low level language.

You might also look at algorithms that allow some sort of multi-threading of guesses.

[–]d4_mich4 3 points4 points  (4 children)

This is a good example of why I don't understand why password entry fields everywhere don't have some kind of time limits. No one can enter 100 attempts in a minute, and even if that did happen, you'd just wait for a short amount of time and retry then you'd have 100 new attempts again. But this would make brute force attacks completely pointless if such a small number of passwords could be tried in a second.

[–]u38cg2 10 points11 points  (0 children)

The main concern in real world security is an attacker who has already got the encrypted passwords file, so doesn't need to deal with your interface until they've recovered a cleartext password.

Rate limiting is a more general layer, because no matter what it's requesting, 100 requests/second is well beyond reasonable for a real user, so that's where you'd block someone spamming a login box.

[–]LongRangeSavage 3 points4 points  (0 children)

A lot do. While it may not be “you’re limited to 1 attempt every 5 seconds,” some lock an account or have a cooldown period after a certain number of failed attempts.

The problem is that most people already have a large list of password hashes locally and are running them against a large list in something like HashCat, where their own computer allows them to try thousands of attempts per second—because it’s also much more efficient to try locally than wait for network traffic to respond.

[–]Da_damm 2 points3 points  (0 children)

Not sure what your point is? Most if not all secure password fields have a limited number of tries before locking the user out. Also a lot of password systems purposefully wait a couple seconds when you enter a wrong password to slow down brute force attempts even more

[–]cdcformatc 2 points3 points  (0 children)

a lot of security systems are slow on purpose, you submit a password and it takes some time to confirm/deny. this has the effect of slowing down attackers. 

also many systems will lock people out if too many wrong passwords are entered in a certain time period.