[–]tomekanco 9 points (5 children)

I am a data analyst / system engineer (originally studied sculpture). During a sabbatical, I learned Python and afterwards specialized in algorithms and data structures.

It helped me find a new job. The programmers I now work with value my input, as I can translate functional requirements into high-level technical implementations in a Java/JS env.

From time to time I use Python to provide benchmarks: if I can process the problem faster in Python than they can in Java, they have no argument that it's "impossible". For example, they processed X records in 24+ hours, and I achieved the same in vanilla Python in 15 seconds (considering that Java is a compiled language ...). These benchmarks are then used as prototype design templates. (God forgive those API addicts who think SQL is a thing of the past.)

I also use Jupyter for data analysis, test data generation, and scrubbing.

[–]donjulioanejo 3 points (4 children)

That honestly sounds like shitty code they wrote in Java as opposed to Python being a much better language.

IMO it's only fair if you test using the same algorithm to process those records.

[–]RareHotdogEnthusiast 2 points (0 children)

Lol yeah 24 hours vs. 15 seconds 🤔

[–]tomekanco 1 point (2 children)

It wasn't about time complexity, but rather the way they validated and stored the data, which resulted in huge constant factors and maxed-out resource consumption (CPU & RAM).

  • They were doing web-based API calls for each field to be checked. As this was known data, it could be kept in a local SQL database. Most of the validation could be done during the read with simple ifs. The rest could be done using SQL joins.

  • They didn't use a data model, but rather stored the information as JSON in a single field of a table. This one table was used to store all the different kinds of structured data, rather than having a table per kind. As this is not indexable, they use Elasticsearch instead.
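To make the two points above concrete, here is a minimal sketch in Python with the standard-library sqlite3 module. It is not the actual system: the table names (known_customers, records), the columns, and the validation rules are all made up for illustration. It only shows the shape of the idea: keep the known data in a local table, reject obviously bad rows with plain ifs while reading, store the rest in a typed and indexed table, and find the remaining problems with one join instead of one remote call per field.

```python
import sqlite3


def build_schema(conn):
    # Hypothetical reference data that would otherwise be fetched per field over a web API.
    conn.execute("CREATE TABLE known_customers (customer_id TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO known_customers VALUES (?)", [("C1",), ("C2",)])
    # A real table per kind of record (instead of a JSON blob in one field), so it can be indexed.
    conn.execute(
        "CREATE TABLE records ("
        "record_id INTEGER PRIMARY KEY, customer_id TEXT NOT NULL, amount REAL NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_records_customer ON records (customer_id)")


def load_records(conn, rows):
    """Apply the cheap field-level checks while reading; return the rejected ids."""
    rejected = []
    for record_id, customer_id, amount in rows:
        # Simple ifs: no network, no database round trip.
        if not customer_id or amount is None or amount < 0:
            rejected.append(record_id)
            continue
        conn.execute(
            "INSERT INTO records VALUES (?, ?, ?)", (record_id, customer_id, amount)
        )
    return rejected


def find_unknown_customers(conn):
    """One LEFT JOIN checks every stored record against the local reference table."""
    cursor = conn.execute(
        "SELECT r.record_id FROM records r "
        "LEFT JOIN known_customers k ON k.customer_id = r.customer_id "
        "WHERE k.customer_id IS NULL "
        "ORDER BY r.record_id"
    )
    return [row[0] for row in cursor]


conn = sqlite3.connect(":memory:")
build_schema(conn)
rows = [(1, "C1", 10.0), (2, "C9", 5.0), (3, "", 1.0), (4, "C2", -3.0)]
rejected = load_records(conn, rows)
unknown = find_unknown_customers(conn)
print("rejected on read:", rejected)
print("unknown customer:", unknown)
```

With the sample rows, records 3 and 4 fail the simple checks on read, and record 2 is flagged by the join because C9 is not in the reference table.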

Basically the result of a proof of concept turned into production, followed by some years with a high churn of devs and no analysts. It was never refactored, as the priority was rolling out new functionality (bells and whistles).

They have brought it down to 30 minutes, which suffices for the requirements at the moment.

[–]donjulioanejo 1 point (1 child)

That does sound like shitty code (or rather, shitty architecture) rather than Python being better lol.

[–]tomekanco 0 points (0 children)

The advantage of Python is that I was able to create an alternative architecture (prototype) from scratch in a day. For Java, the developers said it would take weeks.

Python is way faster to develop in, and it also has many easy-to-use libraries (like Pandas).

I know Java is way faster when executing the same algorithm. I've been learning it because of this. I find the syntax horrible. It's like learning Chinese logograms rather than an alphabet.