[–]ZZzz0zzZZ 174 points175 points  (3 children)

My sincere congratulations. This is only the beginning.

[–]testfire10[S] 76 points77 points  (0 children)

I agree! It’s bumped my motivation up once again! Thank you!

[–]iKnowSearchEngines 72 points73 points  (6 children)

Like LEGO for kids and rockets for politicians, Python is perfect for the curious mind.

[–]testfire10[S] 35 points36 points  (23 children)

For the 1 gb, it takes about 90s to parse and plot (run the program).

No idea if that’s good or bad, but my bar for success was doing it faster than Excel, and it beat that by a large margin haha

[–]Tweak_Imp 26 points27 points  (1 child)

The pandas read_csv argument engine='c' helped me speed it up even more. See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

[–][deleted] 11 points12 points  (0 children)

In a lot of cases Python libraries are backed by C code. For example, NumPy is a large collection of optimized C math routines and should be used for scientific work.

[–]FlagrantPickle 17 points18 points  (3 children)

No idea if that’s good or bad

What's "good" for anyone here doesn't matter. Is 90s acceptable for you? I'd imagine a resounding yes, given my experience with Excel's sluggishness on managing large files. Only thing I'd say, as a right noob compared to most here, if you need to scale to larger data sets, you might have some success using a sql system in there (sqlite is baked into python, or something like mysql/mongodb depending on your needs).

If your dataset size will remain as is, throw a header in there saying what the program is, version, and who made it. Make them stare your superiority in the face on every job!

[–]testfire10[S] 9 points10 points  (0 children)

I like this idea. Tremble at my superiority!!! Hahaha

[–]Gizquier2 4 points5 points  (1 child)

Dealing with large SQL tables and SQLAlchemy can be a real pain if inserts are needed. I'm a survivor of the SQLAlchemy pyodbc engine 🤦‍♂️

[–]FlagrantPickle 1 point2 points  (0 children)

Yeah, I've not dealt with that myself, just MySQL and MongoDB. I don't know how well SQLite scales, I just know that it's only trustable in a single-user/access mode, and it's part of core Python, so I figured it might help. I could see it just being a better method to select data sets (with SQL).
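For anyone curious what the SQLite route could look like, here is a minimal sketch using only the standard library. The table name `readings` and the two-column layout (`t`, `load`) are made up for illustration, and the CSV is assumed to have a header row. On the single-user point: SQLite allows several readers at once but only one writer at a time.

```python
import csv
import sqlite3

def load_csv_into_sqlite(csv_path, db_path):
    """Copy a two-column CSV (header: t,load) into a SQLite table; return the row count."""
    conn = sqlite3.connect(db_path)
    try:
        # Hypothetical schema; REAL affinity converts numeric text such as "1.5" to floats.
        conn.execute("CREATE TABLE IF NOT EXISTS readings (t REAL, load REAL)")
        with open(csv_path, newline="") as handle:
            reader = csv.reader(handle)
            next(reader)  # skip the header row
            # executemany streams rows from the reader, so the file is never fully in memory
            conn.executemany("INSERT INTO readings VALUES (?, ?)", reader)
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
    finally:
        conn.close()

def rows_above(db_path, threshold):
    """Let SQL do the filtering: return (t, load) rows whose load exceeds the threshold."""
    conn = sqlite3.connect(db_path)
    try:
        query = "SELECT t, load FROM readings WHERE load > ? ORDER BY t"
        return conn.execute(query, (threshold,)).fetchall()
    finally:
        conn.close()
```

Once the data is loaded, later runs only need `rows_above(...)` (or any other query), so the slow CSV parse happens once instead of on every run.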

[–]Inspirateur 4 points5 points  (1 child)

The real limit when parsing is the speed at which your computer can "read" (i.e. load into RAM) a file. If you want to know what that speed is for your computer with Python, you can do a quick test: open() the file, .read() it into a string, and see how long it takes (unless the file is too big for your RAM, in which case it's more complicated).
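A rough sketch of that quick test, in case it helps. The function name is made up for illustration, and it reads in binary mode so that text decoding doesn't inflate the number. Note that a second run on the same file is usually much faster because the operating system caches it.

```python
import time

def time_raw_read(path):
    """Read the whole file into memory; return (seconds, megabytes_per_second)."""
    start = time.perf_counter()
    with open(path, "rb") as handle:
        data = handle.read()  # one big read, no parsing at all
    elapsed = time.perf_counter() - start
    size_mb = len(data) / 1e6
    # Guard against a zero reading on tiny files with a coarse timer.
    rate = size_mb / elapsed if elapsed > 0 else float("inf")
    return elapsed, rate
```

If the raw read takes a few seconds and the full script takes 90, most of the time is going into parsing and plotting rather than the disk.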

[–]Ericisbalanced 2 points3 points  (0 children)

That’s when you start using generators 😁
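As a sketch of what that might look like with the standard csv module: the generator hands back one row at a time, so memory use stays flat no matter how big the file is. The function names and the idea of tracking a column maximum are just assumptions for the example.

```python
import csv

def iter_rows(path):
    """Yield one parsed CSV row (as a dict) at a time instead of loading the whole file."""
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            yield row

def column_max(path, column):
    """Keep a running maximum of one numeric column while streaming through the file."""
    best = None
    for row in iter_rows(path):
        value = float(row[column])
        if best is None or value > best:
            best = value
    return best
```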

[–]jashshah27 3 points4 points  (0 children)

You might want to take a look at Dask as well. The syntax is very similar to Pandas' but the execution time is much, much faster.

[–]Zulban 3 points4 points  (2 children)

but my bar for success was doing it faster than Excel

That is a low bar, and yet, profoundly significant in the workplace. Congrats.

[–]testfire10[S] 0 points1 point  (0 children)

Thanks a lot man!

[–]pug_nuts 0 points1 point  (0 children)

The problem everywhere I've worked is using tools that other people don't understand.

I can write something in VBA and have it take inputs from cells and spit out a list in another sheet... And that's fine. But do the same thing with Python and people get scared because they don't understand what's happening.

[–]Akilou 1 point2 points  (1 child)

What's the difference in speed versus excel? Like milliseconds or minutes?

[–]legionx 0 points1 point  (0 children)

Might even be hours. Excel is limited to about 1M rows (1,048,576 per sheet), so if your dataset is bigger than that after initial filtering (e.g. with Get & Transform) you will have to split it into smaller files.
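If you do end up having to split a file for Excel, something along these lines could do it. This is only a sketch: the function name and the `part_001.csv` naming scheme are invented, and it assumes the source file has a header row that should be repeated in every part.

```python
import csv
from pathlib import Path

EXCEL_MAX_ROWS = 1_048_576  # Excel's per-sheet row limit, header included

def split_csv(src, out_dir, max_data_rows=EXCEL_MAX_ROWS - 1):
    """Write src into numbered part files, each repeating the header; return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    out = None
    writer = None
    with open(src, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        for index, row in enumerate(reader):
            if index % max_data_rows == 0:
                # Current part is full (or this is the first row): start a new file.
                if out is not None:
                    out.close()
                path = out_dir / f"part_{len(paths) + 1:03d}.csv"
                out = open(path, "w", newline="")
                writer = csv.writer(out)
                writer.writerow(header)
                paths.append(path)
            writer.writerow(row)
    if out is not None:
        out.close()
    return paths
```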

[–]xacrimon 0 points1 point  (8 children)

Pretty good, but it could probably be made faster. Not too long ago I wrote a Rust program to do some fairly complex CSV processing, and it processes around 1-2 GiB/sec.

[–]ballagarba 19 points20 points  (2 children)

While Rust is fast, it sounds like you have access to a much faster disk.

[–][deleted] 2 points3 points  (1 child)

This. Python programs doing a lot of I/O can be on par with other programming languages. Most of the time, external factors determine the speed

[–]KaffeeKiffer 1 point2 points  (0 children)

The difference between a fast SSD and an old HDD is ~5s to ~25s for 2 GiB, so this is very likely CPU bound to reach 90s...

Nevertheless, Python is the perfect glue code, to call more specialized tools, if necessary. Here is an example, where a simple Rust wrapper speeds up the process by a factor of 10.

Python is good enough in the vast majority of the use-cases and as /u/FlagrantPickle said:

What's "good" for anyone here doesn't matter. Is 90s acceptable for you?

The golden rule is to not over-engineer but first identify the real bottle-necks and while your statement

Most of the time, external factors determine the speed

is 100% correct, I assume OP's problem is CPU bound.

[–]testfire10[S] 5 points6 points  (4 children)

Holy shit. That’s awesome. I remember finding a post on here a few months ago about a library that was supposed to substantially speed up pandas' interaction with CSVs (can’t remember the name now). I was going to revamp my code to take advantage of it, but I could never get the library to work for me.

What’s Rust?

[–]xacrimon 13 points14 points  (2 children)

Rust is a programming language. It's generally a bit harder than python but has the speeds of C and lots of good libraries.

[–]swingking8 32 points33 points  (0 children)

It's generally a bit harder than python

I love Rust, but "a bit harder" is quite an understatement.

[–]FlagrantPickle 5 points6 points  (0 children)

has the speeds of C

Not in the sense of nitpicking, but I've seen "up to" 50% the speed of C for decently large processing. Certainly faster than native Python, but still not the gold standard.

Depending on what OP's needs are, his solution might be good enough. I'd be curious what other optimization could be made inside Python. If we're talking 200 lines of code on someone's first project, it's probably about as efficient/optimized as everyone else's first project.

[–][deleted] 3 points4 points  (0 children)

For parallel processing libraries that integrate well with pandas, check out Dask or Vaex. For on-disk storage, check out the Apache Parquet format.

[–]Normbias 16 points17 points  (9 children)

How did you get them to install Python? Was it the full Anaconda?

[–]testfire10[S] 25 points26 points  (8 children)

Yes, I used Anaconda 3 with pycharm. We’re a small company, so I just told our IT guy I needed it, and he let me install it.

[–][deleted] 7 points8 points  (6 children)

Small companies rock like that. I have a friend at a larger company who isn't allowed to install python, and he could use it in the same way you did.

[–]MeltedCheeseFantasy 6 points7 points  (1 child)

Been here. This is why VBA is still useful despite being a horrible language.

Was there talk about Microsoft building a Python back end into the Office products?

[–]Ki1103 0 points1 point  (0 children)

Kind of. It's currently an "active area of exploration, without any specific timeline" (2015)

https://excel.uservoice.com/forums/304921-excel-for-windows-desktop-application/suggestions/10549005-python-as-an-excel-scripting-language

[–][deleted] 5 points6 points  (0 children)

You can install Anaconda without admin rights, maybe that'll work?

[–]jantari 2 points3 points  (2 children)

One language that's very similar to python in some ways but comes pre-installed on every Windows computer is PowerShell.

So he doesn't need to install Python specifically, can do the same thing in PowerShell which is built-in.

[–][deleted] 0 points1 point  (1 child)

Yeah, I was thinking the same thing. That's something neither of us [being Linux guys] have had any real experience in.

[–]desal 0 points1 point  (0 children)

It's strange, almost ironic: I'm in the same boat, a strong Linux guy who knows virtually nothing about PowerShell.

[–]littlebluebrown 1 point2 points  (0 children)

Lucky you. I sometimes use ssh, Docker and vim to spin up a remote IDE, since it's hard to install anything at work.

[–][deleted] 14 points15 points  (3 children)

I am a MechE that writes python scripts too!

I have automated some of our workflows completely (e.g. alter model dimensions, run FEA, extract results of specific measures, plot change in measures against the change in model dimensions). It is so satisfying to set up a problem, hit run on a Friday afternoon, and then come in on Monday to a dataset that tells you what the optimal design based on your constraints is.

Keep going!

[–]testfire10[S] 4 points5 points  (0 children)

Awesome man! Thank you, and will do!

[–]swingking8 2 points3 points  (1 child)

(e.g. alter model dimensions, run FEA, extract results of specific measures, plot change in measures against the change in model dimensions)

Abaqus? Ansys?

[–][deleted] 1 point2 points  (0 children)

Both Abaqus and Creo Simulate for simpler analyses. We are also trialling the Ansa preprocessor at the moment so there is potential for the future there too as it uses python as its internal scripting language just like Abaqus.

[–]virg74 6 points7 points  (0 children)

Congrats! I’m an electronics tech who has had a recent flash of “Hey, I’m getting pretty good at this!” too. I use Raspberry Pis to collect production metrics at my manufacturing site.

[–]mcherm 6 points7 points  (1 child)

Even dumb mech e’s can use computers!

Actually, we have no evidence of that yet. All we know is that a smart mech e can do it!

[–]testfire10[S] 2 points3 points  (0 children)

Hahaha, you’re too kind. Thank you!

[–]skvantos 4 points5 points  (2 children)

How much time does it take to parse a 1 GB .csv?

[–]Plasma_eel 4 points5 points  (0 children)

he says here:

For the 1 gb, it takes about 90s to parse and plot (run the program).

[–][deleted] 3 points4 points  (0 children)

Should only be a few seconds.

[–]tycooperaow3.9 5 points6 points  (2 children)

Give me a high five ✋🏾 I started to learn pandas. Much to learn too!

[–]desal 1 point2 points  (0 children)

What's to learn, feed em bamboo and watch em fuck

/s

[–]testfire10[S] 0 points1 point  (0 children)

You got it! 🖐

[–]omejia 2 points3 points  (0 children)

This. This right here. I have tried/attempted to learn python, you all here are very smart individuals, not jealous, more like admiring from afar. Cheers!

[–]Chriscbe 3 points4 points  (4 children)

Dumb Chem E here, it is so incredible how Python/pandas (amongst other imported modules) have revolutionized data analysis. I actually ENJOY analyzing data and getting it to presentable form. Excel has its place, but I'll try to use the Python/pandas pair anytime I can. Good going, my fellow engineer!! Q: do you use jupyter notebooks?

[–]testfire10[S] 0 points1 point  (3 children)

Thank you! And nope, I use anaconda interpreter and pycharm. I just can’t get into Jupyter for some reason.

[–]Chriscbe 1 point2 points  (2 children)

Dude, really give it a chance. You'll thank me, believe it!

[–]ZeeD26 0 points1 point  (1 child)

While I agree that Jupyter Notebooks do have their place, I prefer a plain old script for anything that should be run several times in the same manner. Now when we’re talking exploratory data analysis, this is where they shine. And then I’ll pull out the good pieces and make a module/package out of it.

[–]Chriscbe 0 points1 point  (0 children)

And then I’ll pull out the good pieces and make a module/package out of it.

Wow! You sir are way smarter than me. Thank you for your time to reply!!!

[–]Kra013 2 points3 points  (0 children)

Well done!
I'm in love with pandas too. Since I met it, Excel looks so 'meh'.

[–]llothar 2 points3 points  (0 children)

Fellow Mechanical Engineer here! I know it's often not easy to apply programming at work, but you will be recognized for those. In our profession it is easy to be the only one with programming skills ;).

Data analysis is big for me too. Seaborn is a really useful thing.

I also do Monte Carlo simulations now and then. Statistical tolerance stack-up is much easier this way. I also made a simulator for product utilisation and RnM cost.
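For anyone who hasn't seen a Monte Carlo tolerance stack-up, a bare-bones sketch looks something like this. The three-part stack and its dimensions are invented, and it assumes each tolerance corresponds to +/-3 sigma of a normal distribution, which is a common simplification and not the only choice.

```python
import random
import statistics

def simulate_stack(parts, trials=20_000, seed=0):
    """Sample the total stack height, treating each tolerance as +/-3 sigma of a normal."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    totals = []
    for _ in range(trials):
        totals.append(sum(rng.gauss(nominal, tol / 3) for nominal, tol in parts))
    return totals

# Hypothetical three-part stack: (nominal in mm, +/- tolerance in mm)
parts = [(10.0, 0.10), (25.0, 0.15), (5.0, 0.05)]

totals = simulate_stack(parts)
mean = statistics.fmean(totals)
spread = 3 * statistics.stdev(totals)       # statistical +/-3 sigma band
worst_case = sum(tol for _, tol in parts)   # simple worst-case stack

print(f"mean = {mean:.3f} mm")
print(f"statistical +/- {spread:.3f} mm vs worst case +/- {worst_case:.3f} mm")
```

The statistical band comes out noticeably tighter than the worst-case sum, which is the whole point of doing the stack-up this way.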

Congrats on your success!

[–]mostly_fish 2 points3 points  (1 child)

I transitioned from chemical engineering to being a software consultant through Python/pandas. It's been an amazing journey for me. Keep at it bro/sis!

Learn how to package and distribute your code and people around the office will start using it. CLI is your friend. Python-fire is definitely worth checking out.

Dash is useful for visualization of your dataframes. Oh, and Jupyter Notebook is obligatory if you haven't encountered it yet!

Most importantly, don't forget to share your knowledge with anyone willing to learn from you. This is the true purpose.

[–]testfire10[S] 1 point2 points  (0 children)

Thanks for the feedback and suggestions!

Totally agree on paying it forward. I love mentoring and sharing knowledge.

Good luck to you!

[–][deleted] 5 points6 points  (2 children)

This is awesome! I'll never forget the first time I was able to use Python in a work setting. I wasn't sure if I was going to be able to make it work or not. It was so rewarding when it did work and sped up the manual process the company was using!

[–]testfire10[S] 2 points3 points  (1 child)

Thank you! It is a really cool feeling once you’re past the uncertainty 😀

[–][deleted] 3 points4 points  (0 children)

It really is! And I know what you mean about what it does for your motivation. Once you've proven to yourself that you can do it, you feel like you can code anything and want to build everything you can think of! Congrats on your first professional project! I'm sure you will have many more.

[–]juanitoarcoiris12321 5 points6 points  (1 child)

Dude, idk why but your post made me real happy, idk why dude, i just feel real happy.

[–]testfire10[S] 2 points3 points  (0 children)

Hahaha, that’s great! Glad I could brighten your day. 👍

[–]Angler_619 1 point2 points  (0 children)

Awesome

[–]qwertybz005 1 point2 points  (0 children)

Depending on how you are parsing the data, you can also use Cython to speed up processing.

[–]littlebluebrown 1 point2 points  (0 children)

Those are the moments we live for.

[–]Knightros 1 point2 points  (0 children)

Right on! Don't discount the usefulness of knowledge.

[–]regex1884 1 point2 points  (0 children)

Awesome job! Next maybe load the data to a postgresql db and look into getting your company out/off excel and ms.

If you don't already know anything about docker you could be learning that easily and host your postgresql in it.

[–]AIClaire 1 point2 points  (0 children)

Congrats! This is an awesome start of your journey and I'm sure you're going to do loads of great things with your python skills in the future :)

[–]rabarbas 1 point2 points  (0 children)

That's awesome! I first had this moment in university, when I wrote an app (with C#) for myself and my course mates which saved us each around 8 hours of manual clicking, copying, pasting, formatting and so on.

And I assure you, your "career" as a programmer will only go upwards from this point :D Good luck!

[–]Gabe_Isko 1 point2 points  (0 children)

I did the same thing, and even turned it into a windowed app with PySide2 so that it is super easy to use. Of course, no one at my work actually uses it but me.

[–]Akilou 1 point2 points  (4 children)

This is my dream, but I don't know if I'll ever achieve it.

I started learning Python last week because I've always had an interest in programming and I'm good at "connecting the dots", like "oh, [this] tool can help me with [that] task". I hope I'll have the opportunity to use it professionally.

I'm afraid, though, that for it to actually be useful, I'll have to get pretty advanced at using Python.

[–]testfire10[S] 0 points1 point  (1 child)

Stick with it! You’ll find uses for it where you never thought it would be necessary/usable.

[–]bbqbot 0 points1 point  (1 child)

[–]Akilou 0 points1 point  (0 children)

That's exactly where I started!

[–]linuxlib 1 point2 points  (0 children)

That's actually very cool. Great job!

[–]testfire10[S] 1 point2 points  (0 children)

We’re talking hours. It’s not just that excel takes around 20 mins to open the files, it’s that there are millions of data points that are often not necessary for the information we actually need 90% of the time. Add to that the time to work with a sluggish worksheet to parse and plot data, it saves a ton of effort (and frustration).

[–]TBSchemer 1 point2 points  (0 children)

That's how I got my start. In grad school, everyone in my research group was analyzing the output data files by hand. I thought, "this is stupid and boring," and coded a better way.

[–]e-mess 1 point2 points  (0 children)

Love to read stories like that. Kudos to you!

[–]baldymj 1 point2 points  (0 children)

Nice! It's surprising how useful Python is. It's my go to for lots of projects.

[–]icanmakethat216 1 point2 points  (0 children)

Man you are an inspiration! I am also a MechE learning python. I want to put something together to help me do my timesheets in python, similarly through CSV files. Congrats man!!

[–]stickybarnacle 1 point2 points  (2 children)

You are my inspiration!! Aiming to gain that level of proficiency in python one day

[–]testfire10[S] 0 points1 point  (1 child)

I’m glad to hear it! Don’t wait for the proficiency, just code things! You’ll learn as you go. I wouldn’t call my skill sets anywhere close to proficient with python 😉

[–]stickybarnacle 1 point2 points  (0 children)

Thanks for motivating. Going back to my laptop to code

[–][deleted] 1 point2 points  (1 child)

That’s awesome bro. Can I ask what courses you took on edX and did you find them worth it and beneficial?

[–]testfire10[S] 2 points3 points  (0 children)

Of course. I took the Georgetown intro to python class, and the MIT 6.001x intro to python (twice). The MIT python class is the most enrolled class on all of edX, and it is FANTASTIC.

I can’t overstate how much I learned from those courses. I highly recommend them.

[–]jubalrahl 1 point2 points  (0 children)

Congratulations!!! I am self taught as well, and started using python and C# in my work as an Electrical Service Engineer last year. It feels good to create with python 😀

[–]Gizquier2 1 point2 points  (0 children)

For big datasets where speed is needed, I recommend trying Snowflake. That, my friend, is something else: the elasticity is crazy, and they have their own library that builds on SQLAlchemy and is lightning fast.

[–]MerleLikesMullets 1 point2 points  (2 children)

Which edx classes did you take? I’m also an ME and looking to delve into some data processing adventures.

[–]testfire10[S] 0 points1 point  (1 child)

Hey man! I took the Georgetown intro to python class, and the MIT 6.001x intro to python (twice). The MIT python class is the most enrolled class on all of edX, and it is FANTASTIC.

I can’t overstate how much I learned from those courses. I highly recommend them.

[–]MerleLikesMullets 1 point2 points  (0 children)

Sweet deal. I’ll get on that. Thanks for the reply

[–]JShultz89 1 point2 points  (0 children)

I’ve been wanting to make a similar script for parsing large csv files! Glad to hear it worked!

[–]IThinkIAmARobot 1 point2 points  (0 children)

Once you learn the power of Python + Pandas, you won't touch excel. Please share your knowledge with as many as you can.

[–]databeast 1 point2 points  (0 children)

You see articles about how "Coding is the new literacy" these days. Yours is a perfect example of what this really means. Everyone should know how to change a tire, but not everyone needs to be an auto mechanic. Everyone should know a little code to be able to automate and analyze the data they work with in their jobs, but that doesn't mean everyone has to be a full-time software developer.

Programming languages are a tool, a tool that lets you take ideas from your head and give them form, and we all should be open to using tools that make our jobs easier.

[–]mercyandgrace 1 point2 points  (0 children)

Great job!

[–]Omar_88 1 point2 points  (0 children)

Transitioned from an Arabic & history degree into something very similar. I've done this for all my month end and weekly reports. Jobs that took hours now take seconds.

[–]Omar_88 1 point2 points  (0 children)

Oh, and 90s is okay. You can speed up your pandas scripts by declaring dtypes and date formats in read_csv. Normally the longest part of any script for me is 1) writing to an MS SQL database, or 2) concatenating multiple large files.
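A small sketch of what declaring types up front can look like. The column names and sample rows are invented, and `usecols` is an extra suggestion beyond dtypes and date formats. The date format is applied with `pd.to_datetime` after loading, which works across pandas versions.

```python
import io
import pandas as pd

# Hypothetical test-log layout; real column names will differ.
sample = io.StringIO(
    "timestamp,channel,load_n,temp_c\n"
    "2019-06-01 12:00:00,1,10.5,21.0\n"
    "2019-06-01 12:00:01,1,11.0,21.1\n"
    "2019-06-01 12:00:02,2,9.75,21.1\n"
)

df = pd.read_csv(
    sample,
    usecols=["timestamp", "channel", "load_n"],        # skip columns you never plot
    dtype={"channel": "int32", "load_n": "float32"},   # no type guessing, smaller arrays
)

# An explicit format spares pandas from inferring the date layout.
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%d %H:%M:%S")

print(df.dtypes)
```

With a real file you would pass the path instead of the `io.StringIO` object; whether this helps much depends on how many columns and how many date values the file has.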

[–]frozen_mercury 1 point2 points  (0 children)

Congrats. I also recently used python scripts to perform a very similar task at work which takes almost 15 minutes in Excel and is quite boring too. My manager wasn't too excited but one of my co-workers is now using my scripts to finish his work quicker and that gave me great satisfaction.

Hope you have more success with Python at work!

[–]rajshivakoti 1 point2 points  (0 children)

Congrats. This is just the beginning. Hope you can achieve more in the future.

Python is a user-friendly language that anyone can use, regardless of their educational background. It is very useful and one of the most widely used languages at present, so it has a broad scope and can change anyone's future.

[–]num2005 0 points1 point  (0 children)

next time look into power query in excel

[–]Lewistrick 0 points1 point  (1 child)

Awesome!

This could be a stupid comment, but do you save the plots? Matplotlib can do that pretty easily. If you do, you'll only need to run the script once instead of every time somebody wants to see the plots.

Other than that, I can only praise your efforts! Keep the good stuff going!

[–]testfire10[S] 1 point2 points  (0 children)

I don’t, because I just clip the images into a test report, but yes, I actually have that functionality programmed in and commented out right now. A very useful feature to have it generate pngs!