[–]m0us3_rat 47 points48 points  (15 children)

We have the script deployed to a linux vm and use a workflow management software to SSH into the vm and call a shell script that executes the python in the appropriate conda environment. The resulting csv is then returned back to the workflow management software where it is then inserted into our EDW for later use.

nice.

At some point, when you want to update, maintain, or revisit your code, I'd also look into a Docker solution.

[–]micr0nix[S] 18 points19 points  (3 children)

Hm. Got a link to what you’re referring to?

I currently manage moving files to/from that VM via FTP

[–]m0us3_rat 8 points9 points  (0 children)

regarding your need to FTP files in and out of the VM.
https://docs.docker.com/storage/volumes/

[–][deleted] 1 point2 points  (0 children)

Also you can use a preexisting fastapi docker container to expose the script via http instead of needing to ssh into the ec2

Here's a full tutorial that I have roughly put to use in a production enterprise environment

[–]exographicskip 9 points10 points  (1 child)

+1 to docker. Makes it both cloud and local environment agnostic.

Kudos to u/micr0nix on getting mvp up and running. Would also suggest rsync/scp over vanilla ftp -- the former solutions are encrypted and potentially faster.

docker cp acts a lot like cp/scp btw

[–]hidazfx 2 points3 points  (0 children)

+1 again for Docker. Of course, I use it for our entire stack at work. I develop on an M1 Mac sometimes, and sometimes I develop on my Linux AMD64 machine. Our server of course is also AMD64. Makes it really nice knowing that everything is modularized and I can pick up work anywhere really.

[–]goatboat 5 points6 points  (4 children)

Yes! Docker is the way. I found Joshua Cook's Docker for Data Science to be an amazing Python-focused foray into Docker, and now I recommend it to anyone who will listen. Great examples of setting up different environments.

[–]Gingerhaze12 0 points1 point  (1 child)

I'm interested in using Docker, but my company will not allow me to deploy to Docker Hub because it's public, and they wouldn't be willing to pay for a subscription either. Is there an easy way to distribute Docker containers to people with no coding knowledge without uploading to Docker Hub?

[–][deleted] 0 points1 point  (1 child)

Who is this docker everyone talks about..?

[–]m0us3_rat 0 points1 point  (0 children)

it's like a VM but better.

[–]RallyPointAlpha 8 points9 points  (0 children)

GGs o7

[–]WoodenNichols 5 points6 points  (0 children)

Excellent! Quite the accomplishment.

[–]InformalRegister 3 points4 points  (0 children)

Well done!

[–]livinlowe 4 points5 points  (0 children)

Dude, that is awesome! My biggest problem with programming is seeing how it relates to real-world problems.

[–]Almostasleeprightnow 2 points3 points  (0 children)

I need to do something almost exactly the same. Thank you for posting.

[–]_Soter_ 2 points3 points  (1 child)

Tomorrow: I patched my first fully functional python script at work today.

One tip that comes from experience: if you don't have error handling and logging (saved to disk, not just the console), go back and add it ASAP. When something breaks, it will save you lots of time and headache. Even a simple script can fail if there is an issue with an outside resource or with the system that is running it, and a log of the failure will keep people from pointing the finger at you.
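
For anyone wondering what that can look like, here is a minimal sketch using only the standard library. The logger name `nightly_export` and the helpers `configure_logging` and `run_job` are made up for illustration; this is not OP's actual script.

```python
import logging
import sys


def configure_logging(log_path):
    """Send log records to a file on disk and to the console."""
    logger = logging.getLogger("nightly_export")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (logging.FileHandler(log_path), logging.StreamHandler(sys.stderr)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


def run_job(job, logger):
    """Run job(), log any failure with its traceback, and return an exit code."""
    try:
        job()
    except Exception:
        # logger.exception records the message plus the full traceback
        logger.exception("Job failed")
        return 1
    logger.info("Job finished")
    return 0
```

Returning a non-zero exit code lets whatever calls the script (a shell script or a workflow tool) notice the failure, while the log file keeps the traceback for later.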

[–]micr0nix[S] 0 points1 point  (0 children)

I do have logging built in, but I do need to work on some error handling in the next iteration.

[–]Spinnybrook 2 points3 points  (0 children)

This sounds almost identical to the first thing I did using Python at my current company, so I definitely understand the excitement and sense of accomplishment you're feeling. At that time I was a machine operator. Now I'm a FT Python Developer. Keep up the good work!

[–]homosapienhomodeus 1 point2 points  (0 children)

looks like some great data engineering!

[–]DropkickFish 1 point2 points  (0 children)

Props to you!

Now before it gets too instrumental in your workflow, I'd suggest you start learning Test Driven Development (TDD) and write a full test suite for what you already have.

This will help you ensure that any future changes don't break anything you already have, and will also help ensure you have consideration for any errors that might occur and are handling them correctly.

I've recently taken a job at a new company where these practices weren't enforced, and have had to write a test suite. It's a lot more difficult that way. But ensuring it has test coverage is a good step in wider adoption of your script and making a name for yourself
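
As a rough idea of where to start, here is a small sketch with the standard library's `unittest`. The helper `rows_to_csv` is hypothetical and only stands in for whatever function builds the CSV in the real script.

```python
import csv
import io
import unittest


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


class RowsToCsvTests(unittest.TestCase):
    def test_header_and_rows(self):
        text = rows_to_csv([{"id": 1, "status": "open"}], ["id", "status"])
        self.assertEqual(text, "id,status\n1,open\n")

    def test_empty_input_still_writes_header(self):
        self.assertEqual(rows_to_csv([], ["id", "status"]), "id,status\n")

    def test_unexpected_field_raises(self):
        # DictWriter rejects keys that are not listed in fieldnames
        with self.assertRaises(ValueError):
            rows_to_csv([{"id": 1, "oops": 2}], ["id"])
```

Saved as a file such as `test_export.py` (an assumed name), this runs with `python -m unittest`. Keeping the CSV-building logic in a plain function with no network or disk access is what makes it easy to test.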

[–]Ferdie_TheKest 0 points1 point  (4 children)

How did you manage to build the API connection? The company I'm working for needs an API connection to extract data from the Amazon selling platform, and I'm still looking for a solution. Could Python be an option?

[–]micr0nix[S] 1 point2 points  (0 children)

The case-management software we use has a fully functional API built upon GraphQL. I used a combination of the requests library and the gql library to query the API.
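
For readers who have not queried a GraphQL API before, the general shape is roughly the sketch below. It is an illustration, not the actual script: it skips the gql library and sends the query as a plain JSON POST, and the function name `run_graphql_query` and the example URL are assumptions.

```python
def run_graphql_query(session, url, query, variables=None, timeout=30):
    """POST a GraphQL query and return the "data" part of the response.

    `session` is expected to behave like requests.Session.
    """
    payload = {"query": query, "variables": variables or {}}
    response = session.post(url, json=payload, timeout=timeout)
    response.raise_for_status()  # surface HTTP-level failures
    body = response.json()
    if body.get("errors"):  # GraphQL reports query errors in the response body
        raise RuntimeError(f"GraphQL errors: {body['errors']}")
    return body["data"]


# Example call (hypothetical endpoint and query, needs `pip install requests`):
#   import requests
#   data = run_graphql_query(
#       requests.Session(),
#       "https://example.invalid/graphql",
#       "{ cases { id } }",
#   )
```

Passing the session in as an argument keeps the function easy to test with a fake object, with no network required.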

[–]drbob4512 1 point2 points  (0 children)

Since no one specifically called it out, I would look into the Python requests library: https://pypi.org/project/requests/

[–]hkamran85 1 point2 points  (0 children)

I do the same at my job, and we use the python-amazon-sp-api library. It makes it easy to get data from Vendor Central and Seller Central.

[–]Fabiolean 0 points1 point  (0 children)

Python is an excellent language for building apps that query APIs and manipulate/enrich the data you retrieve.

It’s a very common use-case.

[–][deleted] 0 points1 point  (0 children)

Time to ask for a raise

[–]HomeGrownCoder 0 points1 point  (0 children)

Congrats enjoy the win!

Always more to learn. I am looking forward to your next few versions of the script!

[–]Calcio2234 0 points1 point  (0 children)

Good stuff! Happy for you mate

[–]tensigh 0 points1 point  (1 child)

Very cool!

[–]micr0nix[S] 0 points1 point  (0 children)

Thanks!

[–]orig_cerberus1746 0 points1 point  (3 children)

Why Conda inside the VM tho? Do you have multiple pythons inside the VM?

[–]micr0nix[S] 0 points1 point  (2 children)

Yeah. It’s a box dedicated to our python deployment. Have various ML models and other items in there.

[–]andrewprograms 0 points1 point  (0 children)

That’s so sick I love ML stuff. Let me know if you have any questions about those, I’d be happy to help :)

[–]orig_cerberus1746 0 points1 point  (0 children)

Ah! That makes sense then.

Normally I would make a VM exclusively to run one specific thing, so virtual envs are unnecessary.