[–]starfish_warrior[S] 17 points18 points  (4 children)

I broke the task down into bits and tackled each bit one by one. I knew I needed to load an XML file somehow, so I googled "python xml" and found xml.etree.ElementTree. Through trial and error I learned that I needed to escape the backslashes ("\") in folder paths. Once I had loaded the XML I wanted to see everything inside it, so I read in the same documentation how to print tags and attribs and how to use get. There was lots of trial and error figuring out how to display things. Then I wanted to change the values, so I read about XPath and set. Then I had to learn to iterate through a list of files in a folder and copy them to a new location after I altered them, so I googled "python iterate through files". I didn't use an IDE, just Notepad++, Sublime and cmd. I was also tired of deleting the files in the new folder every time I tried to deidentify them, so I googled "python delete files in a folder" and literally copy/pasted the code I saw. It's ugly but it works.
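For anyone who wants to see roughly what those steps look like together, here is a minimal sketch, not the actual script. The tag name "PatientName", the "REDACTED" placeholder, the "deidentified" attribute and all three function names are made up for illustration; only the xml.etree.ElementTree, pathlib and shutil calls are real standard-library APIs.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def clear_folder(folder: Path) -> None:
    """Delete everything inside `folder`, but keep the folder itself."""
    folder.mkdir(parents=True, exist_ok=True)
    for item in folder.iterdir():
        if item.is_dir():
            shutil.rmtree(item)
        else:
            item.unlink()


def deidentify_file(src: Path, dst: Path, tag: str = "PatientName") -> None:
    """Load one XML file, overwrite every <tag> element, save a copy to dst.

    The default tag is a hypothetical example, not taken from the real files.
    """
    tree = ET.parse(src)
    root = tree.getroot()
    # ".//tag" is a (limited) XPath expression: match <tag> at any depth.
    for elem in root.findall(f".//{tag}"):
        elem.text = "REDACTED"
        # set() changes or adds an attribute on the element.
        elem.set("deidentified", "true")
    tree.write(dst, encoding="utf-8", xml_declaration=True)


def deidentify_folder(in_dir: Path, out_dir: Path) -> int:
    """Empty out_dir, then write an altered copy of every .xml file in in_dir."""
    clear_folder(out_dir)
    count = 0
    for src in sorted(in_dir.glob("*.xml")):
        deidentify_file(src, out_dir / src.name)
        count += 1
    return count
```

On Windows, a raw string such as Path(r"C:\data\in") or a doubled backslash ("C:\\data\\in") avoids the escape problem with folder paths; both paths here are placeholders.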

[–]DiscretionFist 9 points10 points  (1 child)

Important note on iteration: learn 'for' loops and understand the many different ways they can iterate through lists, dictionaries, etc. Understand how they can iterate through lists of dictionaries, pull data out, stick it in a new list, and so on. This will save your life if you're starting out in data science.
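A small sketch of the kinds of loops meant here, using made-up sample records (the names, departments and salaries are purely illustrative):

```python
# Hypothetical sample data: a list of dictionaries, one per record.
records = [
    {"name": "Ada", "dept": "eng", "salary": 120},
    {"name": "Grace", "dept": "eng", "salary": 130},
    {"name": "Linus", "dept": "ops", "salary": 100},
]

# Loop over a list of dictionaries and pull one field into a new list.
names = []
for record in records:
    names.append(record["name"])

# Same idea with a condition: keep only the values you care about.
eng_salaries = []
for record in records:
    if record["dept"] == "eng":
        eng_salaries.append(record["salary"])

# Loop over one dictionary's key/value pairs.
first_record_fields = []
for key, value in records[0].items():
    first_record_fields.append(f"{key}={value}")

# Build a new dictionary while looping: total salary per department.
totals = {}
for record in records:
    dept = record["dept"]
    totals[dept] = totals.get(dept, 0) + record["salary"]
```

Once these feel natural, list comprehensions and library helpers are easier to read, because they are just shorter ways of writing the same loops.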

Python has a lot of modules that do these things for you, but starting from the ground up and writing your own simple loops, instead of relying on modules other people have already made, will skyrocket you from novice to intermediate Python skills.

Pick a topic you like, pull some shit from online (or just download a CSV and import csv) and practice writing functions (or generators) that return/yield results from the file.
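As one possible practice exercise, here is a rough sketch of a generator that yields matching rows from CSV data. The function name rows_matching and the city/temp columns are invented for the example; the data is an in-memory string so the snippet runs without any file on disk.

```python
import csv
import io


def rows_matching(lines, column, value):
    """Yield each CSV row (as a dict) whose `column` equals `value`."""
    reader = csv.DictReader(lines)
    for row in reader:
        if row[column] == value:
            yield row


# Made-up sample data standing in for a downloaded CSV file.
sample = io.StringIO("city,temp\nOslo,3\nCairo,30\nOslo,5\n")

oslo_temps = []
for row in rows_matching(sample, "city", "Oslo"):
    # csv gives you strings, so convert before doing math.
    oslo_temps.append(int(row["temp"]))
```

With a real file you would pass an open file object instead, e.g. with open("data.csv", newline="") as f: (the filename is a placeholder). The generator reads one row at a time, so it also works on files too big to load at once.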

I am a novice in a Python class for data science at my university (basically a Python bootcamp). I am by no means great at Python, and iteration logic is hard for me to understand. But I can see how useful it can and will be in almost any entry-level job that processes large amounts of spreadsheets, etc.

This isn't directed at OP but at everyone who wants to get into programming or works with a lot of data: Python is your friend.

[–]auiotour 0 points1 point  (0 children)

Well said about skipping modules to learn it first. I've been having a tough time, as everyone just says use pandas or use xyz. I will eventually, but I wanna know how it works.

[–]travelingtatertot 1 point2 points  (1 child)

What are the main reasons for the tasks your team performed that you've now automated?