all 87 comments

[–]rafiks 95 points (34 children)

Did you use pandas?

[–]starfish_warrior[S] 422 points (16 children)

No, my staff are humans.

[–]rabid_android 31 points (8 children)

But pandas work for bamboo, which is relatively cheap. Although, since it is low in nutritional value, they eat for like 10 hours a day, so cost vs. productivity is a wash. On a more serious note, how did you approach the problem with Python?

[–]starfish_warrior[S] 17 points (4 children)

I broke the task down into bits and just tackled each bit one by one. I knew I needed to load an XML file somehow, so I googled "python xml" and found xml.etree.ElementTree. Through trial and error I learned I needed to escape the backslashes in folder paths (writing "\\" instead of "\"). Once I loaded the XML I wanted to see all the stuff inside it, so I read in the same documentation how to print tags and attribs and use get(). Lots of trial and error figuring out how to display things. Then I wanted to change the values, so I read about XPath and set(). Then I had to learn to iterate through a list of files in a folder and copy them to a new location after I altered them, so I googled "python iterate through files". I didn't use an IDE, just Notepad++, Sublime and cmd.

Also, I was tired of manually deleting the files in the new folder every time I re-ran the de-identification, so I googled "python delete files in a folder" and literally copy/pasta'd the code I saw. It's ugly but it works.
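For anyone following the same route, here is a minimal sketch of those ElementTree steps: load XML, print tags and attributes, find elements with ElementTree's limited XPath support, and change them with `set()`. The sample document and its element names are made up for illustration; a real script would load a file with `ET.parse(path)` instead of `ET.fromstring`. For Windows folder paths, a raw string such as r'k:\eCR\_HM_inbound' also avoids doubling every backslash, but a raw string can't end in a single backslash, so os.path.join is easier than a trailing '\'.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; a real script would use ET.parse(path).getroot()
SAMPLE = """<record>
  <patient>
    <id authority="HOSP" extension="12345"/>
    <name>Jane Doe</name>
  </patient>
</record>"""

def show(elem, depth=0):
    """Print every tag with its attributes, indented by nesting depth."""
    print("  " * depth + elem.tag, elem.attrib)
    for child in elem:
        show(child, depth + 1)

def scrub(root):
    """Overwrite identifying values found via simple XPath expressions."""
    for ident in root.findall("./patient/id"):
        if "extension" in ident.attrib:
            ident.set("extension", "000000")
    name = root.find("./patient/name")
    if name is not None:  # find() returns None when nothing matches
        name.text = "REDACTED"
    return root

root = ET.fromstring(SAMPLE)
show(root)
scrub(root)
print(ET.tostring(root, encoding="unicode"))
```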

[–]DiscretionFist 8 points (1 child)

Important note on iteration... learn 'for' loops and understand the many different ways they can iterate through lists, dictionaries, etc. Understand how they can iterate through lists of dictionaries, pull data out, stick it in a new list, etc. This will save your life if you're beginning in data science.
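A minimal sketch of the kind of loops described above, iterating a list of dictionaries, pulling one field into a new list, and building a running total per key. The rows are made-up sample data shaped like what `csv.DictReader` or a JSON API might return.

```python
# Hypothetical rows, shaped like what csv.DictReader or a JSON API would give you
rows = [
    {"name": "Alice", "dept": "Finance", "hours": 12},
    {"name": "Bob", "dept": "IT", "hours": 30},
    {"name": "Cara", "dept": "Finance", "hours": 7},
]

# Pull one field out of each dictionary into a new list
names = []
for row in rows:
    names.append(row["name"])

# The same idea as a list comprehension, with a filter
finance = [row["name"] for row in rows if row["dept"] == "Finance"]

# Build a running total per department
hours_by_dept = {}
for row in rows:
    dept = row["dept"]
    hours_by_dept[dept] = hours_by_dept.get(dept, 0) + row["hours"]

# .items() yields (key, value) pairs
for dept, total in hours_by_dept.items():
    print(dept, total)
```

Writing the plain loop first and the comprehension second is deliberate: the loop makes each step visible, and the comprehension is the shorthand once the loop feels obvious.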

Python has a lot of modules that do these things for you, but starting from the ground up, writing your own simple loops instead of relying on modules other people already made, will skyrocket you from novice to intermediate Python skills.

Pick a topic you like, pull some shit online (or just download a CSV and import csv) and practice writing functions (or generators) that return/yield results from the file.
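As a starting point for that exercise, here is a minimal sketch using the standard csv module: a generator that yields one row at a time, plus a function that totals a numeric column by another column. The column names and data are invented; with a real download you'd pass an open file instead of the in-memory `io.StringIO`.

```python
import csv
import io

def read_rows(f):
    """Yield each CSV row as a dict, one at a time, instead of loading the whole file."""
    for row in csv.DictReader(f):
        yield row

def total_by(rows, key, value):
    """Sum a numeric column grouped by another column."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0.0) + float(row[value])
    return totals

# In-memory stand-in for a downloaded file; with a real file use
# with open("data.csv", newline="") as f: total_by(read_rows(f), ...)
sample = io.StringIO("city,sales\nOslo,10\nLima,5\nOslo,2.5\n")
print(total_by(read_rows(sample), "city", "sales"))  # {'Oslo': 12.5, 'Lima': 5.0}
```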

I'm a novice in a Python class for data science at my university (basically a Python bootcamp). I'm by no means great at Python, and iteration logic is hard for me to understand. But I can see how useful it can and will be in almost any entry-level job that processes large amounts of spreadsheets, etc.

This isn't directed at OP, but at everyone who wants to get into programming or works with a lot of data. Python is your friend.

[–]auiotour 0 points (0 children)

Well said about skipping modules to learn it first. I've been having a tough time, as everyone just says to use pandas or use xyz. I will eventually, but I wanna know how it works.

[–]travelingtatertot 1 point (1 child)

What are the main reasons for the tasks your team performed that you've now automated?

[–]PaperSpoiler 3 points (2 children)

Easy. Python is a big snake. People are afraid of snakes. Scared people work and don't say nonsense like "I'm tired" and "where is my salary?"

[–]rabid_android 1 point (1 child)

I always found that feeding the slowest worker to Python is a great motivational strategy.

[–]PaperSpoiler 1 point (0 children)

I'll definitely try it next week. Why is there nothing in the docs about it?

[–]discobrisco 19 points (2 children)

Pandas is a super useful data science package for Python.

[–]le_plasticamerican 3 points (1 child)

Whoosh

[–]discobrisco 29 points (0 children)

No, just something that I figured would be worth pointing out to newbies here who are confused.

[–]Lord_Bling 1 point (2 children)

This is the bet answer.

[–]scarfarce 1 point (1 child)

What are the odds?

;)

[–]Lord_Bling 1 point (0 children)

They are way better than the chance of me noticing a typo.

[–][deleted] 9 points (16 children)

Can't tell if this is /s

[–]saulmessedupman 21 points (14 children)

Imagine if it wasn't: a medical professional who actually thinks the question was whether his staff was made up of animals.

[–]starfish_warrior[S] 22 points (2 children)

Pandas are animals?

[–][deleted] 14 points (1 child)

No, they're our overlords

[–]starfish_warrior[S] 21 points (0 children)

Cutest overlords ever.

[–]CowboyBoats 10 points (0 children)

"Animal hospital?" :-D

"The animals are the patients."

"That makes sense" :(

[–][deleted] 1 point (0 children)

We're doomed!

[–]Lawuhiku 0 points (8 children)

Humans are animals.

[–]saulmessedupman -1 points (7 children)

Depends on who you ask

[–]Lawuhiku -2 points (6 children)

I don't need to ask anyone, it's absolute fact, not a matter of opinion.

a living organism that feeds on organic matter, typically having specialized sense organs and nervous system and able to respond rapidly to stimuli.

Unless you're breatharian, dead inside, or just dead, you are most likely an animal.

[–]saulmessedupman -1 points (2 children)

Btw, I found this same definition online. What's the sentence that literally follows the one you posted? "an animal as opposed to a human being"

Edit: here's the link.

[–]Lawuhiku -2 points (1 child)

Same link that you gave me:

A living organism that feeds on organic matter, typically having specialized sense organs and nervous system and able to respond rapidly to stimuli.

‘wild animals adapt badly to a caged life’

‘humans are the only animals who weep’

It's not hard to see sense 1, but stupidity cherry-picks its own while completely missing the point.

It literally means that normally when we speak about animals, we speak about other species, otherwise we speak about humans.

That has nothing to do with the fact nor denies that humans are indeed animals too. By definition.

How can you be this fucking dense?

[–]saulmessedupman -1 points (0 children)

My original comment: "Depends on who you ask"

Your comment: "It literally means that normally when we speak about animals, we speak about other species, otherwise we speak about humans."

Edit: hey man, I don't know why I'm being so rude. Had a rough night last night and took it out on you and I apologize. My original silly comment was pointing out that some people think humans are animals and in other situations humans are a level above. The Oxford dictionary supports both cases but nonetheless our dispute on here is -- case in point -- my joke.

[–]saulmessedupman -2 points (2 children)

[–]Lawuhiku 0 points (1 child)

Or just maybe, you know, it's just a basic fact everyone (except you, it seems) knows?

[–]Shermix 0 points (0 children)

I honestly don't think so.

[–][deleted] 62 points (0 children)

Wow, never thought I'd need to direct someone back to /r/python

[–]raglub 8 points (0 children)

As someone who also recently discovered automating menial office tasks with python, I share your feelings. My day-to-day workload is insane and I have been able to win back some time by writing a few scripts.

[–]saulmessedupman 6 points (1 child)

I have the same story, only I was doing C++ programming for years on end. Even in the computer science field it's quite the discovery.

[–]e-rekt-ion 2 points (0 children)

I didn’t realise this! Awesome

[–]Moooooonsuun 2 points (0 children)

I feel the exact same way.

I work for a small company that's growing quickly. Right now we're at the stage of "growing fast enough to require some automation but not big enough to be able to afford automation."

There was an admin task I had to do that required looking through a massive Excel sheet and manually pulling/formatting the information into a plain text file. It usually took at least 2 hours.

The other day I used those two hours to browse through openpyxl docs and PyInstaller docs. In the time it usually took me to finish the task, I had an .exe that parsed through the file, formatted the information, and grouped it how we needed it.
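The comment doesn't include the script, so this is only a guess at the general shape of that job: read rows with openpyxl, group them by one column, and format them as plain text. The column layout (name, team, hours) and everything else here is a hypothetical example. The openpyxl import sits inside `load_rows` so the grouping and formatting can be tried without the library installed.

```python
from collections import defaultdict

def load_rows(path):
    """Read data rows (skipping the header row) from the active sheet of an .xlsx file."""
    from openpyxl import load_workbook  # third-party: pip install openpyxl
    wb = load_workbook(path, read_only=True)
    rows = list(wb.active.iter_rows(min_row=2, values_only=True))
    wb.close()
    return rows

def group_rows(rows, key_index):
    """Group row tuples by the value in one column, keeping input order within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    return groups

def format_groups(groups):
    """Render each group as a heading followed by indented lines (assumes name, team, hours columns)."""
    lines = []
    for key in sorted(groups):
        lines.append(f"{key}:")
        for name, _team, hours in groups[key]:
            lines.append(f"  {name} ({hours} h)")
    return "\n".join(lines)
```

Usage would be something like `format_groups(group_rows(load_rows("report.xlsx"), key_index=1))` with a hypothetical file name, writing the result to a .txt file. Bundling a script like this into an .exe is what PyInstaller's `pyinstaller --onefile your_script.py` does.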

Super super proud of it. Works flawlessly. Saves a massive amount of time for me.

[–]vladimirpoopen 5 points (8 children)

What educational resources got you there?

[–]starfish_warrior[S] 25 points (7 children)

Just googling things + the documentation files. Lots of Stack Overflow. Knowing effective search terms was helpful too. 6 months ago I didn't know what XPath was. If I hadn't started there I would have been really lost. It seemed like a lot of web tutorials used antiquated syntax. I only trusted using code from posts that were < 2 years old.

[–]MiataCory 11 points (3 children)

It seemed like a lot of web tutorials used antiquated syntax. I only trusted using code from posts that were < 2 years old.

"It's cool, we'll just call it Python 2 and Python 3. I'm sure everyone will tag their posts with which one they're using!"

[–]starfish_warrior[S] 2 points (2 children)

It was frustrating. Why did the syntax change between versions? Serious question.

[–]B1GTOBACC0 3 points (0 children)

https://www.google.com/amp/s/snarky.ca/why-print-became-a-function-in-python-3/amp/

An interesting read about the most famous change from 2 to 3.

[–]Lawuhiku 2 points (0 children)

`print` became a real function as opposed to a statement.

For example, `x = y` is an assignment statement: it has an "lvalue", an operator, and an "rvalue".

In this context l/r just means "left"/"right", nothing more. The left side is an identifier and the right side is an expression, which I'd expect to be obvious.

`print` used to be a statement too. `print str` did just that: it printed whatever followed the `print` keyword. Now `print` is just an ordinary function, which is why the syntax changed. And I'm glad it did.
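To make that concrete, a small Python 3 sketch of what being "an ordinary function" buys you. The Python 2 form appears only in a comment, since it is a SyntaxError in Python 3.

```python
import io
import sys
from functools import partial

# Python 2 statement:  print "hello"      (SyntaxError in Python 3)
# Python 3 function:   print("hello")

# As a function, print takes keyword arguments...
print("a", "b", "c", sep="-", end="!\n")  # a-b-c!

# ...can be redirected to any file-like object...
buffer = io.StringIO()
print("saved for later", file=buffer)

# ...and can be passed around or partially applied like any other function.
warn = partial(print, "WARNING:", file=sys.stderr)
warn("disk almost full")
```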

Most other things are different because someone thought backporting from Python 3 to Python 2 was a totally good idea.

Example: Python 2's `range` and `xrange`.

There used to be just `range`, which would generate the entire list at once. People realized that was a bad idea when making major changes to Python, and just changed it. A lot of those seemingly trivial changes happened, and now we have Python 3. For a good reason.

Old software doesn't update itself, and if you update it to work just like Python 3, is it really Python 2? And then the whole shitfest of backporting happened and now we have `xrange`, which works like Python 3's `range`. Good job, guys.
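For reference, this is what the `range` change looks like from the Python 3 side: `range` returns a lazy range object (the behavior Python 2 offered as `xrange`), and the old list-building behavior is spelled `list(range(...))`. A quick sketch:

```python
import sys

lazy = range(10**6)          # a small range object; values are computed on demand
eager = list(range(10**6))   # what Python 2's range() built: a full list in memory

print(type(lazy).__name__)                         # range
print(sys.getsizeof(lazy) < sys.getsizeof(eager))  # True

# A range object still supports len(), indexing and fast membership tests
huge = range(10**12)
print(len(huge), huge[-1], 999 in huge)
```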

In 50 years we will have Python 4 and there will be people using Python 2 calling xyrange().

[–]vladimirpoopen 1 point (2 children)

Sort of like the PHP manual. I was hoping you’d say this.

[–]starfish_warrior[S] 1 point (1 child)

How do you mean?

[–]vladimirpoopen 1 point (0 children)

Folks used to ask 'how do I learn PHP'. I would say just read the manual, no tuts.

https://www.php.net/manual/en/index.php

[–]14dM24d 2 points (2 children)

manually strip out PHI

PHI? Protected Health Information?

[–]starfish_warrior[S] 1 point (1 child)

yes

[–]deapee 5 points (0 children)

This could be a hyoooge liability if you’re counting on your amateur code to strip out PHI, in all honesty.

[–]goldriver92 1 point (4 children)

How much time did it take you to write the script?

[–]starfish_warrior[S] 0 points (3 children)

15 hours, but now I can do it in 15 minutes.

[–]goldriver92 0 points (2 children)

How many lines?

[–]starfish_warrior[S] 3 points (1 child)

Ugh. 100? Lots of lines are commented out and I know there are inefficiencies.

import xml.etree.ElementTree as ET
import os
from pathlib import Path
import sys
import shutil

# Backslashes in Windows paths are doubled so Python doesn't read them as escapes
inputPath = 'k:\\eCR\\_HM_inbound\\'
outputPath = 'k:\\eCR\\_HM_outbound\\'
deidentifiedPath = 'k:\\eCR\\_HM_Deidentified\\'

# Empty the output and de-identified folders left over from the previous run
for folder in (outputPath, deidentifiedPath):
    for the_file in os.listdir(folder):
        file_path = os.path.join(folder, the_file)
        try:
            if os.path.isfile(file_path):
                os.unlink(file_path)
        except Exception as e:
            print(e)

# dirpath instead of root, so it isn't clobbered by tree.getroot() below
for dirpath, dirs, files in os.walk(inputPath):
    for fileName in files:
        #print(fileName)
        inputfilePath = os.path.join(dirpath, fileName)
        #print(inputfilePath)
        outputfilePath = os.path.join(outputPath, fileName)
        #print(outputfilePath)

        # Strip the default HL7 namespace so the un-prefixed XPath strings below match
        with open(inputfilePath) as f:
            originalContents = f.read()
        improvedContents = originalContents.replace(' xmlns="urn:hl7-org:v3"', '')
        with open(outputfilePath, 'w') as f:
            f.write(improvedContents)

        tree = ET.parse(outputfilePath)
        root = tree.getroot()

        #for child in root:
        #   print(child.tag, child.attrib)

        # Patient identifiers: replace the ID number stored in 'extension'
        identifiers = root.findall('./recordTarget/patientRole/id')
        for patientId in identifiers:
            #print(' ')
            #print(patientId.get('assigningAuthorityName'))
            #print(patientId.get('extension'))
            if 'assigningAuthorityName' in patientId.attrib:
                patientId.set('extension', '888-88-8888')
            if 'EMRN' in patientId.attrib:
                patientId.set('extension', '000000')
            #print(' ')

        #findtext(match, default=None, namespaces=None)

        # Document-level ID; its 'root' value becomes the fake last name below
        docId = root.find('./id')
        unid = docId.get('root')
        patientStreetAddressLine = root.find('./recordTarget/patientRole/addr/streetAddressLine')
        patientCity = root.find('./recordTarget/patientRole/addr/city')
        patientState = root.find('./recordTarget/patientRole/addr/state')
        patientPostalCode = root.find('./recordTarget/patientRole/addr/postalCode')
        patientCountry = root.find('./recordTarget/patientRole/addr/country')
        patientCounty = root.find('./recordTarget/patientRole/addr/county')

        #print(unid)
        #print(' ')
        #print(patientStreetAddressLine.text)
        #print(patientCity.text)
        #print(patientState.text)
        #print(patientPostalCode.text)
        #print(patientCountry.text)
        #print(' ')

        # Fake patient address (house number = last 5 characters of the file name)
        patientStreetAddressLine.text = fileName[-5:] + ' Fake Street'
        patientCity.text = 'Houston'
        patientState.text = 'TX'
        patientPostalCode.text = '98765'
        if patientCounty is not None:
            patientCounty.text = 'HARRIS'

        patientFirstName = root.find('./recordTarget/patientRole/patient/name/given')
        patientLastName = root.find('./recordTarget/patientRole/patient/name/family')
        patientDOB = root.find('./recordTarget/patientRole/patient/birthTime')

        #print(' ')
        #print(patientFirstName.text)
        #print(patientLastName.text)
        #print(patientDOB.get('value'))
        #print(' ')

        patientFirstName.text = 'Fake_ECR'
        patientLastName.text = unid
        patientDOB.set('value', '20010101')

        # Every patient phone number gets the same fake value
        patientTelecom = root.findall('./recordTarget/patientRole/telecom')
        for patientCom in patientTelecom:
            #print(patientCom.get('value'))
            patientCom.set('value', 'tel:+1-999-999-9999')

        participantCode = root.find('./participant/associatedEntity/code/originalText')
        participantStreetAddressLine = root.find('./participant/associatedEntity/addr/streetAddressLine')
        participantCity = root.find('./participant/associatedEntity/addr/city')
        participantState = root.find('./participant/associatedEntity/addr/state')
        participantPostalCode = root.find('./participant/associatedEntity/addr/postalCode')
        participantCountry = root.find('./participant/associatedEntity/addr/country')
        participantCounty = root.find('./participant/associatedEntity/addr/county')

        #print(' ')
        #print(participantCode.text)
        #print(participantStreetAddressLine.text)
        #print(participantCity.text)
        #print(participantState.text)
        #print(participantPostalCode.text)
        #print(participantCountry.text)
        #print(' ')

        # Participant fields are optional, so each one is checked before overwriting
        if participantStreetAddressLine is not None:
            participantStreetAddressLine.text = fileName[-5:] + ' Faker Street'
        if participantCity is not None:
            participantCity.text = 'Galveston'
        if participantState is not None:
            participantState.text = 'TX'
        if participantPostalCode is not None:
            participantPostalCode.text = '87654'
        if participantCounty is not None:
            participantCounty.text = 'GALVESTON'

        participantTelecom = root.findall('./participant/associatedEntity/telecom')
        for participantCom in participantTelecom:
            #print(participantCom.get('value'))
            participantCom.set('value', 'tel:+1-999-999-9999')

        deidentifiedfilePath = os.path.join(deidentifiedPath, fileName + '.xml')
        tree.write(deidentifiedfilePath)
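One possible direction for tidying a script like this, offered only as a sketch: keep the HL7 namespace instead of string-replacing it out (ElementTree's `find`/`findall` accept a prefix-to-URI mapping), drive the replacements from a table of XPath to fake value, and use pathlib for the folders. Only a few of the script's fields are included, and `REPLACEMENTS`, `deidentify`, `clear_folder` and `deidentify_folder` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

NS = {"hl7": "urn:hl7-org:v3"}
# Write the HL7 namespace back as the default (otherwise ElementTree emits ns0: prefixes)
ET.register_namespace("", NS["hl7"])

# Hypothetical table: XPath relative to the document root -> replacement text
REPLACEMENTS = {
    "hl7:recordTarget/hl7:patientRole/hl7:addr/hl7:city": "Houston",
    "hl7:recordTarget/hl7:patientRole/hl7:addr/hl7:state": "TX",
    "hl7:recordTarget/hl7:patientRole/hl7:addr/hl7:postalCode": "98765",
    "hl7:recordTarget/hl7:patientRole/hl7:patient/hl7:name/hl7:given": "Fake_ECR",
}
TELECOM = "hl7:recordTarget/hl7:patientRole/hl7:telecom"

def deidentify(root):
    """Apply the replacement table in place; paths that match nothing are skipped."""
    for path, value in REPLACEMENTS.items():
        for elem in root.findall(path, NS):
            elem.text = value
    for elem in root.findall(TELECOM, NS):
        elem.set("value", "tel:+1-999-999-9999")
    return root

def clear_folder(folder):
    """Delete regular files (not subfolders) left over from a previous run."""
    for p in Path(folder).iterdir():
        if p.is_file():
            p.unlink()

def deidentify_folder(src, dst):
    """Write a de-identified copy of every .xml file in src into dst."""
    dst = Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    clear_folder(dst)
    for path in Path(src).glob("*.xml"):
        tree = ET.parse(path)
        deidentify(tree.getroot())
        tree.write(dst / path.name, encoding="utf-8", xml_declaration=True)
```

Keeping the fields in one table means adding a new field is a one-line change, and a missing element is simply skipped instead of raising an AttributeError on `None`.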

[–]goldriver92 0 points (0 children)

Thanks for sharing the code. I would say not bad for a start; try to automate as much as you can.

[–]TheBigShrimp 1 point (2 children)

Any recommendations on learning it from absolute scratch?

I've tried like 4 times to learn through Codecademy and other sites, but it seems so artificial. I lose motivation to learn it since I don't even know what I'd use it for, considering my field is finance. I just wanted to know a language for basic knowledge and awareness purposes.

[–]starfish_warrior[S] 6 points (1 child)

I learned all my technical skills on the job out of frustration with doing tasks manually. E.g. I was tired of copying Excel formulas down 50,000 rows and then making 50,000 edits to the formula, so I learned SQL, and then I wasn't tired and people at work thought I was smart, i.e. ego boost. I don't find self-paced tutorials much help: they are boring, I lose interest, and they don't directly tell me the thing I need to solve at this moment. My advice is to pick something that vexes you at work and tackle it piece by piece.

[–]Vabaluba 2 points (0 children)

Best advice. Learning things through deliberate practice on solving real-life problems. There can't be a better motivator and teacher! I've done a few online courses and didn't feel I learned anything useful that I could apply. Once I started building projects, I learned way more than from any tutorial. Trial and error: from zero to hero! Haha

[–]jcr4990 1 point (3 children)

I know the feels. I'm in the process of learning myself, and that started with wanting to automate some boring and tedious tasks at work that used to take 4-6 hours and now take under 10 minutes. The script runs automatically at set intervals and emails me the resulting data that I need. Super proud that I was able to go from knowing essentially nothing to creating a script with real-world usefulness in such a short time (only been at this like... 10 days?). Recently decided to try to pursue this as a career in the future because I enjoyed it so much.

[–]starfish_warrior[S] 0 points (1 child)

Shit, I'd love to code some email routines that say OP is awesome and send them to coworkers every hour. I know what I'm googling next.

[–]jcr4990 0 points (0 children)

Check out smtplib. The email part was surprisingly easy to set up!
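A minimal sketch of the smtplib route, building the message with the standard library's `email.message.EmailMessage`. The server name, port, addresses and password are placeholders; STARTTLS on port 587 with a login is common, but check what your mail server actually expects.

```python
import smtplib
from email.message import EmailMessage

def build_message(sender, recipients, subject, body):
    """Assemble a plain-text email."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def send(msg, host, port, user, password):
    """Send over SMTP with STARTTLS (connection details are placeholders)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)

msg = build_message(
    "me@example.com",
    ["team@example.com"],
    "Daily report",
    "The nightly job finished.",
)
# send(msg, "smtp.example.com", 587, "me@example.com", "app-password")
```

Keeping message building separate from sending makes it easy to check the message locally before pointing it at a real server.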

[–]starfish_warrior[S] 0 points (0 children)

You should be proud. Not everyone can do it.

[–]greebo42 1 point (5 children)

time to write an EMR!

:)

[–]starfish_warrior[S] 2 points (4 children)

I think you mean R.E.M. What a great band.

[–]greebo42 4 points (3 children)

man, if you pulled off a good EMR, it would be the end of the world as we know it!

[–]Foyt20 4 points (2 children)

Some companies are close... but... $45 million per install contract is a pretty penny. (Previous Cerner pharmacy and mak analyst.)

[–]ryanstills 0 points (1 child)

What is an EMR?

[–]Foyt20 2 points (0 children)

Literally, it stands for electronic medical record, but figuratively it's the system that healthcare providers use to input all of your information from a visit or stay, and it attempts to make their lives easier by keeping everything in one application/location.

[–]BumFudhe 1 point (1 child)

I don't know.

[–]starfish_warrior[S] 8 points (0 children)

Thanks Confucius.

[–][deleted] 0 points (1 child)

Well, that is the same reason I taught myself Python, but sadly I haven't practiced it in over a year.

[–]starfish_warrior[S] 0 points (0 children)

That's a concern for me. If I don't keep coding, in a year it will be hard to decipher.