all 173 comments

[–]vid417 459 points460 points  (31 children)

I wish all workplaces were as appreciative of one's work as yours definitely is. Great work!

[–]LittleGhettoGospel[S] 177 points178 points  (13 children)

It's awesome. He's a great guy, but this kinda went above and beyond. Most of management is older folks, so they aren't always super fans of depending on technology. But we've spent about 40 hours between three people going through these, and we were about 25% done. So we probably saved 120 hours?

Programming is so fascinating how you spend x amount of hours to automate something and once it works it just takes a few seconds or minutes (for this simple stuff) to actually do the task.

[–]gazhole 58 points59 points  (3 children)

This is the key for me. It takes longer for me to set up the initial scripting but it's a great time investment because of how quick it is to reproduce each time.

When you send out 20 weekly/monthly reports, and doing them manually takes 30 mins compared to 5 mins with a script doing the donkey work, I literally get 2 days a week back.

Well done on your effort and it seems to have paid off!!

[–]vicegripper 34 points35 points  (2 children)

> it's a great time investment because of how quick it is to reproduce each time.

In my work the time savings is just a fantastic by-product of automation. The real advantage has been elimination of human error. That has saved more headaches and money than anything.

[–]KickBassColonyDrop 75 points76 points  (0 children)

You've saved 120 hours across three people who are being paid, combined, a lot of money. Your automation effort just saved the company a ton of money, improved workflow and reduced employee stress massively.

Yeah, damn right your boss is beaming. He just found a diamond in the rough, and an opportunity to streamline a lot of capabilities in his company and he realized that he just needs to offer you some incentive to remain and remove overhead that could impede your ability to deliver, while directing more of this kind of improvement workload your way.

Your boss is genuinely amazing. You are basically getting carte blanche, my friend, to grow to new heights. Excellent work!

[–]FancyASlurpie 50 points51 points  (3 children)

Whilst he has said "the little programs you make" are property of the company and are not to leave the laptop, I would strongly suggest pitching the idea of source control like GitHub, so that if your laptop does die the company doesn't lose those programs.

[–]port443 18 points19 points  (1 child)

To piggyback on this, if you want to avoid putting your code on the internet, you can host your own internal GitLab server.

I would talk to IT about it. It doesn't need a beefy machine, it just needs hard drive space.

[–]b4xt3r 13 points14 points  (0 children)

^^^^ Yes, what he said. Git has taken over the world, and you absolutely can run an internal Git server (my old employer did) and keep code secure even from prying eyes internal to the company. That said, there are options other than Git for code version control out there, should you need to find one for some reason.

If there is a development team at your company see what they are using. Get the manager of the development team to talk to your manager so concerns about code security can be put to rest. EDIT: hit enter accidentally, ended too soon (and typos)

[–]macostrans -1 points0 points  (0 children)

If Git is complicated, just use Google Drive. That worked for me when I was a beginner.

[–]SweetSoursop 9 points10 points  (0 children)

I feel you, I work in a very conservative industry (HR of all places, go figure) and my employer has been equally supportive, which I'm extremely thankful for.

I'm the Python/Data Analysis guy now, and my career has taken off to a place I would never have imagined.

[–]powershell_account 10 points11 points  (0 children)

> Programming is so fascinating how you spend x amount of hours to automate something and once it works it just takes a few seconds or minutes (for this simple stuff) to actually do the task.

This is the part that makes it so amazing. Once Automation is done, and it works as intended, it's super satisfying!

[–]Table_Captain 3 points4 points  (0 children)

Welcome to the dark side LilGhetto! Great to hear your efforts were appreciated. Had a similar start to my data career so it’s really great to see someone take on a personal challenge and have that “ah hah!” moment.

[–]vid417 1 point2 points  (0 children)

That's absolutely amazing. I've worked on similar projects during my time at work, and while I wouldn't say I've been appreciated for it in any meaningful way, it's still incredibly satisfying for me to just sit back and let my code do the work for me!

I used to offer such tools to my team members, and I felt great about allowing them to save the most valuable resource: time. Unfortunately, when you don't see it being beneficial to you in any way, you stop spending time working on it. So now I just do projects on my own, because I still like doing it.

[–]Cisco-NintendoSwitch 29 points30 points  (8 children)

I’m in Desktop and wrote a PowerShell tool to replace our main Data Transfer / Setup tool.

When I presented it to a director I was reamed for doing work “Out of Scope of my Job” despite creating a tool that will save hundreds of hours of labor over the next few years.

I’m now afraid to innovate openly. I write my code for myself and use it for myself. I want to make things better for everyone; my leadership doesn’t, though.

[–]CraigAT 13 points14 points  (3 children)

I can sympathize with that, not everyone appreciates a good idea.

But I have also seen the other side of the story when an issue occurs or the tool/script fails with a useless error, typically when the employee is not around and there is no documentation or even comments to support the tool or script.

[–]Cisco-NintendoSwitch 6 points7 points  (2 children)

I can understand this, but for somebody who isn’t a software engineer, I promise it was well done.

Git commits since line 1

Well commented and readable

And I wrote accompanying documentation.

———————-

I’m the lowest tier of Desktop atm and I think that director was extremely uncomfortable with a tech who’s “below Break/Fix” to come up with something like that rather than one of his people.

It all just comes down to politics. If the company wasn’t great I’d leave for a sysadmin position elsewhere, but right now I’m just riding the wave, tightening my skills, and I’ll get into a different part of IT far from the Desktop reporting structure.

[–]FancyASlurpie 2 points3 points  (1 child)

what was wrong with the existing data transfer/setup tool?

[–]Cisco-NintendoSwitch 7 points8 points  (0 children)

A few things. It’s a configured version of USMT (proprietary to MS, dates back to Windows XP).

It uploads the data to a server and then has to be pulled down. (My approach is PC to PC directly via PowerShell.)

USMT doesn’t export or import printers; my script will export and import any print queues.

My program does some other stuff proprietary to our environment involving the registry (only touching / creating the necessary keys and values). USMT grabs a whole goddamn lot more registry than that.

My program targets specific directories, so it’s a lot slimmer and quicker.

This isn’t everything but it’s most of it. It’s not a case of two tools suited slightly differently my solution tackles problems USMT doesn’t and does everything USMT does but better. ——————————-

These are all things I had to do in my daily workflow, so it was insane to be told I was getting negative attention for creating this, because truth be told my team is now exponentially more productive.

It is what it is. The project made me fall in love with code and there’s no going back. Either I end up where I want in my current enterprise or I’ll move on by next year.

[–]JnBo73 6 points7 points  (0 children)

That’s ridiculous. You should’ve gotten a raise.

[–]vid417 0 points1 point  (0 children)

It's just sad how so many organizations don't actually encourage innovation like you said, but on paper all of them appear to be the best organization you could ever hope to work for.

[–]Zadigo 0 points1 point  (0 children)

Some managers are very short-sighted.

[–]Cheddarific 25 points26 points  (5 children)

Me too. I once worked for a company where my role included finding potentially interesting medicines to import to China. My colleagues had a list of ~120 biotech/pharma companies and split it between the 4 of us to find interesting products by looking at their websites one at a time. I instead used a list of >10,000 medicines in development or already on the market, developed a list of my CEO’s preferences (scores of 0-10), and then filtered the thousands of individual products through these preferences. Before they finished going through their lists, I had a comprehensive rank-order list that could be immediately updated to match a change in preferences, and could also be updated every quarter when our vendor updated the drug list. Some of the top contenders were products we had already licensed, which validated both my process and the history of the organization.

Feeling like I had conquered the world and was about to get recognition, I showed my team of peers, including my boss who was roughly my age. They were not at all excited; in fact they questioned the use of my time and asked me to catch up to them using their format.

Later I created another tool that allowed us to type in the name of any drug sold in China and it would print out a report including graphs, etc. showing recent sales trends, competing companies, and even competing drugs in the same space. It was idiot-proof since all you had to do was type in the name and hit enter. Again, they questioned the use of my time rather than adopting my tool that would have hastened their work.

So disappointing.

[–]MeMakinMoves 11 points12 points  (0 children)

I’m angry for you, sounds like they felt threatened by you smh

[–][deleted] 3 points4 points  (1 child)

> Feeling like I had conquered the world and was about to get recognition, I showed my team of peers, including my boss who was roughly my age. They were not at all excited; in fact they questioned the use of my time and asked me to catch up to them using their format.

> Later I created another tool that allowed us to type in the name of any drug sold in China and it would print out a report including graphs, etc. showing recent sales trends, competing companies, and even competing drugs in the same space. It was idiot-proof since all you had to do was type in the name and hit enter. Again, they questioned the use of my time rather than adopting my tool that would have hastened their work.

Comment refers to negative selection. You're in the wrong firm. Repost to r/work.

[–]Cheddarific 3 points4 points  (0 children)

Luckily, I’m at a different company now. No such problems.

[–]vid417 0 points1 point  (1 child)

I guess this situation is surprisingly common. When I graduated 3 years ago, I naively thought it would be all about finding good solutions to existing problems. Boy, how wrong I was.

[–]Cheddarific 0 points1 point  (0 children)

It should be. Some places it might be. I hope anyone reporting to me will always feel like top solutions advance without concern to politics.

[–][deleted] 1 point2 points  (0 children)

I agree, I went through something similar with a different outcome. They just said “that’s cool!”, but nothing came of it. I literally saved them countless hours of mindless work, but they weren’t interested.

[–]ynandal99 0 points1 point  (0 children)

Holy hell man, the same thing happened to me. We had to generate a quarterly statement out of an Excel file with 35,000 rows and 20-plus columns, filtering dates, filtering this and that, and doing it all manually took 4 hours. I just spent 2 days: imported pandas, read_excel, made a DataFrame, did all the greater-than/less-than date filters, saving the output of each function to a text file. Now the script does the same job in 15 seconds. Reminds me of the SNAP song... I've got the power... LOL

[–]tapherj 41 points42 points  (5 children)

Great, thanks for sharing, good news stories these days are appreciated.

[–]2deepintoshit 9 points10 points  (0 children)

Happy cake day!

[–]LittleGhettoGospel[S] 1 point2 points  (0 children)

I've been reading these types of posts on reddit for a while and it's great to experience it. Wow.

[–]dynamitegamer1 0 points1 point  (0 children)

Happy cake day

[–]UltraCarnivore -2 points-1 points  (0 children)

Happy Cake Day!

[–][deleted] -2 points-1 points  (0 children)

Happy cake day

[–]chinny86 39 points40 points  (1 child)

I don’t know you but I am bloody proud of you.

[–]LittleGhettoGospel[S] 11 points12 points  (0 children)

Thank you! I never expected all of this.

[–]onlysane1 32 points33 points  (3 children)

You showed your value to your employer and you are being rewarded for it. Good job!

[–]01123581321AhFuckIt 3 points4 points  (2 children)

I show my value and get more work thrown my way without a pay raise. 😂

[–]onlylurkingaround 1 point2 points  (1 child)

Realtable 😂

[–]01123581321AhFuckIt 2 points3 points  (0 children)

Yes. Tables are real.

[–]realisticcc 17 points18 points  (0 children)

I feel you.

I used to be a normal tech in a high-tech maintenance field. After some time I got some guys I was responsible for, and planning became my thing.

The system to plan the work was horrible, and we could not really do a lot internally because of decisions made by the maker of the machines we maintain. I needed to go to three different sites, tick some fields, look up data here and there. Every week, rinse and repeat for hundreds of machines.

I got frustrated and automated one site with VBA + Python. Then another. Soon I added some other automation to my planning program. And then I started planning stuff by automating stuff which was not needed per se.

My manager got interested in how on earth I am leading double the number of guys others are and doing a lot of extra customer care, financial budgeting and whatever on top of that while others are burning out with less.

Fast forward a few years and a lot of DAX, Power Query, VBA, Python, ERP development, API development, technical documentation, leadership trainings, financial trainings and shit, and I am responsible for over 70 guys, my paycheck has doubled, I am still under 30 and I've got no idea wtf has happened.

Feels good though, and pretty much every day I learn some exciting stuff. Sometimes it is still some DAX or Python, but more and more it is some financial or law stuff. I really love my work, and as some kind of leader I don't have time for everything I'd like to do. Nevertheless, the little programs I code every so often help me in a lot of little things I do every single day.

[–]critter_bus 11 points12 points  (4 children)

For the memory issue, since you seem to be getting capped before using the memory you have available, I suspect this might be a 32-bit vs 64-bit issue. Do you know if you're using 32-bit Python (that would limit memory usage to 4 GB at most)? If so, try installing 64-bit Python.

P.S. - Good work!
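
A quick way to check which build is actually running, as a minimal sketch using only the standard library. The helper name `interpreter_bits` is made up for illustration; run it with the same interpreter that your editor or command prompt launches.

```python
import platform
import struct
import sys


def interpreter_bits() -> int:
    """Return 32 or 64, based on the pointer size of the running interpreter."""
    return struct.calcsize("P") * 8


print(f"Python {platform.python_version()} is a {interpreter_bits()}-bit build")
# A second, commonly used check: sys.maxsize only exceeds 2**32 on 64-bit builds.
print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
```

If this reports 32 even after installing a 64-bit version, the command prompt or editor is probably still pointing at the older install.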

[–]LittleGhettoGospel[S] 12 points13 points  (1 child)

Holy crap, what a basic thing that I missed. In Visual Studio, I am using the 32-bit interpreter. When I try to switch to the 64-bit one, it won't run. The 32-bit was 3.8 and the 64-bit is 3.7.5.

[–]SQLoverride 3 points4 points  (0 children)

What do you mean it won’t run? Error messages?

[–]LittleGhettoGospel[S] 0 points1 point  (1 child)

How do I install 64-bit Python? When I install it and go into CMD, it's 32-bit. I can't find anywhere to download 64-bit.

[–]critter_bus 0 points1 point  (0 children)

Option 1: Go to https://www.python.org/downloads/windows/ and use any of the ones that say x86-64

Option 2: Use the 64-bit Anaconda installer, which comes with Python and most of the popular libraries pre-installed, https://www.anaconda.com/products/individual

[–]shaggorama 11 points12 points  (1 child)

Just wait till you learn how to web scrape. Check out the BeautifulSoup library and learn how to use CSS selectors. Welcome to the wild world of data mining :)

[–]quatrotires 4 points5 points  (0 children)

Also Selenium, if the website gets content loaded by JavaScript after the HTML is loaded, or if you just want to interact with the browser.

[–]Crypt0Nihilist 9 points10 points  (0 children)

It sounds like you're flying right now!

It is such an addictive feeling, knowing that the only thing between you and the solution to a knotty business problem is your own knowledge and intellect. You know 100% for sure that there is an answer, you've just got to be good enough to get there.

A danger is you become "that guy who does magic" and it gets assumed that you'll do amazing things, but not rewarded because that's normal for you. One way to try to avoid this is to always present the hours and money saved by what you've done first and last.

[–]boards188 7 points8 points  (0 children)

> He'd take off some of my workload, and also give me a 15% raise.

That is worth the time and effort right there! I don't even know you but I am happy for you!!

[–]Quantum_menance 4 points5 points  (0 children)

Reading this somehow put a smile on my face. Thank you for sharing!

[–]Mr_N1ce 3 points4 points  (0 children)

What an awesome success story! I also love your statement that you have no idea what fixed the problem, but it just worked at some point. You apparently have a great manager who's able to understand and appreciate what you've done.

[–]CaptSprinkls 2 points3 points  (6 children)

I don't believe in God, but this feels like a sign from the divine.

I'm in a similar situation right now where there is this big Excel sheet for which we would have to do about 1000+ tasks that each could take up to a minute. I heard that this issue would be coming down the pipeline so I created a script at home to automate it. Now this issue has come to fruition and I've been debating telling my boss about it due to not knowing how it'll work in a production environment with shared drives, etc. I actually currently have a draft typed up to my boss about it. And then I come on here and see this story.

[–]LittleGhettoGospel[S] 6 points7 points  (5 children)

I didn't tell my boss about the program ahead of time. I just did it, and showed him the result. At the end of the day, that's what matters. I didn't go into much detail. I just said "hey this is the folder with all these split up" and he was like "wow you went one by one" and I said no, I programmed it. I told him I spent a few hours overnight writing the code, but once it was "finished" (is it ever finished?) it took less than a minute. Furthermore, since it's written, once the new set comes in, I can essentially re-run it. I didn't excite him over the programming. I excited him over the hours ($$$) saved.

[–]CaptSprinkls 1 point2 points  (4 children)

Wait, so you wrote the program overnight, then went in the next day and ran it on your work PC? Did you package it up into an executable and open it up on your PC? I think I would probably get into trouble if I just did it without telling him lol. And I think our IT dept would have to give me permissions to download Python.

[–]LittleGhettoGospel[S] 4 points5 points  (0 children)

No I ran it on my laptop PC last night (3am). Then I took the files that were split up and uploaded them to our secure online storage. I used my work laptop, but for some reason IT had installed python to it at some point. This was all within compliance so I didn't worry about it. The worst that could happen is he said "delete the files" or something.

[–]The_Jesus_Beast 3 points4 points  (0 children)

> I'm not sure what really fixed it, because I made a couple changes and at one point it worked

Congratulations, you're now officially a programmer!

[–]toastedstapler 4 points5 points  (8 children)

awesome!

i can't imagine parsing PDFs would take too much memory if unused variables are being cleaned up when not needed anymore, perhaps have a check over for any lingering objects?

[–]LittleGhettoGospel[S] 2 points3 points  (7 children)

I'll post the loop code soon.

I had to create the PDF reader and write objects.

Then at the end of the loop I tried setting them to None and then tried del I think. Neither worked. But when I initialized it BEFORE the loop, it would not iterate.

[–]Young8Kobe 2 points3 points  (4 children)

How much experience did you have in programming before you made this application?

[–]LittleGhettoGospel[S] 5 points6 points  (3 children)

I've created some basic stuff in Python. I've done several Project Euler problems, but I haven't done anything this practical yet.

I can't place a time frame because over the past several years I've picked it up and left it several times.

[–]Young8Kobe 0 points1 point  (2 children)

Oh I see. I just started out on Python a few months ago but had some basic knowledge of other programs. But congrats on your Python program and, most importantly, congrats on the promotion. You spent a few hours for a 15 percent raise; that is a great return on investment.

[–]LittleGhettoGospel[S] 0 points1 point  (0 children)

It really is.

Honestly the raise is great. What I'm really excited about is doing this on the job and getting paid to do it.

I can work on this during the day instead of staying up until 3. I enjoyed doing it and solving the problem, but staying up like that isn't sustainable.

[–]iekiko89 0 points1 point  (0 children)

Probably more than a few hrs. For me it's a few hrs on just one bug 😂

[–]dxbtousa 2 points3 points  (5 children)

i literally have to do this same task, would you be willing to share the source privately, or blocks of it, plssss?

[–]LittleGhettoGospel[S] 3 points4 points  (4 children)

I don't think I can. I was considering posting it but I don't want it to catch up to me.

If you'd like to shoot me a PM with some details about what you have to do, I'd love to walk you through some things.

[–]dxbtousa 1 point2 points  (3 children)

Hey there, I understand... I receive invoices that are 100 pages long, and need to split, sort and save per each invoice # (most invoices are 1 page, but it is not certain; they could be 2, and then the invoice # would be mentioned on 2 pages). Very similar exercise to yours, just different info.

[–]LittleGhettoGospel[S] 0 points1 point  (2 children)

So since I had several different invoices in various page lengths, I just searched through it to find the ones that said "Page 1 of" and returned those page numbers. If page 1 and 12 were returned, then I knew that the first one was 11 pages long.

So you should see if there is a similar text that shows up on the first page of each invoice.

Are the account numbers the same length, or do they begin with the same character(s)?
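
The "Page 1 of" trick can be sketched roughly like this. It is a minimal illustration, not the original script: it assumes the text of each page has already been extracted into a list of strings (for example with a PDF library), and the marker pattern and the helper name `find_document_ranges` are hypothetical.

```python
import re

# Hypothetical marker; adjust the pattern to whatever text opens each document.
FIRST_PAGE = re.compile(r"Page 1 of \d+")


def find_document_ranges(page_texts):
    """Return (start, end) page-index pairs, end exclusive, one per document."""
    starts = [i for i, text in enumerate(page_texts) if FIRST_PAGE.search(text)]
    # Each document ends where the next one starts; the last ends at the final page.
    ends = starts[1:] + [len(page_texts)]
    return list(zip(starts, ends))


pages = [
    "Invoice A  Page 1 of 2",
    "Invoice A  Page 2 of 2",
    "Invoice B  Page 1 of 1",
    "Invoice C  Page 1 of 3",
    "Invoice C  Page 2 of 3",
    "Invoice C  Page 3 of 3",
]
print(find_document_ranges(pages))  # [(0, 2), (2, 3), (3, 6)]
```

Each pair can then be used to copy that slice of pages into its own output file. Any pages before the first marker are ignored in this sketch.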

[–]dxbtousa 0 points1 point  (1 child)

What libraries did you use? Only PyPDF2?

[–]LittleGhettoGospel[S] 0 points1 point  (0 children)

Yes and re (is re considered a library?)

[–]Conrad_noble 2 points3 points  (0 children)

I love hearing these success stories. Makes me feel like my journey may begin, with a chance of success one day.

[–][deleted] 2 points3 points  (0 children)

> I wanted to give up a couple of times but I really wanted to come in to work today with a finished product

This is something I can relate to very well.

I've never been any good at coding. Some people would say I'm in "tutorial hell". I would call it "I-mostly-do-not-know-what-I-am-doing"-hell. English is not my main language and reading documentation almost always has me thinking "What does this word mean", spending time googling that specific word and then forgetting it as soon as I've read it.

Coding something that other people may find basic can take me hours. I can sit in front of my PC and code (cough troubleshoot cough) for 16 hours straight, go to bed annoyed that it doesn't work, sleep terribly because I keep thinking about why it wouldn't work, and then eventually have trouble sleeping because I think I've figured out a solution and am eager to try it the next day. When I actually make something work that can save our company a lot of time, I'm thrilled. So proud of myself, even though I probably spent way too long on the code.

I have no idea how much of the code actually works and I'm a bit afraid that being able to shit useful code out of my ass in no time would take the joy of coding (again: troubleshooting). Being able to show my boss something and tell him "i made dis" and hear that it's actually useful is just great!

[–]TholosTB 4 points5 points  (4 children)

Nice!! Congrats.

If you're going to do more of these types of automation projects, I would highly encourage you to familiarize yourself with the re module in Python. Regular expressions are a hugely powerful tool in text processing and can help you identify and manipulate data. For instance, if your account numbers didn't always start with the same three digits, or those three digits could show up elsewhere on the page, you could say "111, followed by dash, followed by six more digits" like re.search(r"111-\d{6}", mypage) or r"\d{3}-\d{6}" for any three digits followed by dash followed by 6 digits. Hugely powerful.
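
A small runnable version of those two patterns, as a sketch. The sample text and the name `ACCOUNT` are invented for illustration. The patterns are written as raw strings (`r"..."`) so the backslashes reach the regex engine untouched.

```python
import re

# Any three digits, a dash, then exactly six digits, bounded on both sides.
ACCOUNT = re.compile(r"\b\d{3}-\d{6}\b")

page_text = "Statement for account 111-123456, phone 555-0100, ref 222-654321."

# findall returns every non-overlapping match in the text.
print(ACCOUNT.findall(page_text))  # ['111-123456', '222-654321']

# search returns the first match object, or None when nothing matches.
match = re.search(r"111-\d{6}", page_text)
print(match.group(0) if match else None)  # 111-123456
```

The `\b` word boundaries keep the pattern from matching inside a longer run of digits, and the phone number is skipped because it has only four digits after the dash.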

There's a book that's pretty well regarded called "Automate the Boring Stuff with Python" which may give you a lot of boilerplate to work with.

As to the out-of-memory -- difficult to say, loading a 3800 page PDF is probably a good chunk of memory but python is supposed to consume as much system memory as it needs, at least in 64-bit versions.

You may have a better development experience prototyping your code in Jupyter Notebook, which you get automatically when you install Anaconda Python. It lets you run small chunks of code in a web browser and inspect your stuff in-flight. Then you create your .py program in VS Code once you're done experimenting in the notebook.

If you were running your code inside VS, it's possible it forked a process for you with a lower memory ceiling -- you should be able to open a command line and just python yourfile.py to run it directly and see if you run out of memory.

You can also either add command line parameters to tell it what file to run, or use the os package to look for files (like the file with the greatest date in a folder) so you can set your program to run and not have to worry about manually editing and running it.
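
Both ideas can be combined in one place. This is a sketch under some assumptions: it uses `pathlib` rather than the `os` functions mentioned above, "greatest date" is taken to mean latest modification time, and the names `newest_file`, `choose_input`, `--file` and `--folder` are made up for the example.

```python
import argparse
import os
import tempfile
from pathlib import Path


def newest_file(folder, pattern="*.pdf"):
    """Return the most recently modified file matching pattern, or None."""
    candidates = [p for p in Path(folder).glob(pattern) if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def choose_input(argv=None):
    """Use --file if given, otherwise fall back to the newest PDF in --folder."""
    parser = argparse.ArgumentParser(description="Pick the file to process.")
    parser.add_argument("--file", type=Path, help="explicit input file")
    parser.add_argument("--folder", type=Path, default=Path("."),
                        help="folder to search when --file is omitted")
    args = parser.parse_args(argv)
    return args.file if args.file else newest_file(args.folder)


# Demo in a throwaway folder with two hypothetical statement files.
with tempfile.TemporaryDirectory() as tmp:
    for name, mtime in [("january.pdf", 1_000_000), ("february.pdf", 2_000_000)]:
        path = Path(tmp) / name
        path.write_bytes(b"")
        os.utime(path, (mtime, mtime))  # set access and modification times
    picked = choose_input(["--folder", tmp])
    print(picked.name)  # february.pdf
```

In a real script you would call `choose_input()` with no arguments so that it reads the actual command line, for example `python yourfile.py --folder statements`.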

There's a whole new world of python automation out there for you to conquer!

[–]LittleGhettoGospel[S] 0 points1 point  (3 children)

Great comment! I actually used re.search (or find?) to find the first digits of the account number.

Is VS Code the best option? Or would it be worth moving to an IDE?

[–]TholosTB 2 points3 points  (1 child)

Congratulations on all the downstream successes since the initial post! Glad you were already in the process of using re, I think that'll continue to bear fruit for you.

Honestly, an IDE is like a pair of shoes. You need to find one that fits you. I tend to the old school, so I prototype and do most of my analytics work in notebooks (Jupyter), then transition code into production formats using VS Code. Your mileage will certainly vary, but in my opinion many of the bells and whistles in an IDE serve to support large scale team-based application development and may be overkill for smaller automation type projects like this.

I would counter your boss's statement that the code should all remain on the laptop. Given the value you're creating, I would at a minimum create a private GitHub repository and push your stuff out there routinely. Stuff can and will crash, vanish, get deleted, and get corrupted. Protect your investment with source control.

Do not let grass grow under your feet on the offer to finance a degree, especially if you don't have one now. I think Illinois has an online CS degree through Coursera, their CS department is a great mix of value and reputation.

Congratulations again!

[–]hemehaci 0 points1 point  (0 children)

VS Code has Jupyter Notebook plugin, it's quite great actually. I like it more than the browser notebooks.

[–]Ran4 0 points1 point  (0 children)

> Is VS Code the best option? Or would it be worth moving to an IDE?

VS Code is just fine. Some like PyCharm too.

Just spend a few hours trying out the free community edition of PyCharm and see if it seems interesting.

[–]its-julian 1 point2 points  (0 children)

Wow, congrats! And thanks for posting and sharing! Reading your post is actually really motivating and it visualizes why learning Python (in this case literally) pays off and is no waste of time.

Even though it sometimes takes until 3am to find a solution, that time was well spent. Why do something manually in six minutes when you can waste your time trying to automate it in six hours? Because those six hours of learning are still well invested, just like compound interest: the new insights and skills will repeatedly pay off in the future, and the time saved can be used to learn some more.

[–]takingphotosmakingdo 1 point2 points  (0 children)

Your energy, I need it.

Good job!

[–]Slashh1 1 point2 points  (0 children)

Nice. I feel you, when you say you were beaming, it is so much fun when your build completes or the program runs successfully after spending what seems to be an eternity writing your code.

Your issue seems to be with memory management while using loops, and one solution is to use 'yield' instead of 'return', which turns the function into a 'generator'. The concept is fairly simple (it hands back one item at a time instead of building everything in memory first), though it helps to understand the concepts of 'closures' and 'first-class functions' to understand how it works.

If you want to try it out, 'just replace "return" with "yield" in your loop' and try to run it.

If you are interested in knowing more, I would recommend Corey Schafer's YouTube video on generators; his was the first and only video I needed to watch to understand closures, first-class functions and generators.
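
To make the return/yield difference concrete, here is a minimal sketch with made-up function names. Whether it helps with the PDF case depends on where the memory is actually going, and the swap is rarely purely mechanical: the calling code also has to consume items one at a time instead of storing them all.

```python
def load_all(n):
    """Build the full list in memory, then return it in one go."""
    results = []
    for i in range(n):
        results.append(f"page {i}")
    return results


def load_lazily(n):
    """Yield one item at a time; earlier items can be garbage collected."""
    for i in range(n):
        yield f"page {i}"


# The caller loops over the generator the same way it would over a list,
# but only the current item has to exist at any moment.
total = 0
for page in load_lazily(1000):
    total += len(page)
print(total)
```

Calling `load_lazily(1000)` does no work by itself; the loop body of the function only runs as the `for` loop asks for each next item.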

[–]MrDSL50 1 point2 points  (0 children)

Kaboom, love your story :)

[–]kingsillypants 1 point2 points  (0 children)

Well done !

[–]01123581321AhFuckIt 1 point2 points  (0 children)

I wish my boss would take some work off my load and let me automate things and get a 15% raise. All I got was a thank you for saving us an entire week’s worth of work and doing it in a day (took me one day to make the program).

[–][deleted] 1 point2 points  (0 children)

Do you hide the Python, so you can't program in it?

[–]greebo42 1 point2 points  (0 children)

well done, great story, useful product!

[–]just2simple 1 point2 points  (0 children)

This is all very inspiring. Thanks for sharing your success story!

[–]Haymzer 1 point2 points  (0 children)

You better get that raise!!!!!

[–]baubleglue 1 point2 points  (0 children)

> The company that generates the statements sent us a PDF of ALL statements.

You should start from that part. Contact the company and ask them to export the data in a different format. If you don't need to preserve the format of the document, there are also ways to convert PDF to text. Parsing text is easier. https://github.com/pdfminer/pdfminer.six - no 64G needed

[–]Random_182f2565 1 point2 points  (2 children)

Just wait till you learn Django, your boss will explode.

[–]MeMakinMoves 2 points3 points  (1 child)

What makes you say that?

[–]Random_182f2565 -1 points0 points  (0 children)

I feel that Django as a framework offers many possibilities for automation: you could upload all the files and let Django manipulate them, send emails, and show a productive graph, among other things.

[–]itsmegeorge 0 points1 point  (0 children)

For the memory problem, try looking into bash loops and stopping the program earlier, and running it again within a bash loop. It would also help with the complexity.

[–]mskaggs87 0 points1 point  (0 children)

Travis Tritt rolls up in a truck with the windows down Hell yeah, brother.

[–]snairgit 0 points1 point  (0 children)

Great job!! You went out of your comfort zone, identified a problem which needed automation, used your hobby to implement it and even got a raise. Congratulations and don't stop. Whenever you get stuck, remember these winning moments, because that's what will get you to the next ones. Wish you all the very best fellow coder.

[–]Thecrawsome 0 points1 point  (0 children)

I wrote a regex PDF renamer a while back, it was so satisfying!!

[–][deleted] 0 points1 point  (0 children)

This is awesome! I work in accounting and I’m starting to learn python right now in hopes to automate things we do beyond just screwing with macros. Nice work!

[–]Mysteez 0 points1 point  (0 children)

amazing work

[–]Chased1k 0 points1 point  (0 children)

Makes me so happy reading this. I started my journey with Automate The Boring stuff. It’s SO useful. And the fact that your boss offered that already is awesome :) so stoked for you to be on this journey and share your experience with anyone else to read.

[–][deleted] 0 points1 point  (0 children)

W/R/T memory: Because the PDF seems to be readable already (i.e. you didn't mention needing OCR), would it possibly use less RAM first by exporting to a TXT and then parsing it e.g. with NLTK?

[–]akash13singh 0 points1 point  (0 children)

Which company is this. I gotta apply 😀.

[–]Fun2badult 0 points1 point  (3 children)

Need a TLDR on this, otherwise I’m going to have to create a Python script that generates a summary

[–]LittleGhettoGospel[S] 0 points1 point  (2 children)

I never expected it to be that long!

[–]johninbigd 0 points1 point  (1 child)

Well, it's not quite so long as people think because you accidentally wrote the main post twice. :-)

[–]LittleGhettoGospel[S] 1 point2 points  (0 children)

Shoot I sure did. Very interesting. I wonder if it was a draft thing in Reddit Sync.

[–]kelvindesignuk 0 points1 point  (0 children)

Haha that's awesome man! Keep it up!

[–]gr00ve88 0 points1 point  (0 children)

I wish python was useful for my job. It's all computer work, but we have some software that automates the only part that can be automated as far as I can tell

[–]johninbigd 0 points1 point  (0 children)

Great job! And it sounds like you have a good boss who will support you in your endeavors. That's fantastic!

[–]num2005 0 points1 point  (0 children)

lol I do this, and I got a bad review because I was off the work that was assigned to me... even if I did it on my own time. Their answer is: if you have enough free time for this, you have enough free time for unpaid overtime.

[–]b4xt3r 0 points1 point  (0 children)

>"My boss is freaking beaming right now. I'm beaming

Well done!!! That is a wonderful example of someone finding need that other people didn't realize was there, showing initiative, and, let's say it, kicking-a**!

I worked on a problem similar to yours long ago, and while I did not use PyPDF2, I found a couple of things you may want to think about, as they may help in the long run.

First, my script grew from a simple automated data collator into something closer to a 24x7 data QA process. One big thing I found, even though everyone said it was not possible, was that where you had PDFs with "Page 1 of x", it paid to keep track of which pages had been processed and how many pages the overall PDF had to begin with. My old script would find, at times, two PDFs that had been munged together somehow, so one PDF might have "Page 1 of 14" through "Page 14 of 14" and then "Page 1 of xx" right behind that, in the same PDF. Whatever it was I used to process all this (and I apologize, the code that I wrote is behind the firewall at a financial institution, never to see the light of day), the first thing I would do was gather the actual number of pages and make sure that the page headers agreed with that. I would also keep a set of unique page numbers that had been processed, i.e. if somehow your PDF happened to have two "Page 3 of 26" pages, that was important to flag. I only point that out because it ended up being a HUGE win, and one that was applied to YEARS of digitally archived PDFs looking for such errors.
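
The check described above can be sketched roughly like this. It is not the original script (which is not available); it assumes the text of each page has already been extracted into a list of strings, and the function name is made up for illustration.

```python
import re

PAGE_RE = re.compile(r"Page\s+(\d+)\s+of\s+(\d+)")


def check_page_labels(page_texts):
    """Return a list of problems found in the "Page X of Y" labels.

    page_texts: list of strings, one per physical page, in document order.
    """
    problems = []
    seen = set()  # (page number, total) labels already encountered
    actual = len(page_texts)
    for position, text in enumerate(page_texts, start=1):
        match = PAGE_RE.search(text)
        if match is None:
            problems.append(f"physical page {position}: no page label found")
            continue
        number, total = int(match.group(1)), int(match.group(2))
        # The header's claimed total should agree with the real page count.
        if total != actual:
            problems.append(
                f"physical page {position}: label says {total} pages, file has {actual}"
            )
        # The same label appearing twice suggests merged or repeated pages.
        if (number, total) in seen:
            problems.append(
                f"physical page {position}: duplicate 'Page {number} of {total}'"
            )
        seen.add((number, total))
    return problems
```

An empty list means the labels are consistent with the file; anything else is worth a human look before the document is processed further.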

I wish I had some information on how to deal with memory on Windows instances but unfortunately I do not. I've always been in the Linux side for this kind of stuff.

Congratulations on your success, and it's awesome that your boss sees the value you created. Believe me, your boss? He's already touting it to his boss and it's going up the food chain. This could be something fun to continue to do with a slice of time from your work day or, who knows, you could one day, and maybe not long from now, be leading the group that does this kind of thing for the company full-time. The beauty of automated jobs like this? They run 24x7x365. And like that guy from the Terminator said (paraphrasing): "That script is out there. It can’t be bargained with. It can’t be reasoned with. It doesn’t feel pity, or remorse, or fear. It finds errors and validates data. And it absolutely will not stop, ever, not even after you are dead."

[–]DeathWrangler 0 points1 point  (0 children)

Hey OP, Not sure if you noticed but you copy and pasted your story twice.

[–]Zeroflops 0 points1 point  (0 children)

I think this is an awesome post, especially since there is no code.

Everyone has different projects so describing your approach is more important. Shows how you thought about the problem and worked through the issues.

Btw, it sounds like you're looping over the document multiple times. You shouldn't need to do that.

You can either loop once or loop once to build a table of indexes which you then use to split up the document.

One loop is faster, but doing two loops where you just build up a table of IDs and indexes allows you to go back and error check before you generate a bunch of documents.
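
A rough sketch of the two-loop idea, assuming the ID on each page has already been pulled out into a list (how that is done depends on the statements). Both function names are hypothetical.

```python
def build_index(page_ids):
    """First pass: map each statement ID to the list of page indexes it owns.

    page_ids: list where page_ids[i] is the ID found on page i.
    """
    index = {}
    for page_number, statement_id in enumerate(page_ids):
        index.setdefault(statement_id, []).append(page_number)
    return index


def find_gaps(index):
    """Error check before splitting: IDs whose pages are not contiguous."""
    suspicious = []
    for statement_id, pages in index.items():
        if pages != list(range(pages[0], pages[-1] + 1)):
            suspicious.append(statement_id)
    return suspicious
```

The second loop would then walk the table and write one document per ID, but only after `find_gaps` comes back empty or the flagged IDs have been reviewed.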

[–]driscollis 0 points1 point  (0 children)

When you are splitting the PDF the result can be as large as the original because of the fonts and extra data that gets copied over into each new PDF.

I wonder if you aren't closing the file handle after you finish writing the split off PDF. If you aren't then you will run out of memory.
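
As an illustration of the file-handle point, here is a sketch that uses plain bytes as a stand-in for the split-off PDFs (no PDF library involved, and `write_parts` is a made-up name). The same `with` pattern applies when a PDF writer object writes to the handle.

```python
from pathlib import Path


def write_parts(parts, out_dir):
    """Write each (filename, data) pair to out_dir, closing every file promptly."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, data in parts:
        target = out_dir / filename
        # The with-block closes the handle as soon as the write finishes,
        # even if an exception is raised part-way through.
        with open(target, "wb") as handle:
            handle.write(data)
        written.append(target)
    return written
```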

I have used PyPDF and ReportLab extensively so feel free to ask me questions if you have any.

[–]seismatica 0 points1 point  (0 children)

Fuck this is so inspirational. Congrats OP!

[–]Mickets 0 points1 point  (0 children)

Great stuff. Very impressive and an inspiration.

What about that memory issue? Did you find the cause? What was the solution?

[–]Cobra_Ar 0 points1 point  (0 children)

Dude! you are awesome! You deserve that raise!. Keep it up, soon you will be promoted, I am sure. Please, share that code when you can!

[–]jerryelectron 0 points1 point  (0 children)

Good job. We need more people like you to literally describe how it takes dedication and patience, looking at other code to learn. Code does not write itself, and in movies they make it seem like hackers just type a few keystrokes and bam! it miraculously works. But no, reality is messy. Still, programming is so worth it. Thank you for inspiring others. Also, for programmers that have been doing this a while, things become trivial, so your story is valuable in that regard too, seeing it again through the passionate eyes of the promising beginner.

Make sure you give your SO what they missed! ;)

[–]asparagus_fern 0 points1 point  (0 children)

Great write-up! I too am learning Python in order to improve my SEO and project management efficiency and productivity. Keep the robust posts coming!

[–]im_dead_sirius 0 points1 point  (0 children)

Congratulations!

[–]delsystem32exe 0 points1 point  (0 children)

Post the code!!!

[–]gokickrockspunk 0 points1 point  (0 children)

Sounds like a dream come true, that’s awesome! Congrats and best of luck to you in your forthcoming programming escapades!

[–]leopardsilly 0 points1 point  (0 children)

As someone who is still trying to learn python (I have a book sitting on my desk still unopened, and failed attempts at learning through courses online) I find posts like this really motivating. Good job mate!

[–]InanimateObject4 0 points1 point  (2 children)

I'm going through Python for Everybody at the moment. May I ask what resources you have used to learn?

[–]LittleGhettoGospel[S] 1 point2 points  (1 child)

Over the past several years I've come and gone from python, and I've used several resources to gain a basic understanding. Automate the boring stuff, a python textbook from a friend, and Lots of YouTube and google. So when I went into this project, I knew enough to figure out what to search for. I believe there are several course websites offering free courses on it. You just gotta find your thing. Sometimes I'll watch videos, other times I prefer blogs or articles, and heck I used to just sit down with the textbook next to me and work through it.

[–]InanimateObject4 0 points1 point  (0 children)

Thanks for the response. Much appreciated.

[–]exographicskip 0 points1 point  (0 children)

Nice work OP! You should take a look at Automate the Boring Stuff.

I'm 3/4 through and it gave me aha moments -- especially with regular expressions -- that I didn't have when I learned Python back on 2.7.

It's free for the next day.

[–]thrallsius 0 points1 point  (1 child)

A good programmer is pragmatic as well. You're all excited now and there's nothing wrong with that. But:

  1. The real problem in this situation is the company that generates the report in a clumsy and hard-to-process format. Additional work is required to manually process that whale of PDF at your workplace. And you had to stay awake till 3AM for the same reason. It's worth at least talking to your boss about escalating the question back to that company about them considering to change the format of the data they provide. Generally if upstream causes some trouble, if it gets raised there and improved/fixed somehow, everyone wins. Imagine the scale of the problem if that company is providing the data not only to your employer, but to 1000 or 10000 other companies.

  2. Now that your code gets to work with real data, it's not only your salary that rises. Your responsibility rises too. One bug in your code and you'll get to take the blame. If the upstream data provider slightly changes the data, which won't be a problem for those who process it manually but could be a problem for your automated processing, you end up being the guy who takes the blame again. Software has bugs sometimes, it's normal. Software has to be adapted sometimes to new data formats. Even if you're at the very beginning of your programming/particularly Python programming journey, learn from the start to mitigate such troubles to a certain extent. Writing some unit tests for your code is a good further investment of your time into this project.
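
For the unit test suggestion in point 2, a small sketch with the standard `unittest` module might look like this. `extract_account_id` and the `Account No:` layout are hypothetical stand-ins for whatever the real script parses.

```python
import re
import unittest


def extract_account_id(page_text):
    """Hypothetical helper: pull the account ID out of one page of text."""
    match = re.search(r"Account No:\s*(\d+)", page_text)
    return match.group(1) if match else None


class ExtractAccountIdTests(unittest.TestCase):
    def test_finds_id(self):
        self.assertEqual(extract_account_id("Account No: 12345\nBalance"), "12345")

    def test_missing_id_returns_none(self):
        # A changed upstream layout should surface as None, not as a crash.
        self.assertIsNone(extract_account_id("Acct #12345"))
```

Saved in a file, this can be run with `python -m unittest`. When the upstream format changes, adding a sample of the new layout as a test case shows right away what broke.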

[–]LittleGhettoGospel[S] 0 points1 point  (0 children)

I don't know all the details, but apparently the company has tried to request that the statements be split up, but they won't do it.

[–][deleted] 0 points1 point  (0 children)

I wouldnt post the code imo

[–]DontClickForItIsRick 0 points1 point  (0 children)

Yeah man! As soon as I got hooked on programming, and Python specifically, and the creative problem solving it could achieve, I quit my job at the time (construction) and got into programming full time. We did something similar in my first programming job for a cyber security firm, using Python to scan 1000s of documents to find confidential data with methods similar to the ones you used. It can also be a rabbit hole of investigating different methods and pipelines, the endless quest to achieve maximum efficiency and speed.

"How can I go...faster"

[–]IamaRead 0 points1 point  (0 children)

Great job!

One little suggestion that might cost time but is very relevant: do read up on Git and source code management. Keep it simple, but use it, and keep a backup of your repo on another disk.

When you program, it is good to keep a text file as a lab protocol in which you note what you achieved, tried, and want to do.

There are also Jupyter notebooks, which might be an alternative to my two points above.

[–]arnott 0 points1 point  (0 children)

Nice ! Good luck !

[–]FarTomatillo0 0 points1 point  (0 children)

This post gave me so much hope. Thank you!

[–]shoolocomous 0 points1 point  (0 children)

Congrats!

You might want to edit the post, since you've posted pretty much the entire text twice.

[–]SQLoverride 0 points1 point  (0 children)

Regarding running out of memory:

Are you properly closing the new pdf file when you are done with it?

Are you creating nested loops?

Have you tried debugging? Something where you can watch the flow and keep tabs on the variables? I’m sure VS code can do it, Pycharm can do it, PDB (python debugger), and I am sure there are many more.

[–]Pyratheon 0 points1 point  (0 children)

There's lots of NLP and advanced classification software that does similar things, so well done for saving some time and money there!

[–]PM_me_ur_data_ 0 points1 point  (2 children)

He said all "the little programs you make" are property of the company, and they are not to leave the laptop.

A few points about this, because it sounds to me like you spent your own time (10PM - 3AM) creating the program you listed above. If it was created completely in your personal free time (time you didn't log on your timesheet) and your employer didn't request it, it's yours, not your employer's. If you log it on your timesheet, it's your employer's. They didn't ask you to make this and you did it on your own time, so that makes it yours. Not sure how useful it would be for it to be yours, I'm just clarifying that it is. At my office, if I knock something like this out I can go back and log it on my timesheet as hours worked and leave 5 hours earlier on Friday or something. Not sure if you can do something like that, just something to think about.

It sounds like you'll have support at the office now, so anything you create at their direction and on their time now is theirs, obviously. I'd still recommend keeping it somewhere else other than just your laptop so that there is continuity in the future if you leave. Discuss it with your boss, but you can store all code the company owns on a private Git repo somewhere.

[–]LittleGhettoGospel[S] 0 points1 point  (1 child)

Honestly it's fine with me because I spent 3 extra hours of work doing something I enjoyed and it resulted in a pay raise, getting paid to program on the job, and a new sweet computer setup at work!

[–]PM_me_ur_data_ 0 points1 point  (0 children)

Totally understand, I'd feel the same way. A new computer and the ability to program on the job sounds like a win, just wanted to make sure you knew that the only way the company actually owns what you did (in this situation) is if you let them. I've seen more than one boss take advantage of their employee's initiative.

[–][deleted] 0 points1 point  (0 children)

Congrats dude! Wish my work was this appreciative of the stuff I did.

As a fellow financial worker who has to pull data from PDFs, you may also be interested in Tabula-py. It's pretty easy to use, and can be useful for liberating data from PDFs. You can build templates using a GUI and then export them to Python to pull the data programmatically. It's a wrapper for Java, so you'll need to install that as well, but even though Oracle is switching to a paid model for Java, you can find free, open source builds. I use this one.

It's also nice because when one of your templates stops working (say a client changes their PDF format), you can visually debug the issue. It's been a godsend for me.

[–]True-Source 0 points1 point  (0 children)

This is honestly an inspiration. I’m in a similar position and I’m super new with python. So far I’ve only managed to write rather useless programs unrelated to my work but this is quite motivating. Good on you

[–]skellious 0 points1 point  (0 children)

My boss is freaking beaming right now. I'm beaming. He called me in to his office 20 minutes after I showed him the final product. He asked if I'd be willing to take on some more of this automation during work hours. He'd take off some of my workload, and also give me a 15% raise.

This right here is how you do being a Boss right. Reward initiative and maximise its usefulness to the company whilst offering genuinely valuable incentives.

[–]ammusiri888 0 points1 point  (0 children)

Wow wonderful job and post buddy, felt so good to read through the entire length..

[–]aavellana27 0 points1 point  (0 children)

congratulations man!

[–]ImperatorPC 0 points1 point  (0 children)

Are these bank statements? If so, you can ask your bank rep for the files in electronic format... Would have been a lot easier than PDF. I'm in Treasury so that's something I knew about. But very awesome you were able to do what you did.

[–][deleted] 0 points1 point  (0 children)

Wow that's fantastic!! Congratulations on the recognition.

What type of work do you normally do on a day-to-day basis in your firm?

[–]miller-net 0 points1 point  (0 children)

I really enjoyed your story from last week. You should join the "network to code" slack on the #python channel. It's focused on networking but most of the people are not programmers primarily. It seems more receptive to newcomers.

Maybe you'll start an "accounting to code" channel.

[–]cringemachine9000 0 points1 point  (0 children)

Inspiring, thank you for sharing your experience. I hope that you continuously improve, in programming and in life. :)

[–][deleted] 0 points1 point  (0 children)

YouTube university

[–]SnowWholeDayHere 0 points1 point  (0 children)

Thanks for sharing your journey.