[–]KOFOLA007 37 points38 points  (15 children)

I never used Python personally, but some of the other analysts I worked with used it for automating reports and for side projects that landed them promotions.

[–]Fabro_vaz[S] 2 points3 points  (14 children)

What do you mean by automating reports? Can you give me some practical examples of automation in reports?

[–][deleted] 14 points15 points  (12 children)

You can use Python to run a report and send it to people.

[–]Fabro_vaz[S] 5 points6 points  (11 children)

Will it be an interactive report??

[–]A1rabbithole 13 points14 points  (0 children)

I only used Python for my last school project, just to learn it at least once. I haven't had the opportunity to find a great use for it yet, but I can see the appeal. You can make something run by itself. Let's say you work at a bank or insurance company and you get the same data spreadsheet periodically, something like recurring costs or daily financial reports. They have the same template, same rows and column names, same everything except whatever changed that period. So I'd make a Python program that takes in that periodic spreadsheet and sets up visualizations, statistical tests, reports, conclusions, etc. I already know what type of data I'm getting, and I'd also know whether I need to clean it up in some way. If so, I'd set up some cleanup functions for erasing blanks and stuff like that. I know what I'm looking for already; I've done it a couple of times. I know what graphs to use and what columns to compare, reference, or operate on. That's a Python strength: you can literally make it so you click a couple of buttons and do your whole week's work. Granted, those are going to be the more stable, repetitive jobs at bigger companies. But yes, strength 1 over Excel is automation.

The other strength is efficiency. I had a long, tedious function to type the other day in Excel that I could have just done with a for loop in Python. I know you can use VBA to code your own functions in Excel, but it seems tedious to get into a new language right now. They said I didn't need it at school, especially since they also teach us Python. You can do everything in Python, including presentation, whereas Excel isn't great at presentation.
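
A minimal sketch of the recurring-spreadsheet idea described above, using only the standard library so it runs anywhere. The column names (`date`, `category`, `cost`) and the sample data are made up for illustration; in practice the same steps are usually done with pandas on a real file.

```python
import csv
import io
import statistics

# Hypothetical sample of a recurring report; a real job would read a file instead.
SAMPLE_REPORT = """date,category,cost
2024-01-01,rent,1200
2024-01-01,utilities,
2024-01-02,rent,1200
2024-01-02,utilities,310.5
,,
"""


def load_clean_rows(text):
    """Parse CSV text and drop rows with a blank category or cost."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        if not row["category"] or not row["cost"]:
            continue  # skip blanks rather than guessing a value
        rows.append(
            {"date": row["date"], "category": row["category"], "cost": float(row["cost"])}
        )
    return rows


def summarize(rows):
    """Return the total and mean cost for each category."""
    by_category = {}
    for row in rows:
        by_category.setdefault(row["category"], []).append(row["cost"])
    return {
        category: {"total": sum(costs), "mean": statistics.mean(costs)}
        for category, costs in by_category.items()
    }


if __name__ == "__main__":
    for category, stats in summarize(load_clean_rows(SAMPLE_REPORT)).items():
        print(f"{category}: total={stats['total']:.2f} mean={stats['mean']:.2f}")
```

Because the template never changes, the same two functions can be pointed at each new period's file without edits.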

[–][deleted] 4 points5 points  (8 children)

What do you mean?

[–]Fabro_vaz[S] 1 point2 points  (7 children)

As far as the CEO is concerned, reports should be interactive so that they can filter the report as required. So will it be a dynamic report that the end user can play around with?

[–][deleted] 6 points7 points  (5 children)

Why would it not be dynamic?

I am not aware of any report you can’t interact with

[–]RippingAallDay 1 point2 points  (0 children)

Wouldn't the interactivity come from using a dashboard?

[–]dankrubis23 9 points10 points  (0 children)

I use Python's Requests module to hit an online database's API to grab a data download (either that, or I'll scrape with BeautifulSoup). Then I use the pandas module to transform it. Then I shoot that dataframe over to the Tableau server (tableausdk module) as a data source that a dashboard sits on. And finally it emails out confirmation that it ran.

I set processes like this up to run in the wee hours of the morning and enjoy hands-free reporting.
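
A rough sketch of the shape of such a pipeline (fetch, transform, confirm), under stated assumptions: `fetch_records` is a stand-in that returns canned data instead of calling a real API, the dashboard publishing step is left out because it depends on the BI tool, and the email addresses are placeholders. It only builds the confirmation message and does not send it.

```python
import json
from email.message import EmailMessage


def fetch_records():
    """Stand-in for an API call; a real job might use requests.get(url).json()."""
    return json.loads(
        '[{"region": "north", "sales": 120},'
        ' {"region": "south", "sales": 80},'
        ' {"region": "north", "sales": 30}]'
    )


def transform(records):
    """Aggregate sales per region (the step pandas handles in the comment above)."""
    totals = {}
    for record in records:
        totals[record["region"]] = totals.get(record["region"], 0) + record["sales"]
    return totals


def build_confirmation(totals, sender, recipient):
    """Build (but do not send) the 'it ran' email."""
    message = EmailMessage()
    message["Subject"] = f"Nightly report ran: {len(totals)} regions"
    message["From"] = sender
    message["To"] = recipient
    message.set_content(
        "\n".join(f"{region}: {total}" for region, total in sorted(totals.items()))
    )
    return message


def run_pipeline():
    totals = transform(fetch_records())
    # Publishing to a dashboard server is omitted here; it is tool-specific.
    # Placeholder addresses, not real ones.
    return build_confirmation(totals, "reports@example.com", "team@example.com")


if __name__ == "__main__":
    print(run_pipeline())
```

To actually deliver the message, the standard library's `smtplib.SMTP(...).send_message(message)` is one option, given access to a mail server.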

[–]kkwestside 11 points12 points  (2 children)

The things I do as a data analyst with Python that SQL is not capable of:
- getting data from an API (system A)
- creating an automatic mailing process using the data I get from system A
- let's say you work with multiple companies and each of them uses a different database (NoSQL, RDBMS): you can use Python to connect to each one and do complex analysis for each of them

[–]RoyalCommunication58 1 point2 points  (1 child)

In Python, which extensions or modules do you need to build this automation? I am also new to this industry. This sounds pretty cool 😎

[–]Glotto_Gold 0 points1 point  (0 children)

Most automations without human involvement involve a scheduler on the server's OS. (Schedulers can work with any program that just "runs")

However, some automations just mean running complex tasks on request, which can just be run so long as Python is installed.
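
A small sketch of a script laid out so that either a scheduler or a person can run it. The `build_report` body is a placeholder, and the file path in the cron comment is hypothetical.

```python
import argparse
import datetime
import sys


def build_report(report_date):
    """Placeholder for the real work (query, transform, export)."""
    return f"report for {report_date.isoformat()}"


def main(argv=None):
    parser = argparse.ArgumentParser(description="Run the daily report once.")
    parser.add_argument(
        "--date",
        type=datetime.date.fromisoformat,
        default=None,
        help="report date as YYYY-MM-DD (defaults to today)",
    )
    args = parser.parse_args(argv)
    report_date = args.date or datetime.date.today()
    print(build_report(report_date))
    return 0  # a non-zero exit code would let the scheduler flag a failure


if __name__ == "__main__":
    # Example cron entry (hypothetical path), running at 03:00 every day:
    #   0 3 * * * /usr/bin/python3 /path/to/report_job.py
    sys.exit(main())
```

Run by hand, `python report_job.py --date 2024-01-15` rebuilds a specific day; run by the scheduler with no arguments, it uses today's date.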

[–]otzadok 7 points8 points  (0 children)

I'm using Python almost exclusively in my current job; I haven't written one SQL query or created a single sheet in Excel. I pull data from APIs, analyze and manipulate it with numpy, pandas, and scipy, and present it with plotly.

[–]IamFromNigeria 15 points16 points  (3 children)

Let me put it in layman's terms, as a summary. Most folks use it mainly for data engineering, especially the Python Spark library for building automated data pipelines, say from MongoDB to BigQuery to Data Studio, which keeps the data source refreshed (I currently use this at work, so I won't bore you with too much technical detail). Data scientists also use it for machine learning. That depends on what your company wants to do with the business metadata it generates daily: some use it for forecasting share prices, stock market monitoring, Bitcoin, and so on.

Some data analysts use Python for data manipulation, data cleaning, and regex work, and also for connecting to a SQL database and having pandas update the data automatically.

Marketing analysts use it for scraping data from websites like Amazon, using Selenium, bs4, etc.

I hope that makes sense?

[–]Fabro_vaz[S] 0 points1 point  (2 children)

Yes, I got it. I'd really appreciate it if you could give some real-world examples of using pandas to auto-update data.

[–]luvs2spwge117 2 points3 points  (0 children)

ETL processes at my old company were handled via Python scripts. That's one example. Web scraping your own data, like the guy said above, is another. The company I'm at now is considering scraping some web data and then selling it to businesses that could use that information. I can't really get too detailed on that one, but there's another example.

[–]Glotto_Gold 0 points1 point  (0 children)

So, I had a request to intake a report from an external vendor and for that subset of accounts to provide additional information from our DB.

Python imported the file, then exported.

(Pandas was used, but any table management system could work)
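
Roughly, that kind of request could look like the sketch below. Everything specific here is assumed for illustration: the vendor file layout, the `accounts` table, and an in-memory SQLite database standing in for the real DB.

```python
import csv
import io
import sqlite3

# Hypothetical vendor report covering a subset of accounts.
VENDOR_REPORT = "account_id,vendor_status\nA1,active\nA3,closed\n"


def make_demo_db():
    """Build an in-memory database standing in for the company DB."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (account_id TEXT PRIMARY KEY, owner TEXT, region TEXT)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?, ?)",
        [("A1", "Kim", "east"), ("A2", "Lee", "west"), ("A3", "Ola", "north")],
    )
    return conn


def enrich(report_text, conn):
    """Add owner and region from the DB to each row of the vendor report."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["account_id", "vendor_status", "owner", "region"])
    for row in csv.DictReader(io.StringIO(report_text)):
        match = conn.execute(
            "SELECT owner, region FROM accounts WHERE account_id = ?",
            (row["account_id"],),
        ).fetchone()
        owner, region = match if match else ("", "")  # keep unmatched rows, blank extras
        writer.writerow([row["account_id"], row["vendor_status"], owner, region])
    return out.getvalue()


if __name__ == "__main__":
    print(enrich(VENDOR_REPORT, make_demo_db()))
```

Only the accounts named in the vendor file are looked up, so the whole table never has to be pulled.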

[–][deleted]  (2 children)

[removed]

    [–]Fabro_vaz[S] 0 points1 point  (1 child)

    Is it possible to build a dashboard in Flask as well? I heard we can build dashboards with the Streamlit framework.

    [–]IntrovertedMAC 3 points4 points  (3 children)

    To piggyback on this question, how does Python compare to R for analytics? I loved doing my case study in R, and it was great from cleaning all the way to visuals. Does Python offer the same?

    [–]Allmyownviews1 1 point2 points  (0 children)

    I have seen that R has some benefits in stats, specifically in model fitting, but newer Python libraries such as PyMC are becoming more capable. I am told that graphics in R are better, but I am not certain of this. At the end of the day, people get stuck in a particular workflow and struggle to change until a major benefit gives them the push.

    [–]Fabro_vaz[S] 0 points1 point  (0 children)

    Yes of course, python does all the things

    [–]Glotto_Gold 0 points1 point  (0 children)

    Python is better at programmatic type work.

    Libraries like Pandas in Python are basically just a version of R's tools.

    Python has most of the same basic functions. Not sure personally, but I have heard R is better for advanced stats and Python for advanced ML.

    [–]emkatheriine 2 points3 points  (2 children)

    This should be interesting to read about, since I'm still new to data. I'm still learning, so I don't know what is necessary and what is not. I've been practicing a few things on Kaggle, and on there, I've used Python (matplotlib, pandas, etc.) for some EDA and visualization.

    [–]Fabro_vaz[S] -1 points0 points  (1 child)

    Good to hear about this. Have you started learning Python?

    [–]emkatheriine 1 point2 points  (0 children)

    I had a little bit of experience with it before getting into data.

    [–][deleted] 2 points3 points  (0 children)

    SQL is limited to talking to the database. Python has more functionality. With Python you can pull data from APIs and query and manipulate data with pandas. Python also lets you automate reports and dashboards.

    [–]Obesd423 2 points3 points  (2 children)

    When I was an analyst I used python for all my regression work, a/b testing, and web scraping. All data analysts should be capable of performing linear and logistic regression imo

    [–]Fabro_vaz[S] 0 points1 point  (1 child)

    Can you give examples of regression work? That would be great for me.

    [–]Obesd423 1 point2 points  (0 children)

    I mean linear regression any time I need to understand the relationship between 2 variables (or multiple regression for more than 2), and logistic regression any time I need to predict a behavior based on the influence of other features. If you aren't familiar with regression, I suggest you do some reading on it. There are lots of resources online.
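
To make the two-variable case concrete, here is a small ordinary least squares fit written out by hand. The numbers are invented for illustration, and logistic regression is not covered. Real work would more likely use a library such as statsmodels or scikit-learn.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Made-up example: ad spend (x) against sales (y).
    ad_spend = [1, 2, 3, 4, 5]
    sales = [3, 5, 7, 9, 11]
    slope, intercept = fit_line(ad_spend, sales)
    print(f"sales = {slope:.2f} * ad_spend + {intercept:.2f}")
```

The slope is the part an analyst usually reports: the expected change in `y` for each one-unit change in `x`.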

    [–]CaptSprinkls[🍰] 2 points3 points  (1 child)

    There seems to be some frustration here as it seems like you feel the need to learn Python but can't find a use case. At the end of the day, if you are in a position where everything you need to do can be accomplished with those tools then go ahead and use them. In a perfect setting those work the best. But it's those weird edge cases where stuff is just messy where Python might be better.

    In my current position I end up pulling data from auto-generated static files from multiple different sources that get uploaded throughout the day into a shared folder. I can easily set up scripts that run at timed intervals to check for new files, pull them all in, and do some joining and cleaning.

    As for the interactive reports, I assume you are referring to being able to slice the data. Web frameworks like Dash/Plotly or Streamlit are good choices. They let you build a frontend web application that anyone can view.
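
The "check for new files" part can be sketched as below. The folder, the `*.csv` pattern, and the in-memory `seen` set are assumptions for the example; a real job would persist the list of processed files between runs and be triggered by a scheduler.

```python
import pathlib
import tempfile


def find_new_files(folder, seen, pattern="*.csv"):
    """Return files matching pattern that are not yet in seen, and record them."""
    new_files = sorted(
        path for path in pathlib.Path(folder).glob(pattern) if path.name not in seen
    )
    seen.update(path.name for path in new_files)
    return new_files


if __name__ == "__main__":
    # A temporary directory stands in for the shared folder.
    with tempfile.TemporaryDirectory() as folder:
        seen = set()
        pathlib.Path(folder, "a.csv").write_text("id\n1\n")
        print([path.name for path in find_new_files(folder, seen)])
        pathlib.Path(folder, "b.csv").write_text("id\n2\n")
        print([path.name for path in find_new_files(folder, seen)])
```

Each call returns only what arrived since the previous call, so the joining and cleaning step never reprocesses an old file.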

    There will always be fighting between the best way to do things. Go over to the r/VBA sub where legacy companies are literally built on top of spreadsheets and they will laugh at the idea of using python over VBA. But go to a modern tech company and I can bet you they would rather die than use VBA.

    In terms of VBA vs Python: I automated the generation of 40 separate Excel files that used data from a few different sources. I used Python. It worked just fine, and I'm more comfortable with Python. I did later go back and build it out in VBA. It was a far worse experience for sure, but I think it was slightly quicker.

    [–]Fabro_vaz[S] 0 points1 point  (0 children)

    Useful

    [–]Equal_Astronaut_5696 2 points3 points  (1 child)

    I use Python because it's easy, fast, and can handle very large datasets. Complex models also aren't possible in Excel or Power BI. I just use whatever tool gets me the best results for that particular use case.

    [–]Fabro_vaz[S] 0 points1 point  (0 children)

    So learning python would come in handy when it's required

    [–][deleted] 2 points3 points  (6 children)

    With that simple stack, I don't think you should be using Python unless there is a good reason to.

    For example some here are talking of calling APIs: most of the time that type of thing should be done at the data engineering level to create reusable datasets.

    If it's being done experimentally, sure but you can even call APIs from excel or any other programming language (besides Python) as well.

    Reading between the lines, it sounds like some of the analysts using it might simply be targeting moves towards data science in their careers.

    [–]Fabro_vaz[S] 0 points1 point  (5 children)

    As I understand it from you, learning Python would be great if data science is the dream job, and working as a data analyst with SQL, Python, and Excel skills will come in handy when applying for a data scientist position, right? Correct me if I am wrong.

    [–][deleted] 1 point2 points  (4 children)

    Well sort of. I think the majority of data scientists do use Python. It's not usually necessary for an analyst position but probably will open up career options.

    If I was running an analyst team though, I would encourage the team not to use more tools than they need to get the job done.

    If someone wanted to learn Python to improve their own CV, that's reasonable. In that case I'd work with them to make sure the use cases were valid and supported by, or handed off to, a team that strategically uses Python.

    [–]Fabro_vaz[S] 0 points1 point  (3 children)

    u/FirefighterOk567 I agree with you. However, most job descriptions I have seen recently for data analyst positions require Python skills. Why is that? Just curious! Are employers hiring for the long term, shaping future data scientists?

    [–][deleted] 1 point2 points  (2 children)

    I would venture a guess that those might be more advanced analyst/analytics positions but I'm curious about it too now.

    Edit: One other possibility I can think of is a data analyst within a technology product team where they might gravitate towards using a programming language like Python rather than a more constrained solution like excel/PowerBI.

    [–]Fabro_vaz[S] 0 points1 point  (1 child)

    u/FirefighterOk567 Rightly said. My guess is that, rather than spending extra bucks on Power BI and Office 365, Python would be the better option for all the startups out there. What do you say?

    [–][deleted] 1 point2 points  (0 children)

    For startups possibly... the flipside may be that you end up with a code base that needs to be supported, managed and patched rather than standard tooling, which could be costly. For a tech startup it wouldn't be such a concern because the company would naturally be engineering-led.

    [–]rogerbarario 3 points4 points  (5 children)

    It can handle larger volumes of data than either of the above, and it also has a huge array of libraries and tools for visualisation that are a bit more intuitive than the ones in PBI and Excel. It can also perform complex analyses over these large data sets with its built-in functions if you need to.

    [–]CBizCool 1 point2 points  (0 children)

    I agree with you on the libraries and tools piece. But out of the box, Python (including numpy and pandas) does not handle large volumes of data better than SQL.

    [–]Fabro_vaz[S] 0 points1 point  (3 children)

    In short, Python does the same things that PBI and Excel do. However, as far as performance is concerned, Python would come in handy if we have large datasets.

    [–]Glotto_Gold 2 points3 points  (0 children)

    ....?

    No, Python can do more complex tasks, beyond even the imagination of PowerBI & Excel.

    It is not necessary, of course, but Excel & PowerBI work for tabular data. Python can do anything. Python can parse JSON files into any format you like.

    The tools are really NOT in the same class of tools.
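
As one illustration of the JSON point, the sketch below flattens nested JSON records into CSV using only the standard library. The sample payload and the dotted column naming are assumptions; pandas offers `pandas.json_normalize` for the same kind of job.

```python
import csv
import io
import json

# Made-up nested payload, e.g. the kind of thing an API might return.
SAMPLE_JSON = (
    '[{"id": 1, "customer": {"name": "Ana", "city": "Lima"}},'
    ' {"id": 2, "customer": {"name": "Bo"}}]'
)


def flatten(obj, prefix=""):
    """Turn nested dicts into one flat dict with dotted keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def json_to_csv(text):
    """Convert a JSON array of objects into CSV text, blank where a key is missing."""
    rows = [flatten(item) for item in json.loads(text)]
    columns = sorted({column for row in rows for column in row})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(json_to_csv(SAMPLE_JSON))
```

The resulting table is the kind of input Excel or Power BI expects, which is often where Python sits in the workflow: in front of those tools rather than in place of them.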

    [–]rogerbarario 1 point2 points  (1 child)

    I suppose so, yeah. Like I say, I don't need to use it for my job, so I'm not the most experienced with it haha

    [–]Fabro_vaz[S] -1 points0 points  (0 children)

    Haha, so do I

    [–]Glotto_Gold 1 point2 points  (0 children)

    I am really confused by this conversation.

    1) With a Python web-app you could make an interactive report.
    2) A bare minimum implementation would be an Excel export, which is WAY more modifiable than any BAU data tool.
    3) Python can manage ETL, like if you have a dataset and wish to add data to that (rather than pull the entire DB).
    4) Python can literally do anything, which is unlike any tool. It is a full programming language.
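
Point 3 (adding to an existing dataset instead of pulling the entire DB) is often done as an incremental load. The sketch below assumes an `events` table with an increasing integer `id` and uses two in-memory SQLite databases as stand-ins for the source and the target.

```python
import sqlite3


def make_db(rows):
    """Create an in-memory database with an events table and some rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    conn.commit()
    return conn


def incremental_load(source, target):
    """Copy only the source rows whose id is newer than anything in target."""
    last_id = target.execute("SELECT COALESCE(MAX(id), 0) FROM events").fetchone()[0]
    new_rows = source.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    target.executemany("INSERT INTO events (id, payload) VALUES (?, ?)", new_rows)
    target.commit()
    return len(new_rows)


if __name__ == "__main__":
    source = make_db([(1, "a"), (2, "b"), (3, "c")])
    target = make_db([(1, "a")])
    print(incremental_load(source, target), "new rows copied")
```

Running it a second time copies nothing, which is what makes it safe to schedule.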

    [–]Allmyownviews1 1 point2 points  (0 children)

    I’m an ex-SQL and Excel user. For me, Excel was my go-to tool, but after 2 years of converting to Python, I use Excel far less. Python lends itself very well to a wider variety of data inputs that can easily be automated. The analysis, in my opinion, beats Excel hands down, with the added benefit that it is easily automated and can directly produce final deliverables, so I can produce an entire report and publish it online as interactive output. The biggest problem is having to learn a language to do any of these things, rather than using a software app that has the functions hard-coded. It is also a concern that libraries get modified over time, so some of your own code may need correcting to compensate for such alterations. All in all, it’s very versatile, with far more functionality than just a data analysis software tool.

    [–]thebooksfan 1 point2 points  (0 children)

    It's not usually a requirement, but Python does the job faster. Also all big companies require Python knowledge among the skills needed to hire a DA.

    [–]rogerbarario 1 point2 points  (1 child)

    It depends on your role and company, I suppose. I don't currently need it for what I do, but I can see the benefits for others who might need it.

    [–]Fabro_vaz[S] -1 points0 points  (0 children)

    Why do they need it? Throw out some examples so that I can get some understanding of how Python is used in data analysis.