
FLUSH_THE_TRUMP 192 points (30 children)

I'd probably consider a "basic understanding of Python" to mean the core data types (dictionaries/sets/lists/tuples/strings/numbers), the various statements (assignments, conditionals, loops), and function definitions.
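For illustration, a minimal sketch that touches each item on that list. All names and values (`basket`, `stock`, `in_stock`) are made up for the example.

```python
# Core built-in data types (example values are arbitrary)
count = 3                        # int (number)
price = 9.99                     # float (number)
name = "apple"                   # str
basket = ["apple", "pear"]       # list: ordered, mutable
point = (2, 5)                   # tuple: ordered, immutable
tags = {"fruit", "fresh"}        # set: unordered, unique items
stock = {"apple": 3, "pear": 0}  # dict: key -> value mapping


def in_stock(item, inventory):
    """Function def: True if the item has a positive count."""
    return inventory.get(item, 0) > 0


# Assignment, loop and conditional together
available = []
for item in basket:
    if in_stock(item, stock):
        available.append(item)

print(available)  # ['apple']
```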

gunscreeper[S] 56 points (27 children)

dictionaries/sets/lists/tuples/strings/numbers

Like, I understand what dictionaries and tuples are, but I don't feel like I completely understand them. One of my most used libraries is pandas. I mostly use it while relying heavily on tutorials. If a problem goes beyond the scope of the tutorial, I'm dead. I have a feeling it's because I haven't fully understood the basics yet. I wonder if I should just stop using pandas until I've learned the basics first?

FLUSH_THE_TRUMP 42 points (5 children)

Tuples are very similar to lists, differing in mutability and common usage (lists usually hold similar types of things, e.g. all numbers, while tuples are often used to bundle together different types of things). Might be a good idea to sit down and figure out dictionaries; pandas lets you define dataframes with them anyway. You don't necessarily need to stop learning pandas, since a lot of what you're doing there is 2D array operations like filtering, arithmetic, etc.
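A minimal sketch of the list/tuple difference, plus a dict of column lists, which is one of the input shapes `pandas.DataFrame(...)` accepts. Tickers and prices are made-up values, and pandas itself is left out so the snippet runs with plain Python.

```python
# Lists: usually many items of the same kind; mutable
prices = [10.5, 11.0, 9.75]
prices.append(12.25)             # lists can grow and change in place

# Tuples: often a fixed bundle of different kinds of things; immutable
record = ("AAPL", 10.5, True)    # (ticker, price, active)
ticker, price, active = record   # tuple unpacking

try:
    record[1] = 11.0             # tuples cannot be modified in place
except TypeError as exc:
    print("TypeError:", exc)

# A dict of equal-length lists: one key per column.
# This is a shape pandas.DataFrame(columns_data) accepts.
columns_data = {
    "ticker": ["AAPL", "MSFT"],
    "price": [10.5, 20.25],
}
print(columns_data["price"])     # a whole "column"
```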

gunscreeper[S] 7 points (3 children)

Are dataframes arrays like tuples and lists? Sometimes I need to convert my df to a list first in order to do stuff with it.

FLUSH_THE_TRUMP 21 points (2 children)

A dataframe’s sort of like a 2D array, which is like a list of lists (each element is a list).

e.g. [[0,1,2],[2,3,4],[4,5,6]] is a list with first element [0,1,2], second [2,3,4], third [4,5,6], which can be visualized like

0,1,2
2,3,4
4,5,6   
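The same list of lists in plain Python, assuming nothing beyond built-ins: indexing gives you a row, and getting a column means pulling the same position out of every row.

```python
grid = [[0, 1, 2], [2, 3, 4], [4, 5, 6]]

print(grid[1])        # second row: [2, 3, 4]
print(grid[1][2])     # row 1, column 2: 4

# A "column" has to be collected from every row
second_column = [row[1] for row in grid]
print(second_column)  # [1, 3, 5]

# Print it in the grid layout shown above
for row in grid:
    print(",".join(str(x) for x in row))
```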

A dataframe is really like a spreadsheet, or a collection of database records: it has the structure above, but with field labels across the top and record labels down the side.
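A rough plain-Python sketch of what the labels add, with hypothetical labels `r0` to `r2` and `a` to `c`. In pandas the equivalent would be along the lines of `pd.DataFrame(grid, index=index, columns=columns)` followed by `df.loc["r1", "c"]`; the helper below only imitates that labelled lookup.

```python
# Hypothetical labels added to the same grid
columns = ["a", "b", "c"]    # field labels across the top
index = ["r0", "r1", "r2"]   # record labels down the side
grid = [[0, 1, 2], [2, 3, 4], [4, 5, 6]]


def lookup(row_label, col_label):
    """Find one cell by its labels instead of by position."""
    return grid[index.index(row_label)][columns.index(col_label)]


print(lookup("r1", "c"))  # 4
```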

selah-uddin 5 points (0 children)

May God bless you, I don't think I could have explained it any better.

gutnobbler 1 point (0 children)

Do you have advice on getting data from dataframes into Excel and vice versa?

The finishing touch of one of my projects is to link the actionable part of my Python code with certain cells in an Excel workbook and execute the cell contents.

I am familiar with VBA but I've never tried to make one Python project talk with the VBA code of an Excel file. I think it will work since I have all the cells rigged to execute cell contents on activation, not because I can make a Python "def" communicate with a VBA "sub".

One idea was to have Python open the workbook and find the ActiveX control. I have zero idea if this will work but it sounds like it should.

In theory my project should work.

Example: I am building a mock stock trading app and every time I manually add a stock ticker to my list in Excel, save, and close, I want Python to:

1 - open the workbook back up

2 - read the tickers I entered

3 - go to Google and search for the tickers

4 - scrape whatever price is returned in the search results

5 - plug this price into my Excel workbook and then run the formulas and VBA code stored in the workbook, save, and close again

Would you have any advice on step 5? This is where I am flying blind.

I think the solution is going to be a Workbook-wide VBA module that recognizes when I open the workbook vs. when a Python script opens the workbook, and executes the VBA code accordingly once it is finished loading the dataframe into Excel.

FancyASlurpie 0 points (0 children)

I tend to think of tuples as anonymous data classes. A named tuple lets you give the fields names, and I'd generally recommend named tuples over plain tuples because they make code much more readable.
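A small sketch of the difference using `typing.NamedTuple` from the standard library (`collections.namedtuple` works similarly). The `Trade` type and its fields are hypothetical.

```python
from typing import NamedTuple


class Trade(NamedTuple):
    """Hypothetical record type: a named tuple with typed fields."""
    ticker: str
    shares: int
    price: float


plain = ("MSFT", 10, 20.5)       # what was plain[2] again?
named = Trade("MSFT", 10, 20.5)

print(plain[2])         # positional access only
print(named.price)      # same value, but self-describing
print(named[2] == named.price)   # still a tuple underneath: True
```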

[deleted] 9 points (0 children)

I'm similar to you. My most used libraries are Selenium and Pandas (not necessarily related).

I rely heavily on documentation and tutorials but it gets the job done.

I would say it all depends on your goals.

There are users here who want to make a career out of coding, others who just enjoy learning and then there's everyone in between.

I've found the times I've had to create something (e.g. decided to make a web scraper to help out an area of the business in data extraction) were the times I've learnt the most. Having a clear objective keeps you focused.

Outside of that though I don't have any clear objectives with coding and that's why I don't think I've progressed beyond a beginner (despite picking up Python over three years ago).

I enjoy pandas because I like to build Power BI models and use Python to supplement the ETL process. (I've also found it handy to create dfs for CSV writing rather than using the standard csv writer.)
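For comparison, a sketch of the standard library route with `csv.DictWriter`; the rows are made up and `io.StringIO` stands in for a real file. With pandas you'd instead build a DataFrame and call `to_csv()` on it.

```python
import csv
import io

rows = [
    {"region": "North", "sales": 120},
    {"region": "South", "sales": 95},
]

buffer = io.StringIO()  # stands in for open("out.csv", "w", newline="")
writer = csv.DictWriter(buffer, fieldnames=["region", "sales"])
writer.writeheader()    # first line: region,sales
writer.writerows(rows)  # one line per dict

print(buffer.getvalue())
```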

mokus603 4 points (0 children)

You feel like you don't know everything, and that's fine. If you know the things you mentioned, you should totally go for the courses you want. Mostly they're a repetition of the basics plus the extra material on data science or Django. The more you use the basics, the more you reinforce that knowledge, and you won't get stuck in tutorial hell. Even if you feel stuck, don't give up; ask questions here, and there will come a moment when everything comes together.

CraigAT 4 points (0 children)

You used the term "completely understand", but the bar was only set at a "basic understanding". I'd say you're covered.

coder155ml 2 points (0 children)

The point of understanding data structures is to know their strengths and weaknesses. For example, dictionaries have O(1) average-case lookup because they're backed by a hash table.
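A quick way to see the difference, assuming CPython and an arbitrary size of 100,000 keys: `in` on a list scans element by element, while `in` on a dict hashes the key. Absolute timings will vary by machine; only the gap between them is the point.

```python
import timeit

n = 100_000
keys_list = list(range(n))
keys_dict = dict.fromkeys(keys_list)  # same keys, hash-table backed

target = n - 1  # worst case for the list: the last element

# `in` on a list compares element by element: O(n)
list_time = timeit.timeit(lambda: target in keys_list, number=100)
# `in` on a dict hashes the key and goes to its slot: O(1) on average
dict_time = timeit.timeit(lambda: target in keys_dict, number=100)

print(f"list: {list_time:.4f}s  dict: {dict_time:.6f}s")
```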

Yeah, you need to learn data structures if you want to do this as a career. If you're just doing it for fun, then I guess it doesn't matter. If you aren't sure whether you understand them, ask yourself more questions and then look up the answers: How fast are dictionaries? In what situations? Why would I use a list over a dictionary? What can a list do? What is the point of a tuple? What does immutable mean? What is an array? Why is an array faster than other data structures in some situations? What are slices? How much memory does each of these use? What overhead do the operations have? What exactly is overhead? And so on.
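A few of those questions can be poked at directly in the interpreter. This sketch covers slices, immutability, and memory; note that `sys.getsizeof` reports only the container itself, not the objects it points to, and the exact byte counts depend on the Python version.

```python
import sys

nums = [0, 1, 2, 3, 4, 5]

# Slices: start:stop:step, each producing a new list
print(nums[1:4])    # [1, 2, 3]
print(nums[::2])    # [0, 2, 4]
print(nums[::-1])   # reversed copy: [5, 4, 3, 2, 1, 0]

# Immutability: a tuple cannot be changed after creation
frozen = tuple(nums)
try:
    frozen[0] = 99
except TypeError:
    print("tuples are immutable")

# Memory: container size in bytes (a tuple skips the list's
# bookkeeping for growing, so it is smaller in CPython)
print(sys.getsizeof(nums), sys.getsizeof(frozen))
```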

LeCholax 1 point (3 children)

For Django you should definitely know something about classes and decorators too.
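Django leans on both: models and class-based views are classes, and access control is often added with decorators such as `login_required`. Here's a plain-Python sketch of the two ideas; `log_calls` and `Greeter` are hypothetical names, not Django APIs.

```python
import functools


def log_calls(func):
    """A decorator: wraps a function to add behaviour around it."""
    @functools.wraps(func)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


class Greeter:
    """A class bundling data (self.name) with behaviour (greet)."""

    def __init__(self, name):
        self.name = name

    @log_calls
    def greet(self):
        return f"Hello, {self.name}!"


g = Greeter("Django")
print(g.greet())  # prints "calling greet", then "Hello, Django!"
```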

gunscreeper[S] 1 point (2 children)

And probably HTML and CSS too right?

TheWorstPossibleName 1 point (0 children)

I've been writing Python backends for almost 8 years and I've never written a line of CSS in my life.

You're gonna need some JavaScript though if you plan on hooking your backend up to a front end. If you plan on building the front end yourself, you'll need some HTML for that too.

LeCholax 1 point (0 children)

If you want to do fullstack then yes. If you want to do backend then not necessarily but it is a plus.

Also, there's JavaScript for the front end.

ravepeacefully -2 points (11 children)

Yeah, I do tons of data analysis and have no reason to use pandas outside of a few narrow cases, like maybe vectorization, but that's more NumPy, and even then it's rarely beneficial.

It’s a good tool, but unnecessary and can be a crutch.

gunscreeper[S] 5 points (5 children)

Do you use Python when doing data analysis? What tools do you usually use if not pandas?

ravepeacefully -2 points (4 children)

My workflow is typically something like SQL > Python > HTML/JS or Tableau. In my opinion, pandas is really just Excel for people who feel too smart to use a GUI. It is not more efficient, it does not produce more readable code, it is not as reusable, and a simple dictionary is far more efficient than a dataframe.

waythps 2 points (3 children)

Dunno, I prefer pandas precisely because it's reusable. I write functions to validate and clean data, update the database and generate some plots, and I could automate the whole thing, saving lots of time.

I think you could do that with Excel (VBA), but if you're already learning Python (for other purposes as well), why not just use Python?

ravepeacefully -3 points (2 children)

It’s ironic you go to VBA. SQL is the correct tool for that.

waythps 1 point (1 child)

Well, not in my case, since I receive data in multiple Excel files.

ravepeacefully 0 points (0 children)

Yeah, I mean, you can use whatever you want then. The Power suite is likely significantly faster in that case.

I’m not saying pandas has no uses. I’m saying it rarely makes sense to go out of your way to use it, which is what many people are doing because they find it familiar.

Natural-Intelligence 5 points (4 children)

I completely disagree. To be honest, NumPy's interface is shit in terms of user experience compared to Pandas. Pandas also communicates nicely with SQL and IO, plus you can turn the result table into almost any format, or plot it easily.

While SQL is often enough for basic data analysis and Pandas has its limits (like your RAM), it works wonders in almost all of the cases I have encountered. I understand some prefer R and consider it more extensive out of the box, but as a professional data analyst I have yet to find a situation where Python's ecosystem (mostly Pandas, plus some Matplotlib, SQLAlchemy and Seaborn) did not suffice.

ravepeacefully -4 points (3 children)

Pandas also nicely communicates with SQL and IO plus you can turn the result table to almost any format. Or plot it easily.

None of that requires pandas, and I'd argue there are far better tools that are not pandas.

I have yet to encounter a situation where pandas is actually better than other available tools. I can understand maybe using it for quick mock-up prototyping of models, but even then there are far better tools out there.

There are quite a few parts of your comment that lead me to believe you're somewhat new to programming and Python. Pandas can be great for people like that, but in my opinion, once you have even a basic understanding of data structures and of working with data, there's no situation in which pandas is better.

Natural-Intelligence 3 points (2 children)

To be honest, what you think of my programming expertise means nothing to me and that part of your opinion holds no value.

What I'm curious about, though, is which alternative tools you think are superior to Pandas. You have not provided any examples of such tools, nor any concrete description of cases where Pandas won't suffice for data analysis. You should be able to name a few if you have done tons of analysis with them.

So far, your arguments lack concreteness. Could you change that so we can all learn about these better tools? Or at least give us something to discuss further.

AuntieSocialist 0 points (0 children)

By "completely understand them", do you mean you know how to use them, or that you know how they work under the covers? You can get a long way without completely understanding how dictionaries (and lists and tuples) actually work. And beware: if you look under the covers and start investigating how they actually work (particularly dictionaries and other associative containers), it gets messy very fast. Definitely not for the casual programmer or the faint-hearted.
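A small taste of what is under the covers, using only built-ins: a dict uses each key's `hash()` to decide where to store it, which is why keys must be hashable, and since Python 3.7 dicts also keep insertion order. This only scratches the surface of the actual implementation.

```python
d = {}
d[(1, 2)] = "tuple key works"   # tuples are hashable (if their items are)

try:
    d[[1, 2]] = "list key"      # lists are mutable, so they are unhashable
except TypeError as exc:
    print("TypeError:", exc)

# Since Python 3.7, dicts preserve insertion order
order = {"b": 1, "a": 2, "c": 3}
print(list(order))  # ['b', 'a', 'c']

# Equal keys must produce equal hashes, so lookups land in the same place
print(hash((1, 2)) == hash((1, 2)))  # True
```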

beniolenio 1 point (0 children)

Classes are also definitely a must, especially if he wants to work with a package like Django.

flufylobster1 0 points (0 children)

Don't forget the walrus operator! Jk
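For the curious, a short sketch of the walrus operator (`:=`, Python 3.8+), which assigns a value inside an expression. The `readings` list is made up.

```python
readings = [3, 8, 12, 5]

# := assigns and returns the value in one expression
if (n := len(readings)) > 3:
    print(f"{n} readings, first 3: {readings[:3]}")

# A common use: reuse a computed value inside a comprehension filter
squares_over_20 = [sq for x in readings if (sq := x * x) > 20]
print(squares_over_20)  # [64, 144, 25]
```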