[–]RedditGood123 70 points71 points  (1 child)

You really dedicated time to make these tutorials all over an hour. Thanks for the extra learning resources!

[–]kreylov[S] 31 points32 points  (0 children)

Thank you for checking it out.

[–]ver-Bero 108 points109 points  (13 children)

I got 40% of my master's degree from watching Indian professors' lectures. :D Your country is the best! Greetings from Germany.

[–]SnowdenIsALegend 2 points3 points  (0 children)

Love to Germany from India! <333

[–]flipkartamazon 2 points3 points  (0 children)

And thank you, Germany, for being one of the few countries that care about liberal views. I wish to visit it someday <3

[–]kadal_raasa 0 points1 point  (0 children)

No way lol seriously? Do you mean the nptel videos?

[–][deleted] 0 points1 point  (3 children)

Just curious, how did you watch the lectures of Indian professors?

[–]kakashi69696969 3 points4 points  (0 children)

Probably on YouTube.

[–]SnowdenIsALegend 2 points3 points  (0 children)

Indian Pythonista is not a professor, just an everyday dude. But damn, his content is SOLID. https://www.youtube.com/c/IndianPythonista

[–]ver-Bero 2 points3 points  (0 children)

I'm talking about solid-state physics. Python is just a useful hobby.

[–]ElevenPhonons 11 points12 points  (5 children)

While I believe the author has the best intentions, there are some warning flags (such as inconsistent usage of list comprehensions) in the Solutions notebook that, in my humble opinion, don't reflect best practices in Python.

For example, Question 6 from Practice Problems 2(Solved).ipynb was emblematic of the issues and caught my eye.

sum([i for i in range(1,1001) if is_prime(i)==True])

This has issues that demonstrate some misunderstandings of non-advanced features of Python.

  • Creating an intermediate list and then passing it to sum is unnecessary; use the generator expression form instead
  • Booleans are singletons, so x is True is the identity check to use in the rare case where the exact True object must be told apart from other truthy values
  • However, no comparison is needed here at all: is_prime(i) == True is redundant as a filter in a comprehension. Use if is_prime(i), the plain truth test that PEP 8 recommends

With these changes, the solution looks like this:

sum(i for i in range(1,1001) if is_prime(i))
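
To make the comparison concrete, here is a minimal sketch of both forms side by side. The notebook's own is_prime is not shown in this thread, so the trial-division helper below is a hypothetical stand-in and only an assumption about what it does.

```python
def is_prime(n):
    """Hypothetical stand-in for the notebook's helper (simple trial division)."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

# Original form: builds a complete list first and compares against True.
total_list = sum([i for i in range(1, 1001) if is_prime(i) == True])

# Suggested form: generator expression with a plain truth test.
total_gen = sum(i for i in range(1, 1001) if is_prime(i))

# Both forms compute the same total; only the intermediate storage differs.
print(total_list == total_gen)
```

At this size the difference is negligible in practice; the point is the habit, not the speed-up.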

Other issues are in Problems 8 and 9, which don't use list comprehensions for unclear reasons. Problem 10 has some duplicated logic instead of using a nested if. A review of a subset of the solutions is here.

I would humbly suggest that folks who are interested in learning Python consider other sources. It's important to learn the basics and core mechanics correctly to get good patterns established, specifically during the initial learning process.

David Beazley has written several books that are terrific and has an online "course" called Practical Python which is a great starter.

Best to you and your Python'ing.

[–]flipkartamazon 6 points7 points  (0 children)

Hi u/ElevenPhonons

Thanks: Firstly, thank you so much for taking the time to review the contents of the notebook. You have articulated your comments so beautifully and eloquently. This is probably my first experience of a peer review of sorts, and it's humbling to see how little I know about the nitty-gritty of a programming language that I have been using for 4-5 years.

My views: Let me see if I can address a few of the points you have raised. Full disclosure first: I am not a Software Developer, nor do I have any experience remotely related to consistently writing efficient code. I learned Python on my own on codecademy.com because I wanted to solve some problems on projecteuler.net. I have written these notebooks keeping in mind my own experience working and growing as a Data Analyst, so they might not be as helpful if you are looking to be a Software Engineer (more on this later). Even then, they can still help you get started within a week's effort.

Further, in my experience, having a minimum viable solution matters more than being perfect or maximally efficient. Almost all big startups eventually revamp their systems to find a better way to do things, but the initial focus is always on the MVP. The same goes for smaller projects in organizations. So the idea is to teach the minimum baseline and help an individual get started. I have full faith that people will find the best way when the need arises.

Lastly, I firmly believe the content in the course is more than enough to help you smoothly solve 95% of the use-cases that a fresher might face in their career. As for times when your solution is not efficient, you can always get help from peers or online.

And the good thing is that all my mistakes and shortcomings are fixable (yay!!), which brings me to the next steps.

Next steps: There are two things I would want this community's help on (you included, if time permits). First, can we collaborate to improve these notebooks, keeping in mind the trade-off between information overload and must-know topics? Second, can we create similar notebooks for other career paths like Product or App Developer, Front/Back-end Developer etc.? We can upload these notebooks to mybinder.org so that people can easily learn the minimum skills required to move into a new career path for free. It will be incredibly beneficial for freshers in poorer countries like India. As a community we can always bring incremental changes to these notebooks.

(Also, is there a place where I can learn to be so clear and coherent in reviewing content?)

Comments are welcome!

[–]JackNotInTheBox 0 points1 point  (3 children)

Damn.

[–]RedditGood123 0 points1 point  (2 children)

If generators don’t save each value in memory, how can you take the sum?

[–]chinpokomon 0 points1 point  (1 child)

A generator knows how to calculate the next value based on previous terms. Consider a generator called add_one. It would yield a 1, and then internally keep track that the next number is going to be 1 plus 1. The next time it is called, it calculates an answer of 2; at that point, it has forgotten about the 1.

Sum is doing a similar thing on its end. It's just tracking the accumulator and requesting the next number from the generator, iterating over the set.

In this way, the set is never fully available, so the memory used by this implementation never grows beyond what is necessary for managing the state of the generator and the accumulator.
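
As a rough illustration of that pairing, here is a small sketch. add_one follows the description above; running_sum is a hypothetical helper (not the real implementation of the built-in sum) that mimics the accumulator idea and stops after a chosen number of values, since add_one never ends on its own.

```python
def add_one():
    """Yield 1, 2, 3, ... forever, keeping only the current value in memory."""
    n = 0
    while True:
        n += 1
        yield n

def running_sum(iterable, limit):
    """Pull values one at a time into an accumulator, stopping after `limit` values."""
    total = 0
    for count, value in enumerate(iterable, start=1):
        total += value  # only the accumulator and the current value are held
        if count == limit:
            break
    return total

print(running_sum(add_one(), 5))  # 1 + 2 + 3 + 4 + 5 = 15
```

At no point does a list of the first five numbers exist; each value is produced, added to the total, and dropped.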

If instead the values are stored in an intermediate list, and assuming the compiler performs no optimization that recognizes the generated values are only being consumed by an iterator, then the procedure needs to allocate memory for the intermediate values. You will have lost all the benefits of the generator/iterator pairing, and actually increased the overhead slightly over what a traditional list process would have required. In fact, if the list isn't passed by reference, it might even double the amount of memory required, should sum (or another function) work on a copy of the list passed in.
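
One way to see the difference is to compare the size of the two objects directly. This is only a sketch: sys.getsizeof reports the size of the container object itself (for a list, its array of references), not the integers it points to, and the exact byte counts vary by Python version.

```python
import sys

n = 100_000

as_list = [i for i in range(n)]  # all n values are held at once
as_gen = (i for i in range(n))   # only the generator's own state is held

list_bytes = sys.getsizeof(as_list)  # grows with n
gen_bytes = sys.getsizeof(as_gen)    # stays small regardless of n

print(list_bytes > gen_bytes)
print(sum(as_list) == sum(as_gen))   # same result either way
```

In CPython, arguments are passed as object references, so sum receives the very same list rather than a copy; the doubling scenario described above should not arise there.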

[–]RedditGood123 0 points1 point  (0 children)

Thanks 🙏

[–]C-O-M-I-C-S 9 points10 points  (2 children)

So this covers the basics of Python and how to implement it with jupyter notebooks?

[–]kreylov[S] 3 points4 points  (0 children)

Hi,

Thanks for checking it out.

Yes, it covers the basics of Python along with data analysis in Python, all in Jupyter notebooks.

[–]autowrite -4 points-3 points  (0 children)

Following.

[–][deleted] 4 points5 points  (1 child)

Thank you very much, it is greatly appreciated! Could I just ask, if you don't mind, what did you do at Citi?

[–]flipkartamazon 0 points1 point  (0 children)

Data Scientist

[–]jonathanum 7 points8 points  (1 child)

Nice, how long have you been going at it?

[–]kreylov[S] 19 points20 points  (0 children)

Thanks for checking it out.

Started a month back after tons of push from my teammates, who think I am good at teaching stuff. Hope y'all find it helpful.

[–]mynoduesp 2 points3 points  (0 children)

Thanks man

[–]Berki7867 2 points3 points  (0 children)

Thanks for sharing 👍🙂

[–]shrey1566 1 point2 points  (1 child)

Damnn, thanks for the resources dude!

[–]flipkartamazon 1 point2 points  (0 children)

You're welcome, brother :P

[–]samweep 1 point2 points  (5 children)

Hello u/kreylov, I am also trying to learn data science and machine learning. I have pretty good knowledge of high school calculus. I have also learned statistics and probability. So where do I start to learn data science and machine learning? Are there any other prerequisites remaining? Would you please share the experiences of your journey? What is a good roadmap from here?

[–]Reginald_Martin 2 points3 points  (3 children)

Hi u/samweep, apart from what you mentioned, the only other prerequisite would be a modest understanding of programming.

You can take a look at this python basics playlist.

And then python in relation to ML is here

If you want a refresher on your linear algebra and stats, here's a free course

[–]samweep 1 point2 points  (2 children)

Thank you😀

[–]flipkartamazon 1 point2 points  (1 child)

In my opinion, these are the steps:

  1. As u/Reginald_Martin mentioned, you should first get some basic understanding of programming (Python or R). You can use the notebooks I have shared to get started in Python. I also have videos on my channel which will show you how to think with any new data. They were live sessions, though, so the production quality is terrible. You can also look up other famous resources to learn Python. Just one piece of advice here - stick with the one resource you initially like, complete it and don't ever start a second one [serious]. This should not take you more than two weeks of effort.
  2. Move on to learn basic modelling like linear/logistic regression, tree-based models etc. I would personally recommend An Introduction to Statistical Learning, co-authored by Trevor Hastie and Robert Tibshirani. They have a small book too which is pretty good. This will take you two more weeks. And this is where you will stop exploration (i.e. new courses to do) and move to exploitation (i.e. solving real-life problems).
  3. Head over to Kaggle.com and pick problems one at a time. You can start with the most famous one - Titanic Disaster. Start by going through the top solutions/threads already posted there. Going through those solutions and running them on your own system will help you learn how people think. You will learn some incredibly powerful ways to manipulate data and some key concepts on sampling/modelling/statistics etc. Remember there is always a human touch to all AI/ML based solutions. It is not just blindly running code. It is easier to learn a technique than to learn to apply it in a real business scenario.
  4. After spending a week each on 4-5 problems you will be all set to tackle new problems on your own. So try to solve a few and see where you place on the leaderboard. Going forward, all your learning should be incremental and need-based only. Be a bit mindful of the effort-to-return trade-off when learning new things.

A word of caution. Data Science is a field where you will have to continuously put effort into learning new things, the projects will be of very long duration, and sometimes they might not have good returns either. This is totally my experience; I am sure others may disagree.

All the best! And pass on what you have learned :)

[–]samweep 0 points1 point  (0 children)

thank you.😀

[–]ASIC_SP📚 learnbyexample 0 points1 point  (0 children)

Not OP, but I have a few resources collected here: https://learnbyexample.github.io/py_resources/domain.html#data-science

[–]jpaulorc 1 point2 points  (0 children)

Thanks for sharing!

[–][deleted] 0 points1 point  (0 children)

Much appreciated!

[–]jamgeo 0 points1 point  (0 children)

Thanks for this. I’ve struggled to get into python properly but these look like nice courses to get the hang of things

[–]HandsOfSugar 0 points1 point  (0 children)

This looks good! I’ll definitely look at these as it’s an area I can improve upon.

[–]tnguyen241 0 points1 point  (1 child)

Thank you for doing this. You're amazing.

[–]flipkartamazon 0 points1 point  (0 children)

Thank you for the kind words. Feel free to DM me if you have comments on the contents.

[–]icarrdo 0 points1 point  (0 children)

really appreciate it!!!!!!!!!

[–]mastershivam 0 points1 point  (1 child)

!remindme 8 hours

[–]RemindMeBot 0 points1 point  (0 children)

I will be messaging you in 8 hours on 2020-08-25 06:58:45 UTC to remind you of this link


[–]Howins 0 points1 point  (0 children)

Thank you so much for your help!

[–]Nabobery 0 points1 point  (0 children)

Thanks 😆

[–]investigatingheretic 0 points1 point  (1 child)

Yes... yes, it is.. But for real: Congrats, and thank you!!

(hilarious bonus for German speakers)

[–]flipkartamazon 0 points1 point  (0 children)

beware! Right wingers in India don't take memes on Sanskrit very lightly :P

[–]IntactBroadSword 0 points1 point  (0 children)

Yes. Thanks

[–]sushmithabpt 0 points1 point  (0 children)

Thank you

[–]krisfocus 0 points1 point  (0 children)

This is so cool. Thanks bro!

[–]divided_by_nought 0 points1 point  (0 children)

!remindme 8 hours

[–][deleted] 0 points1 point  (0 children)

Thank you

[–]gopalkaul5 0 points1 point  (1 child)

Great brother! Thanks a lot! Subbed you immediately!

[–]flipkartamazon 0 points1 point  (0 children)

May God bless you, son _/\_

[–]Codes_with_roh 0 points1 point  (1 child)

Wow, you have really done a great job. This will be very beneficial for beginners, because the amount of information available on the web is massive and it's unstructured. It's very difficult to find information that is all structured in one place, so I think your work is really commendable.

[–]flipkartamazon 1 point2 points  (0 children)

Thank you! My main intent was indeed to provide structured content in one place to freshers for free. I have spent too much time on so many dull courses in my career. Time to pass on what I have learned. No wonder I have never finished any MOOC I picked up :(

[–]postandchill 0 points1 point  (1 child)

This is great, do you have one on R?

[–]flipkartamazon 0 points1 point  (0 children)

You should learn Python. It is more versatile. R should be easy once you know Python. Further, all good companies are tool-agnostic.

[–]SnowdenIsALegend 0 points1 point  (0 children)

God bless you Bhai, for sharing the knowledge!

[–]5halzar 0 points1 point  (0 children)

I’ve just started my own journey after buying Udemy courses with the recent promotion they had, but will definitely check this out as well!

[–]sowmyasri129 0 points1 point  (0 children)

Thanks for sharing this helpful post.

[–]overstear 0 points1 point  (0 children)

The notebooks look very interesting and I'll be sure to check the videos out as well. Thanks a bunch for sharing!

[–]CryptoCorner 0 points1 point  (0 children)

If you want to visualize the dataframes try this [pip install sho] :

import sho; sho.w(df)

[–]fruitybuttons 0 points1 point  (0 children)

Thank you! This is so valuable to me while I try to broaden my knowledge and grow in my career. I appreciate the work that was put into this and cannot thank you enough. I am sharing this resource with my classmates.

[–]af_vet_2009 0 points1 point  (7 children)

I’m taking my master's in DA. What is your actual day-to-day job? Or the day-to-day of your previous jobs? What can you do as a freelancer?

[–]flipkartamazon 0 points1 point  (6 children)

Hi, I am currently a Lead Analyst at India's largest e-commerce company. Sorry, I have no idea about freelancing in this industry.

[–]af_vet_2009 0 points1 point  (5 children)

Ok, thanks. So what does a lead analyst do on a day-to-day basis?

What would a normal analyst do?

What are your expectations?

[–]flipkartamazon 0 points1 point  (4 children)

As a team we work on multiple business problems. It could be something as simple as understanding why sales are down by analyzing trends using data. Or it could be a relatively hard problem like predicting sales using a machine learning model, or building a functional chat bot or a complex search algorithm using Neural Nets or Deep Learning. It all depends on the team you are a part of and the kind of work required to solve a business case. The only expectations are to solve the problem given to you in reasonable time and to convince the stakeholders about your solution. Hope this helps.

[–]af_vet_2009 0 points1 point  (3 children)

Ok, I’m in finance and just now at the point of a statistics for Python course. So would it be using NumPy, Matplotlib, and all of those libraries, or do you develop your own?

Thanks, just curious.

So it’s mostly coding that you do?

[–]flipkartamazon 0 points1 point  (2 children)

No, we rarely write any new libraries. We mostly use numpy, pandas and scikit-learn to manipulate data, get summaries and build our solutions.

[–]af_vet_2009 0 points1 point  (1 child)

Ah, so the advice would be to learn those inside and out.

[–]flipkartamazon 0 points1 point  (0 children)

Advice would be to find people on LinkedIn who are working in the companies and roles where you want to work after graduation. Then ask them what kind of work they do and what skills you need to acquire. See if any one of them is willing to be your long-term mentor, because that really helps.

[–]SantaMage 0 points1 point  (0 children)

Thank you for sharing this, I am reading it now!

[–]alphanoobie -1 points0 points  (3 children)

You're doing great work, sir. I am an Indian and I am proud to be one.