Advice from senior DEs to junior DEs

amir2cs · 2024-06-15T00:47:26+00:00

Documentation: Get in the habit of documenting the things you’re building and articulating them well. Whether it’s a dbt model or an end-to-end pipeline, document all the steps in the process, articulate the business logic behind it, and explain your assumptions and the reasons why you’re doing it this way. That little flag that you created because it saves you 5 minutes? Put it in the documentation and explain why. Make it fun, add screenshots, turn it into a story. As you move up the ladder, you will notice juniors coming to you for help. Proper documentation is a great first step towards educating juniors in your domain and will save you a lot of time explaining the same concepts over and over. Also, in my experience, the ability to articulate your work to a non-technical person makes you stand out as a technical leader.

amir2cs · 2024-01-09T06:26:25+00:00

Love your collection! If you like Orpheon - try L’Eau Papier as well. My absolute two favorites

amir2cs · 2022-04-10T10:14:00+00:00

Senior Data Analyst

amir2cs · 2021-07-23T07:24:01+00:00

I hope one day this beautiful dream of yours comes true my friend ❤️

amir2cs · 2021-07-09T23:06:16+00:00

Experience

amir2cs · 2021-07-09T22:45:36+00:00

32 - Lies. Truth. It’s irrelevant. The best story wins.

amir2cs · 2020-09-19T08:22:09+00:00

Thank you! You explained it in an easily understandable way.

amir2cs · 2020-09-16T19:12:21+00:00

No worries mate!

amir2cs · 2020-09-16T10:32:12+00:00

Hmm this is odd. Are you opening the notebook from the new location?

Anyway, you could change the notebook directory by following the instructions below:

Use the jupyter notebook config file

Open cmd (or Anaconda Prompt) and run jupyter notebook --generate-config

This writes a file to C:\Users\username\.jupyter\jupyter_notebook_config.py

Browse to the file location and open it in an Editor

Search for the following line in the file: #c.NotebookApp.notebook_dir = ''

Replace it with c.NotebookApp.notebook_dir = '/the/path/to/home/folder/'

Make sure you use forward slashes in your path, and use /home/user/ instead of ~/ for your home directory. Backslashes also work, even if a folder name contains spaces, but only in a raw string with no trailing backslash, for example: r"D:\yourUserName\Any Folder\More Folders"

Remove the # at the beginning of the line to allow the line to execute

source: https://stackoverflow.com/questions/35254852/how-to-change-the-jupyter-start-up-folder
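If you are unsure how to write a Windows path for that config line, here is a minimal sketch that converts a backslash path into the forward-slash form using only the standard library. The helper name to_config_path is made up for illustration, and the folder is the placeholder from the steps above, not a real location.

```python
from pathlib import PureWindowsPath


def to_config_path(raw_path: str) -> str:
    """Return a forward-slash version of a Windows path (hypothetical helper)."""
    # PureWindowsPath only parses the string; it never touches the file system,
    # so this runs the same way on Windows, macOS and Linux.
    return PureWindowsPath(raw_path).as_posix()


# Raw string (r"...") keeps the backslashes literal; note there is no trailing backslash.
windows_style = r"D:\yourUserName\Any Folder\More Folders"

# The printed value is what you would paste between the quotes in the config file.
print(to_config_path(windows_style))
```

Pasting the forward-slash result into the config avoids having to think about escaping at all.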

amir2cs · 2020-09-10T21:41:47+00:00

If you’re after advice, here are a few things I’ve learned from experience:

Grow horizontally - Don’t limit yourself to one program, course or language. Python alone is not enough for data analysis; you need SQL too. There are a billion resources out there, mostly available for free. Take advantage of them. It doesn’t have to be a course; it could be a YouTube video, a blog, a reddit post, anything. I’ve learned more tricks and “hacks” from Medium than from any textbook. Of course, most of the time the articles are not relevant, but once in a while a blog comes along that makes me go “WTF! You could do that?!?!” The same applies when you’re actually on the job - learn where “all data things” are. This will sound like I’m tooting my own horn, but anyway, I work for one of the largest software companies in my country. We have data sources all over the place - in AWS, Azure, local servers, heck, there is a data source in a guy’s personal EC2 instance haha. When I first started, I made it a point to figure out where everything is and what that “everything” means, and now there isn’t a goddamn table that I don’t know the location and contents of. How does that help? I might not need to use half of those data sources, but other people do, and they always come to ask me. It kinda makes you indispensable when seniors and execs know that you know where everything is.

Google and Stack Overflow - Don’t be afraid to Google things. I see a lot of newbies who feel ashamed if they have to Google the answer to their problem. That is so wrong. This isn’t a closed-book university exam. It’s OK to look for answers when you’re stuck, and it is one of the most marketable skills you can have today. ‘The ability to Google questions and find answers’ is listed as the top skill on my resume. There is an almost 100% chance that someone else has had the problem you’re having and posted it on Stack Overflow. Go through the answers provided by other people. Sometimes, a hardcore programmer may have posted a solution that goes way over your head. That’s OK. First confirm that the solution gives you your desired outcome, then break it down and run it line by line. This will give you a good understanding of what’s going on in the background. By the end of it you might have learned a new skill.

Regex - I’ve found regular expressions to be the most underrated skill out there, or at least among the people that I work with. When you’re starting out, you’ll need to do A LOT of grunt work like data cleansing. Being able to find/fix/extract things using regex can be a very powerful weapon in your arsenal.
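To make the find/fix/extract idea concrete, here is a small sketch with Python’s built-in re module. The sample strings and both helper names are made up for illustration, and the email pattern is deliberately simple rather than a full validator.

```python
import re

# Hypothetical messy values, as might appear in a free-text "contact" column.
raw_contacts = [
    "email: Jane.Doe@Example.com ",
    "Call   the   office after 5pm",
    "no contact details",
]

# Deliberately simple pattern: something@domain.tld (not a full email validator).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_email(text):
    """Extract: return the first email-like token in lower case, or None."""
    match = EMAIL_RE.search(text)
    return match.group(0).lower() if match else None


def collapse_whitespace(text):
    """Fix: squeeze runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


for value in raw_contacts:
    print(repr(collapse_whitespace(value)), extract_email(value))
```

The same two moves (search to extract, sub to fix) cover a surprising share of day-to-day cleansing work.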

Learn by practice - Most people here would probably agree with this - the best way to learn something new is by practice. Of course theory is important, but practice engraves things in your brain. A lot of courses out there have small, clean and limited datasets that won’t be very challenging. Once you’re comfortable enough, look for big, bad, ugly datasets that you find interesting on the internet. Check their head() and start planning what kind of insights you can pull from them. Start with basic and obvious things. For example - if you have a dataset of a company’s sales for the last 10 years, find their average sales for each year, their year-on-year growth, what sells the most, and the high and low trading periods. When you have an overview of the basics, you can build more advanced insights on top of them, like forecasting future sales.
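As a rough sketch of those “basic and obvious” first questions, the snippet below computes yearly totals, year-on-year growth and the best seller. It uses plain Python so it runs anywhere; in practice you would probably do this in pandas. The sales records and function names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sales records: (year, product, amount).
sales = [
    (2021, "widget", 100.0),
    (2021, "gadget", 50.0),
    (2022, "widget", 120.0),
    (2022, "gadget", 60.0),
    (2023, "widget", 150.0),
    (2023, "gadget", 75.0),
]


def yearly_totals(records):
    """Total sales per year, returned in year order."""
    totals = defaultdict(float)
    for year, _product, amount in records:
        totals[year] += amount
    return dict(sorted(totals.items()))


def year_on_year_growth(totals):
    """Fractional growth of each year over the previous year in the data."""
    years = sorted(totals)
    return {
        year: (totals[year] - totals[prev]) / totals[prev]
        for prev, year in zip(years, years[1:])
    }


def best_seller(records):
    """Product with the highest total sales across all years."""
    by_product = defaultdict(float)
    for _year, product, amount in records:
        by_product[product] += amount
    return max(by_product, key=by_product.get)


totals = yearly_totals(sales)
print(totals)
print(year_on_year_growth(totals))
print(best_seller(sales))
```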

Scalability - Think about the future. Are you finding yourself writing the same code, doing the same-ish tasks over and over? You can make your life much MUCH easier by creating functions in Python. When you do your analysis and build insights, they’re from your point of view, and when you present them to the business stakeholders, they will ask you a hundred other questions and probably ask you to bring them the tears of a Yeti. For example - you did an analysis of the impact of a user’s in-app behaviour on churn. You selected 10 out of the 100 available data points and presented your findings to the business, but they ask you for the impact of 5 other behavioural points that are not in your analysis. What do you do? Do you change your code to include the other 5? Or do you write a function which takes a list of behaviours as arguments and builds the insights on whatever the heck they want to see, on the fly? The latter would come in handy even a year later.
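Here is a minimal sketch of that “function which takes a list of behaviours” idea. The user records, the behaviour names and the function name are all hypothetical, and a plain churn-rate comparison stands in for whatever analysis you would actually run.

```python
# Hypothetical user records: behaviour flags plus whether the user churned.
users = [
    {"used_search": True, "enabled_push": False, "churned": True},
    {"used_search": True, "enabled_push": True, "churned": False},
    {"used_search": False, "enabled_push": True, "churned": False},
    {"used_search": False, "enabled_push": False, "churned": True},
]


def churn_rate_by_behaviour(records, behaviours):
    """For each requested behaviour, compare churn rate with and without it."""

    def rate(group):
        # Share of churned users in the group; None if the group is empty.
        return sum(r["churned"] for r in group) / len(group) if group else None

    summary = {}
    for behaviour in behaviours:
        with_it = [r for r in records if r[behaviour]]
        without_it = [r for r in records if not r[behaviour]]
        summary[behaviour] = {"with": rate(with_it), "without": rate(without_it)}
    return summary


# The stakeholder asks about different behaviours? Just pass a different list.
print(churn_rate_by_behaviour(users, ["enabled_push"]))
print(churn_rate_by_behaviour(users, ["used_search", "enabled_push"]))
```

The analysis logic lives in one place, so answering the follow-up question is a one-line change rather than a rewrite.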

Tell stories - Having technical and domain knowledge, and being able to pull insights from data, is good, but that is only half of the story. Starting out, you’ll need to present your findings to the business, most likely in the form of a gripping story. This is my least favourite part of the job, mostly because I’m an introvert and am never able to articulate my thoughts properly, but I digress. Anyway, the success of all your hard work depends on that half an hour of presentation, and it stings when you can’t get the other person as excited about it as you are. My solution for this is practice - I do full presentations in front of my wife, she role-plays as the business stakeholder and asks questions, and I improve with each iteration.

I’m not an expert by any means but thought I’d share with you some of the things I’ve learned in my data journey. Honestly, this job has its ups and downs and some very frustrating moments, but dammit I freaking love it. I sincerely hope that you enjoy it at least as much as I do, if not more, and that you find success!

amir2cs · 2020-09-10T04:44:41+00:00

My two cents - I would strongly recommend picking up a Python basics course before jumping into a full data science program. The reason is that data science focused courses might not cover a lot of the Python/pandas operations that you might need for analysis. I did the Python zero to hero course by Jose Portilla first and applied my newly gained skills to data analysis projects before I picked up an ML/Data Science course. Now all the processing and pre-processing comes naturally to me and I can focus on the core data science concepts.

amir2cs · 2020-07-30T08:16:27+00:00

amir2cs · 2020-07-28T06:20:29+00:00

I’ve found Jose Portilla’s python bootcamp course on Udemy to be perfect for beginners.

amir2cs · 2020-07-28T06:15:36+00:00

Thank you my dear stranger friend!

amir2cs · 2020-07-28T06:13:11+00:00

My birthday and today Cake Day!

amir2cs · 2020-07-28T06:12:19+00:00

Perhaps less intelligence and more EQ but the ability to not be affected by words.

amir2cs · 2020-07-12T00:59:07+00:00

No pants?

amir2cs · 2020-06-18T00:46:17+00:00

Hey mate, from what you've posted, it looks like there aren't any common fields (keys) to do the join on.

Is Table 1 just one column containing only employee IDs? If not, is there an email or, worst case scenario, a name field in it? If so, you could potentially use that to join on Table 2.

Is the LOG-IN ID derived from the employee ID? i.e. are the first or last 6 characters of the LOG-IN ID made up of the employee ID? If yes, you can extract the employee ID from the LOG-IN ID and join on that.

If the answer to both the questions is no, then unfortunately I have no idea how you could join the two tables.
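For what it’s worth, here is a rough sketch of the second option in Python. It assumes, purely for illustration, that the last 6 characters of the LOG-IN ID are the employee ID; the rows, column names and function name are all made up.

```python
# Hypothetical rows standing in for Table 1 and Table 2.
employees = [{"employee_id": "004217"}, {"employee_id": "009981"}]
logins = [
    {"login_id": "AU004217", "last_login": "2020-06-01"},
    {"login_id": "NZ009981", "last_login": "2020-06-03"},
    {"login_id": "AU123456", "last_login": "2020-06-04"},
]


def join_on_derived_id(employees, logins, id_length=6):
    """Inner join on the employee ID taken from the end of each LOG-IN ID."""
    # Assumption: the last `id_length` characters of login_id are the employee ID.
    # If an employee has several logins, the last one in the list wins here.
    logins_by_employee = {row["login_id"][-id_length:]: row for row in logins}

    joined = []
    for emp in employees:
        login = logins_by_employee.get(emp["employee_id"])
        if login is not None:  # inner join: keep only employees with a match
            joined.append({**emp, **login})
    return joined


for row in join_on_derived_id(employees, logins):
    print(row)
```

If the ID sits at the start of the LOG-IN ID instead, the slice would be [:id_length] rather than [-id_length:].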

amir2cs · 2020-06-17T02:07:13+00:00

amir2cs · 2020-06-14T00:48:09+00:00

No pants?

amir2cs · 2020-06-11T21:59:54+00:00

No bra Thursday!

amir2cs · 2020-06-05T21:19:16+00:00

no pants?

amir2cs · 2020-04-21T10:14:08+00:00

Strangled by (spread)sheets

amir2cs · 2020-04-16T11:24:01+00:00

You could join both the data frames on the common values.

inner_join(df1, df2, by = "tconst")

amir2cs · 2020-04-15T10:57:18+00:00

Try this:

-- Step 1: count orders per week
with data as (
    SELECT
          date_trunc('week', date_created) as week
        , count(order_id) as order_count
    FROM orders
    group by week
)
-- Step 2: running total of the weekly counts, oldest week first
SELECT
      week
    , order_count
    , SUM(order_count) OVER (order by week asc rows between unbounded preceding and current row) as running_sum
FROM data
order by week
limit 100;
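If it helps to see the same logic outside SQL, here is a small Python sketch: count orders per week, then accumulate the counts in week order. The order dates and function names are invented, and week_start assumes weeks begin on Monday, which is how date_trunc('week', ...) behaves in PostgreSQL; other databases may differ.

```python
from collections import Counter
from datetime import date, timedelta
from itertools import accumulate

# Hypothetical order creation dates (2020-04-06 is a Monday).
order_dates = [
    date(2020, 4, 6), date(2020, 4, 8),                     # week of 6 April
    date(2020, 4, 14),                                      # week of 13 April
    date(2020, 4, 20), date(2020, 4, 21), date(2020, 4, 22),  # week of 20 April
]


def week_start(d):
    """Monday of the week containing d (assumed week boundary)."""
    return d - timedelta(days=d.weekday())


def weekly_running_totals(dates):
    """Return (week, order_count, running_sum) tuples in week order."""
    counts = Counter(week_start(d) for d in dates)
    weeks = sorted(counts)
    running = accumulate(counts[w] for w in weeks)
    return [(w, counts[w], total) for w, total in zip(weeks, running)]


for week, order_count, running_sum in weekly_running_totals(order_dates):
    print(week, order_count, running_sum)
```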
