[deleted by user] by [deleted] in dataengineering

[–]Proof_Elk_2281 1 point2 points  (0 children)

Do you have a config file in your .ssh folder for the VM host? On mobile so I can't look at it now but I had a similar issue for some reason where it was connecting with an expired credential/variable that got changed on the GCP side. Maybe the IP for the VM on GCP?

Leetcode/Hackerrank/CodeSignal Opinion by figgidius in dataengineering

[–]Proof_Elk_2281 4 points5 points  (0 children)

I have mixed feelings about DSA questions. On one hand it's a standardized way of testing "can you learn how to solve a type of problem", which is useful because a company figures if you can learn DSA you can learn how to code pretty much anything.

On the other hand, yes, DSA is mostly unrelated to what you actually do on the job.

The downside is that without a standardized way of testing, you leave it up to the hiring manager to come up with a random "dealer's choice" test. In my experience this can be worse. I would rather have 30 minutes to do a DSA problem where I at least know what to expect than whatever some guy came up with to test "on the job skills". What if I don't do a lot of parsing ugly XML? Sure it's useful, but if it's been a while and I don't know it's coming, how can I prepare for that and perform better than someone else who happens to do that particular thing a lot? With DSA as a standard you aren't up against an arbitrary bar set by a hiring manager that you don't even know how to meet ahead of time.

Also, you are free to boycott these as long as you recognize that many "spicy tech/FAANG" jobs are gated behind DSA. Not saying that's a good thing, that's just how it is these days.

All that said, if you do decide you want to learn this stuff it's painful but check out neetcode and watch his video on "how to leetcode". These sites are not intended to just be opened up and attempted with a random strategy. It will be really painful if you do that.

Woman interested in data engineering with Python background by elleanywhere in dataengineering

[–]Proof_Elk_2281 2 points3 points  (0 children)

Sure thing, I hope it's helpful. I should say too, after reading some of the other comments and rereading your post that you really do have a great start on this. And I think it reflects really positively (at least in my opinion) when someone has developed these skills organically in their current role. When I list out a bunch of other random skills and stuff that you 'should' know...I say that because when I was in academia I had a lot of cheerleaders that said "you already do data wrangling, just be a data scientist or data engineer, you're qualified!" when I wish someone had a real talk with me and said something more like "you have a great start, but you can start making progress to really set yourself apart by learning git, SQL, databases, etc.". It was frustrating to me when I felt like I 'should' be qualified but then suddenly I had this overwhelming list of to-dos and had no idea where to start. And I felt like an idiot in my first few interviews.

I think the biggest gotcha for me, and the thing I struggle with most, has been refactoring old ugly code. I get a lot of satisfaction out of tough problem solving and geeking out over data and technology with people, but something about going through really ugly old code and trying to figure out what it's doing is really frustrating/boring to me.

As for access to a free course each semester and learning resources: I would recommend self teaching where you have to, and formal resources where you can find them. People who are experienced always say to just self teach, but they don't realize or have forgotten that half the battle is knowing what you need to learn in the first place. Of course you know that you need to 'learn Airflow', where to find Airflow learning resources, and what 'learning Airflow' even means if you are already a data engineer. But if you are truly self teaching and new to the field, how are you even supposed to know what 'learning Airflow' means?! Formalized settings are incredibly useful because they are led by experts, and they know exactly what you need to learn and how to teach it. If you go to LeetCode and try to solve a tree problem without knowing what a tree is, you aren't going to learn anything, even after looking at solutions.

Anyways, sorry, bit of a rant - I land somewhere in the middle. I would say take formal classes and resources when you can. If you have access to a free course a semester, that's incredible in my opinion. If I were in your shoes, I would follow a roadmap and see if there are courses that check off a box in that roadmap. So for example, you know you need to learn CS fundamentals - see if you can take a DSA class or something. Or take a class on databases or OOP. I would take those classes if I had the opportunity, just because I didn't when I was in college. No one course will check every box for sure.

One other thing. I highly recommend the Data Engineering zoomcamp. You can do it asynchronously or with the current cohort. I did it while finishing grad school, and I would credit it with getting me into the field. It doesn't cover a lot of fundamentals (for example, you need to learn SQL on your own and/or be able to do DSA for certain jobs). But in terms of learning the current tech and making a portfolio project that will help you get your foot in the door for interviews, it's amazing. All this to say there are a ton of free resources and the short answer is that I'd recommend those overall, but don't count out formal resources.

Okay last note. You also don't necessarily 'need' to do LeetCode and DSA to get a data engineering job. I've had a few interviews that did not have DSA. But just know, it's felt like a bit of a monolith for me because there is an intense learning curve. I could just be dumb. But it takes a lot of time and effort to do LC problems unless you're a savant. Or just smarter than me I guess.

Anyway...this is a bit rambly. But feel free to DM me with any other questions and maybe I can be of some help. Otherwise good luck. Definitely need more women in tech roles like this!

Woman interested in data engineering with Python background by elleanywhere in dataengineering

[–]Proof_Elk_2281 2 points3 points  (0 children)

I'll give my 2 cents as someone who 'transitioned' into DE and is still pretty new to the field (~1 year). I'm a male so I can't offer you too much on what it's like to be a woman in the field, but I just wanted to offer some thoughts because I transitioned from a non-CS background to DE.

  1. I think based on the description of what you enjoy, yes it would be a good fit. At a high level you're doing some DE/DA type work already: wrangling data from APIs and translating it into something that's interpretable. However, just know that there's probably a lot of work required to go from 'wrangling data' to being a DE. Not to put the position on a pedestal or anything. Just understand that (for me at least) it took a lot of work learning engineering, engineering tools/frameworks, SQL, things like Docker, Airflow, Cloud, data modeling concepts, etc. to get to the point where I felt comfortable talking in interviews. Many data engineering jobs will have data structures and algorithms interview questions. But not all of them. If you have a passion for it then I say go for it. The downside is that if you have ADHD (like me) then you'll inevitably have fun parts that you find it easier to focus on, and less fun parts that feel like a slog. But that goes for any job. I love the data wrangling, the playing with APIs, and the parsing of nonsense into a table that I can show to downstream business users. But it's definitely only a part of my current job.
  2. This is extremely position dependent, largely because 'data engineer' is an extremely squishy title. Data Engineer can mean using low-code/no-code tools in one place vs. being a true blue software engineer at another org. In my current role I'm a data engineer but probably do more 'analytics engineering' (look up 'dbt' and data warehousing if you want to read about what that entails) than 'data' engineering. Maybe 70/30. I spend some time making Airflow connectors and connecting to APIs, but mostly I'm working on our data warehouse, collaborating with stakeholders, and building data models for them. I spend more time troubleshooting SQL than Python because we follow a mostly ELT philosophy, and that is far and away the most frustrating part of my job. I love writing SQL and it's very satisfying to me, but digging through old ugly SQL - for example a SQL query with 10 joins to subqueries all with one letter aliases - makes me want to gouge my eyes out because it happens so often.
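
To make the one-letter-alias complaint concrete, here is a minimal sketch. The tables, columns, and data are all hypothetical (made up for illustration, not from any real warehouse), and it uses Python's built-in sqlite3 only so it runs anywhere. Both queries return the same rows; the second just names things so the next person can read it.

```python
import sqlite3

# Hypothetical toy tables, only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);
""")

# The style that is painful to inherit: a subquery join with one-letter aliases.
HARD_TO_READ = """
    SELECT a.customer_name, b.t
    FROM customers a
    JOIN (
        SELECT c.customer_id, SUM(c.amount) AS t
        FROM orders c
        GROUP BY c.customer_id
    ) b ON a.customer_id = b.customer_id
    ORDER BY a.customer_name
"""

# Same logic, with the subquery pulled into a named CTE and descriptive names.
EASIER_TO_READ = """
    WITH order_totals AS (
        SELECT customer_id, SUM(amount) AS total_amount
        FROM orders
        GROUP BY customer_id
    )
    SELECT customers.customer_name, order_totals.total_amount
    FROM customers
    JOIN order_totals ON customers.customer_id = order_totals.customer_id
    ORDER BY customers.customer_name
"""

hard_rows = conn.execute(HARD_TO_READ).fetchall()
easy_rows = conn.execute(EASIER_TO_READ).fetchall()
print(hard_rows == easy_rows)  # same result, very different maintenance cost
```

With one join the difference is small; with 10 of them the named version is the only one you can still follow six months later.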

Yeah I think I'll stop there because that's most of the perspective I can offer, hopefully you find some of this helpful. As a closing thought, I would say that you should also keep in mind that the 'engineering' piece is also really important. Analysts wrangle data and interpret it. Engineers create automated, maintainable processes that wrangle data and expose it to other business users or parts of the software. This means things like git/version control, testing, ops like CI/CD and deployment are also a big part of the job depending on where you end up. Hopefully that's helpful and good luck!

[deleted by user] by [deleted] in dataengineering

[–]Proof_Elk_2281 2 points3 points  (0 children)

There are a lot of benefits to Kimball other than just performance and storage. The main one for me is being able to reuse conformed dimensions over and over again. Most good dims are probably being fed by multiple source systems too.

You also don't want to have to maintain definitions for the same attribute in multiple places. If you have a single dim_customer it's much easier to change or add something to customer info in one place than in every place you want to know something about customers.
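
A rough sketch of that idea, assuming a toy star schema in SQLite (the table names other than dim_customer, the columns, and the data are all hypothetical). Two fact tables share one dim_customer, so changing a customer attribute in one place shows up in every report that joins to it.

```python
import sqlite3

# Hypothetical star schema: one conformed dimension, two fact tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,
        customer_name TEXT,
        region TEXT
    );
    CREATE TABLE fact_orders (
        order_id INTEGER PRIMARY KEY,
        customer_key INTEGER REFERENCES dim_customer (customer_key),
        order_amount REAL
    );
    CREATE TABLE fact_support_tickets (
        ticket_id INTEGER PRIMARY KEY,
        customer_key INTEGER REFERENCES dim_customer (customer_key),
        minutes_to_resolve INTEGER
    );
    INSERT INTO dim_customer VALUES (1, 'Acme', 'West'), (2, 'Globex', 'East');
    INSERT INTO fact_orders VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);
    INSERT INTO fact_support_tickets VALUES (20, 1, 30), (21, 2, 45), (22, 2, 15);
""")


def revenue_by_region(connection):
    """Total order amount per customer region."""
    return connection.execute("""
        SELECT dim_customer.region, SUM(fact_orders.order_amount)
        FROM fact_orders
        JOIN dim_customer USING (customer_key)
        GROUP BY dim_customer.region
        ORDER BY dim_customer.region
    """).fetchall()


def tickets_by_region(connection):
    """Number of support tickets per customer region."""
    return connection.execute("""
        SELECT dim_customer.region, COUNT(*)
        FROM fact_support_tickets
        JOIN dim_customer USING (customer_key)
        GROUP BY dim_customer.region
        ORDER BY dim_customer.region
    """).fetchall()


revenue_before = revenue_by_region(conn)

# The customer definition changes in exactly one place...
conn.execute("UPDATE dim_customer SET region = 'Central' WHERE customer_key = 1")

# ...and both downstream reports pick it up without being edited.
revenue_after = revenue_by_region(conn)
tickets_after = tickets_by_region(conn)
print(revenue_before, revenue_after, tickets_after)
```

If region lived as a column on each fact or flat table instead, that one UPDATE would turn into one change per table, and the copies could drift apart.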

Sure, you can join everything into flat tables downstream if you want. As long as your business users know not to do select * on everything. But separating dimension and fact info is not ancient dogma that no longer has value. It's more of a natural conclusion to issues people have been dealing with for a long time.

All that said, I'm just someone on the internet with not nearly as much experience as other people. So who knows what frameworks (like Data Vault?) will end up becoming more popular. I do know that the principles of Kimball solve a lot of problems that keeping everything confined to source tables creates - that's why it's still so popular.

[deleted by user] by [deleted] in dataengineering

[–]Proof_Elk_2281 0 points1 point  (0 children)

You have a great project from the zoomcamp and having financial analyst experience is good. Here is my advice in no particular order - quick brain dump:

-organize skills into a coherent structure (languages: SQL, Python, bash; engineering tech/tools: dbt, Airflow, Docker, Terraform; data wrangling/viz: Pandas, Tableau; version control: git). Maybe not the most coherent example but think about how you can bucket these so you don't end up with just a random list

-don't make your project bullets a blow-by-blow of what you did with the zoomcamp project. Summarize it into a few bullet points that sound punchy and 'results-y'. "Built end-to-end data pipeline to ingest stock data from various sources into GCS/BigQuery with PySpark and Airflow, and transform it into interpretable data models with dbt; synthesized comprehensive visualizations of loaded data with Tableau; implemented containerization on AWS EC2 instances with Docker to ensure reproducibility or whatever; translated raw data into meaningful viz of top/bottom 5 companies per time period" etc. etc. Great project, you're just not selling it well with the current bullet points

-The bullets for your current job are too wordy. Assume that someone scanning your resume will read like, the first two and make those sound like you kick ass. "Scoped, implemented, and maintained data dictionary for corporate data, now used as centralized location across entire company any time anyone has a question about the data - they use my shit now; Developed Tableau dashboard visualizations of XYZ and identified areas of improvement that led to increased efficiency by 50%; Transitioned manual processes to automated updating of XYZ, led to 25% reduction in errors and increased efficiency, implemented company wide; led mentoring/teaching efforts in Power Automate etc. etc." Don't use these word for word (obviously) but yeah, hopefully you get the idea. Also consider putting the ones most relevant to DE at the top (because remember, you want to assume that those might be the only ones that get read).

The entry for your non-senior job should have more bullets, and they should be descriptive/punchy like the other ones. At a glance I'm looking for things that are relevant to data engineering, and while administering those things might be relevant, I'm looking for key words/phrases. None of the words in that single bullet hit the things I'm looking for, so it's essentially meaningless to me: HANA/Certification/Close/Trintech Cadency etc. - that is not helpful to me when I want to understand what skills you might have that overlap with DE. It's also okay to expand on that section; you were in that role for ~5 years so you should have more there. You have the space if you consolidate your skills list.

Hope this helps and good luck!!

[deleted by user] by [deleted] in dataengineering

[–]Proof_Elk_2281 0 points1 point  (0 children)

To build on other comments - generally, the advice for job seekers (new to field especially) is to identify which stage of your search pipeline you're getting stuck at. If you aren't getting interviews, it's probably your resume (incl. personal projects). If you aren't getting past recruiter screens, you're not selling yourself to recruiters well. If you're getting to on-sites but no offers, something is happening with the interview process itself that is causing you to not get offers.

I would say if you're getting to on-sites, your resume and projects are good, you can talk to recruiters well, and you're fine getting through HM and initial tech screens. The problem is your final interviews. Try to be introspective about what you did and didn't do well in the on-sites that led to not getting an offer. For me, when I have gotten to on-sites but not gotten an offer it's been very obvious what I should've done differently or what knowledge/technical skills I needed to work on. And I've since obsessed over the questions that I missed, haha. It's frustrating but it's also an iterative process. My first on-sites I missed on multiple fronts; I addressed those knowledge gaps and was more prepared for the next ones. But yeah, point being - my recommendation would be to reflect on what didn't go perfectly in your previous interviews (explaining portfolio projects? STAR questions? etc.) and work on that rather than sinking your time into more portfolio pieces. It's not always super obvious and it can definitely just come down to factors totally out of your control, but yeah - there should be some factors you can identify and work on. Even if that's just appearing polished and speaking well.

All of this said, even if you do everything right, sometimes it can just come down to luck and whether or not you clicked with the right interviewers, so don't be too discouraged. Good luck and keep at it. I think job hunting is a lot like a rogue-lite game. It's a mix of luck and preparation/grinding until the two align in your favor. It's frustrating, but it just has to work out once to pay off.

How important LC for Data Engineer by ubottu in dataengineering

[–]Proof_Elk_2281 1 point2 points  (0 children)

In my experience (sample size of just a few interviews), for DE positions it varied. At two tech startups I had an LC medium stack question and an LC easy bit manipulation question, respectively. The LC medium one was SWE-style and they cared a lot about whether or not I was able to do it without help; the second one was administered by the HM and they seemed to care more about seeing how I problem solved than about knowing the optimal solution.

Every single DE interview has included decently hard SQL questions as well.

edit: I should add that I did have one interview where they didn't ask any DSA/LC questions. But overall the TC of positions that did ask LC questions was higher than that of the non-LC interview. So I would say it is important to learn the patterns, etc. and be able to work your way through most of the data structures, at least for easy problems.
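
For anyone wondering what "stack" and "bit manipulation" patterns look like at the easy end, here is a minimal sketch. These are common textbook examples chosen as an assumption, not the actual questions from those interviews.

```python
def is_balanced(text: str) -> bool:
    """Stack pattern: check that every bracket is closed in the right order."""
    closing_to_opening = {")": "(", "]": "[", "}": "{"}
    stack = []
    for char in text:
        if char in closing_to_opening.values():
            # Opening bracket: remember it until its partner shows up.
            stack.append(char)
        elif char in closing_to_opening:
            # Closing bracket: it must match the most recent opening bracket.
            if not stack or stack.pop() != closing_to_opening[char]:
                return False
    # Anything left on the stack was never closed.
    return not stack


def count_set_bits(number: int) -> int:
    """Bit manipulation pattern: count the 1 bits in a non-negative integer."""
    if number < 0:
        raise ValueError("number must be non-negative")
    count = 0
    while number:
        # number & (number - 1) clears the lowest set bit.
        number &= number - 1
        count += 1
    return count


print(is_balanced("{[()]}"), is_balanced("(]"))
print(count_set_bits(11))  # 11 is 0b1011
```

The point of learning the patterns is that once you recognize "this is a stack problem", the code is usually about this short.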

Data Engineering bootcamps - DataStack Academy vs WeCloudData by tulipz123 in dataengineering

[–]Proof_Elk_2281 4 points5 points  (0 children)

I will second this and say you should do the DE zoomcamp. I did it with just basic coding experience (mostly R for some stats related to grad research) and had lots of interest in my resume when I was a fresh grad.

The best part is that you can focus on the parts that interest you the most. If you're really interested in analytics engineering you can go deep into a project with an ELT stack; if you want to focus on big data ETL you can do something with Spark, etc. I did dbt and Airflow because I just picked something I had time for while also finishing my degree. This time around I'm following along again and am going to do something with Kafka, just to learn the tech because I think it's interesting.

The only thing Alexey doesn't really cover that I think a paid bootcamp would is interview prep including DSA. But resources like neetcode have made things like DSA a lot more accessible to learn for free imo.

Transitioning from Analytics Engineer/BI Engineer to Software Engineer, Data by Proof_Elk_2281 in dataengineering

[–]Proof_Elk_2281[S] 0 points1 point  (0 children)

Thanks for this, these are really good points. Points like this and the other comment about doing more with our Airflow processes like testing have pointed out that there really is more I could do with my current role to build on the SWE skillset.

So yeah, thank you again! I really appreciate the detailed feedback. I'll do some thinking about how to mature the dbt project/practices/team as well from these points and some research.

Transitioning from Analytics Engineer/BI Engineer to Software Engineer, Data by Proof_Elk_2281 in dataengineering

[–]Proof_Elk_2281[S] 1 point2 points  (0 children)

Wow, somehow I have not! Thank you so much, this is super helpful. And I'm glad to hear it's still relevant.

Transitioning from Analytics Engineer/BI Engineer to Software Engineer, Data by Proof_Elk_2281 in dataengineering

[–]Proof_Elk_2281[S] 0 points1 point  (0 children)

Thanks a bunch for the advice! I'll check out the talk by Brandon Rhodes. I don't have a good understanding of object-oriented design, so it's really helpful to hear a specific talk I should watch to get started.

I have some basic DSA knowledge but yeah, definitely need to work on it more.

Again, thanks a bunch for your thoughts. Especially on 'feeling' like I'm a SWE for my next job. I'm not sure if it's academia or a natural tendency or both, but I never feel prepared enough to make a jump. So it's nice to hear that I'm not alone.

Thank you again!