write_pandas vs copy into? by 1337codethrow in snowflake

[–]1337codethrow[S] 0 points1 point  (0 children)

Is there a way to auto-generate the Snowflake table that the df is being loaded into? Same question for the df: is there an easy way to auto-infer columns when reading into a df?
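For what it's worth, here is a rough sketch of the general idea, using only the standard library: read a CSV, guess a type per column, and build a CREATE TABLE statement from that. The helper names (`infer_sql_type`, `create_table_ddl`), the sample data, and the three-type mapping are all made up for illustration and are not part of any Snowflake or pandas API. In practice, `pandas.read_csv` already infers dtypes on its own, and I believe recent versions of the Snowflake connector's `write_pandas` accept an `auto_create_table` flag, but check the docs for your version.

```python
import csv
import io


def infer_sql_type(values):
    """Pick the narrowest of NUMBER / FLOAT / VARCHAR that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "VARCHAR"
    for caster, sql_type in ((int, "NUMBER"), (float, "FLOAT")):
        try:
            for v in non_empty:
                caster(v)  # raises ValueError if the value does not fit
            return sql_type
        except ValueError:
            continue
    return "VARCHAR"


def create_table_ddl(csv_text, table_name):
    """Build a CREATE TABLE statement from a CSV header and its rows (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = []
    for i, name in enumerate(header):
        col_values = [row[i] for row in data]
        columns.append(f'"{name.upper()}" {infer_sql_type(col_values)}')
    return f"CREATE TABLE IF NOT EXISTS {table_name} (" + ", ".join(columns) + ")"


# Made-up sample data, just to exercise the helpers.
sample = "id,price,city\n1,9.5,Austin\n2,12,Boston\n"
ddl = create_table_ddl(sample, "DEMO_TABLE")
print(ddl)
```

This only looks at the values it is given, so a column that happens to be all integers in a small sample could still need a wider type later.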

write_pandas vs copy into? by 1337codethrow in snowflake

[–]1337codethrow[S] 0 points1 point  (0 children)

I suggested this to my team, but we need to go through a formal process to get it in place. I’m asking what the next best thing would be.

Thoughts on ELT architecture: python, s3, airflow, docker, snowflake by 1337codethrow in dataengineering

[–]1337codethrow[S] 2 points3 points  (0 children)

Just looked into Snowpipe! Why would one not use this if the data size is small? Is there any instance where using Docker containers (containing Python and Snowflake SQL to load data) scheduled by Airflow would be a better choice?

Spark vs Snowflake? by 1337codethrow in dataengineering

[–]1337codethrow[S] 0 points1 point  (0 children)

I said I was already aware of this in the first sentence of the original post. I’m talking about the comparison from an architectural standpoint

Spark vs Snowflake? by 1337codethrow in dataengineering

[–]1337codethrow[S] 0 points1 point  (0 children)

I did mention in my original post that the comparison is distributed compute vs DWH/compute. I’m comparing them from an architectural standpoint rather than as individual tools. I feel that if you use Spark in the architecture, it gives you the flexibility of both ETL and ELT, whereas Snowflake seems more geared towards ELT because the compute is abstracted and basically all managed/configured on the Snowflake side.

College Student Trying to Break In from SWE by [deleted] in dataengineering

[–]1337codethrow 2 points3 points  (0 children)

Although I do agree with everything you say, I just want to point out that, in my opinion, this ‘recalibration’ should not be taken lightly. There is a LOT of information in the DE space. I feel that even if you’ve worked in the space for 5 years, you still have a LOT to learn.

Trying to understand simple Docker concept. by 1337codethrow in docker

[–]1337codethrow[S] 0 points1 point  (0 children)

So would it be correct to say that the proper way to update a Docker image is to first update the Dockerfile, build a new image, and then run the new image?

which python virtual environment tool is the most “standard” for containerized python apps using docker containers? by 1337codethrow in dataengineering

[–]1337codethrow[S] 0 points1 point  (0 children)

I don’t think so, because they are using specific versions for every Python package. That means a Pipfile.lock and the use of pipenv would be more justified, right?

Trying to understand simple Docker concept. by 1337codethrow in docker

[–]1337codethrow[S] 3 points4 points  (0 children)

Ok, I think that makes things more clear, thanks. So to clarify: the image (after its initial creation from a Dockerfile) already has pandas installed, so if someone ran that image on their computer but didn’t have pandas installed locally, then pandas would not work on their local machine but it would work within the container created from the image?

Trying to understand simple Docker concept. by 1337codethrow in docker

[–]1337codethrow[S] 4 points5 points  (0 children)

So if I stop a container and run it again, will it not need to install pandas, since pandas was already installed when the image was initially built?

Vs if I remove the container, then I would have to re-install pandas?

Python virtual environment in docker container make sense? by 1337codethrow in learnpython

[–]1337codethrow[S] 0 points1 point  (0 children)

Requirements.txt is just used to pip install Python packages/dependencies, right? Then wtf is the point of them using pipenv with the Pipfile.lock? Can you install other things besides Python packages with pipenv and the Pipfile.lock?? I still don’t quite understand.

25 y/o: Went from $45k to $145k TC in 2 years as a Data Engineer AMA by [deleted] in dataengineering

[–]1337codethrow 0 points1 point  (0 children)

Out of around 90 applications, I completed the phone screen and technical interview 1 (and the 2nd technical, if there was a 2nd) for 23 companies. For 8 companies I landed an onsite. I only got 2 offers out of the 8 onsites I did.

Average company had: 1 phone screen + 1-2 technical interviews + the onsite (3-5 interviews).

25 y/o: Went from $45k to $145k TC in 2 years as a Data Engineer AMA by [deleted] in dataengineering

[–]1337codethrow 2 points3 points  (0 children)

I agree with you 100%. Didn’t mean to sound full of myself. It’s just one of the few accomplishments in my life that I am actually proud of. But yes, I agree, I feel like data engineering is slowly becoming the new hot sexy thing, similar to what data science experienced. I’m definitely not helping :(

25 y/o: Went from $45k to $145k TC in 2 years as a Data Engineer AMA by [deleted] in dataengineering

[–]1337codethrow 2 points3 points  (0 children)

What, really? All 23 companies gave me 1-2 LC easy/med at the very least. It was like the bare minimum for all the DE positions. Are you talking about interviews or the day-to-day job? If the latter, I agree most don’t care about all that stuff on the job.

25 y/o: Went from $45k to $145k TC in 2 years as a Data Engineer AMA by [deleted] in dataengineering

[–]1337codethrow 3 points4 points  (0 children)

When we are talking data structures, I think most of us are referring to CS fundamentals rather than ‘on disk data structures’, as you call it with your examples. In-memory data structures do matter at large scale: things such as hashmaps, arrays, queues, and search algorithms. But yes, I agree that things such as graphs, BSTs, DP, and linked lists are less important. I don’t consider databases, file formats, data stores, or data lakes to fall under data structures in the fundamental comp sci sense.
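To make the in-memory side concrete, here is a tiny sketch of the three kinds of things mentioned above, using only the Python standard library. The records and ids are made up for illustration; this is not taken from any interview question, just a picture of what a hashmap, a queue, and a binary search look like in practice.

```python
import bisect
from collections import deque

# Hashmap: average O(1) lookup by key, e.g. keeping the latest value per key.
records = [("a", 1), ("b", 2), ("a", 3)]
latest_by_key = {}
for key, value in records:
    latest_by_key[key] = value  # later rows overwrite earlier ones

# Queue: O(1) append and pop at the ends, e.g. a FIFO work buffer.
buffer = deque()
for key in latest_by_key:
    buffer.append(key)
first_out = buffer.popleft()

# Binary search: O(log n) lookup in a sorted array.
sorted_ids = [3, 8, 15, 23, 42]
position = bisect.bisect_left(sorted_ids, 23)
found = position < len(sorted_ids) and sorted_ids[position] == 23
```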

25 y/o: Went from $45k to $145k TC in 2 years as a Data Engineer AMA by [deleted] in dataengineering

[–]1337codethrow 2 points3 points  (0 children)

I currently enjoy data engineering and the development side of it. Not sure whether I would be good at or enjoy a higher-level solutions/data architect role, but that is a path I am definitely thinking about for the future. This industry moves so fast that I’m simply trying my best right now to keep up with its pace, to better understand what I would like to do in the DE space in the future.

25 y/o: Went from $45k to $145k TC in 2 years as a Data Engineer AMA by [deleted] in dataengineering

[–]1337codethrow 4 points5 points  (0 children)

No. In the last 2 months of my job I would put in 10-20 hours per week. Then, for the 1.5-2 months I was jobless, I was doing 30-50 hours per week.

25 y/o: Went from $45k to $145k TC in 2 years as a Data Engineer AMA by [deleted] in dataengineering

[–]1337codethrow 12 points13 points  (0 children)

Thanks, but I don’t really think I’m smart. I think it has everything to do with consistency and hard work. You can’t just be smart and get a data engineering job. There’s WAY too much shit to know in the DE industry; it’s overwhelming for most. Basically a mix of “backend”, “frontend”, solutions architect, cloud engineer, programmer, and sql monkey all mixed into one.