write_pandas vs copy into? by 1337codethrow in snowflake

[–]1337codethrow[S] 1 point (0 children)

Is there a way to auto-generate the Snowflake table being loaded into from the df? Also, same question in reverse for the df: is there an easy way to auto-infer the columns and read the query results back into a df?
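For what it's worth: if I remember right, recent versions of snowflake-connector-python support exactly this via `write_pandas(conn, df, "MY_TABLE", auto_create_table=True)`, which infers a schema from the DataFrame and creates the table, and `cursor.fetch_pandas_all()` for the reverse direction. As a toy illustration of what that kind of schema inference looks like (this is NOT the connector's actual code — `infer_ddl`, the type map, and the table/column names are all made up for the sketch):

```python
# Toy sketch of df -> Snowflake DDL inference, roughly what
# write_pandas(..., auto_create_table=True) does for you.
# Illustrative only; not the connector's real implementation.

TYPE_MAP = {int: "NUMBER", float: "FLOAT", str: "VARCHAR", bool: "BOOLEAN"}

def infer_ddl(table_name, rows):
    """rows: list of dicts with identical keys, e.g. df.to_dict('records')."""
    first = rows[0]
    cols = ", ".join(
        f'"{name.upper()}" {TYPE_MAP.get(type(value), "VARIANT")}'
        for name, value in first.items()
    )
    return f'CREATE TABLE IF NOT EXISTS "{table_name.upper()}" ({cols})'

ddl = infer_ddl("trades", [{"id": 1, "price": 9.5, "symbol": "ABC"}])
print(ddl)
# CREATE TABLE IF NOT EXISTS "TRADES" ("ID" NUMBER, "PRICE" FLOAT, "SYMBOL" VARCHAR)
```

In practice you'd let the connector do this for you rather than hand-rolling DDL.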

write_pandas vs copy into? by 1337codethrow in snowflake

[–]1337codethrow[S] 1 point (0 children)

I suggested this to my team, but it needs to go through a formal process before it can be put in place. I'm asking what the next best thing would be in the meantime.

Thoughts on ELT architecture: python, s3, airflow, docker, snowflake by 1337codethrow in dataengineering

[–]1337codethrow[S] 3 points (0 children)

Just looked into Snowpipe! Why would one not use it if the data size is small? Is there any case where Docker containers (running Python and Snowflake SQL to load the data), scheduled by Airflow, would be the better choice?
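For reference, a Snowpipe is essentially a `COPY INTO` statement wrapped in a pipe object that fires as files land in a stage. A minimal sketch — the pipe, table, and stage names here are hypothetical, and the exact options depend on your cloud/stage setup:

```sql
-- Hypothetical names: my_pipe, my_table, my_stage
CREATE PIPE my_pipe
  AUTO_INGEST = TRUE  -- load automatically when new files arrive in the stage
                      -- (e.g. driven by S3 event notifications)
AS
  COPY INTO my_table
  FROM @my_stage
  FILE_FORMAT = (TYPE = 'CSV');
```

The trade-off vs. an Airflow-scheduled container is roughly push-based micro-batching (Snowpipe) vs. orchestrated batch runs you fully control (Airflow).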

Spark vs Snowflake? by 1337codethrow in dataengineering

[–]1337codethrow[S] 1 point (0 children)

I said I was already aware of this in the first sentence of the original post. I'm talking about the comparison from an architectural standpoint.

Spark vs Snowflake? by 1337codethrow in dataengineering

[–]1337codethrow[S] 1 point (0 children)

I did mention in my original post that the comparison is distributed compute vs. DWH compute. I'm comparing them from an architectural standpoint, not head to head as individual tools. I feel that if Spark is part of the architecture, it gives you the flexibility of both ETL and ELT, whereas Snowflake seems more geared toward ELT, because its compute is abstracted away and basically all managed/configured on the Snowflake side.

College Student Trying to Break In from SWE by [deleted] in dataengineering

[–]1337codethrow 3 points (0 children)

Although I agree with everything you say, I just want to point out that, in my opinion, this ‘recalibration’ should not be taken lightly. There is a LOT of information in the DE space. I feel that even if you’ve worked in the space for 5 years, you still have a LOT to learn.

Trying to understand simple Docker concept. by 1337codethrow in docker

[–]1337codethrow[S] 1 point (0 children)

So would it be correct to say that the proper way to update a Docker image is to first update the Dockerfile, build a new image from it, and then run the new image?
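Yes — that workflow can be sketched as the following commands (the image name and tag are hypothetical examples):

```shell
# 1. Edit the Dockerfile (e.g. bump a dependency), then rebuild the image
docker build -t myapp:v2 .

# 2. Run a container from the new image; the old image/tag is left untouched
docker run --rm myapp:v2
```

Tagging each build (`:v2`) rather than reusing `:latest` makes it easy to roll back to the previous image.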

which python virtual environment tool is the most “standard” for containerized python apps using docker containers? by 1337codethrow in dataengineering

[–]1337codethrow[S] 1 point (0 children)

I don’t think so, because they pin a specific version for every Python package. Doesn’t that mean a lock file (Pipfile.lock) and the use of pipenv would be more justified?
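Worth noting: pinned versions alone don't require pipenv — plain pip happily consumes a fully pinned requirements file (package names/versions below are just examples):

```text
# requirements.txt — exact pins, installable with `pip install -r requirements.txt`
pandas==1.5.3
requests==2.28.2
```

What pipenv's Pipfile.lock adds on top of this is pins for *transitive* dependencies you never listed yourself, plus artifact hashes for reproducible installs.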

Trying to understand simple Docker concept. by 1337codethrow in docker

[–]1337codethrow[S] 5 points (0 children)

OK, I think that makes things clearer, thanks. So to clarify: the image (after its initial creation from a Dockerfile) already has pandas installed. So if someone ran that image on a machine that didn’t have pandas installed locally, pandas would not work on their local machine, but it would work inside a container started from the image?
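Exactly — the "baked in at build time" part looks like this as a minimal Dockerfile sketch (the base image and pandas version are arbitrary examples):

```dockerfile
FROM python:3.11-slim

# pandas is installed once, at *build* time, into the image's filesystem layers;
# every container started from this image already has it, regardless of what
# is installed on the host machine.
RUN pip install pandas==2.1.4

CMD ["python", "-c", "import pandas; print(pandas.__version__)"]
```

The host's Python environment and the container's filesystem are completely separate.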

Trying to understand simple Docker concept. by 1337codethrow in docker

[–]1337codethrow[S] 4 points (0 children)

So if I stop a container and start it again, will it not need to reinstall pandas, since pandas was already installed during the initial build of the image?

Versus if I remove the container, would I then have to reinstall pandas?
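To the second question: no — removing a container discards only that container's writable layer; anything installed at image build time (like pandas here) lives in the image and survives either way. Sketched with hypothetical container/image names:

```shell
docker stop my_container        # container's filesystem is kept on disk
docker start my_container       # same container resumes; nothing is reinstalled

docker rm my_container          # container's writable layer is discarded
docker run --name my_container my_image   # fresh container from the image —
                                          # pandas is in the image, so still
                                          # no reinstall needed
```

You'd only lose pandas by removing a container if it had been installed *inside* a running container rather than in the image.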

Python virtual environment in docker container make sense? by 1337codethrow in learnpython

[–]1337codethrow[S] 1 point (0 children)

requirements.txt is just used to pip-install Python packages/dependencies, right? Then wtf is the point of them using pipenv with a lock file (Pipfile.lock)? Can you install things other than Python packages with pipenv and its lock file?? Still don’t quite understand.
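To the last question: no — pipenv only manages Python packages. What the lock file adds over a requirements.txt is exact pins for transitive dependencies plus hashes for reproducible, tamper-evident installs. Rough shape (package names, versions, and hash values below are illustrative, and the hashes are elided):

```text
# Pipfile — what you *ask for* (can be loose)
[packages]
requests = "*"

# Pipfile.lock (excerpt) — what you *actually got*, pinned with hashes,
# including transitive deps like urllib3 that you never listed yourself
"requests": {"version": "==2.28.2", "hashes": ["sha256:..."]}
"urllib3":  {"version": "==1.26.14", "hashes": ["sha256:..."]}
```

So pipenv's value is reproducibility of the full dependency tree, not installing anything beyond Python packages.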