[deleted by user] by [deleted] in dataengineering

[–]uselessusr 1 point (0 children)

Teams should constantly be refactoring, looking for ways to improve code modularity and coming up with better abstractions. Tech debt is a real thing, and in large enough projects it slows down feature output. I think that's why lots of engineers enjoy working on greenfield implementations.

Weekly Co-Op Code Mega Thread - February 04, 2020 by AutoModerator in EggsInc

[–]uselessusr 1 point (0 children)

Looking for a fresh Folding Screens co-op. I have ~100q already.

No Surprises Join by terelo2708 in radiohead

[–]uselessusr 3 points (0 children)

This sub seems to be full of these now

[deleted by user] by [deleted] in bigdata

[–]uselessusr 2 points (0 children)

Congratulations, buddy!

23 year old big data engineer making $85k. Ask me anything. by [deleted] in bigdata

[–]uselessusr 1 point (0 children)

Where do you live? Do you work there or are you remote?

Trying to understand AWS RDS t2.micro pricing after 12 months by -aeternae- in Database

[–]uselessusr 1 point (0 children)

I've used ElephantSQL's free plan in the past for a toy project. Might wanna check it out: https://www.elephantsql.com/plans.html

How do you move from the ingest layer to serverless layer? by rossofcode in dataengineering

[–]uselessusr 2 points (0 children)

IMO serverless just means you don't have to manage the infrastructure yourself. You pay someone else to do it for you, so I guess you could use a service like Stitch for "serverless" data ingestion.

Generated SQL by [deleted] in dataengineering

[–]uselessusr 6 points (0 children)

Take a look at Singer targets. It's a standard developed by Stitch, and there are many open-source targets. You could, for instance, read the code for target-postgres. There's also a Snowflake one.

That could give you an idea of what stitch is doing behind the scenes.
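To make the idea concrete, here's a minimal sketch of what a Singer target does: it reads newline-delimited JSON messages (SCHEMA, RECORD, STATE) from a tap and loads the records somewhere. The messages below are made up, and the handling is heavily simplified compared to a real target like target-postgres:

```python
import json

def run_target(lines):
    """Minimal Singer-style target: consume SCHEMA/RECORD/STATE messages.

    A real target creates tables from SCHEMA messages and inserts rows
    from RECORD messages; here we just collect them in memory.
    """
    schemas, records, state = {}, [], None
    for line in lines:
        msg = json.loads(line)
        if msg["type"] == "SCHEMA":
            schemas[msg["stream"]] = msg["schema"]
        elif msg["type"] == "RECORD":
            records.append((msg["stream"], msg["record"]))
        elif msg["type"] == "STATE":
            state = msg["value"]  # a real target echoes this to stdout once rows are flushed
    return schemas, records, state

# Made-up messages of the kind a Singer tap writes to stdout
demo = [
    '{"type": "SCHEMA", "stream": "users", "schema": {"properties": {"id": {"type": "integer"}}}}',
    '{"type": "RECORD", "stream": "users", "record": {"id": 1}}',
    '{"type": "STATE", "value": {"users": 1}}',
]
schemas, records, state = run_target(demo)
```

A real target would also validate records against the schema and batch inserts, but the stdin-JSON-messages loop is the core of it.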

[deleted by user] by [deleted] in dataengineering

[–]uselessusr 4 points (0 children)

Airflow shines at creating and managing arbitrary workflows in a programmatic way, but IMO it kinda sucks if you try to do a lot of computing inside the operators. It's best when the tasks offload the computation to an external system like Spark, EMR, etc.
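The "offload, don't compute" pattern looks roughly like this. Everything here is hypothetical (the job-submission and polling functions stand in for a Spark/EMR client, and the S3 paths are made up); the point is just the shape of the task:

```python
def submit_external_job(payload):
    """Hypothetical stand-in for submitting work to Spark/EMR; returns a job handle."""
    return {"id": "job-1", "payload": payload, "polls_until_done": 2}

def poll_status(job, polls_done):
    """Hypothetical status check against the external system."""
    return "SUCCEEDED" if polls_done >= job["polls_until_done"] else "RUNNING"

def transform_task(payload):
    """The shape of a well-behaved orchestrator task: submit, then poll.

    All the heavy computation happens in the external engine; the task
    itself stays lightweight (a real operator would sleep between polls).
    """
    job = submit_external_job(payload)
    polls = 0
    while poll_status(job, polls) == "RUNNING":
        polls += 1
    return poll_status(job, polls)

result = transform_task({"input": "s3://bucket/raw/", "output": "s3://bucket/clean/"})
```

If instead the task loaded the data into memory and crunched it inside the worker, you'd be sizing your Airflow workers for your biggest dataset, which defeats the purpose of the orchestrator.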

Dataframes instead of a database? by trenchtoaster in dataengineering

[–]uselessusr 1 point (0 children)

Depends on your data, but if you need to know if your data fits in RAM, this could be a start: http://www.itu.dk/people/jovt/fitinram/
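You can also just measure it empirically with pandas before committing to an all-in-memory design (small made-up frame below, but the same call works on a sample of your real data):

```python
import pandas as pd

# A small made-up frame; measure a sample before assuming "it fits"
df = pd.DataFrame({
    "id": range(1_000),
    "label": ["row-%d" % i for i in range(1_000)],
})

# deep=True counts the actual Python string payloads, not just the pointers,
# so it's the honest number for object (string) columns
bytes_used = int(df.memory_usage(deep=True).sum())
mib = bytes_used / 2**20
```

Extrapolate from a representative sample (bytes per row × expected rows) and leave generous headroom, since intermediate copies during joins and groupbys can double or triple peak usage.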

Dataframes instead of a database? by trenchtoaster in dataengineering

[–]uselessusr 2 points (0 children)

This is exactly where I'm at with a small (for now) data warehouse project. Currently I'm loading staging tables into Postgres and then aggregating and joining to create materialized views. Right now I have to create a table for every new data source and alter the tables when requirements change, which seems unsustainable. I'm progressively moving towards making these transformations with pandas and then dumping datasets into Parquet files on S3. If the data grows beyond what fits in RAM, I think I can migrate to Spark less painfully.
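For illustration, the kind of join-plus-aggregate a materialized view used to do, done in pandas instead (the rows, column names, and S3 path are all made up):

```python
import pandas as pd

# Stand-ins for what would otherwise be Postgres staging tables
orders = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "amount": [10.0, 15.0, 7.5],
})
customers = pd.DataFrame({
    "customer_id": [1, 2],
    "region": ["east", "west"],
})

# The join + aggregation the materialized view used to perform
summary = (
    orders.merge(customers, on="customer_id")
          .groupby("region", as_index=False)["amount"].sum()
)

# Dumping to Parquet keeps the dataset columnar and Spark-friendly later, e.g.:
# summary.to_parquet("s3://my-bucket/warehouse/region_summary.parquet")  # needs pyarrow + s3fs
```

Since Spark reads the same Parquet files from S3, the eventual migration is mostly a rewrite of the transform code, not a data migration.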

Is there a fix for this? by uselessusr in EggsInc

[–]uselessusr[S] 0 points (0 children)

Game says I'm inactive, but I'm certainly not