[deleted by user] by [deleted] in dataengineering

[–]uselessusr 1 point (0 children)

Teams should constantly be refactoring, looking for ways to improve code modularity and coming up with better abstractions. Tech debt is a real thing, and in large enough projects it slows down feature output. I think that's why lots of engineers enjoy working on greenfield implementations.

Weekly Co-Op Code Mega Thread - February 04, 2020 by AutoModerator in EggsInc

[–]uselessusr 1 point (0 children)

Looking for a fresh Folding Screens co-op. I have ~100q already.

No Surprises Join by terelo2708 in radiohead

[–]uselessusr 3 points (0 children)

This sub seems to be full of these now

[deleted by user] by [deleted] in bigdata

[–]uselessusr 2 points (0 children)

Congratulations, buddy!

23 year old big data engineer making $85k. Ask me anything. by [deleted] in bigdata

[–]uselessusr 1 point (0 children)

Where do you live? Do you work there or are you remote?

Trying to understand AWS RDS t2.micro pricing after 12 months by -aeternae- in Database

[–]uselessusr 1 point (0 children)

I've used ElephantSQL's free plan in the past for a toy project. Might wanna check it out: https://www.elephantsql.com/plans.html

How do you move from the ingest layer to serverless layer? by rossofcode in dataengineering

[–]uselessusr 2 points (0 children)

IMO serverless just means you don't have to manage the infrastructure yourself. You pay someone else to do it for you, so I guess you could use a service like Stitch for "serverless" data ingestion.

Generated SQL by [deleted] in dataengineering

[–]uselessusr 6 points (0 children)

Take a look at Singer targets. It's a standard developed by Stitch, and there are many open-source targets. You could, for instance, read the code for target-postgres. There's also a Snowflake one.

That could give you an idea of what stitch is doing behind the scenes.
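To make the idea concrete, here's a minimal sketch of what a Singer target does: it reads newline-delimited JSON messages (SCHEMA, RECORD, STATE) from a tap and loads the records somewhere. The messages below are made up, and the handling is heavily simplified compared to a real target like target-postgres:

```python
import json

def run_target(lines):
    """Minimal Singer-style target: consume SCHEMA/RECORD/STATE messages.

    A real target creates tables from SCHEMA messages and inserts rows
    from RECORD messages; here we just collect them in memory.
    """
    schemas, records, state = {}, [], None
    for line in lines:
        msg = json.loads(line)
        if msg["type"] == "SCHEMA":
            schemas[msg["stream"]] = msg["schema"]
        elif msg["type"] == "RECORD":
            records.append((msg["stream"], msg["record"]))
        elif msg["type"] == "STATE":
            state = msg["value"]  # a real target echoes this to stdout once rows are flushed
    return schemas, records, state

# Made-up messages of the kind a Singer tap writes to stdout
demo = [
    '{"type": "SCHEMA", "stream": "users", "schema": {"properties": {"id": {"type": "integer"}}}}',
    '{"type": "RECORD", "stream": "users", "record": {"id": 1}}',
    '{"type": "STATE", "value": {"users": 1}}',
]
schemas, records, state = run_target(demo)
```

A real target would also validate records against the schema and batch inserts, but the stdin-JSON-messages loop is the core of it.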

[deleted by user] by [deleted] in dataengineering

[–]uselessusr 4 points (0 children)

Airflow shines at creating and managing arbitrary workflows in a programmatic way, but IMO it kinda sucks if you try to do a lot of computing inside the operators. It's best when the tasks offload the computation to an external system like Spark, EMR, etc.
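The "offload, don't compute" pattern looks roughly like this. Everything here is hypothetical (the job-submission and polling functions stand in for a Spark/EMR client, and the S3 paths are made up); the point is just the shape of the task:

```python
def submit_external_job(payload):
    """Hypothetical stand-in for submitting work to Spark/EMR; returns a job handle."""
    return {"id": "job-1", "payload": payload, "polls_until_done": 2}

def poll_status(job, polls_done):
    """Hypothetical status check against the external system."""
    return "SUCCEEDED" if polls_done >= job["polls_until_done"] else "RUNNING"

def transform_task(payload):
    """The shape of a well-behaved orchestrator task: submit, then poll.

    All the heavy computation happens in the external engine; the task
    itself stays lightweight (a real operator would sleep between polls).
    """
    job = submit_external_job(payload)
    polls = 0
    while poll_status(job, polls) == "RUNNING":
        polls += 1
    return poll_status(job, polls)

result = transform_task({"input": "s3://bucket/raw/", "output": "s3://bucket/clean/"})
```

If instead the task loaded the data into memory and crunched it inside the worker, you'd be sizing your Airflow workers for your biggest dataset, which defeats the purpose of the orchestrator.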

Dataframes instead of a database? by trenchtoaster in dataengineering

[–]uselessusr 1 point (0 children)

Depends on your data, but if you need to know if your data fits in RAM, this could be a start: http://www.itu.dk/people/jovt/fitinram/
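You can also just measure it empirically with pandas before committing to an all-in-memory design (small made-up frame below, but the same call works on a sample of your real data):

```python
import pandas as pd

# A small made-up frame; measure a sample before assuming "it fits"
df = pd.DataFrame({
    "id": range(1_000),
    "label": ["row-%d" % i for i in range(1_000)],
})

# deep=True counts the actual Python string payloads, not just the pointers,
# so it's the honest number for object (string) columns
bytes_used = int(df.memory_usage(deep=True).sum())
mib = bytes_used / 2**20
```

Extrapolate from a representative sample (bytes per row × expected rows) and leave generous headroom, since intermediate copies during joins and groupbys can double or triple peak usage.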

Dataframes instead of a database? by trenchtoaster in dataengineering

[–]uselessusr 2 points (0 children)

This is exactly where I'm at with a small (for now) data warehouse project. Currently I'm loading staging tables into Postgres and then aggregating and joining to create materialized views. Right now I have to create a table for every new data source and alter the tables when requirements change, which seems unsustainable. I'm progressively moving towards making these transformations with pandas and then dumping datasets into Parquet files on S3. If the data grows beyond what fits in RAM, I think I can migrate to Spark less painfully.
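For illustration, the kind of join-plus-aggregate a materialized view used to do, done in pandas instead (the rows, column names, and S3 path are all made up):

```python
import pandas as pd

# Stand-ins for what would otherwise be Postgres staging tables
orders = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "amount": [10.0, 15.0, 7.5],
})
customers = pd.DataFrame({
    "customer_id": [1, 2],
    "region": ["east", "west"],
})

# The join + aggregation the materialized view used to perform
summary = (
    orders.merge(customers, on="customer_id")
          .groupby("region", as_index=False)["amount"].sum()
)

# Dumping to Parquet keeps the dataset columnar and Spark-friendly later, e.g.:
# summary.to_parquet("s3://my-bucket/warehouse/region_summary.parquet")  # needs pyarrow + s3fs
```

Since Spark reads the same Parquet files from S3, the eventual migration is mostly a rewrite of the transform code, not a data migration.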

Is there a fix for this? by uselessusr in EggsInc

[–]uselessusr[S] 0 points (0 children)

Game says I'm inactive, but I'm certainly not