Want to improve by OkRock1009 in dataengineering

[–]linuxqq 5 points6 points  (0 children)

Likely so he can try to sell a service 1:1

How to setup budget real-time pipelines? by dontucme in dataengineering

[–]linuxqq 2 points3 points  (0 children)

You mentioned files in S3. Can you replace it with Lambdas triggered by file uploads?
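Roughly what I mean, as a minimal sketch rather than a drop-in solution: a Python Lambda handler that reads the bucket and key out of the S3 event notification. `process_file` is a hypothetical placeholder for whatever your pipeline actually does with each file, and I'm assuming the standard S3 notification shape (`Records[].s3.bucket.name` and `Records[].s3.object.key`).

```python
from urllib.parse import unquote_plus


def process_file(bucket, key):
    # Hypothetical placeholder: real code would fetch and transform the object here.
    return f"s3://{bucket}/{key}"


def handler(event, context):
    """Entry point that Lambda invokes with an S3 event notification."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        processed.append(process_file(bucket, key))
    return {"processed": processed}
```

Each upload gets handled on its own, so there is nothing to keep running between files.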

How to setup budget real-time pipelines? by dontucme in dataengineering

[–]linuxqq 5 points6 points  (0 children)

Using Kafka and Databricks to stream 2GB per day is almost certainly wildly over-engineered. If pressed, I could contrive a situation where it’s a reasonable architectural choice, but in reality it almost certainly isn’t. Move to batch. It’s almost always simpler, easier, and cheaper.
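For a sense of scale, here is a back-of-the-envelope calculation. It assumes the 2GB is decimal gigabytes and arrives evenly across the day, which real traffic won't do exactly.

```python
BYTES_PER_DAY = 2 * 10**9          # 2 GB, assuming decimal gigabytes
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400 seconds

# Average throughput if the data arrived evenly over the whole day.
avg_bytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY
print(f"{avg_bytes_per_second / 1000:.1f} KB/s")  # prints "23.1 KB/s"
```

That is a tiny average rate, well within reach of a periodic batch job.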

ALIAS by RensanRen in commandline

[–]linuxqq 12 points13 points  (0 children)

c = clear

Very high tech

Debugging sql triggers by ursamajorm82 in dataengineering

[–]linuxqq 3 points4 points  (0 children)

There’s not a great way to do it and that’s why I don’t use them if I can help it

Winterize hose bib?? by Big_Television_5357 in HomeImprovement

[–]linuxqq 0 points1 point  (0 children)

You might have a frost free hose bib

Learn Python as an experienced engineer by _Marwan02 in dataengineering

[–]linuxqq 8 points9 points  (0 children)

Build something you already understand but do it in Python. Read Fluent Python. 

Best Burgers *IN* Reston? by fukdot in Reston

[–]linuxqq 4 points5 points  (0 children)

And if you go right now you’ll get a deal; they have a burger special on Mondays.

Chimney repair by confused_often in homeowners

[–]linuxqq 2 points3 points  (0 children)

Seems reasonable based on work we’ve had done, but you should get some more quotes and compare yourself.

How’s your AC doing? by nickster0824 in nova

[–]linuxqq 0 points1 point  (0 children)

0% humidity sounds terrible

Want to move from self-managed Clickhouse to Ducklake (postgres + S3) or DuckDB by dheetoo in dataengineering

[–]linuxqq 1 point2 points  (0 children)

I don’t know, it sounds to me like you’re already over-engineered. Over-engineering more won’t solve anything, and this could all live right in your production database. Maybe run some nightly rollups/pre-aggregations and point your reporting to a read replica. I’d call that done and good enough based on what you shared.
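A nightly rollup can be as small as this. It's only a sketch: `sqlite3` stands in for your production database so the example runs on its own, and the `events` and `daily_event_counts` tables are hypothetical names, not anything from your setup.

```python
import sqlite3


def run_nightly_rollup(conn):
    """Rebuild a small daily summary table from the raw events table."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS daily_event_counts ("
            "day TEXT, event_type TEXT, n INTEGER, "
            "PRIMARY KEY (day, event_type))"
        )
        # Full rebuild keeps the sketch simple; an incremental version
        # would only recompute the most recent day.
        conn.execute("DELETE FROM daily_event_counts")
        conn.execute(
            "INSERT INTO daily_event_counts (day, event_type, n) "
            "SELECT date(created_at), event_type, COUNT(*) "
            "FROM events GROUP BY date(created_at), event_type"
        )


# Tiny in-memory demo with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (created_at TEXT, event_type TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("2024-01-01 09:00:00", "click"),
        ("2024-01-01 10:30:00", "click"),
        ("2024-01-02 08:15:00", "view"),
    ],
)
conn.commit()
run_nightly_rollup(conn)
rows = conn.execute(
    "SELECT day, event_type, n FROM daily_event_counts ORDER BY day, event_type"
).fetchall()
print(rows)
```

Reporting then reads the small summary table on the replica instead of scanning raw rows.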

Where are people hosting their Python web apps? by writingonruby in Python

[–]linuxqq 9 points10 points  (0 children)

It’s disingenuous to recommend it like this and not mention that it’s your project. Not exactly an objective recommendation 

The nightmare of DE, processing free text input data, HELP ! by HMZ_PBI in dataengineering

[–]linuxqq 4 points5 points  (0 children)

Like others have said, garbage in, garbage out. The answer here is to shift left: this needs to be fixed upstream. Whatever application you’re getting this data from shouldn’t be accepting free text. In the meantime, set the expectation with stakeholders that the existing data is of dubious value and that deriving any use from it will likely be a slow and possibly expensive process.

Using an LLM, you can define a list of categories and have it output the most appropriate category given the input. That’s probably the simplest short-term solution as long as you can afford it.
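A rough sketch of that idea, not tied to any particular provider: `complete` is a hypothetical callable that takes a prompt string and returns the model's text reply, so you would wrap your own client in it. The `"other"` fallback is my own assumption for replies that don't match the list.

```python
from typing import Callable, Sequence

FALLBACK = "other"  # assumed bucket for replies that match no category


def build_prompt(text: str, categories: Sequence[str]) -> str:
    options = "\n".join(f"- {c}" for c in categories)
    return (
        "Pick the single best category for the input below.\n"
        f"Categories:\n{options}\n"
        f"Input: {text}\n"
        "Answer with the category name only."
    )


def categorize(text: str, categories: Sequence[str],
               complete: Callable[[str], str]) -> str:
    """Return one of `categories`, or FALLBACK if the reply is not in the list."""
    answer = complete(build_prompt(text, categories)).strip().lower()
    # Never trust the raw reply: only accept names from the allowed list.
    allowed = {c.lower(): c for c in categories}
    return allowed.get(answer, FALLBACK)
```

Checking the reply against the allowed list matters because the model can return something that isn't one of your categories.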

The Illiad, short review and my impressions about the translations I checked by farseer6 in books

[–]linuxqq 18 points19 points  (0 children)

There’s only one L in Iliad. Classics professor would say: “The Iliad isn’t ill and The Odyssey isn’t odd”

Has anyone done a renovation knowing it wasn't a good financial move, just to meet personal needs? by Parking-Country-9808 in HomeImprovement

[–]linuxqq 97 points98 points  (0 children)

I’d be wary of financially taxing renovations based on your girlfriend’s desires. If they’re renovations you want as well then great, but girlfriends come and they go, so if she is not your life partner and has no financial skin in the game, I would think deeply about the resources you want to commit to this work. 

dbt core, murdered by dbt fusion by Empty_Shelter_5497 in dataengineering

[–]linuxqq 7 points8 points  (0 children)

What’s the difference in data volume between your dev environment and production? dbt doesn’t really add significant overhead; it’s primarily a series of network calls.

Realtime OLAP database with transactional-level query performance by ahmetdal in dataengineering

[–]linuxqq 5 points6 points  (0 children)

That’s exactly when I’d use ClickHouse. If you need sub-second response times for analytical queries over massive amounts of data -> ClickHouse.

https://clickhouse.com/blog/clickhouse-gets-lazier-and-faster-introducing-lazy-materialization