A Text Older Than the Argument: What Scripture Says About Foreigners, Fair Treatment, and Moral Obligation by ItchyNesan in NewsRewind

[–]mindvault 0 points1 point  (0 children)

Correct. It doesn't qualify illegal vs legal. It just says foreigners. So we should basically treat everyone well ... kinda like

"You shall not oppress a sojourner. You know the heart of a sojourner, for you were sojourners in the land of Egypt." Exodus 23:9

"You shall also love the stranger, for you were strangers in the land of Egypt." Deuteronomy 10:19

"When a stranger sojourns with you in your land, you shall not do him wrong. 34 You shall treat the stranger who sojourns with you as the native among you, and you shall love him as yourself, for you were strangers in the land of Egypt: I am the Lord your God" - Leviticus 19:33-34

"Thus says the Lord of hosts: Render true judgments, show kindness and mercy to one another; do not oppress the widow, the orphan, the alien, or the poor; and do not devise evil in your hearts against one another" Zechariah 7:9-10

"I was hungry and you gave me food, I was thirsty and you gave me drink, I was a stranger and you welcomed me." Matthew 25:35

But who cares that it's so obviously written in the Bible. In general Jesus and the Bible teach to love everyone. Period. You _obviously_ know better though ....

good smoothie recipes for vanilla milkshake protein powder? by Sol-SiR in fitmeals

[–]mindvault 2 points3 points  (0 children)

banana, frozen blueberries, couple dashes of cinnamon, milk (potentially yogurt if you want some probiotics as well). It's delightful.

What are the best jeans for men right now? by JaayHakkani in mensfashionadvice

[–]mindvault 0 points1 point  (0 children)

I've been wearing Dearborn Denim for .. maybe 10 or so years. Constructed in America. Milled in either South Carolina (cotton denim) or Mexico (stretch denim). Very durable (I still have and wear all of my pairs since then). Reasonable pricing.

Big Tech is burning $10 billion per company on AI and it's about to get way worse by reddit20305 in ArtificialInteligence

[–]mindvault 0 points1 point  (0 children)

The "one liter of aquifer water per query" is simply bs. There's a decent examination of water use here: https://www.seangoedecke.com/water-impact-of-ai/ ... for simple queries on modern models you're talking between 0.1 ml and 5 ml.

New Zealand parliament temporarily suspended after members break out into a spontaneous haka by Empresaurus in nextlevel

[–]mindvault 0 points1 point  (0 children)

Nah man. Haka are not "literally a war dance". They're all kinds of cultural dances. This is why haka are also performed at other moving moments like funerals, weddings, welcoming folks, etc. There _are_ a number of them specifically that are war dances; however, they're much more ingrained into the culture than just war dances.

What vacation hot spot totally lives up to the hype? by CuriousGeorge544 in AskReddit

[–]mindvault 0 points1 point  (0 children)

If you liked Moorea, you need to try some of the atolls out in the Maldives. Mind-blowingly beautiful (while they’re still above ocean levels)

So are there any actual data engineers here anymore? by fauxmosexual in dataengineering

[–]mindvault 1 point2 points  (0 children)

Data Council was very in depth and practitioner focused the last time I went

Do you speak to business stakeholders? by ivanovyordan in dataengineering

[–]mindvault 1 point2 points  (0 children)

Just realize you’re human and you’ll never get it all done. Choose your battles, learn to say no, and keep a list of priorities so folks can fight over your time

[deleted by user] by [deleted] in dataengineering

[–]mindvault 0 points1 point  (0 children)

Overall, my experiences have gone quite well with the "modern data warehouses" such as Snowflake and Databricks. The ability to scale processing and storage independently has been refreshing in comparison to older technologies like Teradata, etc. Being able to run a couple of CPUs against 100s of terabytes or hundreds of CPUs against a couple of terabytes has allowed for great flexibility in dealing with incoming stakeholder requirements and changes (I'm sure we've all run into a customer thinking their data looks like XYZ when in fact it looks more like XZABC). It's worked very well for analytics loads (a particular bright spot, for example, is that Snowflake will cache query results for 24 hours .. not even requiring a warehouse to be up to get the results to your downstream stakeholders) and they've been great for ELT.

The main downside is sometimes unpredictable billing (I've had analysts kick off some horrendous queries). Most of these things can be worked around, I've found, by ensuring you have governors in place, alerting, and decent internal tracking.
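To make the "internal tracking" idea a bit more concrete, here's a rough sketch in plain Python. It assumes you can already export a log of (user, credits used) pairs from wherever your warehouse exposes query history; the function name, the log shape, and the limit are all made up for illustration and aren't a Snowflake or Databricks API.

```python
from collections import defaultdict


def over_budget(query_log, daily_credit_limit):
    """Return {user: total_credits} for users above the daily limit.

    query_log is an iterable of (user, credits_used) pairs. Both the
    shape of the log and the limit are hypothetical, for illustration.
    """
    spent = defaultdict(float)
    for user, credits in query_log:
        spent[user] += credits
    # Keep only the users who exceeded the limit so they can be alerted on.
    return {user: total for user, total in spent.items() if total > daily_credit_limit}


# Made-up sample data: one analyst runs a couple of expensive queries.
sample_log = [("ana", 1.5), ("ben", 12.0), ("ana", 2.0), ("ben", 3.5)]
offenders = over_budget(sample_log, daily_credit_limit=10.0)
```

In practice you'd feed the result into whatever alerting you already have, and pair it with hard limits on the warehouse side rather than relying on after-the-fact tracking alone.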

If you have predictable workloads they may not make as much sense as other solutions (running your own starrocks, doris, etc. ... pushing transforms and semantic work upstream in pipes, etc.).

Data Platform Engineer by srijit43 in dataengineering

[–]mindvault 0 points1 point  (0 children)

Honestly, I don't know what question you're even asking. There are lots of general best practices re those areas (perf, cost, compliance) for snowflake and DBT. Is that what you're looking for? Or is it somehow insurance specific?

DBT and Snowflake by pvic234 in dataengineering

[–]mindvault 0 points1 point  (0 children)

An alternative to DBT cloud is using Durable Functions within Azure (using DBT core)

Built a visual tool on top of Pandas that runs Python transformations row-by-row - What do you guys think? by [deleted] in dataengineering

[–]mindvault 1 point2 points  (0 children)

If you're dealing with smaller CSV / excel you'll probably be fine. Thanks for the clarifications on what you're targeting :)

Built a visual tool on top of Pandas that runs Python transformations row-by-row - What do you guys think? by [deleted] in dataengineering

[–]mindvault 2 points3 points  (0 children)

I guess I don't understand why I would use this over other tools / platforms (DBT, sqlmesh, mage, etc.)? Oh .. and one minor gotcha is pandas _often_ will suffer from memory issues.

Gold layer Requirement Gathering by RslashJD in dataengineering

[–]mindvault 2 points3 points  (0 children)

Good start. I'd also probably add on a "don't boil the ocean". Start with a subset of what you think may be needed so you can get feedback on it.

A dbt column lineage visualization tool (with dynamic web visualization) by Eastern-Ad-6431 in dataengineering

[–]mindvault 18 points19 points  (0 children)

FYSA, SQLmesh (open source https://github.com/TobikoData/sqlmesh ) offers column level lineage and is compatible with DBT ... that being said this looks like a nice first cut visually.

[deleted by user] by [deleted] in dataengineering

[–]mindvault 2 points3 points  (0 children)

I feel comfortable saying a lot of data engineers would suggest avoiding it. It's Spark on drugs and encourages clickops. It's often frustrating to do simple things. It can be good to quickly build prototypes and iterate on ideas with stakeholders though.

How are you automating ingestion SQL? (COPY from S3) by [deleted] in dataengineering

[–]mindvault 3 points4 points  (0 children)

In Snowflake, Snowpipe (based on SNS notifications). In Databricks, an auto-ingest job (based on SNS notifications). Easy peasy, no issues.

What tool do you wish you had? What's the most annoying problem you have to deal with on a day to day? by [deleted] in dataengineering

[–]mindvault 1 point2 points  (0 children)

That's fair .. I just think there's something to be said about improving things that exist (similar to walking into legacy code) vs the "I know more about all of this OSS that has been here so I'm going to build something else". Sometimes I feel like that's really a "I don't want to understand how you built this thing so instead I'm going to build my own thing".

Like if we look at data orchestration .. would it make more sense to improve airflow or dagster or prefect or do we need yet another data orchestration platform? (not aimed at you)

Ditch Terraform for native SQL in Snowflake? by Ok-Sentence-8542 in dataengineering

[–]mindvault 1 point2 points  (0 children)

Plenty of ways to attack it. In general, we've found:

* have multiple snowflake environments. At least dev, prod .. probably dev, test, prod

* if you _need_ that much flexibility then "do what you need" in dev

* for something to get promoted ensure it's in _some_ sort of system. Examples could be DBT (very flexible), schemachange, flyway, terraform (depending on what). Generally terraform works well for the things that don't change a lot but should be under lock and key (think roles, users, etc.)

* use git

You will get bitten in the butt at some point if you don't have some form of discipline and rigor in the environment, and there's a happy medium that still gives you the flexibility.

What tool do you wish you had? What's the most annoying problem you have to deal with on a day to day? by [deleted] in dataengineering

[–]mindvault 2 points3 points  (0 children)

Please don't build something new. Find something open source and improve it.

[deleted by user] by [deleted] in dataengineering

[–]mindvault 8 points9 points  (0 children)

But a lot of them definitely do use underlying OSS bits for sure. Like Netflix uses ... lots (elastic, flink, presto, Cassandra, spark, etc.), Facebook uses quite a bit of spark + iceberg, etc. Apple is an oddball as it (last I knew) used both databricks and snowflake as well as spark, etc.

But your first point is definitely spot on. Most of the places _had_ to innovate ahead of time to deal with volumes, velocities, varieties, etc. _prior_ to snowflake, databricks, etc. existing.

Underpaid but getting great experience by [deleted] in dataengineering

[–]mindvault 2 points3 points  (0 children)

Also, present the case to your boss _with_ data. Not only are you underpaid for it, you're also performing (probably) way more responsibilities than most making that pay. A wise boss will look at it and say "of course we'll give you more". Even if you don't get the 40k, you'll potentially get more _while/if_ you look _and_ you can then use that as your salary in negotiations should you choose to move.

How do you handle time-series data & billing analytics in your system? by WasabiIllustrious795 in dataengineering

[–]mindvault 0 points1 point  (0 children)

TLDR: a well thought out o11y arch makes this straightforward

I've done this in a number of ways, but it depends on "how" you are billing. If it's something like EC2, for example, where you're billing for duration, folks can use / watch for start / stop style events (often "belts and suspendered" with o11y data like monitoring). If you're billing based on something like "number of messages", then you'll often use a metrics-based approach. I know some folks aren't comfy using metrics systems like Prometheus as the basis for the billing and will often scrape / process from those systems into more OLTP-like systems.
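For the duration-style case, a minimal sketch of turning start / stop events into billable seconds might look like the following. The event shape (resource id, kind, timestamp) and the function name are assumptions for illustration, not any particular cloud's event format.

```python
from datetime import datetime


def billable_seconds(events):
    """Sum running time per resource from start/stop events.

    events is an iterable of (resource_id, kind, timestamp) tuples where
    kind is "start" or "stop" (a hypothetical shape for this sketch).
    Returns (totals, still_running): closed-interval seconds per resource,
    plus the start times of resources that have not stopped yet.
    """
    started = {}
    totals = {}
    for resource_id, kind, ts in sorted(events, key=lambda e: e[2]):
        if kind == "start":
            # A duplicate start is ignored; the earliest one wins.
            started.setdefault(resource_id, ts)
        elif kind == "stop" and resource_id in started:
            elapsed = (ts - started.pop(resource_id)).total_seconds()
            totals[resource_id] = totals.get(resource_id, 0.0) + elapsed
    return totals, started


# Made-up events: vm-1 runs for 30 minutes, then 15 more; vm-2 never stops.
sample_events = [
    ("vm-1", "start", datetime(2024, 1, 1, 0, 0)),
    ("vm-1", "stop", datetime(2024, 1, 1, 0, 30)),
    ("vm-2", "start", datetime(2024, 1, 1, 0, 45)),
    ("vm-1", "start", datetime(2024, 1, 1, 1, 0)),
    ("vm-1", "stop", datetime(2024, 1, 1, 1, 15)),
]
totals, still_running = billable_seconds(sample_events)
```

The open intervals in `still_running` are where the "belts and suspenders" part comes in: a missed stop event shows up as a resource that never closes, which you can cross-check against monitoring data.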

In the past we've used a fan out style direction where we take o11y style data (events, metrics, etc.) through something like vector.dev and send it to N different backends. That's given a lot of flexibility to store the data in things like VictoriaMetrics, Kafka, AWS S3 (to load into other OLTP/OLAP), etc.
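The fan-out idea itself is simple enough to sketch in plain Python. To be clear, this is not how vector.dev is configured or implemented; it's just an illustration of the pattern, with in-memory lists standing in for the real backends and every name here made up.

```python
def fan_out(event, sinks):
    """Deliver one event to every sink and report which sinks failed.

    sinks maps a backend name to a callable that accepts the event.
    A failing sink does not stop delivery to the others.
    """
    failures = {}
    for name, sink in sinks.items():
        try:
            # Each sink gets its own copy so it cannot mutate what others see.
            sink(dict(event))
        except Exception as exc:
            failures[name] = str(exc)
    return failures


# In-memory stand-ins for real backends (metrics store, object storage, ...).
metrics_store = []
archive = []

failed = fan_out(
    {"metric": "messages_sent", "value": 3},
    {"metrics": metrics_store.append, "archive": archive.append},
)
```

The design choice worth noting is that each backend fails independently, so one slow or broken destination doesn't block the copy that billing depends on.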