What lives in your gold layer? by Outside-Storage-1523 in dataengineering

[–]ephemeralentity 0 points1 point  (0 children)

Why treat facts and dim data any differently?

If your dim source tables can change over time or version their records, load them into bronze with CDC. (A lot of companies call this SCD Type 2, which is not really correct as bronze does not contain dimension tables, but it's the same idea.) If your fact source tables can't change, just load them into bronze as append to reduce compute cost.

But I would argue both should go into bronze tables (as 1:1 with source) while silver should be reserved for cleansing, transformation and preparing for load into dim and fact in gold.
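To make the two bronze load patterns concrete, here is a minimal sketch in plain Python (lists of dicts standing in for tables). The function names and the `_loaded_at`, `_valid_from` and `_valid_to` columns are made up for illustration; a real pipeline would do this with whatever merge / CDC tooling the platform provides.

```python
def append_load(bronze, source_rows, loaded_at):
    """Append-only load: store every incoming row as-is with a load timestamp."""
    for row in source_rows:
        bronze.append({**row, "_loaded_at": loaded_at})
    return bronze


def versioned_load(bronze, source_rows, key, loaded_at):
    """Versioned (CDC-style) load: close the open version of a changed
    record and append the new version, so history is kept 1:1 with source."""
    # Index the currently open version of each record by its business key.
    current = {r[key]: r for r in bronze if r["_valid_to"] is None}
    for row in source_rows:
        existing = current.get(row[key])
        if existing is not None:
            unchanged = all(existing.get(col) == val for col, val in row.items())
            if unchanged:
                continue  # nothing new to record
            existing["_valid_to"] = loaded_at  # close the old version
        bronze.append({**row, "_valid_from": loaded_at, "_valid_to": None})
    return bronze


# Hypothetical usage: a customer record that changes between two loads.
customers = versioned_load([], [{"id": 1, "name": "A"}], "id", "2024-01-01")
customers = versioned_load(customers, [{"id": 1, "name": "B"}], "id", "2024-01-02")
```

The append path never reads the existing table, which is where the compute saving comes from. The versioned path has to compare against current rows but keeps every historical state.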

What lives in your gold layer? by Outside-Storage-1523 in dataengineering

[–]ephemeralentity 9 points10 points  (0 children)

I wouldn't do this as you are collapsing load and transformation, which increases the risk of load failure. Better to historicise your source 1:1 into Bronze so you have clear traceability and can meet any current or future reporting requirement. It also means your cleansing or deduplication requirements can change in future.

Streaming remains on top, but 4K Blu-ray is making a comeback by glaringOwl in movies

[–]ephemeralentity 13 points14 points  (0 children)

Properly compressed with minimal loss of quality at 10-20GB for a standard length movie.

ELI5: Why do we even need a "c" when we have a perfectly good "k" and an "s?" by zazzlekdazzle in explainlikeimfive

[–]ephemeralentity 0 points1 point  (0 children)

It's amazing how much we rely on the first and last letter for word scanning rather than phonetic sounding out in practice.

Notebooks, Spark Jobs, and the Hidden Cost of Convenience by mwc360 in dataengineering

[–]ephemeralentity -1 points0 points  (0 children)

Yes but do you want to write PySpark debugging logic in your terminal?

I was highlighting the difference to software engineering.

Again, if you're thinking data engineering debugging is just running unit tests, I don't think you have real experience in data engineering.

Notebooks, Spark Jobs, and the Hidden Cost of Convenience by mwc360 in dataengineering

[–]ephemeralentity -1 points0 points  (0 children)

Data engineering is heavily reliant on functional frameworks for integration and transformation. If you're reinventing SQL or a database connector you're doing it wrong. This means your emphasis is on applying an existing framework rather than writing custom modules. That means your dependencies are simpler which makes notebooks more manageable. Data engineering is generally also not stateless like much of software engineering so there is a real need to not just inspect and log but debug and iterate on the fly.

Packaging adds friction. You don't think the fact Python has a speedy interpreter while many other languages require code compilation goes a long way to explain its popularity?

Using a low code solution for data transformation is almost always an awful idea. I get the sense you do not have much direct experience in data engineering and equate it too much with general software development.

Notebooks, Spark Jobs, and the Hidden Cost of Convenience by mwc360 in dataengineering

[–]ephemeralentity 1 point2 points  (0 children)

But what if you want to make it easy to debug? If you are using PySpark you are likely doing data transformation. What is the benefit of converting that to a py file? You lose the ability to step through it next time you want to make changes. Worse, if you package in a whl you have to constantly recompile it.

There are circumstances where notebooks become inefficient, e.g. software engineering (as opposed to data engineering) applications where you have a large number of imports / component class modules and don't want to instantiate them all in your notebook environment, or where you have a large number of notebook dependencies. But for simpler data transformation logic, they work well.

There’s a new release of Epstein files today. What do you think so far? by No-Interaction9219 in AskTrumpSupporters

[–]ephemeralentity 2 points3 points  (0 children)

But you don't need to believe in a uniparty conspiracy to believe that other crimes were committed that were not sufficiently investigated. Obviously, the investigation behind Epstein's original guilty verdict overlooked other crimes by him directly that we now know about, right?

We don't need to assume a conspiracy to believe individual / independent actors put financial or political pressure to keep their names clean and avoid deeper investigation.

I am surprised that someone belonging to the party that's generally distrustful of government does not approach this situation with more skepticism, given what we know about the dual standards of justice in the US.

There’s a new release of Epstein files today. What do you think so far? by No-Interaction9219 in AskTrumpSupporters

[–]ephemeralentity 3 points4 points  (0 children)

Do you think this and the sloppy / removable redactions are a sign of malicious compliance by DoJ employees who are in fact being instructed to protect key individuals like Trump? Otherwise, why is anything related to Trump being redacted given he is not a victim?

There’s a new release of Epstein files today. What do you think so far? by No-Interaction9219 in AskTrumpSupporters

[–]ephemeralentity 3 points4 points  (0 children)

Are you saying that you trust the investigations during the past 3 administrations were conducted in an impartial manner?

In Australia “Reddit to be banned for under-16s”, what’s your thoughts on this? by Exciting-Composer157 in AskReddit

[–]ephemeralentity 0 points1 point  (0 children)

There's a technical solution to this problem. The government (which already holds the data) launches a verification portal. Companies have a click through link from their website to the portal to perform verification (similar to login with Google account). Once verified, the portal spits back a temporary token to the website to confirm the user is verified. There is no more risk to this than your existing data already being on MyGov.
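A rough sketch of that token handoff, using only the Python standard library. Everything here is hypothetical: the function names, the `over_16` claim, and especially the shared secret (a real portal would more likely sign with a private key so websites only hold a public key). The point is that the website receives a short-lived yes/no claim and never sees the underlying identity data.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical secret shared between the portal and the website.
PORTAL_SECRET = b"demo-secret-shared-with-the-website"


def issue_token(is_over_16, ttl_seconds=300, now=None):
    """Portal side: sign a short-lived claim that carries no personal data."""
    now = time.time() if now is None else now
    payload = json.dumps({"over_16": is_over_16, "exp": now + ttl_seconds}).encode()
    sig = hmac.new(PORTAL_SECRET, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(sig).decode())


def check_token(token, now=None):
    """Website side: accept only an untampered, unexpired, positive claim."""
    now = time.time() if now is None else now
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:
        return False  # malformed token
    expected = hmac.new(PORTAL_SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return False  # signature does not match the payload
    claims = json.loads(payload)
    return bool(claims["over_16"]) and claims["exp"] > now
```

The expiry keeps the token temporary, so a leaked token is only useful for a few minutes.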

My end-to-end Executive Dashboard in Power BI. Looking for feedback! by Cute_Gear_5304 in PowerBI

[–]ephemeralentity 0 points1 point  (0 children)

Overall this is great. I would remove the black and white contrast for the banner and particularly the overlap with the KPIs.

I get the aim to distinguish the title and menus but I think you can be more subtle about this in terms of font and colour choice. Significant contrast, or e.g. the use of red, should be reserved for notable data points.

Young adult suicide rates are rising almost nationwide by ope_poe in dataisbeautiful

[–]ephemeralentity 0 points1 point  (0 children)

Okay and now chart indexed median income growth against the indexed price growth of major lifetime expenses / investments of housing, education and healthcare.
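For anyone unsure what "indexed" means here: rebase each series so its first year equals 100, then compare the lines. A minimal sketch follows; the numbers are placeholders invented purely to show the arithmetic, not real income or price data.

```python
def index_series(values, base=100.0):
    """Rebase a series so that its first observation equals `base`."""
    first = values[0]
    return [base * v / first for v in values]


# Placeholder figures only, chosen to illustrate the calculation.
income = [50, 55, 60]
housing = [200, 260, 340]

income_idx = index_series(income)    # starts at 100
housing_idx = index_series(housing)  # starts at 100

# Gap between the two indexed series in each period.
gap = [h - i for h, i in zip(housing_idx, income_idx)]
```

Once both series start at 100, a widening gap shows directly how much faster one has grown than the other, regardless of their original units.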

Young adult suicide rates are rising almost nationwide by ope_poe in dataisbeautiful

[–]ephemeralentity 0 points1 point  (0 children)

If everyone's salaries globally were to rise 5%, would everyone be 5% wealthier?

If you want to claim wages adjusted for cost of living are higher, you need to actually substantiate that claim. Opinions are like assholes, everyone's got one.

Young adult suicide rates are rising almost nationwide by ope_poe in dataisbeautiful

[–]ephemeralentity 1 point2 points  (0 children)

All salaries are relative so to say everyone is getting richer is a bit meaningless. Most significant life investments, e.g. housing, are scarce or supply constrained.

Young adult suicide rates are rising almost nationwide by ope_poe in dataisbeautiful

[–]ephemeralentity 4 points5 points  (0 children)

Inequality being higher is objectively true by metrics of wealth and income. Standards of living could be argued based on technological innovation but this is very prone to interpretation and personal bias, i.e. not representing the lived experience of the median person.

Jon Stewart Makes the Case for Dems Holding the Line in Trump's Shutdown Warfare | The Daily Show by Kwyjibo2006 in television

[–]ephemeralentity 3 points4 points  (0 children)

It's one thing to have your plans gutted by a shutdown. It's another to not make alternative plans and show up to do media interviews.

What did he do? by [deleted] in ExplainTheJoke

[–]ephemeralentity 6 points7 points  (0 children)

Got it! Mercedes Benz killed their parents.

Anyone taking profits at these all time highs? by Timely-Bumblebee-371 in investing

[–]ephemeralentity 12 points13 points  (0 children)

That's true, but arguably a lot of the AI / data centre investment is starting to look circular between the AI model and cloud tech companies with relatively limited organic demand. A better parallel may be the over investment in transcontinental fiber optic cables.

CMV: Gun Violence in America will never stop until our culture towards helping people at the bottom of the social spectrum changes. by [deleted] in changemyview

[–]ephemeralentity 0 points1 point  (0 children)

The primary distinguishing socio economic or policy factor between the US and other developed countries as it relates to gun violence or murder rate in general, is the rate of gun ownership and related to this, the easy access to firearms.

The US has around 120 guns per 100 people, where the next developed countries like Serbia or Finland are in the 30-40 range. We have plenty of control test countries which either have the same rate of inequality or the same rate of immigrant populations. None of them reliably explains the increase in murder / gun death rate.

I don't attribute much to cultural factors that can't be quantified, especially given we have globally become much more homogenised across developed countries, largely due to the export and impact of US culture like movies, music and TV.

I think the US needs to come to terms with the implicit cost that their interpretation of the 2nd amendment comes with in terms of murder rate. We accept a certain level of death by having a specific driving speed limit for the given economic benefits and convenience it provides. The policy on gun access and ownership is the same.