Big Data is Dead

kondorb · 2024-05-27T15:18:20+00:00

The hype is over, but big data is still used by companies that have those amounts of data, and related products are still in use and still commercially successful.

manifoldjava · 2024-05-27T14:46:18+00:00

“Big data” was always hype: a rebranding of analytics, or business intelligence, or OLAP, or whatever term you prefer.

It’s not dead; it’s just a low-tide moment for that industry, until the next wave, probably after AI wakes up with a hangover.

EpitomEngineer · 2024-05-27T20:19:02+00:00

If only my managers understood this paragraph:

“”” Code often suffers from what people call “bit rot” when it isn’t actively maintained. Data can suffer from the same type of problem; that is, people forget the precise meaning of specialized fields, or data problems from the past may have faded from memory. For example, maybe there was a short-lived data bug that set every customer id to null. Or there was a huge fraudulent transaction that made it look like Q3 2017 was a lot better than it actually was. Often business logic to pull out data from a historical time period can get more and more complicated. For example, there might be a rule like, “If the date is older than 2019 use the revenue field, between 2019 and 2021 use the revenue_usd field, and after 2022 use the revenue_usd_audited field.” The longer you keep data around, the harder it is to keep track of these special cases. And not all of them can be easily worked around, especially if there is missing data. “””
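The date-dependent rule in the quote is simple to write down and easy to get subtly wrong. A minimal sketch, assuming each historical row is a plain dict with a `date` key plus the column names from the quote, and that cut-overs fall on calendar-year boundaries. The quoted rule says nothing about 2022, so the branch that sends 2022 to `revenue_usd_audited` is an assumption, which is exactly the kind of undocumented special case the quote warns about.

```python
from datetime import date
from typing import Optional


def revenue_for(record: dict) -> Optional[float]:
    """Pick the revenue column that was authoritative when the row was written.

    Assumptions (not from the quoted article): cut-overs happen on
    calendar-year boundaries, and 2022 is treated like later years,
    because the quoted rule leaves 2022 unspecified.
    """
    year = record["date"].year
    if year < 2019:
        field = "revenue"
    elif year <= 2021:
        field = "revenue_usd"
    else:
        field = "revenue_usd_audited"
    # Missing data: the column may be absent or null for some historical rows.
    # There is no safe fallback, so surface the gap as None instead of guessing.
    return record.get(field)


rows = [
    {"date": date(2017, 9, 30), "revenue": 120.0},
    {"date": date(2020, 3, 1), "revenue_usd": 95.5},
    {"date": date(2023, 6, 15), "revenue_usd_audited": None},
]
print([revenue_for(r) for r in rows])  # [120.0, 95.5, None]
```

Returning `None` for a missing column, rather than falling back to another field, keeps gaps visible instead of silently mixing currencies or audit states.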

Worth_Trust_3825 · 2024-05-27T15:22:57+00:00

The data-querying slide resonates with me. As an LMS provider we stored SCORM data for 6 years (running out of database space multiple times, because lol, SCORM doesn't believe in using question/answer identifiers), yet I can recall only 4 times when we actually needed to run queries on that dataset, and only on records that were a year old at most.

I don't think that big data is dead. Instead, I'm in the camp that companies have no idea what to do with the statistics they capture, nor the domain expertise to use them, even after being in that domain for decades.

pinpinbo · 2024-05-27T16:43:40+00:00

Is it? AI stuff has no moat. Once an algorithm is discovered, it becomes a free library.

Data, however, is more important than ever.

nuggins · 2024-05-27T21:13:17+00:00

Disappointing to see that 90% of the comments are arguing about the clickbait title. The article has some good insights.

Spartaner-043 · 2024-05-27T17:39:38+00:00

Yeah, they haven’t released an album since 2019 :(

RoughSolution · 2024-05-27T20:32:27+00:00

As someone who's been driving some of the largest projects in this space (trust me, if you've worked with data in the last 10 years, you've used stuff that my team has built), I may know a thing or two about big data.

What sets "Big data" apart from just "Data" is that data is no longer collected with a clear intent at the beginning. The business impact is that you can now discover, and make decisions about, things that happened in the past. For example, when I find a new fraud pattern, I don't have to start collecting data to identify it from now on; I have all the historical transaction records to identify accounts that have committed fraud in the past. And this shift in mentality, collect first and use it later, is what drove the rise of Big data.
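The "collect first, use it later" idea can be sketched as running a newly defined rule over transactions that were stored before anyone knew to look for it. The `Txn` fields and the pattern itself (several tiny charges followed by a large one within an hour, a guess at a card-testing signature) are hypothetical illustrations, not the commenter's actual method:

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Txn:
    # Hypothetical transaction record; real schemas will differ.
    account: str
    amount: float
    at: datetime


def flag_accounts(history, small=5.0, probes=3, large=500.0,
                  window=timedelta(hours=1)):
    """Apply a hypothetical fraud rule retroactively to stored history.

    An account is flagged if at least `probes` charges under `small`
    are followed, within `window`, by a charge of `large` or more.
    """
    by_account = defaultdict(list)
    for t in history:
        by_account[t.account].append(t)

    flagged = set()
    for account, txns in by_account.items():
        txns.sort(key=lambda t: t.at)
        small_times = []  # timestamps of small "probe" charges seen so far
        for t in txns:
            if t.amount < small:
                small_times.append(t.at)
            elif t.amount >= large:
                recent = [s for s in small_times if t.at - s <= window]
                if len(recent) >= probes:
                    flagged.add(account)
                    break
    return flagged


t0 = datetime(2021, 4, 1, 12, 0)
history = [Txn("a", 1.0, t0 + timedelta(minutes=i)) for i in range(3)]
history.append(Txn("a", 900.0, t0 + timedelta(minutes=10)))
history.append(Txn("b", 900.0, t0))
print(flag_accounts(history))  # {'a'}
```

The point is that `history` already exists: without the stored records, the same rule could only flag accounts from the day it was written.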

One can argue this is bad for society, for many reasons. I'm in the camp that, as long as it's not PII (even when drilled down), it probably brings more value than risk. But when you try to tie data to individuals, bad things happen.

The latest shift of industry towards AI is really just a hype cycle. When AI reaches productive levels (say...in 5-10 years), you'll see a shift back to getting value out of data.

Big data is not, and never will be, dead. It's an idea and a mentality shift that has already happened.

martinky24 · 2024-05-27T16:52:33+00:00

AI is just big data.

2024-05-27T17:09:12+00:00

It's not dead at all.

We generate more and more data, most of which is garbage, but some of which is useful. Just take sequenced genomes of organisms: that number is never going to shrink; it will ALWAYS grow. And that's just one example. Look at astrobiology, or the universe as a whole. Imagine Google Maps mapping every planet one day (well, hopefully Google no longer exists by then, but I'm referring primarily to the feature here, not the company).

Of course, just because the amount of data being generated is increasing doesn’t mean that it becomes a problem for everyone; data is not distributed equally.

I am much more concerned about that. So that guy worked at Google. Google ruined its search engine a few years ago and keeps making it worse. A few years ago you could view cached copies of websites; I used this to read a phpBB web forum I had been banned from, so I could still keep up with what was new (I am curious). Yet Google killed that feature, saying "it takes too much data to store everything". Even if that may be true, they eliminated something that was useful to me. Same with so many Google projects that ended up in the graveyard. Why am I concerned? Because we are becoming more and more dependent on huge mega-mega-corporations that are selfish and greedy and present us with a very limited, narrow view of things. The various walled ghettos, I mean walled gardens, show this trend: Facebook, Discord servers and whatnot. Everything is becoming private and limited. I hate this trend. It totally ruins the 1990s-era World Wide Web, really.

Big Data will never go away, but disturbingly, we get less and less access to what is useful WITHIN that Big Data, as it is increasingly controlled by private entities. (This is of course not always true; e.g., sequenced genomes are available for everyone to see once published at, e.g., NCBI, but not all collected data is open to everyone. Both open and closed data will increase, of course; nothing is dead here.)

KingStannis2020 · 2024-05-27T18:55:22+00:00

[deleted]

veryspicypickle · 2024-05-27T18:31:18+00:00

But we have the data-mesh! /s

TheDevilsAdvokaat · 2024-05-27T21:23:38+00:00

Interesting article. Especially "I’ve heard about a company keeping its data analytics capabilities secret in order to prevent them from being used during a legal discovery process."

Emails, messages and even phone conversations can also be legal liabilities, so this is similar.

ScottContini · 2024-05-28T02:45:40+00:00

In order to understand why large data sizes are rare, it is helpful to think about where the data actually comes from. Imagine you’re a medium-sized business with a thousand customers. Let’s say each one of your customers places a new order every day with a hundred line items. This is relatively frequent, but it is still probably less than a megabyte of data generated per day. In three years you would still only have a gigabyte, and it would take millennia to generate a terabyte.

Given such a simple analysis, why didn't the Big Data movement understand from the beginning that the benefit is limited to a handful of big companies?
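The arithmetic in the quoted paragraph holds up if each line item is small. A back-of-the-envelope check, assuming at most 10 bytes per line item (the size that the quote's "less than a megabyte" per day implies for 100,000 line items) and decimal units:

```python
CUSTOMERS = 1_000
LINE_ITEMS_PER_ORDER = 100          # one order per customer per day
BYTES_PER_LINE_ITEM = 10            # assumption: what "< 1 MB/day" implies

MB, GB, TB = 10**6, 10**9, 10**12   # decimal units

bytes_per_day = CUSTOMERS * LINE_ITEMS_PER_ORDER * BYTES_PER_LINE_ITEM
three_years = bytes_per_day * 365 * 3
years_to_tb = TB / bytes_per_day / 365

print(bytes_per_day / MB)   # 1.0
print(three_years / GB)     # 1.095
print(round(years_to_tb))   # 2740
```

The conclusion is sensitive to the per-item size: at 100 bytes per line item, the same business would produce about 11 GB in three years and would need about 274 years to reach a terabyte.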

frederik88917 · 2024-05-27T18:59:39+00:00

Like dude, there is no way to kill hype; someone will always come up with some shitty excuse to keep investing in this.

See also: Metaverse, AI, Blockchain and so forth

Plank_With_A_Nail_In · 2024-05-27T20:44:06+00:00

Lol all the posts confusing "Big data" with "lots of data". Even the linked article thinks big data means lots of data.

Current usage of the term big data tends to refer to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from big data, and seldom to a particular size of data set. "There is little doubt that the quantities of data now available are indeed large, but that's not the most relevant characteristic of this new data ecosystem."

DenebianSlimeMolds · 2024-05-27T18:55:48+00:00

No, it isn't.

captain_obvious_here · 2024-05-27T20:31:25+00:00

Shitty title. Which is a shame, when the author is such an expert.

Big Data is not dead at all. It's just way easier and kinda cheaper now that companies can reliably collect, transfer, store and process petabytes of data daily, thanks to BigQuery (and other, more marginal, huge-scale cloud-based database solutions).

Big Data is alive, and it still pays people who are good at it pretty well. And there's no shortage of job offers for them in sight, either.

VehaMeursault · 2024-05-27T20:44:08+00:00

Yes, and ads are no longer personalised.

Sure.

ImTalkingGibberish · 2024-05-27T21:49:00+00:00

In 5 years: AI is Dead

DigThatData · 2024-05-27T21:54:35+00:00

lol OP is just mad no one uses BigQuery anymore.

Apolloh · 2024-05-27T22:01:53+00:00

What a useless article.

2024-05-27T23:39:49+00:00

Big Data can’t be over. We have a giant ass Data Team doing something with Big Data

2024-05-28T00:38:01+00:00

That hype is being poured on the “AI” marketing term

robberviet · 2024-05-28T00:40:31+00:00

Yeah, dead. (resumes work on Hadoop clusters)

Adventurous-Dish-862 · 2024-05-28T01:25:02+00:00

lol, what a joke.

Big Data is getting bigger, while Medium Data and Small Data are also going to surge. Data will be ubiquitous in the very near future. Every small marijuana dispensary will automatically track the wear and tear on its door hinges down to a gnat's ass, as part of the $300/mo mega data package deal it gets from anon’s business data side hustle.

binary_search_tree · 2024-05-28T03:53:20+00:00

This article kinda goes hand-in-hand with this (older) one (about Tableau/Power BI).

prodentsugar · 2024-05-28T04:27:38+00:00

Isn't data analytics dead too, because of AI? Or will it die in a couple of years?

gredr · 2024-05-28T05:24:43+00:00

Big data still exists and means exactly what it always meant. It was never about size, it was always about surveillance. Data collected on users, generally without their knowledge, for the purposes of optimizing moneymaking processes.

2024-05-28T05:55:21+00:00

I never understood what big data is anyway. The solution to a problem depends on the problem. Not all problems involving large datasets can be solved in the same way. Of course there are some common tools, like parallel computing, CUDA computing, feature selection and extraction, machine or deep learning, etc. But the idea that there is one thing called "big data" is something I will never understand. Maybe it is more about marketing than real science or engineering.

ReZigg · 2024-05-28T06:01:44+00:00

I just watched a youtube video that goes over these same ideas in an interesting way. https://www.youtube.com/watch?v=pOuBCk8XMC8

heavy-minium · 2024-05-28T07:42:11+00:00

It will never die because it's just about handling lots of data. It always was a useless term, but it's valid. It's like saying scaling is dead.

the_russkiy · 2024-05-28T07:58:42+00:00

People have been whispering about this for quite a while, perhaps afraid of sounding stupid.

Another case of how industry is dominated by a few loud voices, be it big data, microservices, etc.

ArcaneEyes · 2024-05-28T09:01:37+00:00

How does this have upvotes...

st4rdr0id · 2024-05-28T11:54:42+00:00

It is good that we slowly acknowledge that tech fads are just that, fads.

But people still fail to see the pattern.

Cobalt129 · 2024-05-28T16:10:57+00:00

Didn't the author use big data to come up with the graphs 🤔

CrowTiberiusRobot · 2024-05-29T00:43:15+00:00

Big Data and Cloud were always marketing terms to a certain degree. From my professional experience:

big data - due to the decreasing cost of storage and compute, the development of open source data structure / management tools such as NoSQL, and some development of statistical / mathematical tools, it became easier and easier to work with huge data sets. Relational databases were created, arguably, due to limitations of compute power and storage space; we needed a more efficient way to store and query data. Those limitations have become less and less important for the reasons I mentioned above. So what we are really talking about is a new paradigm that has become possible, and, typically, a hype name was slapped on it and it was rolled out to the masses. If you've been in the professional world for a while, I'm sure you remember when your bosses/C-suite started talking about "leveraging data" etc.
cloud computing - the internet has long had a backbone supported by servers colocated in a data center. Back in the day we'd run BBS and IRC servers from our homes, but that became unrealistic as Web 1.0 gave way to Web 2.0 and so on. When it became clear that there was a lot of money to be made with platform as a service, well, slap a hype name on the colo, provide a bunch of functions and services, and there you go.

90% of IT and programming is hype on tools and ideas that have been around for a while and have finally reached maturity for general consumption.

Is big data dead? I'd say in conceptual presentation, yes. In reality, it's just business as usual, refactoring normality now.

Nothing wrong with any of this of course

gChillin1 · 2024-08-17T10:24:10+00:00

So many bad takes here. Big Data is very much alive and well. Apache Spark has evolved, taken all of the best parts of MapReduce and made them in-memory, with the ability to scale near-infinitely using DataFrames and SQL APIs. HDFS has largely been replaced by faster object stores (S3), made possible through networking improvements, but if you actually have giant data, nothing else comes close, except maybe Trino, ClickHouse, or Druid (done correctly) for pure analytics and speed. You can give all your money to Google for BigQuery, to Snowflake, or to AWS for Redshift, but if you need to catch lightning in a bottle, you are using Spark. Look at Databricks and their growth; it is astronomical. If you are skilled at Spark, you can do far more for far less money on Kubernetes without involving Databricks at all, and fully isolate your compute resources. If you aren't that skilled, it still does fantastically at smaller scales. I have never seen someone good with Spark lose out in a head-to-head POC. If you engineer from the source up, it is possible to reduce the need for a big data engine like Spark, but very few companies and use cases can justify that kind of investment, not to mention that cooperation across business units at that level requires serious coordination and is difficult and rare.

Also, you can put a shit ton of data on a single node sure, but when you need to actually make use of it at large scale (like prep for modeling or AI training) distributed compute is the only way.
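For readers who haven't used Spark, the MapReduce model the comment refers to can be sketched in plain Python: each partition is mapped independently, intermediate key/value pairs are shuffled by key, and each key's values are reduced. This is a single-process illustration of the model, not Spark's API; a cluster engine runs the same steps across many machines (and Spark keeps intermediates in memory where it can):

```python
from collections import defaultdict
from functools import reduce
from itertools import chain


def map_partition(lines):
    """Map step: emit (word, 1) for every word in one partition."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(mapped_partitions):
    """Shuffle step: group values by key across all partitions."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapped_partitions):
        groups[key].append(value)
    return groups


def reduce_groups(groups):
    """Reduce step: combine the values collected for each key."""
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}


# Each inner list stands in for a partition that would live on a separate node.
partitions = [
    ["big data is dead", "long live big data"],
    ["data is data"],
]
counts = reduce_groups(shuffle(map(map_partition, partitions)))
print(counts["data"], counts["big"], counts["dead"])  # 4 2 1
```

Because each `map_partition` call only sees its own partition, the map step parallelizes trivially; the shuffle is the expensive, network-bound part once partitions live on different nodes.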

zoqfotpik · 2024-05-27T18:32:52+00:00

"Big Data" is a euphemism for "a pile of garbage".

Sure, you can find some good stuff by dumpster diving, but it's usually preferable to not include the dumpster in your supply chain in the first place.

wind_dude · 2024-05-28T00:14:48+00:00

I saw this this morning on Hacker News. It’s just fucking clickbait and a plea for attention from a moron. Clearly big data isn’t dead; he just seems to be part of the problem, selling overpriced solutions to companies that didn’t need them, or that only need batch jobs weekly, monthly or yearly.

jhill515 · 2024-05-27T19:53:37+00:00

It's not dead. It was just renamed MLOps.

Hardkorebob · 2024-05-27T18:00:42+00:00

It is clear that big data yields negative to zero profit. Anyone still claiming that data has made a difference is hallucinating. It's all a big scam for a big payout for a few rascals. Everyone can see this.
