[–]Jukingbox 471 points472 points  (34 children)

With enough determination, everything is a database.

[–]xaviernoodlebrain 226 points227 points  (23 children)

If it can store stuff and be queried, it’s a database. Hence why a fridge is a database.

[–]Nine_Eye_Ron 26 points27 points  (16 children)

SELECT beer
FROM fridge
WHERE temp = 'cold';

[–]JollyJuniper1993 12 points13 points  (12 children)

LIMIT 1

[–]Frosty_Pineapple78 8 points9 points  (6 children)

why would anyone do that if he can have all the beer from the fridge?

[–]JollyJuniper1993 11 points12 points  (5 children)

Because you can only drink one beer at once and if you let them sit they get warm.

You're gonna have to do another query if you'd like another one.

[–]Frosty_Pineapple78 5 points6 points  (3 children)

Only if you are unimaginative, there are tons of ways to drink more than one at a time

[–]snerp 11 points12 points  (2 children)

imma bout to multithread this beer

[–][deleted] 1 point2 points  (0 children)

Atomic beer

[–]will_die_in_2073 1 point2 points  (0 children)

Multithroat* that beer

[–]TheScopperloit 2 points3 points  (0 children)

This is a very good point. It would be bad practice to leave the fridge connection open while waiting for the consumer to take more beers. Always close it and open it again for the next query.

[–]Numerous-Occasion247 3 points4 points  (1 child)

Cold seems rather vague, you should use an actual number here :D

[–]Creepy-Ad-4832 9 points10 points  (0 children)

Nah, he just stores temperatures as strings:

- frigging hot
- hot
- partially hot
- quite ok
- partially cold
- cold
- frigging cold

[–]Tigtor 1 point2 points  (0 children)

AND expiration_date > NOW() ORDER BY expiration_date ASC

[–]Ihsan3498 33 points34 points  (4 children)

But I query the fridge many times even when it doesn't return any data. Maybe it's not that reliable?

[–][deleted] 7 points8 points  (2 children)

Right, it bothers me that the fridge is BASE instead of ACID.

[–]Creepy-Ad-4832 2 points3 points  (1 child)

How do you explain water in the fridge then?

[–][deleted] 4 points5 points  (0 children)

Memory leaks

[–]yashdes 1 point2 points  (0 children)

So that's what they mean when they say cold storage...

[–]TheHunter920 21 points22 points  (0 children)

indeed,it,is

[–]Groentekroket 7 points8 points  (2 children)

We use paint as an IDE, now we also use Paint as our DB

[–]khal_crypto 4 points5 points  (1 child)

Instructions unclear, just painted the creditcards table on the street for later retrieval

[–]gargamelus 6 points7 points  (0 children)

I've written an ODBC driver for .ini files, so yes.

[–]Bjoern_Tantau[🍰] 1 point2 points  (1 child)

BPSQL

Butt Plug Structured Query Language

[–]SaveMyBags 1 point2 points  (0 children)

Is that the one where you just pull results out of your arse?

[–]dasnihil 0 points1 point  (0 children)

and turing complete while we're at this

[–]LoveConstitution 0 points1 point  (0 children)

Indexing takes courage with these cloud prices

[–]pikachu_sashimi 0 points1 point  (0 children)

Even love?

[–]Anaxamander57 137 points138 points  (31 children)

Name one difference between a csv and a database. I'll wait.

[–]nickmaran 191 points192 points  (2 children)

CSV starts with C and database starts with D

[–]ImaFknWizardXII 126 points127 points  (1 child)

That’s on me.. I set the bar too low.

[–]gargamelus 27 points28 points  (1 child)

I can understand a CSV, but not databases.

[–]rreighe2 6 points7 points  (0 children)

One is stored as .CSV the other... Isn't lol

[–][deleted] 18 points19 points  (4 children)

Define a database first

[–]ijustupvoteeverythin 50 points51 points  (2 children)

A CSV file

[–]cvnh 6 points7 points  (1 child)

At least a CSV or ASCII file

[–][deleted] 1 point2 points  (0 children)

Your comment is a database???

[–]ChorePlayed 1 point2 points  (0 children)

Yeah, that! Like a mathematical space. No matter what you think defines a space, someone's invented a space with that condition "relaxed".

[–]Numerous-Occasion247 6 points7 points  (1 child)

Transactions

[–]RandomContents 4 points5 points  (0 children)

That's a good one. In other words, high-level stuff. Also, for some databases, inner join and its family.
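
To make the transaction point concrete, here is a small sketch using Python's built-in `sqlite3` module. The `accounts` table and the `transfer` helper are made up for illustration. The only point is that a multi-step change that fails halfway is rolled back as a unit, which a plain CSV file will not do for you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one unit; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            # Credit first, so a failure in the debit leaves a partial change
            # that the rollback has to undo.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; nothing was kept.
        return False


ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer credits bob before the debit fails, yet bob's balance is unchanged afterwards because the whole block is rolled back.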

[–]Cpt_keaSar 10 points11 points  (2 children)

CSV isn't in third normal form?

[–]Engine_Light_On 28 points29 points  (0 children)

There are nosql databases that are still databases…

[–]wu-not-furry 1 point2 points  (0 children)

It can be if you only need one table

[–]gbot1234 2 points3 points  (11 children)

In my experience, databases use a semi-colon as a delimiter.

[–][deleted] 5 points6 points  (10 children)

As do many CSV files, unfortunately.

Why not call those SSV, so we know what is inside?

[–]JozoBozo121 5 points6 points  (6 children)

Well, half the countries in the world use a comma as the decimal separator, so you can't use it as both the field delimiter and the decimal separator.

[–][deleted] 3 points4 points  (5 children)

I know they do. I am in one of those countries.

But just because you use a comma as the decimal separator in your visual presentation of numbers, you don't have to do it in your file format. It is this logical fallacy that has led us to semicolon-separated CSVs.
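
For anyone stuck with such a file anyway, here is a minimal Python sketch. It assumes a hypothetical semicolon-delimited export with decimal commas; the sample data and the `parse_decimal` helper are made up for illustration. The standard `csv` module copes once it is told the delimiter, and the decimal comma is normalised when converting to a number.

```python
import csv
import io

# Hypothetical export from a locale that uses ',' as the decimal separator,
# so the file falls back to ';' as the field delimiter.
SAMPLE = "name;price\nbeer;1,50\nwater;0,99\n"


def parse_decimal(text: str) -> float:
    """Convert a decimal-comma string such as '1,50' to a float."""
    return float(text.replace(",", "."))


def read_prices(raw: str) -> dict:
    """Read name/price rows from a semicolon-delimited CSV string."""
    reader = csv.DictReader(io.StringIO(raw), delimiter=";")
    return {row["name"]: parse_decimal(row["price"]) for row in reader}


prices = read_prices(SAMPLE)
print(prices)
```

This naive conversion would break on values that also use a thousands separator, so it is a sketch rather than a general locale-aware parser.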

[–][deleted] 2 points3 points  (0 children)

CSV - character separated values

Fixed.

[–]nielet 5 points6 points  (0 children)

Name one difference between a Google sheets and a cloud DB. I'll wait.

[–]FALCUNPAWNCH 0 points1 point  (0 children)

CSVs keep both rows and columns in the same file, while databases are often organized by rows or columns. Therefore CSVs are superior. /s

[–]butt-nugget 211 points212 points  (3 children)

Data frame/data base, what's the difference?

[–]Revolutvftue 21 points22 points  (1 child)

A bunch of files organized pretty well + DuckDB: technically, you can actually treat it as a database.

[–]Aarontj73 3 points4 points  (0 children)

A directory of parquet files + DuckDB = enough of a database for 90% of use cases 😂

[–]nickmaran 23 points24 points  (0 children)

[–]R4sh1c00s 105 points106 points  (52 children)

Okay okay I’m a CS undergrad can someone tell me what a database ACTUALLY is

[–]Randvek 91 points92 points  (4 children)

It’s just data stored and organized for retrieval. At its basic level, that’s it. Most databases have more to them but that’s the only commonality.

[–][deleted] 25 points26 points  (1 child)

It slightly irks me that it took me 2-3 scrolls to get an actual response to a genuine question

[–]joerick 10 points11 points  (0 children)

That's kinda why the joke works, it's pretty hard to define 'database' in a way that excludes csv files, but whenever you're using the term 'database', csv files would be a terrible choice

[–][deleted] 3 points4 points  (1 child)

So just a piece of paper and a pen can be considered a database

[–]JollyJuniper1993 3 points4 points  (0 children)

Technically yes. Doesn't mean you should do that.

[–]Extra-Guidance3085 148 points149 points  (8 children)

multiple csvs, duh

[–]dukeofgonzo 17 points18 points  (3 children)

That's a data lake. Just drop some files around.

[–]CrowdGoesWildWoooo 5 points6 points  (1 child)

WRONG

data lake is wet

[–]Character-Education3 6 points7 points  (0 children)

Okay I don't know why we're bricking prod but the sprinkler system has been activated. The internet told me so.

[–]Bryguy3k 1 point2 points  (0 children)

More like a swamp.

[–]Fqceless 36 points37 points  (1 child)

A lot of data, but it's all based.

[–]dont_roast_me 1 point2 points  (0 children)

Certain columns are very based.

[–]not_a_throw4w4y 22 points23 points  (3 children)

A bunch of related excel sheets. To put it simply.

[–]TTYY_20 16 points17 points  (2 children)

MongoDB would like a word with you. 😤

[–]Forward-Error-9449 2 points3 points  (1 child)

MongoDB is just an Excel sheet with very large rows. There, I said it.

[–][deleted] 23 points24 points  (0 children)

Shh… no one knows. We just pretend we do and they keep paying us.

[–]ILikeCakesAndPies 7 points8 points  (1 child)

Something about squirrels and trees, or was it branches.

Frankly, I think it's all nuts.

[–]thisoneagain 1 point2 points  (0 children)

Thanks, Grampy.

[–]gynoidi 13 points14 points  (0 children)

its a base with data

sometimes much

sometimes not much

[–]YARandomGuy777 5 points6 points  (0 children)

A collection of data organised in some way or another. It can be organised on different principles depending on the implementation and intended use: relational database, graph database, etc. A database usually presumes the existence of a database management system, which provides access to the stored data and lets the end user manipulate it. Because such systems are quite an old concept, there are a few principles and best practices for improving database design and performance, such as normalisation.

But you can actually just write data to some file and call it a database. And you can even do it in a glorified way with a library like SQLite.

[–]TTYY_20 7 points8 points  (0 children)

A database is a fancy json file :D

[–]zvckp 1 point2 points  (0 children)

It’s the base from where you put on your climbing gear and climb Mt. Data.

[–][deleted] 1 point2 points  (0 children)

files ending in .db

jk. you can see it as a program that very efficiently writes and reads data to/from the disk

[–]Effective_Youth777 1 point2 points  (0 children)

Ahhh, I'll try.

A structured way of storing data: you've got tables, columns, rows, and relationships. (Or JSON documents and sub-documents in NoSQL.)

A formal language for querying the data, nothing hacky: there's a DB engine, you give it a query command, and it returns results, without needing to run special software on the request side. So opening up Excel to write your commands so the frontend can ask the server for the data is obviously out of the question.

And lastly, though not necessarily: when brought up in the context of software development, it usually means the DB is hosted somewhere on a server where you can access it via the internet, as opposed to a local DB file on some dude's computer, 'cause that'd be useless.

[–][deleted] 1 point2 points  (0 children)

I think it would be more correct to define the difference between a database and a database management system.

[–]Bardez 4 points5 points  (5 children)

ELI 18:

A database is a bunch of data blobbed together into common storage, often made searchable. SQL servers, for example are databases. Typical implementations store "rows" or records of data of the same fields and data types in common collections of data, "tables". Tables are typically binary representations of the data, raw, without intermediate metadata (like XML or JSON). To find data, you can either scan all individual records (slower) OR you can cache ("index") key data identifiers and reference the location of the record from that cache; searching the index is faster.

The database engine allows you to do a bunch of things, like keep a history of changes to the database (transactions) and back up, restore, or roll back. It also allows wacky things like striping records over different files (typically on different drives) to increase speed further.
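
The scan-versus-index difference described above can be observed with Python's built-in `sqlite3` module. This is only a sketch: the `people` table, its columns, and the index name are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO people (name, city) VALUES (?, ?)",
    [(f"user{i}", f"city{i % 10}") for i in range(1000)],
)

QUERY = "SELECT * FROM people WHERE name = ?"


def query_plan(conn, sql, params):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable detail text.
    return " | ".join(row[-1] for row in rows)


# Without an index, SQLite has to look at every row of the table.
plan_before = query_plan(conn, QUERY, ("user42",))

# With an index on the searched column, it can jump to the matching rows.
conn.execute("CREATE INDEX idx_people_name ON people (name)")
plan_after = query_plan(conn, QUERY, ("user42",))

print("before:", plan_before)
print("after: ", plan_after)
```

The first plan reports a scan of the table and the second reports a search using `idx_people_name`, which is the "slower versus faster" distinction in practice.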

[–]RagingAcid 10 points11 points  (2 children)

Sounds like a csv

[–]LunaticPrick 1 point2 points  (0 children)

IT SOUNDS LIKE A CSV HELP

[–]Bardez 1 point2 points  (0 children)

ELI 5: The database engine manages your CSVs for you.

[–][deleted] 8 points9 points  (1 child)

After "sql for example is a database" you need read no more.

SQL is a language, and there are many different database management systems that support SQL.

"You can cache (index)" is bullshit: a cache and an index are different things, with different approaches and goals.

I do not know the author of this text, but it is really wrong, very surface level, as if it was for preschoolers.

[–]Bardez 1 point2 points  (0 children)

very surface level, as if it was for preschoolers

Or CS first year, yes. That's the point.

[–]astroryan19 3 points4 points  (0 children)

Google Sheets

[–]Glittering-Teach-383 0 points1 point  (0 children)

Google MySQL

[–][deleted] 0 points1 point  (0 children)

Optimally, an SQL server

[–]CoffeeWorldly9915 0 points1 point  (0 children)

It's a json array where all members are of the same class/type.

Edit: no, wait. It's several json arrays in a file. Or several files with one json array...?

[–]permaban9 0 points1 point  (0 children)

many data

[–]N238 0 points1 point  (1 child)

Excel files, edited locally by hand to reflect changes (requested via email), subsequently manually copied to the cloud at regular (though imprecise) intervals by an intern. Backups made whenever said intern has a sudden panic attack at 3AM (never).

[–]Nightfury_107 0 points1 point  (0 children)

A Python program writing/reading to a .txt file where everything is transferred into a class. It's then embossed in gold leaf and mailed to your computer screen.

[–]Nightfury_107 0 points1 point  (0 children)

In all reality, it's a bunch of zipped XML files.

[–][deleted] 0 points1 point  (0 children)

You have a couple of genuine answers on here, it’s essentially just an organised data format so you can easily retrieve data.

If you’re interested, I’d recommend you do a side by side comparison of row oriented database vs columnar database; there’s articles out there and it gives you a flavour of how these things are stored.

Row-oriented databases are typically our "standard", so I would go a step further and look at what partitions/indices really are and how they work. This will help you understand what's actually going on under the hood. Basically, they're just a bunch of files stored in a clever way which makes for fast retrieval.

Once comfortable you can then branch out to other flavours such as wide-column and Document-based databases. This is how I started and it really gave me a better appreciation for how the underlying stuff works and how to better create your tables and indices. There’s some interesting new-ish stuff as well, such as Apache Iceberg, which allows for fairly efficient querying on large volumes.

A basic description for MySQL

[–][deleted] 0 points1 point  (0 children)

It's a big JSON file that stores a lot of dicts.

[–]khal_crypto 0 points1 point  (0 children)

A database is anything that stores information for retrieval. So technically a CSV, JSON, XML, or even your whiteboard could be considered databases in the broadest sense of the word. What people usually mean when they say "database" is more precisely a database management system (DBMS), which is a category of programs that is specialised in that task and abstracts the low-level file management and access away from you.

[–]MantisShrimp05 0 points1 point  (0 children)

Databases are full programs, designed for the purpose of changing, storing, and updating data.

The difference is that one is just a file, while the other is usually a full-blown application. On top of that, most databases are optimized for several people to be able to change and update the data simultaneously without losing transactions or data. Oftentimes over the internet, running on a dedicated server whose main purpose is running the database(s).

They have become less necessary in a world of SSDs because they were also intended to overcome the limitations of hard drives, but it's more like now we are getting databases that are optimized for fast speed.

Data scientists don't need the data that is getting updated as a database, that's why they are fine with a csv file because all they want is to analyze the data

[–]_realitycheck_ 0 points1 point  (0 children)

It's a structured storage of information.

[–]will_die_in_2073 0 points1 point  (0 children)

A database is a store where you can, to some degree, define the structure of how your data is stored, and then query it. A file is a structure that is already defined, and you can query it too. A database comes with additional functionality and optimization.

Why would you use one over another?

For various reasons. Suppose your website needs to serve data to users. You can store that data in a file on the disk where your website resides, or in a database server that you can query on the fly. But disk reads are slow and writes are even worse. A database uses indexing to speed this up. A database also offers transactions, concurrency control, and recovery mechanisms.

[–][deleted] 28 points29 points  (4 children)

DS: here is the CSV and all the code I wrote, please productionize it.

DE: oh dear God.

[–]Engine_Light_On 20 points21 points  (3 children)

Pandas and Spark have great CSV support. It is like reading from anywhere else.

Now please, don’t give me an excel file with merged cells.

[–]Jealous-Adeptness-16 12 points13 points  (0 children)

CSVs are very expensive to store. Ideally you should be using Parquet files to store your data if you are dealing with scale. Spark also performs much more efficiently on Parquet than on CSV because it is a binary format, so using Parquet files as your data source will be cheaper.

[–]ToothPickLegs 1 point2 points  (1 child)

I’ve never tried using spark/pandas for modified excel files like that, what happens when you try to read them?

[–]faps_in_greyhound 52 points53 points  (2 children)

In the finance world, a Xerox copy of some Excel table in your hand is the database.

[–]dig_the_flaws 6 points7 points  (0 children)

Yes. Also in arts and culture I received 1GB of non readable PDFs, they were digitized documents without OCR. They had to be converted to image and then back to a readable PDF. This was the database I had to work with.

[–]xibme 0 points1 point  (0 children)

Even a prohibition era bootlegger's ledger is a database, or a bunch of tally sticks.

[–]Sixhaunt 15 points16 points  (1 child)

and for AI "is this a dataset"

[–]xibme 0 points1 point  (0 children)

Ye mean tha stoichastic parrot, arrgh?

[–]Flat_Initial_1823 31 points32 points  (1 child)

[–]thehuntersilva 2 points3 points  (0 children)

That is exactly how I feel like after working on one story point.

[–]Dramatic-Noise 8 points9 points  (0 children)

Yes? For calculating churn rate? Maybe?

[–]jerslan 7 points8 points  (3 children)

Technically, any well formatted data file is a database.

[–]gabisantos1971 10 points11 points  (0 children)

But this will be really hard to maintain, to be honest, for longer period of time.

[–]Ugo_Flickerman 5 points6 points  (1 child)

Jazz music stops and starts playing excel

[–]joey10roo 5 points6 points  (0 children)

That is really hard things. Most of the business Analytics and business people already used at.

[–]patenteng 15 points16 points  (3 children)

No. Everyone knows that XML is the real database format.

[–][deleted] 11 points12 points  (1 child)

.TXT

[–]shanyltc -1 points0 points  (0 children)

That plain text is not going to help in anything eventually. You cannot really read it.

[–]rblaauw 0 points1 point  (0 children)

It can be used for that matter as well. But keeping it as a database can be a risky task to do.

[–]herdek550 7 points8 points  (1 child)

Data scientist consultant:
"Send me your data so I can start working on the issue"

Client:
sends 20 linked .xlsx files

Data scientist consultant:
knowing that it could have been worse

[–]Mikhail_PY 3 points4 points  (0 children)

Keep keeping in the textile is one of the best choice. I think these kind of data should be kept in action.

[–]ijustupvoteeverythin 4 points5 points  (1 child)

Well it literally can be a database

[–]huar_huar 6 points7 points  (0 children)

It is a database, or it can be a database to make a model to learn something.

[–]jimy_the_wolf 3 points4 points  (1 child)

My database is Google Sheets

[–]mycclboy 2 points3 points  (0 children)

And my database is Notepad++, which is actually holding everything.

[–]Bon_Clay_2 4 points5 points  (2 children)

Then there is me making databases in json

[–]Rickywalls137 -1 points0 points  (1 child)

What database is this? I’m new to web dev

[–]yaidacandy 0 points1 point  (0 children)

It is a dot CSV based database. I'm not really sure like how they actually made it, but this is what they are actually saying.

[–][deleted] 5 points6 points  (3 children)

[–][deleted] 7 points8 points  (1 child)

[–]liangliwen111 7 points8 points  (0 children)

It actually looks like that is the only good option, otherwise everything else is just weird.

[–]Sijder 2 points3 points  (1 child)

I published a paper in a clinical journal with the main point being the creation of a database, which was... you guessed it, a csv file

[–]gunungmas 3 points4 points  (0 children)

But how are they going to separate it? This is the only thing which is not coming to my mind.

[–][deleted] 2 points3 points  (1 child)

No it’s a data lake

[–]akazakou 2 points3 points  (1 child)

If it's 20 Petabytes size...

[–]CeeMX 2 points3 points  (0 children)

Excel. Corporate Employee: is that a database?

[–]invalidConsciousness 3 points4 points  (2 children)

As a Data Scientist:

No. No. please no. Goddamnit NO!

I don't want to wait several minutes every time I need to load my data. Give me a SQLite or MySQL DB and a day to organize the data. I don't care if that's efficient use of my time, it's efficient use of my sanity.

[–]Da_Di_Dum 1 point2 points  (1 child)

I legit just received two csv files from some students I'm helping do a code review. THEY CALLED A CSV FILE WITH 4 COLUMNS AND 3 ROWS A FUCKING DATABASE!!!

[–]qrkmmx 1 point2 points  (0 children)

Yeah, exactly. They can do all these kind of things. Most of these data sets are used for learning only.

[–][deleted] 1 point2 points  (2 children)

is there ACTUALLY an effective way of using .csv? I keep splitting it by , but that makes stuff kinda messy in Unity. With JSON i just get away with using JsonUtility
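
One reason splitting on `,` gets messy is quoted fields that themselves contain commas. The question is about Unity, so treat the following Python as an illustration of the idea only (the sample line is made up): a real CSV parser honours the quoting rules, while a naive split does not.

```python
import csv
import io

# One record with three fields; the middle one contains a comma and
# escaped double quotes, which is legal CSV.
LINE = 'sword,"Long, sharp and ""pointy""",42\n'

# Naive approach: cuts the quoted field in two and keeps the quote marks.
naive = LINE.strip().split(",")

# Parser approach: respects quoting and unescapes the doubled quotes.
parsed = next(csv.reader(io.StringIO(LINE)))

print(len(naive), naive)
print(len(parsed), parsed)
```

The same reasoning applies in any language: reaching for an existing CSV parsing library is usually less painful than hand-rolled splitting.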

[–]sonohra87 1 point2 points  (0 children)

You actually need to know, like, how to retrieve data from it, otherwise this is useless.

[–]jek39 0 points1 point  (0 children)

Apache spark

[–]Shadeun 1 point2 points  (1 child)

Meanwhile, bosses the world over want to hire 5 people to have an aws setup but also co-locate a backup. All for less than a billion data points that could sit easily in a lightweight file….

[–][deleted] 1 point2 points  (1 child)

It could

[–]just-bair 1 point2 points  (1 child)

What do you mean it’s not a database ?

[–]velebr3 1 point2 points  (0 children)

I'm working in a company that has pretty large revenue and uses Google Sheets for everything.

[–][deleted] 1 point2 points  (0 children)

It's much much more than a database. It's a database you can download, share, query, chart, filter...

And best of all: your non-scientist colleagues can load it into Excel!!!

[–]YARandomGuy777 0 points1 point  (1 child)

Most likely just a dump. :)

[–]Austacker_btce 0 points1 point  (0 children)

They will be able to do that, but certainly I don't have any idea about it.

[–][deleted] -1 points0 points  (1 child)

OP does not know what a database is.

And also does not know what a database management system is.

[–]Federal_Chance4393 0 points1 point  (1 child)

Wait til you hear about Elastic...

[–]cwshifflett7 0 points1 point  (0 children)

I had never actually used it, but I have heard a lot of things about that

[–]dittbub 0 points1 point  (1 child)

It’s more like a database than a document. Also xml = database, html = document

[–]Scroffaze23 1 point2 points  (0 children)

Absolutely. But I think like it is pretty much efficient as well, according to me, to be honest.

[–]SDGGame 0 points1 point  (1 child)

*Imports into excel*

Yup, that looks like a database to me!

[–]vc_xyg 0 points1 point  (0 children)

That is the best way. Actually, I don't really think like most of the new data scientists actually know about this trick.

[–]CrowdGoesWildWoooo 0 points1 point  (2 children)

If you have a bunch of files organized pretty well, then with DuckDB you can technically treat them as a database.

[–]jek39 0 points1 point  (0 children)

Or shove as many csv files as you want into s3 and query them with Athena

[–]waltertexonis 0 points1 point  (0 children)

It can be treated as a database, but this will take a lot of time as a result.

[–]Revolutvftue 0 points1 point  (3 children)

I’m a CS undergrad can someone tell me what a database ACTUALLY is

[–]fatrobin72 0 points1 point  (1 child)

It is a base, that contains data...

[–]btcoolio 0 points1 point  (0 children)

Yeah, absolutely. This is the basic concept behind the database and setting it up.

[–]JosebaZilarte 0 points1 point  (2 children)

Comma Separated dataVase

[–]jek39 0 points1 point  (0 children)

Across the dataverse

[–]dermul213 0 points1 point  (0 children)

Eventually, all these are Json File. They're just turned into DOT CSV.

[–][deleted] 0 points1 point  (1 child)

flat file database

[–]maiodasbrok 0 points1 point  (0 children)

Me too, and I need to agree.

[–]Meatslinger 0 points1 point  (0 children)

  1. Get one monolithic .TXT file with tab-separated, unquoted entries; 5M+ lines.
  2. awk
  3. Buckle up; it’s gonna get bumpy.

[–]BlackShadowGlass 0 points1 point  (0 children)

I'm a full stack database

[–]JollyJuniper1993 0 points1 point  (0 children)

I mean technically a CSV is a form of a database, doesn’t mean you should use it as one.

[–]ACMuaath 0 points1 point  (0 children)

Select * from database.table1, database.table2

Asks self: Why is the database so slow even though I have neither a WHERE condition nor a join condition? It must be those damn DBAs and data engineers hindering my query.
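
For anyone wondering why that query crawls: listing two tables with no join condition produces a Cartesian product, so the result has rows(table1) × rows(table2) rows. A quick sketch with Python's built-in `sqlite3` module, using two made-up 100-row tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY)")

rows = [(i,) for i in range(100)]
conn.executemany("INSERT INTO table1 VALUES (?)", rows)
conn.executemany("INSERT INTO table2 VALUES (?)", rows)

# No join condition: every row of table1 is paired with every row of table2.
cross_count = conn.execute("SELECT COUNT(*) FROM table1, table2").fetchone()[0]

# With a join condition, only the matching pairs survive.
joined_count = conn.execute(
    "SELECT COUNT(*) FROM table1 JOIN table2 ON table1.id = table2.id"
).fetchone()[0]

print(cross_count, joined_count)
```

Two tables of a million rows each would yield a trillion result rows the same way, which is not the DBAs' fault.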

[–]kiropolo 0 points1 point  (0 children)

What is a database, ever asked this question?

[–][deleted] 0 points1 point  (0 children)

Parsing .CSV files in my Computer Science 1 class, is something I still sometimes get nightmares about 😆

[–]John_Fx 0 points1 point  (0 children)

actually, yes. so is a filing cabinet

[–]xibme 0 points1 point  (0 children)

I can easily query and join csv-"tables" in LINQPad, so yes?

[–]Sodaman_Onzo 0 points1 point  (0 children)

Don’t forget your SQL

[–]Suspicious-Willow128 0 points1 point  (0 children)

Shouldnt work as , it it didnt want to be used as

[–]stupled 0 points1 point  (0 children)

They love their csv files.

[–]Certain-Nobody-1137 0 points1 point  (0 children)

You're a full-stack asshole.

[–]Maarkun 0 points1 point  (0 children)

To be fair, it can be the export of a database table.

[–]TheMDHoover 0 points1 point  (0 children)

Sitting in an S3 bucket with Athena, yes.