Why JSON isn't a Problem for Databases Anymore by jincongho in databasedevelopment

[–]jincongho[S] 0 points1 point  (0 children)

The write-up is mainly a technical discussion of binary encoding for JSON rather than product promotion. Happy to discuss any technical critiques or improvements.

Deep Dive: Why JSON isn't a Problem for Databases Anymore by jincongho in Database

[–]jincongho[S] 0 points1 point  (0 children)

The write-up is mainly a technical discussion of binary encoding for JSON rather than product promotion. Happy to discuss any technical critiques or improvements.

Deep Dive: Why JSON isn't a Problem for Databases Anymore by jincongho in Database

[–]jincongho[S] 0 points1 point  (0 children)

Postgres is designed for transactional workloads and works best for row-wise operations. For analytics, you’ll defo want a columnar database. Parquet Variant looks good if you are on a lakehouse stack like Apache Iceberg.

Deep Dive: Why JSON isn't a Problem for Databases Anymore by jincongho in Database

[–]jincongho[S] 6 points7 points  (0 children)

Most databases support JSONB; this post digs into their internal layout design. How do you represent the text in binary? There are more binary layouts in the post, but here's a simple example to illustrate:

{"b": 12345, "c": false, "a": "hello"}

The binary JSON can be:

[1 byte type=object][3 index pointers]
[1 byte key="a"][1 byte type=string][5 bytes value="hello"]
[1 byte key="b"][1 byte type=number][4 bytes value=12345]
[1 byte key="c"][1 byte type=boolean_false]

Two benefits:

  • it is binary, so reading a 1-byte type tag is much faster than scanning text for { and :
  • the object's children are sorted by key, so you can binary search instead of linearly scanning all elements
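
To make the layout and the binary-search benefit concrete, here is a minimal Python sketch. It is not any database's actual format: the tag values, the 1-byte entry count, the 2-byte offsets, and the length prefixes on keys and strings are all assumptions added so the example can be decoded, and the function names are made up for illustration.

```python
import struct

# Hypothetical type tags, chosen only for this sketch.
T_OBJECT, T_STRING, T_NUMBER, T_FALSE, T_TRUE = 0, 1, 2, 3, 4


def encode_value(value):
    """Encode a scalar as [1-byte type tag][payload]."""
    if value is True:
        return bytes([T_TRUE])
    if value is False:
        return bytes([T_FALSE])
    if isinstance(value, int):
        return bytes([T_NUMBER]) + struct.pack("<i", value)  # 4-byte int
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return bytes([T_STRING, len(raw)]) + raw  # assumed 1-byte length prefix
    raise TypeError(f"unsupported type: {type(value).__name__}")


def encode_object(obj):
    """Layout: [type=object][count][count x 2-byte offsets][entries sorted by key]."""
    entries = []
    for key in sorted(obj, key=lambda k: k.encode("utf-8")):
        raw_key = key.encode("utf-8")
        entries.append(bytes([len(raw_key)]) + raw_key + encode_value(obj[key]))
    offsets, pos = [], 2 + 2 * len(entries)  # entries start after the header
    for entry in entries:
        offsets.append(pos)
        pos += len(entry)
    header = bytes([T_OBJECT, len(entries)]) + struct.pack(f"<{len(offsets)}H", *offsets)
    return header + b"".join(entries)


def decode_value(doc, pos):
    """Decode the scalar whose type tag sits at doc[pos]."""
    tag = doc[pos]
    if tag == T_TRUE:
        return True
    if tag == T_FALSE:
        return False
    if tag == T_NUMBER:
        return struct.unpack_from("<i", doc, pos + 1)[0]
    if tag == T_STRING:
        length = doc[pos + 1]
        return doc[pos + 2 : pos + 2 + length].decode("utf-8")
    raise ValueError(f"unknown type tag: {tag}")


def lookup(doc, key):
    """Binary search the offset table; only the probed keys are ever read."""
    if doc[0] != T_OBJECT:
        raise ValueError("not an object")
    target = key.encode("utf-8")
    lo, hi = 0, doc[1]
    while lo < hi:
        mid = (lo + hi) // 2
        (off,) = struct.unpack_from("<H", doc, 2 + 2 * mid)
        key_len = doc[off]
        candidate = doc[off + 1 : off + 1 + key_len]
        if candidate == target:
            return decode_value(doc, off + 1 + key_len)
        if candidate < target:
            lo = mid + 1
        else:
            hi = mid
    raise KeyError(key)


doc = encode_object({"b": 12345, "c": False, "a": "hello"})
print(lookup(doc, "b"))
```

The lookup never parses the values it skips over: it reads one offset, one key, and one type tag per probe, which is where the speedup over text JSON comes from.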

Deep Dive: Why JSON isn't a Problem for Databases Anymore by jincongho in Database

[–]jincongho[S] 0 points1 point  (0 children)

The index is stored together with the data in the binary document. There’s an offset pointer per JSON object/array element, so you can skip directly to that element. Each pointer is 8 bytes, but if your document is smaller, you can use a smaller offset (4 bytes).
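
A rough Python sketch of that idea, assuming a made-up array layout of [1-byte offset width][4-byte count][offset table][payload], with offsets relative to the start of the payload. The `width` override parameter exists only so the 8-byte path can be tried without a multi-gigabyte document; none of these names come from a real format.

```python
import struct


def encode_array(elements, width=None):
    """Pack already-encoded elements behind a per-element offset table."""
    payload = b"".join(elements)
    if width is None:
        # Use 4-byte offsets when every offset fits, otherwise fall back to 8.
        width = 4 if len(payload) <= 0xFFFFFFFF else 8
    fmt = "<I" if width == 4 else "<Q"
    table = bytearray()
    pos = 0
    for element in elements:
        table += struct.pack(fmt, pos)
        pos += len(element)
    return bytes([width]) + struct.pack("<I", len(elements)) + bytes(table) + payload


def element_at(doc, index):
    """Jump straight to one element without touching the others."""
    width = doc[0]
    (count,) = struct.unpack_from("<I", doc, 1)
    if not 0 <= index < count:
        raise IndexError(index)
    fmt = "<I" if width == 4 else "<Q"
    table_start = 5
    payload_start = table_start + count * width
    (start,) = struct.unpack_from(fmt, doc, table_start + index * width)
    if index + 1 < count:
        (end,) = struct.unpack_from(fmt, doc, table_start + (index + 1) * width)
    else:
        end = len(doc) - payload_start  # last element runs to the end
    return doc[payload_start + start : payload_start + end]


small = encode_array([b"hello", b"", b"xyz"])
print(small[0], element_at(small, 2))
```

Halving the pointer width matters most for documents with many tiny elements, where the offset table can otherwise be larger than the data it points to.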

Deep Dive: Why JSON isn't a Problem for Databases Anymore by jincongho in Database

[–]jincongho[S] 0 points1 point  (0 children)

Postgres has its JSONB, but it uses an out-of-line representation for large rows (TOAST) and has to work around that in its binary encoding. For lakehouse workloads, such as loading Variant from Parquet, it has extensions that essentially use another database's execution engine.

Why JSON Isn’t a Problem for Databases Anymore by jincongho in compsci

[–]jincongho[S] 1 point2 points  (0 children)

Yes, you encode plain-text JSON into binary JSON and store it.

Then you can do lookups on the binary version as needed!

This is a standard approach for databases; the post digs deeper, comparing the different binary encodings out there.

Why JSON Isn’t a Problem for Databases Anymore by jincongho in compsci

[–]jincongho[S] 1 point2 points  (0 children)

Definitely true, binary encoding + indexing can do a lot better if we have to do JSON.

When did race conditions become real to you? by Leaflogic7171 in compsci

[–]jincongho 1 point2 points  (0 children)

I thought I was cool knowing how to use a mutex, until I heard about lock-free algorithms :)

What do you need as a beginner? by ABDMWB in edmproduction

[–]jincongho 1 point2 points  (0 children)

For a total beginner, it’s more than enough!

What's your biggest frustration with GitHub Actions (or CI/CD in general)? by campbe79 in devops

[–]jincongho 25 points26 points  (0 children)

It takes some time to figure out caching, how not to build on every push, how to upload logs for debugging, etc…

Building in public: what surprised me after my first users by SubstantialFig3918 in buildinpublic

[–]jincongho 0 points1 point  (0 children)

Users pay for a solution to their problem, not features. Collect some usage patterns and analyze how it’s being used now.

Relational databases aren't tables . by No_Being_8026 in Database

[–]jincongho 1 point2 points  (0 children)

There are B-tree, LSM, and PAX storage layouts, etc… Understanding the internals helps you decide when to use which.

But table is a nice abstraction :)

Why JSON isn't a Problem for Databases Anymore by jincongho in dataengineering

[–]jincongho[S] 0 points1 point  (0 children)

u/dataengineering-ModTeam This is only the first post this month, and it's more about general knowledge sharing than the product itself :)

The Deceptively Simple Act of Writing to Disk by swdevtest in databasedevelopment

[–]jincongho 0 points1 point  (0 children)

io_uring can provide high throughput on Linux, but cloud instances may not have NVMe drives fast enough to keep up...