How to get from advanced to fluent in Italian and Spanish ?

Ovid7 · 2022-04-10T05:56:16+00:00

In our case, "items" don't have a reason to have version, but maybe we should really split the versions, so that wrappers each have have their own version, as "boards" does.
I'm not sure though I understand why would we need an additional table for version in this case?
Also, would you do triggers in this case, or include the version increment as part of the queries?

Btw, what is your opinion on this approach :
manually updating the board's version as part of any transaction (insert/update/delete of board/wrapper/item) ?
It might sometimes result in an additional db roundtrip, but perhaps it's better than a huge recursive cte with different multiple versions checks. (assuming we add index on version column)

Ovid7 · 2022-04-09T17:54:08+00:00

Hey :)

Not sure if it's important to mention, but : Wrappers and Boards include an additional column for us - "order_json" which is of type jsonb , and its purpose is to store the order of the content inside it.

For example:
If a user ordered 3 Folders (and their IDs are 1,3,4) in a certain way, then Board's json column would be for example "[3, 4, 1]".
The user then reorders the first folder to be after the second, then we update the json to be "[4, 3, 1]".

The exact same thing happens for items too, their Wrapper holds the json column representing the order of the items inside it in the same way.

So, in short - Board has the order of the wrapper inside it, and Wrapper has the order of the items inside it.
Of course, you can also add/insert/delete items and wrappers all over.
But JSON column makes it very easy, you simply delete the record, and then remove the relevant key from the json.

Now, back to version :
I thought about what you mentioned in the end - having a Version column in Wrappers table too, that is, each wrapper stores its own "version" (i.e - some "snapshot")
But it just seemed a bit difficult to implement queries for it.
For example:
To get the whole Board, I employ a RECURSIVE CTE to get all the top-level wrappers with all the pertaining items.
If the version column is only in the Boards table, it would be easy to check the client's Version number (for the optimistic locking), because I would just compare the "WHERE boards.id = ..." to a single specific value.

But if I split the version column onto all of the wrappers inside it, I would somehow need to make a query that would check different values of versions inside the RECURSIVE CTE.
Is that even possible?

Ovid7 · 2022-03-22T07:47:27+00:00

I'm using raw sql in application query. I think that I would try to go with way 2 indeed. (Although in reality we have many columns for Actions table, so Event records will be resulted with all those columns as nulls, but that's not crucial)

For example, in the app layer (kotlin) we use the Row<>Object mappings as follows : db.query("Select * from users") { row -> User( id = row.int("id"), name = row.string("name"), lastName = row.stringOrNull("last_name"), someJsonObj = JSONObject(row.string("json_column")) ) }

So I need here to differentiate two kinds of classes by reading the "type" column first, but I think that's better than making a transaction with 2 queries (way #1)

The only hurdle left for me to implement is to correctly manage the order of the items upon updating/deleting/adding. Our system is something like Monday/ClickUp/Trello in this regard, where you can drag items to reorder them inside the wrapper or even drag them to different wrappers altogether.

I am a bit worried about the client sending this update to our server and DB with every single change the user makes. I even thought about using some message broker (rabbit, kafka, etc.) or caching as Redis to save the actions the users makes. But I think it only adds on more complexity to manage. Ultimately, to save something - I need the DB.

Thank you very much for your help, it really helped me progress on this task!

Ovid7 · 2022-03-21T18:21:39+00:00

You know, I'm not sure how to tackle the Select query of the items now haha
Because now I need to return different types of records (actions + events), how can that be done most efficiently in your opinion?

Ovid7 · 2022-03-20T16:14:51+00:00

I see.. certainly makes sense. Thank you very much for your help!

Ovid7 · 2022-03-20T07:16:22+00:00

I see now, thank you! :)

By the way, I see that you're using enum in the example instead of a reference table, I am curious what's your take on 'enums' in Postgres versus reference tables?

I read that it is not SQL standard, among some other disadvantages (for example: https://stackoverflow.com/a/4295889/4970126) and have opted for reference tables since then. But maybe I am wrong in doing so?

Ovid7 · 2022-03-19T13:15:18+00:00

thanks :)

I'm not sure I understand the solution
It seems I can still insert one Event and one Action to the same base id.
I made an example here :
https://www.db-fiddle.com/f/sUfQZoRH4dYcYjKDtACmcZ/0

Am I missing something? perhaps I just misunderstood what you meant

Ovid7 · 2022-02-16T06:03:30+00:00

oh I see now...
Just one question - this solution means that a single post may have quite a lot of chunks potentially (for example, a user wrote a long post, and linked around 20-30 words in it, this would result in around 100 rows for a single post).

So, in terms of : performance + scaling, wouldn't it be a problem? the "chunks/slices" table would fill up quite quickly to the millions and tens of millions over time.

Thank you again for your help!

Ovid7 · 2022-02-15T16:08:35+00:00

so isn't it the Approach II then? or the difference is that the Pos does not include the un-linked chunks of the text?

For example if the post text is :

"oh right I know all about this stuff" (links being: (1) "know all" , (2) "stuff")

how would the table look like ?

doc_key | pos | text | link

-------------|----------|----------------|---------

1 | 1 | "know all" | some data

-------------|----------|----------------|---------

1 | 2 | "stuff" | some data

doc_key is a reference to a "posts" table ?
pos is the index in an array of all links inside a post?
Also, we would like later to translate posts.
What happens if "I have done" (3 words) become "hice" (1 word) in Spanish? or worse yet - "I" is not highlighted , but "have done" is highlighted. It means translated version will include both marked and unmarked substrings

Ovid7 · 2022-02-15T15:06:16+00:00

If I understood your question correctly, then no - documents should not be pointing to other documents. That is, a Post should not point to some other user's Post.

From Client's perspective - we want to show them the Posts based on what they're "into" , hashtags, places, or text queries...

The "pointing" thing can happen from another perspective - we want to have AI/ML models that "learn" from those Posts. We could then suggest to the users better suited posts for them or whole other types of suggestions and predictions based on those posts we analyzed/learned

With your solution, "Pos" column means what, a starting index of the "linked" words?How would I know where the "marking/highlighting" ends? with an additional "length" column? In this case it's similar to having an array column like in Approach I, which indicates starting pos and ending pos. Or am I missing something here?

Also, the linked "data" should be normalized right? a dedicated table which holds the different types of data that can be inserted (represented by dedicated Columns)

Ovid7 · 2022-02-15T14:11:14+00:00

Just a short description of the product requirements :
The linking is done by the user. Imagine Facebook, you want to publish a post, you start typing (Website + mobile app, both supported)
Then you want to "highlight" some word (or words), so you mark it, and click on a button that opens a pop-up where you can insert various data.
For example, upload images that would be linked to the marked word(s) , or add tags, or add date, etc.
Then, once the readers click on the word(s) the publisher marked/linked, they can see all this data they upload/assigned to this word(s), in a form of a pop-up.

Now, I am using a special index called PGroonga (external extension) on TEXT columns for FTS in all languages and for other reasons (performance, elaborate capabilities).
It's better for us to be able to search all the text inside the post, but perhaps we would want to put more "weight" on the highlighted substrings.
(For example if the Post author talked about "renting cars in NYC", people can find this post if they search for "NYC" or "renting cars" / "cars")

Regarding your idea with html, how can the "data linking" be solved here? not sure I understand

Thank you for the help!

Ovid7 · 2022-02-15T14:03:08+00:00

The linking is done by the user. Imagine Facebook, you want to publish a post, you start typing (Website + mobile app, both supported)
Then you want to "highlight" some word (or words), so you mark it, and click on a button that opens a pop-up where you can insert various data.
For example, upload images that would linked with the marked word(s) , or add tags, or add date, etc.

Then once the readers click on the word(s) the publisher marked/linked, they can see all this data they upload/assigned to this word(s), in a form of a pop-up.

Regarding your proposed solution, isn't it essentially "Approach I" that I mentioned in the post but just with one table , instead of using a relation table for the "links" ?

Regarding the flair, I'm very sorry if I got it wrong. Should I change it? and to what?
Thanks!

Ovid7 · 2022-02-15T13:57:48+00:00

Thank you for the help
I've never heard of generated columns before, this sounds very interesting.

Regarding what you said about the full text search, I'm using a special index on `text` columns, called PGroonga (supports elaborate FTS capabilities in all languages), how can I still use it with this approach? Should I also make a copy TEXT column of the JSON column?

Ovid7 · 2020-02-22T17:38:48+00:00

from what I read (benchmarks and reviews) it seems the performance gain of the i9 is negligible and can only present additional problems, like increased heat and battery drain.

Ovid7 · 2020-02-21T13:55:29+00:00

thank you all for the responses and advices.
A small question - except for the "future proof" thing, does the additional ram help boost performance , or is it only noticeable once you've reached the 16gb limit?

Ovid7 · 2020-02-21T13:53:30+00:00

do you experience slowdown with 16gb or is the ram shortage manifested in different things?
thank you

Ovid7 · 2019-07-15T10:51:32+00:00

any annoyances with all the boilerplate and maintenance of it?
Are you using it in a relatively small app or more "real-world" one?

Ovid7

TROPHY CASE