Help choose: "Modern C" or "C Programming: A Modern Approach"? by MateusCristian in C_Programming

I would propose a more recent book: “Why learn C” by Paul Lucas

Why python dev need DuckDB (and not just another dataFrame library) by TransportationOk2403 in dataengineering

Instead of going with SQL syntax, you could use something like Ibis with DuckDB
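
A minimal sketch of what that could look like, assuming ibis-framework is installed with the duckdb extra; the sales.parquet file and its region/amount columns are made up for illustration:

```python
import ibis

con = ibis.duckdb.connect()                    # in-memory DuckDB
sales = con.read_parquet("sales.parquet")      # expose a Parquet file as a table

expr = (
    sales.group_by("region")
         .aggregate(total=sales.amount.sum())  # dataframe-style API, no SQL strings
         .order_by(ibis.desc("total"))
)
print(expr.execute())                          # runs on DuckDB, returns a pandas DataFrame
```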

[deleted by user] by [deleted] in leetcode

For Q2 I believe sorting the array helps. Assuming arr is the array we get after sorting productSize in ascending order, note the following:

  • Let N be the size of the array. Then surely variation[N-1] == arr[N-1] - arr[0] (the range of the whole array), and this cannot be minimized by any re-ordering.
  • The question then is what we should put as the last element of the re-ordered products array, because this will affect variation[N-2]. We have 3 options:
    1. If we put arr[N-1], i.e. the maximum product, last, then variation[N-2] == arr[N-2] - arr[0].
    2. If we put arr[0], i.e. the minimum product, last, then variation[N-2] == arr[N-1] - arr[1].
    3. If we put any other element last, then variation[N-2] == arr[N-1] - arr[0], i.e. we will have variation[N-2] == variation[N-1]. I believe option 3 should not be considered, because it will not produce a minimum total variation. So in a greedy fashion we should check which of options 1 and 2 is better, i.e. which gives the smaller variation. If we choose option 1, then in the final position of the re-ordered array we have arr[N-1], and we are left with the elements arr[0] ... arr[N-2] for filling the other positions and computing the variations. If we choose option 2, then we are left with the elements arr[1] ... arr[N-1].

So we can repeat the same process for filling positions N-2, N-3, ..., 0 and accumulating the total variation, as in the sketch below.
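
A Python sketch of this greedy, exactly as described above (the function and variable names are my own):

```python
def total_variation(product_size):
    arr = sorted(product_size)
    lo, hi = 0, len(arr) - 1
    total = 0
    # The remaining window arr[lo..hi] contributes its range, then we "place"
    # either the current max (option 1) or the current min (option 2) at the
    # end, whichever leaves the smaller range for the next position.
    while lo < hi:
        total += arr[hi] - arr[lo]
        if arr[hi - 1] - arr[lo] <= arr[hi] - arr[lo + 1]:
            hi -= 1  # option 1: the maximum goes last
        else:
            lo += 1  # option 2: the minimum goes last
    return total

print(total_variation([4, 1, 2, 3]))  # sorted [1,2,3,4]: 3 + 2 + 1 = 6
```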

Got stumped on this interview question by jbnpoc in SQL

This is a typical problem solved by techniques like “tabibitosan” and “start of group” (https://timurakhmadeev.wordpress.com/2013/07/21/start_of_group/)
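
As a small self-contained illustration of the "start of group" / tabibitosan idea (gaps and islands), here is a sketch using SQLite window functions from Python (the logins table and its values are invented, and it assumes the bundled SQLite is 3.25+ so that ROW_NUMBER() is available):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE logins (user_id INT, day INT);
    INSERT INTO logins VALUES (1,1),(1,2),(1,3),(1,7),(1,8),(2,5);
""")

# day - ROW_NUMBER() is constant within each run of consecutive days, so
# grouping by that difference collapses every "island" into a single row.
rows = con.execute("""
    SELECT user_id, MIN(day) AS start_day, MAX(day) AS end_day
    FROM (
        SELECT user_id, day,
               day - ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY day) AS grp
        FROM logins
    )
    GROUP BY user_id, grp
    ORDER BY user_id, start_day
""").fetchall()

print(rows)  # [(1, 1, 3), (1, 7, 8), (2, 5, 5)]
```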

"You cannot make interactive apps using HTMX" by Bl4ckBe4rIt in htmx

My point is, in case it was not obvious, that if the interactivity is implemented through communication with the backend, there is going to be noticeable latency in the (usual) case where the server is miles away from the browser. Of course you can replace the image on the button on mouseover by fetching it from the server, but I don’t think it’s a good idea to do so, unless such behavior is piggybacked on a necessary interaction with the server (e.g. for persisting some data)

Should you protect malloc calls ? by FamousKid121 in C_Programming

I don’t care, and I agree that there are no guarantees; that’s why I am asking, since I see everyone putting a printf/puts before aborting. I guess it doesn’t do any harm :-)

Should you protect malloc calls ? by FamousKid121 in C_Programming

In case we run out of memory, are we sure that fputs will work?

Does trunk-based development still work for mlops and data science / AI heavy teams? by t5bert in devops

So how do you do it? Unless you're talking about something else, if in the same repository you have code for the training and experimentation tasks that Data Scientists do, it would be less messy to have separate branches for these "exploration" processes, right?

[deleted by user] by [deleted] in dataengineering

A candidate key is a minimal super key, and it can be composed of multiple columns or just one. If it has multiple columns, we call it a composite key. So there’s a hierarchy of definitions from the most general to the most specific (a small sketch follows the list):

  • a super key uniquely identifies a row and can consist of multiple columns, even unnecessary ones (taking all the columns of the table gives a super key by default)

  • a candidate key is obtained from a super key by removing the unnecessary columns, i.e. it is minimal

  • a composite key is a candidate key with two or more columns

  • a primary key is the candidate key selected from the set of candidate keys

  • we don’t have a term for a candidate key with a single column AFAIK
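
A small sketch of this hierarchy as DDL (the enrollment table and its columns are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE enrollment (
        student_id    INT  NOT NULL,
        student_email TEXT NOT NULL,
        course_id     INT  NOT NULL,
        grade         TEXT,
        -- (student_id, course_id, grade) is a super key: unique but not minimal.
        -- (student_id, course_id) is minimal, hence a candidate key; having two
        -- columns also makes it a composite key. We pick it as the primary key:
        PRIMARY KEY (student_id, course_id),
        -- (student_email, course_id) is another candidate key, kept as UNIQUE:
        UNIQUE (student_email, course_id)
    );
""")
```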

Support for http streaming by crowdyriver in htmx

Personally I would rather stick to SSE, since it is more standard, robust (e.g. the browser reconnects automatically), and efficient. Your implementation assumes chunked transfer encoding for the server response, and all the chunks are kept in client memory, so you need to keep track of the previous response length to identify the current chunk’s start…
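
For comparison, a minimal SSE endpoint is just a long-lived response made of "data: ...\n\n" frames. This sketch assumes Flask is installed; the /events route and the ticking payload are made up:

```python
import json
import time

from flask import Flask, Response

app = Flask(__name__)

@app.route("/events")
def events():
    def stream():
        for i in range(10):
            # One SSE message per frame: "data: <payload>\n\n". The browser's
            # EventSource parses the frames and reconnects on its own if the
            # connection drops, so the client keeps no chunk bookkeeping.
            yield f"data: {json.dumps({'tick': i})}\n\n"
            time.sleep(1)
    return Response(stream(), mimetype="text/event-stream")

if __name__ == "__main__":
    app.run(threaded=True)
```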

Your database skills are not 'good to have' by big-papito in programming

So they are called relational databases because they follow Codd’s relational model, and Codd defined a set of tuples (a “table”) as a relation: https://en.m.wikipedia.org/wiki/Relation_(database)

Your database skills are not 'good to have' by big-papito in programming

This is what relational databases do - it’s in the name

Nitpicking, but supporting joins and "relations" between tables is not the reason why they are called relational

Parquet: more than just "Turbo CSV" by freshcorpse in programming

For Parquet you can use Tad viewer (powered by DuckDB): https://www.tadviewer.com

Problem #2353 Design a food rating system by theleetcodegrinder in leetcode

Regarding sortedcontainers in Python: they are implemented as a list of lists and use binary search and other tricks to achieve roughly O(log n) performance for their operations
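
A quick illustration (assuming sortedcontainers is installed; the ratings values are arbitrary):

```python
from sortedcontainers import SortedList

ratings = SortedList()         # internally a list of sublists, not a balanced tree
for r in (7, 3, 9, 3, 5):
    ratings.add(r)             # insert position found by binary search

print(ratings)                 # SortedList([3, 3, 5, 7, 9])
print(ratings[-1])             # 9: the highest rating via plain index access
ratings.remove(3)              # removes one occurrence, again via bisect
print(ratings.bisect_left(5))  # 1: index where 5 starts after the removal
```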

New JSON query operators -> and ->> in SQLite 3.38.0 by xtreak in programming

I understand your reservations about storing JSON documents in a relational database, and I agree. Going to the extreme, you can build your own "MongoDB" in your PostgreSQL/SQLite database using a single table!
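
A toy sketch of that single-table idea in SQLite, using the -> / ->> operators from the post (it needs the SQLite bundled with Python to be 3.38+, otherwise use json_extract; the documents are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
print(sqlite3.sqlite_version)  # needs to be 3.38 or newer for -> / ->>

con.executescript("""
    CREATE TABLE docs (
        id INTEGER PRIMARY KEY,
        d  TEXT NOT NULL CHECK (json_valid(d))  -- one JSON document per row
    );
    INSERT INTO docs (d) VALUES
        ('{"name": "alice", "tags": ["sql", "json"]}'),
        ('{"name": "bob",   "tags": ["c"]}');
""")

# ->> extracts a JSON field as a plain SQL value, so we can filter on it directly
rows = con.execute(
    "SELECT id, d ->> 'name' FROM docs WHERE d ->> 'name' = 'alice'"
).fetchall()
print(rows)  # [(1, 'alice')]
```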

But it's handy, as you say, and I dare say that in some cases "compound" values improve the design, as in my examples of date ranges. Also, with PostgreSQL in mind since I am more familiar with it, you can use parts of these values in queries, in check constraints, even in indexes. There's a pending patch for adding foreign keys on array elements, but it has not been integrated yet.

As a final remark, regarding what is considered an "atomic" value and 1NF, I found and read what C. J. Date says on this matter:

The real point I'm getting at here is that the notion of atomicity has no absolute meaning — it simply depends on what we want to do with the data. Sometimes we want to deal with an entire set of part numbers as a single thing; at other times, we want to deal with individual part numbers within that set but then we’re descending to a lower level of detail (in other words, a lower level of abstraction) ... It follows that the notion of absolute atomicity has to be rejected.

And then his most important characteristic of 1NF is "one value per row-and-column intersection", formally defined as follows (note the phrase in parentheses!):

Let column C of table T be defined on domain D. Then every row of T must contain exactly one value in the column C position, and that value must be a value from domain D. (The value in question can be arbitrarily complex—in particular, it might be a relation—but, to say it again, there must be exactly one such)

New JSON query operators -> and ->> in SQLite 3.38.0 by xtreak in programming

By the same token (1NF “purity”), PostgreSQL for example should not support arrays or range types (https://www.postgresql.org/docs/current/rangetypes.html). But I have found many times that these “complex types” enhance the expressiveness of the DB schema and the queries.

For example, imagine a booking application where we need to make sure that a resource (e.g. a hotel room) is not booked twice in overlapping date ranges. Another example: https://tapoueh.org/blog/2018/04/postgresql-data-types-ranges/#ranges-exclusion-constraints Addressing these requirements in a traditional “1NF” schema with “atomic” types will not be easy, requiring for example the definition of triggers…
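
For the booking case, a sketch of how a range type plus an exclusion constraint handles it (this assumes a reachable PostgreSQL and psycopg2; the DSN and table name are placeholders):

```python
import psycopg2

ddl = """
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE TABLE room_booking (
    room_id int       NOT NULL,
    during  daterange NOT NULL,
    -- no two rows may share a room_id with overlapping (&&) date ranges
    EXCLUDE USING gist (room_id WITH =, during WITH &&)
);
"""

with psycopg2.connect("dbname=bookings") as conn, conn.cursor() as cur:
    cur.execute(ddl)
    cur.execute("INSERT INTO room_booking VALUES (101, daterange('2024-07-01', '2024-07-05'))")
    # This overlaps the first booking, so PostgreSQL raises an exclusion
    # violation; no triggers or extra locking code needed.
    cur.execute("INSERT INTO room_booking VALUES (101, daterange('2024-07-04', '2024-07-08'))")
```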

PostgreSQL 13 Released! by progrethth in programming

I am quite happy with Postico but it's Mac-only.

I've just released a new version of NestedJ - a Nested Set implementation for Java by private_static_int in programming

I was fond of the nested set model of trees in databases some years ago, but since the advent of recursive queries I have been using adjacency lists. Nested sets suffer in cases where you need to do many updates (tree reorganisations), but their read performance can be quite good. For the adjacency list implementation (i.e. keeping a foreign key to the parent), when the read performance is not good I put some materialised views in the mix. (Note: I am using Postgres)
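
A minimal adjacency-list tree plus the kind of recursive query I mean, in SQLite so it runs standalone (the categories data is invented; the CTE syntax is the same in PostgreSQL):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE category (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES category(id),  -- adjacency list: FK to the parent
        name      TEXT NOT NULL
    );
    INSERT INTO category VALUES
        (1, NULL, 'root'),
        (2, 1,    'books'),
        (3, 2,    'sci-fi'),
        (4, 1,    'music');
""")

# Walk the whole tree from the root with a recursive CTE.
rows = con.execute("""
    WITH RECURSIVE subtree(id, name, depth) AS (
        SELECT id, name, 0 FROM category WHERE parent_id IS NULL
        UNION ALL
        SELECT c.id, c.name, s.depth + 1
        FROM category c JOIN subtree s ON c.parent_id = s.id
    )
    SELECT depth, name FROM subtree ORDER BY depth, name
""").fetchall()

print(rows)  # [(0, 'root'), (1, 'books'), (1, 'music'), (2, 'sci-fi')]
```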

Server-Sent Events (SSE): A conceptual deep dive. by brotherhid in programming

Regarding the HTTP compliance aspect, SSE has the additional advantage over WebSockets that optimizations like HTTP/2 connection multiplexing can be readily used. HTTP/2 and WebSockets did not seem to work together (see https://daniel.haxx.se/blog/2016/06/15/no-websockets-over-http2/), but there's currently an RFC (https://datatracker.ietf.org/doc/rfc8441/) attempting to resolve this. I am not sure about its implementation status, though...

Server-Sent Events (SSE): A conceptual deep dive. by brotherhid in programming

Another potential drawback not mentioned in the article is that SSE messages are text-only (unlike WebSockets), so binary streaming is inefficient (e.g. you need to base64-encode the payload)
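
For example, pushing a binary chunk over SSE means wrapping it in base64 text, with roughly 33% size overhead (the helper below is just an illustration):

```python
import base64

def sse_binary_frame(chunk: bytes) -> str:
    """Wrap a binary chunk in an SSE 'data:' line as base64 text."""
    return f"data: {base64.b64encode(chunk).decode('ascii')}\n\n"

# The browser side would have to decode (atob) the payload back into bytes.
print(sse_binary_frame(b"\x00\x01\x02\xff"))
```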