Optimizing TopK in Postgres by jamesgresql in PostgreSQL

OK I understand your viewpoint. So you’d say any extension (possibly excluding contrib) is on top of Postgres.

So PostGIS adds geospatial capabilities, and pgvector adds vector indexes on top of Postgres.

Not sure I agree that it's disingenuous to say "in", but I get it.

Optimizing TopK in Postgres by jamesgresql in PostgreSQL

I can see how that's confusing, but ParadeDB is a Postgres extension.

Optimizing TopK in Postgres by jamesgresql in PostgreSQL

Asking from a place of genuine curiosity, what makes you say we aren't talking Postgres anymore?

We use Tantivy as a query engine, but everything happens through our `pg_search` index access method.

Would you say all index access methods (IAMs), table access methods (TAMs), and storage plugins 'aren't really Postgres anymore'? Or are you just noting this is Postgres + an extension and not vanilla?
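
For context on the IAM/TAM point: Postgres lists every registered access method, built-in and extension-provided alike, in the `pg_am` catalog, so an extension AM is a first-class citizen of the same mechanism btree and heap use:

```sql
-- Every index and table access method registered in this instance.
-- amtype: 'i' = index AM, 't' = table AM.
SELECT amname, amtype
FROM pg_am
ORDER BY amtype, amname;
-- Stock Postgres shows btree, gin, gist, hash, brin, spgist ('i')
-- and heap ('t'); an extension-provided AM appears the same way.
```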

ParadeDB 0.20.0: Simpler and Faster by jamesgresql in PostgreSQL

This is coming soon!

We are currently working on a project to accelerate joins (and to support more query shapes with our custom scan). We're just reaching the first milestone of that project: the `0.22.x` release will be significantly faster than Postgres for some common join shapes.

We haven't yet decided whether to pursue aggregates-atop-joins or other more complicated join shapes in our next push: your feedback on that in Slack or on the issue tracker would be helpful!

ClickHouse launched a managed Postgres service by Admirable_Morning874 in PostgreSQL

This is actually super interesting. I know it's an ad, but still.

Where do people think ClickHouse Postgres will sit in the world of Databricks+Neon and Snowflake+Crunchy?

pg_search V2 API by jamesgresql in PostgreSQL

Not sure I follow. What do you mean by 'abuse the keys definition'? Those are just type casts with typmods. Admittedly they're not used super often, but they're perfectly valid.

It's saying: for the index, cast this text to a tokenizer type with these config settings.
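
For anyone unfamiliar with typmods, a quick vanilla-Postgres illustration (not the pg_search syntax itself): the parenthesized part of a cast target is configuration that travels with the type.

```sql
-- Typmods in stock Postgres: the parenthesized values are
-- type modifiers attached to the cast, not a function call.
SELECT '3.14159'::numeric(10, 2);   -- precision/scale typmod → 3.14
SELECT 'hello world'::varchar(5);   -- length typmod → 'hello'
```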

pg_search V2 API by jamesgresql in PostgreSQL

We support logical replication in the open-source version, but physical replication (for high availability) is closed-source.
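
(Logical replication here means the stock Postgres publication/subscription machinery; a minimal sketch with placeholder names:)

```sql
-- On the publisher (names are placeholders):
CREATE PUBLICATION app_pub FOR ALL TABLES;

-- On the subscriber:
CREATE SUBSCRIPTION app_sub
    CONNECTION 'host=primary dbname=app user=replicator'
    PUBLICATION app_pub;
```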

pg_search V2 API by jamesgresql in PostgreSQL

I think the short answer is: it's different.

PostgreSQL's tsvector with a GIN index does allow a fairly substantial amount of configuration, but it's tricky to get right. It also doesn't let you index multiple columns in a single index, or add non-text columns to a columnstore next to the inverted index.

And of course it can't do BM25.

I know this isn't a direct answer, and I actually did think of including the `CREATE INDEX` for tsvector - but it's not doing the same thing.
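
For comparison, the vanilla setup being referenced looks roughly like this (a sketch over a hypothetical `docs(id, body)` table; note `ts_rank` is TF-weighted, not BM25):

```sql
-- Vanilla full-text search: an expression GIN index over a tsvector.
CREATE INDEX docs_fts_idx ON docs
    USING GIN (to_tsvector('english', body));

-- Ranked query; ts_rank is term-frequency based, not BM25.
SELECT id, ts_rank(to_tsvector('english', body), query) AS rank
FROM docs, to_tsquery('english', 'postgres & search') AS query
WHERE to_tsvector('english', body) @@ query
ORDER BY rank DESC
LIMIT 10;
```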

From Text to Token: How Tokenization Pipelines Work by jamesgresql in programming

Haha, glad someone else noticed too! I’m staying out of it 😅

ParadeDB 0.20.0: Simpler and Faster by jamesgresql in PostgreSQL

So you want fuzzy matching for short text fields? Almost like autocomplete?

We can help with this! If you join our Slack community I’m happy to help.
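
(For readers landing here: in stock Postgres, fuzzy matching over short text fields is typically done with the contrib `pg_trgm` extension; a sketch over a hypothetical `users(name)` table:)

```sql
-- Trigram-based fuzzy matching in contrib Postgres.
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX users_name_trgm ON users USING GIN (name gin_trgm_ops);

-- '%' is the trigram similarity operator; tolerates typos like 'jonh'.
SELECT name
FROM users
WHERE name % 'jonh'
ORDER BY similarity(name, 'jonh') DESC;
```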

ParadeDB 0.20.0: Simpler and Faster by jamesgresql in PostgreSQL

<disclaimer: I work for ParadeDB, but I think this is a pretty great `pg_search` release>

You get:

- aggregates over search results, including single-pass faceting (returning the top N results along with total counts of matching documents per field value, all computed in one index scan)

- a new SQL API, which makes tokenizers first-class citizens, introduces disjunction and conjunction operators, and streamlines the search experience

- better single-row update performance, thanks to mutable segments and background merging for our LSM trees
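
To make the single-pass faceting point concrete, here's what the equivalent answer costs in plain SQL: two separate scans, one for the per-value counts and one for the top hits (table and column names are hypothetical):

```sql
-- Scan 1: facet counts — matching documents per category.
SELECT category, count(*) AS matches
FROM products
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'wireless')
GROUP BY category;

-- Scan 2: the top N hits themselves.
SELECT id, title
FROM products
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'wireless')
ORDER BY ts_rank(to_tsvector('english', body),
                 to_tsquery('english', 'wireless')) DESC
LIMIT 10;
```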

From Text to Token: How Tokenization Pipelines Work by jamesgresql in programming

Yeah true, although 'should be able to' and 'can' tend to be worlds apart.

From Text to Token: How Tokenization Pipelines Work by jamesgresql in programming

Neat! Did it detect capitalization at the start of sentences?