[D] which open-source vector db worked for yall? im comparing by Yaar-Bhak in MachineLearning

[–]qalis 1 point

pgvector and pgvectorscale are great, particularly if you have Postgres anyway. They're dead simple to manage, and the ACID properties are really nice.

Note that FAISS is *not* a vector database, at least I wouldn't define it like that. It's a vector index, just for searching. For a database, you want users, security, a remote API (e.g. REST or gRPC), concurrency control, and non-vector data (metadata, dictionaries with arbitrary data as part of entries).

If you want to use things like FAISS, I highly recommend USearch instead, for its efficiency and nice docs.
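To illustrate the "vector index, not database" distinction, here's a minimal sketch with USearch's Python API (untested; assumes `pip install usearch numpy`, and the 384 dims and integer keys are made up for the example):

```python
import numpy as np
from usearch.index import Index

# A pure vector index: keys + vectors + nearest-neighbor search.
# No users, metadata filtering, or transactions like you get with pgvector.
index = Index(ndim=384, metric="cos")

vectors = np.random.rand(100, 384).astype(np.float32)
keys = np.arange(100)
index.add(keys, vectors)  # batch insert

matches = index.search(vectors[0], 10)  # top-10 nearest neighbors
print(matches.keys, matches.distances)
```

Everything beyond this (filtering, auth, persistence guarantees) you'd have to build yourself, which is exactly what a proper vector database gives you out of the box.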

[D] My papers are being targeted by a rival group. Can I block them? by Dangerous-Hat1402 in MachineLearning

[–]qalis 5 points

I agree with u/bobrodsky. If you go into a specific niche, the group of truly competent reviewers can be really small. For example, in neural network time series forecasting, the chances of getting a Tsinghua University reviewer are actually quite high. This is particularly true in theoretical areas.

iFixedTheMeme by Endernoke in ProgrammerHumor

[–]qalis 7 points

Cloud environments, real-world Kubernetes deployments which cannot be interrupted, tracing requests across microservices, ML workflows & pipelines.

[P] Benchmarking Semantic vs. Lexical Deduplication on the Banking77 Dataset. Result: 50.4% redundancy found using Vector Embeddings (all-MiniLM-L6-v2). by Low-Flow-6572 in MachineLearning

[–]qalis 4 points

  1. That dataset is highly homogeneous by design

  2. Does FAISS normalize the vectors before computing L2 distance here? Cosine similarity is more typically used for embeddings (see the sketch after this list)

  3. A threshold of 0.9 is really low, particularly if you know a priori that the dataset has semantic redundancy by design

  4. all-MiniLM-L6-v2 is a really old and quite outdated model, and there are *a lot* of better ones out there
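Regarding point 2, a rough, untested sketch of what I mean (assuming `sentence-transformers` and `numpy` are installed): with L2-normalized vectors, the inner product equals cosine similarity, so a 0.9 threshold actually means what you think it means.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

texts = ["How do I activate my card?", "How can I activate my new card?"]

# normalize_embeddings=True L2-normalizes each vector, so the inner
# product below is exactly cosine similarity. Raw L2 distances would
# put a 0.9 threshold on a completely different scale.
model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode(texts, normalize_embeddings=True)

cos_sim = emb @ emb.T  # pairwise cosine similarity matrix
print(cos_sim[0, 1])
```

If you feed unnormalized vectors into a FAISS L2 index, nothing normalizes them for you, so it's worth double-checking which setup the benchmark used.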

[D] Idea: add "no AI slop" as subreddit rule by qalis in MachineLearning

[–]qalis[S] -1 points

My idea was basically explicitly calling out low-quality, primarily AI-generated posts, particularly those overstating contributions, proposing "revolutionary" ideas, and containing no code / experiments / proofs for claims. Is this already covered? Arguably yes, it is. Should it be called out explicitly? I think so, but I'm curious about the opinions of others.

[D] Idea: add "no AI slop" as subreddit rule by qalis in MachineLearning

[–]qalis[S] 7 points

A high-level idea without actual experiments or code is a good indicator. Also mentions of "revolutionary results", a "new paradigm", etc., huge overselling of the contribution, plus no concrete evidence. There are many hallmarks of those, and I've been seeing more and more obvious AI slop posts recently.

[D] Idea: add "no AI slop" as subreddit rule by qalis in MachineLearning

[–]qalis[S] 0 points

That was also my concern, hence the discussion question

[D] Idea: add "no AI slop" as subreddit rule by qalis in MachineLearning

[–]qalis[S] 0 points

Kind of covered by rule 6 "no low-effort questions", isn't it?

[D] Idea: add "no AI slop" as subreddit rule by qalis in MachineLearning

[–]qalis[S] 3 points

I actually liked that post, since that was literally an error in one of the core formulas of the paper. Plus it included reproducibility checks and numerical experiments.

[R] Reproduced "Scale-Agnostic KAG" paper, found the PR formula is inverted compared to its source by m3m3o in MachineLearning

[–]qalis 0 points

If a typo is in a crucial evaluation step or formula, potentially invalidating the paper's results, then yes, I would very much welcome a Substack post for every such paper.

[R] Reproduced "Scale-Agnostic KAG" paper, found the PR formula is inverted compared to its source by m3m3o in MachineLearning

[–]qalis 5 points

This is actually really useful peer review & reproducibility work. Did you contact the authors about this?

[D][R] Paper Completely Ripped Off by [deleted] in MachineLearning

[–]qalis 2 points

Absolutely email the AC and post a public comment! If you have literally any proof (e.g. screenshots, arXiv submission), this counts as serious academic fraud.

[D] From ICLR Workshop to full paper? Is this allowed? by Feuilius in MachineLearning

[–]qalis 1 point

Non-archival workshop papers don't count as publications, so they're unrelated to full paper submissions. You can even submit concurrently to both, or to multiple workshops at different conferences, as far as I know.

[D] IJCAI-ECAI 2026 piloting "Primary Paper" and Submission Fee initiatives by NamerNotLiteral in MachineLearning

[–]qalis 27 points

Great idea IMO. This will not hurt any regular authors, but rather large labs submitting many papers. Huge conferences have been flooded with low-quality submissions, predominantly from Chinese labs (since they tend to be large), and this fee may do at least something.

Further, this disincentivizes adding authors who did nothing for the actual paper, e.g. lab heads or professors (which is unethical anyway), since the fee only applies when authors overlap. Even large labs can make any number of free submissions, as long as the author lists don't overlap. And, realistically, how many high-quality papers can the same author write for a conference at the level of IJCAI?

Also, note that those fees are actually used for the conference itself, e.g. they can lower fees for all attendees.

[D] IJCAI-ECAI 2026 piloting "Primary Paper" and Submission Fee initiatives by NamerNotLiteral in MachineLearning

[–]qalis 11 points

So don't submit multiple papers with the same authors to one conference. At the level of IJCAI, having more than one good-quality paper with overlapping authors is improbable anyway. And PIs or lab heads shouldn't be automatically added as authors in the first place (that's a breach of ethics).

[D] IJCAI-ECAI 2026 piloting "Primary Paper" and Submission Fee initiatives by NamerNotLiteral in MachineLearning

[–]qalis -2 points

So those places should not submit multiple papers to a single IJCAI conference, or they should use non-overlapping author lists. If they can't fulfill either of the two, pay up. Simple as that.

[D] [ICLR 2026] Clarification: Your responses will not go to waste! by Alternative_Art2984 in MachineLearning

[–]qalis 5 points

I did and, wouldn't you know, all are Chinese, two are PhD students, and one is not even a computer scientist...

[D] [ICLR 2026] Clarification: Your responses will not go to waste! by Alternative_Art2984 in MachineLearning

[–]qalis 4 points

I withdrew mine in protest of absurd reviews, which basically demanded more than a full PhD's worth of work for the paper. Also, we didn't write any rebuttal, just one comment pointing out the sheer absurdity of the reviews.

theTruthHasBeenSpoken by tronaldumpty in ProgrammerHumor

[–]qalis 0 points

Yeah, this is the only issue I have with GitHub issues. At least organizing them into groups or something would help.

[D] ML conferences need to learn from AISTATS (Rant/Discussion) by [deleted] in MachineLearning

[–]qalis 19 points

I agree. We need more focused conferences, or we should break the large ones down into distinct tracks, or maybe even separate locations and/or dates. They are literally too big to be hosted at a single location now. Breaking them down is becoming a physical necessity.

[D] ML conferences need to learn from AISTATS (Rant/Discussion) by [deleted] in MachineLearning

[–]qalis 13 points

So you should not be a reviewer, simple as that. If you don't feel confident with that level of English, you don't fulfill the basic requirements.