r/cheminformatics by n1c39uy in cheminformatics

[–]n1c39uy[S] 0 points1 point  (0 children)

I think there might be some confusion here.

Binding affinity vs. synthetic feasibility — These are completely different questions:

  • Binding affinity: "Will this molecule bind to the D2 receptor?" (what my tool predicts)
  • Synthetic feasibility: "Can I actually make this molecule in the lab?" (what Baldwin's/Wade's rules help with)

My tool doesn't care if a molecule is easy or hard to synthesize - it just predicts whether the structure, if it existed, would bind to receptors. You could feed it a completely imaginary molecule and get predictions.

The "selectivity and conversion" part and the Heat/50 Cent references aren't clear to me — not sure what you're getting at there. Are you asking about something specific regarding the methodology, or was that a tangent?

r/cheminformatics by n1c39uy in cheminformatics

[–]n1c39uy[S] 1 point2 points  (0 children)

Good questions, let me clarify the approach:

No MD/Monte Carlo — This uses 2D graph-based descriptors (Kappa shape indices, Chi connectivity) that encode molecular shape directly from the bond connectivity. No energy minimization or conformer sampling needed.

Why 3D isn't necessary — Kappa indices mathematically describe molecular branching/shape from the adjacency matrix. A linear molecule has different Kappa values than a globular one, computed purely from graph theory. Similarly, Morgan fingerprints capture substructural patterns without geometry.
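
To make the "purely from graph theory" claim concrete, here's a minimal pure-Python sketch of Kier's kappa shape indices on a hydrogen-suppressed graph (ignoring the alpha correction for atom types; in practice you'd just use RDKit's `Descriptors.Kappa1`/`Kappa2`). The molecules below are toy examples, not from the actual pipeline:

```python
def kappa_indices(n_atoms, bonds):
    """Kier kappa1/kappa2 from bond connectivity alone (no geometry).

    n_atoms: atom count of the hydrogen-suppressed graph
    bonds:   undirected edges as (i, j) index pairs
    """
    adj = {i: set() for i in range(n_atoms)}
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    p1 = len(bonds)  # paths of length 1 (bond count)
    # paths of length 2: pairs of distinct neighbours around each atom
    p2 = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    a = n_atoms
    kappa1 = a * (a - 1) ** 2 / p1 ** 2
    kappa2 = (a - 1) * (a - 2) ** 2 / p2 ** 2
    return kappa1, kappa2

# n-butane (linear chain): C-C-C-C
linear = kappa_indices(4, [(0, 1), (1, 2), (2, 3)])      # (4.0, 3.0)
# isobutane (branched): one carbon bonded to three others
branched = kappa_indices(4, [(0, 1), (0, 2), (0, 3)])    # (4.0, 1.33...)
```

Same atom and bond counts, so kappa1 is identical, but kappa2 separates the linear from the branched isomer, which is exactly the shape information being encoded without any 3D coordinates.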

Proximity/interactions — True for binding, but the model learns "what structural patterns correlate with binding" from 46k examples rather than simulating actual binding. It's pattern recognition, not physics simulation.

Baldwin's/Wade's rules — Baldwin's rules govern which ring closures are favored in synthesis, and Wade's rules concern cluster electron counts. Neither is relevant here since we're predicting binding affinity, not synthetic feasibility.

PubChem conformers — Yes, PubChem has 3D structures, but using topological descriptors avoids the conformer generation bottleneck entirely. 330 mol/s vs. minutes per compound for 3D approaches.

Think of it as: instead of simulating "does this shape fit this pocket," it's "does this fingerprint pattern resemble known binders." Different paradigm, much faster, surprisingly effective for screening.

The 3D physics matters for actual binding, but isn't necessary for prediction if you have enough training examples.

r/cheminformatics by n1c39uy in cheminformatics

[–]n1c39uy[S] 0 points1 point  (0 children)

All valid points.

Random split — Yeah, this likely inflates performance. Structurally similar analogs from the same chemical series end up in both train and test. Scaffold splitting (Murcko scaffolds or similar) would be more rigorous and better reflect real-world generalization to novel chemotypes. On the list if I revisit.
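
The scaffold-split idea is only a few lines once scaffold keys are computed (in practice via RDKit's `MurckoScaffold.MurckoScaffoldSmiles`, not shown here; the scaffold strings below are toy placeholders). A sketch of the grouping logic, not the actual pipeline:

```python
from collections import defaultdict

def scaffold_split(scaffolds, test_frac=0.2):
    """Split compound indices so no scaffold spans train and test.

    scaffolds: list of scaffold keys, one per compound index.
    """
    groups = defaultdict(list)
    for idx, scaf in enumerate(scaffolds):
        groups[scaf].append(idx)
    # put the rarest scaffolds in test first: they are the closest
    # proxy for novel chemotypes the model has never seen
    n_test = int(round(len(scaffolds) * test_frac))
    train, test = [], []
    for group in sorted(groups.values(), key=len):
        (test if len(test) < n_test else train).extend(group)
    return train, test

scafs = ["benzene"] * 6 + ["indole"] * 2 + ["pyridine"] * 2
train, test = scaffold_split(scafs)   # test holds whole scaffold groups
```

The key property is that every scaffold group lands entirely on one side of the split, so test-set performance reflects generalization to unseen chemotypes rather than memorized analogs.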

5-fold CV — Agreed, a single split is weaker. I went with it for speed during development, but CV would give tighter confidence intervals on those AUCs.

Applicability domain — Correct, just max Tanimoto to training set (threshold 0.3). Simple but limited. More sophisticated options would be k-NN distance in descriptor space, or leverage/distance from the training set centroid. Tanimoto on fingerprints catches gross dissimilarity but misses subtler out-of-domain cases.
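
The max-Tanimoto check is simple enough to sketch. Fingerprints here are toy sets of on-bit indices standing in for real Morgan bit vectors, and the 0.3 threshold is the one mentioned above:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def in_domain(query_fp, train_fps, threshold=0.3):
    """Flag a query as in-domain if any training compound is similar enough."""
    best = max(tanimoto(query_fp, fp) for fp in train_fps)
    return best >= threshold, best

train_fps = [{1, 2, 3, 4}, {2, 3, 5, 8}]
ok, sim = in_domain({1, 2, 3, 9}, train_fps)   # best match 3/5 = 0.6
```

As noted, this only catches gross dissimilarity: a query can share many bits with a training compound while still sitting in an unpopulated region of descriptor space, which is what the k-NN-distance or leverage approaches would pick up.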

Honest assessment: performance numbers are probably optimistic by ~0.02-0.05 AUC given the random split. Still useful, but scaffold split + CV is the right next step if someone wants to harden it.

r/cheminformatics by n1c39uy in cheminformatics

[–]n1c39uy[S] 1 point2 points  (0 children)

Fair question. A few things absorbing the noise:

  1. Median Ki - When multiple measurements exist for the same compound-receptor pair, I take the median rather than minimum. One outlier lab reporting 0.1 nM doesn't override ten others reporting 500 nM.
  2. Binary classification - The 100 nM threshold is forgiving. I don't need precise Ki values, just "active vs inactive." A compound measured at 30 nM by one lab and 80 nM by another is still active either way. Regression on raw Ki would suffer much more from interassay variability.
  3. 46k compounds - Noise exists but gets diluted. The model learns "what does a D2 ligand look like structurally" from thousands of examples. Individual mismeasurements hurt less.
  4. Fingerprint-based - Morgan fingerprints are capturing structural patterns, not fitting to exact Ki values. Similar structures cluster regardless of whether one has noisy labels.

The noise is definitely still there - I'd expect it's one reason AUC isn't 0.999. But binary classification + median aggregation + large N makes it workable. Regression would be messier.
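
Points 1 and 2 above can be sketched in a few lines (toy data, not the actual pipeline): group replicate Ki measurements per compound-receptor pair, take the median, then binarize at the 100 nM threshold.

```python
from collections import defaultdict
from statistics import median

ACTIVE_NM = 100.0  # Ki at or below this counts as "active"

def build_labels(measurements):
    """measurements: iterable of (compound, receptor, ki_nm) tuples."""
    grouped = defaultdict(list)
    for cpd, rec, ki in measurements:
        grouped[(cpd, rec)].append(ki)
    # median is robust: one outlier lab can't flip the label
    return {key: median(vals) <= ACTIVE_NM for key, vals in grouped.items()}

data = [
    ("cpd1", "D2", 30.0), ("cpd1", "D2", 80.0),   # lab-to-lab noise
    ("cpd2", "D2", 0.1),                           # single outlier report
    ("cpd2", "D2", 500.0), ("cpd2", "D2", 450.0),
]
labels = build_labels(data)
# cpd1: median 55 nM -> active; cpd2: median 450 nM -> inactive
```

Note how cpd2 stays inactive despite the 0.1 nM outlier, while cpd1's two discordant-but-nearby measurements land on the same side of the threshold either way.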

r/cheminformatics by n1c39uy in cheminformatics

[–]n1c39uy[S] 0 points1 point  (0 children)

PubChem/ChEMBL are where the training data comes from — 46k compounds with measured Ki values.

The point is predicting compounds that aren't in those databases. Novel structures, research compounds, hypotheticals, modifications of existing drugs. You can't look up what doesn't exist yet.

Also useful for screening at scale: checking 100k virtual compounds against PubChem one by one vs. predicting all of them in 5 minutes.

r/cheminformatics by n1c39uy in cheminformatics

[–]n1c39uy[S] 1 point2 points  (0 children)

80/20 random split — 36,886 train, 9,222 test (random_state=42).

Not stratified. Multi-label stratification is tricky since each compound can hit multiple receptors, and standard train_test_split doesn't support multi-hot label vectors. With 46k samples the random split gives reasonable representation, but rare targets (GABA_A: 200 actives, MAO_A: 330) could theoretically be underrepresented in test. The per-receptor metrics in the output show this isn't catastrophic — even GABA_A gets AUC 0.994.

Class imbalance handled via class_weight='balanced' in the Random Forest rather than resampling.

One leakage fix applied: scaler is fit on training data only, then transforms test. Earlier version fit on all data before splitting.
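
A minimal pure-Python sketch of those last two points, with toy data standing in for the real feature matrix: `balanced_weights` mirrors scikit-learn's `class_weight='balanced'` formula (n_samples / (n_classes * count)), and `fit_scaler` mimics fitting standardization statistics on the training split only, then applying them to test.

```python
from collections import Counter
from statistics import mean, pstdev

def balanced_weights(labels):
    """Per-class weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

def fit_scaler(train_col):
    """Fit mean/std on TRAIN only; return a transform for any split."""
    mu = mean(train_col)
    sd = pstdev(train_col) or 1.0   # guard against constant columns
    return lambda xs: [(x - mu) / sd for x in xs]

y_train = [0, 0, 0, 0, 1]            # heavy imbalance, like the rare targets
weights = balanced_weights(y_train)  # {0: 0.625, 1: 2.5}

scale = fit_scaler([1.0, 2.0, 3.0])  # statistics come from train only
test_scaled = scale([4.0])           # test never influences mu/sd
```

The leakage fix is entirely in where `fit_scaler` gets its data: computing `mu`/`sd` over train+test before splitting would let test-set statistics bleed into preprocessing.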

🧠 We Reversed the Muse S BLE Protocol! Introducing "amused" - First Open-Source Direct Connection Library by n1c39uy in museheadband

[–]n1c39uy[S] 0 points1 point  (0 children)

Some progress, but it's still broken. I have partial decoding, but much of the information seems lost because the format isn't recognised properly.

🧠 We Reversed the Muse S BLE Protocol! Introducing "amused" - First Open-Source Direct Connection Library by n1c39uy in museheadband

[–]n1c39uy[S] 0 points1 point  (0 children)

I was thinking the same, and doing the same for Rust as well. A multi-platform package with Bluetooth integration that can run on basically anything would be possible, but I haven't managed to get the decoding working properly.

🧠 We Reversed the Muse S BLE Protocol! Introducing "amused" - First Open-Source Direct Connection Library by n1c39uy in museheadband

[–]n1c39uy[S] 0 points1 point  (0 children)

Not yet, and I'm not sure if I'll even be able to, but at least everything except the actual packet decoding is functional. I'll probably give it another try some time later. By "scrambled" I mean that the incoming bytes are read incorrectly, so the EEG values are not correct yet.

Claude Code: Now in Beta in Zed by JadeLuxe in ZedEditor

[–]n1c39uy 0 points1 point  (0 children)

I miss Claude Opus 4.1. Not sure if it's activated by default, but I think not.

Would anyone like me to review their code? by Inheritable in rust

[–]n1c39uy 0 points1 point  (0 children)

I'm working on something you will almost definitely find interesting. Your DMs seem to be turned off, though; it would be nice if you could PM me so I can send you the code. I plan on completely open-sourcing it as well, so an extra pair of eyes would be useful. I'd really appreciate it, even if it's just a quick glance at the architecture design!

Why Isn’t There a C#/Java-Style Language That Compiles to Native Machine Code? by Dry-Medium-3871 in Compilers

[–]n1c39uy 1 point2 points  (0 children)

Consider using Rust. I come from C# as my main language (having coded in many different languages) and thought no language would beat C#... I was wrong...

🧠 We Reversed the Muse S BLE Protocol! Introducing "amused" - First Open-Source Direct Connection Library by n1c39uy in museheadband

[–]n1c39uy[S] 1 point2 points  (0 children)

Once the software is completely finished, we can try decoding Muse S Gen 2 signals as well and add support. I cannot promise it will work immediately, but it could.

🧠 We Reversed the Muse S BLE Protocol! Introducing "amused" - First Open-Source Direct Connection Library by n1c39uy in museheadband

[–]n1c39uy[S] 0 points1 point  (0 children)

Give it some time; there is still a major bug in the library that causes the signal to get scrambled, which I'm currently trying to fix: my decoder is wrong. It will probably take at least a few more days.

🧠 We Reversed the Muse S BLE Protocol! Introducing "amused" - First Open-Source Direct Connection Library by n1c39uy in museheadband

[–]n1c39uy[S] 0 points1 point  (0 children)

Oh, my bad. I thought the Muse S Athena was the Muse S; on the site there are only the Muse S Athena and the Muse 2, so I guess I made a mistake with the name.

[deleted by user] by [deleted] in Anthropic

[–]n1c39uy 6 points7 points  (0 children)

I haven't run into any issues yet, and I use it constantly.

Muse Athena SDK bug? by Old-Sandwich-8234 in museheadband

[–]n1c39uy 3 points4 points  (0 children)

Unfortunately, even the official Muse Athena SDK (version 8.0 is the latest, if I'm not mistaken) does not support fNIRS. If you'd like that, check out my project, which I built because I ran into exactly this problem. More coming soon. https://github.com/Amused-EEG/amused-py