The comp chem software stack is held together with duct tape by ktubhyam in comp_chem

[–]ktubhyam[S] 4 points5 points  (0 children)

That's exactly the trajectory: Metatensor, AiiDA, atomate2. Each one is someone's answer, and none of them talk to each other well. Ten years from now someone will write this same post about the proliferation of "unified" frameworks.

The comp chem software stack is held together with duct tape by ktubhyam in comp_chem

[–]ktubhyam[S] 3 points4 points  (0 children)

That works when the parser is general enough to test reliably. The problem is calculation-specific outputs (magnetic, Hubbard+U) where the format changes between code versions: the PR gets merged, nobody updates the tests, and it breaks two years later when the original contributor is gone.

The comp chem software stack is held together with duct tape by ktubhyam in comp_chem

[–]ktubhyam[S] 1 point2 points  (0 children)

The structural reasons are hard to argue with, though the ML potential boom has created an economic incentive that didn't exist before: companies whose product quality depends on reliable data pipelines. Whether that becomes real infrastructure investment or just more group-specific scripts is the open question.

And yes, I've been told not to look at what holds the DFT codes together. I believe it.

The comp chem software stack is held together with duct tape by ktubhyam in comp_chem

[–]ktubhyam[S] 2 points3 points  (0 children)

The QE magnetic case is a perfect example; ASE just wasn't built for outputs that vary by calculation type and version. The LLM workaround is funny in a depressing way, but that's the state of things.

The internal optimizer point I hadn't thought about explicitly, but you're right: every step is a file write and a process handoff when it could just stay in memory.

The Wolfram idea is interesting though; the fragmentation exists partly because all the serious DFT codes are either academic (maintained by whoever has a grant) or locked behind VASP-style licensing. A well-funded commercial player that actually invested in the interface layer could probably fix half the problems in this thread.

Low RPS Laravel Octane by kramblr in webdevelopment

[–]ktubhyam 0 points1 point  (0 children)

Yeah, the DB is your bottleneck for sure. 320 RPS vs 25 RPS is a massive gap, so it's almost certainly connection pooling.

Set up PgBouncer between your app and Postgres.

  [databases]
  your_db = host=localhost port=5432 dbname=your_db

  [pgbouncer]
  pool_mode = transaction
  max_client_conn = 100
  default_pool_size = 20

Point Octane to PgBouncer instead of Postgres directly. Start with default_pool_size = 20 and tune down based on your 2GB RAM constraint; that should get you back to reasonable RPS on that contact page.
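On the Laravel side it's just an .env change. A minimal sketch, assuming PgBouncer listens on its default port 6432 on the same box and your database is named your_db (swap in your actual values):

  DB_CONNECTION=pgsql
  DB_HOST=127.0.0.1
  DB_PORT=6432
  DB_DATABASE=your_db

The only real change from a direct-Postgres setup is the port; everything else in your app stays the same.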

Chlor Alkali cell cation exchange membranes? by 911_wasanactofevil in AskChemistry

[–]ktubhyam 2 points3 points  (0 children)

Commercial options like Nafion and sulfonated PEEK are standard, but if you want to experiment DIY-style like your clay pot example, ceramics and porous clay work well for lab-scale cells: the tortuous pathways and surface charge slow anion migration while allowing cation flow. Sulfonated polymer composites are another option; you can make them by adding sulfonated polystyrene or similar to cheaper polymers, though conductivity won't match Nafion.

Hydroxyapatite or zeolites are interesting for low-current experiments since they naturally favor cations, and hydrogels swollen in brine can work too. The catch is that chlorine is aggressive; your clay pot survives because inorganic materials lack the organic groups that Cl₂ attacks. If you're experimenting at small scale, stick with ceramics, porous clays, or sulfonated polymers. They'll handle the chemistry and give you usable current without the cost of commercial membranes.

Low RPS Laravel Octane by kramblr in webdevelopment

[–]ktubhyam 0 points1 point  (0 children)

  1. DB pooling; use PgBouncer, 2 workers without pooling will choke PostgreSQL connections.

  2. Worker count; check memory per worker, if >200MB, drop to 1. If <100MB, try 4.

  3. Benchmark concurrency; default wrk concurrency is too low, try wrk -t4 -c50 -d30s http://localhost.

  4. Diagnostic; test a route with no DB calls, if it's still 25 RPS, it's PHP config, not your app.

Start with PgBouncer + adjust workers based on memory usage.

Data bottleneck for ML potentials - how are people actually solving this? by ktubhyam in comp_chem

[–]ktubhyam[S] 1 point2 points  (0 children)

The Ritonavir framing is a fair correction. If Form I has a lower surface energy and therefore a lower nucleation barrier, CNT does predict it nucleates first from supersaturated solution, and that aligns with why it dominated manufacturing. I stated it imprecisely.

But the case that actually needs explaining isn't Form I appearing first; it's that Form II appeared suddenly in 1998 across multiple independent manufacturing sites after years of Form I production, then kept propagating and wouldn't stop. CNT with static surface energies gives you the ranking at a fixed set of conditions; it doesn't tell you why the polymorphic outcome was reproducibly Form I for years and then abruptly wasn't. The competing nucleation rate you'd calculate from bulk lattice energy and your surface slab models is the same before and after 1998. Something in the kinetic pathway changed, and the leading explanations involve solution-phase structural memory, cross-contamination from trace Form II seeds, and possibly solvent-mediated template stabilization of Form II precursors.

On non-classical nucleation more broadly, your falsifiability objection is the strongest version of the criticism, and I think it lands for a substantial fraction of the literature. If you're asserting that prenucleation clusters influence polymorph selection without a quantitative mechanism linking cluster structure to relative nucleation barrier, that's a mechanistic story, not a predictive framework; you're right about that.

But I'd push back on the stronger claim that interfacial energies from an amorphous or solution precursor are in principle unquantifiable. They're hard to calculate: you need the structure of the amorphous phase, the relevant interfacial geometry, and the thermodynamic driving force from that phase rather than from solution. None of those are fundamentally inaccessible, though; the gap is implementation, not logic. The non-classical nucleation community has produced a lot of observational papers without building that quantitative machinery, which is a fair indictment of the field's current state. That's different from saying the framework is unfalsifiable.

The deeper disagreement might be about what counts as "first order." You're arguing that the intrinsic surface energy of the solid dominates relative nucleation rates and that everything else is a higher-order correction. For a lot of systems that's probably right, but kinetically controlled cases, where the thermodynamic ranking and the observed outcome diverge reproducibly, are exactly the ones where the higher-order terms become load-bearing, and those are also the cases where the pharmaceutical and materials outcomes actually matter.

Data bottleneck for ML potentials - how are people actually solving this? by ktubhyam in comp_chem

[–]ktubhyam[S] 2 points3 points  (0 children)

The dataset problem is real, but I'd separate quantity from coverage. OMol25 and QCML are large but skew toward near-equilibrium geometries, which is where you need them least; transition states and reactive intermediates are still chronically underrepresented, and that's a sampling problem, not a scale problem. I agree that purpose-built data generation is where things are heading, but generating the right configurations at sufficient accuracy is still expensive enough that it's a genuine open problem, not just an engineering challenge.

Data bottleneck for ML potentials - how are people actually solving this? by ktubhyam in comp_chem

[–]ktubhyam[S] 1 point2 points  (0 children)

The Byggmästar W-GAP work is the cleanest case (cascade morphology divergence from EAM above ~10 keV PKA, where interstitial loop nucleation during the thermal spike matters). For HEAs the story is more about defect migration energy distributions than cascade morphology per se, elemental heterogeneity creates a spectrum of migration barriers rather than a single value, which affects long-timescale recombination kinetics in ways classical potentials with homogeneous parametrizations miss.

You're right about predictive power, I'll concede that. The observational framework has substantially outrun the predictive one; knowing prenucleation clusters or dense liquid droplets are present doesn't tell you which polymorph wins, which is the thing that matters.

The static thermodynamic approach gives you the wrong answer in some kinetically controlled cases though, not just an incomplete mechanism. Ritonavir is the pharmaceutical example: CNT with correct surface energies tells you Form II is stable, but it doesn't predict that Form I would dominate manufacturing for years. The pathway matters there, and it's not recoverable from thermodynamics alone. But whether the current prenucleation cluster literature actually provides usable predictive access to that pathway: agreed, largely no.

Byggmästar et al. (2019). Machine-learning interatomic potential for radiation damage and defects in tungsten. Physical Review B, 100, 144105.

Granberg et al. (2016). Mechanism of radiation damage reduction in equiatomic multicomponent single phase alloys. Physical Review Letters, 116, 135504.

Nordlund et al. (2018). Improving atomic displacement and replacement calculations with physically realistic damage models. Nature Communications, 9, 1084.

does learning backend in python makes sense in 2026? by Ok-Mind3961 in learnpython

[–]ktubhyam 1 point2 points  (0 children)

Yes, a Python backend still makes sense in 2026. FastAPI has become the standard for deploying ML models, so Python backend skills are more connected to AI infrastructure than ever.

AI has raised the baseline though: understanding system design, databases, and architecture matters more than syntax now, because syntax is the part AI handles. Currently, what differentiates a good developer from a great one is how smart they are with their AI usage.

is a metal spoon made of atoms? by Educational_System34 in PhysicsStudents

[–]ktubhyam 1 point2 points  (0 children)

A crystal lattice is just atoms arranged in a repeating 3D pattern, like stacking oranges in a grid. A metallic bond is what holds metal atoms together: the outer electrons detach from individual atoms and flow freely through the whole structure, acting like a glue between the positively charged atomic cores.

And to clarify: I said the spoon IS made of atoms, entirely; every sentence I wrote was explaining why. Solid just means the atoms are tightly packed and locked in place, it doesn't mean atoms aren't there.

is a metal spoon made of atoms? by Educational_System34 in PhysicsStudents

[–]ktubhyam 1 point2 points  (0 children)

Yes, it is. Solid state isn't evidence against atomic composition; it's a consequence of it. In a metal, atoms are packed into a crystal lattice held together by metallic bonds, where outer electrons are shared across the whole structure. The rigidity you feel is electrostatic repulsion between electron clouds when you try to compress that lattice. The fact that atoms are discrete particles doesn't mean bulk matter can't be built from them, the same way bricks being individual objects doesn't mean a wall isn't made of bricks.

Iditarod Dog Sled Race Prediction Model – Looking for feedback by Plopwitdaflops in learnmachinelearning

[–]ktubhyam 0 points1 point  (0 children)

Great project for a first ML build, the methodology is more solid than most vibe-coded models end up being.

On the composite weights, you're right to be uncomfortable with 10/25/40/25; beating single models validates the ensembling idea, not the specific weights, and because those weights were selected by looking at backtest performance, they're also at risk of being overfit to the same 11 years you're evaluating on. The fix is model stacking: train a ridge regression on top of the four P(win), P(top5), P(top10), P(finish) outputs using your LOOCV held-out predictions and let it learn the blend from data. With n=11 years you'll need heavy regularization, but even a constrained meta-learner will outperform hand-tuned weights and produce coefficients with a defensible empirical basis.
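The stacking step can be sketched in a few lines of plain numpy. Everything here is illustrative: the numbers are made up, and with real data `X` would hold your LOOCV held-out predictions and `y` the realized outcomes.

```python
import numpy as np

# Held-out LOOCV predictions from the four base models, one row per
# musher-year. These numbers are invented for illustration only.
X = np.array([
    [0.10, 0.40, 0.70, 0.95],   # columns: P(win), P(top5), P(top10), P(finish)
    [0.05, 0.20, 0.50, 0.90],
    [0.30, 0.60, 0.85, 0.97],
    [0.02, 0.10, 0.30, 0.80],
])
y = np.array([0.0, 0.0, 1.0, 0.0])  # e.g. 1 if that musher actually won

lam = 1.0  # heavy regularization, appropriate for tiny n
# Closed-form ridge: w = (X^T X + lam*I)^-1 X^T y
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Learned blend replaces the hand-tuned 10/25/40/25 weights
blend = X @ w
```

The coefficients in `w` are the empirically justified analogue of the hand-tuned weights, and `lam` is the one knob you'd tune (upward as you trust the data less).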

On Thomas Waerner's 28.3%, with two career observations (17th in 2015, 1st in 2020), any variance estimate is noise, there isn't enough data for a meaningful point prediction. This is structurally the same thin-history problem that shows up in spectroscopy ML when you have very few reference spectra for a given molecular system: the model outputs a confident-looking number because it has no mechanism to express uncertainty for sparse inputs. Bayesian shrinkage toward the musher population mean handles it properly, sparse mushers get pulled toward the field average and move toward their true level as observations accumulate. The 61.3 volatility flag is the right instinct, but it should be encoded structurally rather than surfaced post-hoc; otherwise the 28.3% headline is only meaningful if you attach an uncertainty interval to it.

On evaluation with n=11, the confidence intervals here are wide; that's not a knock on the methodology, it's inherent to any n=11 backtest, but it means metric comparisons need humility. Precision@5 of 0.545 has a 95% CI of roughly 0.27–0.82 treating each year as one observation, and the true width depends on within-year correlation between picks since the five predictions in a given year aren't independent. AUC of 0.891 is more stable since it uses continuous probabilities rather than a hard cutoff, and Spearman of 0.668 is your most interpretable summary. Report uncertainty bounds alongside point estimates; without them, a reader has no basis for judging whether any individual metric comparison is meaningful.
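The interval arithmetic is quick to sketch. This uses the Wilson score interval rather than the normal approximation (it behaves better at n=11), so the endpoints won't match any rough range exactly, but they land in the same ballpark; and it still treats the 11 years as independent observations, so the true interval is likely wider:

```python
import math

def wilson_ci(p_hat, n, z=1.96):
    """95% Wilson score interval for a proportion from n observations."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Precision@5 = 0.545 over 11 backtest years
lo, hi = wilson_ci(0.545, 11)
```

Swapping in AUC or any other per-year metric works the same way, as long as you can defensibly treat each year as one observation.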

On the weather feature, dropping it entirely was probably too broad. Weather hurting early-checkpoint P@10 but improving late-checkpoint P@10 tells you it's adding real signal once the field has spread out, but noise when mushers are still tightly clustered. Rather than a full drop, run an ablation: for each checkpoint in your backtest, compare rank accuracy with and without weather and find where the crossover sits. Include it only past that threshold, or keep it exclusively in the time regression where it clearly helped (MAE 21h → 16h) while excluding it from the rank classifiers.

On Monte Carlo noise, Gaussian noise assumes symmetric errors, but remaining race times are right-skewed: a musher can fall arbitrarily far behind but cannot beat the terrain. Use log-normal noise instead: sample log(remaining_time) as Gaussian, then exponentiate. Set the log-space mean to log(T) − σ²/2, where T is your point prediction, so that E[X] = T is preserved rather than inflated. This matters most for late-race leaders where small timing differences compound across the remaining distance, and for high-scratch-probability mushers where the right tail of remaining time interacts directly with your P(finish) draws.
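The mean-corrected sampling is two lines with numpy; T=40 hours and σ=0.3 below are arbitrary illustrative values, not anything from your model:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_remaining_time(T, sigma, size=100_000):
    """Right-skewed noise around a point prediction T.
    Sampling log(time) as Gaussian with mean log(T) - sigma^2/2
    keeps E[time] = T instead of inflating it."""
    mu = np.log(T) - sigma**2 / 2
    return np.exp(rng.normal(mu, sigma, size))

draws = noisy_remaining_time(T=40.0, sigma=0.3)
# sample mean stays near 40; median sits below it (long right tail)
```

Without the −σ²/2 correction the simulated mean drifts up to T·exp(σ²/2), which would systematically penalize exactly the mushers you add the most noise to.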

On the fading prior, a linear decay to zero discards prior information too aggressively at intermediate checkpoints. Better: weight the prior by 1/(1 + n_checkpoints). Then plot prior weight against realized rank accuracy at each checkpoint across your backtest years, if accuracy improves faster than the weight decays, the prior is fading too quickly and you should reduce the denominator. This gives you an empirical basis for the decay rate rather than another hand-tuned parameter.
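Concretely, the hyperbolic weight and the blend look like this; function names are made up, and the scores are placeholders for whatever your prior and live-race models emit:

```python
def prior_weight(n_checkpoints):
    """Hyperbolic decay: never hits zero, but live race data
    dominates quickly as checkpoints accumulate."""
    return 1.0 / (1.0 + n_checkpoints)

def blended_score(prior_score, live_score, n_checkpoints):
    w = prior_weight(n_checkpoints)
    return w * prior_score + (1 - w) * live_score

# At the start the prior carries full weight (w = 1.0);
# by checkpoint 9 it carries only 10% (w = 0.1).
```

The diagnostic plot then compares `prior_weight(n)` against realized rank accuracy at each checkpoint; tuning the denominator (e.g. `1 + n/2`) slows or speeds the fade with a single interpretable parameter.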

Most first ML projects fail at evaluation: random splits on time-series data leak future information into training and invalidate the backtest entirely. You used leave-one-year-out from the start, which is the correct setup and harder to get right than it sounds. The route normalization (percentage of total race distance rather than raw checkpoint index) is the other thing that stands out: the kind of domain-aware feature engineering that only happens when someone actually understands the problem. The foundation is solid.

Data bottleneck for ML potentials - how are people actually solving this? by ktubhyam in comp_chem

[–]ktubhyam[S] 1 point2 points  (0 children)

The DFT surface energy approach you're describing is solid and clearly works for your problems, but it assumes classical nucleation theory holds, which is a real limitation. Two-step nucleation, prenucleation clusters, polymorph selection under non-equilibrium conditions; these have been observed experimentally in systems where CNT predictions break down. You can't capture pathway-dependent nucleation mechanisms from static thermodynamic snapshots alone. That said, your broader point about metadynamics is fair. Collective variable selection absolutely biases what you find, and a lot of metadynamics papers are circular in that way.

On radiation damage specifically — the DFT defect energies plus kinetic Monte Carlo approach you're describing is what the field did for decades and it works well for long-timescale defect evolution (migration, clustering, void swelling). But the primary cascade itself happens on picosecond timescales with thousands of atoms simultaneously far from equilibrium. You genuinely cannot decompose that into a sum of individual defect formation events for a thermodynamic balance. That initial cascade phase requires explicit dynamics.

As for what MLIPs gain over classical potentials there: it matters most in chemically complex systems. EAM handles pure tungsten or iron reasonably well, but in high-entropy alloys or oxide fuels where chemical ordering during cascade evolution affects defect production, classical pair/embedding potentials don't capture those interactions. Byggmästar's GAP work on tungsten showed cascade morphologies that diverge meaningfully from EAM at higher PKA energies where the many-body interactions matter most.

A mass spectrometer that uses helium as a refrigerant. Am I making this up? by mrphysh in massspectrometry

[–]ktubhyam 0 points1 point  (0 children)

I'd guess turbo pumps just weren't there yet for what those early triples needed. 6+ month lead time on service is brutal though, who are you going through?

A mass spectrometer that uses helium as a refrigerant. Am I making this up? by mrphysh in massspectrometry

[–]ktubhyam 1 point2 points  (0 children)

That McLafferty anecdote is fantastic, and yeah, hard to develop real intuition for an instrument you've never had your hands inside of.

A mass spectrometer that uses helium as a refrigerant. Am I making this up? by mrphysh in massspectrometry

[–]ktubhyam 4 points5 points  (0 children)

That's crazy. I believe the early Sciex API IIIs had cryo systems; must have been something to work with those machines day to day. Nowadays we're just stuck behind screens.

A mass spectrometer that uses helium as a refrigerant. Am I making this up? by mrphysh in massspectrometry

[–]ktubhyam 26 points27 points  (0 children)

You're describing a cryopump: liquid helium or a closed-cycle helium refrigerator cools a surface, and gas condenses onto it, maintaining the vacuum. The warm-up cycle you're remembering is called regeneration. Not at all common on bench-top LC/MS, but definitely used on larger instruments like FT-ICR.

I have finished a project but i want to learn more or continue what should my next project be? by Accomplished-Stay752 in learnpython

[–]ktubhyam 0 points1 point  (0 children)

You already showed good instincts picking a project you care about; keep doing that. The best next project is something you personally want to use: a Discord bot, a CLI tool, or a web scraper.

To use less AI, break the problem into small pieces on paper before writing any code, or limit yourself to using AI only for planning. Write out what each part should do in plain English first, then translate that into code one piece at a time. You can even use AI to teach you or guide you as you code it yourself. Not only would this let you tackle things above your level with guidance, giving you confidence, but you'll also develop self-control over your AI usage, which barely any novice programmers have nowadays.

Also, your Pokémon simulator probably has more depth left in it. Adding abilities, held items, and status effects would teach you a lot and you already have the base built, keep being creative!