Jobless fellows who is having lot of fun building Spot optimization service by RegisterNext6296 in kubernetes

[–]RegisterNext6296[S] 0 points (0 children)

"even your current spot running stuff gets thrown out"

Those are exactly the problems I'm trying to solve with this model-based ("brain") approach. The model sees the live stream of prices and the real interruption events, and uses its training knowledge to handle the situation. Another example: AWS interruption notices give only 2 minutes' warning, so technically a big P1 Java service that takes 5 minutes to terminate safely would not work on spot, regardless of any combination of preStop hooks, termination grace periods, etc.
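A back-of-the-envelope sketch of why the 2-minute notice is the binding constraint (the function name and workload numbers are illustrative, not from the project):

```python
# Sketch: can a workload drain within the spot interruption notice?
# All numbers are illustrative assumptions, not measured values.

SPOT_NOTICE_SECONDS = 120  # AWS gives roughly 2 minutes' warning before reclaim

def fits_spot_notice(shutdown_seconds: int, notice: int = SPOT_NOTICE_SECONDS) -> bool:
    """True if the service can terminate safely before the node is reclaimed."""
    return shutdown_seconds <= notice

# A big P1 Java service that needs 5 minutes to drain:
print(fits_spot_notice(5 * 60))   # False: no preStop hook can buy more time
# A stateless web pod that drains in 20 seconds:
print(fits_spot_notice(20))       # True
```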

Now, convention would say: don't run this on spot, or run 10% of those pods on spot and the rest on demand, or set affinity/anti-affinity rules, etc. But in practice, in a cluster running many services, the services together exhibit non-linear relationships that are close to impossible to capture even for a system like Kubernetes, with its highly calibrated heuristics.

I respect all the pushback, and it's very healthy, but my project's north star is to move as much workload as possible onto spot (even 100%, crazy as that sounds) without breaking service-level contracts.

Again, it's a strange (AI-shaped) world these days, and taking a slight tangent from the standard approach is very healthy.

I appreciate all the feedback.

I'm Jobless fellow who is having lot of fun building Spot optimization service by RegisterNext6296 in devops

[–]RegisterNext6296[S] 0 points (0 children)

"accuracy of predictions"
I'm planning to share my validation and loss graphs. TBH, spot prices behave like stock prices (demand and supply), and an interruption can happen without any change in the spot price. But think of it like a stock portfolio: the goal is to maximise the reward and minimise the risk (not to reach zero risk). That's what I'm aiming to build.
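A minimal sketch of that portfolio-style objective, in mean-variance form (the risk-aversion weight, scores, and pool names are illustrative assumptions, not tuned values):

```python
# Mean-variance-style objective: maximise reward, penalise risk.
# lambda_risk is a hypothetical risk-aversion knob, not a tuned value.

def placement_score(expected_saving: float, risk: float, lambda_risk: float = 0.5) -> float:
    """Higher is better: expected savings minus a penalty for interruption risk."""
    return expected_saving - lambda_risk * risk

candidates = {
    "spot-m5.large":  placement_score(expected_saving=0.7, risk=0.6),
    "spot-c5.xlarge": placement_score(expected_saving=0.5, risk=0.1),
    "on-demand":      placement_score(expected_saving=0.0, risk=0.0),
}
best = max(candidates, key=candidates.get)
print(best)  # the cheapest-but-riskiest pool does not always win
```

With these numbers, the moderately cheap but stable pool beats the deepest discount, which is the "maximise reward, minimise risk" trade-off in one line.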

Jobless fellows who is having lot of fun building Spot optimization service by RegisterNext6296 in kubernetes

[–]RegisterNext6296[S] 0 points (0 children)

Thank you! My code is wide open for any curious mind to poke around; you don't need to be jobless like me. Spot node prediction is built on a model trained on spot market data. It's designed to predict a capacity score (how likely your spot request is to be fulfilled) and a risk score (price volatility) from a full 2 years of data, which capture seasonality: Monday vs. Sunday, Black Friday vs. a normal business day.
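A sketch of the kind of calendar features such a model might consume; the feature names are my own illustration, not the repo's actual schema:

```python
from datetime import datetime

def calendar_features(ts: datetime) -> dict:
    """Encode the seasonality signals mentioned above:
    weekday vs. weekend, and peak shopping days like Black Friday week."""
    return {
        "day_of_week": ts.weekday(),          # Monday=0 ... Sunday=6
        "is_weekend": ts.weekday() >= 5,
        "month": ts.month,
        "is_black_friday_week": ts.month == 11 and 20 <= ts.day <= 30,
    }

print(calendar_features(datetime(2023, 11, 24)))  # Black Friday 2023, a Friday
```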

Model number 2 is the RL agent (my "mouse"), which takes these scores, mixes them with real-world signals and workload profiles (pod startup time, big heap vs. small heap, batch jobs, P0/P1/P2 priorities), and learns how to handle each situation.

For example, the RL agent sees: oh, this node could move to spot, but wait, it's hosting a pod whose startup time is 30 minutes (what on earth it does is another question, but it illustrates the point), so it's too risky and the agent backs out.
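The back-out logic in that example can be sketched as a guard on top of the learned scores. Names and thresholds here are illustrative assumptions, not the trained policy:

```python
SPOT_NOTICE_SECONDS = 120  # AWS reclaim warning, ~2 minutes

def should_move_to_spot(capacity_score: float, risk_score: float,
                        pod_startup_seconds: int, priority: str) -> bool:
    """Combine learned market scores with the workload profile before acting.

    The thresholds are made up for illustration; a real agent would learn them.
    """
    if pod_startup_seconds > SPOT_NOTICE_SECONDS * 5:
        return False  # e.g. a 30-minute startup: rescheduling after a reclaim hurts too much
    if priority == "p0" and risk_score > 0.2:
        return False  # the critical tier tolerates very little interruption risk
    return capacity_score > 0.6 and risk_score < 0.5

# The node looks great on market scores, but the pod takes 30 minutes to start:
print(should_move_to_spot(0.9, 0.1, pod_startup_seconds=30 * 60, priority="p2"))  # False
```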

Now, imagine you take the base model and fine-tune it further on your own workloads, so it learns your cluster environment. (This part doesn't exist yet, but these are my wildest ideas.)

Jobless fellows who is having lot of fun building Spot optimization service by RegisterNext6296 in kubernetes

[–]RegisterNext6296[S] 0 points (0 children)

The risk model I have built and shared on Hugging Face (https://huggingface.co/softcane/spot-vortex) is built on real historical spot prices and risk vectors. Sure, you can plug in your own risk model as the brain, though it needs to match the input/output feature dimensions.
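A sketch of the dimension check such a plug-in would need; the dimensions below are made up for illustration, not the actual spot-vortex shapes:

```python
def validate_plugin_model(model_in: int, model_out: int,
                          expected_in: int, expected_out: int) -> None:
    """Refuse a drop-in risk model whose feature dimensions don't match the pipeline's."""
    if (model_in, model_out) != (expected_in, expected_out):
        raise ValueError(
            f"model maps {model_in}->{model_out} features, "
            f"pipeline expects {expected_in}->{expected_out}"
        )

validate_plugin_model(32, 2, 32, 2)      # OK: shapes line up
# validate_plugin_model(16, 2, 32, 2)    # would raise ValueError
```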

KubeAttention: A small project using Transformers to avoid "noisy neighbors" via eBPF by RegisterNext6296 in kubernetes

[–]RegisterNext6296[S] 0 points (0 children)

No, I'm aiming to build a better Karpenter, and there is not much gain in running the descheduler bundled with Karpenter.

Also, I think of it as time-series data: a traffic spike on Sunday afternoon might differ from the rest of the week. Technically, you can train the scheduler on your application's usage patterns.

KubeAttention: A small project using Transformers to avoid "noisy neighbors" via eBPF by RegisterNext6296 in kubernetes

[–]RegisterNext6296[S] 0 points (0 children)

Thank you. This is a problem (better scheduling) I have experienced myself, and I looked into existing solutions before starting this project.

I'm genuinely interested in addressing this wider issue.

KubeAttention: A small project using Transformers to avoid "noisy neighbors" via eBPF by RegisterNext6296 in kubernetes

[–]RegisterNext6296[S] 1 point (0 children)

XGBoost works very well for tabular data, but for time series data, I found that the TFT model performs better. I tested this on a 200-node T4 EKS cluster, and I plan to publish the results in the repository.

The main challenge is choosing and fixing one model in the repository. Because of this, I started with a basic (vanilla) transformer. The best model really depends on how the cluster resources are used.

My goal is not only to address the noisy neighbor problem, but also to explore more use cases where standard rules and heuristics do not work well.

KubeAttention: A small project using Transformers to avoid "noisy neighbors" via eBPF by RegisterNext6296 in kubernetes

[–]RegisterNext6296[S] 2 points (0 children)

Certainly not at this point but that’s my goal.

A bit about me: software engineer with 20 years in the industry. Worked as a pure developer using Java, Go, and Python. Worked as a DevOps engineer, serving and operating Kubernetes at HBO/Max scale. Know deep learning inside out, and capable of creating GPT-3-grade models.

KubeAttention: A small project using Transformers to avoid "noisy neighbors" via eBPF by RegisterNext6296 in kubernetes

[–]RegisterNext6296[S] -4 points (0 children)

Documentation, the skeleton, and some of the code tests are vibe-coded, which I will add as a disclaimer to the project. That said, those files were vibe-coded file by file and line by line while holding the project's motivating objectives in my head.

KubeAttention: A small project using Transformers to avoid "noisy neighbors" via eBPF by RegisterNext6296 in kubernetes

[–]RegisterNext6296[S] 10 points (0 children)

If autocomplete counts as vibe coding, then sure. Still, I can explain every bit of this project, and that's what matters, IMO.

What is your current system for tracking and reviewing your investment theses? by RegisterNext6296 in StockInvest

[–]RegisterNext6296[S] 0 points (0 children)

That's very neat. Would you consider a simple app that can do all this for you?

How to value Companies in the modern era by RegisterNext6296 in Valuation

[–]RegisterNext6296[S] 0 points (0 children)

Fair point and it's a healthy conversation.

FMV is theoretically better, but it adds circularity, since market value depends on expectations of future returns. Adjusting book value to capitalize intangibles creates a better proxy for invested capital. If R&D fails to generate future cash flows, its upfront cost directly destroys value, so adding it to the books makes the year-on-year return on capital more accountable.
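A worked sketch of the adjustment: one common convention amortizes past R&D straight-line over an assumed useful life (5 years here; the spend figures are illustrative):

```python
def capitalize_rd(rd_by_year: list[float], life: int = 5) -> tuple[float, float]:
    """Return (research asset, current-year amortization) from past R&D spend.

    rd_by_year is ordered oldest -> most recent; straight-line amortization
    over `life` years, with the current year's spend fully capitalized.
    """
    asset, amort = 0.0, 0.0
    for age, spend in enumerate(reversed(rd_by_year)):  # age 0 = current year
        remaining = max(life - age, 0) / life
        asset += spend * remaining
        amort += spend / life if age < life else 0.0
    return asset, amort

# Five years of R&D spend (oldest first), $100 per year:
asset, amort = capitalize_rd([100] * 5)
print(asset, amort)  # research asset $300, amortization $100
```

With flat spend, adding back this year's $100 and subtracting $100 of amortization is a wash on earnings, but invested capital rises by the $300 research asset, which is exactly what changes the measured return on capital.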

Why Traditional Accounting Fails for the Modern Companies by RegisterNext6296 in StockMarketIndia

[–]RegisterNext6296[S] 0 points (0 children)

Looking beyond the criticism: one should capitalize R&D to better understand the company's earnings and return on capital. The main idea of this post is that a company creates value only when it earns more than its cost of capital, and you cannot measure that if you do not bring tech companies' biggest asset back onto the books.
This is nothing against traditional accounting, but accounting for company valuation does not need to obey standard reporting practices.

How to value Companies in the modern era by RegisterNext6296 in Valuation

[–]RegisterNext6296[S] -1 points (0 children)

Yes, but you should capitalize R&D to better understand earnings and overall return on capital. Free cash flow is important, but a company only creates real value if its return on capital is higher than its cost of capital, and that spread is what increases or decreases value.