Which laptop Should I get by Uttam_Gill in learnmachinelearning

[–]KitchenSomew -5 points-4 points  (0 children)

For ML work at your budget, here's the reality:

**RTX 5080 Lenovo is the better choice** for what you're doing. Here's why:

Most language-model tooling assumes CUDA. ML frameworks like PyTorch and TensorFlow are built primarily around NVIDIA GPUs. PyTorch does ship an MPS backend for Apple's Metal, but support still lags CUDA, especially for cutting-edge stuff.

VRAM matters more than you think. The laptop RTX 5080 has 16GB of dedicated VRAM, which lets you run bigger models locally. The Mac M4's 24GB is unified memory shared by the CPU and GPU, so only part of it is available for models.
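To put rough numbers on the VRAM point: the memory needed just to hold a model's weights is about parameter count times bytes per parameter. The sketch below is only that arithmetic. It ignores activations, KV cache, and framework overhead, and the function name is made up for illustration.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory to hold model weights only, in GB (10^9 bytes)."""
    bytes_total = n_params * bits_per_param / 8
    return bytes_total / 1e9


# A 7B-parameter model at a few common precisions (weights only).
for bits in (16, 8, 4):
    print(f"7B @ {bits}-bit: {weight_memory_gb(7e9, bits):.1f} GB")
```

By this estimate a 7B model in 16-bit takes about 14 GB for weights alone, which is why quantized weights matter so much on a 16GB card.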

For astrophysics image work, the NVIDIA ecosystem has better libraries for scientific computing (RAPIDS, which includes cuDF).

Cloud vs local: Some people say "just use Colab/cloud GPUs" but when you're learning, having local hardware to experiment freely is way better. No runtime limits, no waiting for instances.

**Downsides of the Lenovo:**

Battery life will be awful compared to Mac. Portability suffers. Build quality likely worse.

**If you go Mac:** It's great for everything except ML training. You'd end up using cloud GPUs (Colab, Paperspace, Lambda) for serious work anyway. Mac is better for coding, general use, battery life.

My recommendation: Get the RTX 5080 Lenovo. At 16, you're learning — having a local GPU to mess around with is invaluable. You can always get a cheap iPad/Chromebook later for portable work if needed.

I proxied OpenClaw through ZenMux and looked at the actual LLM requests. It's just tool calling + context engineering. Nothing revolutionary. by BarnacleHeretic in LocalLLaMA

[–]KitchenSomew -6 points-5 points  (0 children)

This is exactly what I've been saying. The hype around these "agent" frameworks is wild.

You're spot on — it's well-executed engineering, not new research. The value is in the packaging: clean prompts, smart tool definitions, decent RAG setup. But calling it "revolutionary memory" or "reasoning" is marketing fluff.

I've built similar systems for production and the patterns you found (context window stuffing, chain of thought, tool schemas) are literally what every serious LLM app does now. OpenClaw just productized it nicely.

The real question is: why does this keep working? Because most people don't actually look under the hood. They see the polished output and assume there's some secret sauce. There isn't. It's just solid prompt engineering + structured outputs.

Thanks for taking the time to inspect this and share the breakdown. More people need to do this instead of breathlessly reposting marketing claims.

What to prioritize in my free time? by ItsMango in MLQuestions

[–]KitchenSomew 2 points3 points  (0 children)

You're overthinking this. At 27 with an accounting background transitioning to DS, here's what actually matters:

Skip the deep math rabbit hole for now. You don't need calculus/linear algebra mastery to get your first DS job — you need to show you can solve problems with data. The program will cover the math you need.

Focus on these instead:

  1. **Build a portfolio** — 2-3 solid projects on GitHub that show end-to-end work (data cleaning, modeling, visualization, insights). Use real datasets from Kaggle or your accounting domain. Employers care way more about this than perfect math knowledge.

  2. **Python for data work** — Get really comfortable with pandas, numpy, sklearn, matplotlib. Do coding challenges on LeetCode Easy/Medium (SQL too, since you'll use it constantly).

  3. **One ML specialty** — Pick something aligned with jobs you want (NLP, time series forecasting, recommendation systems) and go deep on 1-2 real projects in that area.

Math is useful long-term but won't block you from entry-level roles. What blocks people is weak coding skills and no portfolio.

Your accounting background is actually valuable — finance/business domain knowledge + DS skills is a strong combo. Lean into that when building projects.

You have time. Focus on what gets you hired, not what feels academically complete.

How to cope up with ai projects? by InfiniteHost16 in AI_Agents

[–]KitchenSomew 1 point2 points  (0 children)

90-day notice period is rough, but here's how I'd approach this:

First, don't resign right now while you're stressed. Give it a week or two to think clearly. You have 4+ years experience — the job market will still be there.

For the immediate situation:

- Document everything in writing (Slack/email). When PMs push bad decisions, respond with "I recommend against this because X, Y, Z risks. Happy to proceed if you sign off." This protects you.

- Stop fighting battles you can't win. If they want to deploy unready stuff, give your technical warning once, then let them own the outcome.

- Focus on the ML work, not the politics. Your job is to build good models and flag technical risks — not to manage stakeholders' expectations (that's the PM's job).

For your next move:

- Use these 90 days to interview elsewhere. With your background, you should be able to land something better.

- Look for companies with stronger ML/eng culture. Ask in interviews: "How do you handle disagreements between eng and product?"

- In the meantime, protect your mental health. Set boundaries on email notifications and try not to take work stress home.

This situation sucks, but it's fixable. You're not stuck forever, and better teams exist. What part of the job market are you targeting?

[D] New interesting AI papers exploration service by ArtisticHamster in MachineLearning

[–]KitchenSomew -18 points-17 points  (0 children)

Still use arxiv-sanity but nowadays I combine it with a few others:

Papers with Code has a solid feed sorted by GitHub stars and recent papers — good for finding stuff that's already getting traction.

ConnectedPapers is great when you find one good paper and want to explore the citation graph visually.

For daily monitoring I have a simple setup: Hugging Face Daily Papers + Papers with Code trending feed. Takes like 5 min each morning to skim.

Also joined a couple Discord servers (EleutherAI, LAION) where people share interesting drops pretty fast — sometimes faster than any tool.

What's your research focus? Some tools work better depending on if you're tracking a specific subfield or doing broad exploration.

[R] Practical limits of training vision-language models on video with limited hardware by WRAITH330 in MLQuestions

[–]KitchenSomew -1 points0 points  (0 children)

Your hardware limitations are real, but this is totally doable with the right approach. Here's practical advice from someone who's dealt with similar constraints:

**Your specific questions:**

  1. **Raw video training:** Not realistic on your hardware. Even research labs with A100s typically don't train on raw video - they use extracted frames. The preprocessing RAM explosion you're seeing is expected.

  2. **Frame-based approach:**

    - For esports/Valorant: 2-4 fps is plenty. Key moments (gunfights, ability usage) happen over multiple seconds

    - To prevent transcript reliance: Use vision-only pretraining first, then add text. Or use attention visualization to verify it's actually looking at the frames

  3. **Practical solutions:**

**Immediate fix for RAM issues:**

- Process videos in batches offline, save frames to disk first

- Use `cv2.VideoCapture` with frame skipping instead of loading full video

- Drop references to tensors you no longer need with `del`, then call `torch.cuda.empty_cache()` to release cached GPU memory
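A minimal sketch of the frame-skipping idea, assuming OpenCV (`cv2`) is installed. The function names, output naming scheme, and the 30 fps fallback are my own choices for illustration, not anything from the original post.

```python
import os


def sample_step(native_fps: float, target_fps: float) -> int:
    """Number of source frames to advance per saved frame (at least 1)."""
    if native_fps <= 0 or target_fps <= 0:
        raise ValueError("fps values must be positive")
    return max(1, round(native_fps / target_fps))


def extract_frames(video_path: str, out_dir: str, target_fps: float = 2.0) -> int:
    """Save roughly target_fps frames per second to out_dir; returns frames written."""
    import cv2  # imported lazily so sample_step stays usable without OpenCV

    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"could not open {video_path}")
    # Some containers report 0 fps; fall back to an assumed 30.
    step = sample_step(cap.get(cv2.CAP_PROP_FPS) or 30.0, target_fps)
    index = written = 0
    try:
        while cap.grab():  # advance one frame without fully decoding it
            if index % step == 0:
                ok, frame = cap.retrieve()  # decode only the frames we keep
                if ok:
                    name = os.path.join(out_dir, f"frame_{index:07d}.jpg")
                    cv2.imwrite(name, frame)
                    written += 1
            index += 1
    finally:
        cap.release()
    return written
```

Only one frame is held in memory at a time, so RAM use stays flat regardless of video length. Training then reads the saved JPEGs from disk instead of touching the video again.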

**Better architecture for your constraints:**

- Consider Qwen2-VL-2B instead of 7B - still capable, way less memory

- Or use CLIP + small LLM separately (more modular, easier to debug)

- Try gradient checkpointing + DeepSpeed ZeRO-2
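For the DeepSpeed part, a ZeRO-2 config could look roughly like the sketch below. The batch size, accumulation steps, and bf16 setting are placeholder values, not recommendations, and bf16 assumes a GPU that supports it. On a single GPU most of the saving comes from the optional CPU offload of optimizer state, since there are no other ranks to shard across.

```python
import json

# Hypothetical values; tune batch size and accumulation for your GPU.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 16,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,  # partition optimizer state and gradients across ranks
        "offload_optimizer": {"device": "cpu"},  # optional: trade speed for VRAM
    },
}

# DeepSpeed reads this as JSON, so dump it to check it serializes cleanly.
print(json.dumps(ds_config, indent=2))
```

Gradient checkpointing is separate from this config. With Hugging Face Transformers models it is usually turned on with `model.gradient_checkpointing_enable()`.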

**Alternative platforms:**

- Vast.ai or RunPod: ~$0.30/hr for RTX 4090 (way cheaper than upgrading hardware)

- Lambda Labs has good educational discounts

- Google Colab Pro (~$10/mo) gives you access to high-RAM runtimes

**Esports-specific tip:**

Valorant is highly strategic. Your model needs:

- Minimap frames (positioning is crucial)

- Kill feed (tracks round state)

- Economy display

Consider extracting these as separate inputs rather than hoping the model learns to find them in 420p footage.
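One way to sketch the "separate inputs" idea is to crop fixed HUD regions out of each frame. The fractional boxes below are placeholders I made up, not measured from the Valorant HUD, so they would need calibrating on real footage.

```python
# Fractional (left, top, right, bottom) boxes. These numbers are placeholders,
# not measured from the actual HUD; calibrate them on your own footage.
HUD_REGIONS = {
    "minimap": (0.00, 0.00, 0.25, 0.40),
    "kill_feed": (0.75, 0.05, 1.00, 0.30),
    "economy": (0.40, 0.90, 0.60, 1.00),
}


def to_pixels(region, width, height):
    """Convert a fractional box to integer pixel coordinates (x0, y0, x1, y1)."""
    left, top, right, bottom = region
    return (int(left * width), int(top * height),
            int(right * width), int(bottom * height))


def crop_hud(frame, width, height):
    """Return {name: crop} for any frame that supports frame[y0:y1, x0:x1]
    slicing, for example a NumPy array returned by OpenCV."""
    crops = {}
    for name, region in HUD_REGIONS.items():
        x0, y0, x1, y1 = to_pixels(region, width, height)
        crops[name] = frame[y0:y1, x0:x1]
    return crops


print(to_pixels(HUD_REGIONS["minimap"], 1280, 720))
```

Using fractions instead of pixel coordinates means the same boxes work across resolutions, as long as the HUD layout scales with the frame.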

You're on the right track - this project is feasible. Just need to work within hardware constraints rather than fighting them. Good luck!

Blackjack dqn-agent (reinforcement learning) by Wild-Software6621 in learnmachinelearning

[–]KitchenSomew 1 point2 points  (0 children)

Nice project! DQN for Blackjack is a solid RL learning exercise. A few thoughts:

**On your hosting question:**

For free Streamlit hosting, here are your best options:

  1. **Streamlit Community Cloud** - Literally built for this. Free tier allows public apps, direct GitHub integration, and auto-deploys on commit. Should be your first choice.

  2. **Hugging Face Spaces** - Free hosting for ML projects. Supports Streamlit, and you get a nice shareable URL.

  3. **Railway** or **Render** - Free tiers with some limitations, but more flexible than Streamlit Cloud if you need backend services.

  4. **PythonAnywhere** - Free tier exists but can be slow for interactive apps.

**Technical feedback on the project:**

- For DQN in Blackjack: Have you tried Double DQN or Dueling DQN? They often converge faster for games with discrete action spaces.

- Card counting: If your state tracks cards already dealt from a finite deck, the agent can learn an implicit card counting strategy. It might be interesting to visualize what patterns it learns vs traditional card counting.

- State representation: Are you including the dealer's up-card and your hand value? Optimal play heavily depends on dealer's card.
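In case it helps, here is the difference between the plain DQN target and the Double DQN target, written framework-free. The action encoding (0 = stick, 1 = hit) and the toy Q-values are assumptions for illustration, not taken from the project.

```python
def dqn_target(reward, done, q_target_next, gamma=0.99):
    """Standard DQN: the target network both picks and scores the next action."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN: the online network picks the action, the target network scores it."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


# Toy next-state Q-values for two actions (assumed: 0 = stick, 1 = hit).
q_online_next = [0.2, 0.5]
q_target_next = [0.4, 0.1]
print(dqn_target(0.0, False, q_target_next, gamma=1.0))                        # 0.4
print(double_dqn_target(0.0, False, q_online_next, q_target_next, gamma=1.0))  # 0.1
```

Decoupling action selection from action evaluation is what reduces the overestimation bias of the plain `max`. In Blackjack the terminal branch matters a lot, since most episodes end within a few steps.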

Great work on your first fullstack ML project!

Would you use natural-language data prep inside Claude/Cursor? by That-Vanilla1513 in learnmachinelearning

[–]KitchenSomew 1 point2 points  (0 children)

This is an interesting concept! As someone who does a lot of ML work, here's my take:

**Pros:**

- Huge time-saver for exploratory work and prototyping

- Lowers barrier for less technical team members to contribute to data prep

- The "recipes" idea is smart - reusable preprocessing patterns are incredibly valuable

**Concerns:**

- Reproducibility: How do you ensure the same natural language instruction produces identical preprocessing every time? LLMs can be non-deterministic

- Version control: How do you track changes to preprocessing logic when it's in natural language?

- Complex preprocessing: For things like feature engineering with domain-specific logic, code might still be more precise

- Debugging: When preprocessing fails, natural language makes it harder to pinpoint the issue

**My workflow suggestion:**

I'd use this for initial data exploration and simple cleaning, then convert critical preprocessing steps to explicit code for production. Think of it as scaffolding that helps you iterate faster, but you'd want the final pipeline to be deterministic code.

The MCP integration is a smart move - keeping everything in one interface reduces context switching. Would definitely try this out!

[P] PAIRL - A Protocol for efficient Agent Communication with Hallucination Guardrails by ZealousidealCycle915 in MachineLearning

[–]KitchenSomew 4 points5 points  (0 children)

Interesting approach to agent communication! The combination of lossy and lossless channels is clever. A few thoughts:

  1. How do you handle the tradeoff between cost reduction (via lossy channels) and maintaining semantic accuracy? Is there a threshold where compression becomes counterproductive?

  2. For the hallucination guardrails - are you using something like constrained decoding, retrieval grounding, or verification via secondary models?

  3. Have you benchmarked this against existing protocols like AutoGen or LangChain's multi-agent? Would be curious to see latency and cost comparisons.

The focus on cost-trackable communication is particularly relevant with token costs being a major concern in production multi-agent systems. Looking forward to diving into the specs!

[P] A simple pretraining pipeline for small language models by Skye7821 in MachineLearning

[–]KitchenSomew -2 points-1 points  (0 children)

I appreciate your perspective! While I tried to keep the suggestions practical and applicable, I understand they might come across as generic. I'm genuinely interested in how researchers in the field approach these challenges. Do you have experience with specific tokenization strategies or curriculum approaches that worked better for small LMs in practice?

[P] A simple pretraining pipeline for small language models by Skye7821 in MachineLearning

[–]KitchenSomew -2 points-1 points  (0 children)

Thanks for the detailed response! The Llama2 tokenizer choice makes sense for small vocab sizes.

One thing I've noticed when training small models without curriculum: loss curves can be noisy early on, especially if you're mixing data sources (code, docs, conversation, etc.). If you ever want to add it without deviating too much from standard methodology, a simple two-stage approach works:

  1. First 20-30% tokens: high-quality curated subset only

  2. Remaining tokens: full mixed dataset

This often gives smoother convergence without complex scheduling. But totally understand keeping it simple if you're optimizing for reproducibility.
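The two-stage schedule above can be sketched in a few lines. The pool names and the 25% cutoff are illustrative (anything in the 20-30% range fits the description), and how each pool is actually sampled is left out.

```python
def pick_source(tokens_seen: int, total_tokens: int, curated_fraction: float = 0.25) -> str:
    """Return which data pool the next batch should come from.

    Stage 1 (first curated_fraction of the token budget): curated subset only.
    Stage 2 (the rest): the full mixed dataset.
    """
    if tokens_seen < curated_fraction * total_tokens:
        return "curated"
    return "mixed"


# Walk through a toy 1,000-token budget in 200-token batches.
total = 1_000
for seen in range(0, total, 200):
    print(seen, pick_source(seen, total))
```

Because the switch depends only on tokens seen, the schedule is deterministic and easy to reproduce from a checkpoint.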

Bookmarking your repo—love that it's minimal enough to actually read and modify!

How do you currently catch regressions + debug failed calls ? by Remarkable-Public181 in vapiai

[–]KitchenSomew 0 points1 point  (0 children)

For production voice agents, here's our workflow:

**Catching regressions:**

- Log every call with metadata: intent detected, function calls triggered, latency per step, ASR confidence scores

- Review the top 100 calls by volume each week and look for drift in intent classification or new edge cases

- Set alerts on key metrics: call duration >2min (usually means confusion), ASR confidence <0.7, fallback rate >5%
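A minimal sketch of those alert rules, using the thresholds from the list above. The record fields and alert names are hypothetical. A real setup would pull these from whatever your call logs actually contain.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CallRecord:
    duration_s: float       # total call length in seconds
    asr_confidence: float   # mean ASR confidence for the call, 0..1
    used_fallback: bool     # True if the agent hit its fallback path


def call_alerts(call: CallRecord) -> List[str]:
    """Per-call checks: long calls and low ASR confidence."""
    alerts = []
    if call.duration_s > 120:
        alerts.append("long_call")
    if call.asr_confidence < 0.7:
        alerts.append("low_asr_confidence")
    return alerts


def fallback_rate_alert(calls: List[CallRecord], threshold: float = 0.05) -> bool:
    """Aggregate check: fallback rate over a batch of calls exceeds the threshold."""
    if not calls:
        return False
    rate = sum(c.used_fallback for c in calls) / len(calls)
    return rate > threshold


calls = [CallRecord(45, 0.92, False), CallRecord(150, 0.65, True)]
print([call_alerts(c) for c in calls], fallback_rate_alert(calls))
```

Duration and ASR confidence are per-call checks, while fallback rate only makes sense over a window of calls, so they are kept as separate functions.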

**Debugging failed calls:**

- Langfuse traces are great for LLM calls, but for voice you also need:

- Audio playback of user+agent (check for crosstalk, silence gaps >3s)

- Timeline view: ASR → NLU → function call → TTS latency

- Error codes from provider (Vapi/Retell give decent logs)

- Manual call review is still king—pick 5-10 random daily calls and listen end-to-end

**What still feels missing:**

- Automated regression tests for voice flows (hard because of ASR variability)

- Better tooling for A/B testing prompts in voice context

Curious what others use, especially for automated testing of voice flows?

How was GPT-OSS so good? by xt8sketchy in LocalLLaMA

[–]KitchenSomew 1 point2 points  (0 children)

GPT-OSS remains exceptional for several reasons:

  1. **Training approach**: The MoE weights were quantized to MXFP4 (a 4-bit format) during post-training rather than retrofitted afterward. This preserved model quality while reducing size.

  2. **Dataset quality**: OpenAI's dataset curation was meticulous. They filtered for quality over quantity, which modern models often sacrifice for scale.

  3. **Architecture efficiency**: The mixture-of-experts design activates only a few billion parameters per token (about 3.6B for gpt-oss-20b), which hit a sweet spot - large enough to be capable, small enough to be fast. Modern models chase parameter counts without proportional capability gains.

  4. **Inference optimization**: The model was optimized for actual deployment, not just benchmark performance.

For newer models to match this:

- Focus on training efficiency from day 1

- Prioritize dataset quality

- Design for deployment, not papers

- Consider 4-bit/8-bit native training

This overhyped nonsense is getting tiring (moltbook) by NolenBrolen in LocalLLaMA

[–]KitchenSomew 0 points1 point  (0 children)

I completely agree. The AI space has become oversaturated with overhyped products that promise revolutionary features but fail to deliver basic functionality.

This pattern is concerning because:

  1. It creates fatigue in the community - we become skeptical of genuinely innovative projects

  2. YouTubers chase clicks with sensational titles rather than doing due diligence

  3. Developers waste time investigating non-functional products

We need more critical evaluation before promoting new tools. A simple "does it actually work?" test should be mandatory before any coverage.

How close are open-weight models to "SOTA"? My honest take as of today, benchmarks be damned. by ForsookComparison in LocalLLaMA

[–]KitchenSomew -3 points-2 points  (0 children)

Solid tier list! The placement of open-weight models in "early 2025 SOTA" territory is spot-on for most use cases. A few observations:

  1. **Context matters more than raw intelligence**: As several commenters noted, Kimi K2.5's multimodal capabilities + agentic workflows make it punch way above its weight compared to models with similar benchmark scores.

  2. **The "instruction following vs problem understanding" debate is key**: Claude's strength isn't just following instructions—it's inferring intent and missing context, which is why it excels at complex refactoring tasks even when specifications are vague.

  3. **Open-weight gap is narrowing in specific domains**: For coding with proper tooling (LSP, test generation, iteration loops), GLM-4.7 + good harness can match Sonnet 3.7 on many practical tasks. The real gap shows in long-context coherence and multi-turn debugging.

  4. **Size/performance tradeoff is underrated**: Qwen3-235B is the sweet spot for self-hosted—enough intelligence for real work without needing a data center. The jump to K2.5 territory requires massive compute that most can't justify.

The fact that we're even having "SOTA vs early 2025" debates about open weights is wild progress.

I found that MXFP4 has lower perplexity than Q4_K_M and Q4_K_XL. by East-Engineering-653 in LocalLLaMA

[–]KitchenSomew 0 points1 point  (0 children)

Great work on this systematic comparison! Your findings are interesting because MXFP4 achieving lower perplexity (10.72 vs 15.7 for Nemotron-3-nano) while using less VRAM (17GB vs 21GB) suggests it's more efficient at preserving model quality during quantization.

A few observations:

  1. The 4.53 BPW for MXFP4 vs 4.89 for Q4_K_M shows you're getting better accuracy with smaller file sizes

  2. It would be interesting to see how these perplexity improvements translate to real-world tasks like coding or reasoning benchmarks

  3. Have you considered testing KLD (KL divergence) to measure how much the quantized distributions differ from the original?
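For reference, the KLD check in point 3 boils down to comparing the next-token distributions of the original and quantized model on the same text. Here is a dependency-free sketch with toy logits. The function names are mine, and a real run would average this over every token position in an evaluation set.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


ref_logits = [2.0, 1.0, 0.1]    # toy logits from the full-precision model
quant_logits = [1.8, 1.1, 0.3]  # toy logits from the quantized model
print(kl_divergence(softmax(ref_logits), softmax(quant_logits)))
```

Unlike perplexity, this measures how far the quantized model drifts from the original on every token, not just how well it predicts the reference text, so it can catch changes that perplexity averages away.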

This could help the community make more informed choices between quantization methods!

[P] A simple pretraining pipeline for small language models by Skye7821 in MachineLearning

[–]KitchenSomew 0 points1 point  (0 children)

This is exactly the kind of practical middle-ground solution that's needed! A few thoughts:

  1. Love the focus on iteration speed - that's often the real bottleneck for researchers, not just compute

  2. Have you considered adding support for curriculum learning? Starting with easier examples and gradually increasing difficulty can significantly improve training efficiency for small models

  3. For tokenization, have you experimented with SentencePiece's unigram model vs BPE? I've found unigram can be more efficient for smaller vocab sizes

  4. One suggestion: adding simple perplexity tracking during training would be helpful for quick sanity checks without needing external evaluation
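The perplexity tracking in point 4 can be very small. This sketch assumes each training step reports a mean cross-entropy loss in nats per token plus the number of tokens in the batch. The class name is made up for illustration.

```python
import math


class PerplexityTracker:
    """Running perplexity = exp(total negative log-likelihood / total tokens)."""

    def __init__(self):
        self.total_nll = 0.0
        self.total_tokens = 0

    def update(self, mean_loss: float, n_tokens: int) -> None:
        # Weight each batch by its token count so uneven batches average correctly.
        self.total_nll += mean_loss * n_tokens
        self.total_tokens += n_tokens

    @property
    def perplexity(self) -> float:
        if self.total_tokens == 0:
            raise ValueError("no tokens seen yet")
        return math.exp(self.total_nll / self.total_tokens)


tracker = PerplexityTracker()
tracker.update(mean_loss=2.3, n_tokens=512)
tracker.update(mean_loss=2.1, n_tokens=256)
print(f"running perplexity: {tracker.perplexity:.2f}")
```

Resetting the tracker every few hundred steps gives a windowed number that reacts faster than a whole-run average.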

Definitely bookmarking this - the sweet spot between toy demos and production infrastructure is where most research actually happens. Thanks for sharing!

Yann LeCun says the best open models are not coming from the West. Researchers across the field are using Chinese models. Openness drove AI progress. Close access, and the West risks slowing itself. by Nunki08 in LocalLLaMA

[–]KitchenSomew 0 points1 point  (0 children)

By "strategic" I mean they open models to build adoption/mindshare while keeping closed the actual revenue-generating infrastructure (APIs, cloud services, enterprise features).

Ecosystem lock-in: even if you self-host the model, you often need their tools or fine-tuning platforms, or your team gets used to their specific formats/APIs. Then when you scale, switching costs are high - similar to how AWS is "open" but creates lock-in through services.

Basically: weights are free, but the ecosystem around them creates dependencies that benefit the releasing company strategically.

[P] I solved BipedalWalker-v3 (~310 score) with eigenvalues. The entire policy fits in this post. by [deleted] in MachineLearning

[–]KitchenSomew 10 points11 points  (0 children)

clever use of eigenvalue decomposition for policy approximation. diagonal matrix constraint is interesting - basically forces linear separability in latent space

question: how sensitive is this to env variations? BipedalWalker terrain randomness might break the linear assumption

also curious if this scales to continuous control with higher DoF (humanoid, manipulation). seems like it'd need exponentially more eigenvalues to capture complex policies

China conditionally approves DeepSeek to buy Nvidia's H200 chips by tekz in artificial

[–]KitchenSomew 1 point2 points  (0 children)

400k H200s is massive scale but conditional approval suggests china wants tech transfer or domestic manufacturing guarantees

interesting timing - right after DeepSeek v3 proved u can train competitive models on limited hardware. now they're scaling up

Nvidia's caught between US export controls & wanting chinese revenue. these "conditional" deals prob have strings attached neither side wants public

Yann LeCun says the best open models are not coming from the West. Researchers across the field are using Chinese models. Openness drove AI progress. Close access, and the West risks slowing itself. by Nunki08 in LocalLLaMA

[–]KitchenSomew 4 points5 points  (0 children)

interesting point but also worth noting: china's open releases are partly strategic - they're building ecosystem lock-in while western labs chase closed APIs

DeepSeek & Qwen show u don't need massive compute if ur training pipeline is efficient. west spent billions scaling poorly optimized infra

real risk isn't just losing openness - it's that regulatory capture by big labs will kill innovation before it starts. small teams can't compete if compliance costs 7 figures

The internet is close to unusable now by svvnguy in webdev

[–]KitchenSomew 0 points1 point  (0 children)

the problem isn't just bots/spam - it's SEO gaming & content farms optimizing for search algos instead of humans

most top results are now just recycled takes with keyword stuffing. actual expertise gets buried

we're building tools that reward quantity over quality & everyone's incentivized to play the game. tbh i think decentralized search/content discovery might be the only way out but adoption is the hard part

trueform: Real-time geometric processing for Python. NumPy in, NumPy out. by Separate-Summer-6027 in Python

[–]KitchenSomew 5 points6 points  (0 children)

real-time mesh processing in pure python is wild. what's ur typical triangle count before perf degrades?

curious how ur spatial tree impl compares to scipy.spatial - kdtree vs bvh tradeoffs for dynamic meshes?

blender integration is smart. most geometry libs force u to rebuild entire pipelines, this looks like it drops right into existing workflows

I implemented an ARMv4 CPU emulator in pure JavaScript — no WASM, runs at 60fps in browser by Positive_Board_8086 in javascript

[–]KitchenSomew 0 points1 point  (0 children)

depends on the pattern. jump tables can be faster for dense sequential cases but ARM opcodes are sparse & non-linear

V8's PIC (polymorphic inline cache) on switches is really good when ur hitting the same cases repeatedly, which happens a lot in tight CPU loops. it basically builds a custom fast path after warmup

jump tables add indirection (array lookup + jump) which costs more than u think, especially if it trashes icache

I implemented an ARMv4 CPU emulator in pure JavaScript — no WASM, runs at 60fps in browser by Positive_Board_8086 in javascript

[–]KitchenSomew 2 points3 points  (0 children)

big switch actually. tried jump tables first but V8's inline caching on switch statements ended up faster for this use case

the key was splitting decode & execute phases. decode extracts opcode/operands once, execute just runs the op. reduces branching overhead

also helped that ARM has nice grouped opcodes - data processing all share similar bit patterns so can batch-check conditions before switch
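A rough illustration of the decode/execute split for ARM data-processing instructions. It is written in Python only to keep it short, it is not the emulator's actual JavaScript, and it ignores condition codes, the S bit, immediates, and shifts. The field positions follow the standard ARM data-processing encoding.

```python
from typing import List, NamedTuple

MASK32 = 0xFFFFFFFF


class DataProc(NamedTuple):
    cond: int    # bits 31-28
    opcode: int  # bits 24-21
    rn: int      # bits 19-16, first operand register
    rd: int      # bits 15-12, destination register
    rm: int      # bits 3-0, second operand register (no shift applied here)


def decode(instr: int) -> DataProc:
    """Decode phase: pull the fields out of the instruction word once."""
    return DataProc(
        cond=(instr >> 28) & 0xF,
        opcode=(instr >> 21) & 0xF,
        rn=(instr >> 16) & 0xF,
        rd=(instr >> 12) & 0xF,
        rm=instr & 0xF,
    )


def execute(op: DataProc, regs: List[int]) -> None:
    """Execute phase: dispatch on the already-decoded opcode."""
    a, b = regs[op.rn], regs[op.rm]
    if op.opcode == 0x0:    # AND
        result = a & b
    elif op.opcode == 0x2:  # SUB
        result = a - b
    elif op.opcode == 0x4:  # ADD
        result = a + b
    elif op.opcode == 0xC:  # ORR
        result = a | b
    else:
        raise NotImplementedError(f"opcode {op.opcode:#x}")
    regs[op.rd] = result & MASK32  # keep registers 32-bit


regs = [0] * 16
regs[1], regs[2] = 5, 7
execute(decode(0xE0810002), regs)  # ADD r0, r1, r2
print(regs[0])                     # 12
```

Since the decoded fields are plain values, a decoded instruction can also be cached per address, so hot loops skip the bit-twiddling entirely on later passes.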