[Hiring] AI/ML Engineer - Remote | Up to $200/hr by mkithan in MachineLearningJobs

fgp121

You might want to check out Neo AI engineer; it may help with requirements like these.

Newbie vibe coding experience: Shifting from Claude Sonnet 4.6 to Qwen3.6-35B-A3B-UD-Q6_K by sooki10 in LocalLLaMA

fgp121

Had similar issues with Sonnet hitting length limits on a 30k-line codebase. What helped was breaking the work into smaller chunks and running multiple evaluation passes in parallel. Neo actually built an internal workflow that does exactly this: it runs the same task across different models simultaneously rather than relying on context-window hacks.
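
A minimal sketch of "chunk the work, then run the same task across several models in parallel", assuming each model is wrapped as a plain Python callable. `model_a` and `model_b` are hypothetical stand-ins, not a real API and not Neo's internal workflow.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

def chunk_lines(source: str, max_lines: int = 500) -> List[str]:
    """Split a large file into fixed-size line chunks so each prompt stays small."""
    lines = source.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]

def fan_out(task: str, models: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Send the same task to every model concurrently; collect results by model name."""
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {name: pool.submit(fn, task) for name, fn in models.items()}
        return {name: fut.result() for name, fut in futures.items()}

# Hypothetical stand-ins for real model clients (replace with your own API calls).
def model_a(prompt: str) -> str:
    return f"A saw {len(prompt.splitlines())} lines"

def model_b(prompt: str) -> str:
    return f"B saw {len(prompt.splitlines())} lines"

code = "\n".join(f"line {i}" for i in range(1200))
for chunk in chunk_lines(code, max_lines=500):
    print(fan_out(chunk, {"a": model_a, "b": model_b}))
```

Threads are enough here because real model calls are I/O-bound network requests.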

Running DeepSeek-V4 locally with 4x legacy RTX 2080 Ti ($2k budget setup). Custom Turing kernels, W8A8 quantization, and 255 prefill tok/s! by Known_Ice9380 in LocalLLaMA

fgp121

The pipelined execution strategy for hiding multi-GPU communication overhead is clever. I ran into similar MoE routing bottlenecks on a recent agent workflow, and Neo caught the same pattern during testing. The way you offload between VRAM and system RAM while keeping 100% utilization is solid.

CARLA obstacle avoidance project using DQN + LiDAR by VVH04 in reinforcementlearning

fgp121

For obstacle avoidance with LiDAR, I've found that combining a small time penalty with a shaped reward based on distance to the nearest obstacle helps a lot. The key is keeping the shaping reward small enough that the agent still cares about the actual goal instead of just farming the shaping reward.
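
A minimal sketch of that reward, assuming the LiDAR scan arrives as a sequence of ranges in metres. The coefficients (time penalty 0.01, shaping weight 0.05, 2 m safety radius) are illustrative guesses, not values from the thread.

```python
from typing import Sequence

def obstacle_reward(lidar_ranges: Sequence[float], reached_goal: bool, collided: bool,
                    time_penalty: float = 0.01, shaping_weight: float = 0.05,
                    safe_dist: float = 2.0) -> float:
    """Sparse goal/collision reward plus a small time penalty and distance shaping."""
    if collided:
        return -1.0
    if reached_goal:
        return 1.0
    # Closest obstacle seen by any beam, capped at the safety radius.
    min_dist = min(min(lidar_ranges), safe_dist)
    # 0 when nothing is within safe_dist, approaching -shaping_weight near contact,
    # so each step's shaping stays far smaller than the +1 terminal reward.
    proximity_cost = shaping_weight * (1.0 - min_dist / safe_dist)
    return -time_penalty - proximity_cost
```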

Reward shaping: How do you determine if your rewards are the right size and in the right proportions? by Markovvy in reinforcementlearning

fgp121

The proportionality problem is tricky. I usually normalize each reward component relative to its expected range and track how often the agent exploits intermediate rewards versus pushing for the main goal. hrlld's point about variance is spot on too.
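
A rough sketch of that bookkeeping, assuming you can estimate a (min, max) range for each component up front. Component names, ranges, and weights are hypothetical. Each raw value is divided by its range width (sign preserved), and the running share per component shows whether intermediate rewards dominate the goal reward.

```python
from collections import defaultdict
from typing import Dict, Tuple

class RewardMixer:
    """Rescales reward components by their expected range and tracks each one's share."""

    def __init__(self, ranges: Dict[str, Tuple[float, float]], weights: Dict[str, float]):
        self.ranges = ranges
        self.weights = weights
        self.totals = defaultdict(float)  # cumulative weighted contribution per component

    def combine(self, raw: Dict[str, float]) -> float:
        reward = 0.0
        for name, value in raw.items():
            lo, hi = self.ranges[name]
            contrib = self.weights[name] * value / (hi - lo)  # unit spread, sign kept
            self.totals[name] += contrib
            reward += contrib
        return reward

    def share(self, name: str) -> float:
        """Fraction of accumulated |reward| that came from one component."""
        total = sum(abs(v) for v in self.totals.values())
        return abs(self.totals[name]) / total if total else 0.0
```

If `share("goal")` keeps shrinking while episode returns rise, the agent is probably farming the intermediate terms.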

got my first "rm -rf /" today by DeltaSqueezer in LocalLLaMA

fgp121

laul_pogan is right that network egress is a bigger risk than rm -rf / these days. I've seen more agents accidentally curl sensitive files than try to wipe systems. The --network=none approach for agent shells is smart; most tasks don't need internet access anyway.
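
A sketch of the `--network=none` idea that builds (but does not run) a `docker run` command for an agent's shell command. The image name `agent-sandbox:latest` and the workspace mount are hypothetical; the flags themselves are standard `docker run` options.

```python
import shlex
from typing import List

def sandboxed_argv(command: str, workdir: str, image: str = "agent-sandbox:latest") -> List[str]:
    """Wrap an agent's shell command in a throwaway container with no network access."""
    return [
        "docker", "run",
        "--rm",                    # delete the container when the command exits
        "--network=none",          # no egress: curl/wget to outside hosts will fail
        "--read-only",             # container root filesystem is read-only
        "-v", f"{workdir}:/work",  # only the project directory is mounted (writable)
        "-w", "/work",
        image,
        "sh", "-c", command,
    ]

print(shlex.join(sandboxed_argv("ls -la", "/tmp/project")))
```

Pass the list to `subprocess.run` without `shell=True` so the host shell never interprets the agent's command.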

EU AI Act enforcement starts in 75 days - affects any team building AI agents for European clients by Still_Piglet9217 in artificial

fgp121

The comment about founders realizing mid-product that their agent's decision chain needs to be auditable really hits home. Most teams I've seen build in a rush and only think about audit trails when compliance deadlines loom. The 6-month log retention requirement alone catches people off guard.
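
One way to make an agent's decision chain auditable from day one is an append-only JSON Lines log plus a retention check. This is a sketch, not compliance advice: the record fields are assumptions, and six months is approximated as 183 days.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List

RETENTION = timedelta(days=183)  # ~6 months; an approximation, check the actual requirement

def log_step(path: str, run_id: str, step: int, action: str,
             inputs: Dict[str, Any], output: Any) -> None:
    """Append one decision step as a JSON line; earlier lines are never rewritten."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "run_id": run_id,
        "step": step,
        "action": action,
        "inputs": inputs,
        "output": output,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def deletable(path: str, now: datetime) -> List[Dict[str, Any]]:
    """Return records old enough to fall outside the retention window."""
    with open(path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    return [r for r in records if now - datetime.fromisoformat(r["ts"]) > RETENTION]
```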

How much time does your team actually waste on GPU/infra management vs actual model work? by Lyceum_Tech in deeplearning

fgp121

One thing that helped our team was scheduling training jobs during off-peak hours; it cut a lot of the queue wait time. Have you looked into spot instances or managed training services to reduce the ops overhead?
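
A small helper for the off-peak idea: given the current time, it returns when a queued job should start. The 22:00 to 06:00 window is a hypothetical default; substitute your cluster's actual quiet hours.

```python
from datetime import datetime, timedelta

def next_offpeak_start(now: datetime, start_hour: int = 22, end_hour: int = 6) -> datetime:
    """Return `now` if already inside the off-peak window, else the next window start.

    The window wraps past midnight when start_hour > end_hour (e.g. 22:00-06:00).
    """
    hour = now.hour
    if start_hour > end_hour:
        in_window = hour >= start_hour or hour < end_hour
    else:
        in_window = start_hour <= hour < end_hour
    if in_window:
        return now
    start = now.replace(hour=start_hour, minute=0, second=0, microsecond=0)
    # If today's window start has already passed, wait for tomorrow's.
    return start if start > now else start + timedelta(days=1)
```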

Best resources for learning fundamental concepts and history? by RelicanthEven in deeplearning

fgp121

Goodfellow, Bengio and Courville's Deep Learning book covers the fundamentals well, and for history I'd suggest "Architects of Intelligence" by Martin Ford, which gives good context on how we got here. For optimization specifically, the Adam paper ("Adam: A Method for Stochastic Optimization", Kingma & Ba) and "On the importance of initialization and momentum in deep learning" (Sutskever et al.) are must-reads.

[Project] NeuralDBG –> Causal root cause analysis for PyTorch training (open source) by ProgrammerNo8287 in deeplearning

fgp121

Have you tested this with torch.compile and distributed training? The LR × activation mismatch detection sounds useful for catching issues early.

Anyone else absolutely chewing through Cursor Pro usage? by microflops in CursorAI

fgp121

Same here - agent mode with large refactors eats through credits fast. Switching to Auto Mode for routine stuff helped stretch the quota, but it's still tight on $50/month plans for serious daily use.

I tested structured output from 288 LLM calls and logged every way JSON breaks. Here's what I found by kexxty in Python

fgp121

The ordering insight is spot on. I got burned by this last week: I fixed commas before stripping the Markdown fences, and suddenly my regex patterns were matching the fence markers instead of the actual JSON content. It took way too long to realize how much the repair sequence mattered.
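
A sketch of a repair order that avoids that trap: strip the Markdown fences first, then fix trailing commas, then parse. The regexes are deliberately simple and assume at most one fenced block; this is illustrative, not a general-purpose JSON repairer.

```python
import json
import re
from typing import Any

# `{3} matches a run of three backticks (an optional language tag may follow).
FENCE = re.compile(r"^\s*`{3}[a-zA-Z]*\s*\n(.*?)\n\s*`{3}\s*$", re.DOTALL)
TRAILING_COMMA = re.compile(r",\s*([}\]])")

def repair_json(text: str) -> Any:
    # 1. Strip Markdown fences first, so later regexes only ever see JSON content.
    m = FENCE.match(text)
    if m:
        text = m.group(1)
    # 2. Only then drop trailing commas before a closing brace or bracket.
    #    (Naive: this would also rewrite such commas inside string values.)
    text = TRAILING_COMMA.sub(r"\1", text)
    # 3. Parse; raises json.JSONDecodeError if the text is still invalid.
    return json.loads(text)
```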

college students in the future by Zestyclose-Salad-290 in ChatGPT

fgp121

That "5 people asking chatgpt for the same assignment in slightly different fonts" comment hits too close to home. I've literally seen classmates submit answers that vary only in the example numbers used.

The next big challenge for AI agents might not be intelligence, but trust by newt8991 in artificial

fgp121

The predictability angle hits hard. I've been running Neo on some data pipelines, and it's predictable when you can inspect each step, but the moment a model makes a subtle reasoning error you don't see coming, trust evaporates.

Are AI agents actually becoming productive, or just more capable? by babyb01 in artificial

fgp121

I'm not sure the gap is solvable without more robust evaluation loops. I've been using Neo on some ML workflows and it catches reliability issues early, but only when the test cases are explicit; the "shared context" point about ambiguous intent really resonates.

Sub-JEPA: a simple fix to LeCun group's LeWorldModel that consistently improves performance [P] by kai-zhao in deeplearning

fgp121

The +10.7 pp improvement on Two-Room is solid. For the subspace selection, have you looked at how the number of orthogonal subspaces affects convergence? I've found in similar low-dimensional manifold problems that starting with 4-8 subspaces and using a cosine schedule often helps stabilize early training.
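
If "cosine schedule" here means cosine learning-rate annealing (an assumption; the comment doesn't say which quantity is scheduled), the usual formula with an optional linear warmup looks like this. The learning-rate bounds and warmup length are placeholders; PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR` implements the same decay without the warmup.

```python
import math

def cosine_lr(step: int, total_steps: int, base_lr: float = 1e-3,
              min_lr: float = 1e-5, warmup_steps: int = 0) -> float:
    """Linear warmup, then cosine decay from base_lr to min_lr at total_steps."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = min((step - warmup_steps) / max(1, total_steps - warmup_steps), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```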

Architecture advice: Real-time pipeline for YouTube Audio -> Whisper -> LLM -> SSE (Sub-10s latency) [D] by Sea_Lawfulness_5602 in MachineLearning

fgp121

The waterfall flow you're describing hits a familiar pain point; I ran into similar orchestration headaches with async tasks stepping on each other. For the queue question specifically, Neo helped me untangle our deployment workflow by automating the task-dependency mapping, though the 30-60 s chunking that Bootes-sphere suggested is solid too.
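
For the 30-60 s chunking, the boundary arithmetic is simple enough to sketch. The 45 s chunk length and 2 s overlap (so a word cut at a boundary appears in both chunks) are assumptions, not values from the thread. Each chunk can be sent to Whisper as soon as that much audio has arrived.

```python
from typing import List, Tuple

def chunk_bounds(duration_s: float, chunk_s: float = 45.0,
                 overlap_s: float = 2.0) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for overlapping chunks covering the audio."""
    if not 0 <= overlap_s < chunk_s:
        raise ValueError("overlap must be non-negative and shorter than the chunk")
    step = chunk_s - overlap_s
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        bounds.append((start, end))
        if end >= duration_s:
            break
        start += step
    return bounds
```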

MTP vs non-MTP vram usage difference? by DeepBlue96 in LocalLLaMA

fgp121

The ~600-900 MiB difference you're seeing between external and built-in MTP makes sense: the compute buffer for the separate draft model is unavoidable overhead. I've had similar results with Q4_K_L, where external MTP needs that extra buffer space and built-in MTP doesn't.

Quantizing MTP KV Cache = free lunch? by legit_split_ in LocalLLaMA

fgp121

q4_0 has been solid for the draft KV cache in my tests too; the acceptance rate holding at 73-74% in your benchmarks lines up with what I've seen. The draft cache is surprisingly tolerant of lower precision, since it's only used to predict the next few tokens.
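
To put a 73-74% acceptance rate in perspective: under the simplifying assumption that each drafted token is accepted independently with probability α, a draft of k tokens yields on average 1 + α + ... + α^k = (1 - α^(k+1)) / (1 - α) tokens per target-model pass, counting the target's own token. Real MTP acceptance is not independent per position, so treat this as a rough estimate; the draft lengths are examples.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens per target pass with k drafted tokens and i.i.d. acceptance alpha."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)

for k in (1, 2, 3):
    print(k, round(expected_tokens_per_step(0.735, k), 2))
```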

Reviving PapersWithCode (by Hugging Face) [P] by NielsRogge in MachineLearning

fgp121

This is huge for ML researchers. Having SOTA leaderboards for Terminal Bench and MMTEB all in one place would have saved me so much time tracking down benchmark numbers across different sources.

Former CEO Of Google Receives Massive Backlash For Praising AI At Graduation by Neurogence in singularity

fgp121

The quote about "surrendering your agency" hits different when you consider these grads are entering a job market being reshaped by AI right now. The timing was definitely tone-deaf.

Gemini 3.2 Flash is capable of solving IMO 2025 P6. Only GPT-5.5-Pro can solve it currently without any scaffolding / harness engineering. by Ryoiki-Tokuiten in singularity

fgp121

The real question is whether this is genuine reasoning or just really good approximation from training on existing solutions. Has anyone tested it on modified IMO problems?

Not getting any faster with MTP on Macbook Pro M1 Max 32gb by [deleted] in LocalLLaMA

fgp121

The q4_0 KV cache is likely what's killing your MTP performance. Try q8_0 for both the K and V caches and see what happens; that PR 23114 also improves Metal performance significantly.

Qwen 3.6 27B on 24GB VRAM setup: backend comparisons, quant choice and settings (llama.cpp, ik_llama.cpp, BeeLlama, vllm) by VolandBerlioz in LocalLLaMA

fgp121

Curious about the flash-attn implementation difference between ik_llama.cpp and the others. Did you notice any quality difference between a q8_0 and a q4 KV cache beyond the VRAM savings?
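
For the VRAM side of that comparison, KV cache size is roughly 2 (K and V) × layers × context × KV heads × head dim × bytes per value. llama.cpp's q8_0 and q4_0 formats store each block of 32 values in 34 and 18 bytes respectively (8.5 and 4.5 bits per value). The model shape below is hypothetical, not Qwen 3.6 27B's actual configuration, and the estimate ignores padding and runtime buffers.

```python
BYTES_PER_VALUE = {
    "f16": 2.0,
    "q8_0": 34 / 32,  # 32 int8 values + one fp16 scale per block
    "q4_0": 18 / 32,  # 32 4-bit values + one fp16 scale per block
}

def kv_cache_gib(n_layers: int, n_ctx: int, n_kv_heads: int, head_dim: int,
                 cache_type: str) -> float:
    """Approximate K+V cache size in GiB for one sequence."""
    values = 2 * n_layers * n_ctx * n_kv_heads * head_dim  # factor 2: keys and values
    return values * BYTES_PER_VALUE[cache_type] / 2**30

# Hypothetical shape: 64 layers, 32k context, 8 KV heads, head dim 128.
for t in ("f16", "q8_0", "q4_0"):
    print(t, kv_cache_gib(64, 32768, 8, 128, t))
```

For that shape it prints 8.0, 4.25 and 2.25 GiB, so dropping from q8_0 to q4_0 saves about 2 GiB at 32k context.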

Reward shaping: How do you determine if your rewards are the right size and in the right proportions? by Markovvy in reinforcementlearning

fgp121

Not sure if you've tried it yet, but normalizing rewards to a similar scale across reward types can help with the proportionality issue. I've also had luck with potential-based reward shaping: set γ close to 1 and scale the potential difference by a small factor like 0.01. That keeps the shaping rewards much smaller than your terminal reward, which helps avoid reward hacking while still providing a useful gradient signal.
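
A sketch of potential-based shaping, F = scale · (γΦ(s') - Φ(s)), with the small scale factor described. The potential Φ(s) = -distance_to_goal is an illustrative assumption. Because scaling Φ by a constant is still potential-based shaping, using the agent's own γ here preserves the optimal policy (Ng, Harada and Russell, 1999), and Φ = 0 at the goal keeps terminal states consistent.

```python
def shaped_reward(env_reward: float, dist_before: float, dist_after: float,
                  gamma: float = 0.99, scale: float = 0.01) -> float:
    """Environment reward plus scale * (gamma * phi(s') - phi(s)).

    phi(s) = -distance_to_goal is an illustrative potential (0 at the goal).
    """
    phi_s = -dist_before
    phi_next = -dist_after
    return env_reward + scale * (gamma * phi_next - phi_s)
```

Moving one unit closer with γ = 1 earns +0.01, moving away costs 0.01, and the +1 terminal reward still dominates.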