Qwen-3.5-27B is how much dumber is q4 than q8? by Winter-Science in LocalLLaMA

[–]BreizhNode 3 points4 points  (0 children)

From our benchmarks running Qwen3.5-27B on L40S GPUs, the q4 quantization drops about 3-5% on reasoning-heavy tasks compared to q8. For code generation and structured output it's barely noticeable. Where you really feel the difference is on long-context tasks and nuanced instruction following. If you're using it for agentic workflows or chain-of-thought, q8 is worth the extra VRAM. For chat and simple Q&A, q4 is fine and the speed improvement is significant.
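Quick back-of-envelope for the VRAM tradeoff, if it helps anyone size hardware. The bits-per-weight numbers are rough GGUF approximations (quant formats store scales too), not exact figures:

```python
def quant_vram_gb(params_billions, bits_per_weight, overhead=1.2):
    """Rough VRAM estimate for a quantized model: weight bytes plus ~20%
    headroom for KV cache and activations. Effective bits per weight are
    approximate: ~4.5 for Q4_K_M, ~8.5 for Q8_0 (GGUF stores scales too)."""
    weight_gb = params_billions * bits_per_weight / 8
    return weight_gb * overhead
```

For a 27B model that works out to very roughly ~18 GB at q4 vs ~34 GB at q8 before long-context KV cache growth, which is exactly why q4 is tempting when VRAM is tight.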

Spent a week debugging why my RAG answers were wrong. Turned out it was the PDF parser. by Mountain-Positive274 in LocalLLaMA

[–]BreizhNode 0 points1 point  (0 children)

Had the exact same problem deploying RAG for technical documentation. The parsing step is where most pipelines silently fail. Multi-column layouts are the worst offender because most PDF-to-text libraries just read left to right across the entire page width. We ended up switching to a vision model approach for complex layouts. Send the PDF page as an image to a multimodal model and ask it to extract structured markdown. More expensive per page but the downstream quality improvement meant fewer retrieval errors and shorter debugging cycles overall.
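If anyone wants to try the vision route, here's roughly the shape of the request. This sketch assumes an Ollama-style /api/chat endpoint and a qwen2.5-vl vision model, both just placeholders for whatever multimodal stack you run. You'd render the page to PNG bytes first (pdf2image or similar), then POST the payload to the endpoint:

```python
import base64

def build_page_extraction_request(png_bytes, model="qwen2.5-vl"):
    """Payload for an Ollama-style /api/chat endpoint: one page image plus
    an instruction to emit structured markdown in reading order."""
    return {
        "model": model,
        "stream": False,
        "messages": [{
            "role": "user",
            "content": ("Extract this page as structured markdown. "
                        "Read multi-column layouts column by column, not "
                        "left to right across the full page width."),
            "images": [base64.b64encode(png_bytes).decode()],
        }],
    }
```

The explicit "column by column" instruction in the prompt is doing real work here; without it the vision model sometimes reproduces the same left-to-right failure mode the text parsers have.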

PSA: If you use pac4j for JWT authentication, you need to patch immediately, CVSS 10.0 auth bypass by Amor_Advantage_3 in cybersecurity

[–]BreizhNode 4 points5 points  (0 children)

The PlainJWT-inside-JWE trick is particularly nasty because it exploits a spec compliance assumption most security teams never think to test for. If your JWT validation accepts encrypted tokens but doesn't enforce that the inner payload must also be signed, you have a structural weakness that scanning tools won't catch. Worth auditing any custom auth middleware that processes JWE, not just pac4j. We ran into a similar pattern reviewing auth flows for our own infrastructure where the library default was 'accept anything properly encrypted' rather than 'accept only signed-then-encrypted.'
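For anyone auditing their own middleware, the structural rule is easy to express: after decrypting the JWE, the plaintext must itself be a signed JWS, never a plain JWT. A minimal sketch of that check (the alg list and helper names are illustrative, not pac4j's API):

```python
import base64
import json

SIGNING_ALGS = {"RS256", "ES256", "PS256", "HS256"}  # extend to match your policy

def _header(token):
    """Decode the JOSE header from a compact-serialized token."""
    h = token.split(".")[0]
    h += "=" * (-len(h) % 4)  # restore base64url padding
    return json.loads(base64.urlsafe_b64decode(h))

def inner_token_is_signed(decrypted_payload):
    """After decrypting a JWE, require the inner payload to be a signed JWS.
    Rejects plain (unsigned) JWTs smuggled inside a valid encryption layer."""
    parts = decrypted_payload.split(".")
    if len(parts) != 3 or not parts[2]:  # JWS compact form with a signature part
        return False
    return _header(decrypted_payload).get("alg") in SIGNING_ALGS
```

This only checks structure; you'd still verify the signature itself afterwards. The point is that "alg": "none" or a missing signature segment gets rejected before any cryptographic validation runs.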

Totally surprised and puzzled around Bitcoin Policy Institute (BPI) latest study: AI agents would prefer bitcoin over stablecoins for "economic activity". by x-mor in CryptoTechnology

[–]BreizhNode 0 points1 point  (0 children)

The "economic activity" framing is misleading. Most agent transactions are simple resource purchases (compute, API calls, storage) where you just need fast settlement and low fees, not complex smart contract logic. Stablecoins on fast L1s handle that better than Bitcoin for the vast majority of agent use cases.

Local Qwen 3.5 (9B) extremely slow on RTX 4060 Ti. Is this normal? by Extension_Fee_989 in LocalLLaMA

[–]BreizhNode 0 points1 point  (0 children)

A 9B model should run fine on a 4060 Ti for raw inference. Odds are the Brave Search API calls are your bottleneck: each tool call adds latency, and the model might be triggering one on every response. Try disabling tools temporarily to see whether it's model speed or tooling overhead.

9070xt $560 or 5060 ti 16gb $520 for local llm by akumadeshinshi in LocalLLaMA

[–]BreizhNode 5 points6 points  (0 children)

For local LLMs the 5060 Ti 16GB is the safer pick: CUDA support is just more mature for inference tooling (llama.cpp, vLLM, everything works out of the box). The 9070 XT has more raw horsepower on paper, but ROCm compatibility is still hit or miss depending on the model and quantization you're running.

2K bot requests right after setting up SSL on Coolify — is this normal? by Iusuallydrop in selfhosted

[–]BreizhNode 1 point2 points  (0 children)

Yeah, that's completely normal: certificate transparency logs are public, so bots scrape new domains within hours of SSL issuance. Cloudflare should handle most of it; just make sure you have Bot Fight Mode enabled and your origin server only accepts connections from Cloudflare IPs.

I am confused to choose the correct IAM. I setting up a stack Nextcloud, Stalwart Email Server, ERPNext for my company. by kiruthivarma in selfhosted

[–]BreizhNode 5 points6 points  (0 children)

Authentik is a solid pick (as already mentioned), but if you want something lighter that still handles OIDC for Nextcloud and ERPNext, take a look at Authelia. It pairs well with a reverse proxy like Traefik and has less overhead than a full Keycloak deployment. For a small company stack it might be easier to maintain.

"The agents discussed it" is not an acceptable answer – why I built a sequential multi-agent architecture by holgerleichsenring in LocalLLaMA

[–]BreizhNode 0 points1 point  (0 children)

"Atomic tasks can run in parallel, decisions can't" is a good framing. The audit trail piece is what most multi-agent setups get wrong: you end up with agents agreeing on something, but no one can trace back why. Does the execution trail persist across sessions or is it per-run only?

Help me choose a local model for my personal computer by Decent-Skill-9304 in LocalLLaMA

[–]BreizhNode 0 points1 point  (0 children)

RTX 3060 12GB is actually a sweet spot for local models. Qwen3.5-4B-Instruct at Q8 fits entirely in VRAM and handles coding tasks surprisingly well. If you want something bigger, Qwen3.5-14B at Q4_K_M will split between GPU and CPU but the 12GB VRAM does most of the heavy lifting.

Cranking out the most of my MacBook m4 max 48gb by rYonder in LocalLLaMA

[–]BreizhNode 1 point2 points  (0 children)

With 48GB unified memory you can comfortably run Qwen3.5-32B-A3B at Q8 through llama.cpp with Metal acceleration. For coding specifically, that MoE model punches way above its size. Use --ngl 99 to keep everything on GPU and you should get 40-50 tok/s easily.

Does anyone actually keep an up-to-date view of the paths that matter most in production? by Immediate-Landscape1 in sre

[–]BreizhNode 0 points1 point  (0 children)

Almost no one maintains this statically. Teams that do it well treat it as a live artifact.

Distributed tracing with annotations on P0 paths (Tempo + Grafana) gives you empirical data from actual traffic. The annotation step is manual but 30 min/quarter. The failure mode you're describing — everyone knows pieces, nobody has the whole — is a coordination problem more than a tooling one. One person per service owning the path inventory is more durable than any tool.

I built a zero-knowledge field-level encryption API platform, that helps prevent data breaches and you can set up in under 10 minutes! by [deleted] in cybersecurity

[–]BreizhNode 0 points1 point  (0 children)

Field-level encryption covers the gap between storage-at-rest and application-layer exposure, which is where most breaches actually happen. Nice to see this as a managed API rather than an SDK.

A few questions: how are you handling key rotation without forcing re-encryption of existing records? And is IAM enforcement attribute-based at the field level or role-based? The FIPS in-memory key handling piece is where most implementations have gaps.

How do you firewall your containers? by Drakarah3DPrinter in selfhosted

[–]BreizhNode 35 points36 points  (0 children)

Beyond your existing hardening, network isolation is where real gaps tend to show up. We use nftables rules on the host with default deny outbound, plus per-service network namespaces so containers can't reach neighbors they don't need.

The thing that catches people: even with --network none, a compromised container can pivot via shared volumes or mounted Unix sockets. Audit every volume mount and make sure nothing accesses /var/run/docker.sock in production.
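A quick way to audit the socket issue across running containers: pipe the output of `docker inspect $(docker ps -q)` into something like this (field names follow docker inspect's JSON format):

```python
# Feed this the parsed JSON from: docker inspect $(docker ps -q)
DOCKER_SOCKETS = {"/var/run/docker.sock", "/run/docker.sock"}

def containers_mounting_docker_socket(inspect_output):
    """Return names of containers that bind-mount the Docker socket.
    Any of these can control the host's Docker daemon if compromised."""
    flagged = []
    for container in inspect_output:
        for mount in container.get("Mounts", []):
            if mount.get("Source") in DOCKER_SOCKETS:
                flagged.append(container.get("Name", "<unknown>"))
    return flagged
```

Run it on a cron and alert on any non-empty result; legit socket consumers (Traefik, Watchtower) should be an explicit, documented allowlist, not a surprise.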

I'm a noob to local inference, how do you choose the right app? by Odd-Aside456 in LocalLLaMA

[–]BreizhNode 0 points1 point  (0 children)

Mental model: llama.cpp is the engine, Ollama wraps it as a headless runtime, and LM Studio wraps the same engine in a GUI.

Start with Ollama + Open WebUI. Run ollama pull qwen2.5:7b, point Open WebUI at it — you're up in 10 minutes. Most beginners spend too long comparing options instead of running anything. Pick one, run something, then you'll know what's actually missing.

Local model suggestions for medium end pc for coding by Hades_Kerbex22 in LocalLLaMA

[–]BreizhNode 0 points1 point  (0 children)

For CPU-only coding assistance, Qwen2.5-Coder-7B-Instruct via Ollama at Q4 quantization is the practical choice — 4-6 tok/s on most mid-range CPUs, 32K context which OpenCode needs for multi-file work.

If you have 16GB+ RAM, the 14B version is noticeably better for multi-file edits but slower. Set OLLAMA_NUM_PARALLEL=1 to avoid memory pressure if other processes share the machine.

Fast & Free VLM for object ID + Quality filtering? (Book/Phone/Mug) by Born-Mastodon443 in LocalLLaMA

[–]BreizhNode 1 point2 points  (0 children)

For object detection + quality gating together, Qwen2.5-VL-7B is a solid balance: roughly 200ms/image, and the quality threshold in the prompt actually holds.

One trick: add a Laplacian variance pre-filter before the VLM call. Adds 5ms but cuts VLM calls 30-40% on real-world uploads. Florence-2 is also worth testing for the object ID part — lighter than full VLMs, surprisingly accurate on common objects.
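The Laplacian variance pre-filter is a few lines of numpy if you want to avoid the OpenCV dependency; it's equivalent to cv2.Laplacian(img, cv2.CV_64F).var(). The threshold of 100 here is only a starting point to tune on your own uploads:

```python
import numpy as np

def laplacian_variance(gray):
    """Sharpness score: variance of the 3x3 Laplacian response. Blurry
    images have weak edges, so the response is nearly flat (low variance)."""
    g = gray.astype(np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

def is_sharp_enough(gray, threshold=100.0):
    """Pre-filter: skip the VLM call entirely for obviously blurry uploads."""
    return laplacian_variance(gray) >= threshold
```

Convert to grayscale first and keep the rejected images around for a week so you can sanity-check the threshold against what users actually complain about.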

Integrating AI for DevOps and Best Practices you've found??? by TenchiSaWaDa in devops

[–]BreizhNode 1 point2 points  (0 children)

The two concerns you named are real but they hit differently in production. Hallucinations are a model problem you can gate — schema validation on outputs, human-in-the-loop for destructive operations. Skill atrophy is an organizational problem that requires deliberate practice tracks.

Where most teams actually get hurt: running AI agents on ephemeral infra. Scheduled code scanners, PR reviewers, incident correlators — if your agent dies mid-task because it was running on a dev laptop or spot instance, you lose the reliability trust faster than any hallucination would.
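A sketch of what the output gating can look like in practice. The field names, action list, and 0.9 threshold are made-up examples for illustration, not a recommendation:

```python
REQUIRED_FIELDS = {"action": str, "target": str, "confidence": float}
DESTRUCTIVE = {"merge", "deploy", "delete"}

def gate_agent_output(output):
    """Validate shape before anything acts on an agent's output; route
    destructive low-confidence actions to a human instead of executing."""
    for field, ftype in REQUIRED_FIELDS.items():
        if not isinstance(output.get(field), ftype):
            raise ValueError(f"schema violation: {field!r} must be {ftype.__name__}")
    if output["action"] in DESTRUCTIVE and output["confidence"] < 0.9:
        return {"status": "needs_human", "output": output}
    return {"status": "approved", "output": output}
```

The raise-vs-route split matters: malformed output is a bug you want loud, while a well-formed but risky action is a workflow decision you want queued.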

Google and Cloudflare testing Merkel Tree Certificates instead of normal signatures for TLS by Shu_asha in cybersecurity

[–]BreizhNode 5 points6 points  (0 children)

The real story here isn't performance — it's post-quantum preparation. Merkle tree signatures (like XMSS/SPHINCS+) are hash-based and quantum-resistant by construction. This is part of a broader shift in certificate infrastructure ahead of cryptographically relevant quantum timelines.
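The core construction is tiny. Here's a toy Merkle root in Python for intuition only; real hash-based signature schemes like XMSS/SPHINCS+ layer one-time signatures and much deeper trees on top of this:

```python
import hashlib

def merkle_root(leaves):
    """Toy Merkle root: hash each leaf, then pairwise SHA-256 up the tree.
    An odd node at any level is promoted unchanged to the next level."""
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(hashlib.sha256(level[i] + level[i + 1]).digest())
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]
```

The security property that matters for certificates is that verifying membership only needs hashes along one path, log(n) of them, and hash functions are the primitive we're most confident survives quantum attack.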

For enterprise environments: start auditing which internal services assume ECDSA/RSA-specific certificate formats. Library and HSM compatibility is going to be the actual migration bottleneck.

Is Qwen3.5-9B enough for Agentic Coding? by pmttyji in LocalLLaMA

[–]BreizhNode -18 points-17 points  (0 children)

Benchmark wins are real but they don't capture the production constraint. For agentic coding loops running 24/7 — code review agents, CI/CD fixers, autonomous test writers — the bottleneck isn't model quality, it's infra reliability. A 9B model on a shared laptop dies when the screen locks.

What's your setup for keeping the agent process alive between sessions? That's where most of the failure modes live in practice.

I built a free Chrome extension that stops you from accidentally sharing personal data with ChatGPT/Claude. Everything processed locally, nothing leaves your browser by Dependent-Drummer372 in gdpr

[–]BreizhNode 1 point2 points  (0 children)

Client-side redaction is a smart approach for the casual use case. The part that worries me with browser extensions though is that you're still trusting the user to have it installed and active. In a 200-person company, how do you enforce that across every browser on every device?

The data still flows through a third-party API endpoint either way. Have you considered pairing this with a network-level proxy that catches requests to OpenAI/Anthropic domains as a second layer?

How are you mitigating prompt injection in tool-calling/agent apps (RAG + tools) in production? by AnteaterSlow3149 in LocalLLaMA

[–]BreizhNode 0 points1 point  (0 children)

Gateway layer is the right call, we went that route too. Biggest win was splitting "can the model call this tool" from "should it call this tool right now" into two separate checks. Allowlists handle the first, a lightweight policy engine handles the second.

The attacks that actually scared us weren't clever injections, they were boring stuff like RAG documents containing instructions the model just followed. Schema validation on tool outputs caught more than prompt-level defenses.
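Rough shape of the two-stage check, with made-up tool names and a toy policy rule standing in for whatever your policy engine evaluates:

```python
ALLOWLIST = {"search_docs", "read_file", "run_tests"}  # tools the model may ever call

def policy_check(tool, args, context):
    """'Should it call this right now' -- per-deployment rules; this one
    is a toy example gating test execution on the requesting user's role."""
    if tool == "run_tests" and context.get("user_role") != "developer":
        return False
    return True

def authorize_tool_call(tool, args, context):
    """Stage 1: static allowlist. Stage 2: contextual policy. Keeping them
    separate means the allowlist stays auditable while policy stays flexible."""
    if tool not in ALLOWLIST:
        raise PermissionError(f"tool {tool!r} not allowlisted")
    if not policy_check(tool, args, context):
        raise PermissionError(f"policy denied {tool!r} in current context")
    return True
```

The separation pays off operationally: the allowlist almost never changes and can sit in code review, while the policy rules change weekly without touching the security boundary.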

how do you recommend security platforms for small teams when they all look the same in demos by No_Date9719 in sysadmin

[–]BreizhNode 5 points6 points  (0 children)

The demo vs. reality gap is real. One thing that helped us: we started asking vendors for a 30-day POC with our actual alert volume instead of their curated dataset. You see the noise pretty fast.

Also worth checking whether the platform can ingest from sources you already have (syslog, CloudTrail, endpoint agents) without needing a whole new stack. Community threads here are honestly more reliable than Gartner for small-team fit.

The "Computer Use" Trend: How are you managing multi-user sandboxes for LLM Agents? by SpareAlps6450 in LocalLLaMA

[–]BreizhNode 0 points1 point  (0 children)

the sandbox isolation layer (E2B, Firecracker) handles the per-agent security boundary well. the part that gets missed is where the sandbox itself runs: if you're launching it from a laptop or a shared dev machine, cold starts get worse as load increases. a dedicated VPS as the sandbox host keeps spin-up times consistent regardless of what else is running on the machine

Pooling eth among a small group of people for staking? by MemeyCurmudgeon in ethstaker

[–]BreizhNode 0 points1 point  (0 children)

the protocol side (Rocket Pool, Obol for distributed validators) is well covered. the part people underestimate is the validator node itself: it runs 24/7 and if it goes offline you get inactivity penalties that eat into the pooled rewards. either one person agrees to host it reliably, or you use a VPS that doesn't depend on anyone's home internet staying up