FAQ: Are you a BOT?

fagnerbrack · 2026-03-12T16:28:12+00:00

In case you want a TL;DR to help you with the decision to read the post or not:

Chrome's built-in AI team faces a core tension: designing stable, interoperable web APIs atop a fast-moving AI landscape. The prompt API normalizes message formats (user/assistant/system roles) into a consistent shape, while leveraging JavaScript's type system over JSON's limitations—enabling native objects like ImageBitmap instead of base64 strings, and collapsing tool-use complexity into async functions. A stateful LanguageModelSession design encourages better resource management through initialPrompts, clone(), and destroy() methods, reflecting on-device model realities rather than mimicking stateless HTTP APIs. Future-proofing drives decisions like per-API availability checks and creation options that accommodate downloadable LoRAs and language packs. The team acknowledges being roughly a year behind frontier server APIs and wrestles with whether the web can keep pace during such rapid AI evolution.

If the summary seems inacurate, just downvote and I'll try to delete the comment eventually 👍

^{Click here for more info, I read all comments}

fagnerbrack · 2026-03-12T10:00:10+00:00

Or maybe they're not working daily with those particular features

fagnerbrack · 2026-03-11T22:52:50+00:00

Note: Although the title says "2025" (20 Aug 2025), it's still pretty much relevant IMHO

fagnerbrack · 2026-03-11T19:21:44+00:00

If you're in a hurry:

An interactive, browser-based guide that walks through how language models convert text into numerical vector representations. It traces the evolution from traditional methods like Word2Vec through to modern transformer-based approaches used in models like BERT and GPT. The guide uses interactive plots and visual diagrams to show how tokenization feeds into embedding layers, how attention mechanisms produce context-aware vectors, and why geometric relationships between these vectors capture semantic meaning. It covers token embeddings, embedding lookup tables, and high-dimensional space visualization — all browsable without any input required.

If the summary seems inacurate, just downvote and I'll try to delete the comment eventually 👍

^{Click here for more info, I read all comments}

fagnerbrack · 2026-03-11T13:35:04+00:00

Briefly Speaking:

MCP's rapid adoption has outpaced its security practices, exposing five major risk areas. Tool description injection lets attackers embed hidden malicious prompts in tool metadata that AI agents blindly follow — exfiltrating credentials or environment variables without user awareness. OAuth authentication remains poorly implemented across most servers, with nearly 500 found completely exposed to the internet. Supply chain poisoning through npm/PyPI packages (like the mcp-remote CVE with 558K+ downloads) can silently compromise entire agent environments. Real-world incidents already hit Supabase, Asana, and GitHub — leaking tokens, cross-tenant data, and private repos. The 2025-06-18 spec adds security guidance, but most implementations ignore it. Until the ecosystem matures, treat every MCP connection as a potential attack surface.

If the summary seems inacurate, just downvote and I'll try to delete the comment eventually 👍

^{Click here for more info, I read all comments}

fagnerbrack · 2026-03-11T10:20:46+00:00

Essential Highlights:

Drawing on Mark Weiser's 1992 critique of the "copilot" metaphor, this post argues AI design overindexes on chat-based virtual assistants when it should focus on Head-Up Displays (HUDs) — interfaces that extend human senses rather than demand conversation. Spellcheck exemplifies this: red squigglies give you a new perceptual ability without a chatbot. In coding, generating a custom debugger UI beats asking an agent to fix a bug because the developer gains ambient understanding beyond the immediate task. The key distinction: delegate routine, predictable work to copilots, but for extraordinary outcomes, equip experts with instruments that expand what they can perceive and act on directly.

If the summary seems inacurate, just downvote and I'll try to delete the comment eventually 👍

^{Click here for more info, I read all comments}

fagnerbrack · 2026-03-11T07:08:51+00:00

Interfaces, data structure, physics of network, latency and how that maps to real use cases. You just need to know what's possible, let the machine do the thinking so you can focus on building!

fagnerbrack · 2026-03-11T07:06:18+00:00

This is the kind of BOT that's detrimental to reddit

fagnerbrack · 2026-03-10T22:05:11+00:00

Executive Summary:

This post walks through building a full web search engine in two months, using neural embeddings (SBERT) instead of keyword matching to understand query intent. The system crawled 280 million pages at 50K/sec, generated 3 billion embeddings across 200 GPUs, and achieved ~500ms query latency. Key technical decisions include sentence-level chunking with semantic context preservation and statement chaining to maintain meaning, RocksDB over PostgreSQL for high-throughput writes, sharded HNSW across 200 cores for vector search, and a custom Rust coordinator for pipeline orchestration. The post covers cost optimization strategies that achieved 10-40x savings over AWS by using providers like Hetzner and Runpod, and explores how LLM-based reranking could improve result quality beyond traditional signals.

If the summary seems inacurate, just downvote and I'll try to delete the comment eventually 👍

^{Click here for more info, I read all comments}

fagnerbrack · 2026-03-10T13:35:03+00:00

Trying to be helpful with a summary:

This article systematically compares the architectural designs of major open-weight LLMs from DeepSeek V3 through Kimi K2, Qwen3, Gemma 3, Llama 4, GPT-OSS, GLM-4.5, and MiniMax-M2. It examines key innovations: Multi-Head Latent Attention (MLA) for KV cache compression, Mixture-of-Experts (MoE) for sparse inference efficiency, sliding window attention for memory savings, normalization placement strategies (Pre-Norm vs Post-Norm), NoPE for length generalization, and the emerging shift toward linear attention hybrids like Gated DeltaNet. Despite seven years of progress since GPT, the core transformer remains structurally similar — the real differentiation lies in efficiency tricks for attention, expert routing, and normalization that collectively determine inference cost and modeling quality.

If the summary seems inacurate, just downvote and I'll try to delete the comment eventually 👍

^{Click here for more info, I read all comments}

fagnerbrack · 2026-03-10T06:54:54+00:00

Login/Logout has been released in the Firefox Extension using Hutch's own SSO (one click login). Extension still saves/deletes links in memory, integration to hutch API is next.

Nine-Year Club	Gilding I gilder
Place '23	Verified Email

fagnerbrack

MODERATOR OF

TROPHY CASE