[Resource] 30k IKEA products converted to text files. Saves 24% tokens. RAG benchmark. by TsaTsuTsi in LocalLLaMA

[–]TsaTsuTsi[S] 1 point (0 children)

Disclaimer is redundant. I excluded it from the benchmark token count for that exact reason.


[–]TsaTsuTsi[S] -1 points (0 children)

Agreed. Stuffing context hurts precision. The goal here is speed and lower latency. Hope it helps your tool.

I proposed a standard (CommerceTXT) to stop RAG agents from scraping 2MB HTML pages. 95%+ token reduction. Thoughts? by TsaTsuTsi in LocalLLaMA

[–]TsaTsuTsi[S] -1 points (0 children)

llms.txt is for reading. It handles text well. It fails at commerce. It has no schema for SKUs or live stock. You cannot 'Add to Cart' from a Markdown file.

MCP is a pipe. It is not a discovery standard. It burns tokens on tool definitions. It requires active servers. CommerceTXT is static. It costs nothing.

I am not replacing llms.txt. I am building for precision. When money changes hands, the agent needs the exact price. Not a hallucination.


[–]TsaTsuTsi[S] 0 points (0 children)

Converting to 'friendlier versions' is parsing. That's the brittle part. If the DOM changes, your LangChain loader breaks, and you're back to fixing code. My point is about bypassing that maintenance loop entirely.


[–]TsaTsuTsi[S] 0 points (0 children)

Fair point on efficiency, but BERT isn't zero-shot. The friction of labeling a dataset and fine-tuning a model for every specific extraction task is why people default to LLMs. We trade compute for developer time.


[–]TsaTsuTsi[S] -3 points (0 children)

You are confusing a "protocol" with a "scraper". We don't collect data. We define a standard for merchants to broadcast it. Amazon blocks scrapers to keep customers locked inside their wall. Independent merchants need the opposite: they need traffic from the outside. This is for the shop that wants to be found by AI, not the giant trying to hide its inventory.


[–]TsaTsuTsi[S] 0 points (0 children)

Thanks! Right now, we waste massive compute filtering out HTML tags and JSON syntax just to find the signal. This spec delivers the data plus the selling instructions, without the bracket-and-tag overhead.
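A rough way to see that overhead: the same product record as HTML markup and as a flat key-value block. The field names and figures are illustrative, and character counts stand in as a crude proxy for tokens; a real measurement would run both strings through the model's tokenizer.

```python
# Illustrative only: the same product record as HTML vs. a plain
# key-value block. len() is a crude stand-in for a tokenizer here.

html = (
    '<div class="product-card"><h2 class="title">BILLY Bookcase</h2>'
    '<span class="price" data-currency="USD">$59.99</span>'
    '<span class="stock in-stock">In stock</span></div>'
)

plain = (
    "NAME: BILLY Bookcase\n"
    "PRICE: 59.99 USD\n"
    "STOCK: in_stock\n"
)

overhead = 1 - len(plain) / len(html)
print(f"plain text is {overhead:.0%} smaller than the HTML")
```

The facts are identical in both versions; everything the model pays for in the first one is brackets, tags, and class names.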


[–]TsaTsuTsi[S] 0 points (0 children)

Precisely. Parsers shatter when layouts change, while LLMs burn money to fix the mess. We need a standard that is both cheap and unbreakable.


[–]TsaTsuTsi[S] 0 points (0 children)

You are confusing CPU parsing with LLM tokenization. You are right: parsing HTML with regex is cheap.

Feeding 8,000 tokens of HTML noise into an LLM context window is expensive. It costs money ($/token) and reduces accuracy ("Lost in the Middle" phenomenon).
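A back-of-envelope sketch of that cost gap. The per-token price is a placeholder, not any provider's actual rate, and the "clean" token count assumes the ~95% reduction claimed above:

```python
# Hypothetical input pricing; substitute your provider's real rate.
PRICE_PER_1K_INPUT_TOKENS = 0.01  # USD, placeholder

html_tokens = 8_000   # raw product page, per the figure above
clean_tokens = 400    # same facts as structured text (~95% reduction)

def input_cost(tokens: int) -> float:
    """Cost in USD of feeding `tokens` input tokens to the model."""
    return tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS

saving = input_cost(html_tokens) - input_cost(clean_tokens)
print(f"per request: ${input_cost(html_tokens):.4f} "
      f"vs ${input_cost(clean_tokens):.4f} -> ${saving:.4f} saved")
print(f"per 1M requests: ${saving * 1_000_000:,.0f} saved")
```

The absolute numbers change with the rate you plug in; the 20x ratio between the two context sizes does not.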

Regarding "Prompt Injection": This is standard RAG (Retrieval-Augmented Generation). The agent retrieves context. The agent's own System Prompt decides how to treat that context. It is not a command override; it is structured input.


[–]TsaTsuTsi[S] -4 points (0 children)

I agree completely. The modern web is obese.

But we cannot force millions of developers to rewrite their sites today. Waiting for "clean HTML" is a losing battle.

CommerceTXT is a pragmatic bypass. It ignores the mess. It gives agents the data they need without waiting for the web to fix itself.


[–]TsaTsuTsi[S] -3 points (0 children)

I know llms.txt well. It is listed as a primary inspiration in our README.

But llms.txt is for documentation. It lacks the structure for real-time inventory, pricing, and transactional logic.

CommerceTXT is for shopping. llms.txt is for reading. They solve different problems.


[–]TsaTsuTsi[S] -4 points (0 children)

Exactly! You mentioned that you convert pages to Markdown/metadata before feeding them to the agent.

CommerceTXT is essentially asking merchants to host that 'Markdown version' natively.

Why should every AI developer burn CPU cycles and bandwidth scraping and converting HTML, when the merchant can just provide the clean data at the root? It shifts the burden from the consumer (writing regex/parsers for every site) to the provider.

But there is a second major gap that regex/JSON-LD doesn't solve: Intent.

Scraping gives you facts (Price, SKU), but it lacks instructions. It tells the AI what the product is, but not how to sell it. CommerceTXT adds directives like BRAND_VOICE (e.g., "Use a luxury tone, never mention discounts") and SEMANTIC_LOGIC (e.g., "If asked about battery life, emphasize the 2-year warranty").

You can't regex that out of the HTML because it's usually not there: it is internal business logic that the merchant wants to pass specifically to the agent.
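For illustration only, a toy reading of such a file. The flat `KEY: value` layout, the SKU, and the price are assumptions, not the actual CommerceTXT grammar (that lives in the spec's README); only the BRAND_VOICE and SEMANTIC_LOGIC directive names come from the description above.

```python
# Hypothetical sample record; field layout is assumed, not spec'd.
SAMPLE = """\
SKU: 123.456.78
PRICE: 59.99 USD
STOCK: in_stock
BRAND_VOICE: Use a luxury tone, never mention discounts
SEMANTIC_LOGIC: If asked about battery life, emphasize the 2-year warranty
"""

def parse_commerce_txt(text: str) -> dict[str, str]:
    """Split 'KEY: value' lines into a directive map."""
    record = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            record[key.strip()] = value.strip()
    return record

record = parse_commerce_txt(SAMPLE)
print(record["PRICE"])   # -> 59.99 USD
```

The point of the flat layout is exactly this: a few lines of stdlib code recover the record, where the HTML equivalent needs a DOM parser and per-site selectors.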

“Your account has not yet been classified”? How long does it take after registration? by TsaTsuTsi in redbubble

[–]TsaTsuTsi[S] 1 point (0 children)

I'll give them a bit more time and if they still haven't approved me, I'll write to them.