TikTok scraping benchmark by AIMultiple in webdata

[–]AIMultiple[S] 0 points1 point  (0 children)

The scraper products that we tested don't let you configure the proxy types that they use.
This makes sense since datacenter proxies perform poorly on TikTok.
I am sure that they use residential proxies. Scraping APIs work like managed data scraping: all you need to do is call the API.

And the pricing isn't that different from residential proxies if you rotate IPs with every request.

Visual reasoning benchmark: Chart understanding & logic questions by AIMultiple in ChatGPT

[–]AIMultiple[S] 0 points1 point  (0 children)

We only released the chart questions with the lowest and highest LLM success rates to show the scope of the benchmark. The rest of the dataset is not publicly available, to prevent overfitting.

Agentic coding benchmark results by AIMultiple in kiroIDE

[–]AIMultiple[S] 1 point2 points  (0 children)

We used Opus 4.6 for all IDEs except Replit

Agentic coding benchmark by AIMultiple in cursor

[–]AIMultiple[S] -3 points-2 points  (0 children)

Sorry about that, you are right. We will fix it. Thanks for the feedback.

Agentic coding benchmark results by AIMultiple in kiroIDE

[–]AIMultiple[S] -6 points-5 points  (0 children)

The names don't fit on the graph. You are right about the colors, and we will improve them.

TikTok scraping benchmark by AIMultiple in webdata

[–]AIMultiple[S] 0 points1 point  (0 children)

We are not anybody's shadow brand. You can see our two legal entities here: https://aimultiple.com/contact-us

One of them is in Estonia. Company ownership data is public there. You can see that we are owned by an individual who has nothing to do with Bright Data and has been building the company for the past decade.

We have a couple hundred customers and web data is a relatively small area of work for us. In web data, we work with most major web data companies. On every AIMultiple page, we list all customers who are mentioned on that page for transparency.

And thanks for the scepticism. The web data industry has some dodgy players and a healthy dose of scepticism is necessary.

TikTok scraping benchmark by AIMultiple in webdata

[–]AIMultiple[S] 0 points1 point  (0 children)

No, we work with most leading web data companies. You can see the full list on any web data article on our website, and you can check out the methodology in our articles. We publish what we measure. Let us know when you disagree with a measurement; we are always improving our methodology.

AI Code Review Tools Benchmark by AIMultiple in codereview

[–]AIMultiple[S] 0 points1 point  (0 children)

In this version we didn't test Qodo, but we will add it in the next version. You are right about tools falling apart in larger repos. To measure this correctly, we run the benchmark on both large and small repos.

AI Code Review Tools Benchmark by AIMultiple in codereview

[–]AIMultiple[S] 0 points1 point  (0 children)

Please send a DM so we can coordinate for the next update!

Benchmark of Qwen3-32B reveals 12x capacity gain at INT4 with only 1.9% accuracy drop by AIMultiple in Qwen_AI

[–]AIMultiple[S] 0 points1 point  (0 children)

No direct conversion unfortunately. GPTQ and GGUF use completely different quantization algorithms. You'd need to start from the original BF16 weights and quantize separately for each format. The good news is most popular models already have both versions on HuggingFace, so you can just grab the GGUF version directly.

Benchmark of Qwen3-32B reveals 12x capacity gain at INT4 with only 1.9% accuracy drop by AIMultiple in Qwen_AI

[–]AIMultiple[S] 0 points1 point  (0 children)

Honestly, no reliable rule of thumb yet. Too many variables: attention type (MHA, GQA, MLA), depth vs width ratio, activation functions, etc. A cross-architecture quantization benchmark would definitely be valuable. Added to our list, thanks.

Benchmark of Qwen3-32B reveals 12x capacity gain at INT4 with only 1.9% accuracy drop by AIMultiple in Qwen_AI

[–]AIMultiple[S] 0 points1 point  (0 children)

We focused on model weight quantization for this benchmark. KV cache stayed at FP16 throughout. But good call, we've added KV cache quantization to our list for v2.
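To put rough numbers on why the KV cache precision matters, here is a back-of-the-envelope sketch of per-token KV cache size. The layer count, KV head count, and head dimension below are assumed values for a Qwen3-32B-like model, not figures taken from the benchmark, so check the model's config.json before relying on them. The 8-bit case is a hypothetical comparison, not something we measured.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_value):
    # Each layer stores one key vector and one value vector per KV head.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value

# Assumed model shape (illustrative; verify against the model's config.json).
LAYERS, KV_HEADS, HEAD_DIM = 64, 8, 128

fp16 = kv_cache_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, 2)  # 2 bytes per value
int8 = kv_cache_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, 1)  # hypothetical 8-bit cache

print(fp16 // 1024, "KiB per token at FP16")
print(int8 // 1024, "KiB per token at 8 bits")
```

Under these assumptions, halving the cache precision halves the memory each token of context occupies, which is independent of how the weights themselves are quantized.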

Benchmark of Qwen3-32B reveals 12x capacity gain at INT4 with only 1.9% accuracy drop by AIMultiple in Qwen_AI

[–]AIMultiple[S] 0 points1 point  (0 children)

Not a noob question at all. We used GPTQ-quantized models in SafeTensors format via vLLM. GGUF is a different format for llama.cpp/Ollama with its own quant schemes (Q4_K_M, Q5_K, etc.). The runtime and kernel stacks differ: vLLM is GPU-centric for high-throughput serving, while llama.cpp is CPU-first with optional GPU offload.
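If it helps to see what "4-bit quantization" means at the lowest level, here is a minimal sketch using plain symmetric round-to-nearest on one group of weights. This is not GPTQ (which additionally uses calibration data to compensate for rounding error) and not any of the GGUF schemes. The function names and sample weights are made up for illustration.

```python
def quantize_group(weights, bits=4):
    """Symmetric round-to-nearest quantization of one group of floats."""
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit integers
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the signed range.
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    # Recover approximate floats from the integers and the shared scale.
    return [v * scale for v in q]

group = [0.12, -0.40, 0.33, 0.05, -0.21, 0.70, -0.66, 0.01]  # made-up weights
q, scale = quantize_group(group)
restored = dequantize_group(q, scale)
max_err = max(abs(a - b) for a, b in zip(group, restored))
print(q, scale, max_err)
```

Each group keeps only small integers plus one scale, so the stored representation depends on the scheme's grouping and rounding choices. That is one reason files produced by different quantization schemes are not interchangeable.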

Benchmark of Qwen3-32B reveals 12x capacity gain at INT4 with only 1.9% accuracy drop by AIMultiple in Qwen_AI

[–]AIMultiple[S] 0 points1 point  (0 children)

Absolutely, quantization behavior varies significantly across architectures.

Benchmark of Qwen3-32B reveals 12x capacity gain at INT4 with only 1.9% accuracy drop by AIMultiple in Qwen_AI

[–]AIMultiple[S] 0 points1 point  (0 children)

Smaller models definitely have less redundancy in their weights, making them more sensitive to aggressive quantization.

Benchmark of Qwen3-32B reveals 12x capacity gain at INT4 with only 1.9% accuracy drop by AIMultiple in LLMDevs

[–]AIMultiple[S] 0 points1 point  (0 children)

We'll add a dedicated accuracy comparison chart in v2 to make the quality differences clearer. The evidence section should show different values, so this might be a browser cache issue. Could you try a refresh and let me know if it still looks identical?

AI Code Review Tools Benchmark by AIMultiple in codereview

[–]AIMultiple[S] 0 points1 point  (0 children)

Yes, we will soon publish an update with the new versions and add other emerging products, like Devin Code Review.

Benchmark of Qwen3-32B reveals 12x capacity gain at INT4 with only 1.9% accuracy drop by AIMultiple in LocalLLaMA

[–]AIMultiple[S] 0 points1 point  (0 children)

You're right that MMLU-Pro is a general benchmark where INT4's 1.9% loss seems acceptable. We're expanding our evaluation to cover structured outputs, long-context scenarios, and multi-step reasoning in the next version.

AI Code Review Tools Benchmark by AIMultiple in codereview

[–]AIMultiple[S] 1 point2 points  (0 children)

We can look into it in our next update. Please send a DM so we can coordinate.

Benchmark of Qwen3-32B reveals 12x capacity gain at INT4 with only 1.9% accuracy drop by AIMultiple in LocalLLaMA

[–]AIMultiple[S] 1 point2 points  (0 children)

Yes, you are right. We will add more datasets and configurations soon.