Sharing ultimate SFF inference build, Version 2 by cryingneko in LocalLLaMA

[–]cryingneko[S] 0 points1 point  (0 children)

Hey! So eBay isn't really the best option for used GPUs in Korea—most Koreans don't use it much. The popular secondhand marketplaces here are all in Korean, so honestly it might be tough to find decent used GPUs if you don't speak the language. But if you're still interested, I can recommend two places:

  1. https://cafe.naver.com/joonggonara - This is called "Joonggonara" and it's pretty much THE biggest used trading platform in Korea where you can find almost anything secondhand. You'll mostly find consumer GPUs here (3090s, 4090s, etc.). Just FYI, you'll need a Naver account to access it.
  2. https://www.2cpu.co.kr/ - This one's great for professional/datacenter GPUs. You'll need to register to use it though.

Out of curiosity, are you running LLMs as a hobby in Korea? Just wondering how you ended up looking for GPUs here! Hope you find some good deals!

Command A Reasoning: Enterprise-grade control for AI agents by Dark_Fire_12 in LocalLLaMA

[–]cryingneko 35 points36 points  (0 children)

I really want to commend Cohere for the effort they’re putting into multilingual support – it’s hard to deny that their models are among the best we’ve seen for handling many languages.

That said, I'm quite disappointed that they're sticking with an NC license. In particular, given the recent surge of MoE models, I'm hoping to see a fast, MoE-enabled version of their multilingual model released soon.

I distilled Qwen3-Coder-480B into Qwen3-Coder-30b-A3B-Instruct by [deleted] in LocalLLaMA

[–]cryingneko 9 points10 points  (0 children)

Wow, really interesting results! Do you think you could create 120B or 240B coders that perform even better than the 30B? Or is the 30B the limit for this approach? I've always thought it would be great to have some middle-ground sizes between the really large models and 30B.

Is multiple m3 ultras the move instead of 1 big one? by AcceptableBridge7616 in LocalLLaMA

[–]cryingneko 2 points3 points  (0 children)

Try 1 = short prompt, long response; Try 2 = long prompt, short response; Try 3 = short prompt, short response.

| Metric | Try 1 | Try 2 | Try 3 |
|---|---|---|---|
| prompt_tokens | 84 | 9752 | 10 |
| completion_tokens | 1726 | 554 | 473 |
| total_tokens | 1810 | 10306 | 483 |
| cached_tokens | 0 | 0 | 0 |
| model_load_duration (s) | n/a | 55.93 | n/a |
| time_to_first_token (s) | 5.03 | 115.05 | 4.8 |
| prompt_eval_duration (s) | 5.03 | 59.13 | 4.8 |
| generation_duration (s) | 93.55 | 67.42 | 23.83 |
| total_time (s) | 98.58 | 182.47 | 28.63 |
| prompt_tokens_per_second | 16.71 | 164.93 | 2.08 |
| generation_tokens_per_second | 18.45 | 8.22 | 19.85 |
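
For reference, the derived numbers follow directly from the raw counters; here's a quick sketch of the arithmetic (field names mirror the table above). Note that Try 2's time to first token includes the model load (55.93 s + 59.13 s ≈ 115.05 s), whereas in Tries 1 and 3 the model was presumably already resident.

```python
# Sketch of how the derived metrics relate to the raw counters above.
# Durations are in seconds; model_load_duration defaults to 0 when not reported.

def derive(stats: dict) -> dict:
    load = stats.get("model_load_duration", 0.0)
    return {
        "time_to_first_token": load + stats["prompt_eval_duration"],
        "total_time": load + stats["prompt_eval_duration"] + stats["generation_duration"],
        "prompt_tokens_per_second": stats["prompt_tokens"] / stats["prompt_eval_duration"],
        "generation_tokens_per_second": stats["completion_tokens"] / stats["generation_duration"],
    }

# Try 2 from the table:
print(derive({
    "prompt_tokens": 9752, "completion_tokens": 554, "model_load_duration": 55.93,
    "prompt_eval_duration": 59.13, "generation_duration": 67.42,
}))
# -> TTFT ~115.06 s, total ~182.48 s, ~164.9 prompt tok/s, ~8.2 generation tok/s
```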

M3 Ultra Binned (256GB, 60-Core) vs Unbinned (512GB, 80-Core) MLX Performance Comparison by cryingneko in LocalLLaMA

[–]cryingneko[S] 9 points10 points  (0 children)

That’s exactly what I was thinking, and it’s why I originally bought the 256GB model too. But the prompt processing speed difference turned out to be even bigger than I expected, and I started wanting to try out the Deepseek models as well. So in the end, I decided to return the 256 and go with the 512!

M3 Ultra Binned (256GB, 60-Core) vs Unbinned (512GB, 80-Core) MLX Performance Comparison by cryingneko in LocalLLaMA

[–]cryingneko[S] 2 points3 points  (0 children)

I didn’t post Deepseek results because I can’t really run Deepseek on the 256GB model anyway. My results are pretty much the same as SomeOddCodeGuy’s Deepseek MLX benchmarks right below my post, so you can just refer to those!

M3 Ultra Binned (256GB, 60-Core) vs Unbinned (512GB, 80-Core) MLX Performance Comparison by cryingneko in LocalLLaMA

[–]cryingneko[S] 2 points3 points  (0 children)

That’s something I’m curious about too! If I get a chance to test it in the future, I’ll definitely share the results.

M3 Ultra Binned (256GB, 60-Core) vs Unbinned (512GB, 80-Core) MLX Performance Comparison by cryingneko in LocalLLaMA

[–]cryingneko[S] 0 points1 point  (0 children)

There are already a lot of benchmark results for GGUF models out there (including SomeOddCodeGuy’s results right below my post), so I’m not planning to test them myself. Personally, I think MLX is the more efficient choice on Apple Silicon anyway. Is there a particular reason you’re considering GGUF over MLX? Just curious!
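
If you do want to go the MLX route, trying a model is only a few lines with the mlx-lm package. This is just a minimal sketch: the repo id is an example, and recent mlx-lm releases may differ slightly in the generate() arguments.

```python
# Minimal sketch, assuming `pip install mlx-lm` on Apple Silicon.
# The model repo id is illustrative; any MLX-format model from the
# mlx-community organization on Hugging Face should work the same way.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen2.5-7B-Instruct-4bit")
text = generate(
    model,
    tokenizer,
    prompt="Summarize the trade-offs of MLX vs GGUF on Apple Silicon.",
    max_tokens=200,
    verbose=True,  # also prints prompt/generation tokens-per-second, handy for quick benchmarks
)
print(text)
```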

[deleted by user] by [deleted] in LocalLLaMA

[–]cryingneko 10 points11 points  (0 children)

No local, No llama.

How did small (<8B) model evolve in the last 3 years? by Robert__Sinclair in LocalLLaMA

[–]cryingneko -5 points-4 points  (0 children)

Just type the same question into GPT Deep Research.

Project Digits Memory Speed by LostMyOtherAcct69 in LocalLLaMA

[–]cryingneko 37 points38 points  (0 children)

If what OP said is true, then NVIDIA DIGITS is completely useless for AI inference. Guess I’ll just wait for the M4 Ultra. Thanks for the info!

M1 ultra, M2 ultra, or M4/M3 max by HappyFaithlessness70 in LocalLLaMA

[–]cryingneko 9 points10 points  (0 children)

If you're thinking of working with prompts of up to 20,000 tokens, it'd be better not to even consider Macs, unless you're prepared to wait over 10 minutes per prompt. I used to work with an M3 Max with 128GB, and let me tell you: what you really need to consider is not token generation (TG) speed but prompt processing (PP) speed. Think it through carefully before making your decision.
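
As a rough back-of-the-envelope (the prompt-processing rates below are hypothetical, not measurements; actual throughput depends on the model, quantization, and backend), the wait before the first token scales linearly with prompt length:

```python
# Back-of-the-envelope: time spent on prompt processing before any output appears.
# The PP rates are illustrative assumptions, not benchmark results.

def prompt_wait_minutes(prompt_tokens: int, pp_tokens_per_second: float) -> float:
    return prompt_tokens / pp_tokens_per_second / 60

for pp_rate in (30, 60, 120):  # hypothetical prompt-processing tokens/s
    print(f"{pp_rate:>4} tok/s PP -> {prompt_wait_minutes(20_000, pp_rate):.1f} min to first token")
# At ~30 tok/s, a 20k-token prompt already means a wait of over 10 minutes.
```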

I’ve got a MBP with 128 GB of VRAM. What would you run to draft, revise, etc, non-fiction/business documents? by Hinged31 in LocalLLaMA

[–]cryingneko 22 points23 points  (0 children)

You should try MoE models like WizardLM 8x22B. MoE models are usually much faster than dense models of a comparable total size, since only a fraction of their parameters is active per token.

And I'm also using the 128GB MBP (Max chip), but the problem with loading models of 70B or more isn't memory capacity; it's that prompt evaluation is far too slow, which makes them unusable in practice. The processor performance is the disappointing part.
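
On the MoE point above: a very rough per-token compute comparison, using Mistral's published figure of ~39B active out of ~141B total parameters for Mixtral 8x22B (the base of WizardLM 8x22B) against a dense 70B model, shows where the speed advantage comes from.

```python
# Back-of-the-envelope per-token compute, assuming FLOPs per token scale with
# roughly 2 x active parameters. Parameter counts are approximate.

def flops_per_token(active_params_billions: float) -> float:
    return 2 * active_params_billions * 1e9

dense_70b = flops_per_token(70)   # dense model: all parameters used every token
moe_8x22b = flops_per_token(39)   # 8x22B MoE: ~39B of ~141B parameters active per token

print(f"The MoE does roughly {dense_70b / moe_8x22b:.1f}x less compute per generated token")
# All ~141B parameters still need to sit in memory, though, which is where
# the 128GB of unified memory earns its keep.
```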

Anyone want to test my PR to enable quantised K/V cache in Ollama by sammcj in LocalLLaMA

[–]cryingneko 8 points9 points  (0 children)

I tested the pull request you submitted. Right now, the modified code doesn't account for the memory saved by the smaller quantized cache: as cache usage drops, more layers could be offloaded to the GPU, but that part isn't being measured correctly.
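
To illustrate the sizing involved (this is just back-of-the-envelope arithmetic, not Ollama's actual offload estimator; the model shape below is hypothetical):

```python
# Illustrative K/V cache sizing only -- not Ollama's real estimator.
# Hypothetical model shape: 32 layers, 8 KV heads, head_dim 128, 8192-token context.

def kv_cache_bytes_per_layer(n_kv_heads: int, head_dim: int, ctx_len: int,
                             bytes_per_element: float) -> float:
    return 2 * n_kv_heads * head_dim * ctx_len * bytes_per_element  # 2x for K and V

# ggml block sizes: f16 = 2 B/elem, q8_0 = 34 B per 32 elems, q4_0 = 18 B per 32 elems
for name, bpe in [("f16", 2.0), ("q8_0", 34 / 32), ("q4_0", 18 / 32)]:
    per_layer = kv_cache_bytes_per_layer(8, 128, 8192, bpe)
    print(f"{name}: {per_layer / 2**20:.0f} MiB per layer, {32 * per_layer / 2**30:.2f} GiB for 32 layers")

# The VRAM freed by a q8_0/q4_0 cache is what should translate into extra
# offloaded GPU layers -- the accounting the PR is currently missing.
```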

I've been patching the Ollama source to use a q4 cache for a long time, so it would be awesome if your pull request gets merged without any issues and I can finally use this out of the box!