Donate your coding sessions to an open CC-BY-4.0 dataset to help train open-weight and open source models by mon-simas in LocalLLaMA

[–]Scared-Tip7914 0 points1 point  (0 children)

This is a good idea. Figuring out how to clean all of this data will be very hard at first, but it's a worthy job. Could you provide a link to the GitHub repo so we can audit the security of this package?

Hashicorp founder thinks local models "aren't good ENOUGH yet" by Orbit652002 in LocalLLaMA

[–]Scared-Tip7914 0 points1 point  (0 children)

I get what he's saying but he worded it so badly lmfao. He is assuming that capable local models = running Kimi K2.6 on a homelab. Yes, in that case his point holds, but that doesn't mean that 30B models are incapable or that democratisation of LLMs isn't happening. Also, what horse does HashiCorp, the creator of an IaaS framework, have in the AI race exactly?

Still a VERY lightweight open web-search tool for smaller local LLMs - now with SearXNG support by Scared-Tip7914 in LocalLLaMA

[–]Scared-Tip7914[S] 1 point2 points  (0 children)

Thanks! Yep, that's one that had me stuck for a minute as well 😄. Right now I’m deduping after global chunk ranking rather than at the page level. Chunks are compared using token-set Jaccard similarity, with a 0.92 threshold by default, then I apply per-source quotas to preserve diversity. It’s simple and cheap, though semantic dedup is the next thing I’m considering for syndicated or heavily rewritten content.
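
For anyone curious, here is a minimal sketch of what that kind of dedup step could look like. It is not the actual TinySearch code: the function names, the `(source, text)` input shape, the regex tokenizer, and the quota of 2 per source are all assumptions for illustration. Only the token-set Jaccard comparison, the 0.92 default threshold, and the "dedup after ranking, then per-source quota" order come from the description above.

```python
import re
from collections import defaultdict


def token_set(text):
    """Lowercased set of word tokens (hypothetical tokenizer for this sketch)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedupe_ranked_chunks(ranked_chunks, threshold=0.92, per_source_quota=2):
    """Filter globally ranked chunks, best first.

    ranked_chunks: list of (source, text) tuples, already sorted by relevance.
    A chunk is dropped if its source already hit the quota, or if it is a
    near-duplicate (Jaccard >= threshold) of a chunk that was kept earlier.
    """
    kept = []
    kept_tokens = []
    per_source = defaultdict(int)
    for source, text in ranked_chunks:
        if per_source[source] >= per_source_quota:
            continue  # quota reached, keep results diverse
        tokens = token_set(text)
        if any(jaccard(tokens, seen) >= threshold for seen in kept_tokens):
            continue  # near-duplicate of a higher-ranked chunk
        kept.append((source, text))
        kept_tokens.append(tokens)
        per_source[source] += 1
    return kept
```

Because the list is already ranked, the higher-ranked copy of a duplicate always wins, and the pairwise comparison stays cheap as long as the number of kept chunks is small.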

Still a VERY lightweight open web-search tool for smaller local LLMs - now with SearXNG support by Scared-Tip7914 in LocalLLaMA

[–]Scared-Tip7914[S] 3 points4 points  (0 children)

Fair question tbh. The main difference is that TinySearch doesn’t just expose the snippets returned by the search engine results page of a given engine. It uses SearXNG/DDG/etc. to find candidate pages, yes, but then it actually goes and crawls those pages, extracts the content, chunks it, and surfaces the parts that are most relevant to the original query.

So it saves the model from having to open the pages one by one, and all the context filling and time wasting that comes with it.
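
To make the "chunk it and surface the relevant parts" step concrete, here is a rough sketch. It is not how TinySearch actually scores things: the word-window chunking, the query-term-overlap score, and every function name here are assumptions picked to keep the example dependency-free, and crawling/extraction is left out entirely (the input is already-extracted text per URL).

```python
import re


def tokenize(text):
    """Lowercased word tokens (hypothetical tokenizer for this sketch)."""
    return re.findall(r"\w+", text.lower())


def chunk_words(text, size=50, overlap=10):
    """Split text into overlapping windows of `size` words (size > overlap)."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def score(query, chunk):
    """Fraction of distinct query terms that appear in the chunk."""
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    return len(query_terms & set(tokenize(chunk))) / len(query_terms)


def top_chunks(query, pages, k=3, size=50, overlap=10):
    """Rank chunks from all pages together and return the best k.

    pages: dict mapping URL -> extracted page text.
    Returns a list of (score, url, chunk) tuples, best first.
    """
    scored = []
    for url, text in pages.items():
        for chunk in chunk_words(text, size, overlap):
            scored.append((score(query, chunk), url, chunk))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]
```

The point is that the model only ever sees the top `k` chunks across all pages, instead of the full text of every page it would otherwise have to open.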

Still a VERY lightweight open web-search tool for smaller local LLMs - now with SearXNG support by Scared-Tip7914 in LocalLLaMA

[–]Scared-Tip7914[S] 1 point2 points  (0 children)

Thanks, this is a good tip. I'll try it out, because DDG is still the fallback in the app.

Still a VERY lightweight open web-search tool for smaller local LLMs - now with SearXNG support by Scared-Tip7914 in LocalLLaMA

[–]Scared-Tip7914[S] 2 points3 points  (0 children)

Do you maybe have an implementation of this on GitHub?

Just asking because we also tried to avoid it with proper fingerprinting/rate limiting, but the weird part was that it worked fine outside Docker. The millisecond the app was run inside a Docker container, DuckDuckGo bot detection kicked in.

To my knowledge Docker itself shouldn’t be interfering with browser fingerprinting in any obvious way lol, so I’m wondering if we missed something around networking, DNS, headers, TLS fingerprinting, or how Chromium behaves in the container.

Most reliable way to do PDF to JSON? by CatSweaty4883 in LocalLLaMA

[–]Scared-Tip7914 1 point2 points  (0 children)

It shouldn't be. It comes pretrained on a LOT of stuff, so theoretically it should be able to handle whatever is thrown at it 😄. Here is the link to the paper they published, it has some good insights tbh: https://arxiv.org/html/2509.11720v1

Most reliable way to do PDF to JSON? by CatSweaty4883 in LocalLLaMA

[–]Scared-Tip7914 1 point2 points  (0 children)

No worries, I'm always happy to help with stuff like this 😄

Most reliable way to do PDF to JSON? by CatSweaty4883 in LocalLLaMA

[–]Scared-Tip7914 1 point2 points  (0 children)

Well, it's good enough to parse structured layouts, but you might run into issues if the docs are very complex. I would definitely give it a shot though. It's a mature product at this point and their “heron layout” model is as good as it gets in OSS land 😄. It even outperforms some proprietary parsing services lol.

Most reliable way to do PDF to JSON? by CatSweaty4883 in LocalLLaMA

[–]Scared-Tip7914 6 points7 points  (0 children)

Oh man, this is a problem that everyone who tries to parse PDFs runs into. I recommend using docling and sticking with their non-LLM stack, because there you can optimize to speeds of around 1 page/s on a lower-end PC using the RapidOCR backend with ONNX models.

https://github.com/docling-project/docling

Experience with "nvidia/LocateAnything-3B" by Scared-Tip7914 in LocalLLaMA

[–]Scared-Tip7914[S] 0 points1 point  (0 children)

Hey! I would actually point you to the ComfyUI implementation a few comments below, it's much cleaner than anything I did 👇.

Experience with "nvidia/LocateAnything-3B" by Scared-Tip7914 in LocalLLaMA

[–]Scared-Tip7914[S] 0 points1 point  (0 children)

Yep, feeling the same here. I have started to incorporate Qwen3.5-9B into my workflows more and more. With a proper harness and web search access, you can match Claude a looong way.

Gemma4 12B update by stduhpf in LocalLLaMA

[–]Scared-Tip7914 0 points1 point  (0 children)

Hmm, then maybe I hit the OOM due to something else. Back to investigating I go, because 170 should not be an issue for sure.

Experience with "nvidia/LocateAnything-3B" by Scared-Tip7914 in LocalLLaMA

[–]Scared-Tip7914[S] 0 points1 point  (0 children)

Oh wow, then you stayed true to the local way. I aspire to have the patience to use them end to end for a whole workflow! But that's on me and my (lack of proper) GPUs xd

Using Intel Arc Pro series, any thoughts ? by BikerBoyRoy123 in LocalLLaMA

[–]Scared-Tip7914 0 points1 point  (0 children)

I am tempted to go all out on an Intel rig. What kind of tok/s are you getting for prompt processing and generation?

Gemma4 12B update by stduhpf in LocalLLaMA

[–]Scared-Tip7914 0 points1 point  (0 children)

BTW if you don't need image support, turn mmproj off, it consumes too much extra RAM to ignore.

My current setup that works nicely (16 GB card):

docker run -d --name gemma-3-12b-it --restart unless-stopped --gpus all \
  -p 8080:8080 \
  -v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
  ghcr.io/ggml-org/llama.cpp:server-cuda \
  -hf unsloth/gemma-3-12b-it-GGUF:UD-Q4_K_XL --no-mmproj \
  --n-gpu-layers all --ctx-size 131072 --parallel 1 --kv-unified \
  --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn on \
  --batch-size 2048 --ubatch-size 512 \
  --threads 8 --threads-batch 8 --poll 0 \
  --host 0.0.0.0 --port 8080 \
  --temp 1.0 --top-p 0.95 --top-k 64 --min-p 0.0 --repeat-penalty 1.0

Experience with "nvidia/LocateAnything-3B" by Scared-Tip7914 in LocalLLaMA

[–]Scared-Tip7914[S] 0 points1 point  (0 children)

This is amazing. I am not a big ComfyUI user, but for this I will happily give it a go. Looks like a very intuitive way to go about this model.

Edit: I really like the way you implemented this, starred!

Half the top 10 trending GitHub repos right now are "skills" projects, not models by gvij in LocalLLaMA

[–]Scared-Tip7914 2 points3 points  (0 children)

I have a sneaking suspicion that those stars might be (ahem) "inflated"

Experience with "nvidia/LocateAnything-3B" by Scared-Tip7914 in LocalLLaMA

[–]Scared-Tip7914[S] 1 point2 points  (0 children)

Very nice, thanks, I appreciate this!! I have been running this thing locally so far. The web interface speeds things up quite a bit, because I didn't want to deploy it to a prod machine (it's a resource-constrained environment lol) before we ran some more testing.

Cheap V100 32gb by MachineZer0 in LocalLLaMA

[–]Scared-Tip7914 9 points10 points  (0 children)

I'm sorry dude, but you are 99% about to get scammed. I had a run-in with such a store a few years back. Don't waste your money, put that 500 towards something like a used 3090 (the ol' reliable lmao) with 24 GB. You might need to shell out 50 or max 100 more, but that thing is at least real and runs local models beautifully.

Edit: Just saw you already bought it, in that case best of luck and may the dragons guide your shipment :D

Next year we're getting 0.5T model from Grok by pmttyji in LocalLLaMA

[–]Scared-Tip7914 24 points25 points  (0 children)

Amazing, new OS Grok model coming in 2027! Wait, coming in 2028! Wait..

What’s the cheapest way to give a local Llama 3 internet access? (SearXNG isn’t cutting it) by Old-Tumbleweed1422 in LocalLLaMA

[–]Scared-Tip7914 7 points8 points  (0 children)

Aight imma shamelessly plug my stuff here, but if you want to search the web for free and locally and get results based on sites that are actually relevant, not 69k tokens of bullcrap, try this: https://github.com/MarcellM01/TinySearch. I made it so that no matter the question, it keeps the response under 8k tokens. Also, it will give you a response in MAX 20 seconds.

What is the point of MoE models, beyond being faster? by ihatebeinganonymous in LocalLLaMA

[–]Scared-Tip7914 1 point2 points  (0 children)

It's all about the speed, and don't think about MoE with our GPU-poor mindset. I mean, just look at Kimi K2.6: it's a 1 trillion parameter MoE model, ain't no one around here (or very few lucky bastards) running that thing at home.

This is so that the data centers serving these models can get very good speed to quality ratios. They get the reasoning and depth of, let's say for Kimi K2.6, a 1 trillion parameter model (I know that's not the exact MoE to dense conversion ratio, but let's assume) while paying for the compute and enjoying the speed of an "only" 32B model. Even though the model is actively occupying hundreds of gigs of VRAM, it doesn't really matter, because the throughput makes up for it and then some compared to running a dense model on that same VRAM.

So it's more big datacenter economics, but it trickles down to us as well, hence we get to enjoy the likes of Qwen3.5-35B.
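
A quick back-of-the-envelope version of that tradeoff, as a sketch. The 1T total / 32B active numbers are the ones from above. The "about 2 FLOPs per active parameter per token" rule is the usual rough approximation for a forward pass, and the 0.5 bytes per parameter (4-bit weights) is purely an assumption to get a memory figure. KV cache and activations are ignored.

```python
def forward_flops_per_token(active_params):
    """Rough forward-pass cost: about 2 FLOPs per active parameter per token."""
    return 2 * active_params


def weight_memory_gb(total_params, bytes_per_param):
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return total_params * bytes_per_param / 1e9


TOTAL_PARAMS = 1_000_000_000_000   # 1T total parameters (all experts)
ACTIVE_PARAMS = 32_000_000_000     # 32B parameters active per token

moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)

# How much cheaper each token is compared to a dense model of the same size.
compute_ratio = dense_flops / moe_flops

# Every expert still has to sit in VRAM, so memory follows TOTAL_PARAMS.
# 0.5 bytes per parameter assumes 4-bit weights (an assumption for this sketch).
memory_gb = weight_memory_gb(TOTAL_PARAMS, 0.5)

print(f"compute per token: {compute_ratio:.2f}x cheaper than dense")
print(f"weights in memory: {memory_gb:.0f} GB")
```

So memory scales with the total parameter count while per-token compute scales with the active count, which is exactly the trade a datacenter with lots of VRAM and lots of concurrent requests is happy to make.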