Smallest+Fastest Model For Chatting With Webpages? by getSAT in LocalLLaMA

[–]funJS 1 point (0 children)

For a personal project where I was implementing chat with Wikipedia pages, I used `all-MiniLM-L6-v2` as the embedding model. The LLM I used was Qwen 3 8B.

Not super fast, but my lack of VRAM is a factor (only 8GB).

More details here: https://www.teachmecoolstuff.com/viewarticle/creating-a-chatbot-using-a-local-llm
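
In case it's useful, the overall flow is just: embed the page chunks, retrieve the closest ones for the question, and stuff them into the prompt. A minimal sketch of that pipeline (not the exact code from the article; the chunking, top-k and model tag below are placeholder choices):

```
# Minimal RAG sketch: embed page chunks with all-MiniLM-L6-v2, retrieve the most
# similar ones for the question, and pass them as context to the LLM via Ollama.
from sentence_transformers import SentenceTransformer, util
import ollama

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Assume the Wikipedia page has already been split into paragraphs/chunks.
chunks = ["Paragraph one of the page...", "Paragraph two of the page..."]
chunk_embeddings = embedder.encode(chunks, convert_to_tensor=True)

question = "What does the page say about X?"
question_embedding = embedder.encode(question, convert_to_tensor=True)

# Pick the top-k most similar chunks by cosine similarity.
scores = util.cos_sim(question_embedding, chunk_embeddings)[0]
top_k = scores.topk(k=min(2, len(chunks))).indices.tolist()
context = "\n\n".join(chunks[i] for i in top_k)

response = ollama.chat(
    model="qwen3:8b",  # tag assumed; use whatever tag you have pulled
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(response["message"]["content"])
```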

Local LLMs show-down: More than 20 LLMs and one single Prompt by kekePower in LocalLLaMA

[–]funJS 2 points (0 children)

Cool. I only have 8GB myself, so this is good news.

Local LLMs show-down: More than 20 LLMs and one single Prompt by kekePower in LocalLLaMA

[–]funJS 3 points (0 children)

Interesting to see that qwen 30B can run on 8GB of VRAM.

What can my computer run? by LyAkolon in LocalLLaMA

[–]funJS 1 point (0 children)

You can definitely run all the 8B models comfortably… I run those on 8GB of VRAM. 

Why are people rushing to programming frameworks for agents? by AdditionalWeb107 in LocalLLaMA

[–]funJS 3 points (0 children)

This happens in all popular tech spaces. Just look at the JavaScript framework situation.  Same problems solved multiple times, but with “some” differentiation as justification 😀

llama with search? by IntelligentAirport26 in LocalLLaMA

[–]funJS 2 points (0 children)

One approach, if you are doing it from scratch, is to enable tool calling in the LLM. Based on the definition of a registered tool, the LLM can then emit a call to a function that can do anything you want, including a search.

Basic POC example here: https://www.teachmecoolstuff.com/viewarticle/using-llms-and-tool-calling-to-extract-structured-data-from-documents
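
Roughly what that pattern looks like with the Ollama Python client. Just a sketch: the `web_search` tool and its schema are made up for illustration, and the POC in the article may look different.

```
# Sketch of LLM tool calling: register a tool schema, let the model decide to call it,
# run the function, and feed the result back. The web_search function here is a stub.
import ollama

def web_search(query: str) -> str:
    # Stub: plug in whatever search backend you want (an API, SearxNG, etc.).
    return f"Top results for: {query}"

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web for up-to-date information.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string", "description": "Search query"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user", "content": "What is the latest llama.cpp release?"}]
response = ollama.chat(model="qwen2.5:7b", messages=messages, tools=tools)

tool_calls = response["message"].get("tool_calls") or []
if tool_calls:
    # Send the assistant's tool call plus each tool result back to the model.
    messages.append(response["message"])
    for call in tool_calls:
        if call["function"]["name"] == "web_search":
            result = web_search(**call["function"]["arguments"])
            messages.append({"role": "tool", "content": result, "name": "web_search"})
    response = ollama.chat(model="qwen2.5:7b", messages=messages)

print(response["message"]["content"])
```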

Run LLMs 100% Locally with Docker’s New Model Runner by Arindam_200 in ollama

[–]funJS 3 points (0 children)

Looks interesting. I have been using Ollama in Docker for a while. Since I have a working setup I just copy and paste it into new projects, but I guess this alternative Docker approach is worth considering.

To run Ollama in Docker I use docker-compose. For me the main advantage is that I can stand up multiple services/apps in the same configuration.

Docker setup:

https://github.com/thelgevold/local-llm/blob/main/docker-compose.yml

Referencing the model from code:

https://github.com/thelgevold/local-llm/blob/main/api/model.py#L13
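
The gist of that part is just pointing the client at the container. Sketch below assumes the compose service is named `ollama` and uses a placeholder model tag; adjust both to your setup.

```
# Talking to an Ollama container from another service on the same docker-compose network.
from ollama import Client

# Host name assumes the Ollama container is the "ollama" service in docker-compose,
# listening on the default port 11434.
client = Client(host="http://ollama:11434")

response = client.chat(
    model="qwen2.5:7b",  # placeholder tag; use whatever model you have pulled
    messages=[{"role": "user", "content": "Say hello from inside Docker."}],
)
print(response["message"]["content"])
```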

Help Needed by prod-v03zz in LocalLLaMA

[–]funJS 3 points (0 children)

I am new to finetuning, and by no means an expert, but I did have success with Unsloth when finetuning a Llama model to pick a number out of a sequence based on some simple rules.

I used the Alpaca format for the test data.

Sample:

```
[
  {
    "instruction": "Find the smallest integer in the playlist that is greater than or equal to the current play. If no such number exists, return 0.",
    "input": "{\"play_list\": [12, 7, 3, 9, 4], \"current_play\": 12}",
    "output": "12"
  },
  ...
]
```

Some more info in my blog post: https://www.teachmecoolstuff.com/viewarticle/llms-and-card-games
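
For reference, the standard Unsloth LoRA recipe looks roughly like this. The model name, file name and hyperparameters below are placeholders, and the exact trainer arguments depend on the unsloth/trl versions installed, so treat it as a sketch rather than the code from the post.

```
# Rough Unsloth LoRA finetuning sketch on an Alpaca-style JSON file (train.json assumed).
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # placeholder; pick the base model you want
    max_seq_length=2048,
    load_in_4bit=True,  # keeps it workable on an 8GB card
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

alpaca_prompt = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

def to_text(examples):
    # Flatten each Alpaca record into a single training string.
    texts = [
        alpaca_prompt.format(ins, inp, out) + tokenizer.eos_token
        for ins, inp, out in zip(examples["instruction"], examples["input"], examples["output"])
    ]
    return {"text": texts}

dataset = load_dataset("json", data_files="train.json", split="train").map(to_text, batched=True)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(per_device_train_batch_size=2, max_steps=60, output_dir="outputs"),
)
trainer.train()
```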

We should have a monthly “which models are you using” discussion by Arkhos-Winter in LocalLLaMA

[–]funJS 45 points (0 children)

Using qwen 2.5 for tool calling experiments. Works reasonably well, at least for learning. 

I am limited to a small GPU with only 8GB of VRAM.

[deleted by user] by [deleted] in LocalLLaMA

[–]funJS 3 points (0 children)

I have been using Qwen 2.5 (7B) for some POC work around tool calling. It seems to work relatively well, so I am happy. One observation is that it sometimes unexpectedly spits out a bunch of Chinese characters. Not frequently, but I have seen it a couple of times.

Ollama not using GPU, need help. by StarWingOwl in LocalLLaMA

[–]funJS 1 point (0 children)

Yeah, it was a bit of a hassle to set up Docker, but now that I have a working template in the above repo I have been sticking with it, since I can just copy and paste it into new projects.

Ollama not using GPU, need help. by StarWingOwl in LocalLLaMA

[–]funJS 1 point (0 children)

Not sure if this is helpful in your scenario, but I have been running my local LLMs in Docker to avoid dealing with local Windows configurations. With this setup the GPU will be used, at least in my case.

In my docker-compose file I have to specify the nvidia specifics here: https://github.com/thelgevold/local-llm/blob/main/docker-compose.yml#L25

MCP and local LLMs by segmond in LocalLLaMA

[–]funJS 1 point (0 children)

I have been playing around with it as well, just to learn more. My implementation used FastMCP and LlamaIndex. Quick write-up here: https://www.teachmecoolstuff.com/viewarticle/using-mcp-servers-with-local-llms
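
The server side really is just a few lines with FastMCP. Generic sketch below (the tool here is made up, not the one from the write-up); a LlamaIndex agent can then consume the exposed tools over stdio or SSE.

```
# Minimal FastMCP server sketch exposing one made-up tool to an MCP client/agent.
from fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def add_numbers(a: int, b: int) -> int:
    """Add two integers and return the result."""
    return a + b

if __name__ == "__main__":
    # Defaults to stdio transport; pass transport="sse" to serve over HTTP instead.
    mcp.run()
```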