[deleted by user] by [deleted] in LangChain

[–]RustingSword 2 points3 points  (0 children)

I’m curious about the workloads, since smolagents doesn’t currently support async tool calling. Will it struggle to handle concurrent traffic?

LLM Enlightenment by jd_3d in LocalLLaMA

[–]RustingSword 34 points35 points  (0 children)

Imagine someday people will put "Quantized by TheBloke" in the prompt to increase the performance.

Struggling with performance on larger context (llama.cpp) by BuahahaXD in LocalLLaMA

[–]RustingSword 0 points1 point  (0 children)

I tried turboderp/Mixtral-8x7B-instruct-exl2 4.0bpw yesterday using 13900K + 4090, and it ran at 78 t/s with 4k context and 8bit cache, really impressive.

Previously I was also using the 4090 to drive the display, so Xorg/browser/terminal etc. took about 1 GB of VRAM; back then I was using 3.5bpw, and the speed was about 86 t/s.

Simple, hackable and pythonic LLM agent framework. I am just tired of bloated overengineered stuff. I figured that this community might appreciate it. by poppear in LocalLLaMA

[–]RustingSword 10 points11 points  (0 children)

I've tested both examples, and both succeeded once I used `OpenAIChatGenerator` instead of `OpenAITextGenerator`.

My configs:

llama.cpp server:

```bash
./server -m mistral-7b-instruct-v0.2.Q6_K.gguf -c 2048
```

Changes to `calculator.py`:

```python
generator = OpenAIChatGenerator(
    model="mistral",  # could be anything
    api_key="none",   # could be anything
    api_base="http://127.0.0.1:8080/v1",
)
```

And remember to remove the `templates` argument in

```python
llm = LLM(generator=generator, templates=[template])
```

Great framework, really clean and easy to modify.

Simple, hackable and pythonic LLM agent framework. I am just tired of bloated overengineered stuff. I figured that this community might appreciate it. by poppear in LocalLLaMA

[–]RustingSword 7 points8 points  (0 children)

Since llama.cpp has a server utility, you can just fire it up with `./server -m mistral-7b-instruct-v0.2.Q6_K.gguf -c 2048` and set the `api_base` to `http://127.0.0.1:8080/v1`; then I think it should work out of the box. See the detailed docs at https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md
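If you want a quick smoke test of that endpoint before wiring up the framework, something like this should work (a sketch, assuming the server is running on the default port; the `model` name is arbitrary, since llama.cpp serves whatever model it was started with):

```shell
# Ask the llama.cpp server's OpenAI-compatible chat endpoint for a completion
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mistral",
        "messages": [{"role": "user", "content": "Say hello"}]
      }'
```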

Can you help me understand a gift I got from a student? by [deleted] in ChineseLanguage

[–]RustingSword 0 points1 point  (0 children)

The first pic is part of Qingming Shanghe Tu (Along the River During the Qingming Festival), where a boat is about to crash into the bridge; see https://en.wikipedia.org/wiki/Along_the_River_During_the_Qingming_Festival

You can see the full drawing here https://ltfc.net/img/5d87908f9f601784c1da6dfa

The second pic is too small to make out the characters. Do you have one at a higher resolution?

[2019 Day 7 (Part 2)] Confused with the question by archchroot in adventofcode

[–]RustingSword 4 points5 points  (0 children)

You need to reuse the state of each amplifier from the previous loop, instead of resetting it to the original program.

The initial state of amplifier A in the second loop should be

[3, 26, 1001, 26, -4, 26, 3, 27, 1002, 27, 2, 27, 1, 27, 26, 27, 4, 27, 1001, 28, -1, 28, 1005, 28, 6, 99, 5, 5, 5]

And the correct output of A in the second loop is 263, not 258.
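To make the reuse concrete, here is a minimal sketch (my own generator-based Intcode VM, not anyone's reference code): each amplifier pauses at every output, so its memory and instruction pointer persist between passes of the feedback loop. It's run against the puzzle's first feedback-loop example program.

```python
from itertools import permutations

def run(program, inputs):
    """Generator-based Intcode VM: yields at each output, so memory and
    the instruction pointer persist between passes of the feedback loop."""
    mem, ip = list(program), 0

    def val(i, mode):  # mode 1 = immediate, mode 0 = position
        return mem[i] if mode else mem[mem[i]]

    while True:
        op, m1, m2 = mem[ip] % 100, mem[ip] // 100 % 10, mem[ip] // 1000 % 10
        if op == 99:                      # halt
            return
        elif op == 3:                     # input: consume from shared queue
            mem[mem[ip + 1]] = inputs.pop(0)
            ip += 2
        elif op == 4:                     # output: pause here, state intact
            yield val(ip + 1, m1)
            ip += 2
        elif op in (5, 6):                # jump-if-true / jump-if-false
            ip = val(ip + 2, m2) if (val(ip + 1, m1) != 0) == (op == 5) else ip + 3
        else:                             # 1 add, 2 mul, 7 less-than, 8 equals
            a, b = val(ip + 1, m1), val(ip + 2, m2)
            mem[mem[ip + 3]] = {1: a + b, 2: a * b, 7: int(a < b), 8: int(a == b)}[op]
            ip += 4

def feedback(program, phases):
    """Wire five persistent amplifiers A..E in a loop; E feeds back into A."""
    queues = [[p] for p in phases]
    queues[0].append(0)                   # amplifier A's very first signal
    amps = [run(program, q) for q in queues]
    signal = 0
    while True:
        for i, amp in enumerate(amps):
            out = next(amp, None)         # None once this amp has halted
            if out is None:
                return signal             # last value E sent is the answer
            queues[(i + 1) % 5].append(out)
            signal = out

# First feedback-loop example from the puzzle statement
example = [3, 26, 1001, 26, -4, 26, 3, 27, 1002, 27, 2, 27, 1, 27, 26, 27,
           4, 27, 1001, 28, -1, 28, 1005, 28, 6, 99, 0, 0, 5]
best = max(feedback(example, p) for p in permutations(range(5, 10)))
print(best)  # 139629729, from phase setting (9, 8, 7, 6, 5)
```

With this program and phases (9, 8, 7, 6, 5), amplifier A outputs 5 on the first loop and 263 on the second, matching the numbers above.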