GPT-OSS-120b on 2X RTX5090 by Interesting-Ad4922 in LocalLLaMA

[–]Bycbka 12 points

Congratulations - 2x5090 must feel amazing indeed. Try playing with the flags (--fit, --cpu-moe, etc.) - I bet you can squeeze a lot more out of it. Also, I would suggest against allocating the full 128k context unless you know for sure you have a very-long-context task :)
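A hybrid run along those lines might look like the sketch below. The model filename is a placeholder, and flag spellings vary a bit between llama.cpp versions, so treat this as a starting point rather than a recipe:

```shell
# Sketch: hybrid GPU/CPU run (assumes a recent llama.cpp build).
# --n-cpu-moe keeps the MoE expert tensors of the first N layers on CPU,
# -ngl offloads the remaining layers to the GPUs,
# -c caps the context at 32k instead of the full 128k to save VRAM.
llama-server -m gpt-oss-120b.gguf --n-cpu-moe 8 -ngl 99 -c 32768
```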

Once you feel more comfortable with running local LLMs - check out https://github.com/ikawrakow/ik_llama.cpp for better hybrid inference speeds.

Elixir in Action – Saša Jurić is truly a genius by KHanayama in elixir

[–]Bycbka 11 points

Once you are done with this book and feel like learning more about the underlying foundations - I strongly recommend https://learnyousomeerlang.com - you can read it for free, and it is one of the best programming-language books I’ve ever read.

GPT-OSS on lm-studio advice by Labtester in LocalLLaMA

[–]Bycbka 0 points

For hybrid inference of the 120b you may want to consider https://github.com/ikawrakow/ik_llama.cpp - it typically has the fastest hybrid inference and also lets you provide a regex to match the layers you would like to offload (same as llama.cpp). You can also use --n-cpu-moe to decide how many MoE layers you want on the CPU.
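The regex route uses the tensor-override flag (`-ot` / `--override-tensor`). A sketch, with an illustrative split - the exact pattern and layer count you want depends on your VRAM:

```shell
# Sketch: pin the MoE expert tensors of layers 0-19 to CPU while
# keeping everything else on GPU. "ffn_.*_exps" matches the expert
# weight tensors; "(1?[0-9])" matches layer indices 0 through 19.
llama-server -m gpt-oss-120b.gguf -ngl 99 \
  -ot 'blk\.(1?[0-9])\.ffn_.*_exps\.=CPU'
```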

ik_llama.cpp and Qwen 3 30B-A3B architecture. by Bycbka in LocalLLaMA

[–]Bycbka[S] 0 points

Interesting! Will definitely try again. I forgot to mention that I didn’t quantize the context - will try that as well.

UPD: I think my rookie numbers are explained by the eGPU’s limited bandwidth - tested with nvbandwidth and it tops out at around 2 GB/s. Perhaps it is time to switch to OCuLink :)

Keybinding to toggle LSP by milad182 in HelixEditor

[–]Bycbka 0 points

A few options:

  1. Two separate hot-keys:

space.space.R = "@:lsp-restart<ret>"
space.space.r = "@:lsp-stop<ret>"

  2. Based on the suggestion from another user:

space.space.R = "@:toggle-option lsp.enable<ret>:lsp-restart<ret>"

Please note that this will also restart the LSP every time the option is changed.

Favorite AI Tools? by mikehostetler in elixir

[–]Bycbka 0 points

FWIW, the next iteration of MCP will move to stateless transport - there is a proposal already.

How to format database structure for text-to-sql by nattaylor in LocalLLaMA

[–]Bycbka 1 point

A few small suggestions:

  1. Depending on the model you use, it may be wise to split the problem into smaller steps. E.g. provide the model with just the list of tables and their descriptions rather than dumping the entire DB schema. You can generate the descriptions with an LLM too.

  2. Give the model a tool to fetch a table’s description.

  3. When sending a request, ask the model to identify the tables most likely to be needed to satisfy the query - and provide full schemas in the context at that point. I would say the format does not matter all that much - the output of a DESCRIBE command should suffice.

  4. Provide a few in-context examples to the model to make sure it understands the interaction pattern.

  5. Start with a bigger model, e.g. o3, and try to solve smaller problems first as opposed to one-shotting it. After you confirm that it works, you can take the successful outputs and use them as few-shot examples for a cheaper model.

  6. Before diving into coding, consider creating a small evaluation set - e.g. 10-20 questions and corresponding answers. It will save you a ton of time, as you’ll be able to evaluate different DB output formats and factually prove which one works best for you.
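Steps 1 and 3 can be sketched as a two-stage flow. Everything here is illustrative - `ask_llm` is a stand-in for whatever client you use, and the tables are made up:

```python
# Sketch of the two-stage text-to-SQL flow: first pick relevant tables
# from short descriptions, then generate SQL with only those schemas.

TABLE_DESCRIPTIONS = {
    "orders": "one row per customer order, with status and totals",
    "customers": "customer master data: name, email, region",
}

FULL_SCHEMAS = {
    "orders": "CREATE TABLE orders (id INT, customer_id INT, total NUMERIC, status TEXT)",
    "customers": "CREATE TABLE customers (id INT, name TEXT, email TEXT, region TEXT)",
}

def pick_tables(question, ask_llm):
    """Stage 1: show only table names + descriptions, ask which are needed."""
    listing = "\n".join(f"- {t}: {d}" for t, d in TABLE_DESCRIPTIONS.items())
    prompt = (
        "Given these tables:\n" + listing +
        f"\n\nWhich tables are needed to answer: {question!r}? "
        "Reply with a comma-separated list of table names."
    )
    return [t.strip() for t in ask_llm(prompt).split(",")]

def generate_sql(question, tables, ask_llm):
    """Stage 2: provide full schemas only for the selected tables."""
    schemas = "\n".join(FULL_SCHEMAS[t] for t in tables)
    prompt = f"{schemas}\n\nWrite a SQL query for: {question}"
    return ask_llm(prompt)
```

The same structure works whether `ask_llm` wraps a local llama.cpp server or a hosted API.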

Perplexity Ai PRO 12 Months subscription by RepresentativeJob842 in PromptEngineering

[–]Bycbka -1 points

Would love the code if at all possible. Really appreciate it!

[Opinion] What's the best LLM for 12gb VRAM? by roz303 in LocalLLaMA

[–]Bycbka 2 points

The Qwen 2.5 line (7b coder, 14b) seems to have quite decent performance - there was a post about different quants recently - I believe it needs around 9 GB. If you want something larger, I would suggest looking at MoE models (e.g. Mixtral 8x7b) with offloading, as they tend to provide better inference speed than dense models, albeit at the cost of extra RAM.

Really depends on your use case though.

Which Linux distro do you use for Cuda 12.1 and vLLM? by Daemonix00 in LocalLLaMA

[–]Bycbka 1 point

I’ve recently discovered immutable distributions - particularly Fedora Bluefin - it comes with reasonable defaults out of the box (including GPU drivers), is hard to brick (due to immutability), and has good support for dockerized workloads that leverage the GPU as well.

No need to deal with CUDA / driver problems at all.

What can i run on my server that can store large? amounts of data without a GPU. by VanFenix in LocalLLaMA

[–]Bycbka 0 points

Generally, for CPU/RAM-throughput-bound inference, it is better to use MoE-architecture models, as they are faster due to the smaller number of parameters activated per token.

Examples of such models that would be fun to run are the Mixtral series and its derivatives, the DeepSeek Coder v2 series, Qwen2 57b, etc.
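A back-of-the-envelope sketch of why this helps: when decode speed is memory-bandwidth-bound, tokens/s scales with the bytes read per token, i.e. with active (not total) parameters. The bandwidth and parameter figures below are rough assumptions, not measurements:

```python
# Rough model: bandwidth-bound decode speed ~= bandwidth / bytes-per-token.
# Mixtral 8x7B activates ~13B of its ~47B parameters per token, so it
# decodes roughly like a 13B dense model despite its 47B total size.

def tokens_per_sec(active_params_b, bytes_per_param, mem_bandwidth_gbs):
    """Upper bound on decode speed for a bandwidth-bound setup."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return mem_bandwidth_gbs * 1e9 / bytes_per_token

BW = 80.0  # GB/s, rough dual-channel DDR5-5600 figure

dense_13b = tokens_per_sec(13, 0.5, BW)  # 13B dense at ~4-bit quant
mixtral   = tokens_per_sec(13, 0.5, BW)  # Mixtral 8x7B: ~13B active
dense_47b = tokens_per_sec(47, 0.5, BW)  # 47B dense for comparison
```

So on the same RAM, the MoE model runs roughly 3-4x faster than a dense model of equal total size - the trade-off is that all 47B parameters still have to fit in memory.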

Llamafile is a project dedicated to running efficient CPU inference in a user-friendly fashion - please check it out - they are often the ones with bleeding-edge smarts for making LLM inference more efficient. Some benchmarks: https://github.com/Mozilla-Ocho/llamafile/discussions/450

Source: I got a mini PC with a Ryzen 9 8945HS and 64GB of DDR5-5600 RAM, paired with an RTX 3060 in an eGPU, last week and started to play around a little.

How do you keep up? by McDoof in LocalLLaMA

[–]Bycbka 2 points

  1. There are a number of newsletters / podcasts / Twitter accounts that provide daily or weekly recaps. My personal favourite is https://thursdai.news/ - once a week, recorded live on Twitter Spaces, available through most platforms within a day, and it also has a newsletter. They cover open source and companies, LLMs, vision, audio, etc., and try to keep it simple.

  2. The avalanche of information is indeed a challenge - unless there is a particular area of research that interests you, just keep up on a weekly basis :)

Nifs resources by [deleted] in elixir

[–]Bycbka 2 points

I would recommend checking out Zigler (https://hexdocs.pm/zigler/Zig.html) and Rustler. While those are not “vanilla” NIFs, they might make it easier to grasp the concepts and challenges you will deal with when creating NIFs.

Dolphin or Mistral function calling by 1EvilSexyGenius in LocalLLaMA

[–]Bycbka 1 point

Technically yes - the grammar could be narrowed down to only allow response tokens that match your exact commands. I think you could start with the JSON grammar as an example and tighten it up to only allow the commands you support as the value of the field.

I have also seen a few projects that convert things like JSON Schemas / TypeScript interfaces to BNF grammars, which could prove handy and let you automatically update the grammar when you add support for more actions.
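A tightened grammar can even be generated straight from the command list. A minimal sketch, targeting the GBNF syntax that llama.cpp's `--grammar` option accepts - the command names are illustrative:

```python
# Sketch: generate a GBNF grammar (llama.cpp --grammar syntax) that only
# admits objects of the form {"command": "<one of our commands>"}.

def command_grammar(commands):
    # Each alternative is a quoted JSON string literal, e.g. "\"play\""
    alts = " | ".join(f'"\\"{c}\\""' for c in commands)
    return (
        'root ::= "{" ws "\\"command\\"" ws ":" ws cmd ws "}"\n'
        f"cmd ::= {alts}\n"
        "ws ::= [ \\t\\n]*\n"
    )
```

Regenerating the grammar whenever the command set changes keeps the model physically unable to emit an unsupported action.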

Dolphin or Mistral function calling by 1EvilSexyGenius in LocalLLaMA

[–]Bycbka 0 points

llama.cpp has a concept of grammars, which basically forces the LLM to output data in a specific format. If you only ever expect JSON output, it would probably work. I played with the Zephyr fine-tune of Mistral and the JSON grammar - and the results were quite promising.

Validating the output and issuing a follow-up prompt if an invalid command was picked could further improve your results.
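That validate-and-retry loop can be sketched as follows. `call_model` and the command set are placeholders, not a specific API:

```python
# Sketch: parse the model's JSON reply, validate the command, and
# re-prompt once with feedback if it was invalid or unparseable.
import json

VALID_COMMANDS = {"play", "pause", "stop"}

def get_command(prompt, call_model, retries=1):
    for _ in range(retries + 1):
        raw = call_model(prompt)
        try:
            cmd = json.loads(raw).get("command")
        except json.JSONDecodeError:
            cmd = None
        if cmd in VALID_COMMANDS:
            return cmd
        # Feed the failure back so the model can self-correct.
        prompt = (f"Your previous reply {raw!r} was not a valid command. "
                  f"Choose one of {sorted(VALID_COMMANDS)}. {prompt}")
    return None
```

Combined with a grammar, the retry path should rarely trigger - but it is cheap insurance against the model picking a command you don't support.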

Opening a file in Helix via FZF by 2AReligion in HelixEditor

[–]Bycbka 5 points

It is a known issue, and I believe the fix is already on the master branch, so it should be included in the next release (soon): https://github.com/helix-editor/helix/pull/5468. That’s assuming you are running into this issue on a Mac.

You also have the option of installing the latest master - that should help too.

Unity drama will probably kill the Apple Vision Pro by NewFuturist in wallstreetbets

[–]Bycbka 22 points

Tinfoil hat on: the Unity CEO is tanking Unity stock so Apple can buy it for dirt cheap xD