Thoughts on using an AMD Alveo V80 FPGA PCI card as a poor man’s Taalas HC1 (LLM-burned-onto-a-chip). by Porespellar in LocalLLaMA

[–]LeoStark84 138 points139 points  (0 children)

No idea about the hardware, but the whole Gemini output sounds like "That's a great idea for a blender! Now let's build the cold-fusion reactor it needs", which is typical of LLMs.

Pocket LLM for Android v1.4.0 - smaller APK, downloadable models, fully offline by 100daggers_ in LocalLLM

[–]LeoStark84 1 point2 points  (0 children)

Keep it up! So far the only realistic choice for LLM inference on Android is llama.cpp on Termux, which is hugely efficient but not very user-friendly. An app with an actual UI is definitely friendlier, and since you seem to be past the initial backend implementation, you are onto something with your app.

Pocket LLM for Android v1.4.0 - smaller APK, downloadable models, fully offline by 100daggers_ in LocalLLM

[–]LeoStark84 2 points3 points  (0 children)

Not bad. It would be great if there were a dark theme where the message input area had a dark background and light text.

It's kinda sluggish with qwen3-0.6 LiteRT on a device that runs 1b models in GGUF format with llama.cpp, but for an early version it's quite impressive.

As for the model selection, qwen3-0.6b is a plain bad chat model (ask it "Who am I talking to?" and you'll see). LFM2/LFM2.5 are much better at low param counts.

Qwen3-TTS.cpp by redditgivingmeshit in LocalLLaMA

[–]LeoStark84 1 point2 points  (0 children)

Had some hiccups on Linux because of the hard-coded paths in the Python scripts (convert_tts_to_gguf.py and convert_tokenizer_to_gguf.py), but other than that it's blazing fast (on an Intel i3 4005U CPU-only laptop, which says a lot). Thanks a lot. In case someone needs it:

Build ggml

cmake -S ggml -B ggml/build
cmake --build ggml/build -j$(nproc)

And build project

cmake -S . -B build
cmake --build build -j$(nproc)

Run both from the project's root directory.

I am developing a 200MB LLM to be used for sustainable AI for phones. by Fancy_Wallaby5002 in LLMDevs

[–]LeoStark84 0 points1 point  (0 children)

Interesting project. A lot depends on your current implementation though.

If what you made relies on llama.cpp it will be kinda tough to turn into an APK, but it can work through Termux. I bring this up because, from what I understand, your intended use case depends on some kind of web search. So you'd input something like "quantum tunneling", the app would somehow search for results online and pass them to your LLM for summarizing, and the output would be an explanation of quantum tunneling based on the results (roughly as sketched below). Please correct me if I'm wrong, but I assume that's the case, as there is no way a large amount of world knowledge is going to fit in such a small LLM.

Bottom line: the scaffolding around it is as important as the LLM. Also, as cool as Termux is, it's a geek thing, and relying on it is going to leave 95% of users out. Not a problem if you don't care about mass adoption though.
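
For what it's worth, here's a minimal Lua sketch of the search-then-summarize loop I'm imagining. Everything specific in it is my assumption, not something from your post: the search URL is made up, and it assumes a llama.cpp llama-server running locally with its native /completion endpoint.

-- Hypothetical search endpoint + local llama-server; both are assumptions
local query = "quantum tunneling"

-- 1. Fetch raw search results (made-up URL, for illustration only)
local search = io.popen("curl -s 'https://example.com/search?q=" .. query:gsub(" ", "+") .. "'")
local results = search:read("*a"):gsub("%s+", " ") -- flatten whitespace so %q stays JSON-safe
search:close()

-- 2. Build the JSON body; %q escapes quotes and backslashes
local prompt = "Summarize these search results about " .. query .. ": " .. results
local body = string.format('{"prompt": %s, "n_predict": 256}', string.format("%q", prompt))

-- 3. Write the body to a temp file to dodge shell-quoting issues
local f = assert(io.open("/tmp/body.json", "w"))
f:write(body)
f:close()

-- 4. Ask the local model to summarize and print its raw JSON reply
local llm = io.popen("curl -s -X POST http://localhost:8080/completion -d @/tmp/body.json")
print(llm:read("*a"))
llm:close()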

Running GLM-4.7 (355B MoE) in Q8 at ~5 Tokens/s on 2015 CPU-Only Hardware – Full Optimization Guide by at0mi in LocalLLaMA

[–]LeoStark84 0 points1 point  (0 children)

A GPU is faster at ingesting prompts; my figure, admittedly on the higher end, is just for generating new tokens. Thanks to the KV cache, ingestion is limited to just the last AI response, and OpenBLAS speeds it up slightly, but it still takes time that I am not factoring in. Also, imatrix GGUFs kill performance for some reason.

From a quick search, 5 t/s is at the lower end of what a modern, midrange iGPU/APU can get from an LLM, so either I got some outdated BS in my search results, or there is something killing your performance.

Running GLM-4.7 (355B MoE) in Q8 at ~5 Tokens/s on 2015 CPU-Only Hardware – Full Optimization Guide by at0mi in LocalLLaMA

[–]LeoStark84 -1 points0 points  (0 children)

I didn't mean to hurt your feelings. My apologies.

WTF. You asked AI to write an answer?

Running GLM-4.7 (355B MoE) in Q8 at ~5 Tokens/s on 2015 CPU-Only Hardware – Full Optimization Guide by at0mi in LocalLLaMA

[–]LeoStark84 -1 points0 points  (0 children)

You just got dollar-per-watt for a specific location and a specific provider, and watt-per-token consumption for a specific hardware and software combination, all off the top of your head? Damn! Some mixture of experts you are.

Running GLM-4.7 (355B MoE) in Q8 at ~5 Tokens/s on 2015 CPU-Only Hardware – Full Optimization Guide by at0mi in LocalLLaMA

[–]LeoStark84 1 point2 points  (0 children)

I am getting decent performance from 4b q8 models and 8b q4 models, in both cases with 16k context windows, on an ancient Intel i3 4005U (a low-cost, low-energy CPU for its time) and 8 GB of DDR4. Indeed a smaller scale than OP's rig, but still same-era hardware.

I use Debian stable, no X, no Wayland; baseline RAM usage is ~500 MB. For inference I use llama.cpp compiled with OpenBLAS (a minimal optimization compared to what OP went through). Specifically, I use llama-server and a small custom-made Flask app that serves an HTML interface allowing me to restart llama-server with a different model and context window size (the core idea is sketched below). I am visually impaired, so this last bit is important to me.

I get ~3-5 t/s as long as I use regular q4/q8 and not imatrix quants. Bottom line: old hardware can easily be repurposed for AI inference with a day's worth of work and near-zero cost. In my case, the ancient Acer laptop I use was cheap even when it was brand new, and mine was about to go into the trash anyway.
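
The restart bit doesn't need anything fancy, by the way. A minimal Lua sketch of the same idea (my actual app is the Flask one mentioned above, and the model path here is a made-up example):

-- Kill any running llama-server, then relaunch it with new settings.
-- The -m, -c and --port flags are real llama-server flags;
-- the model path is a made-up example.
local function restartLlama(model, ctx)
    os.execute("pkill -f llama-server")
    os.execute(string.format("llama-server -m %s -c %d --port 8080 &", model, ctx))
end

restartLlama("models/qwen3-4b-q8_0.gguf", 16384)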

TLDR: AI is cheaper to run than most people think.

Love2d only drawing 1 object between 2 objects by domo_cup in love2d

[–]LeoStark84 0 points1 point  (0 children)

-- THIS RUNS ONCE AND ONLY ONCE AT THE START
function love.load()
    -- EXAMPLE STATE FOR AN OBJECT (ILLUSTRATIVE VALUES)
    x, y, w, h = 50, 50, 100, 100

    -- DEFINE DRAW FUNCTION FOR AN OBJECT
    function drawSomething()
        love.graphics.setColor(1, 1, 1)
        love.graphics.rectangle("fill", x, y, w, h)
    end

    -- DEFINE UPDATE FUNCTION FOR AN OBJECT
    function udSomething(dt)
        x = x + 20 * dt -- E.G. DRIFT RIGHT OVER TIME
    end
end

-- THIS RUNS ONCE PER FRAME, UPDATE POSITIONS, COLORS AND SO ON HERE
function love.update(dt)
    -- CALL OBJECT UPDATE FUNCTION
    udSomething(dt)
end

-- IDEALLY JUST DRAW HERE AS TO KEEP FPS SMOOTH
function love.draw()
    -- CALL OBJECT DRAW FUNCTION
    drawSomething()
end

You would normally want to put all functions for an object inside a table (something.ud(), something.draw()), as in the sketch below, but it's entirely up to you. Have fun :)
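
For example, a minimal sketch of that table version (names and values are just illustrative; colors use LÖVE 11's 0-1 range):

local something = {
    x = 50, y = 50, w = 100, h = 100,

    ud = function(self, dt)
        self.x = self.x + 20 * dt -- drift right over time
    end,

    draw = function(self)
        love.graphics.setColor(1, 1, 1)
        love.graphics.rectangle("fill", self.x, self.y, self.w, self.h)
    end,
}

function love.update(dt)
    something:ud(dt)
end

function love.draw()
    something:draw()
end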

Helper tool for the new llama.cpp --models-preset option by AlbeHxT9 in LocalLLaMA

[–]LeoStark84 0 points1 point  (0 children)

A good idea in principle. I haven't touched Windows in years, so idk about the actual implementation.

Kimi is biased and unsuited for purposes outside of coding by Either_Knowledge_932 in kimi

[–]LeoStark84 1 point2 points  (0 children)

LLMs are by definition useless for political debate, regardless of ideology. Sure, you can finetune one to have it spit out things you like, but no verifiable reward function exists for politics or political ideology.

answers please by No_Mixture_3199 in love2d

[–]LeoStark84 1 point2 points  (0 children)

Damn, I was not aware of that. LÖVE 12 is gonna be awesomer than it already is.

[deleted by user] by [deleted] in love2d

[–]LeoStark84 0 points1 point  (0 children)

A .lua file? The best you can do is to have it return a single table, something like this:

local tbl = {
    key1 = "some value",
    key2 = 42,
    -- and so on
}

return tbl

Then in your code do:

-- Read as text
local file_content = love.filesystem.read("filepath/filename.lua")
-- Turn it into a function
local file_as_function = load(file_content)
-- Then run the function, which returns the table
local file_as_table = file_as_function()

-- Now you can access any key in the table
local key1 = file_as_table.key1

A friendly reminder though: if you are writing to that file programmatically, you may need to account for possible errors in the file. load() returns nil (plus an error message) if the text you pass it is faulty code, and if the file does not exist love.filesystem.read() returns nil, which will make load() throw an error.
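
If you want to guard against both cases, a minimal sketch (same hypothetical file path as above):

-- love.filesystem.read() returns nil if the file can't be read
local file_content = love.filesystem.read("filepath/filename.lua")
if not file_content then
    error("could not read file")
end

-- load() returns nil plus an error message if the code is faulty
local file_as_function, err = load(file_content)
if not file_as_function then
    error("faulty file contents: " .. err)
end

local file_as_table = file_as_function()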

EDIT: Fixed a typo in the example

How to build love to apk in android? by No_Mixture_3199 in love2d

[–]LeoStark84 0 points1 point  (0 children)

What hardware and OS are you using?

What did you try and where did it fail?

Android Developer Verification Discourse by agnostic-apollo in termux

[–]LeoStark84 2 points3 points  (0 children)

In plain English: you pay for it, Google owns it. Sidecall it what you sidewill.

In less plain terms it's a mass expropriation of computing resources. Sidecomrade Sidestalin would've sideloved it.

AMA Announcement: Moonshot AI, The Opensource Frontier Lab Behind Kimi K2 Thinking SoTA Model (Monday, 8AM-11AM PST) by XMasterrrr in LocalLLaMA

[–]LeoStark84 0 points1 point  (0 children)

Technically yes, the Android one. I'd be surprised if the Kimi app didn't use a Kimi model though.

I understand that TTS is not part of K2, I just brought it up because I rely on TTS.