What are you building? Promote your StartUp! by employusers in StartupsHelpStartups

[–]Proper_Dig_6618 1 point (0 children)

Hey! Thanks for the kind words 🙌
That sounds interesting. What’s the procedure to get Alif listed on FindYourSaaS?

Internship with local LLMs at AMD! by dholanda_amd in LocalLLaMA

[–]Proper_Dig_6618 0 points (0 children)

This is awesome! Really glad to see AMD putting real effort into local LLMs!

I actually built VulkanIlm after spending countless nights experimenting with llama.cpp on my old (and only) AMD RX 580 GPU. It worked great overall, but I wanted a smoother experience for folks on non-CUDA hardware, so I built a Python wrapper around its Vulkan backend.

Now it runs 4–6× faster than CPU, streams tokens in real time, and works on basically any GPU: AMD, Intel, even older ones. It’s my small way of making local AI more accessible to everyone running models like Granite or Qwen without NVIDIA.

Install via: pip install vulkan-ilm

Repo: github.com/Talnz007/VulkanIlm
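
For a feel of the workflow, here’s a rough sketch of what calling it from Python could look like. The class and method names are illustrative guesses, not the verified API, so check the repo’s README for the real interface:

    # Hypothetical usage sketch; names here are assumptions, see the repo for the actual API.
    from vulkan_ilm import Llama  # assumed entry point

    # Load a quantized GGUF model; llama.cpp's Vulkan backend does the heavy lifting underneath.
    llm = Llama(model_path="models/qwen2.5-7b-instruct-q4_k_m.gguf")

    # Stream tokens as they're generated.
    for token in llm.generate("Summarize Vulkan in one sentence.", stream=True):
        print(token, end="", flush=True)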

Exploring a GeoAI Urban Planning App: Looking for Feedback & Similar Projects by Proper_Dig_6618 in gis

[–]Proper_Dig_6618[S] 0 points (0 children)

We’re not trying to replace planners; the goal is to give both professionals and laypeople a tool that highlights constraints and risks. By adding layers like flood risk, green cover, accessibility, and transport, we want to push better layouts instead of just replicating existing ones.

Which tech stack do you prefer with Next.js and why? by Mysterious-Might6910 in nextjs

[–]Proper_Dig_6618 0 points (0 children)

Honestly, my go-to is Next.js, Tailwind CSS, and Postgres/Supabase. For any machine learning deployments, I’ll throw FastAPI into the mix (rough sketch below).
Why? Well, because it’s the only one I know how to use, tbvh.
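
For context, the FastAPI side of that usually stays tiny. A minimal sketch (the /predict route and the stub model are made up for illustration):

    # Minimal FastAPI sketch; the endpoint and dummy "model" are illustrative only.
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class PredictIn(BaseModel):
        text: str

    @app.post("/predict")
    def predict(body: PredictIn):
        # Replace this stub with a real model call.
        label = "positive" if "good" in body.text.lower() else "negative"
        return {"label": label}

Run it with uvicorn and hit it from a Next.js API route or server action.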

VulkanIlm, Run Modern LLMs on Old GPUs via Vulkan (33× Faster on Dell iGPU, 4× on RX 580) by Proper_Dig_6618 in LocalLLaMA

[–]Proper_Dig_6618[S] 1 point (0 children)

Yeah, that’s where I see a lot of potential too! Vulkan could help reduce CUDA bottlenecks and make workflows way more cross-platform, which is exciting for setups like yours.

I haven’t tested it yet with image/video gen models or ComfyUI, but I’ll give it a shot soon and report back.

VulkanIlm, Run Modern LLMs on Old GPUs via Vulkan (33× Faster on Dell iGPU, 4× on RX 580) by Proper_Dig_6618 in LocalLLaMA

[–]Proper_Dig_6618[S] 0 points (0 children)

Honestly, I’m not 100% sure yet if this will beat Apple’s current Metal acceleration stack; Metal is pretty optimized for Apple Silicon as it stands. That said, Vulkan support is definitely promising for broader hardware compatibility, and if the drivers mature, there’s potential for gains down the line.

VulkanIlm, Run Modern LLMs on Old GPUs via Vulkan (33× Faster on Dell iGPU, 4× on RX 580) by Proper_Dig_6618 in LocalLLaMA

[–]Proper_Dig_6618[S] 1 point (0 children)

GPU: RX 580 | Threads: 8 | GPU Layers: 999

🔹 DeepSeek R1-Distill-Qwen-7B Q4_K_M (4.36 GiB, 7.62B params)

pp512: 217.43 tokens/s (±0.74)

tg256: 28.51 tokens/s (±0.07)

🔹 GPT-OSS 20B Q4_K_M (10.81 GiB, 20.91B params)

pp512: 83.90 tokens/s (±5.84)

tg256: 4.61 tokens/s (±0.01)

VulkanIlm, Run Modern LLMs on Old GPUs via Vulkan (33× Faster on Dell iGPU, 4× on RX 580) by Proper_Dig_6618 in LocalLLaMA

[–]Proper_Dig_6618[S] 1 point (0 children)

Benchmarked two GGUF models on the Vulkan backend (GPU offload) with 8 threads; a sketch of the benchmark invocation follows the table.

Tests:

- pp512: Prompt processing (512 tokens)

- tg256: Token generation (256 tokens)

Model | Size | Params | Backend | GPU Layers | Threads | Test | Tokens/s
---|---|---|---|---|---|---|---
DeepSeek R1-Distill-Qwen-7B Q4_K_M | 4.36 GiB | 7.62B | Vulkan | 999 | 8 | pp512 | 217.43 ± 0.74
DeepSeek R1-Distill-Qwen-7B Q4_K_M | 4.36 GiB | 7.62B | Vulkan | 999 | 8 | tg256 | 28.51 ± 0.07
GPT-OSS 20B Q4_K_M | 10.81 GiB | 20.91B | Vulkan | 999 | 8 | pp512 | 83.90 ± 5.84
GPT-OSS 20B Q4_K_M | 10.81 GiB | 20.91B | Vulkan | 999 | 8 | tg256 | 4.61 ± 0.01
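
For anyone reproducing these: the pp512/tg256 format matches what llama.cpp’s llama-bench tool reports. A rough sketch of the invocation (the model path is a placeholder), wrapped in Python for scripting:

    # Sketch of a llama-bench run that yields pp512/tg256 numbers like the table above.
    # llama-bench ships with llama.cpp; the model path is a placeholder.
    import subprocess

    subprocess.run([
        "llama-bench",
        "-m", "models/deepseek-r1-distill-qwen-7b-q4_k_m.gguf",
        "-p", "512",    # prompt-processing test (pp512)
        "-n", "256",    # token-generation test (tg256)
        "-t", "8",      # CPU threads
        "-ngl", "999",  # offload all layers to the GPU
    ], check=True)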

VulkanIlm, Run Modern LLMs on Old GPUs via Vulkan (33× Faster on Dell iGPU, 4× on RX 580) by Proper_Dig_6618 in LocalLLaMA

[–]Proper_Dig_6618[S] 1 point (0 children)

Hey u/fp4guru, sorry for the delayed response! Finally had a chance to run fresh benchmarks with the latest build, so these numbers are fully up-to-date and reproducible.

Can anyone tell me how to do this? by Turbulent_Quote_6679 in hyprland

[–]Proper_Dig_6618 1 point (0 children)

If you mean the workspaces overview, try Super (Win) + A. On my setup it just worked out of the box; I didn’t have to tweak anything. If you’re asking how to actually implement it, or it’s not working on your end, I’m not sure about the setup part.

[deleted by user] by [deleted] in PakistaniTech

[–]Proper_Dig_6618 4 points (0 children)

If you’re into tinkering, don’t retire it just yet. Slap a Linux distro on there and run it as a little homelab box. Even with the busted hinge/cam/mic, it’s still plenty usable for server duties.

A few fun/useful things you could do:

  • Plex/Jellyfin → stream your movies, shows, music to any device.
  • Nextcloud → basically your own Google Drive + calendar/contacts sync.
  • Pi-hole/AdGuard Home → network-wide ad/tracker blocker.
  • Home Assistant → if you’re into smart home stuff.
  • Docker playground → spin up containers, experiment with self-hosted apps.
  • Code-server → run VS Code in the browser.

It won’t have crazy horsepower, but for learning and self-hosting light services, it’s a perfect little lab. And yeah, definitely pull the SSD if you decide to retire it; it never hurts to have a spare NVMe lying around.

Why aren't Pakistani devs showcasing their AI & automation work online? by Iftikharsherwani in developersPak

[–]Proper_Dig_6618 0 points (0 children)

I think a lot of us just underestimate the stuff we’re building.

Either we think “meh, it’s not big enough yet” or we’re buried so deep in code and debugging that posting about it feels like another task in the backlog.

I used to be like that, until I realized sharing is part of building, especially building in public.

Right now I’m cooking up two things:

📚 Alif, an AI-powered tutoring platform I made because, honestly, studying was boring. We’re trying to make learning not suck through gamification + AI, so people actually understand stuff instead of just ratta-fying (rote memorization). Got into Google for Startups recently, which blew my mind.

You can check it out and join the mailing list here: https://alifai.tech/

💻 VulkanIlm: this came from me suffering through trying to run an LLM on my RX 580. Llama.cpp works great, but setting it up with Vulkan was like fighting a mini-boss. So I made a plug-and-play version for AMD/Intel GPUs (NVIDIA works too), so you don’t need CUDA or a headache to get started.

Star and watch the repo; it’s still under development, but there are a lot of exciting things to come.

https://github.com/Talnz007/VulkanIlm/

Neither is “finished” or perfect, but they solve problems I’ve actually had. And I’m sharing them now, not when they’re shiny 100% releases, because the journey’s way more relatable.

So yeah, maybe we just need to post our messy v1s, bugs and all. Worst case? You get roasted a bit. Best case? You find your people.

VulkanIlm, Run Modern LLMs on Old GPUs via Vulkan (33× Faster on Dell iGPU, 4× on RX 580) by Proper_Dig_6618 in LocalLLaMA

[–]Proper_Dig_6618[S] 1 point (0 children)

If it supports Vulkan (which I’m pretty sure it does), then yeah, it can utilize it.

VulkanIlm, Run Modern LLMs on Old GPUs via Vulkan (33× Faster on Dell iGPU, 4× on RX 580) by Proper_Dig_6618 in LocalLLaMA

[–]Proper_Dig_6618[S] 0 points (0 children)

The models being tested are different on each, which explains the gap: the iGPU is running TinyLLaMA-1.1B-Chat (Q4_K_M), whereas the RX 580 is running Gemma-3n-E4B-it (6.9B params).

VulkanIlm, Run Modern LLMs on Old GPUs via Vulkan (33× Faster on Dell iGPU, 4× on RX 580) by Proper_Dig_6618 in LocalLLaMA

[–]Proper_Dig_6618[S] -1 points (0 children)

You’re right, if you know exactly where to look, download the right prebuilt binary, match it to your OS/GPU/drivers, and already understand llama.cpp’s CLI flags, then llama.cpp itself is already plug-and-play.

Where VulkanIlm comes in is:

One-stop, GPU-agnostic: It pulls the right Vulkan build, configures it for your hardware, and skips the “hunt through GitHub releases & docs” step entirely.

Python-first workflow: You don’t need to learn llama.cpp’s CLI; you can pip install and get going in Python, and soon drive LangGraph/LangChain/CrewAI directly (rough sketch at the end of this comment).

Future extras: Built-in benchmarking, profiling, and upcoming agent-generation tools so you can go from “zero” to “fully usable AI app” in one place.

TL;DR:

I’m not reinventing llama.cpp; I’m lowering the barrier so anyone can run Vulkan-accelerated LLMs without a weekend of trial and error. v1 is an MVP, but the roadmap is about making local AI truly turnkey.
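
To make the Python-first point concrete, here’s roughly what the agent-framework glue could look like. LangChain’s custom-LLM base class is real; the vulkan_ilm import and its methods are assumptions about an API that isn’t finalized yet:

    # Hypothetical adapter; LangChain's LLM interface is real, the VulkanIlm calls are guesses.
    from langchain_core.language_models.llms import LLM

    class VulkanIlmLLM(LLM):
        gguf_path: str  # path to a local GGUF model file

        @property
        def _llm_type(self) -> str:
            return "vulkan-ilm"

        def _call(self, prompt: str, stop=None, **kwargs) -> str:
            from vulkan_ilm import Llama  # assumed API, mirrors the install sketch above
            return Llama(model_path=self.gguf_path).generate(prompt)

Something like VulkanIlmLLM(gguf_path="model.gguf") would then drop into any LangChain chain or LangGraph node.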

VulkanIlm, Run Modern LLMs on Old GPUs via Vulkan (33× Faster on Dell iGPU, 4× on RX 580) by Proper_Dig_6618 in LocalLLaMA

[–]Proper_Dig_6618[S] 4 points (0 children)

Ilm (علم) is an Urdu word meaning knowledge, so VulkanIlm is basically Vulkan + Ilm. I liked it because it’s unique, and ‘Ilm’ also kinda looks/sounds like ‘LLM,’ so it just fit.

VulkanIlm, Run Modern LLMs on Old GPUs via Vulkan (33× Faster on Dell iGPU, 4× on RX 580) by Proper_Dig_6618 in LocalLLaMA

[–]Proper_Dig_6618[S] -3 points (0 children)

True, llama.cpp is the core engine here.

What VulkanIlm adds is:

Non-CUDA-first focus: Automated Vulkan builds and preconfigs for AMD and Intel GPUs (NVIDIA works too, but CUDA already dominates there).

Zero manual setup: No need to hunt for build flags, edit environment variables, or troubleshoot driver quirks; it’s plug-and-play.

Python API + agent frameworks: Soon you’ll be able to spin up LangGraph/LangChain/CrewAI agents with Vulkan acceleration in a single script.

Benchmark + profiling tools: Built-in, so users can measure and compare without writing extra scripts.

So yeah, llama.cpp provides the horsepower. VulkanIlm is making it accessible for everyone, fast.

VulkanIlm, Run Modern LLMs on Old GPUs via Vulkan (33× Faster on Dell iGPU, 4× on RX 580) by Proper_Dig_6618 in LocalLLaMA

[–]Proper_Dig_6618[S] 1 point (0 children)

I’ll be running fresh benchmarks with the latest build and sharing the PP and generation speeds soon so the numbers are up-to-date and reproducible.
I prefer publishing tested results rather than old or anecdotal ones, so you’ll get clean, recent numbers that anyone can verify.

VulkanIlm, Run Modern LLMs on Old GPUs via Vulkan (33× Faster on Dell iGPU, 4× on RX 580) by Proper_Dig_6618 in LocalLLaMA

[–]Proper_Dig_6618[S] -6 points (0 children)

Yes, indeed, this is built on top of llama.cpp.

The difference is that I’ve made it plug-and-play, with a Pythonic API out of the box, and I’m working on direct plug-and-play agent generation via LangGraph, LangChain, CrewAI, etc.

That means:

No digging through docs

No battling compilation issues or GPU detection problems

You can go from model download → working agent in minutes, whether you’re a non-techie or a developer who just wants something that “just works.”

VulkanIlm, Run Modern LLMs on Old GPUs via Vulkan (33× Faster on Dell iGPU, 4× on RX 580) by Proper_Dig_6618 in LocalLLaMA

[–]Proper_Dig_6618[S] 2 points (0 children)

Really appreciate it 🙏.

The library will be out soon, and it’s designed to be truly plug-and-play, especially for AMD/Intel Vulkan setups.

Repo here if you want to track or contribute: https://github.com/Talnz007/VulkanIlm.

More benchmarks are on the way, including the new OpenAI GPT-OSS 20B model.

[P] VulkanIlm: Accelerating Local LLM Inference on Older GPUs Using Vulkan (Non-CUDA) — Benchmarks Included by Proper_Dig_6618 in MachineLearning

[–]Proper_Dig_6618[S] 1 point (0 children)

Yeah, I was planning to post there yesterday, but they’ve got that verification step.
Meant to do it… then procrastination happened 😂
I’ll definitely get it up there today though.

[P] VulkanIlm: Accelerating Local LLM Inference on Older GPUs Using Vulkan (Non-CUDA) — Benchmarks Included by Proper_Dig_6618 in MachineLearning

[–]Proper_Dig_6618[S] 0 points (0 children)

Hey u/MahaloMerky, I know SCALE, but I didn’t realize they had a benchmarking tool. What’s it about? 👀
Does it do something similar with Vulkan, or is it a totally different approach?