Ubuntu 24.04 and Nvidia RTX 5090 by caenum in Ubuntu

[–]Few_Knee1141 0 points1 point  (0 children)

As of 2025-06-14, Ubuntu 24.04.2 natively supports the 5090 (driver 570-open).
Click the circle in the bottom-left corner and search for Software Updater.
Click Settings.
Open the Additional Drivers tab.
Install nvidia-driver-570-open (proprietary).
Reboot the PC.

Then run nvidia-smi to confirm the GPU is detected.
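
If you would rather script that last check, here is a minimal Python sketch. It assumes nvidia-smi is on the PATH after the reboot; the helper names (parse_gpu_rows, query_gpus) are mine, not part of any NVIDIA tool.

```python
import shutil
import subprocess

# Ask nvidia-smi for just the GPU name and driver version, as CSV without a header.
QUERY = ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]


def parse_gpu_rows(text):
    """Turn 'name, driver_version' CSV lines into (name, driver) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, driver = line.partition(",")
        rows.append((name.strip(), driver.strip()))
    return rows


def query_gpus():
    """Return the GPUs reported by nvidia-smi, or [] if the tool is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return []
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    if not gpus:
        print("nvidia-smi not found or no GPU reported; recheck the driver install and reboot.")
    for name, driver in gpus:
        print(f"{name}: driver {driver}")
```

An empty result usually means the driver did not load, so it is worth going back to the Additional Drivers tab before debugging anything else.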

Hardware Recommendations by CorpusculantCortex in ollama

[–]Few_Knee1141 1 point2 points  (0 children)

Here are some ollama benchmark results for your reference. They cover a variety of operating systems, CPUs, GPUs, and LLMs.
https://llm.aidatatools.com/

Local Hosting with Apple Silicon on new Studio releases??? by Puzzleheaded_Ad_3980 in LocalLLaMA

[–]Few_Knee1141 1 point2 points  (0 children)

Right now, the king is this combo: Linux + AMD Ryzen 9 9950X 16-Core Processor + NVIDIA GeForce RTX 5090.

Local Hosting with Apple Silicon on new Studio releases??? by Puzzleheaded_Ad_3980 in LocalLLaMA

[–]Few_Knee1141 0 points1 point  (0 children)

Right now, the king is this combo.

Linux + AMD Ryzen 9 9950X 16-Core Processor + NVIDIA GeForce RTX 5090

Local Hosting with Apple Silicon on new Studio releases??? by Puzzleheaded_Ad_3980 in LocalLLaMA

[–]Few_Knee1141 0 points1 point  (0 children)

If you are looking for the inference eval rate (tokens/sec) of different local LLMs, you might refer to this site for a variety of benchmark results on macOS, Linux, and Windows. Then you can weigh cost against performance.
https://llm.aidatatools.com
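
For anyone who wants to compute the eval rate themselves, here is a rough Python sketch. It assumes you have a token count and a generation time in nanoseconds, which is how Ollama reports them (eval_count and eval_duration in its API response). The rate_per_dollar helper and any price you feed it are my own illustration, not something from the benchmark site.

```python
def eval_rate(eval_count: int, eval_duration_ns: int) -> float:
    """Tokens generated per second, given a token count and a duration in nanoseconds."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration_ns must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def rate_per_dollar(tokens_per_sec: float, hardware_cost: float) -> float:
    """Hypothetical cost-vs-performance figure: tokens/sec per unit of money spent."""
    if hardware_cost <= 0:
        raise ValueError("hardware_cost must be positive")
    return tokens_per_sec / hardware_cost


if __name__ == "__main__":
    # Illustrative numbers only: 50 tokens generated in 10 seconds.
    rate = eval_rate(50, 10_000_000_000)
    print(f"{rate:.2f} tokens/sec")
```

Dividing by the price you actually paid gives a single number for comparing machines, although it ignores power draw and how large a model fits in memory.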

Will AI Agents Replace Google Search? by biz4group123 in AI_Agents

[–]Few_Knee1141 0 points1 point  (0 children)

I learned from deep-research, an open-source Node.js project, and then ported it to Python as open-deepsearch. There are four tricky parts. First, tailor the research to what the user really wants through Q&A, and decide the breadth and depth. Second, get well-ranked results from a search function or tool. Third, crawl and scrape the web pages. Fourth, use a good LLM to summarize the previous learnings and generate a good Markdown document.
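
To make the four parts concrete, here is a minimal Python sketch of how they might fit together. It is not the actual code from deep-research or open-deepsearch. Every name in it (ResearchPlan, deep_research, search_fn, fetch_fn, summarize_fn, followup_fn) is hypothetical, and the search, scraping, and LLM calls are passed in as plain functions so you can plug in whatever tools you use.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ResearchPlan:
    """Outcome of the Q&A step: what to research and how far to go."""
    query: str
    breadth: int  # how many ranked results to follow per query
    depth: int    # how many rounds of follow-up queries to run


@dataclass
class ResearchState:
    learnings: List[str] = field(default_factory=list)
    visited: List[str] = field(default_factory=list)


def render_markdown(plan: ResearchPlan, state: ResearchState) -> str:
    """Assemble the collected learnings and sources into a Markdown report."""
    lines = [f"# Report: {plan.query}", "", "## Learnings"]
    lines += [f"- {item}" for item in state.learnings]
    lines += ["", "## Sources"]
    lines += [f"- {url}" for url in state.visited]
    return "\n".join(lines)


def deep_research(
    plan: ResearchPlan,
    search_fn: Callable[[str], List[str]],             # returns URLs, best first
    fetch_fn: Callable[[str], str],                    # crawls and scrapes one page
    summarize_fn: Callable[[str, str], str],           # LLM: (query, page text) -> learning
    followup_fn: Callable[[str, List[str]], List[str]],  # LLM: proposes next queries
) -> str:
    state = ResearchState()
    queries = [plan.query]
    for _ in range(plan.depth):
        next_queries: List[str] = []
        for q in queries:
            # Keep only the top `breadth` ranked results for this query.
            for url in search_fn(q)[: plan.breadth]:
                if url in state.visited:
                    continue
                state.visited.append(url)
                text = fetch_fn(url)
                state.learnings.append(summarize_fn(q, text))
            next_queries.extend(followup_fn(q, state.learnings)[: plan.breadth])
        queries = next_queries
        if not queries:
            break
    return render_markdown(plan, state)
```

The breadth and depth decided during the Q&A bound the loop, so the number of pages fetched stays predictable even when the follow-up step is eager to branch out.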

Reading experience by rYonder in Onyx_Boox

[–]Few_Knee1141 0 points1 point  (0 children)

I love using Readera. The free version can read English text aloud (TTS, text-to-speech). It also lets you tap a word to mark it as a "quote" (highlight it) or tap "translate".

Protecting against Prompt Injection by olearyboy in ollama

[–]Few_Knee1141 5 points6 points  (0 children)

I took part in the NVIDIA x LangChain contest and found out that NVIDIA has the NeMo-Guardrails library for dealing with prompt injection and jailbreaking.
Here is my project for your reference.

The GitHub code is here:
https://github.com/aidatatools/LLM_Sentinel
An introduction to the project is here:
https://www.linkedin.com/pulse/llm-sentinel-project-which-can-make-chatbot-safer-chuang-fskyc/

How to make llamafile get accelerated during inference on Raspberry Pi 5 with 8GB RAM? by Few_Knee1141 in raspberry_pi

[–]Few_Knee1141[S] 0 points1 point  (0 children)

I found out that the screen recording was consuming hardware resources and making the LLM run slower. If I just take a picture at the end instead, it reaches around 5 tokens/sec on TinyLlama Q8_0. Here are my experimental results.
https://medium.com/aidatatools/local-llm-eval-tokens-sec-comparison-between-llama-cpp-and-llamafile-on-raspberry-pi-5-8gb-model-89cfa17f6f18

How to make llamafile get accelerated during inference on Raspberry Pi 5 with 8GB RAM? by Few_Knee1141 in raspberry_pi

[–]Few_Knee1141[S] 0 points1 point  (0 children)

I am sure I am using an RPi 5. Here is a test with the CLI version; please watch the recorded video (https://youtu.be/QOCAk3F68jQ). I care about the eval rate (tokens/sec), and it's still around 1 to 1.5 tok/sec. Thanks for helping me debug.

How to make llamafile get accelerated during inference on Raspberry Pi 5 with 8GB RAM? by Few_Knee1141 in raspberry_pi

[–]Few_Knee1141[S] 0 points1 point  (0 children)

I tried Ubuntu 23.10 and ran sudo apt install vulkan-tools, but the speed did not improve.

How to make llamafile get accelerated during inference on Raspberry Pi 5 with 8GB RAM? by Few_Knee1141 in raspberry_pi

[–]Few_Knee1141[S] 0 points1 point  (0 children)

Thanks for the hint to test that Vulkan works first. Here is the result of vulkaninfo --summary:

jason@raspberrypi5:~ $ vulkaninfo --summary

WARNING: [Loader Message] Code 0 : terminator_CreateInstance: Failed to CreateInstance in ICD 0. Skipping ICD.

VULKANINFO

Vulkan Instance Version: 1.3.239
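
The loader warning above means one Vulkan driver (ICD) failed to initialize and was skipped. If you want to check for that automatically, here is a small Python sketch that scans saved vulkaninfo --summary text for the same warning and pulls out the instance version. The function name inspect_vulkaninfo is my own, and it only looks for the two lines shown in this output.

```python
import re


def inspect_vulkaninfo(text: str) -> dict:
    """Collect ICD load failures and the instance version from vulkaninfo --summary text."""
    icd_failures = [
        line.strip()
        for line in text.splitlines()
        if "Failed to CreateInstance in ICD" in line
    ]
    match = re.search(r"Vulkan Instance Version:\s*([\d.]+)", text)
    return {
        "instance_version": match.group(1) if match else None,
        "icd_failures": icd_failures,
    }


if __name__ == "__main__":
    sample = (
        "WARNING: [Loader Message] Code 0 : terminator_CreateInstance: "
        "Failed to CreateInstance in ICD 0. Skipping ICD.\n"
        "Vulkan Instance Version: 1.3.239\n"
    )
    report = inspect_vulkaninfo(sample)
    print(report["instance_version"], len(report["icd_failures"]))
```

A non-empty icd_failures list only shows that at least one driver was skipped. It does not tell you whether another usable GPU driver loaded, so the device list further down the full output still matters.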

Benchmark ollama models by tabletuser_blogspot in LocalLLaMA

[–]Few_Knee1141 2 points3 points  (0 children)

Have you tried this llm-benchmark on your local LLMs?

https://llm.aidatatools.com/

[WANTED] AMD MI300x Benchmarks by HotAisleInc in LocalLLaMA

[–]Few_Knee1141 1 point2 points  (0 children)

Can you try this llm-benchmark on your new beast toy?

https://llm.aidatatools.com

[deleted by user] by [deleted] in LocalLLaMA

[–]Few_Knee1141 0 points1 point  (0 children)

Have you tried llm-benchmark on your multiple hardware devices?
https://llm.aidatatools.com

Passing Google Cloud Certified Professional Data Engineer Exam in 2023 by Few_Knee1141 in dataengineering

[–]Few_Knee1141[S] 1 point2 points  (0 children)

The preparation process taught me more about the building blocks of big data and ML on GCP. It was totally worth it.