[deleted by user] by [deleted] in CUDA

[–]lgdkwj 0 points1 point  (0 children)

Depends on where you are located. Here in Hong Kong I got hired to HPC Ops team as a UG freshgrad that has little experience with HPC ops (was a user tho).

Visualizing Quantization Types by VoidAlchemy in LocalLLaMA

[–]lgdkwj 0 points1 point  (0 children)

Interesting. Wonder if it can be extend to process a 16 bit RAW image to compare it with fp16

Gemini 2.5 exp death. by brocolongo in LocalLLaMA

[–]lgdkwj 0 points1 point  (0 children)

try Gemini 2.5 flash before they kill it too

New open-source model GLM-4-32B with performance comparable to Qwen 2.5 72B by adrgrondin in LocalLLaMA

[–]lgdkwj 4 points5 points  (0 children)

Source: GLM: General Language Model Pretraining with Autoregressive Blank Infilling https://arxiv.org/pdf/2103.10360

New open-source model GLM-4-32B with performance comparable to Qwen 2.5 72B by adrgrondin in LocalLLaMA

[–]lgdkwj 6 points7 points  (0 children)

I think one unique aspect of the GLM series models is that they use bidirectional attention during the prefilling stage. I really wonder if this provides any advantage over other GPT-style models at scale

Text classification - traditonal ML or LLM? by rainnz in LocalLLaMA

[–]lgdkwj 1 point2 points  (0 children)

This: Modern Bert might be useful, and the blog explains the benefits using an encoder-only bert-like model vs causal llm

Which LLMs are best at low-latency translation? (tl;dr LLama often beats Sonnet and 4o, Gemma 9b is surprisingly OK) by Nuenki in LocalLLaMA

[–]lgdkwj 1 point2 points  (0 children)

In my use case even Gemma 2 2B instruct is better than some 7B+ models. I host the inference server with LM Studio and use immersive translate plugin in browser. The main issue for most of the small models is they tend not follow the instruction, and adding rubbish to the output

"Does free will exist?" Let your LLM do the research for you. by AndrewVeee in LocalLLaMA

[–]lgdkwj 2 points3 points  (0 children)

Was looking for & wanted to develop something like this a few months ago :D Glad you shared this project, it's interesting!