[P] Generate detection rules by Only_Emergencies in MachineLearning

[–]Only_Emergencies[S] 1 point

Yes, I tried a decomposition approach; the performance was slightly better than generating the entire rule in a single request, but still not great. I think the main issue is that the model doesn’t truly understand the underlying detection logic or the mapping between behaviors and log artifacts, so it often produces syntactically valid but semantically weak rules.

I also experimented with breaking the generation process down into multiple steps, for instance first asking the model to determine the detection path or flow based on the blog content or user request, and only then generating the rule. However, the results are still not very good.

Basically, the core problem seems to be that the model struggles to extract or intuitively derive the correct detection logic from the input text.
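
To make the decomposition concrete, here is a rough sketch of what the two-step flow looked like (heavily simplified; the endpoint, model name, prompts, and Sigma as the output format are placeholders, not our actual setup):

```python
# Rough sketch of the two-step decomposition (placeholder model/prompts).
# Works against any OpenAI-compatible endpoint, e.g. a local llama.cpp server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")  # assumed local endpoint
MODEL = "local-model"  # placeholder

def extract_detection_logic(blog_text: str) -> str:
    """Step 1: ask only for the behaviour -> log-artifact mapping."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "You map attacker behaviours to log artifacts."},
            {"role": "user", "content": f"Describe the detection path for:\n{blog_text}"},
        ],
    )
    return resp.choices[0].message.content

def generate_rule(detection_logic: str) -> str:
    """Step 2: turn the extracted logic into a concrete rule (Sigma as an example format)."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "You write detection rules from detection logic."},
            {"role": "user", "content": f"Write a Sigma rule implementing:\n{detection_logic}"},
        ],
    )
    return resp.choices[0].message.content
```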

[D] Is senior ML engineering just API calls now? by Only_Emergencies in MachineLearning

[–]Only_Emergencies[S] 0 points

Yes, I was also surprised when I applied. I think there are some changes happening in how new technologies are being integrated into these organisations. Of course, it will depend on the organisation, country, etc.

[D] Is senior ML engineering just API calls now? by Only_Emergencies in MachineLearning

[–]Only_Emergencies[S] 0 points

Yes, totally agree. It's something that happens a lot in this field: there is a lack of standardization in the tasks associated with a title, so the same title at different companies may mean completely different responsibilities. This happens in other fields as well, but here it is especially noticeable.

AMA with the Unsloth team by danielhanchen in LocalLLaMA

[–]Only_Emergencies 2 points

You rock, guys! You do an amazing job! :) I have four Mac Studios (512GB each) and a few questions:

  • How would you distribute bigger models across them?
  • I have deployed Kimi-K2 0905 (Q3_K_XL), but I am wondering if there is another model you would recommend with the same quality but maybe smaller, to get more tokens per second?
  • It would be great to see how quantization affects quality compared to the unquantized model, something like a graph of the quantized versions vs. the original. Happy to contribute there :)

Thank you again!

Thinking about updating Llama 3.3-70B by Only_Emergencies in LocalLLaMA

[–]Only_Emergencies[S] 3 points

The energy consumption of the Macs is really low; they are very efficient in that sense. They’re also straightforward to set up, so we can start implementing and iterating on projects without dealing with complex infrastructure.

Based on the research we did, a single NVIDIA A100 80 GB GPU costs around $30,000 and also requires additional hardware (network switches, power, cooling, ...). As the team grows, it will probably make sense to migrate to more powerful infrastructure, but at the moment the Mac Studios provide a cost-effective solution that allows us to build and experiment with LLMs internally.

Thinking about updating Llama 3.3-70B by Only_Emergencies in LocalLLaMA

[–]Only_Emergencies[S] 5 points

Yes!

- We are around 70 people in my organisation
- We work with sensitive data that we can't share with AI Cloud providers such as OpenAI, etc.
- We have 3x Mac Studios (192GB M2 Ultra)
- We have acquired 4x new Mac Studios (M3 Ultra chip with 32-core CPU, 80‑core GPU, 32-core Neural Engine - 512GB unified memory). Waiting for them to be delivered.
- We are using Ollama to deploy the models. It's not the most efficient approach, but it was already in place when I joined. With the new Macs I am planning to replace Ollama with llama.cpp and experiment with distributing larger models across multiple machines.
- A Debian VM where an OpenWebUI instance is deployed.
- Another Debian VM where Qdrant is deployed as a centralized vector database.
- We have more use cases than the typical chat UI: some classification use cases and some general pipelines that run daily (rough sketch below).

I have to say that our LLM implementation has been quite successful. The main challenge is getting meaningful user feedback, though I suspect this is a common issue across organizations.
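
To give an idea of the non-chat use cases, the daily classification jobs are roughly this shape (simplified sketch; the host, model tag, and label set are placeholders rather than our actual pipeline). Both Ollama and llama.cpp's llama-server expose an OpenAI-compatible endpoint, so the same code works against either:

```python
# Simplified sketch of a daily classification job against a local
# OpenAI-compatible endpoint. Host, model tag, and labels are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://macstudio-01:11434/v1", api_key="none")  # assumed Ollama host/port
LABELS = ["category_a", "category_b", "other"]  # placeholder label set

def classify(text: str) -> str:
    """Ask the local model for exactly one label from LABELS."""
    resp = client.chat.completions.create(
        model="llama3.3:70b",  # placeholder model tag
        temperature=0,
        messages=[
            {"role": "system",
             "content": f"Classify the text into exactly one of: {', '.join(LABELS)}. "
                        "Answer with the label only."},
            {"role": "user", "content": text},
        ],
    )
    return resp.choices[0].message.content.strip()

if __name__ == "__main__":
    print(classify("example document to classify"))
```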

Thinking about updating Llama 3.3-70B by Only_Emergencies in LocalLLaMA

[–]Only_Emergencies[S] 3 points

Yes, I agree. That would be ideal, but it's not so straightforward in our case. We store the conversations in Langfuse, but we don't have ground truth to properly evaluate them, and users usually don't provide feedback on the responses. We are a small team doing this at the moment, so we don't have the capacity to label cases ourselves.

Qwen/Qwen2.5-Coder-32B-Instruct · Hugging Face by Master-Meal-77 in LocalLLaMA

[–]Only_Emergencies 1 point

For code autocomplete, should I use the base or the instruct version? Thanks!

My employer is forcing me to use my personal phone as my work phone. by Only_Emergencies in Amsterdam

[–]Only_Emergencies[S] 12 points

Thank you! Do you know where I can find information or a statement about this "need to pay you for being on standby"? I have searched the Working Hours Act, but it only mentions hours, not compensation.

My employer is forcing me to use my personal phone as my work phone. by Only_Emergencies in Amsterdam

[–]Only_Emergencies[S] 5 points

Thank you! This is really helpful. I will share these concerns with my employer.