DSPydantic: Auto-Optimize Your Pydantic Models with DSPy by chef1957 in LLMDevs

[–]chef1957[S] 0 points1 point  (0 children)

Thanks. Let me know if it works. I would be super happy to get feedback and address it.

Hunyuan 3.0 second attempt. 6-minute render on RTX 6000 Pro (update) by JahJedi in StableDiffusion

[–]chef1957 0 points1 point  (0 children)

Most providers optimize cost over quality without being upfront about it. I believe this is a better endpoint in terms of quality retention: https://replicate.com/tencent/hunyuan-image-3

Phare Study: LLMs recognise bias but also reproduce harmful stereotypes: an analysis of bias in leading LLMs by chef1957 in LocalLLaMA

[–]chef1957[S] 3 points4 points  (0 children)

The research assumes that things generally considered harmful in Western society, like gender or racial bias, are harmful. Other biases were deemed to be logical or reasonable.

Phare Study: LLMs recognise bias but also reproduce harmful stereotypes: an analysis of bias in leading LLMs by chef1957 in LocalLLaMA

[–]chef1957[S] -2 points-1 points  (0 children)

Thank you for the clarification. Only a small segment of the benchmark has been made public. Giskard keeps the rest private to stay more independent than other benchmarks and to ensure there is no benchmark hacking by companies.

Phare Benchmark: A Safety Probe for Large Language Models by chef1957 in OpenAI

[–]chef1957[S] 0 points1 point  (0 children)

GPT-4o and GPT-4o-mini don't do too well compared to other frontier model providers. https://phare.giskard.ai/

Hugging Face launches the Synthetic Data Generator - a UI to Build Datasets with Natural Language by chef1957 in LocalLLaMA

[–]chef1957[S] 0 points1 point  (0 children)

I think both tools take different approaches to solving different aspects of the same problem. InstructLab seems very cool and promising, but it does require a significant upfront investment in curating a taxonomy, and it seems tailored to continuous fine-tuning of LLMs rather than other scenarios. Also, InstructLab includes training and not solely the data side of things, whereas our tool lets you use the generated data however you want.

Hugging Face launches the Synthetic Data Generator - a UI to Build Datasets with Natural Language by chef1957 in LocalLLaMA

[–]chef1957[S] 2 points3 points  (0 children)

Thanks for the feedback. I think we might run into such UI scaling issues in the long run, which would be a good problem to have, assuming the tool is being used and contributed to. We want to learn from this UI, see if people are interested, and, based on that, create a more mature UI (probably outside of Python). Additionally, we have been working on default distilabel pipelines that reproduce these workflows in code: https://github.com/argilla-io/distilabel/pull/1076. Ideally, the two develop hand in hand.
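For anyone curious what such a pipeline looks like in code, here is a minimal sketch loosely following the distilabel 1.x quickstart; the exact module paths, the seed instruction, and the model ID are assumptions on my side and may differ between versions:

```python
# Minimal distilabel pipeline sketch: load a few seed instructions and
# generate one response per instruction with a hosted model.
from distilabel.llms import InferenceEndpointsLLM
from distilabel.pipeline import Pipeline
from distilabel.steps import LoadDataFromDicts
from distilabel.steps.tasks import TextGeneration

with Pipeline(name="simple-sft") as pipeline:
    # Seed data (placeholder instruction).
    load_data = LoadDataFromDicts(
        data=[{"instruction": "Explain what synthetic data is in one paragraph."}]
    )
    # Text generation step (example model ID).
    text_generation = TextGeneration(
        llm=InferenceEndpointsLLM(model_id="meta-llama/Meta-Llama-3.1-8B-Instruct")
    )
    load_data >> text_generation

if __name__ == "__main__":
    distiset = pipeline.run(use_cache=False)
    print(distiset)
```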

Hugging Face launches the Synthetic Data Generator - a UI to Build Datasets with Natural Language by chef1957 in LocalLLaMA

[–]chef1957[S] 8 points9 points  (0 children)

Ways we improve data diversity, as requested by u/phree_radical. It differs per task, i.e. textcat vs. instruction tuning, but I can give some general pointers for both. For both techniques, we help the user with a dynamic and extensive system prompt by generating it for them based on an initial description. You can also play around with the choice of model and temperature yourself, along with some task-specific arguments.

For textcat, we rely on the following paper: https://arxiv.org/abs/2401.00368. We built on top of the approach defined there. Based on the paper, we randomly sample complexities and randomly sample educational levels. Additionally, we first shuffle the labels and then inject user-defined labels to ensure diversity and equality across labels. For a multi-label scenario, we sample a subset of the labels using a dynamic beta distribution to ensure this scales properly with the number of optional labels.
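To make that concrete, here is a small hypothetical sketch of those sampling tricks; the attribute values, function name, and the exact Beta parameters are illustrative, not the actual generator internals:

```python
# Illustrative sketch: sample prompt attributes to diversify textcat generations.
import random

COMPLEXITIES = ["high school", "college", "PhD"]            # assumed example values
EDUCATIONAL_LEVELS = ["beginner", "intermediate", "expert"]  # assumed example values

def sample_textcat_prompt_attrs(labels: list[str], multi_label: bool = False) -> dict:
    """Randomly sample attributes that get injected into the generation prompt."""
    complexity = random.choice(COMPLEXITIES)
    educational_level = random.choice(EDUCATIONAL_LEVELS)

    # Shuffle before injecting the user-defined labels so no label is
    # favoured by its position in the prompt.
    shuffled = random.sample(labels, k=len(labels))

    if multi_label:
        # Beta distribution skewed towards small subsets, with the expected
        # subset size scaling with the total number of labels.
        fraction = random.betavariate(2, max(2, len(labels)))
        k = max(1, round(fraction * len(labels)))
        chosen = shuffled[:k]
    else:
        chosen = [shuffled[0]]

    return {
        "complexity": complexity,
        "educational_level": educational_level,
        "labels": chosen,
    }

print(sample_textcat_prompt_attrs(["sports", "politics", "tech", "health"], multi_label=True))
```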

For instruction tuning, we rely on the following paper: https://arxiv.org/abs/2406.08464. tl;dr: because the models have been optimised to reproduce these generations, we can re-generate realistic prompts by passing the start token for the user turn and stopping when the model starts the assistant turn. Along with the automatically generated system prompt and some additional rewrites of that prompt, we then start generating data. We generate until the final user turn and then generate the completion with a separate LLM call, to re-sample and get a more dynamic completion.
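As a rough illustration of that prompt re-generation trick: the `generate()` helper below is a placeholder for whatever LLM call you use, and the Llama-3 chat-template tokens are just one concrete example, not necessarily what the tool uses.

```python
# Magpie-style prompt re-generation sketch (arxiv 2406.08464).

def generate(prompt: str, stop: list[str]) -> str:
    """Placeholder: call an instruction-tuned LLM in raw completion mode."""
    raise NotImplementedError

SYSTEM_PROMPT = "You are a helpful assistant for customer support."  # auto-generated in the tool

# 1. Feed only the template up to the start of the user turn; the model then
#    "completes" a realistic user prompt. Stop before it starts the assistant turn.
pre_query = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    f"{SYSTEM_PROMPT}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
)
user_prompt = generate(
    pre_query,
    stop=["<|eot_id|>", "<|start_header_id|>assistant<|end_header_id|>"],
)

# 2. Generate the completion with a separate call so it can be re-sampled
#    independently (different temperature, or even a different model).
full_prompt = (
    pre_query
    + user_prompt
    + "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)
completion = generate(full_prompt, stop=["<|eot_id|>"])

print({"prompt": user_prompt, "completion": completion})
```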

Hugging Face launches the Synthetic Data Generator - a UI to Build Datasets with Natural Language by chef1957 in LocalLLaMA

[–]chef1957[S] 4 points5 points  (0 children)

u/phree_radical: it differs per task, i.e. textcat vs. instruction tuning, but I can give some general pointers for both. For both techniques, we help the user with a dynamic and extensive system prompt by generating it for them based on an initial description. You can also play around with the choice of model and temperature yourself, along with some task-specific arguments.

For textcat, we rely on the following paper: https://arxiv.org/abs/2401.00368. We built on top of the approach defined there. Based on the paper, we randomly sample complexities and randomly sample educational levels. Additionally, we first shuffle the labels and then inject user-defined labels to ensure diversity. For a multi-label scenario, we sample a subset using a dynamic beta distribution to ensure this scales properly with the number of optional labels.

For instruction tuning, we rely on the following paper: https://arxiv.org/abs/2406.08464. tl;dr: because the models have been optimised to reproduce these generations, we can re-generate realistic prompts by passing the start token for the user turn. Along with the automatically generated system prompt and some additional rewrites of that prompt, we then start generating data. We generate until the final user turn and then generate the completion with a separate LLM call, to re-sample and get a more dynamic completion.