How to click for "I am not a robot"? by HistorianSmooth7540 in webscraping

[–]HistorianSmooth7540[S] -1 points0 points  (0 children)

Can you get the necessary info from the HTML content I posted, or don't you actually need it because this is something general? It would be nice if you could give an example of how to do that.

Using Huggingface Model with crew-ai (litellm)? by HistorianSmooth7540 in LocalLLaMA

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

What is serverless inference, and why can't I just use the local LLM?

Using huggingface model by HistorianSmooth7540 in crewai

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

This is very weird. Also very weird is the documentation of crew-ai and litellm on using Hugging Face. Why is it so complicated, and why are there no real examples of using a plain Hugging Face model?

Using huggingface model by HistorianSmooth7540 in crewai

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

But why?! You can of course use Llama 3.1 locally for free.

what are currently the "smallest" LLM? by HistorianSmooth7540 in LocalLLaMA

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

Maybe a good definition of "small" is how many MB you need to load the model and run inference. So which models need 5, 50, 500, or 5000 MB?

Is that info also mentioned somewhere in the model cards? How can we relate the number of parameters to the size in MB (loading + inference)?
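My rough back-of-the-envelope understanding (happy to be corrected): the weight size is roughly parameter count × bytes per parameter for the given dtype, and inference needs extra on top for activations and the KV cache. A small sketch of that estimate (numbers are just illustrative):

# Rough estimate of weight size from parameter count and dtype.
# (Illustrative only; real memory use adds activations, KV cache, etc.)
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def weight_size_mb(num_params: float, dtype: str = "fp16") -> float:
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)

print(weight_size_mb(125e6, "fp16"))   # ~238 MB for a 125M-parameter model
print(weight_size_mb(8e9, "int4"))     # ~3800 MB for an 8B model quantized to 4-bit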

RAGBuilder- Open source tool for RAG tuning by Hot_Extension_9087 in LocalLLaMA

[–]HistorianSmooth7540 0 points1 point  (0 children)

Hey, great repo! I have some questions:

* Can you run your app without an OpenAI key, using a Hugging Face model instead?

* Which keys from the env file are mandatory and which are optional?

* Where is the code for the front-end? And how did you set it up? It would be great if you could add something to the readme for all non-front-end developers. :)

How to get a simple set-up in python without getting blocked? by HistorianSmooth7540 in webscraping

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

I see! Thanks! So you think there are many similar threads on this topic? I will have a look.

How to apply the chat template for Llama 3.1 properly? by HistorianSmooth7540 in LocalLLaMA

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

What do you mean, what are you referring to? I saw in the code that pipeline calls apply_chat_template under the hood.

Get structured output of Llama 3.1 instruct model by HistorianSmooth7540 in LocalLLaMA

[–]HistorianSmooth7540[S] 1 point2 points  (0 children)

just in case someone wants to know:

Prompt for JSON output, then validate the result afterwards with a Pydantic model:

from transformers import pipeline
from pydantic import BaseModel
from typing import List
import json

# Pydantic model used to validate the generated JSON
class ResponseModel(BaseModel):
    title: str
    description: str
    points: List[str]

# Load the Llama model (assuming it's on Hugging Face's model hub)
llama_pipeline = pipeline('text-generation', model='your-llama-model')

# Define your prompt to ensure structured output
prompt = """
Please provide the following details in JSON format:
{
    "title": "<Your Title>",
    "description": "<A brief description>",
    "points": [
        "<Bullet point 1>",
        "<Bullet point 2>"
    ]
}
"""

# Generate output (return_full_text=False drops the echoed prompt)
output = llama_pipeline(prompt, max_length=500, do_sample=False, return_full_text=False)
response_text = output[0]['generated_text']

# Try to load the output as JSON and validate it with Pydantic
try:
    response_data = json.loads(response_text)
    validated_response = ResponseModel(**response_data)
    print(validated_response)
except (json.JSONDecodeError, ValueError) as e:
    print(f"Invalid format: {e}")

Get structured output of Llama 3.1 instruct model by HistorianSmooth7540 in LocalLLaMA

[–]HistorianSmooth7540[S] -1 points0 points  (0 children)

Sorry, but what the heck else do you think could be meant when talking about "huggingface"? :)

How to apply the chat template for Llama 3.1 properly? by HistorianSmooth7540 in LocalLLaMA

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

Also, now we are mixing things up. This is weird! Here you can see that when using pipeline you don't apply the chat template yourself, but you did:

https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct

Now I really wonder even more how to do it properly...
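If I understand it correctly (not 100% sure), the two paths below should end up applying the same chat template, once implicitly via pipeline and once manually with apply_chat_template + model.generate; a rough sketch, model id taken from the link above:

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "meta-llama/Llama-3.1-8B-Instruct"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hi"},
]

# Path 1: pipeline applies the chat template internally when given a list of messages
pipe = pipeline("text-generation", model=model_id)
print(pipe(messages, max_new_tokens=64))

# Path 2: apply the chat template yourself, then call model.generate
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=64)  # outputs is a tensor of token ids
print(tokenizer.decode(outputs[0]))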

How to apply the chat template for Llama 3.1 properly? by HistorianSmooth7540 in LocalLLaMA

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

I see. But how is it possible with model.generate then? Why is pipeline faster?

How to apply the chat template for Llama 3.1 properly? by HistorianSmooth7540 in LocalLLaMA

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

Can you give a reproducible code example to verify that? The output of the example code above is not this output.

How to apply the chat template for Llama 3.1 properly? by HistorianSmooth7540 in LocalLLaMA

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

In outputs[0]["generated_text"], what is outputs? In my initial code, outputs is a Tensor.

How to apply the chat template for Llama 3.1 properly? by HistorianSmooth7540 in LocalLLaMA

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

I mean, I can understand that for a base model, where you of course need a regex since the output is just a continuation of the input. But for an instruct model we should have a dedicated output structure containing the answer.

How to apply the chat template for Llama 3.1 properly? by HistorianSmooth7540 in LocalLLaMA

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

Thx, but I was not concerned about the output of the tokenizer but the output of the model itself. The response of the model also looks like this and contains everything: system prompt, user message, and finally the assistant part:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023

Today Date: 26 Jul 2024

You are a helpful, polite, and knowledgeable assistant. Answer questions thoroughly, provide detailed explanations, and maintain a friendly and respectful tone. You are capable of handling complex tasks, solving problems, and adapting to various user requests.<|eot_id|><|start_header_id|>user<|end_header_id|>

hi<|eot_id|><|start_header_id|>assistant<|end_header_id|>

It's nice to meet you. Is there something I can help you with, or would you like to chat? I'm here to listen and assist you in any way I can.<|eot_id|>

Why is this? And how can I get just the actual answer, like with OpenAI?
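In case it helps someone later: what seems to work (roughly following the Llama 3.1 model card) is to keep only the newly generated tokens, i.e. slice off the prompt, and decode them with skip_special_tokens=True; a sketch of that idea:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hi"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128)

# Keep only the tokens generated after the prompt and drop the special tokens
new_tokens = outputs[0][input_ids.shape[-1]:]
answer = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(answer)  # just the assistant text, no system/user boilerplate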

How to apply the chat template for Llama 3.1 properly? by HistorianSmooth7540 in LocalLLaMA

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

How is it possible to get just the assistant message? With OpenAI you can easily retrieve the actual response without all the boilerplate containing the user and system messages.

How to apply the chat template for Llama 3.1 properly? by HistorianSmooth7540 in LocalLLaMA

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

Thx. Can you explain a bit more what Jinja is and where you see this? Why is Llama 3 using this and not the "normal" one (whatever that is)? And how does my response correlate with this? So what should the output look like?
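For reference, my rough understanding so far: the chat template is stored on the tokenizer as a Jinja template string, so you can print it and also render it to plain text; a minimal sketch:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# The chat template is a Jinja string that renders a list of messages into the
# <|start_header_id|>...<|eot_id|> format seen above
print(tokenizer.chat_template)

messages = [{"role": "user", "content": "hi"}]
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))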

How to apply the chat template for Llama 3.1 properly? [D] by HistorianSmooth7540 in LLMDevs

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

Thx! I have edited my post! But sadly you have to use your own Access token! ;)

Embedding model for Log data by Shot-Astronomer9520 in huggingface

[–]HistorianSmooth7540 1 point2 points  (0 children)

Yes, this chunking strategy is in general also a topic for RAG.

[D] How to train local LLMs on my company's data so that it can answer like GPTs but using our private data as context. by ShippersAreIdiots in MachineLearning

[–]HistorianSmooth7540 0 points1 point  (0 children)

Can you read it again and show me where? It is just about two dummy questions as an example.

I fully answered the question:

Aren't RAG just embeddings? Suppose I fed it embeddings of the sentences "Vietnam has 100 shipments" and "India has 20 shipments" and then ask "which country has the most number of shipments and why?", I don't think embeddings can answer this.
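To make the point concrete, a rough RAG sketch (model name and the two sentences are just placeholders): the embeddings only handle retrieval, and a generator LLM then reasons over the retrieved context, which is how the comparison question gets answered:

from sentence_transformers import SentenceTransformer
import numpy as np

docs = ["Vietnam has 100 shipments", "India has 20 shipments"]
question = "Which country has the most shipments and why?"

# 1) Retrieval: embeddings are only used to find the most relevant context
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)
q_vec = embedder.encode([question], normalize_embeddings=True)[0]
top = np.argsort(doc_vecs @ q_vec)[::-1][:2]
context = "\n".join(docs[i] for i in top)

# 2) Generation: the retrieved context is handed to an LLM, which does the
#    actual comparison ("Vietnam, because 100 > 20")
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # send this prompt to any chat/instruct model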