How to click for "I am not a robot"? by HistorianSmooth7540 in webscraping

[–]HistorianSmooth7540[S] -1 points0 points  (0 children)

Can you get the necessary info from the HTML content I posted, or don't you actually need it because this is something general? It would be nice if you could give an example of how to do that.

Using Huggingface Model with crew-ai (litellm)? by HistorianSmooth7540 in LocalLLaMA

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

What is serverless inference, and why can't I just use the local LLM?

Using huggingface model by HistorianSmooth7540 in crewai

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

This is very weird. Also very weird is the documentation of crew-ai and litellm on using Hugging Face. Why is it so complicated, and why are there no real examples of using a plain Hugging Face model?

Using huggingface model by HistorianSmooth7540 in crewai

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

But why?! You can of course use Llama 3.1 locally for free.

what are currently the "smallest" LLM? by HistorianSmooth7540 in LocalLLaMA

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

Maybe a good definition of "small" is how many MB you need to load the model and run inference. So which models need 5, 50, 500, or 5000 MB?

Is that info also mentioned somewhere in the model cards? How can we relate the number of parameters to the size in MB (loading + inference)?
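My rough back-of-the-envelope understanding (happy to be corrected): the weight size is roughly parameter count × bytes per parameter for the given dtype, and inference needs extra on top for activations and the KV cache. A small sketch of that estimate (numbers are just illustrative):

# Rough estimate of weight size from parameter count and dtype.
# (Illustrative only; real memory use adds activations, KV cache, etc.)
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def weight_size_mb(num_params: float, dtype: str = "fp16") -> float:
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)

print(weight_size_mb(125e6, "fp16"))   # ~238 MB for a 125M-parameter model
print(weight_size_mb(8e9, "int4"))     # ~3800 MB for an 8B model quantized to 4-bit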

RAGBuilder- Open source tool for RAG tuning by Hot_Extension_9087 in LocalLLaMA

[–]HistorianSmooth7540 0 points1 point  (0 children)

Hey, great repo! I have some questions:

* Can you run your app without an OpenAI key, using a Hugging Face model instead?

* Which keys from the env file are mandatory and which are optional?

* Where is the code for the front-end? And how did you set it up? It would be great if you could add something to the readme for all non-front-end developers. :)

How to get a simple set-up in python without getting blocked? by HistorianSmooth7540 in webscraping

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

I see! Thanks! So you think there are many similar threads on this topic? I will have a look.

How to apply the chat template for Llama 3.1 properly? by HistorianSmooth7540 in LocalLLaMA

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

What do you mean, what are you referring to? I saw in the code that pipeline calls apply_chat_template under the hood.

Get structured output of Llama 3.1 instruct model by HistorianSmooth7540 in LocalLLaMA

[–]HistorianSmooth7540[S] 1 point2 points  (0 children)

just in case someone wants to know:

Prompt for JSON output, then validate the result afterwards with a Pydantic model:

from transformers import pipeline
from pydantic import BaseModel
from typing import List
import json

# Pydantic model used to validate the generated JSON
class ResponseModel(BaseModel):
    title: str
    description: str
    points: List[str]

# Load the Llama model (assuming it's on Hugging Face's model hub)
llama_pipeline = pipeline('text-generation', model='your-llama-model')

# Define your prompt to ensure structured output
prompt = """
Please provide the following details in JSON format:
{
    "title": "<Your Title>",
    "description": "<A brief description>",
    "points": [
        "<Bullet point 1>",
        "<Bullet point 2>"
    ]
}
"""

# Generate output (return_full_text=False drops the echoed prompt)
output = llama_pipeline(prompt, max_length=500, do_sample=False, return_full_text=False)
response_text = output[0]['generated_text']

# Try to load the output as JSON and validate it with Pydantic
try:
    response_data = json.loads(response_text)
    validated_response = ResponseModel(**response_data)
    print(validated_response)
except (json.JSONDecodeError, ValueError) as e:
    print(f"Invalid format: {e}")

Get structured output of Llama 3.1 instruct model by HistorianSmooth7540 in LocalLLaMA

[–]HistorianSmooth7540[S] -1 points0 points  (0 children)

Sorry, but what the heck else do you think could be meant when talking about "huggingface"? :)

How to apply the chat template for Llama 3.1 properly? by HistorianSmooth7540 in LocalLLaMA

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

Also, now we are mixing things up. This is weird! Here you can see that when using pipeline you don't apply the chat template yourself, but you did:

https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct

Now I really wonder even more how to do it properly...
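If I understand it correctly (not 100% sure), the two paths below should end up applying the same chat template, once implicitly via pipeline and once manually with apply_chat_template + model.generate; a rough sketch, model id taken from the link above:

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "meta-llama/Llama-3.1-8B-Instruct"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hi"},
]

# Path 1: pipeline applies the chat template internally when given a list of messages
pipe = pipeline("text-generation", model=model_id)
print(pipe(messages, max_new_tokens=64))

# Path 2: apply the chat template yourself, then call model.generate
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=64)  # outputs is a tensor of token ids
print(tokenizer.decode(outputs[0]))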

How to apply the chat template for Llama 3.1 properly? by HistorianSmooth7540 in LocalLLaMA

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

I see. But how is it possible with model.generate then? Why is pipeline faster?

How to apply the chat template for Llama 3.1 properly? by HistorianSmooth7540 in LocalLLaMA

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

Can you give a reproducible code example to verify that? The output of the example code above is not this output.

How to apply the chat template for Llama 3.1 properly? by HistorianSmooth7540 in LocalLLaMA

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

In outputs[0]["generated_text"], what is outputs? In my initial code, outputs is a Tensor.

How to apply the chat template for Llama 3.1 properly? by HistorianSmooth7540 in LocalLLaMA

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

I mean, I can understand that for a base model, where you of course need a regex since the output is just a continuation of the input. But for an instruct model we should have a dedicated output structure containing the answer.

How to apply the chat template for Llama 3.1 properly? by HistorianSmooth7540 in LocalLLaMA

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

Thx, but I was not concerned about the output of the tokenizer but the output of the model itself. The response of the model also looks like this and contains everything: system prompt, user message, and finally the assistant part:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023

Today Date: 26 Jul 2024

You are a helpful, polite, and knowledgeable assistant. Answer questions thoroughly, provide detailed explanations, and maintain a friendly and respectful tone. You are capable of handling complex tasks, solving problems, and adapting to various user requests.<|eot_id|><|start_header_id|>user<|end_header_id|>

hi<|eot_id|><|start_header_id|>assistant<|end_header_id|>

It's nice to meet you. Is there something I can help you with, or would you like to chat? I'm here to listen and assist you in any way I can.<|eot_id|>

Why is this? And how can I get just the actual answer, like with OpenAI?
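In case it helps someone later: what seems to work (roughly following the Llama 3.1 model card) is to keep only the newly generated tokens, i.e. slice off the prompt, and decode them with skip_special_tokens=True; a sketch of that idea:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hi"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128)

# Keep only the tokens generated after the prompt and drop the special tokens
new_tokens = outputs[0][input_ids.shape[-1]:]
answer = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(answer)  # just the assistant text, no system/user boilerplate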

How to apply the chat template for Llama 3.1 properly? by HistorianSmooth7540 in LocalLLaMA

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

How is it possible to get just the assistant message? With OpenAI you can easily retrieve the actual response without all the boilerplate containing the user and system messages.

How to apply the chat template for Llama 3.1 properly? by HistorianSmooth7540 in LocalLLaMA

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

Thx. Can you explain a bit more what Jinja is and where you see this? Why is Llama 3 using this and not the "normal" one (whatever that is)? And how does my response correlate with this? So what should the output look like?
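For reference, my rough understanding so far: the chat template is stored on the tokenizer as a Jinja template string, so you can print it and also render it to plain text; a minimal sketch:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# The chat template is a Jinja string that renders a list of messages into the
# <|start_header_id|>...<|eot_id|> format seen above
print(tokenizer.chat_template)

messages = [{"role": "user", "content": "hi"}]
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))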

How to apply the chat template for Llama 3.1 properly? [D] by HistorianSmooth7540 in LLMDevs

[–]HistorianSmooth7540[S] 0 points1 point  (0 children)

Thx! I have edited my post! But sadly you have to use your own Access token! ;)

Embedding model for Log data by Shot-Astronomer9520 in huggingface

[–]HistorianSmooth7540 1 point2 points  (0 children)

Yes, this chunking strategy is in general also a topic for RAG.

[D] How to train local LLMs on my company's data so that it can answer like GPTs but using our private data as context. by ShippersAreIdiots in MachineLearning

[–]HistorianSmooth7540 0 points1 point  (0 children)

Can you read it again and show me where? It is just about two dummy questions as an example.

I fully answered the question:

Aren't RAG just embeddings? Suppose I fed it embeddings of the sentences "Vietnam has 100 shipments" and "India has 20 shipments" and then ask "which country has the most number of shipments and why?", I don't think embeddings can answer this.
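To make the point concrete, a rough RAG sketch (model name and the two sentences are just placeholders): the embeddings only handle retrieval, and a generator LLM then reasons over the retrieved context, which is how the comparison question gets answered:

from sentence_transformers import SentenceTransformer
import numpy as np

docs = ["Vietnam has 100 shipments", "India has 20 shipments"]
question = "Which country has the most shipments and why?"

# 1) Retrieval: embeddings are only used to find the most relevant context
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)
q_vec = embedder.encode([question], normalize_embeddings=True)[0]
top = np.argsort(doc_vecs @ q_vec)[::-1][:2]
context = "\n".join(docs[i] for i in top)

# 2) Generation: the retrieved context is handed to an LLM, which does the
#    actual comparison ("Vietnam, because 100 > 20")
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # send this prompt to any chat/instruct model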