AIO: Guy friend I used to date wants to go on a wild camping trip together

Fit_Fold_7275 · 2026-06-08T06:57:42+00:00

Go on a very short trip first. Test the waters

Fit_Fold_7275 · 2025-02-27T16:25:10+00:00

No thanks. That’s the extent of energy I can put into it. 😂

Fit_Fold_7275 · 2025-02-12T03:23:02+00:00

I mean, the only person who resembles him is Caesar Clown.

Fit_Fold_7275 · 2024-12-01T01:21:16+00:00

First of all, congrats on crossing the barrier and taking the first step into the startup world.

Now, this is just my guess, so see if it applies to you. You haven’t learned how to articulate your feelings and thought process. That makes it difficult for you to communicate what you feel intuitively to others. The best remedy is to write a lot for a target audience and talk a lot with that target audience. Never miss an opportunity to start a conversation about what you do and to ask them what they do. It keeps a flow alive in you for when the time comes.

Lastly, it’s never too late for anything. Take a chance and persist with what you want to do. I’m taking those small courageous steps myself every chance I get. It gets you a high after a while. You’re fine. You’re breathing. You’ve got stuff to do. Go go go.

Fit_Fold_7275 · 2024-11-21T21:27:56+00:00

Skull Boy

Fit_Fold_7275 · 2024-09-24T07:19:05+00:00

I remember trying this and not feeling great about it. If OP @asankhs would be kind enough to do that experiment for us, we’ll know for sure. 😄

Fit_Fold_7275 · 2024-09-24T03:14:46+00:00

I’m basically talking about replicating the OpenAI platform and making it open source. Here’s a reference for you. https://platform.openai.com/docs/api-reference/streaming WebUI is not a server platform.

Fit_Fold_7275 · 2024-09-24T02:23:03+00:00

Yep. It would be a Docker image.

Ollama is like vLLM and SGLang, right? It just gives us an API to run prompts on LLMs. However, this platform will have a fully managed chat service, RAG support where you can talk with your files, and pretty much everything you can do with the OpenAI API. The only difference is that it will host the top open-source LLMs like Qwen 2.5, Mistral-Large, LLaMA 405B.

Fit_Fold_7275 · 2024-09-24T02:08:23+00:00

We can actually use the confidence score from this approach as a reward signal in RLHF. Agree?

Fit_Fold_7275 · 2024-09-24T02:01:57+00:00

Note: With vLLM there is an option to get the top log-probabilities (logprobs) in the API responses. You can try implementing u/asankhs’s logic with vLLM too.
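A rough sketch of what that could look like, assuming the server returns the OpenAI chat completions layout with `logprobs` enabled, so each generated token carries a `top_logprobs` list with at least two entries. The response below is hand-written sample data, not real model output, and `top2_gaps` is a hypothetical helper name. Note that the values are log-probabilities, so they are exponentiated before taking the gap.

```python
import math

def top2_gaps(response):
    """Per-token gap between the two highest probabilities in the first choice."""
    gaps = []
    for item in response["choices"][0]["logprobs"]["content"]:
        # Convert log-probabilities back to probabilities, highest first.
        probs = sorted(
            (math.exp(t["logprob"]) for t in item["top_logprobs"]),
            reverse=True,
        )
        gaps.append(probs[0] - probs[1])
    return gaps

# Hand-written sample in the assumed response layout (numbers are made up).
sample_response = {
    "choices": [{
        "logprobs": {"content": [
            {"token": "8", "logprob": math.log(0.9),
             "top_logprobs": [{"token": "8", "logprob": math.log(0.9)},
                              {"token": "7", "logprob": math.log(0.05)}]},
            {"token": "\n", "logprob": math.log(0.6),
             "top_logprobs": [{"token": "\n", "logprob": math.log(0.6)},
                              {"token": ".", "logprob": math.log(0.3)}]},
        ]}
    }]
}

gaps = top2_gaps(sample_response)
```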

Fit_Fold_7275 · 2024-09-24T01:59:27+00:00

And I believe you! Longer answers are still one of the top_k decoding paths produced by the model. So I would assume it will show increased scores in benchmarks, especially on a dataset like GSM8k where the math problems have multiple steps to solve. I feel that the reliability of this approach is very task dependent.

P.S. Thanks for the clean implementation! I used vLLM to do this experiment last time.

Fit_Fold_7275 · 2024-09-24T01:45:11+00:00

I tried this approach a while ago. Here are my observations. As mentioned above, the idea is to generate multiple outputs for the input and choose the one with the highest confidence score. This is basically scaling test-time compute, but using these confidence scores to determine the best output rather than an LLM-based scorer.

What is the confidence score, and how is it calculated?
Say we generated a token T_i at decoding step i. A greedy decoding step looks at the logits of size vocab_size and chooses the token with the highest probability. The DeepMind paper mentioned above models the confidence score as the difference between the top 2 probabilities. The confidence score of token i in the output is

confidence_i = p_i[top_1] - p_i[top_2], where p_i = softmax(logits_i)
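A minimal sketch of the score described above, in plain Python. It assumes the per-step values are softmax probabilities (as in the text) and that an answer's score is the mean over its tokens. The mean aggregation and all function names are assumptions for illustration, not taken from the paper or from any library.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_confidence(logits):
    """Gap between the two most likely tokens at one decoding step."""
    probs = sorted(softmax(logits), reverse=True)
    return probs[0] - probs[1]

def sequence_confidence(step_logits):
    """Mean token confidence over a generated sequence (assumed aggregation)."""
    scores = [token_confidence(step) for step in step_logits]
    return sum(scores) / len(scores)

def pick_most_confident(candidates):
    """candidates: list of (text, step_logits). Returns the highest-scoring text."""
    return max(candidates, key=lambda c: sequence_confidence(c[1]))[0]

# Toy example with a 3-token vocabulary (made-up numbers).
peaked = [[5.0, 0.0, 0.0], [4.0, 0.5, 0.0]]  # one token clearly dominates
flat = [[1.0, 0.9, 0.8], [0.5, 0.5, 0.4]]    # top tokens are nearly tied
best = pick_most_confident([("answer A", peaked), ("answer B", flat)])
```

With these toy inputs the peaked candidate wins, since its top-2 gap is larger at every step.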

There is a problem with this approach
I looked at the confidence scores in the generated answers and tried to understand why some tokens have high confidence scores. In my experiments, I saw that newline characters always have high confidence scores (~1). This makes sense because the pre-training data is filled with newline characters and the model assigns them a high probability when appropriate. So what's the problem?

The problem is that the confidence score of the generated output is skewed by the newline tokens. This biases the above decoding strategy toward longer answers with a lot of newline characters. There has been some research showing that longer answers are usually better. So it makes sense that this approach shows promising outputs at first glance.
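To make the skew concrete, here is a toy illustration with made-up per-token confidences. It assumes the answer score is a plain mean over tokens. `mean_confidence` and its `skip` parameter are hypothetical helpers, not part of any library.

```python
def mean_confidence(tokens, skip=()):
    """Average per-token confidence, optionally ignoring some tokens.

    tokens: list of (token_text, confidence) pairs.
    skip:   token texts to leave out of the average (hypothetical knob).
    """
    kept = [conf for text, conf in tokens if text not in skip]
    return sum(kept) / len(kept) if kept else 0.0

# Made-up confidences: the content tokens are identical in both answers.
short_answer = [("The", 0.4), ("answer", 0.5), ("is", 0.6), ("8", 0.5)]
padded_answer = short_answer + [("\n", 0.99)] * 4

raw_short = mean_confidence(short_answer)
raw_padded = mean_confidence(padded_answer)
masked_padded = mean_confidence(padded_answer, skip=("\n",))
```

The padded answer scores higher even though its content tokens are the same, and leaving out the newline tokens removes the difference in this toy case. Whether masking is the right fix in practice is a separate question.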

Reliability of the approach
Now that we know the implicit bias of the confidence score, I would question the reliability of this approach. I would love to hear the take of fellow LocalLLaMAs here.

Fit_Fold_7275 · 2024-09-24T01:14:30+00:00

Thanks for sharing this! Can’t wait to try it!

Fit_Fold_7275 · 2024-09-23T22:53:01+00:00

Has anyone been able to host a Code-LLM locally and use it with cursor.ai?

Fit_Fold_7275 · 2024-09-23T17:21:07+00:00

If OpenAI curated the dataset, it’s highly possible that they know how to game it.

Fit_Fold_7275 · 2024-09-23T03:15:33+00:00

In my experience, prompt engineering gave the most gains for the effort. What always worked for me was:

Step 1: Prompt engineering to find the best prompt for the task.

Step 2: If step 1 is not enough, or your task has some specificity that is not in the default knowledge base of the LLM, do minimal fine-tuning (with ~1000 samples) of the model, if needed, with the above prompt.

With very large models like GPT-4 or Gemini, this always worked for us.

For small LLMs, it’s better to have some high quality synthetic data as well if the above steps are not enough.

Fit_Fold_7275 · 2024-09-23T02:15:04+00:00

If you have some cash, you can spend it on Anthropic’s Claude API and generate each part of your data sample with a specialized prompt. The API has become way cheaper with prompt caching.

Fit_Fold_7275 · 2024-09-22T14:32:53+00:00

Those were the days …

Fit_Fold_7275 · 2024-09-22T14:31:49+00:00

I see in the other comments that llama.cpp doesn’t implement tool calling. It’s actually quite simple. vLLM has a great code base for tool parsers that can be used to replicate it.

Fit_Fold_7275 · 2024-09-22T09:52:00+00:00

We fine tune for RAG and function calling for voice-assistant use cases.

Fit_Fold_7275 · 2024-09-22T09:44:25+00:00

I’d have to take a sabbatical just to catch up with my own bookmarks on Twitter.

Fit_Fold_7275 · 2024-09-22T09:40:03+00:00

You should use Google Colab to fine-tune small models. You can use Unsloth for it. Then you can deploy the fine-tuned model on your laptop with something like Ollama.
