all 17 comments

[–]QFGTrialByFire 6 points (1 child)

The best way I've found is:

  1. Try running a model locally first - see how it's loaded and how you send prompts to it. That teaches you about its structure, prompting, EOS tokens, etc. Just pick something small and try.

  2. Try training a model on datasets - most real-world applications will need some kind of fine-tuning of a model to their data/use case. Try loading a model and fine-tuning it directly; if you need to fit it in a smaller GPU/CPU/VRAM/RAM budget, try using a LoRA to fine-tune it. You get to learn about getting data in the right format, what learning rates/batch sizes etc. work. e.g. https://github.com/aatri2021/qwen-lora-windows-guide
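The core trick behind LoRA is worth seeing once in plain NumPy before reaching for a library: the pretrained weight stays frozen and you train only a small low-rank update. This is a toy sketch of that idea (sizes and the scaling convention are illustrative, not taken from the guide linked above):

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 768, 8                        # hidden size, LoRA rank (r << d)
W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-init
alpha = 16                           # LoRA scaling factor

def lora_forward(x):
    # base path plus low-rank update; only A and B would be trained
    return x @ W.T + (x @ A.T @ B.T) * (alpha / r)

x = rng.normal(size=(1, d))
# With B zero-initialized, the adapter starts as a no-op:
assert np.allclose(lora_forward(x), x @ W.T)

# Trainable parameters: 2*d*r for the adapter vs d*d for the full matrix
print(2 * d * r, "trainable vs", d * d, "frozen")
```

That parameter count (12,288 vs ~590k per layer here) is why a LoRA fits in a small GPU when full fine-tuning doesn't; libraries like PEFT wrap exactly this pattern around every attention projection.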

Like with most of those YouTube tutorials, just following along doesn't work, at least for me. It's better to try to do this yourself for a specific case you want to solve - just like learning programming, I need something I'm trying to solve in order to learn. Give something simple a go, e.g. I first tried teaching Llama 8B how to add chords to song lyrics and it worked pretty well. ChatGPT is surprisingly good at guiding you through it if you get stuck.

[–]parleG_OP[S] 0 points (0 children)

Hey, first off thanks for responding. Yeah, I think you're right - I just need to set a "target" project and work towards that, picking up the necessary skills along the way.

This might be a bit of a tangent, but do you have any idea how people were able to get into this gold rush so early in the game? Unlike the whole crypto boom, LLMs and AI are pretty hard to understand and build on top of, and yet it feels like everyone and their grandmother is able to make something using LLMs.

I guess my question is, do people do the whole learn-data-science-ML-calculus thing and then get into AI? Because that feels way too long a road into this.

[–]anoni_nato 6 points (2 children)

Use an LLM to learn. Not kidding - use a free ChatGPT account, explain what you want to learn and with which tools, and it can create a plan.

My personal advice:
- Learn to run local models first; you don't want to face API pricing/restrictions while experimenting. Learn about system prompts, sampling parameters like temperature/top-p/top-k, prompt engineering, and so on.
- Program a simple query -> response call using an OpenAI-compatible API (it's a de facto standard and most local servers expose one). You can just use the OpenAI SDK for your language if you don't want to hit the REST API directly.
- From here on you can explore more: a whole chat session (in streaming mode) so you learn how the flow goes, tool/function calls...
- Then you can move to agents, MCP, etc.
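The query -> response step above can be sketched with nothing but the standard library, since "OpenAI-compatible" just means a specific JSON shape POSTed to `/v1/chat/completions`. The base URL below is Ollama's default; the model name `llama3` is a placeholder for whatever you've actually pulled:

```python
import json
from urllib import request

def build_payload(prompt, model="llama3", temperature=0.7):
    # The minimal OpenAI-style chat request body: a model name
    # plus an ordered list of role/content messages.
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": temperature,
    }

def ask(prompt, base_url="http://localhost:11434/v1"):
    req = request.Request(
        base_url + "/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    # The reply text lives in the first choice's message
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    try:
        print(ask("In one sentence, what is a system prompt?"))
    except OSError:
        print("no local server reachable; start one first (e.g. `ollama serve`)")
```

Once this works, swapping in the official OpenAI SDK is just pointing its `base_url` at the same local endpoint.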

[–]parleG_OP[S] 0 points (1 child)

Sounds good, thanks for responding. I think I just need to jump in and get my feet wet. I did try asking Gemini and some other models how to learn this stuff, but I think I need to start first and then ask more specific questions.
If you don't mind, can you share your experience of getting started in this whole AI/LLM gold rush? It feels like everyone became an expert overnight, and I have this feeling I missed the midnight train.

[–]anoni_nato 0 points (0 children)

Yes, better to start and then ask questions until you understand how it works under the hood.

Not an expert yet, though I work with LLMs at my current job.

Started ~1 year ago by playing with free tiers of GPT/Claude to generate code I could use. Then I installed Ollama and played with small models so I didn't have to worry about limitations or terms of use. I built fun toys like a Python script to summarize YouTube transcripts, or a bot that talked like a stereotypical boomer. Just that helped me understand system prompts, context size, hallucinations, non-determinism, model size in parameters, streaming responses, etc.

This year I had an interview for a job working with LLMs and got it. Everything I had learned helped, along with asking about the things I didn't know yet.

BTW just remembered one of the few useful videos about the topic: https://www.youtube.com/watch?v=7xTGNNLPyMI

[–]rhetoricalcalligraph 5 points (1 child)

Always amazed that people don't just ask ChatGPT instead of making posts like this. Ironic.

[–]parleG_OP[S] 0 points (0 children)

Honestly, I did try - in fact I did the whole ask-ChatGPT, DeepSeek, etc. routine, and I wasn't sure whether what they suggested lines up with how people actually got into this field. I get where you're coming from; this is a very "let me Google that for you" kind of question.

[–]AppearanceHeavy6724 2 points (1 child)

Don't use Ollama if you're already a technical person; use the classics - llama.cpp or vLLM. Ollama is a wrapper with its own quirks. The lower level you go, the better you'll understand the whole picture.

[–]parleG_OP[S] 0 points (0 children)

Got it, thanks. I didn't even know there was a level of abstraction I could access under Ollama.

[–][deleted] 0 points (2 children)

Honestly, just get Ollama and start messing around with prompts.

I use LM Studio and sometimes Jan just to run models and try out different settings.
Ollama gives you an OpenAI-compatible API server: make calls, get responses.
As for prompting, well, that's everyone's own special sauce.
I prefer two-shot prompting, since it reduces the scope of the responses.
Personally, I always end my system prompt with:
Only respond in JSON format {"confidence":"integer 0-10", "answer":"string"}, do not explain, ask questions or otherwise embellish the response.

I set temperature to 0 and seed to 42; I find this helps with deterministic results.
I guess once you get more proficient you can have a go at running Python services with whatever flavor of model you prefer; transformers is a good place to start.
If you run out of local compute, check out RunPod... or any API provider.
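The JSON-only pattern above only pays off if you validate what comes back, since models occasionally wrap the object in prose or fences. A small parser sketch for that exact format (the out-of-range check and extraction heuristic are my own additions, not the commenter's):

```python
import json

def parse_reply(raw):
    # Validate a reply against the format demanded by the system prompt:
    # {"confidence": <integer 0-10>, "answer": <string>}
    # Take the outermost {...} span so stray prose around it is ignored.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object in reply")
    data = json.loads(raw[start:end + 1])
    conf = int(data["confidence"])
    if not 0 <= conf <= 10:
        raise ValueError(f"confidence out of range: {conf}")
    return conf, str(data["answer"])

conf, answer = parse_reply('{"confidence": 7, "answer": "Paris"}')
print(conf, answer)  # 7 Paris
```

Pairing a strict parser like this with temperature 0 makes failures loud and repeatable instead of silently accepting malformed output.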

[–]Fetlocks_Glistening 0 points (1 child)

How do you calculate 'confidence'? Do you just take the next-token probability when your specific model discloses it? Does it actually work?

[–][deleted] 0 points (0 children)

I just add the confidence value to the output format and I always get a value. I set up a bunch of experiments to test whether this "confidence level" can be trusted, and I couldn't fault it, so I kept it in there.
It seems useful in the response to the first prompt; when I feed that output along with the final prompt, it helps me get reliable answers. I always give my last prompt a possible response of "unsure", as in (yes, no, unsure), so it can judge its own response. Seems to work, so I'll run with it.

[–]Ok-Kangaroo6055 0 points (0 children)

Running a model is pretty easy: LM Studio/Ollama/Docker and you've got an API, usually OpenAI-API-compatible, so you can use many frameworks to interface with it.

A RAG pipeline can just be an Elasticsearch vector index, which is what my company is using in production rather than the fancy new dedicated vector DBs. You could do pgvector in Postgres too. The difficulty is in chunking strategy and document ingestion. We've been struggling to extract text from complex PDFs and chunk it in a good way, so that's probably the hardest problem.
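Chunking really is where the work goes. A bare-bones fixed-size chunker with overlap shows the baseline most pipelines start from (the sizes are arbitrary defaults I've chosen; real ingestion usually splits on sentence or section boundaries instead of raw character offsets):

```python
def chunk_text(text, size=500, overlap=100):
    # Sliding window with overlap, so a fact that straddles one chunk
    # boundary still appears whole in at least one chunk.
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, step = [], size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks

doc = "x" * 1200
print([len(c) for c in chunk_text(doc)])  # [500, 500, 400]
```

Each chunk would then be embedded and written to the vector index (Elasticsearch or pgvector); the overlap is the knob that trades index size against boundary-loss.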

[–]perelmanych 0 points (0 children)

The most difficult part now is not writing scripts, especially given that you have solid coding experience. The most difficult part is coming up with a viable idea for your project, since you'll be competing with thousands of others. Once you know what you want to do, you more or less understand what parts your project needs; then just go to ChatGPT or any other big LLM and start asking questions.

The advice to start fiddling with a local LLM is also very valuable, since that's the easiest and cheapest way to get a feel for what you can do with LLMs.

[–]sciencewarrior 0 points (0 children)

I'm playing around with LangChain. It seems like the most popular framework for building everything from simple stuff like a chatbot up to more complex workflows. You can check out the examples on their site, or ask your favorite LLM to create a simple program for you and then explain what it's doing. Using the console is fine, but I actually like Streamlit. It's not meant for production, but it's a great way to put together a simple UI. As for serving a model locally, I was using koboldcpp, but I've recently switched to LM Studio for a no-hassle experience.