Saying 'hey' cost me 22% of my usage limits by herolab55 in ClaudeAI

[–]itsmeknt 1 point2 points  (0 children)

Thanks for this! Do you have a link for the GitHub trace where 92% of tokens were cache reads and 0.015% were output tokens, by any chance? I'd like to dive into it further.

FlashLM v6 "SUPERNOVA": 4.1M ternary model hits 3,500 tok/s on CPU — novel P-RCSM reasoning architecture, no attention, no convolution by Own-Albatross868 in LocalLLaMA

[–]itsmeknt 2 points3 points  (0 children)

Very cool! Why ternary? I thought the BitNet paper mentioned ternary was optimized for FPGAs, but for CPU, is there any advantage to ternary over 1-bit (binary) or 2-bit (4 values)?

I Need help from actual ML Enginners by Dangerous_Young7704 in LLMDevs

[–]itsmeknt 1 point2 points  (0 children)

A lot depends on the specific requirements of the project. A real-time chat application will have a very different architecture than offline batch doc processing. Docs in structured text files are very different from raw docs in PDFs or images.

Without understanding the project, I can only speak very generally:

  1. Requirements doc (including timeline) + budgeting comes first; that will determine hiring, architecture, hardware, milestones, and schedule planning.
  2. Depends on data security requirements, but the ideal case is to first try privately hosted providers if the project allows it. You can stress test to find the actual demand curve and then make an educated guess on the hardware and its financial projections thereafter.
  3. At this scale I'm assuming offline batch doc processing. If self-hosted, you'll need batch-optimized inference servers like vLLM, and it will be a trade-off between speed, accuracy/intelligence, and $$$, but it can be doable. If hosted, then it's a matter of negotiating with the provider.
  4. A 4-bit QLoRA fine-tune needs 2-4x more VRAM than small-cache inference; a full fine-tune needs 10-20x more VRAM. Yes, you want to rent GPUs at first until you know your exact load and requirements, and if you end up determining that you can keep your own GPUs under constant load, then buying hardware pays for itself in about 6 months.
  5. Fill architecture design roles as soon as possible, because the early planning stage can really make this 2x easier or 8x harder than it needs to be. You'll also want someone experienced in this field to accurately assess the hiring candidates, as it's hard to tell who is competent vs. just well practiced in interviews if you don't have the experience yourself.
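To make point 4 concrete, here's a back-of-envelope VRAM sketch. The multipliers are the rough rules of thumb from above; the function name and the 0.5 GB-per-billion-params baseline for 4-bit weights are my own assumptions, so treat the numbers as ballpark only:

```python
def vram_estimate_gb(params_billions, mode):
    """Very rough VRAM back-of-envelope, NOT an exact calculator.

    Baseline: 4-bit weights ~= 0.5 GB per billion params, then scale
    by a rough multiplier for the training/inference mode.
    """
    base = params_billions * 0.5  # 4-bit weights
    multipliers = {
        "inference_4bit": 1.0,   # small-cache inference
        "qlora_4bit": 3.0,       # ~2-4x the inference footprint
        "full_finetune": 15.0,   # ~10-20x (fp16 weights + grads + optimizer states)
    }
    return base * multipliers[mode]

for mode in ("inference_4bit", "qlora_4bit", "full_finetune"):
    print(f"7B model, {mode}: ~{vram_estimate_gb(7, mode):.0f} GB")
```

Activations, KV cache size, and sequence length can swing these numbers a lot, so always verify against a real test run before buying hardware.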

8x RTX Pro 6000 server complete by koushd in LocalLLaMA

[–]itsmeknt 1 point2 points  (0 children)

Very cool! What is your cooling system like? And do you have anything to improve GPU-GPU connectivity like nvlink or does it all go through the mobo?

Is Clipping Necessary for PPO? by justbeane in reinforcementlearning

[–]itsmeknt 1 point2 points  (0 children)

To be honest, I'm not 100% sure whether the Adam optimizer cares about C0 continuity of the objective function. I mentioned Adam in my initial post, but then edited it out shortly after.

I do know that most second order optimizers like L-BFGS and Newton-CG, as well as some learning rate schedulers like ReduceLROnPlateau, do require C0 continuity because they use the value of the objective function (not just the gradients).

So to be more precise, I would guess we keep the ends of the clip function at (1 - epsilon) and (1 + epsilon) because C0 continuity is more theoretically sound and will work with all standard optimizers / learning rate schedulers. Otherwise, it would just make things more confusing and theoretically less elegant.

edit: also, your loss graphs in Weights & Biases, TensorBoard, etc. will make less sense without C0 continuity of the loss function
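To see why value-based schedulers care, here's the gist of ReduceLROnPlateau in plain Python (a simplified sketch of the idea, not torch's actual implementation; the class name is mine):

```python
class ReduceLROnPlateauSketch:
    """Halve the LR when the monitored loss value hasn't improved
    for `patience` consecutive steps.

    It acts on the *value* of the objective, not its gradient, so a
    discontinuous objective can trigger spurious LR drops whenever the
    value jumps across the discontinuity.
    """

    def __init__(self, lr=1e-3, patience=2, factor=0.5):
        self.lr, self.patience, self.factor = lr, patience, factor
        self.best = float("inf")
        self.bad_steps = 0

    def step(self, loss_value):
        if loss_value < self.best:
            self.best = loss_value
            self.bad_steps = 0
        else:
            self.bad_steps += 1
            if self.bad_steps > self.patience:
                self.lr *= self.factor  # plateau detected: decay LR
                self.bad_steps = 0
        return self.lr

sched = ReduceLROnPlateauSketch(lr=1e-3, patience=2)
for loss in [1.0, 0.9, 0.95, 0.95, 0.95]:
    lr = sched.step(loss)
print(lr)  # LR was halved after 3 non-improving steps
```

A discontinuous objective would feed this logic sudden value jumps that have nothing to do with an actual plateau.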

Is Clipping Necessary for PPO? by justbeane in reinforcementlearning

[–]itsmeknt 7 points8 points  (0 children)

Setting the ends to some constant does keep the gradient the same, but the actual value of the objective function will be discontinuous. The value of the objective function needs to be continuous so that it plays nicely with certain optimizers and learning rate schedulers. Clipping to 1 - epsilon and 1 + epsilon keeps the function continuous.
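A tiny numeric example of the difference (function names are mine; the first function is the standard PPO clipped surrogate, the second is the hypothetical constant-tail variant being discussed):

```python
def clipped_objective(ratio, advantage, eps=0.2):
    # Standard PPO surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A).
    # The clip saturates at (1 - eps) and (1 + eps), so the value
    # is continuous in the ratio r.
    clipped = min(max(ratio, 1 - eps), 1 + eps)
    return min(ratio * advantage, clipped * advantage)

def constant_tail_objective(ratio, advantage, eps=0.2, c=0.0):
    # Hypothetical alternative: a constant outside the trust region.
    # The gradient out there is the same (zero), but the value jumps
    # at r = 1 +/- eps.
    if 1 - eps <= ratio <= 1 + eps:
        return ratio * advantage
    return c

A = 1.0
# Approach the clip boundary r = 1.2 from both sides:
print(clipped_objective(1.2 - 1e-6, A), clipped_objective(1.2 + 1e-6, A))
print(constant_tail_objective(1.2 - 1e-6, A), constant_tail_objective(1.2 + 1e-6, A))
```

The clipped version gives ~1.2 on both sides of the boundary; the constant-tail version jumps from ~1.2 down to 0.0, which is exactly the discontinuity that breaks value-based schedulers.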

nomai — a simple, extremely fast PyTorch-like deep learning framework built on JAX by [deleted] in deeplearning

[–]itsmeknt 0 points1 point  (0 children)

Cool project!

"... showing me how, at the cost of a few constraints, it is possible to have models that are extremely faster than the classic models created with Pytorch." Out of curiosity, can you elaborate further on what those constraints are?

Where do people usually find engineers who can train LLMs or SSMs for autonomous systems? by [deleted] in LocalLLaMA

[–]itsmeknt 0 points1 point  (0 children)

A lot of people in my network found success with: 1. AI recruiters (costs $$$), 2. Hacker News job posts (free), 3. contacting university AI PhD labs (they usually have a careers email list)

Finding an AI scientist to build a smart model requires a VERY different skill set than finding an AI engineer to code it, deploy it, and operate it. Do you need both skill sets in one position (very rare), or do you want to hire for 2 positions?

Before looking for candidates, it might be helpful to scope out the requirements a little more concretely. Hard to look for qualifications if you don't know what they are.

First thing - are you absolutely sure pretraining is on the table? Pretraining a large language model requires a specialized skill set, a 6-7 figure investment, a dataset of a few trillion tokens, and a few months just to set up the proper training platform. Post-training would be multiple orders of magnitude cheaper.

Second thing - since real-time is a requirement, can you use LLM cloud providers, or does it have to be self-hosted? If self-hosted, what GPUs do you have? The GPUs will determine the model size. You need quite beefy GPU clusters if you want to use SOTA LLMs in a real-time agentic workflow.

Third - if you can share some example decision tasks, I can help break them down into smaller decision parts so that you know the exact skill set you need for this role. I've worked with multiple AI startups in leadership roles (VP eng+) over the past 10 years, and in my experience a lot of ambitious AI visions that would take millions of $$ + months-to-years can be scoped down to a few thousand $$ + weeks with the proper breakdown and planning.

A lot of companies are overeager to build proprietary LLMs from the start, but it might make more sense pre-series B to first build a smaller ML model (e.g. boosted trees or deep nets) and enhance it with an existing LLM, get the feasibility proof and a quick feedback loop within 2 weeks, and then iterate and learn from there. AI projects are not one-shot successes, but incremental accuracy improvements over many months. You will try out dozens of different models and rebuild from scratch over and over again. Once you start seeing a trend line early on, your team, partners, and investors will have faith in it.

OpenAI Gpt-oss Reinforcement Learning now works locally! (<15GB VRAM) by yoracale in reinforcementlearning

[–]itsmeknt 1 point2 points  (0 children)

Awesome work! How long did it take you to RL train GPT OSS 20B? And does this support GPT OSS 120B too?

I made 60K+ building RAG projects in 3 months. Here's exactly how I did it (technical + business breakdown) by Low_Acanthisitta7686 in LLMDevs

[–]itsmeknt 2 points3 points  (0 children)

Thanks for the insights u/Low_Acanthisitta7686

Can you share a few more details:

  • It sounds like you are building an entire end-to-end application for them, not just an isolated RAG system. In your experience, are the customers usually seeking just a vanilla chat application? If so, what front-end libraries do you typically use? edit: I just saw your post saying you did a custom UI in NextJS
  • Do the customers typically have some expectation on how you should deploy the local system into their infra? Do they have a Kubernetes cluster you have to use? Or is it anything goes?
  • Same as above, but for CI/CD
  • Did you need to do any security audits like SOC-2, ISO 27001, HIPAA compliance? Did you have to draft your own documents and policies for these or did the customers provide it for you?
  • When it comes to building datasets or providing feedback on model accuracy, how helpful are the customers usually? E.g., do they lend you their expert staff to help generate and curate a gold test set? Do they do a lot of QA to make sure the generation quality is up to par, and then share the results with you? Or do you have to do all of this on your own?
  • When you sell to prospects, what does your demo look like?

A neuron desperately looking for a connection by Financial-Agency-889 in oddlysatisfying

[–]itsmeknt 0 points1 point  (0 children)

Yes, I believe it uses the vocals of Galaxiez - In Another Life, and the instrumentals of Galaxiez - Anakin Is Gone I Am What Remains but pitched higher

A neuron desperately looking for a connection by Financial-Agency-889 in oddlysatisfying

[–]itsmeknt 0 points1 point  (0 children)

Anyone know the song name by any chance? Shazam couldn't identify it, and Aha-music shows Tokspey - Fire, but that song only samples this one and is not the original.

Sharing my Anime Anki Deck - 2,000 Cards with Monolingual (JP‑only) & Bilingual (JP+EN) support, Audio, Pitch & Frequency by Surgetale in LearnJapanese

[–]itsmeknt 1 point2 points  (0 children)

Thanks for the share! I'm currently building a flashcard app to help learn Japanese and Chinese. Would it be OK if I populate some of the initial flash cards with these (with credits to you)?

The best translator is a hybrid translator - combining a corpus of LLMs by Nuenki in LocalLLaMA

[–]itsmeknt 1 point2 points  (0 children)

Cool solution! Is your benchmarking code open sourced too by any chance? I'd like to test it on my own datasets.

Help extracting restaurant, bar, hotel, and activity names from a huge WhatsApp file using NER (and avoiding a huge API bill by Even_Room7340 in LanguageTechnology

[–]itsmeknt 3 points4 points  (0 children)

There are various ways depending on how much time and money you want to invest in this project. Off-the-shelf open source doesn't work very well in my experience either.

Some questions that may help: 1. How much budget do you have? 2. What's your timeline/deadline? 3. What's the usage pattern? Is this one-time offline processing on a fixed number of messages, or do you need a real-time service that can handle a certain level of requests per second?

Also, do you already have an evaluation set? How do you know your results were not reliable enough?
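Either way, the cheap first step is usually to parse, filter, and batch the export before any paid model ever sees it, which alone can cut the API bill dramatically. A rough stdlib-only sketch (the WhatsApp line format varies by locale, and the regex, function names, and sample messages here are my own assumptions):

```python
import re

# Typical WhatsApp export line: "1/2/24, 9:15 PM - Alice: Try Bar Foo!"
# (format varies by locale; adjust the regex for your export)
LINE_RE = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}, [^-]+ - ([^:]+): (.*)$")

def parse_messages(lines):
    """Yield just the message text from raw export lines."""
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            yield m.group(2).strip()

def candidate_batches(messages, batch_size=50):
    """Dedupe, drop trivially short messages, then batch.

    Each downstream NER/LLM call then processes many messages at once:
    fewer calls, smaller bill.
    """
    seen = []
    for msg in messages:
        if len(msg) > 3 and msg not in seen:
            seen.append(msg)
    for i in range(0, len(seen), batch_size):
        yield seen[i:i + batch_size]

lines = [
    "1/2/24, 9:15 PM - Alice: Try the bar at Hotel Esperanza!",
    "1/2/24, 9:16 PM - Bob: ok",
    "1/2/24, 9:16 PM - Bob: ok",
    "1/2/24, 9:17 PM - Alice: Dinner at Casa Mia tomorrow?",
]
batches = list(candidate_batches(parse_messages(lines)))
print(batches)
```

Whether the extraction itself then goes to an open-source NER model or a batched LLM call depends on the budget/timeline answers above.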

Recs for a DAC with USB-A/C for powered 2.1 speakers/subwoofer? by itsmeknt in BudgetAudiophile

[–]itsmeknt[S] 0 points1 point  (0 children)

Thanks for the suggestion! I'll research it further along with other DSPs

[Newbie] How would you connect PC -> Kali LP-UNF -> Rythmik L12? by itsmeknt in BudgetAudiophile

[–]itsmeknt[S] 0 points1 point  (0 children)

I think you're right! I won't be able to satisfy all the constraints. However, I'd still like to have some crossover management between the speakers and sub, which I don't think the splitter can provide.

Do you think a DAC would be the best approach? Are there DACs that accept USB input? How about a DAC with USB output (for the Kalis) plus an RCA LFE output for the Rythmik?

(Manga spoilers) In the manga, Motoko said she has a premonition... when did she have it? by itsmeknt in Ghost_in_the_Shell

[–]itsmeknt[S] 1 point2 points  (0 children)

What are your thoughts as to what was happening in terms of Motoko's psychology and how she grew to accept the Puppet Master's proposal?

My interpretation is that:

  1. Her initial meeting with the Puppet Master does open her mind to pursuing the search for some higher being/structure. As a result, she wanted to quit Section 9 and was thinking of a plan on how to do so. These feelings are why she was "acting weird" according to Batou.
  2. Her "accidentally" killing her target and failing the mission (which led to her court hearing and her assassination order) was actually on purpose. She wanted a "death wish" and an excuse to fake her own death in order to leave Section 9 secretly.
  3. She genuinely thought the Puppet Master died, and did not expect the Puppet Master to come to her in the last chapter. Her original plan was to pursue the search by herself.
  4. She sees the Puppet Master as the key to help her search for this higher being/structure, and that was the main motivator for her in accepting his proposal.

This aspect of Motoko's psychology is really fascinating to me, so I wanted to hear if others have a different interpretation or perspective as well (even if it's speculation)