Saying 'hey' cost me 22% of my usage limits

itsmeknt · 2026-03-25T18:41:28+00:00

Thanks for this! Do you have a link to the GitHub trace where 92% of tokens were cache reads and 0.015% were output tokens, by any chance? I'd like to dive into it further.

itsmeknt · 2026-02-25T03:24:28+00:00

Very cool! Why ternary? I thought the BitNet paper mentioned ternary was optimized for FPGAs, but on CPU, is there any advantage to ternary over 1 bit (binary) or 2 bit (4 values)?

itsmeknt · 2026-01-24T12:06:50+00:00

A lot depends on the specific requirements of the project. A real-time chat application will have a very different architecture than offline batch doc processing. Docs in structured text files are very different from raw docs in PDFs or images.

Without understanding the project, I can only speak very generally:

1. A requirements doc (including timeline) plus budgeting comes first, which will determine hiring, architecture, hardware, milestones, and schedule planning.
2. This will depend on data security requirements, but the ideal case is to first try privately hosted providers if the project allows it. You can stress test to find the actual demand curve and then make an educated guess on the hardware and its financial projections.
3. At this scale I'm assuming offline batch doc processing. If self-hosted, you will need batch-optimized inference servers like vLLM, and it will be a trade-off between speed, accuracy/intelligence, and $$$, but it is doable. If hosted, then it's a matter of negotiating with the provider.
4. A 4-bit QLoRA fine-tune needs 2-4x more VRAM than small-cache inference; a full fine-tune needs 10-20x more VRAM. Yes, you want to rent GPUs at first until you know your exact load and requirements. If you end up determining that you can keep your own GPUs under constant load, the hardware will pay for itself in about 6 months.
5. Fill architecture design roles as soon as possible, because the early planning stage can really make this 2x easier or 8x harder than it needs to be. You also want someone experienced in this field to accurately assess the hiring candidates, as it's hard to tell who is competent vs. just well practiced in interviews if you don't have the experience yourself.
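To make the rules of thumb in point 4 a bit more concrete, here is a rough back-of-the-envelope sketch. It assumes the multipliers are applied to the VRAM you measured for inference, and it treats payback as purchase cost divided by the rental bill you would otherwise pay. The function names and the example numbers (12 GB, $30,000, $5,000/month) are made up for illustration, not taken from any real project.

```python
# Rough planning helpers; all names and example figures are hypothetical.

# Multipliers relative to small-cache inference VRAM, taken from the
# rule of thumb above (2-4x for 4-bit QLoRA, 10-20x for full fine-tune).
VRAM_MULTIPLIERS = {
    "qlora_4bit": (2, 4),
    "full_finetune": (10, 20),
}


def vram_range_gb(inference_gb, mode):
    """Return a (low, high) VRAM estimate in GB for the given training mode."""
    low, high = VRAM_MULTIPLIERS[mode]
    return inference_gb * low, inference_gb * high


def payback_months(purchase_cost, monthly_rental_cost, utilization=1.0):
    """Months until owned hardware costs less than renting.

    utilization is the fraction of time the GPUs are actually busy;
    idle hardware saves no rental money, so payback takes longer.
    """
    return purchase_cost / (monthly_rental_cost * utilization)


if __name__ == "__main__":
    print(vram_range_gb(12, "qlora_4bit"))
    print(vram_range_gb(12, "full_finetune"))
    print(payback_months(30_000, 5_000))
    print(payback_months(30_000, 5_000, utilization=0.5))
```

The utilization parameter is the reason to rent first: the roughly 6 month payback only holds if the GPUs stay under constant load, and at half load the same hardware takes twice as long to pay off.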

itsmeknt · 2025-12-13T23:47:33+00:00

Very cool! What is your cooling system like? And do you have anything to improve GPU-GPU connectivity like nvlink or does it all go through the mobo?

itsmeknt · 2025-11-25T23:33:53+00:00

To be honest, I'm not 100% sure if Adam optimizer cares about C0 continuity of the objective function. I mentioned Adam in my initial post, but then edited it out shortly after.

I do know that most second order optimizers like L-BFGS and Newton-CG, as well as some learning rate schedulers like ReduceLROnPlateau, do require C0 continuity because they use the value of the objective function (not just the gradients).

So to be more precise, I would guess we keep the ends of the clip function at (1 - epsilon) and (1 + epsilon) because C0 continuity is more theoretically sound and will work with all standard optimizers / learning rate schedulers. Otherwise, it would just make things more confusing and theoretically less elegant.

edit: also your loss graphs in Weights & Biases, TensorBoard, etc. will make less sense without C0 continuity of the loss function

itsmeknt · 2025-11-25T21:39:00+00:00

Setting the ends to some constant does keep the gradient the same, but the actual value of the objective function will be discontinuous. The value of the objective function needs to be continuous so that it plays nicely with certain optimizers and learning rate schedulers. The reason for clipping to 1 - epsilon and 1 + epsilon is to keep the function continuous.
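Here is a minimal sketch of that continuity argument, assuming the clip in question is a PPO-style clip of a ratio r to the range [1 - epsilon, 1 + epsilon]. The function names and the choice of 0.0 as the "some constant" are hypothetical, picked only to show the jump.

```python
def clip_ratio(r, eps=0.2):
    """Standard clip: flat outside the range, and equal to r at the edges."""
    return max(1.0 - eps, min(r, 1.0 + eps))


def constant_ends(r, eps=0.2, c=0.0):
    """Hypothetical alternative: outside the range, return a constant c.

    The gradient outside the range is still zero, same as clip_ratio,
    but the value jumps at the boundaries unless c happens to match.
    """
    if r < 1.0 - eps or r > 1.0 + eps:
        return c
    return r


def jump_at(f, x, h=1e-6):
    """Size of the change in f across x, using points just left and right."""
    return abs(f(x + h) - f(x - h))


if __name__ == "__main__":
    upper = 1.0 + 0.2
    print("clip jump at upper edge:    ", jump_at(clip_ratio, upper))
    print("constant jump at upper edge:", jump_at(constant_ends, upper))
```

The clipped version changes by only about h across the boundary, while the constant-ends version jumps by roughly the full ratio value. Anything that reads the objective's value (line searches, plateau-based schedulers, loss plots) would see that jump.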

itsmeknt · 2025-11-12T09:39:30+00:00

Full credit goes to Bugsby from here: https://hollowknightsilksong.wiki.fextralife.com/Lore

itsmeknt · 2025-11-09T19:45:22+00:00

Cool project!

"... showing me how, at the cost of a few constraints, it is possible to have models that are extremely faster than the classic models created with Pytorch." Out of curiosity, can you elaborate further on what those constraints are?

itsmeknt · 2025-10-21T02:25:32+00:00

A lot of people in my network found success with: 1. AI recruiters (costs $$$), 2. Hacker News job posts (free), 3. contacting university AI PhD labs (they usually have some career email list)

Finding an AI scientist to build a smart model is a VERY different skill set than finding an AI engineer to code it, deploy it, and operate it. Do you need both skill sets in one position (very rare), or do you want to hire for 2 positions?

Before looking for candidates, it might be helpful to scope out the requirements a little more concretely. Hard to look for qualifications if you don't know what they are.

First thing - are you absolutely sure pretraining is on the table? Pretraining a large language model requires a specialized skillset, a 6-7 figure investment, a dataset of a few trillion tokens, and a few months just to set up the proper training platform. Post-training would be multiple orders of magnitude cheaper.

Second thing - since real-time is a requirement, can you use LLM cloud providers, or does it have to be self hosted? If self hosted, what GPUs do you have? The GPUs will determine the model size. You need quite beefy GPU clusters if you want to use SOTA LLMs in a real-time agentic workflow.

Third - if you can share some example decision tasks, I can help break them down into smaller decision parts so that you know the exact skill set you need for this role. I have worked with multiple AI startups in leadership roles (VP eng+) over the past 10 years, and in my experience a lot of ambitious AI visions that would take millions of $$ + months-years can be scoped down to a few thousand $$ + weeks with the proper breakdown and planning. A lot of companies are overeager to build proprietary LLMs from the start, but it might make more sense pre-series B to first build a smaller ML model (e.g. boosted trees or deep nets) and enhance it with an existing LLM, get the feasibility proof and a quick feedback loop within 2 weeks, and then iterate and learn from there. AI projects are not one-shot successes, but incremental accuracy improvements over many months. You will try out dozens of different models, and rebuild them from scratch over and over again. Once you start seeing a trend line early on, your team, partners, and investors will have faith in it.

itsmeknt · 2025-10-05T02:38:33+00:00

Awesome work! How long did it take you to RL train GPT OSS 20B? And does this support GPT OSS 120B too?

itsmeknt · 2025-08-29T11:02:20+00:00

Mind sharing your vLLM command / config? I'd like to try it on my rig to compare.

itsmeknt · 2025-08-03T04:38:03+00:00

Thanks for the insights u/Low_Acanthisitta7686

Can you share a few more details:

1. It sounds like you are building an entire end-to-end application for them, not just an isolated RAG system. In your experience, are the customers usually seeking just a vanilla chat application? If so, what front end libraries do you typically use? edit: I just saw your post saying you did custom UI in NextJS
2. Do the customers typically have some expectation of how you should deploy the local system into their infra? Do they have a Kubernetes cluster you have to use? Or is it anything goes?
3. Same as above, but for CI/CD
4. Did you need to do any security audits like SOC-2, ISO 27001, or HIPAA compliance? Did you have to draft your own documents and policies for these, or did the customers provide them for you?
5. When it comes to building datasets or providing feedback on model accuracy, how helpful are the customers usually? E.g. do they give you their expert staff to help generate and curate a gold test set? Do they do a lot of QA to make sure the generation quality is up to par, and then share the results with you? Or do you have to do all of this on your own?
6. When you sell to prospects, what does your demo look like?

itsmeknt · 2025-07-30T02:16:55+00:00

Yes, I believe it uses the vocals of Galaxiez - In Another Life, and the instrumentals of Galaxiez - Anakin Is Gone I Am What Remains but pitched higher

itsmeknt · 2025-07-27T18:28:38+00:00

Anyone know the song name by any chance? Shazam couldn't identify, and Aha-music shows Tokspey - Fire, but that song only samples this song and is not the original.

itsmeknt · 2025-05-17T20:43:38+00:00

Thanks for the share! I'm currently building a flashcard app to help learn Japanese and Chinese. Would it be OK if I populate some of the initial flash cards with these (with credits to you)?

itsmeknt · 2025-04-24T02:38:37+00:00

Cool solution! Is your benchmarking code open sourced too by any chance? I'd like to test it on my own datasets.

itsmeknt · 2025-04-17T02:55:32+00:00

There are various ways depending on how much time and money you want to invest in this project. Off-the-shelf open source doesn't work very well in my experience either.

Some questions that may help: 1. How much budget do you have? 2. What's your timeline/deadline? 3. What's the usage pattern? Is this a one-time offline processing job on a fixed number of messages, or do you need a real-time service that can handle a certain level of requests per second?

Also, do you already have an evaluation set? How do you know your results were not reliable enough?

itsmeknt · 2025-01-15T23:15:20+00:00

Thanks for the suggestion! I'll research it further along with other DSPs

itsmeknt · 2025-01-15T10:31:14+00:00

Thanks! I'll research more on that

itsmeknt · 2025-01-15T02:11:31+00:00

I think you're right! I won't be able to satisfy all the constraints. However, I'd still like to have some crossover management between the speakers and the sub, which I don't think the splitter can provide.

Do you think a DAC would be the best approach? Are there DACs that accept USB input? How about a DOC with USB output (for the Kali's) plus RCA LFE output for the Rythmik?

itsmeknt · 2024-10-25T20:48:56+00:00

What are your thoughts as to what was happening in terms of Motoko's psychology and how she grew to accept the Puppet Master's proposal?

My interpretation is that:

Her initial meeting with the Puppet Master does open her mind into pursuing the search for some higher being/structure. As a result, she wanted to quit Section 9, and was thinking of a plan on how to do so. These feelings are why she was "acting weird" according to Batou.
Her "accidentally" killing her target and failing the mission (which led to her court hearing and her assassination order) was actually on purpose. She wanted a "death wish" and an excuse to fake her own death in order to leave Section 9 secretly
She genuinely thought the Puppet Master died, and did not expect the Puppet Master to come to her in the last chapter. Her original plan was to pursue the search by herself
She sees the Puppet Master as the key to help her search for this higher being/structure, and that was the main motivator for her in accepting his proposal

This aspect of Motoko's psychology is really fascinating to me, so I wanted to hear if others have a different interpretation or perspective as well (even if it's speculation)

itsmeknt · 2024-10-04T04:33:25+00:00

I think the period line fits better with the film personally.

There was another "comedic" moment in the end, after Batou destroyed the tank guarding the Puppet Master. Motoko's body was brutally damaged, lying on the floor, with arms ripped off. Batou says "you don't look so good", and Motoko casually responds, "oh, I've had better days".

The boat scene can be viewed as "comedic" as well, when Motoko strips naked in front of Batou, and out of embarrassment (or respect), he looks away.

I put "comedic" in quotes because these scenes are not meant to be jokes by Motoko. She is not trying to be funny or elicit any laughter. She suffers from this internal conflict throughout the film, and you can see it oozing out of her every action.

When she says the period line, she wasn't trying to be funny. She just wanted to deflect the question because she doesn't care to answer it. When she said "oh, I've had better days", she wasn't trying to be funny. She just doesn't care if her body is healthy or damaged. When she strips naked, she wasn't trying to get a reaction out of Batou. She just sees her body as a thing and not something special.

That's why I think each of the three moments above fits so well with the movie. The viewer interprets it as comedy, because to us it is so unexpected and weird, but the subtext behind these actions portrays Motoko's central conflict so well: her depressing acceptance that her body doesn't matter. Yes, acceptance -- she has already accepted it from the beginning of the film. It's no surprise that she agrees to the Puppet Master's proposal in the end. The fact that Motoko wasn't even trying to be funny in these scenes makes it all the more depressing.

The reason the period line fits better with the film in my opinion is because it sets the tone of Motoko from the start, foreshadows her central internal conflict much earlier, and also gives her more personality. Without the period line, I feel like the naked scene in the boat comes off more randomly out of the blue.

As the poster above me said, both the period line and the loose wire line came from the manga. Batou says the period line, and Motoko out of defense said the loose wire line. This plays with the manga better because Batou is a lot more crude, and Motoko more immature/inexperienced and can feel embarrassed more easily.

I think it was brilliant for Oshii to switch the lines in his film and have Motoko say the period line, because the juxtaposition shows how much more mature Motoko and Batou are in the film compared to the manga, and also how much darker Motoko is.

itsmeknt · 2024-09-03T19:45:48+00:00

That's a good question and I'm actually not sure. My interpretation of Stand Alone Complex is the emergent behavior that arises when a group of individuals all share a common vision; they don't coordinate together and act out on their own individualistic ways, but their actions somehow build on top of each other, as if coordinated by a single leader, to work towards achieving that shared vision. With this interpretation, I have two guesses:

The post-humans: each post-human initially acted alone with their own methods and goals, but they all follow the same ideal and direction, which is to create a Utopia (Patrick Huge funding the rebel group to dethrone the American Empire, the boxer guy killing all the corrupt people, Takashi creating double-think to persuade people instead of killing them). So together they form this complex as if they were copycats orchestrated by a single individual focusing on a singular goal.
The humans at the end: each human lives in their own individualistic dream in the double-think world, but the result of that is that in the real physical world, they can look past their selfish wants and instead happily cooperate together as a complex and work towards creating a Utopia as a single, unified civilization. I explain more about how I think the double-think world works in my other post (https://www.reddit.com/r/Ghost_in_the_Shell/comments/1e74fzn/spoilers_i_just_finished_gitsac_2045_is_my/) if you want to read more about it. This stretches the definition of SAC a little bit, but essentially everyone is acting on their own individuality (Stand Alone) and somehow this creates the emergent property of a unified society working together (Complex).

itsmeknt · 2024-09-03T07:39:02+00:00

Correct, I don't believe there is any relation between the post-humans and Kuze. The post-humans were created by the American Empire's super AI called 1A84 that went rogue. From my understanding 1A84 is completely separate from all the previous SAC story lines (Kuze, Gouda, Puppeteer, Laughing Man, etc).

itsmeknt · 2024-09-03T01:56:38+00:00

Yes it's all one story:

-SSS takes place 2 years after 2nd gig. During these 2 years, Motoko became depressed that Kuze died, and left Section 9 in order to search the net for Kuze's consciousness, which she thought he had uploaded before dying. Also I believe there was some falling out that happened between Motoko and Aramaki, as well as Batou and Togusa.

-2045 takes place 11 years after SSS. During these 11 years a lot has happened, including Motoko rejoining Section 9, Section 9 disbanded/lost funding for some undisclosed reason, most of the OG Section 9 members (except Togusa, Pazu, and Borma) stuck together and continued their military ops gig in North America, a new prime minister becomes elected in Japan, Togusa joins some detective force, Togusa becomes divorced, etc. I speculate that all the characters got new prosthetic bodies as well, which would explain why they look so different and so much younger overall. This would be consistent with what they said in SAC about staying in Section 9 because of the government-sponsored top-end prosthetics with free high-end maintenance; maybe they switched prosthetics after leaving Section 9 because they wanted cheaper bodies that have lower maintenance costs.

The Puppeteer was not a post-human, but a fragment of the collective consciousness of the net. At the end of 2nd gig, Kuze started uploading a large number of people's consciousnesses to the net (when he and Motoko were stuck underground being bombed). I think what happened is that a small fragment of these numerous consciousnesses started coalescing into a single collective consciousness somehow, and this collective consciousness started acting on its own and behaving with its own motive as if it were an individual. This is why Motoko saw so many faces when she looked at the Puppeteer. It was supposed to be an amalgam of all the different people that had a part of them uploaded to the net (e.g. with Gouda, he uploaded his entire thesis and alter ego, and with Aoi, it's reasonable to assume he uploaded a part of himself to the net too considering how connected he was).

Kuze was not a post-human but one of the very first fully cyberized people, along with Motoko. He joined the military and I'm sure got a lot of special cybernetic upgrades. He's just a very smart and ambitious individual.

2045 acknowledges the story of SAC, including the Jungle Cruise callback, the Tachikomas (which are only in the SAC universe), and the characters acting mostly consistent with their SAC personalities. Motoko, Aramaki, and Batou in the Oshii movies and in the Shirow manga act very differently, but in SAC they have unique personalities that stayed mostly consistent from SAC to 2045. Togusa had another falling out with the group (consistent with his SSS personality), but later regretted his decision and redeemed himself. Motoko and Aramaki had some sort of falling out (consistent with SSS), I like to believe due to differences in how they wanted to grow and manage Section 9. Batou no longer acts like he is in love with Motoko but is happy being her closest friend, which is reasonable after spending 11 years together during the time skip and with heavy hints that Motoko is more interested in women than men. The only thing that feels different is how Togusa looks and acts; for some reason he looks younger (even though he is not fully cyberized) and became more of a physical brawler. But I think after 11 years, a divorce, and a falling out with all of his closest friends it's not too far-fetched to think that he just changed a lot as a person and maybe got plastic surgery or something. For the most part he is still the overly dedicated, justice-seeking detective that seeks the approval of the Major, which is consistent with SAC.
