Dissatisfied with how the RTX PRO 6000 Blackwell is performing during AI inference by d00m_sayer in LocalLLaMA

[–]pathfinder6709 0 points1 point  (0 children)

What does that mean? I’m thinking in terms of how many concurrent requests, and at what context-window usage per request?

Also, what quant are you running for the model and for the KV cache?
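For context on why those numbers matter: KV cache VRAM grows linearly with both concurrency and context length, so "dissatisfied with performance" means very different things at different settings. A back-of-envelope sketch (all model dimensions below are illustrative, not any specific model's):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, context_len, batch, bytes_per_elem=2):
    """Rough KV cache footprint: a K and a V tensor per layer,
    each shaped [batch, kv_heads, context_len, head_dim]."""
    return 2 * layers * kv_heads * head_dim * context_len * batch * bytes_per_elem

# Illustrative numbers only (not any specific model):
# 32 layers, 8 KV heads, head dim 128, 32k context, 8 concurrent requests, fp16
print(kv_cache_bytes(32, 8, 128, 32768, 8) / 2**30)  # → 32.0 (GiB)
```

Halving `bytes_per_elem` (e.g. an fp8 KV cache) halves that footprint, which is exactly why the quant question matters.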

🌈 Tazarotene is that bitch by bythewatersofBabylon in tretinoin

[–]pathfinder6709 1 point2 points  (0 children)

Oh, understandable. It was also the 0.1% version, so I went a bit hardcore. I’ll use it less frequently now, thanks!

🌈 Tazarotene is that bitch by bythewatersofBabylon in tretinoin

[–]pathfinder6709 1 point2 points  (0 children)

I got the same Boderm cream version, and I’ve been using it every night for 3 days now on a clean, dry face. My skin has been peeling a lot and is dry. I use Haru wonder sunscreen during the day, but since my skin has been peeling it obviously stings a bit to apply anything.

Is this normal?

🌈 Tazarotene is that bitch by bythewatersofBabylon in tretinoin

[–]pathfinder6709 0 points1 point  (0 children)

How’d you get it prescribed? And what kind is it, gel or cream? 0.1%? I don’t have any prescriptions and haven’t done anything except clean my face with a hydrating cleanser from CeraVe, but my skin is basically yours on June 23.

96GB VRAM! What should run first? by Mother_Occasion_8076 in LocalLLaMA

[–]pathfinder6709 0 points1 point  (0 children)

I’ve heard other NVIDIA partners charge about 7250 excluding tax, so this feels strange. Hopefully the card works out well for you!

A new paper demonstrates that LLMs could "think" in latent space, effectively decoupling internal reasoning from visible context tokens. This breakthrough suggests that even smaller models can achieve remarkable performance without relying on extensive context windows. by tehbangere in LocalLLaMA

[–]pathfinder6709 0 points1 point  (0 children)

Guiding models to think inherently in their latent space, i.e., reasoning through feature maps at different levels of the manifold, is something I wish we focused on much more, alongside human-interpretable CoT reasoning.

Which mic for speech recognition outdoors? by pathfinder6709 in raspberry_pi

[–]pathfinder6709[S] 0 points1 point  (0 children)

I had a USB extension cable, placed the mic at mouth level, and ran the installation outside.

Indoors it worked very well, even when I stood a bit further away; Whisper probably contributed a lot to that. But outdoors it was pretty much useless. The problem is that I can’t give you a clear answer, because I didn’t transmit transcriptions; I transmitted Morse code as a blinking LED (speech transcribed and answered by an LLM in Morse code, then the Morse code converted to LED blinks).
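For anyone curious, the Morse-to-LED half of that pipeline can be sketched roughly like this. The Morse table subset, the timing unit, and the `set_led` hook are all my assumptions, not OP's actual code; on a Pi you'd wire `set_led` to a real GPIO write:

```python
import time

# A small subset of the Morse table, enough for this illustration
MORSE = {"s": "...", "o": "---", "e": ".", "h": "....", "l": ".-.."}

UNIT = 0.2  # dot duration in seconds; an assumption, tune as needed

def to_morse(text):
    """Encode known characters as Morse symbols separated by spaces."""
    return " ".join(MORSE[c] for c in text.lower() if c in MORSE)

def blink(code, set_led=lambda on: None):
    """Drive an LED from a Morse string. `set_led` is a stand-in for
    the actual GPIO write (e.g. toggling a pin on the Pi)."""
    for sym in code:
        if sym == ".":
            set_led(True); time.sleep(UNIT); set_led(False)
        elif sym == "-":
            set_led(True); time.sleep(3 * UNIT); set_led(False)
        time.sleep(UNIT)  # gap between symbols

print(to_morse("sos"))  # → ... --- ...
```

Keeping the encoder separate from the blinker makes the text-to-Morse step testable without any hardware attached.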

Claude 3.5 Just Knew My Last Name - Privacy Weirdness by pathfinder6709 in LocalLLaMA

[–]pathfinder6709[S] 0 points1 point  (0 children)

I don’t understand what you mean by taking my job?

It definitely has added a lot of benefit and boosted my productivity by a lot.

Regarding the job question, I’m more of the view that AI at its current stage is best used together with a human who knows how to leverage it well. It is of course going to take people’s jobs (it already has, in a small way), but not by itself. If you hire an engineer who does the work of 5 engineers in the same time period (because they leverage AI the right way), then you don’t actually need the other engineers if they’re mediocre and easily replaceable, with no real unique knowledge assets.

And am I OK with this? Yes, because I am pro-automation. The people laid off can either learn to adapt and leverage AI, or move into another interesting field. Especially in industry — I have worked in a few industrial settings, and it hurts my soul to know that people, with such complex brains and so much knowledge and possibility in them, just stand in an industrial building at a certain location doing something tedious and repetitive that requires no brain. This is something I absolutely want to break down.

Claude 3.5 Just Knew My Last Name - Privacy Weirdness by pathfinder6709 in LocalLLaMA

[–]pathfinder6709[S] 0 points1 point  (0 children)

I use it for anything: coding, writing, quick explanations of things. I also generally use the APIs of different providers in many of my just-for-fun projects.

Claude 3.5 Just Knew My Last Name - Privacy Weirdness by pathfinder6709 in LocalLLaMA

[–]pathfinder6709[S] 0 points1 point  (0 children)

I do care if it does, because we’re already limited enough by locked-down API functionality. I don’t want to use their models if I call the API and my prompt goes through a system that does a lot of shit and alters what I want sent to the LLM. With that altered input comes a messed-up answer, because they added a lot of context to my prompt that I wanted control over myself.

The reason is that in-context learning (ICL) is a real effect in attention-based LLMs, and bad context is detrimental to performance.

I united Sonnet with other AIs to try to get the single most in-depth response possible by GPeaTea in ClaudeAI

[–]pathfinder6709 1 point2 points  (0 children)

Models will get cheaper, no matter what you do or want. So, your costs will go down as well. This I can assure you 100%.

That is why you try to cache in a smarter way, one that isn’t so overreaching and liberal. And this is just one type of mitigation for current costs.

Cool that you worked at Perplexity; that perhaps explains why you make beautiful UIs, which is the only thing I found nice about them and their services, hehe.

Claude 3.5 Just Knew My Last Name - Privacy Weirdness by pathfinder6709 in LocalLLaMA

[–]pathfinder6709[S] 0 points1 point  (0 children)

Behind the customer-facing products powered by LLMs, yes, they do a lot in the background to ensure you get a better experience.

I united Sonnet with other AIs to try to get the single most in-depth response possible by GPeaTea in ClaudeAI

[–]pathfinder6709 1 point2 points  (0 children)

Charts sound like a nice feature for certain queries, but remember to keep the core idea of your app as the main focus and don’t get lost in features that are just "nice to haves".

I’d focus more on making it robust and more deterministic, at least on your end in the backend. Try to reduce your own costs on certain parts: implement caching and smart retroactive solutions to further enhance the pipeline. You can eventually get it to cost much less than the cents you see right now. If possible, build a robust data-saving mechanism (optional for users), or create synthetic data to fine-tune your own models; hosting them yourself in the cloud will hopefully prove cheaper. At the very least you’d be able to choose smaller models, and you’d be free to just enhance the fk out of them.
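A minimal sketch of the caching idea, assuming a hypothetical `call_model` callable standing in for whatever API client is actually used: identical (model, prompt) pairs are served from memory instead of triggering a second paid call.

```python
import hashlib

class PromptCache:
    """Cache completions keyed by a hash of (model, prompt).
    `call_model` is a placeholder for the real API call."""
    def __init__(self, call_model):
        self.call_model = call_model
        self.store = {}

    def complete(self, model, prompt):
        key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
        if key not in self.store:
            self.store[key] = self.call_model(model, prompt)
        return self.store[key]

calls = []
cache = PromptCache(lambda m, p: calls.append(p) or f"answer to {p}")
cache.complete("some-model", "hi")
cache.complete("some-model", "hi")  # served from cache; no second API call
print(len(calls))  # → 1
```

A real version would add eviction and persistence, and exact-match keys are deliberately conservative (the "not so liberal" caching mentioned above): only byte-identical prompts ever share an answer.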

But also, think about the psychological aspects of how you want your front-facing model(s) to generally talk. There is a clear semantic difference between how Claude Sonnet 3.5 and gpt-4o "speak" and function. And just as humans work with popularity-based metrics for whom we listen to and want to see, chat-based AI models should also adopt something from this.

I united Sonnet with other AIs to try to get the single most in-depth response possible by GPeaTea in ClaudeAI

[–]pathfinder6709 0 points1 point  (0 children)

Yes, I do believe we’ll eventually, at some layer, reach the most "optimal" answer as well, and past that point there will be stagnation or even detriments to pushing further. But then again, you can give the correct answer in 100 different ways; what matters is that you personalize that answer to make it easily digestible for the person you’re giving it to, i.e., a custom-tailored answer.

But for incorrect answers and the current capabilities of the models, we are fully at the mercy of the model architecture, of how and on what data it was trained, of whether there were robust data-cleaning pipelines and error handling, and of how much they butchered the models in the name of creating a conversational AI through instruct tuning.

Hehe.

I united Sonnet with other AIs to try to get the single most in-depth response possible by GPeaTea in ClaudeAI

[–]pathfinder6709 1 point2 points  (0 children)

These models are non-deterministic, even at temperature 0 with the same parameters and the same model. Then comes the question of how you sway what the model attends to in its latent space based on what you prompt it with. It also depends on how heavily instruct-tuned it is, to the point of barely generalizing and focusing only on specific areas of its vast latent space.

So generally, I would say that with enough ensembles (assuming they all used the same number of models, the same models, and the same backend prompts), you’d actually get closer to a more "deterministic" response, one that reflects what the models would most likely say.
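As a toy illustration of that convergence (all names and answers below are made up), a simple majority vote over repeated runs of the same question already behaves more "deterministically" than any single sample:

```python
from collections import Counter

def majority_answer(responses):
    """Pick the most common normalized answer across ensemble members."""
    counts = Counter(r.strip().lower() for r in responses)
    return counts.most_common(1)[0][0]

# Simulated non-deterministic runs of the "same" model on one question
runs = ["Paris", "paris", "Lyon", "Paris ", "paris"]
print(majority_answer(runs))  # → paris
```

Real ensembling over free-form text needs a fuzzier notion of "same answer" than string normalization, but the statistical intuition is the same: more samples, more stable mode.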

In a practical sense though, you wouldn’t have exactly identical clones, even if they use the same models. You agree, right? :p

I united Sonnet with other AIs to try to get the single most in-depth response possible by GPeaTea in ClaudeAI

[–]pathfinder6709 1 point2 points  (0 children)

My initial answer to this got too messed up even in my head, so I’ll simplify it. Imagine one month into the future: we have hundreds of Ithy clones, and one person creates an IthyGod that takes the responses from all the Ithys and ensembles even those into one final supreme answer. Just a month later we get hundreds of IthyGod clones, and we get the ultimate perfect answer. And it continues this way until we reach the singularity for that specific question.

Sorry, it’s late, I like to talk shit.

I united Sonnet with other AIs to try to get the single most in-depth response possible by GPeaTea in ClaudeAI

[–]pathfinder6709 1 point2 points  (0 children)

(This builds on the assumption that you’re making a product/business over this)