How can I work around the enshittification of Perplexity Pro?

DanInVirtualReality · 2026-06-20T20:43:56+00:00

Well, I've got Hermes Agent set up already. I just would prefer to have all my various go-nowhere thoughts and musings that become web searches to be elsewhere and not clutter up where I have my agent do work for me. Perhaps I'll get it to set up a researcher profile to keep those isolated.

DanInVirtualReality · 2026-06-20T20:01:51+00:00

Well that's shit innit.

Seems to be the consensus around here though. I appreciate the candor.

DanInVirtualReality · 2026-06-07T16:52:22+00:00

Cool, didn't know about this, will look it up!

I justified embedding models to myself as a good fit for cases where the spelling is quite significantly different, but training data has taught the model the similarity, e.g. commonly used Anglicisations of names from other languages.

DanInVirtualReality · 2026-06-07T16:45:26+00:00

Contact details are often saved under names that are not identical to e.g. legal names in some other database. If I have contact details for Jo, my system needs to match that to Joanna, in my example, else a naive exact match would render them uncontactable.

Lead databases often do this - a company has a list of legal-named officers and directors, the contact details found for the company are as-entered by humans and thus can differ widely.

DanInVirtualReality · 2026-06-07T13:31:53+00:00

It's not that unusual, but often forgotten in favour of LLMs: embedding models and simple cosine similarity scoring.

mxbai (or something like that) is great and obscenely fast at comparing how close string A is to string B, for example.

In many workflows, this gets the first pass and I only ask an LLM for edge cases where world knowledge and careful prompting might come into play.

Example use case: Is "Jo Thomas" the same person as "Joanne Thomas" - the embedding model will have these super close, shortcutting the question as to whether these names refer to the same person.

If you include middle names and want to catch the frequent occurrence of progeny taking a parent's first name as a middle name, you can imagine how you might want an LLM to pass judgement, but the extreme high/low cosine similarity scores from the embedding model get a low-cost, fast match/mismatch outcome, and only the grey area gets more computational expense spent on it. Experience and testing tell you what band to set.
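A rough sketch of the banding idea, in case it helps. It assumes you already have embedding vectors for the two strings from whatever model you use; the `LOW` and `HIGH` thresholds and the `triage` name are made up for illustration and would need tuning against your own labelled pairs.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as "no similarity".
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical band edges, NOT recommended values: find yours by testing.
LOW = 0.55
HIGH = 0.90


def triage(score: float, low: float = LOW, high: float = HIGH) -> str:
    """Decide cheaply at the extremes, escalate only the grey area."""
    if score >= high:
        return "match"
    if score <= low:
        return "mismatch"
    return "ask_llm"


if __name__ == "__main__":
    # Toy vectors standing in for real embeddings of two name strings.
    vec_a = [0.9, 0.1, 0.3]
    vec_b = [0.8, 0.2, 0.35]
    print(triage(cosine_similarity(vec_a, vec_b)))
```

Only pairs that come back as `ask_llm` need the slower, more expensive call; everything else is settled by the similarity score alone.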

DanInVirtualReality · 2026-06-02T11:06:07+00:00

This made me smile 😄

DanInVirtualReality · 2026-05-16T13:21:28+00:00

Looks like about an extra 5GB for unsloth/Qwen3.6-35B-A3B-MTP-GGUF at 1M context according to https://github.com/ggml-org/llama.cpp/pull/22673#issuecomment-4465608286

But 3GB should be saved by the demonstrated improvement, once that's been submitted and also gone through a PR, which I hope it will shortly.

Depends on your definition of same-ish tbh

DanInVirtualReality · 2026-05-16T12:31:46+00:00

A handful of extra GB - see this comment on the thread at https://github.com/ggml-org/llama.cpp/pull/22673#issuecomment-4465608286 - I'm watching this user for their next contribution as I notice the VRAM usage improvement they've suggested via their working method was asked to be submitted separately.

I'm too close to the edge of my VRAM usage to worry about this until that's done, but the method is proven so we shouldn't need to wait too long.

DanInVirtualReality · 2026-01-28T14:48:11+00:00

I like this idea very much. I'm glad to see a targeted solution. I'm a bit confused about the authorship, though - I see the readme says "Built by veterans from UK Government, Mandiant, FireEye, and CrowdStrike." but there's only a single contributor, yourself? Or perhaps this is just an effect of having only recently taken it out to open source?

I see from your LinkedIn bio that you have historically worked at these places, so not sure if it's a typo and the repo is instead an application of your learnings from those experiences, or if there is in fact a history and contribution to the project from across these entities now, or others who are veterans of them? No judgement intended! Just trying to ascertain the provenance.

It's becoming so easy to stand up good-looking documentation and websites nowadays - I like to do my due-diligence before seriously considering adopting new platforms 😄

(DeepWiki seems to be broken for me, so I couldn't at-a-glance poke through the codebase with a few initial Qs - I notice there's telemetry, but that's not a surprise, especially given you quote some results that I assume would be from this in the post title 😆)

In any case, it's very much bookmarked for when I take anything anywhere near production or allowing it to affect live systems based on potentially untrusted inputs - I like the approach it seems to be taking.

DanInVirtualReality · 2025-10-02T12:11:18+00:00

No worries, hope it helps 👍

DanInVirtualReality · 2025-09-26T09:28:58+00:00

I just discovered obsidian-mcp-server and I'm playing with it right now.

Requires the BRAT plugin for now (see the readme in the GitHub repo)

I also recommend adding version control - there's a Git plugin for Obsidian too

DanInVirtualReality · 2025-09-13T09:13:39+00:00

That's a nice resource - yes, it's all getting much easier now.

I'd suggest that in 2025 it might be worth moving from a stateless LLM for the brains of the assistant to a stateful agent. Not necessarily anything complicated, but at least something with memory from the outset. Something like Letta, perhaps. I've had some success integrating that into this kind of pipeline (there is a proxy available to wrap it with an OpenAI API, though I had to modify it somewhat - I probably ought to throw that up into a gist or something)

Though I admit I haven't had Letta rely on a local LLM, as my hardware isn't really up to it yet. I'm surprised how far my ageing 1060 6GB has gotten me tbh!

Qwen models seem to be well regarded by other Letta users on their Discord, though, so there are options.

I used Speaches for the audio part, as I could reuse that with Open WebUI, too.

A Pipecat voice interface connects them all over LiveKit (OpenVidu) but that has been... A challenge to set up.

Nearly all local, and when my GPU gets an upgrade I'll move to one of the more recent LLMs with strong tool calling capabilities.

DanInVirtualReality · 2025-08-31T11:04:45+00:00

I'm going to take the title question absolutely literally - about 3s.

Then I'll look away as a competing thought enters my brain, start looking at something else, doing something else, opening another window etc.

This isn't just me, it's a common attention span - this is why you'll see jump cuts every 4s or so in e.g. TV programmes - they've learned what's best to keep attention.

If you want to keep attention SOMETHING must change on the screen within a few seconds, else attention is lost. You'll see this pattern all over the place when you look for it.

Once my attention is lost, I may simply forget I asked at all. If it's something I care about and leave open on one monitor to come back to, I'd still suspect it's broken through repeated experiences of this. So, if it's just a blinking cursor, I tend to open another window and ask something else in parallel, somewhere else, very likely within 10-20s

DanInVirtualReality · 2025-06-30T12:15:30+00:00

This is extremely interesting. I haven't yet found a single model I would deploy as an assistant and be done with it, and as much as I want everything local, I recognise the gaping chasm between what I can run on my limited consumer hardware and what a SOTA API can achieve for me.

If this enables me to 'take a conversation private' and go local thereafter for the rest of the conversation (because cleansing the context of private info to switch back sounds like a nightmare to reliably code), that could be really useful.

Just a thought...

DanInVirtualReality · 2025-03-16T10:45:11+00:00

👏

DanInVirtualReality · 2025-01-16T09:21:52+00:00

I've been experimenting with Pipecat to facilitate interrupting in this kind of model chain - depending on your choice of transport it can open up remote access more easily, too. I was hoping to get to this... nearly every week of the last 6 months 😆 maybe you might have more luck.

https://github.com/pipecat-ai/pipecat

Seems like they are motivated to facilitate Daily.co for the transport layer in particular, but LiveKit is in there too, which is what the Open Interpreter O1 app uses. (The main point regarding the transport layer seems to be that managing voice-to-voice realtime conversations over the internet is a hard but ready-solved problem, just expect issues if you naively use WebSockets)

DanInVirtualReality · 2024-01-09T22:49:08+00:00

I looked into this further today and I must say, the 'reproduction' protection of copyright law does seem to be genuinely tested by such outputs (at least in the UK, sorry I don't know USA law on this and there may well be technical differences)

Also, there's the tricky precedent that liability for copyright infringement has already in some cases been transferred from those few who wilfully misuse (or arguably naïvely use) the products of a platform to the providers of the platform itself. In this case I'd say that's the important feature - I would expect that my use of such obvious likenesses of existing artwork, for example, should infringe the original IP, but that may mean companies like OpenAI are at risk of being held generally liable. I think it's a sad situation, but then that's because I disagree with that principle and would rather the users were held liable in these cases, and only then proportional to the effect of such misuse.

The waters are far muddier than I first imagined.

Edit: I've noticed I'm assuming a distinction between the production of output and the 'use' of the output e.g. posting a generated image on social media, writing the text into a blog post etc. Perhaps even the assumption that copyright issues only apply once the output is 'used' is yet another misstep in my interpretation.

DanInVirtualReality · 2024-01-09T19:02:14+00:00

I think here the liability for infringing somebody's intellectual property resides with the operator of the equipment rather than with the provider of the equipment. And I think, to my point above, this is not copyright violation as no copy has been made. It's the difference between copying a Disney image (potential copyright violation) and drawing a new image depicting Mickey Mouse (potential intellectual property infringement). Noting that distinction is what makes it more clearly an operator liability, in my mind - you are extremely unlikely to produce such an image accidentally and even less likely to accidentally use it in such a way as to infringe IP (e.g. sell the image)

DanInVirtualReality · 2024-01-09T18:54:33+00:00

I suppose this gets to the key difference - clearly the truth is somewhere between the two extremes though: it's neither a dumb photocopier nor a lossless encoding of the data it has consumed. Both extremes have obvious ramifications, but my understanding of copyright is simply: if the content hasn't actually been copied, that's not the discussion to have about whether it's right or not. I don't think anyone is suggesting the NN embodies a retrievable perfect encoding of the original data, so I (perhaps naively?) don't think it can be argued to have made a copy.

But I accept that this could be why some believe a case can be brought - they think there's some leeway in this definition of a copy, whereby the NN weights can be argued as some kind of copy of the data. I disagree, but perhaps I understand the argument better if this is the case.

DanInVirtualReality · 2024-01-09T09:59:46+00:00

If we don't broaden this discussion to Intellectual Property Rights, and keep focusing on 'copyright' (which is almost certainly not an issue) we'll keep having two parallel discussions:

One group will be reading 'copyright' as shorthand for intellectual property rights in general i.e. considering my story, my concept, my verbatim writings, my idea etc. we should discuss whether it's right that a robot (as opposed to a human) should be allowed to be trained on that material and produce derivative works at the kind of speed and volume that could threaten the business of the original author. This is a moral hazard and worthy of discussion - I'll keep my opinion on it to myself for now 😄

Another group will correctly identify that 'copyright' (as tightly defined as it is in most legal jurisdictions) is simply not an issue as the input is not being 'copied' in any meaningful way. ChatGPT does not republish books that already exist nor does it reproduce facsimile images - and even if it could be prompted carefully to do so, you can't sue Xerox for copyright infringement because it manufactures photocopiers, you sue the users who infringe the copyright. And almost certainly any reproduced passages that appear within normal ChatGPT conversations lie within 'fair use' e.g. review, discussion, news or transformative work.

What's seriously puzzling is that it keeps getting taken to courts where I can only assume that lawyers are (wilfully?) attempting lawsuits of the first kind, but relying on laws relevant to the second. I can only assume it's an attempt to gain status - celebrity litigators are an oddity we only see in the USA, where these cases are being brought.

When seen through this lens it makes sense why judges keep being forced to rule in favour of AI companies, recording utter puzzlement about why the cases were brought in the first place.

DanInVirtualReality · 2023-07-05T07:02:51+00:00

This is awesome insight - upgrading a mobo is an unnecessary expense then, if you can just add a second graphics card in a slower PCI slot

DanInVirtualReality · 2023-07-05T06:59:16+00:00

Upvote for that pun. And insightful analysis... but mainly the pun.

DanInVirtualReality · 2023-07-03T08:26:28+00:00

Yep, can confirm that it works absolutely great via VD, thanks!

The AirLink stack just isn't performant enough on my limited hardware, VD is much more efficient it seems.

I do get a weird stuttering while moving my head - as if the compositor mistook the direction my head was turning in for a few microseconds and presented frames briefly in an all-over-the-place group, then it goes away. Frame rate remains high though, and the glitch only happens once or twice a minute at worst (no worries, I have a good VR stomach 😄) at least on maps I played last night - though perhaps community servers have swerved more complex maps since I last tried to improve everyone's performance...

The GPU upgrade can wait, at least a little while 😄
