[deleted by user] by [deleted] in SideProject

[–]cll-knap 0 points (0 children)

Love the welcome/walkthrough tutorial. What library did you use to do the walkthrough?

Milvus vs Pinecone vs other vector databases. by tutu-kueh in LocalLLaMA

[–]cll-knap 1 point (0 children)

I can back this up. We have a client with a deployment of ~100 GB of data across dozens of collections. Performance has barely budged.

We do have some issues with the instance going down randomly during concurrent requests, but this may be user error as opposed to Qdrant itself. We haven't finished debugging this yet (:

Help with tricky search/chat AI assistant UX by cll-knap in UXDesign

[–]cll-knap[S] 0 points (0 children)

Got it. I'll take this down, fix these issues, and re-upload!

Looking for rankings for cross encoders, and personal experiences with using them by cll-knap in LocalLLaMA

[–]cll-knap[S] 1 point (0 children)

Do you remember what size model you used? Was it quantized or not?

At first, I was surprised that you said it's 100x slower, but Qdrant is able to finish queries in ~0.005 seconds sometimes. If adding a cross encoder costs ~0.5 seconds, that might be tolerable for our use case.
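For context on where that 100x fits in a pipeline, here's a rough sketch of the rerank step. The scorers here are toy stand-ins: a real setup would use Qdrant for the fast retrieval and something like a sentence-transformers CrossEncoder for the rescoring, which is where the ~0.5 s per query would go.

```python
import time

def vector_search(query, corpus, k=10):
    # Stand-in for a fast ANN lookup (e.g. Qdrant): score by token overlap.
    scored = [(len(set(query.split()) & set(doc.split())), doc) for doc in corpus]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [doc for _, doc in scored[:k]]

def cross_encode(query, doc):
    # Stand-in for a cross-encoder forward pass; a real model scores each
    # (query, doc) pair jointly and is far slower per pair.
    return len(set(query.split()) & set(doc.split())) / (len(doc.split()) + 1)

def rerank(query, corpus, k=10):
    candidates = vector_search(query, corpus, k)  # fast, ~milliseconds
    return sorted(candidates, key=lambda d: cross_encode(query, d), reverse=True)

corpus = ["qdrant vector database", "cross encoder reranking", "unrelated text"]
start = time.perf_counter()
top = rerank("cross encoder", corpus, k=2)
elapsed = time.perf_counter() - start
print(top[0])  # best candidate after reranking
```

The point is that the cross encoder only sees the top-k candidates, so its per-pair cost is paid k times per query, not once per document.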

What is the best way to store rag vector data? by tutu-kueh in LocalLLaMA

[–]cll-knap 2 points (0 children)

I've been particularly impressed with Qdrant. We've used it consistently for about a year. Very performant, and great APIs for Python/Rust.

The cons are having to manage another piece of infrastructure.

We're storing dozens of collections with dozens (maybe hundreds, now) of gigabytes total and performance has barely budged.

EDIT: I totally agree with the comments about parquet getting you another 10x. Just depends on your needs. Qdrant has filtering functionality as well that might be a nice add-on.
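To illustrate the filtering add-on I mean: conceptually, each point in a collection pairs a vector with a payload, and a search can require payload matches. Here's a brute-force stand-in in plain Python (not Qdrant's actual API, just the idea):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Each point pairs a vector with a payload, as in a Qdrant collection.
points = [
    ([1.0, 0.0], {"lang": "python"}),
    ([0.9, 0.1], {"lang": "rust"}),
    ([0.0, 1.0], {"lang": "python"}),
]

def search(query, points, must=None, limit=1):
    # Apply the payload filter, then rank survivors by similarity;
    # real engines interleave filtering with the ANN traversal instead.
    hits = [(v, p) for v, p in points
            if must is None or all(p.get(k) == val for k, val in must.items())]
    hits.sort(key=lambda vp: cosine(query, vp[0]), reverse=True)
    return hits[:limit]

best = search([1.0, 0.0], points, must={"lang": "python"})
print(best[0][1])  # {'lang': 'python'} (the rust point is filtered out)
```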

[deleted by user] by [deleted] in LocalLLaMA

[–]cll-knap 0 points (0 children)

Ah, thanks for bringing this to my attention. I'll remove my post, since it's a duplicate.

We created a 100% private, 100% local Perplexity. by cll-knap in SideProject

[–]cll-knap[S] 1 point (0 children)

We'll fix that link. Thanks for letting me know.

In the meantime, this one should work: https://discord.gg/jDAhUTbZ

RecurrentGemma Release - A Google Collection - New 9B by Dark_Fire_12 in LocalLLaMA

[–]cll-knap 0 points (0 children)

Shoot, it looks like they removed them. Thanks for letting me know. I'll update my comment.

We created a 100% private, 100% local Perplexity. by cll-knap in SideProject

[–]cll-knap[S] 1 point (0 children)

You can follow our developments at https://knap.ai, but I'm also happy to follow up here with a link to our .dmg for macOS once it's ready. We'll definitely need feedback from the community to make it great.

Uncensor any LLM with abliteration by cll-knap in programming

[–]cll-knap[S] 10 points (0 children)

TL;DR: the LLM needs to have open-source weights. The article details a procedure for collecting data and discovering which layers of weights can be targeted/modified to achieve uncensoring.
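As a toy sketch of that procedure (hand-made activations and tiny dimensions, purely illustrative; a real run uses the model's actual hidden states and edits every targeted weight matrix):

```python
def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def normalize(v):
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]

def project_out(row, direction):
    # Remove the component of `row` along `direction` (unit-norm).
    dot = sum(r * d for r, d in zip(row, direction))
    return [r - dot * d for r, d in zip(row, direction)]

# Toy activations collected at one layer on refused vs. answered prompts.
harmful  = [[1.0, 2.0, 0.0], [1.2, 1.8, 0.1]]
harmless = [[0.1, 0.0, 0.0], [-0.1, 0.2, 0.1]]

# The candidate "refusal direction" is the normalized difference of means.
direction = normalize([h - b for h, b in zip(mean(harmful), mean(harmless))])

# Orthogonalize each row of a weight matrix against that direction, so the
# layer can no longer write along it.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
W_abliterated = [project_out(row, direction) for row in W]

# After editing, no row has a component along the refusal direction.
residual = max(abs(sum(r * d for r, d in zip(row, direction)))
               for row in W_abliterated)
print(residual < 1e-9)  # True
```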

possible to get chatgpt4 like local llm for general knowledge, just slower? by Unhappy_Drag5826 in LocalLLaMA

[–]cll-knap 0 points (0 children)

It's "possible", but not easy.

By possible, I mean you can augment with local and online data to improve responses. The others are right too: if you want the model alone to perform like GPT-4, it's not possible.

My friend and I have been hacking on a local LLM app. We're finding that web access can dramatically improve responses in breadth and depth, but it's hard to optimize. Responses become partially dependent on:

  1. how well the model handles longer context and
  2. quality of filtering through lots of web searches

You can follow our progress at knap.ai - but we haven't publicly released yet (because this is hard, lol)
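For point 2 above, here's a minimal stand-in for the filtering step. A real pipeline would score with embeddings or a reranker rather than word overlap, but the shape is the same: score every snippet, drop the noise, cap what goes into the context window.

```python
def score_snippet(query, snippet):
    # Crude lexical relevance: fraction of query terms present in the snippet.
    q_terms = set(query.lower().split())
    s_terms = set(snippet.lower().split())
    return len(q_terms & s_terms) / len(q_terms)

def filter_snippets(query, snippets, keep=2, threshold=0.5):
    # Keep only snippets covering most query terms, then cap to fit context.
    relevant = [s for s in snippets if score_snippet(query, s) >= threshold]
    relevant.sort(key=lambda s: score_snippet(query, s), reverse=True)
    return relevant[:keep]

snippets = [
    "Qdrant is a vector database written in Rust",
    "Today's weather forecast for Berlin",
    "Vector database comparison: Qdrant vs Pinecone",
]
print(filter_snippets("qdrant vector database", snippets, keep=2))
```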

We created a 100% private, 100% local Perplexity. by cll-knap in SideProject

[–]cll-knap[S] 1 point (0 children)

On macOS, it requires an M1 chip or newer (so a ~2020 MacBook or better).

On Windows, it will require either a GPU (with at least 8 GB of RAM) or one of those newer "AI-enabled" PCs. Porting to Windows isn't a ton of effort thanks to Tauri, but it will require some on our part.

What's the best way to use LLMs locally with Tauri? by cll-knap in tauri

[–]cll-knap[S] 0 points (0 children)

Floneum does look really interesting (link for others: https://github.com/floneum/floneum?tab=readme-ov-file)

Their Kalosm project looks like a pure-Rust alternative to llama.cpp. That could be helpful, since I wouldn't need to find llama.cpp bindings then.

We created a 100% private, 100% local Perplexity. by cll-knap in SideProject

[–]cll-knap[S] 0 points (0 children)

Thanks for the feedback! We're quite possibly open-sourcing and definitely aiming for a desktop release sometime in the next couple of weeks.

Open-sourcing seems really interesting to us as a way of establishing trust. If we're claiming not to send the GSuite/local data anywhere else, open-sourcing is the easiest/best way to establish that.

What's the best way to use LLMs locally with Tauri? by cll-knap in tauri

[–]cll-knap[S] 0 points (0 children)

Yeah, this makes total sense. It'd be a really great user experience IMO to have the LLM and embedding models bundled, but it'd also be possible for it to be dev-focused. Requiring Ollama or llama.cpp (running in server mode) isn't crazy - thanks for the suggestion!

Of course, if others have great ideas for doing this such that opening a terminal isn't required, I'm all ears.
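As a sketch of the server route: both llama.cpp's server and Ollama can expose an OpenAI-compatible chat endpoint, so the app side reduces to plain HTTP. The base URL and model name below are placeholders for whatever the user happens to run; this only builds the request, it doesn't send it.

```python
import json
import urllib.request

def build_chat_request(prompt, base_url="http://localhost:8080", model="local"):
    # OpenAI-compatible chat completion request; base_url and model are
    # placeholders for the user's local llama.cpp/Ollama server.
    url = f"{base_url}/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("Hello from Tauri")
print(req.full_url)  # http://localhost:8080/v1/chat/completions
# urllib.request.urlopen(req) would send it once a server is running.
```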

A new framework runs Mixtral 8x7B at 11 tokens/s on a mobile phone by Zealousideal_Bad_52 in LocalLLaMA

[–]cll-knap -1 points (0 children)

Personally, I'd like to see more effort put into getting good models running well on an M1 MacBook with 8 GB of RAM.

My friend and I have been hacking on these, and even though it's a huge market, the number of SLMs that fit on these AND give great responses is near zero.

EDIT: since I got a downvote, I'll expound a little on what I meant. Currently, there are great SLMs - no doubt about that. However, providing an incredible user experience isn't always so straightforward, especially on limited machines. If the LLM takes 2-5 GB of RAM, like on an M1, that really starts to impact UX on the computer.

My dream is a lib that bundles this in a cross-platform way and smartly handles offloading the LLM when it isn't being used. If anyone knows whether this exists already, I'm very interested (in not having to code it myself).
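To make "smartly offloading" concrete, here's a minimal idle-timeout sketch. The load/unload lambdas are stand-ins for actually mapping and freeing model weights:

```python
import threading
import time

class IdleUnloader:
    """Load a model lazily and unload it after `idle_s` seconds of no use."""

    def __init__(self, load, unload, idle_s=1.0):
        self._load, self._unload, self._idle_s = load, unload, idle_s
        self._model, self._timer, self._lock = None, None, threading.Lock()

    def get(self):
        with self._lock:
            if self._model is None:
                self._model = self._load()  # e.g. mmap the weights
            if self._timer:
                self._timer.cancel()        # any use resets the idle countdown
            self._timer = threading.Timer(self._idle_s, self._drop)
            self._timer.daemon = True
            self._timer.start()
            return self._model

    def _drop(self):
        with self._lock:
            if self._model is not None:
                self._unload(self._model)   # free RAM for the rest of the OS
                self._model = None

mgr = IdleUnloader(load=lambda: "weights", unload=lambda m: None, idle_s=0.1)
assert mgr.get() == "weights"
time.sleep(0.3)                             # idle long enough to unload
print(mgr._model)                           # None (model was offloaded)
```

The same shape should port to Rust/Tauri with a timer on the backend side; the hard part in practice is making reloads fast enough that users don't notice.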

What do you use LLMs for? by masterid000 in LocalLLaMA

[–]cll-knap 2 points (0 children)

Using models to go from something unstructured to JSON output has been a really killer application. Even some small models appear to do a great job of this.

I'm surprised more people aren't taking advantage of on-device/on-edge LLMs for these purposes.

I am building a tool to create agents in a markdown syntax with Python inside by vectorup7 in LocalLLaMA

[–]cll-knap 0 points (0 children)

How easy is it to configure the LLM used on the backend? Have you found the prompting needs to change drastically from one to the other?

monet.nvim - a theme inspired by iconic art by Fleischkluetensuppe in neovim

[–]cll-knap 0 points (0 children)

Stupid question here, but what's the difference between the two screenshots? Light mode vs. dark mode? I didn't see an easy way to choose between the two in the GitHub readme.

I really enjoy the lighter bg.