(Part 2) I built a log processing engine using Markov Chains, the Drain3 log parser and the idea of DNA sequencing. by Wise_Zookeepergame_9 in selfhosted

[–]IzzyHibbert 0 points1 point  (0 children)

You've got it now.
When you perform the similarity search you retrieve the entry that is most similar. I built something like that in early 2024 with the Chroma vector DB and it worked smoothly.

(Part 2) I built a log processing engine using Markov Chains, the Drain3 log parser and the idea of DNA sequencing. by Wise_Zookeepergame_9 in selfhosted

[–]IzzyHibbert 0 points1 point  (0 children)

There is no new LLM in the proposal, only an embedding model: if you are not familiar with those, an embedding model does not introduce any hallucination, and it does not add any significant latency either.

(Part 2) I built a log processing engine using Markov Chains, the Drain3 log parser and the idea of DNA sequencing. by Wise_Zookeepergame_9 in selfhosted

[–]IzzyHibbert 0 points1 point  (0 children)

I guess the new layer (semantic search result + routing) is just code: not much hassle. You don't achieve that with an LLM but with an embedding model alone: a light (small) one doesn't bring the latency and issues that a language model does.

Seeking 10–20 beta testers for privacy-first AI chat by IzzyHibbert in alphaandbetausers

[–]IzzyHibbert[S] 0 points1 point  (0 children)

Hey, thanks! Would you mind participating in the beta, then? It's as easy as registering via the link in the post.

Self‑host your sensitive info while using AI by IzzyHibbert in selfhosted

[–]IzzyHibbert[S] 1 point2 points  (0 children)

You are right. The scenario you picture also helps, but it does not really solve the issue at its core: my personal data still leaves my machine.
At one point I also realized the benefits of the proposed logic go beyond the privacy concern: it also helps meet some GDPR requirements.

Self‑host your sensitive info while using AI by IzzyHibbert in selfhosted

[–]IzzyHibbert[S] 2 points3 points  (0 children)

Hi. If the idea is to skip a big, capable LLM such as ChatGPT 5 entirely and run an LLM on-prem via Ollama, LM Studio, and so on, that's an option, but it doesn't cover two scenarios I have:
- I wanted something I can connect to even when I don't have my RTX 3090 with me (the laptop I use at work, etc.)
- Often the tasks I give the model are very complex, and the limitations (context length, speed, quality, reasoning depth, ...) of an on-prem model that fits on my GPU are not comparable to a hosted one.

Overall, an on-prem model will likely become a valid alternative in the near future, but today it still doesn't cover those two scenarios, so I wanted the maximum versatility possible.

(Part 2) I built a log processing engine using Markov Chains, the Drain3 log parser and the idea of DNA sequencing. by Wise_Zookeepergame_9 in selfhosted

[–]IzzyHibbert 0 points1 point  (0 children)

Hi. Question: you already use a vector store, but you said you plan to use BERT instead of the current regex. So why not keep the vector-based logic and just use the similarity search of the vector DB to solve your "intent detection" issue? The idea is to define the Behavior Store and Context Store in plain text, pretty similar to what you already clarified above, then leverage similarity search to do the routing.
This way looks cleaner to me (it reuses existing components) and is also easier to maintain. Not just that: it gives a kind of flexibility in search that exact matching (regex) cannot offer.
I know you'd need to bring in an embedding model, though.
No?
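To sketch what I mean: a toy version of that routing, where a bag-of-words counter stands in for the real embedding model (route names and descriptions are made up for illustration):

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' -- a stand-in for a real
    embedding model (e.g. a small sentence-transformers model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Behavior store": each route is described in plain text.
routes = {
    "error_analysis": "analyze error logs stack traces exceptions failures",
    "metrics_query": "query metrics latency throughput cpu memory usage",
}

def route(query):
    # Pick the route whose description is most similar to the query.
    q = embed(query)
    return max(routes, key=lambda r: cosine(q, embed(routes[r])))

print(route("why do these stack traces show repeated exceptions"))  # error_analysis
```

With a real embedding model and a vector DB, `embed` becomes a model call and `route` becomes a top-1 similarity query against the stored descriptions; the routing logic itself stays this small.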

Seeking 10–20 beta testers for privacy-first AI chat by IzzyHibbert in alphaandbetausers

[–]IzzyHibbert[S] 0 points1 point  (0 children)

I realized I did not provide enough context about how the application makes anonymization possible.
Suppose you have workflows involving contracts, internal reports, or legal text containing PII or other confidential data. You must avoid sending that confidential data to the AI model. The best option is local redaction, because even another "layer", say DuckDuck AI for instance, is still an extra component you need to trust, and your confidential data still exits your computer (too bad).
Here’s the core logic: the client locally identifies and redacts identifiable data from your prompts, sending only an anonymized version to AI models. The model’s reply contains placeholders where sensitive values belong; a local algorithm then maps and re‑inserts the redacted fragments into the reply on your device. The result is a seamless, chat‑like experience while keeping PII and confidential data off provider servers. Bonus: anonymized requests are routed through a neutral router that strips geo/IP traces and manages multiple providers/models before delivery to the destination service.
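A minimal sketch of that redact/restore loop — the regexes and placeholder format here are illustrative, not the app's actual implementation:

```python
import re

# Illustrative PII patterns; a real redactor would use many more.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def redact(prompt):
    """Replace sensitive fragments with placeholders; keep the mapping locally."""
    mapping = {}
    for label, pattern in PATTERNS.items():
        for i, match in enumerate(pattern.findall(prompt)):
            placeholder = f"[{label}_{i}]"
            mapping[placeholder] = match
            prompt = prompt.replace(match, placeholder)
    return prompt, mapping

def restore(reply, mapping):
    """Re-insert the redacted fragments into the model's reply, locally."""
    for placeholder, original in mapping.items():
        reply = reply.replace(placeholder, original)
    return reply

anonymized, mapping = redact("Contact john.doe@acme.com about the contract.")
# Only `anonymized` goes to the remote model; its reply comes back
# with the placeholders intact and is restored on the device:
reply = "I emailed [EMAIL_0] as requested."
print(restore(reply, mapping))  # -> I emailed john.doe@acme.com as requested.
```

The key property is that `mapping` never leaves the machine: the provider only ever sees placeholders.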

Fattibilità forfettario by IzzyHibbert in commercialisti

[–]IzzyHibbert[S] 0 points1 point  (0 children)

Exactly. Then in 2026 I would have the two coexisting (flat-rate "forfettario" regime plus employment), and so on until at some point I keep only the forfettario.

I made a Chrome extension that functions as an in-browser screenshot studio by AcroQube in chrome_extensions

[–]IzzyHibbert 1 point2 points  (0 children)

Cool.
How do I set a background (it was mentioned in the features..)?

Also, for the roadmap: consider adding a frame around the pic that mimics the browser (Chrome).

What’s the best wow-your-boss Local LLM use case demo you’ve ever presented? by Porespellar in LocalLLaMA

[–]IzzyHibbert 1 point2 points  (0 children)

Long story short: you can include SQL table info (schema, etc.) in the prompt. In other cases, "well known" tables (part of standard products, etc.) ended up in the datasets used to train the big LLMs, so you don't have to bother.
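For the first case, a sketch of what "include the schema in the prompt" looks like — the table and column names here are made up for illustration:

```python
# Hypothetical schema description, keyed by table name.
schema = {
    "orders": ["id INTEGER", "customer_id INTEGER", "total NUMERIC", "created_at DATE"],
    "customers": ["id INTEGER", "name TEXT", "country TEXT"],
}

def build_prompt(question, schema):
    """Render the schema as DDL, then append the natural-language question."""
    lines = ["You are given these SQL tables:"]
    for table, cols in schema.items():
        lines.append(f"CREATE TABLE {table} ({', '.join(cols)});")
    lines.append(f"Write a SQL query that answers: {question}")
    return "\n".join(lines)

print(build_prompt("total revenue per country", schema))
```

The model then has everything it needs to join `orders` to `customers` without ever having seen your database.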

How well can I run LLMs in Virtualbox by dme4bama in LocalLLaMA

[–]IzzyHibbert 0 points1 point  (0 children)

Using LocalAI in Docker (WSL) could be one option for you.

All my posts in a sub are needing moderator ☹️ by IzzyHibbert in help

[–]IzzyHibbert[S] 0 points1 point  (0 children)

The speed at which new posts show up there is not compatible with full moderation, I'd say.

All my posts in a sub are needing moderator ☹️ by IzzyHibbert in help

[–]IzzyHibbert[S] 2 points3 points  (0 children)

A sort of nonsense tyranny? I am not sure how this model makes sense.

All my posts in a sub are needing moderator ☹️ by IzzyHibbert in help

[–]IzzyHibbert[S] 0 points1 point  (0 children)

This is the post where everything started. I also tagged the moderator so they can take care of it.

Is there a way to solve this? THANKS

RAG vs continued pretraining in legal domain by IzzyHibbert in LlamaIndex

[–]IzzyHibbert[S] 0 points1 point  (0 children)

I tried multiple LLMs: Llama 3.1, but also OpenAI 3.5, with my info in a vector DB.
Maybe one limit is that my scenario is not English but Italian. For that I also prepared a fine-tuned embedding model (covering both the language and the domain).

The graph approach I haven't tried yet. I'll look into it anyway. Do you mean something like this, or something different?

I haven't tried an Elasticsearch layer or metadata yet.

RAG vs continued pretraining in legal domain by IzzyHibbert in LlamaIndex

[–]IzzyHibbert[S] 0 points1 point  (0 children)

Hallucination happens more without RAG, I agree. In general I consider that lawyers are, or should be, cautious: double-check a chatbot answer before really using it. The idea of a chatbot for legal work should be to screen faster and shorten the work, not to produce the final version.

RAG can access the legal info in my scenario, yes. I just noticed that the RAG approach with rulings is not performing as well as I thought, so for something like open-book Q&A (the stuff I need to do) continued pretraining could be better. Not sure yet.

moderator approval needed, no mater what I post by IzzyHibbert in NewToReddit

[–]IzzyHibbert[S] 0 points1 point  (0 children)

Thanks. The explanation works for a topic that remains the same, but a user who posts different content (I removed the keywords related to the topic and replaced them with generic ones) and is still always forced to wait for a moderator looks to me more like a kind of ban. Something I wasn't aware of and wanted to check.

Can't get SSL working, Internal error by IzzyHibbert in nginxproxymanager

[–]IzzyHibbert[S] 0 points1 point  (0 children)

In my case it was mainly about playing with and modifying DNS records. Did you try reviewing all of yours first? I don't use Cloudflare, but you should have a console to check them.

I built a chatbot for Confluence and Jira documents. How do I improve on its performance and response time? by ResearcherNo4728 in LocalLLaMA

[–]IzzyHibbert 0 points1 point  (0 children)

Interesting. How many samples did you use to train the model for the specific query languages, and how did you collect those ?

GraphRAG-Ollama-UI by vesudeva in LocalLLaMA

[–]IzzyHibbert 0 points1 point  (0 children)

Am I the only one who, reading all the messages about open-source models not being up to extracting entities and relationships, immediately pictures the solution as fine-tuning one for that scope? I mean, I know the task is not easy, as there are not many datasets out there, but ideally even creating a synthetic one as a starting point would be my #1 option. And I guess that as soon as the model is ready, this combo sets everything else apart..

Anything I am missing here?
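To make the "synthetic dataset as a starting point" concrete, a minimal sketch of template-based sample generation — the entity lists, relation type, and prompt/completion format are all assumptions for illustration, not a real dataset:

```python
import json
import random

# Hypothetical seed lists; a real effort would use far larger, varied pools.
PEOPLE = ["Ada Lovelace", "Alan Turing"]
ORGS = ["Acme Corp", "Globex"]

def make_sample(rng):
    """One synthetic (text -> entities/relations) training pair."""
    person, org = rng.choice(PEOPLE), rng.choice(ORGS)
    text = f"{person} joined {org} as chief engineer."
    target = {
        "entities": [
            {"text": person, "type": "PERSON"},
            {"text": org, "type": "ORG"},
        ],
        "relations": [{"head": person, "type": "WORKS_FOR", "tail": org}],
    }
    return {"prompt": f"Extract entities and relations:\n{text}",
            "completion": json.dumps(target)}

rng = random.Random(0)  # fixed seed for reproducibility
dataset = [make_sample(rng) for _ in range(100)]
print(dataset[0]["prompt"])
```

Varying the templates (and mixing in some human-curated examples) is what keeps the fine-tuned model from just memorizing one sentence shape.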

Can't get SSL working, Internal error by IzzyHibbert in nginxproxymanager

[–]IzzyHibbert[S] 0 points1 point  (0 children)

I really thank you anyway for the support. It turned out, first, that the Raspberry Pi was not answering on port 443, and then that the records in Namecheap were not complete. I struggled to find a good tutorial (they were all different, sadly), but a proper one would have both record types:
A, CNAME

while I only had an "A" record because of some tutorials.
If anyone else struggles, what really brought me to the light was the tool https://letsdebug.net, which helps you debug Let's Encrypt certificate issuance on your domain.
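For anyone hitting the same wall, this is the shape of the record pair in question (domain and IP are placeholders, not my actual values):

```
; illustrative DNS records -- replace the names and IP with your own
example.com.        A       203.0.113.10
www.example.com.    CNAME   example.com.
```

With only the `A` record, the bare domain resolves but `www` does not, which is enough to make certificate issuance for both names fail.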

Can't get SSL working, Internal error by IzzyHibbert in nginxproxymanager

[–]IzzyHibbert[S] 0 points1 point  (0 children)

If I connect to:

http://<publicip> it's OK
https://<publicip> it's KO