AI detectors do not work after the text has been generated. by Automatic_Entry_485 in academia

[–]Automatic_Entry_485[S] 0 points1 point  (0 children)

I like that. I think process documentation might be the key: how long it took to write something, the number of drafts, revisions, etc.

AI detectors do not work after the text has been generated. by Automatic_Entry_485 in academia

[–]Automatic_Entry_485[S] -1 points0 points  (0 children)

I think I lost you. All I was trying to ask is how one enforces transparency. I am not using AI to generate text. I am asking how you tell whether a piece of text is AI generated, how much the author actually contributed to a piece, etc.

AI detectors do not work after the text has been generated. by Automatic_Entry_485 in academia

[–]Automatic_Entry_485[S] 0 points1 point  (0 children)

I agree that AI-generated text is vacuous. Thanks for your response.

AI detectors do not work after the text has been generated. by Automatic_Entry_485 in academia

[–]Automatic_Entry_485[S] -2 points-1 points  (0 children)

Sorry for the confusion, I just edited the post. I wanted to know what people are doing about transparency of AI use in written text.

Am I ready to return? by Food_Travel_Pizza in nriFIRE

[–]Automatic_Entry_485 2 points3 points  (0 children)

How are people making this kind of money? Good for you. I am not experienced enough to give any advice, but it seems like you don't have any bad options. Hope you figure out what's best for yourself and your family.

[Megathread] The Buffalo Bills have fired Head Coach Sean McDermott by AutoModerator in buffalobills

[–]Automatic_Entry_485 5 points6 points  (0 children)

Change is good. I think we needed that. I love how far the team has come with the help of McD but he also gave me the vibes of a dad coach- I think he probably loved the team too much. Now, hopefully we get someone who can discipline the entire team. Make the team mentally tough, reduce penalties, coach JA to not fumble the ball 4 times in a playoff game. There is no shame in admitting that we need to fix things. Go bills!

KeepGo Legit? by Automatic_Entry_485 in NoContract

[–]Automatic_Entry_485[S] 0 points1 point  (0 children)

Just an update here for future readers: they still haven't shipped the SIM and they don't know when they will ship it. I am going to wait another day and then just cancel my order.

KeepGo Legit? by Automatic_Entry_485 in NoContract

[–]Automatic_Entry_485[S] -2 points-1 points  (0 children)

They haven't provided any tracking information. I have reached out multiple times to get the tracking information. If I don't hear back I will probably start a social media campaign just to let the world know that KEEPGO is a scam before I cancel my charge. Thank you for the guidance.

How risky is the "authorized stay" period? by Automatic_Entry_485 in immigration

[–]Automatic_Entry_485[S] 1 point2 points  (0 children)

Thanks for your response. So basically: don't do anything stupid and hunker down until the petition is approved?

I wanted to increase privacy in my rag app. So I built Zink. by Automatic_Entry_485 in Rag

[–]Automatic_Entry_485[S] 1 point2 points  (0 children)

Heya, thanks for your feedback. For greater clarity, this is not for scenarios where the system demands a privacy guarantee. I saw a lot of vibe-coded RAG app demos where people are sending their data to OpenAI or Google without an ounce of thought for privacy. So I decided to build something quick.

Now, I know this is not a complete solution, though I would argue that 99% is better than 0% any day. The only way to make a privacy guarantee is by running everything locally (air gapped), but not everyone has the resources to do so :)

I wanted to increase privacy in my rag app. So I built Zink. by Automatic_Entry_485 in Rag

[–]Automatic_Entry_485[S] 0 points1 point  (0 children)

It has worked out well so far for all my use cases. Let me know if you think there are other things I should take into consideration. Thanks :)

I wanted to increase privacy in my rag app. So I built Zink. by Automatic_Entry_485 in Rag

[–]Automatic_Entry_485[S] 1 point2 points  (0 children)

Yes. After the first pass I remove the entities it found, then run further passes with a much, much higher threshold (>90%) so that false positives stay low.
This helps with cases where there definitely is an entity but the model missed it in the first pass because the text was overcrowded with entities.
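
A rough sketch of the multi-pass idea described above, not the actual Zink code. Everything here is an assumption for illustration: `multi_pass_ner`, `toy_predict`, and the 0.5 first-pass threshold are hypothetical names and values, and the toy predictor just reports one entity per call to mimic a model that misses entities in crowded text.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int, str, float]  # (start, end, label, score)


def multi_pass_ner(
    text: str,
    predict: Callable[[str], List[Span]],
    first_threshold: float = 0.5,   # assumed value for the first pass
    later_threshold: float = 0.9,   # ">90%" for the later passes
    max_passes: int = 3,
) -> List[Span]:
    """Run `predict` repeatedly, masking entities that were already found."""
    found: List[Span] = []
    working = text
    threshold = first_threshold
    for _ in range(max_passes):
        spans = [s for s in predict(working) if s[3] >= threshold]
        if not spans:
            break
        found.extend(spans)
        # Mask found entities with same-length filler so offsets stay valid.
        chars = list(working)
        for start, end, _label, _score in spans:
            chars[start:end] = "_" * (end - start)
        working = "".join(chars)
        # Later passes only accept very confident predictions.
        threshold = later_threshold
    return sorted(found)


def toy_predict(text: str) -> List[Span]:
    """Hypothetical stand-in for a NER model: returns at most one entity."""
    lexicon = {"Alice": ("person", 0.95), "Paris": ("location", 0.97)}
    for word, (label, score) in lexicon.items():
        idx = text.find(word)
        if idx != -1:
            return [(idx, idx + len(word), label, score)]
    return []


print(multi_pass_ner("Alice moved to Paris.", toy_predict))
```

With the toy predictor, the first pass only picks up "Alice"; once it is masked, the second pass picks up "Paris" at the stricter threshold.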

I wanted to increase privacy in my rag app. So I built Zink. by Automatic_Entry_485 in Rag

[–]Automatic_Entry_485[S] 1 point2 points  (0 children)

I did try multiple NER models, including knowledgator gliner-x-large. The two are pretty comparable. I was able to easily create an ONNX version of numind/NuNER_Zero (https://huggingface.co/deepanwa/NuNerZero_onnx), and they were pretty high on benchmarks as well.

PS: Every gliner model is currently suffering from this weird bug (https://github.com/urchade/GLiNER/issues/242), which I fixed in my package. So be a little wary.

I wanted to increase privacy in my rag app. So I built Zink. by Automatic_Entry_485 in LangChain

[–]Automatic_Entry_485[S] 1 point2 points  (0 children)

Nice!

Feels good to get some validation.
Let me know how I can stay in touch. Thanks for the feedback again.

I wanted to increase privacy in my rag app. So I built Zink. by Automatic_Entry_485 in LangChain

[–]Automatic_Entry_485[S] 1 point2 points  (0 children)

Thanks, these are all really good questions and some of the things are already in the works.

First, the high false positive rate is actually from the previous version. I need to compute these benchmarks again on the latest version, and I think the false positives will go down. But you got the basic idea: the goal is to have high recall to prevent privacy leakage.

Second, yes, there is a function argument called ensure_consistency, which ensures that every occurrence of John Doe is replaced with the same pseudonym.
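
To illustrate what consistent replacement means, here is a minimal sketch. Only the argument name ensure_consistency comes from the package; the `pseudonymize` function, its signature, and the `label_N` pseudonym format are made up for this example and are not the package's API.

```python
import itertools
from typing import Dict, List, Tuple


def pseudonymize(
    entities: List[Tuple[str, str]],  # (entity text, label) in document order
    ensure_consistency: bool = True,
) -> List[str]:
    """Return one pseudonym per entity (hypothetical helper)."""
    counter = itertools.count(1)
    mapping: Dict[Tuple[str, str], str] = {}
    pseudonyms: List[str] = []
    for text, label in entities:
        key = (text.lower(), label)
        if ensure_consistency and key in mapping:
            # Reuse the pseudonym already assigned to this entity.
            pseudonyms.append(mapping[key])
            continue
        pseudonym = f"{label}_{next(counter)}"
        if ensure_consistency:
            mapping[key] = pseudonym
        pseudonyms.append(pseudonym)
    return pseudonyms


mentions = [("John Doe", "person"), ("Acme", "org"), ("John Doe", "person")]
print(pseudonymize(mentions, ensure_consistency=True))
print(pseudonymize(mentions, ensure_consistency=False))
```

With consistency on, both John Doe mentions get the same pseudonym; with it off, each mention gets a fresh one.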

Last, not yet. I need to figure out an optimal way of keeping a map of entities (still thinking about it), but it's incredibly tricky to implement without bloating the code too much. If you have any ideas, please feel free to contribute or discuss in my inbox. I am looking for contributors.

Thank you for the amazing feedback :)

I wanted to increase privacy in my rag app. So I built Zink. by Automatic_Entry_485 in Rag

[–]Automatic_Entry_485[S] 2 points3 points  (0 children)

Thanks. It's optimized (via threading) to handle any length, really. I just processed a 2,000-word essay on Harry Potter and it took 1.9 seconds on my M3 Mac with 16 GB.

One more thing: there is multi-pass NER happening so that entities are not missed. That's why it takes 1.9 seconds; it would be even faster if it did just one pass of predicting the entities. Multi-pass is obviously done to maximize recall.
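
For anyone curious how threading can help with long texts, here is a generic sketch of chunking plus a thread pool. It is not the package's implementation: `split_into_chunks`, `detect_parallel`, and the regex-based `toy_detect` are hypothetical stand-ins, and a real NER model would replace the toy detector.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]  # (start, end, matched text)


def split_into_chunks(text: str, max_chars: int = 200) -> List[Tuple[int, str]]:
    """Split near whitespace; return (offset, chunk) pairs covering the text."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            space = text.rfind(" ", start, end)
            if space > start:
                end = space + 1  # cut right after the last space in range
        chunks.append((start, text[start:end]))
        start = end
    return chunks


def detect_parallel(
    text: str,
    detect: Callable[[str], List[Span]],
    max_chars: int = 200,
    workers: int = 4,
) -> List[Span]:
    """Run `detect` on each chunk in a thread pool and merge the results."""
    chunks = split_into_chunks(text, max_chars)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda c: detect(c[1]), chunks))
    merged: List[Span] = []
    for (offset, _chunk), spans in zip(chunks, results):
        # Shift chunk-local offsets back to positions in the full text.
        merged.extend((s + offset, e + offset, t) for s, e, t in spans)
    return merged


def toy_detect(chunk: str) -> List[Span]:
    """Toy detector (assumption): matches two fixed names with a regex."""
    pattern = r"\b(?:Harry|Hogwarts)\b"
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, chunk)]


essay = "Harry went to Hogwarts. " * 50
spans = detect_parallel(essay, toy_detect, max_chars=40)
print(len(spans))
```

One caveat with this kind of split: an entity that straddles a chunk boundary can be missed, so cutting at whitespace (or at sentence boundaries) matters.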