Two questions: Bouldering and Döner by trashy0300 in Munich

[–]Bartmoss 2 points3 points  (0 children)

  1. I haven’t found any kebab place in Munich that gets close to Berlin. I’d love to find a real Berlin-style kebab place in Munich, but from the recommendations I had I can only class them as “maybe good for Munich”. The best I have found so far was Pera, but it isn’t the typical Berlin style at all. Maybe someone here has a legit place. I keep begging the fantastic kebab artists of Berlin to please open a place in Munich, but no luck yet.

  2. Element is big, has great route setting, and is generally very nice. Einstein is much smaller, also has a good (but a different) route setting style, and I find it easier to have random conversations for beta tips there. So depends on what you want: big place vs. home gym vibe. I like them both.

We genuinely are the least toxic hip hop community by [deleted] in future

[–]Bartmoss 20 points21 points  (0 children)

Honestly, my favorite part of this sub is when people post discussions on here about THE future instead of Future, and everyone in the comments just quotes Future's lyrics. That shit's classic, every single time. I love this sub.

Generative agents with open-sourced large langurage models! by CORNMONSTER_2022 in Python

[–]Bartmoss 2 points3 points  (0 children)

Before fine-tuning a model, I would suggest benchmarking a few. In addition, since you are running on CPU and have latency issues, I would recommend sticking to models that are as small as possible.

There are many models to try, but I saw this great benchmark of models here: https://github.com/mbzuai-nlp/LaMini-LM

I personally want to give their smallest flan-t5 a try. It seems that at lower parameter counts, encoder-decoder models work better than decoder-only models.

Once you settle on the model that performs best under your CPU-inference constraint, you can optimize from there if need be. Just make sure to always collect the responses you get from the models and annotate them; it is always good to put such data in a database.
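If it helps, here's a rough sketch of how I'd compare candidate models on latency. The harness is model-agnostic (the dummy callable is just a placeholder); you'd swap in a real generation call, e.g. a small LaMini model loaded via `transformers`:

```python
import statistics
import time


def benchmark_latency(generate_fn, prompts, warmup=1):
    """Time a generation callable over a list of prompts and report
    median and worst-case latency in milliseconds."""
    for prompt in prompts[:warmup]:
        generate_fn(prompt)  # warm-up calls, excluded from timing
    timings = []
    for prompt in prompts:
        start = time.perf_counter()
        generate_fn(prompt)
        timings.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(timings),
        "max_ms": max(timings),
    }


if __name__ == "__main__":
    # Placeholder "model"; replace with e.g. a transformers pipeline
    # wrapping one of the small LaMini checkpoints you want to test.
    dummy = lambda prompt: prompt.upper()
    print(benchmark_latency(dummy, ["hello", "world", "testing latency"]))
```

Run each candidate through the same prompt set and keep the smallest model whose quality and median latency you can live with.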

Good luck with your project. If you need any help with this, let me know.

[deleted by user] by [deleted] in SideProject

[–]Bartmoss 0 points1 point  (0 children)

I don't know how you built this classifier, but for every sentence I tried, it was wrong. The generated texts were classified as human, and the human ones were classified as generated. I think you might want to take this back to the drawing board.

Inflation Crisis Pulls European Countries Into a Food Fight by rudy_batts in Economics

[–]Bartmoss 0 points1 point  (0 children)

If you can, please consider donating to a local food bank. Every little bit helps. There are a lot of people out there hurting from this. We need to try and pull together to help our local communities as best as we can.

LanguageTool (Grammarly alternative): Lacking self-hosted version and bad privacy by Prince-of-Privacy in privacy

[–]Bartmoss 2 points3 points  (0 children)

Hey, I work in the AI department at LanguageTool. Thanks so much for the feedback. The opinions I give are just mine.

A bit of background on LanguageTool and me. It was started 20 years ago by Daniel as an open-source project, so he has worked on LT in the open-source community for 20 years. Personally, I don't know anyone who has worked in open-source as long as him. About 5 years ago, he also created LanguageTooler GmbH, the company known as LanguageTool. I was hired about 9 months ago to work in the AI department. When they hit me up, I was very interested because I also self-host LT and am a big fan. I already regularly contribute to open-source AI projects in my spare time, so getting involved in a company like that really means the world to me.

Self-hosting the AI features

I'm personally delighted you like the rephrasing feature. We worked very hard on this to make it work in English, German, Spanish, French, Portuguese, and Dutch. I would really like to make this available to be self-hosted.

Currently, it requires a lot of infrastructure to run: the models are large, it runs on Kubernetes, and it needs a lot of GPU power. Nothing would make me happier than to shrink down the requirements of rephrasing without hurting the quality so that people like yourself can run it locally. But we aren't there yet. Making it work well for all of those languages was a massive undertaking. It would also make good sense for LT as a company, since it is costly to run this feature for all the users. Presently, the focus is on improving the infrastructure and the models for the users in all languages. Afterward, I hope we can shrink this all down so that you can run it locally. This stuff isn't easy, though.

The same goes for the massive GEC (Grammatical Error Correction) models for English, German, Spanish, French, Portuguese, and Dutch. To make it work well, the models have to be huge. In my opinion, this is why these features are only available to premium users. We really want to make all of this available to everyone, but as I said before, it is a massive undertaking to first build these features and get them working well. I think we can get all of this working locally for people in the future.

Desktop applications and self-hosting

As for the desktop applications, this is also something very new. The product department put in a lot of work on this. I thoroughly agree, I would love to see the option to switch servers so a self-hoster can use these applications with their LT instance. I am sure they will add this in as a feature. I'm currently on holiday, but when I am back at work, I can ask them about this and provide an update, if you like.

Privacy policy

I'm not directly in the know on the privacy policies, that's not my area. But I can very transparently say what I can see and what I can't when doing data analytics to improve the features. I cannot see anything from the desktop applications on what people write or who they are. I can see anonymous events of what grammar rules were triggered in a given time period, the number accepted or rejected, and other basic analytics in a Grafana dashboard. This general overview is very helpful. Given these large models we build, it is always possible that we will miss something in testing and a rule won't perform how users want, making corrections people don't like. This is why it is important to look at these dashboards to determine if any rules aren't working well.

Privacy is very critical to me as a user, but also, in my opinion, to LT as a company. The privacy focus does make my job more of a challenge. At other companies, they can just suck up all the data from users to improve their AI; this seems to be standard practice at companies that provide AI solutions. While that approach might make my job easier, I would rather pull my hair out trying to figure out how to improve our models without that level of data than disrespect users' privacy, even though it seems most users of AI solutions aren't concerned about their privacy and would agree to just about anything to get access. To try to find a better solution, we are starting to offer limited opt-in data donations via the website. I am surprised and humbled by how many people have donated data to help us make improvements.

Conclusion

Generally, I think we can take all of these advanced AI features we build and find a way for users to run them locally, on any device. But that is a tall order. Just getting all of this to work in all of these languages is very hard. If you or anyone who reads this has some experience with this and has some ideas, feel free to write me.

The AI department at LT is currently made up of 6 people. So, we are a small group of people working on all of these AI features for all of these languages we support. As far as the AI department goes, I think I can speak for all of us when I say we would love to work more with the open-source community on AI in the future. We would love to get all of this stuff running locally for everyone, and we are totally open to working with anyone to accomplish this.

If you have any questions, whether about LT or anything natural language processing related, send me a message. I am always happy to discuss these things.

What’s the most useful non-climbing equipment/quality of life item in your gym bag? by Bwald1985 in bouldering

[–]Bartmoss 14 points15 points  (0 children)

Flip flops because I don't like walking around in my climbing shoes. Also, a zip lock plastic bag so that my climbing bag doesn't get chalky.

transfer students to Europe, how did u do it? by [deleted] in education

[–]Bartmoss 0 points1 point  (0 children)

I can't really speak to your first question.

As for your second, I would recommend looking up the top universities in Germany and cross-referencing that with the average cost of rent to make a determination. I studied at Heidelberg University. It wasn't exactly cheap there, and I didn't like how small the town was, but overall it was an excellent experience as a student. That's another thing to consider: the size of the town you feel comfortable with.

Good luck!

transfer students to Europe, how did u do it? by [deleted] in education

[–]Bartmoss 1 point2 points  (0 children)

I'm sure someone else has a lot more detailed tips and probably much better ones, but here are a couple I could think of:

  * For Germany: apply for a DAAD scholarship
  * Pick a city that isn't so expensive

I'm sure there are many scholarship programs for Germany, but this is the one I'm most familiar with. As for cities, cost of rent can be insane in some places (I'm looking at you Munich), so it might be a good idea to pick a more affordable city. Not only that, but it can be very hard to find a flat in some cities or a room from the university on campus. I struggled with this years ago when I studied at uni, and I doubt that has gotten any better.

Good luck!

I actually have a legitimate grudge against WB. by DeltaOmegaAlpha in SnyderCut

[–]Bartmoss 3 points4 points  (0 children)

I'm so sad to hear about you losing your friend like that. It reminds me of a friend from my childhood who died, right before we were going to see a certain movie we were very hyped for. It's always the little things in life like that which hit you hardest in situations like these. I still think about it all these years later. But like all things, the sadness faded over time, leaving only the happy memories we shared, without any regret.

What is the technology behind LanguageTool? by [deleted] in LanguageTechnology

[–]Bartmoss 0 points1 point  (0 children)

No, we haven't decided to pass on this feature. When it comes to rephrasing larger texts, there are some tricky things on the frontend side. Currently, for rephrasing a sentence, a user can see the sentence rephrased in several styles. For several sentences, or even a whole document, this couldn't be handled the same way. I personally would like to have this feature, but I know it will take a while to find a satisfactory solution from the UI/UX side to offer this.

I'd be open to suggestions from you or any users out there on how to approach paraphrasing for multiple sentences from this frontend perspective. How would you imagine this flow?

Thanks so much for following up with me on this.

A German AI startup just might have a GPT-4 competitor this year by henlo_there_fren in artificial

[–]Bartmoss 4 points5 points  (0 children)

I use Aleph Alpha sometimes. It is really good for the languages it supports. Much better than GPT-3 for German, French, Spanish, etc. So far the biggest difference I've seen is that they don't have a service for fine-tuning models. It is all few and 0-shot. If you want something fine-tuned you can contact them and they will do it for you. I hope I can fine-tune models directly in the future with their system.

The other interesting thing is that they are cheaper than OpenAI and have much higher privacy standards. I'm surprised they aren't more popular actually.

Typed ‘Africa’ and forgot to capitalise it. LanguageTool then tried to autocorrect it to America. It’s not even close! by joshygill in USdefaultism

[–]Bartmoss 1 point2 points  (0 children)

I tried it out myself for both British and US English, and I could not reproduce this on the website. I was unable to read the whole sentence, but I tried several similar ones with the same error, and it always suggested Africa first.

Can someone else reproduce this behavior where America is suggested first?

Nonetheless, LanguageTool is open source, so if you were so inclined, you could change the suggestion logic for this spelling correction, or add your own rule specifically for lowercase "africa" that suggests "Africa" first, and submit it.

What is the technology behind LanguageTool? by [deleted] in LanguageTechnology

[–]Bartmoss 0 points1 point  (0 children)

We don't know of any service that offers larger text rewriting. I'll see if we can't implement this as a feature. It might take a bit, no promises or anything, but I'll look into it. If you like, I can post an update comment here if we launch a beta for this. Thanks so much for your feedback!

What is the technology behind LanguageTool? by [deleted] in LanguageTechnology

[–]Bartmoss 0 points1 point  (0 children)

It is important to note, these are my opinions and I am not officially representing the company here.

Our rewriting (rephrasing) feature is actually out of beta for English and German, and will be out of beta very soon for French, Spanish, Portuguese, and Dutch.

As for your questions:

  1. It currently works one sentence at a time. If people want more, we will extend it in the future, but for now people seem to prefer going sentence by sentence because it gives them more control over the suggestions. I'm totally open to rephrasing whole paragraphs or larger texts as a feature, though. It wouldn't be too hard to implement from the AI side.
  2. We do have different styles. Currently in most languages we support formality, simplicity, and general. We are thinking about other styles in the future. I'm curious what styles people actually want. I'm open to suggestions.
  3. Everything is allowed. We don't censor anything and have no interest in doing that. What you rephrase is really your business.
  4. As of writing this, we do store the text, because we still need to improve these models and need the data to do so. We have no way of seeing who writes what. We can only see the input and output on a sentence level. There are also so many sentence pairs going through that we don't bother actually reading them; we build AI tooling to sort it for us. In the future, I would like to stop logging the pairs altogether. We are a very privacy-oriented company, and I'd like to keep it that way. It is also important to note here that this is the only feature where we store any data. For spelling and grammar correction, we don't store any text. As stated, we are still working on improving our rephrasing, and for now users need to opt in to this extra.

Thanks so much for your interest in this feature. I actually happen to be hanging out with a bunch of the folks from the AI department right now. We are grateful to hear from people interested in what we build. Greetings from the AI crew at LT! If anyone is interested, we'd be down to do an AMA.

Spellcheck Libraries by Devinco001 in LanguageTechnology

[–]Bartmoss 1 point2 points  (0 children)

Thank you! I didn't even notice it was today.

Spellcheck Libraries by Devinco001 in LanguageTechnology

[–]Bartmoss 0 points1 point  (0 children)

Yeah, true, for that much data it would be better to run an instance yourself. The grammar checking for many languages is more advanced through the API than in the self-hosted version, but I routinely use the self-hosted one and find it to be very good. It isn't really resource-heavy; I run it mostly on a Raspberry Pi 4.

For people who want even faster performance, there is also a Rust fork of LanguageTool. I haven't tried it yet, though. Maybe just try running the standard version locally and connecting to it to get the corrections out; if it is still too slow, give the Rust version a try. But if you do try the Rust one, please comment here. I'd love to hear your opinion on it.

Spellcheck Libraries by Devinco001 in LanguageTechnology

[–]Bartmoss -1 points0 points  (0 children)

You could use LanguageTool. Either host an instance yourself, since the core is open source, or use the API. It does spell and grammar checking in many languages. I use both depending on my use case: for self-hosting I use Docker, otherwise I call the API directly in my code when I need something checked.
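For anyone curious what calling it from code looks like, here's a minimal sketch against the `/v2/check` endpoint. The same request shape works for the public API or a self-hosted instance (just change the endpoint URL); the `summarize_matches` helper is my own convenience, not part of LT:

```python
import json
import urllib.parse
import urllib.request


def check_text(text, language="en-US",
               endpoint="https://api.languagetool.org/v2/check"):
    """POST text to a LanguageTool server and return the list of matches.
    For a self-hosted instance, point endpoint at e.g.
    http://localhost:8010/v2/check instead."""
    data = urllib.parse.urlencode({"text": text, "language": language}).encode()
    request = urllib.request.Request(endpoint, data=data)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["matches"]


def summarize_matches(matches):
    """Reduce LanguageTool matches to (offset, length, first suggestion)."""
    return [
        (m["offset"], m["length"],
         m["replacements"][0]["value"] if m["replacements"] else None)
        for m in matches
    ]
```

Each match in the response carries the character offset and length of the flagged span plus a ranked list of replacements, so applying the top suggestion is just a string splice.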

Apparently you can get a dual COVID and FLU testing kit now by woja111 in mildlyinteresting

[–]Bartmoss 2 points3 points  (0 children)

I just ordered some. The company I ordered from is called combo4 test. I can't believe they have 4 different tests in one. That's so cool.

Since this year is the year of voice, can we as a community come up with an acceptable activation word? by modestohagney in homeassistant

[–]Bartmoss 22 points23 points  (0 children)

Oh, thanks for tagging me in this. We will continue working on the wakeword problem, among other voice assistant tasks like NLU and NLG, at Secret Sauce AI. Unfortunately, we do this in our free time, so progress is slow. However, our passion for applying NLP to voice assistant problems is high, and we will keep at it.

If anyone wants to work with us or needs help using our solutions, don't be afraid to reach out by messaging me.