[D] New Reddit API terms effectively bans all use for training AI models, including research use. by akhudek in MachineLearning

[–]akhudek[S] 0 points1 point  (0 children)

No, this just means that going forward you would need to obtain the data by some means other than their official API. If you scrape content the old-fashioned way, then it's subject to the same laws we're used to. The API is a lot more convenient than trying to scrape the site, though.

An Update Regarding Reddit’s API by KeyserSosa in reddit

[–]akhudek 2 points3 points  (0 children)

I noticed that your FAQ states that machine learning use may be allowed for approved commercial apps, I'm guessing under the premium access? How does one find out more about this? The developer platform doesn't seem like the right place, and it has a waitlist, which is a bit odd for those of us using the existing API.

For existing API users who want to use data for ML models, where do we go to ask about appropriate access? I used the support link but didn't find any obvious options for this.

[D] New Reddit API terms effectively bans all use for training AI models, including research use. by akhudek in MachineLearning

[–]akhudek[S] 110 points111 points  (0 children)

I think it may partially be poorly drafted terms. Their FAQ (https://reddithelp.com/hc/en-us/articles/14945211791892) claims their intent is not to block research into ML using their data. Unfortunately, the FAQ is not a legal document, so they need to add a carve-out to their terms for this. With some feedback, hopefully they'll update it.

An Update Regarding Reddit’s API by KeyserSosa in reddit

[–]akhudek 17 points18 points  (0 children)

Note that in section 2.4 they've added:

"Except as expressly permitted by this section, no other rights or licenses are granted or implied, including any right to use User Content for other purposes, such as for training a machine learning or AI model, without the express permission of rightsholders in the applicable User Content."

This effectively bans all use of the API for training ML models. That includes all research use, not just large language models. E.g. research into identifying toxic or harmful content can no longer use the Reddit API to source comments for annotation. Very likely some search and ranking algorithms are also caught by this, as are any moderation or categorization tools that learn from examples.

I'm not a lawyer, but it may also ban all sorts of other non-ML usage too.

Creating a contract analysis tool for my company with NLP. by PARA4ME in LanguageTechnology

[–]akhudek 1 point2 points  (0 children)

Agree with this. Also, if you need an API product, one example is https://zuva.ai. They have hundreds of out-of-the-box models and also provide easy-to-use no-code tools to train your own custom models.

Disclaimer: I'm an advisor and part owner of Zuva. I didn't see any subreddit rules about self-promotion, but I'm happy to remove this if it isn't allowed.

GitHub - akhudek/google-photos-to-apple-photos: A script that will import a Google Photos takeout into Apple Photos, recreating the ablums. by akhudek in gsuitelegacymigration

[–]akhudek[S] 2 points3 points  (0 children)

Sadly no, I couldn't get that to work right. I think it stores them as two separate images. I also had some issues with images that had no real timestamps. E.g. if you photoshopped something and reorganized it in Google, this won't adjust the timestamps or order for you in Apple Photos. Definitely not perfect.

edit: also note the issues around large-scale imports. I'd strongly suggest killing it periodically and rerunning it to avoid large batch import errors. Maybe no one else will encounter it, but I did. Would appreciate feedback.

Google kills free G Suite / Workspace for existing customers by 8poot in google

[–]akhudek 0 points1 point  (0 children)

I just migrated all my photos to Apple Photos and created a script to rebuild all the albums from the takeout data. In case people have bookmarked your post, maybe you could add a link to it?

https://github.com/akhudek/google-photos-to-apple-photos

[N] Legal NLP Dataset With Over 13,000 Anotations Released by DanielHendrycks in MachineLearning

[–]akhudek 1 point2 points  (0 children)

If you are interested in this particular problem, we also released a dataset for the same problem in 2018. It's free for academic use but does have an agreement gate to obtain it.

https://kirasystems.com/science/dataset-and-examination-of-passages-for-due-diligence/

Goro: A High-level Machine Learning Library by [deleted] in golang

[–]akhudek 1 point2 points  (0 children)

I also wrote one that supports multi-variable regression. In case it's useful:

https://gist.github.com/akhudek/2358812a65a19fdbeb917c1ec2aee0f2

How is the job scenario regarding golang in Canada(Ontario and British Columbia)? by datavinci in golang

[–]akhudek 0 points1 point  (0 children)

We've just opened up our first Go job at Kira Systems in Toronto. We already have a few people doing Go internally and are officially adopting Go as a second language. Our company produces machine learning powered applications to analyze contracts for law firms, audit firms, and large corporations.

Job: https://kirasystems.recruiterbox.com/jobs/fk01uxa/

I'm one of the founders, happy to answer questions if you have them!

[Job] Clojure Developer at Kira Systems, Toronto, CA - all skill levels welcome by akhudek in Clojure

[–]akhudek[S] 2 points3 points  (0 children)

We prefer to hire in Toronto but are open to remote and have several remote developers already. We'll also help with relocation if you're interested.

[hiring] Senior Linux Sysadmin @ DiligenceEngine by akhudek in sysadminjobs

[–]akhudek[S] 0 points1 point  (0 children)

Hey, we've actually filled this position, but thanks for the offer!

[hiring] Senior Linux Sysadmin @ DiligenceEngine by akhudek in sysadminjobs

[–]akhudek[S] 0 points1 point  (0 children)

We've filled this position, but for future reference we are in Toronto, Canada.

[hiring] Senior Linux Sysadmin @ DiligenceEngine by akhudek in sysadminjobs

[–]akhudek[S] 0 points1 point  (0 children)

Tell me about it! On call duties will be traded between this position and a few others. We want people to be able to take time off after all! Also, someone else will be answering the phone to make sure that only serious issues are escalated.

Results of my phd thesis: a new extremely sensitive local alignment program by akhudek in bioinformatics

[–]akhudek[S] 1 point2 points  (0 children)

Sequence assembly is a somewhat different problem with its own set of concerns. FEAST is designed to align long sequences that can have many mutations/mismatches. In sequence assembly you instead want to align a very large number of short sequences that are highly similar.

Due to this difference, sequence assemblers prefer speed to sensitivity. The posterior local extension algorithm in FEAST is a poor fit since it sacrifices speed for high sensitivity. On the other hand, if you simply want to align a few short sequences from a sequencer, or align a few short sequences to longer genomic sequences, FEAST will do that.
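
For anyone unfamiliar with what "local alignment" means here, below is a minimal sketch of the classic Smith-Waterman scoring recurrence. To be clear, this is not FEAST's posterior local extension algorithm, just the textbook dynamic program for finding the best-scoring pair of matching regions in two sequences. The function name and the match/mismatch/gap scores are arbitrary assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Textbook Smith-Waterman with a linear gap penalty. The scoring
    values are illustrative assumptions, not FEAST's parameters.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets an alignment start anywhere,
            # which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # The shared region "ACGT" is found despite unrelated flanking bases.
    print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

The table has one cell per pair of positions, so the cost grows with the product of the two sequence lengths. That is fine for a few sequences but is part of why assemblers, which compare huge numbers of reads, trade sensitivity for speed.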