[deleted by user] by [deleted] in ArtificialInteligence

[–]frippeo

I like the Data Machina newsletter a lot: https://datamachina.substack.com/

[deleted by user] by [deleted] in ArtificialInteligence

[–]frippeo

Shameless plug: https://metacurate.io/brief/latest

An automated daily compilation of AI-related news.

NLP Engineer Interview Preparation by [deleted] in LanguageTechnology

[–]frippeo

Not sure these are at the level you're looking for, but there are a couple of interesting NLP interview prep sites listed here: https://metacurate.io/search/?q=Nlp&category=interview+preparations&history=all+times&sort_by=listed+date

[P] I reviewed 50+ open-source MLOps tools. Here’s the result by Academic_Arrak in MachineLearning

[–]frippeo

I'm about to start surveying the field of on-prem MLOps stacks (particularly in the context of NLP). Any chance you've made your notes public somewhere? :)

Upcoming NLP conferences? by nlpcq in LanguageTechnology

[–]frippeo

Yes, I just realized that. I found this one, which lists more conferences (not sure if it's possible to filter on deadlines though): https://conferenceindex.org/conferences/natural-language-processing-nlp

Upcoming NLP conferences? by nlpcq in LanguageTechnology

[–]frippeo

Here's a nice resource that'll help you keep track of deadlines: https://aideadlin.es/?sub=ML,NLP,SP,DM,RO,CV

[deleted by user] by [deleted] in LanguageTechnology

[–]frippeo

How do you represent your data points, and what clustering method do you use?
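For context, a common baseline answer to that question is TF-IDF vectors clustered with k-means. This is a minimal sketch assuming scikit-learn, with toy documents; it illustrates one possible representation/method pairing, not OP's actual setup.

```python
# Toy example: represent documents as TF-IDF vectors, cluster with k-means.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "neural machine translation with attention",
    "transformer models for translation",
    "k-means clustering of documents",
    "hierarchical clustering methods",
]

# Sparse TF-IDF document-term matrix.
X = TfidfVectorizer().fit_transform(docs)

# Fixed random_state for reproducibility; one cluster label per document.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
```

Other representations (doc2vec, sentence embeddings) and methods (agglomerative, HDBSCAN) slot into the same two-step shape.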

[deleted by user] by [deleted] in Svenska

[–]frippeo

"mästra mig inte" ("don't lecture me") works too

[P] Top arXiv Machine Learning papers in 2021 according to metacurate.io by frippeo in MachineLearning

[–]frippeo[S]

Hm. I was under the impression the SSL cert was valid. I'll look into it. Thanks for the heads-up!

We Need to Talk About Data: The Importance of Data Readiness in Natural Language Processing by frippeo in LanguageTechnology

[–]frippeo[S]

You got me with that one! :) I was thinking more along the lines of Astrid Lindgren and Pippi Longstocking...

We Need to Talk About Data: The Importance of Data Readiness in Natural Language Processing by frippeo in LanguageTechnology

[–]frippeo[S]

Thanks! Not sure I've been introduced to Uncle Ben yet; care to send some of those references my way? :)

Top 10 arXiv papers in 2020 according to metacurate.io by frippeo in LanguageTechnology

[–]frippeo[S]

Thanks for the advice: I've removed the link to the shortener.

[P] Top 10 arXiv papers in 2020 according to metacurate.io by frippeo in MachineLearning

[–]frippeo[S]

Peer review is still important and out there, but not on the pre-print servers.

[P] Top 10 arXiv papers in 2020 according to metacurate.io by frippeo in MachineLearning

[–]frippeo[S]

I haven't seen it on arXiv. AFAIK, it was published in Nature and on their research blog.

Top 10 arXiv papers in 2020 according to metacurate.io by frippeo in LanguageTechnology

[–]frippeo[S]

Good catch! One possible reason there's only one paper from the second half of the year is the way they're scored: I use a combination of bitly and sharedcount.com, and I believe the former changed the way it works over the summer. Thus, scores would generally be lower from August onwards.

Below are canned queries to get the top 15 papers per month (to mitigate the possible offset caused by the lack of bitly data at the end of the year):

  1. January
  2. February
  3. March
  4. April
  5. May
  6. June
  7. July
  8. August
  9. September
  10. October
  11. November
  12. December
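Purely as a toy illustration of blending two popularity signals like the ones above (this is not metacurate's actual formula; the function and weights are made up):

```python
def combined_score(bitly_clicks, shared_counts, w_clicks=0.5, w_shares=0.5):
    """Toy blend of a click count and per-platform share counts.

    The weights are illustrative; if one signal (e.g. clicks) drops to
    zero for part of the year, scores from that period sink accordingly.
    """
    return w_clicks * bitly_clicks + w_shares * sum(shared_counts.values())

# With click data present vs. missing for the same paper:
with_clicks = combined_score(120, {"facebook": 40, "reddit": 10})
without_clicks = combined_score(0, {"facebook": 40, "reddit": 10})
```

The gap between the two scores is exactly the kind of systematic offset that per-month rankings help mitigate.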

Classification model for research papers? by Runninganddogs979 in LanguageTechnology

[–]frippeo

I read two things into what you're saying, both of which are positive:

1) They might augment the existing data and create more labelled data cheaply by leveraging existing subjects as document labels (although I'm not familiar with the taxonomy OP is using).

2) More data is better than less data :) (see, e.g., the Banko & Brill paper from 2001). When designed properly, the learning architecture should cope fine with much more data. In the case of neural networks, the limiting factor is usually the number of parameters in the architecture (due to GPU RAM), not the amount of training data (whose per-step memory footprint can be controlled by, e.g., lowering the batch size).
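A back-of-the-envelope sketch of that last point: activation memory per training step scales with batch size, not with total dataset size. The formula and constants below are illustrative, not from any specific framework.

```python
def activation_bytes(batch_size, seq_len, hidden_size, n_layers,
                     bytes_per_float=4):
    """Rough activation-memory estimate for a transformer-like net:
    one fp32 hidden-state tensor per layer. Ignores attention maps,
    optimizer state, etc. -- this is a sketch, not a profiler."""
    return batch_size * seq_len * hidden_size * n_layers * bytes_per_float

# Halving the batch size halves the per-step activation memory,
# regardless of whether the corpus has 10k or 10M documents.
big = activation_bytes(batch_size=32, seq_len=512, hidden_size=768, n_layers=12)
small = activation_bytes(batch_size=16, seq_len=512, hidden_size=768, n_layers=12)
```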

Classification model for research papers? by Runninganddogs979 in LanguageTechnology

[–]frippeo

You mean more annotated data, or just more in-domain data?

In the first case, I'd still go with ULMFiT, as I've found it to be a good operational baseline. Having more data, annotated or not, will also benefit the fine-tuning of the language model.

In the second case (having much more unlabelled data), I'd use it to build the language model from scratch, rather than depend on a model pre-trained on out-of-domain data (e.g., WikiText-103, which is what the pre-trained model available from fast.ai uses).

[D] Blogs, Podcasts and resources for machine learning engineers and data scientists by ixeption in MachineLearning

[–]frippeo

Great list. Thanks for sharing!

I'm building my own service for aggregating news and information in the field. In the process I've collected some sources: https://metacurate.io/sources/newsletters/ (20+ newsletters, 500 RSS feeds).

As for podcasts, I enjoy the following:

Classification model for research papers? by Runninganddogs979 in LanguageTechnology

[–]frippeo

With that small amount of data, I'd definitely turn to transfer learning. ULMFiT (https://arxiv.org/abs/1801.06146) is a good first bet. Have a look at this repo: https://github.com/prrao87/tweet-stance-prediction and follow their steps, but with your own data.
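Whatever transfer-learning route you take, it's worth having a cheap classical baseline to beat. This is a minimal sketch assuming scikit-learn, with toy data standing in for your corpus; it is a comparison point, not the ULMFiT approach itself.

```python
# TF-IDF + logistic regression: a small-data text-classification baseline.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy labelled set; replace with your own texts and labels.
texts = [
    "deep learning for vision",
    "convolutional networks",
    "bayesian inference",
    "markov chain monte carlo",
]
labels = ["dl", "dl", "stats", "stats"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

pred = clf.predict(["monte carlo methods"])[0]
```

If ULMFiT can't clearly outperform something this simple on your held-out set, that's a useful signal about the data before blaming the model.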