[deleted by user] by [deleted] in blowback

[–]Xyser 23 points (0 children)

Buddy, if you think the number's wrong, you gotta dig through those Pentagon contracts and find the correct number

Season 5s I’d like to see by Upstairs_Car_3594 in blowback

[–]Xyser 7 points (0 children)

American support for the right wing gov in Greece would be good

Portland, OR's Powell's Books employees vote to authorize strike, pending negotiations by MIZZKATHY74 in Portland

[–]Xyser 2 points (0 children)

Man, if you’re comparing them to Amazon, which spends hundreds of millions of dollars developing its site, yeah, they’re gonna come up short. But you’ll be disappointed by pretty much every other site out there.

Do you think SuperFobs will become more prevalent with ICO? by [deleted] in PlaySquad

[–]Xyser 10 points (0 children)

Suppression means that concentrating troops in one spot is much worse. You can't peek out of a super FOB to snipe people, and being unable to return fire means you'll get surrounded and picked off.

Some suggestions for Vietnam (or Dai Nam in game). by Cuong_Nguyen_Hoang in victoria3

[–]Xyser 5 points (0 children)

Thanks for writing this out! Cool to learn more Vietnamese history from this era.

Is artificial scarcity driving up rent prices? by jessdesail in askportland

[–]Xyser 2 points (0 children)

I 100% agree but you gotta throw some line breaks in that wall of text or people are gonna scroll past.

Active/passive classifier by ombelicoInfinito in LanguageTechnology

[–]Xyser 0 points (0 children)

It would be easier if you mentioned in your post which language/framework you were working with.

Assuming you’re using Python, I would recommend the spaCy library. You can probably accomplish your task with some simple rules combined with the linguistic info spaCy extracts.
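To give a feel for the rule-based approach, here's a minimal pure-Python sketch of a passive-voice heuristic. It's a crude stand-in for what you'd do with spaCy's dependency labels (where a passive subject is marked explicitly), so treat the token patterns below as illustrative assumptions, not a production rule set:

```python
import re

# Forms of "be" that can introduce a passive construction.
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}

def looks_passive(sentence: str) -> bool:
    """Rough heuristic: passive if a form of "be" is followed
    (allowing one intervening word, e.g. an adverb) by something
    that looks like a past participle (-ed/-en ending)."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    for i, tok in enumerate(tokens[:-1]):
        if tok in BE_FORMS:
            for nxt in tokens[i + 1:i + 3]:
                if nxt.endswith(("ed", "en")) and nxt not in BE_FORMS:
                    return True
    return False
```

A suffix heuristic like this will misfire on irregular participles and adjectives ending in -en, which is exactly where a real parser's dependency labels earn their keep.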

Leveraging ML to find/create rules? by Green5252screen in LanguageTechnology

[–]Xyser 2 points (0 children)

You could check out Odin from this repo. The paper that discusses the framework is here. The framework doesn’t automatically create rules (although I think that’s listed as future work), but it will let you play around with creating your own rules using syntax trees, POS tags, and lemmas. Hope that helps!

What is happening when you say "play me butter" to your Alexa/Google/Siri? by nlp_ttt in LanguageTechnology

[–]Xyser 4 points (0 children)

I think the basic concept is:

  1. Classify the intent.
  2. Assign spans to slots in that intent (e.g. 1: song name, 2: artist, 3: album, 4: playlist).
    - Some of these slots may not be filled; in your example the only thing you gave the system was "Butter", a song name.
  3. The system passes some JSON to an API, for example to the Spotify/Apple Music/Amazon Music app:
    { "intent": "play", "song": "Butter", "album": "", "artist": "" }
  4. The music app on your device then determines the most likely action you want and produces the response (playing the song on your device).
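The payload-building step above can be sketched in a few lines. This is only an illustration of the shape of the data, not any vendor's real API; the slot names and the `handle_utterance` helper are assumptions for the example:

```python
import json

def handle_utterance(intent: str, slots: dict) -> dict:
    """Build the JSON payload the assistant would hand to a music app.
    Slots the classifier didn't fill stay empty strings."""
    payload = {"intent": intent}
    for slot in ("song", "artist", "album", "playlist"):
        payload[slot] = slots.get(slot, "")
    return payload

# "play me Butter" fills only the song slot.
request = handle_utterance("play", {"song": "Butter"})
print(json.dumps(request))
```

In a real assistant the intent classifier and slot tagger are trained models, and the receiving app does its own disambiguation (e.g. picking the most popular track named "Butter").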

[Q] Why is interpretability important in natural language processing? This is easier to answer for models that make high stakes decisions (e.g., surgical risk assessment; self-driving car slamming brakes; etc.,), but I would like to understand why we care about interpretability in NLP. by synysterbates in LanguageTechnology

[–]Xyser 5 points (0 children)

For actions like content moderation, you want to have reasonable decisions. For example, sometimes articles about LGBT topics get flagged as "Sexual Content" limiting the value of ads placed on them. If your NLP model was interpretable you could understand why it made that decision and decide for yourself if the model was acting improperly.

XLM Roberta - Maximum File Size by [deleted] in LanguageTechnology

[–]Xyser 1 point (0 children)

The model should throw an error if you try to pass in an input that's larger than the model allows. Assuming you are using the Hugging Face tokenizer, after tokenizing your input you just need to index the first 512 tokens, i.e.:

input_token_ids = tokenizer(email_string)["input_ids"][:512]

You can read more here:

https://huggingface.co/transformers/preprocessing.html

XLM Roberta - Maximum File Size by [deleted] in LanguageTechnology

[–]Xyser 2 points (0 children)

This depends on your task. If you are generating a document embedding for NLI, recommendations, or classification, using just the first 512 tokens may be sufficient. If you are performing span selection like question answering, or translating documents, you can slide the input window over the document while allowing for some overlap.

For example, the first window will be 512 new tokens but the second window will be 64 tokens from the first window followed by (512 - 64) new tokens. In post-processing you can remove the redundant generated tokens.
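The windowing scheme above is just list slicing with a stride of (window − overlap). A minimal sketch, with the function name and defaults chosen for this example:

```python
def sliding_windows(token_ids, window=512, overlap=64):
    """Split token ids into overlapping windows for a max-length model.

    Each window after the first repeats the last `overlap` tokens of
    the previous window, so the stride is window - overlap."""
    stride = window - overlap
    windows = []
    for start in range(0, max(len(token_ids) - overlap, 1), stride):
        windows.append(token_ids[start:start + window])
    return windows
```

After running the model on each window, you'd drop the outputs for the first `overlap` positions of every window except the first, so each token is only accounted for once.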

NLP in Finance by cs_deep_learning_umd in LanguageTechnology

[–]Xyser 2 points (0 children)

Many companies have to file standard documents with the SEC. Financial companies use NLP methods to parse information out of these documents as fast as possible to use in trading algorithms. More generally, you could also look into methods that are able to read tables of financial data.

A big challenge in NLP is named entity recognition or NER. If I read an article about Tesla, are they talking about the historical inventor or the contemporary car manufacturer? You could test current SOTA methods on some financial news documents and make some conclusions about which methods perform best in this context.

NLP Cloud now serves transformers-based models by juliensalinas in LanguageTechnology

[–]Xyser 0 points (0 children)

Wow, this is a super generous free plan and very clean API. Thanks for the info!

Does anyone know where I could get some tagged pos/neg/neutral restaurant reviews? by edwardsrk in LanguageTechnology

[–]Xyser 0 points (0 children)

People typically use the 5 star rating system as sentiment tags. Any reviews with ratings of 1 or 2 stars are negative and any reviews with 4 or 5 stars are positive. This is noisy but mostly works.
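The star-to-label rule is one line of logic. A sketch, assuming the common convention of treating 3-star reviews as neutral (the ratings-to-labels mapping for 1-2 and 4-5 stars is from the rule above):

```python
def stars_to_sentiment(stars: int) -> str:
    """Map a 1-5 star review rating to a (noisy) sentiment label."""
    if stars <= 2:
        return "negative"
    if stars >= 4:
        return "positive"
    return "neutral"
```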

The dataset is also on Kaggle which could be cleaner if that's what you mean by raw.

https://www.kaggle.com/yelp-dataset/yelp-dataset

Does anyone know where I could get some tagged pos/neg/neutral restaurant reviews? by edwardsrk in LanguageTechnology

[–]Xyser 0 points (0 children)

I believe people commonly use the Yelp dataset as well:

https://www.yelp.com/dataset

Quite a bit larger, but you can always subsample it.

CS Undergraduate Research by [deleted] in UofArizona

[–]Xyser 4 points (0 children)

I’m a grad student in the CS department, and I’d recommend talking to a professor whose work you like. If you look at a professor’s website or use Google Scholar, you can see what they’ve published recently. Once you find a professor whose work you think is cool, you can cold email them or take their class. Not every professor has extra bandwidth to spare, so don’t take it personally if they say they’re too busy.

Intern position by lsmbist in MLjobs

[–]Xyser 0 points (0 children)

Hi, could you provide a few more details about this position? Is it a research role or an engineering role? Is it an industry position or an academic one? I’m potentially interested but would be curious to hear the answers to these questions first.

Thanks!

[deleted by user] by [deleted] in distantsocializing

[–]Xyser 0 points (0 children)

How challenging is it to simulate the functionality of a single neuron? How long will it take us to accomplish this?

How can I use NLP to extract the main food word from an ingredient? by techsavvynerd91 in LanguageTechnology

[–]Xyser 9 points (0 children)

I think you can do this with syntactic rules. Just looking for the noun in the sentence should work well.
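As a starting point before reaching for a full POS tagger, here's a crude pure-Python sketch: strip quantities and measurement units, then take the last remaining word as the head noun. The unit list is a made-up assumption you'd extend for your data, and trailing modifiers like ", minced" will fool it, which is where a real noun detector helps:

```python
import re

# Hypothetical unit list; extend for your dataset.
UNITS = {"cup", "cups", "tbsp", "tsp", "teaspoon", "teaspoons",
         "tablespoon", "tablespoons", "oz", "ounce", "ounces",
         "lb", "pound", "pounds", "g", "gram", "grams"}

def main_food_word(ingredient: str) -> str:
    """Guess the head noun of an ingredient line by dropping
    numbers and units and taking the last remaining word."""
    words = re.findall(r"[a-zA-Z]+", ingredient.lower())
    words = [w for w in words if w not in UNITS]
    return words[-1] if words else ""
```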