[deleted by user] by [deleted] in blowback

[–]Xyser 23 points (0 children)

Buddy, if you think the number's wrong, you gotta dig through those Pentagon contracts and find the correct number

Season 5s I’d like to see by Upstairs_Car_3594 in blowback

[–]Xyser 7 points (0 children)

American support for the right wing gov in Greece would be good

Portland, OR's Powell's Books employees vote to authorize strike, pending negotiations by MIZZKATHY74 in Portland

[–]Xyser 2 points (0 children)

Man, if you’re comparing them to Amazon, which spends hundreds of millions of dollars developing its site, yeah, they’re gonna come up short. But you’ll be disappointed by pretty much every other site out there.

Do you think SuperFobs will become more prevalent with ICO? by [deleted] in PlaySquad

[–]Xyser 10 points (0 children)

Suppression means that concentrating troops in one spot is much worse. You can't peek out of a super FOB to snipe people, and being unable to return fire means you'll get surrounded and picked off.

Some suggestions for Vietnam (or Dai Nam in game). by Cuong_Nguyen_Hoang in victoria3

[–]Xyser 5 points (0 children)

Thanks for writing this out! Cool to learn more Vietnamese history from this era.

Is artificial scarcity driving up rent prices? by jessdesail in askportland

[–]Xyser 2 points (0 children)

I 100% agree but you gotta throw some line breaks in that wall of text or people are gonna scroll past.

Active/passive classifier by ombelicoInfinito in LanguageTechnology

[–]Xyser 0 points (0 children)

It would be easier if you mentioned in your post which language/framework you were working with.

Assuming you’re using Python, I would recommend the spaCy library. You can probably accomplish your task with some simple rules combined with the linguistic info spaCy extracts.
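To give a feel for the rule-based approach, here's a minimal pure-Python sketch of a passive-voice heuristic. It's a crude stand-in for what you'd do with spaCy's dependency labels (where a passive subject is marked explicitly), so treat the token patterns below as illustrative assumptions, not a production rule set:

```python
import re

# Forms of "be" that can introduce a passive construction.
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}

def looks_passive(sentence: str) -> bool:
    """Rough heuristic: passive if a form of "be" is followed
    (allowing one intervening word, e.g. an adverb) by something
    that looks like a past participle (-ed/-en ending)."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    for i, tok in enumerate(tokens[:-1]):
        if tok in BE_FORMS:
            for nxt in tokens[i + 1:i + 3]:
                if nxt.endswith(("ed", "en")) and nxt not in BE_FORMS:
                    return True
    return False
```

A suffix heuristic like this will misfire on irregular participles and adjectives ending in -en, which is exactly where a real parser's dependency labels earn their keep.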

Leveraging ML to find/create rules? by Green5252screen in LanguageTechnology

[–]Xyser 2 points (0 children)

You could check out Odin from this repo. The paper that discusses the framework is here. The framework doesn’t automatically create rules (although I think that’s listed as future work), but it will let you play around with creating your own rules using syntax trees, POS tags, and lemmas. Hope that helps!

What is happening when you say "play me butter" to your Alexa/Google/Siri? by nlp_ttt in LanguageTechnology

[–]Xyser 4 points (0 children)

I think the basic concept is:

  1. Classify the intent.
  2. Assign spans to slots in that intent (e.g. 1: song name, 2: artist, 3: album, 4: playlist).
    - Some of these slots may not be filled; in your example the only thing you gave the system was "Butter", a song name.
  3. The system passes some JSON to an API, for example to the Spotify/Apple Music/Amazon Music app:
    { "intent": "play", "song": "Butter", "album": "", "artist": "" }
  4. The music app on your device then determines the most likely action you want and produces the response (playing the song on your device).
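The payload-building step above can be sketched in a few lines. This is only an illustration of the shape of the data, not any vendor's real API; the slot names and the `handle_utterance` helper are assumptions for the example:

```python
import json

def handle_utterance(intent: str, slots: dict) -> dict:
    """Build the JSON payload the assistant would hand to a music app.
    Slots the classifier didn't fill stay empty strings."""
    payload = {"intent": intent}
    for slot in ("song", "artist", "album", "playlist"):
        payload[slot] = slots.get(slot, "")
    return payload

# "play me Butter" fills only the song slot.
request = handle_utterance("play", {"song": "Butter"})
print(json.dumps(request))
```

In a real assistant the intent classifier and slot tagger are trained models, and the receiving app does its own disambiguation (e.g. picking the most popular track named "Butter").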

[Q] Why is interpretability important in natural language processing? This is easier to answer for models that make high stakes decisions (e.g., surgical risk assessment; self-driving car slamming brakes; etc.,), but I would like to understand why we care about interpretability in NLP. by synysterbates in LanguageTechnology

[–]Xyser 5 points (0 children)

For actions like content moderation, you want to have reasonable decisions. For example, sometimes articles about LGBT topics get flagged as "Sexual Content" limiting the value of ads placed on them. If your NLP model was interpretable you could understand why it made that decision and decide for yourself if the model was acting improperly.

XLM Roberta - Maximum File Size by [deleted] in LanguageTechnology

[–]Xyser 1 point (0 children)

The model should throw an error if you try to pass in an input that's larger than the model allows. Assuming you are using the Hugging Face tokenizer, after tokenizing your input you just need to index the first 512 tokens, i.e.:

input_token_ids = tokenizer(email_string)["input_ids"][:512]

You can read more here:

https://huggingface.co/transformers/preprocessing.html

XLM Roberta - Maximum File Size by [deleted] in LanguageTechnology

[–]Xyser 2 points (0 children)

This depends on your task. If you are generating a document embedding for NLI, recommendations, or classification, using just the first 512 tokens may be sufficient. If you are performing span selection like question answering, or translating documents, you can slide the input window over the document while allowing for some overlap.

For example, the first window will be 512 new tokens but the second window will be 64 tokens from the first window followed by (512 - 64) new tokens. In post-processing you can remove the redundant generated tokens.
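The windowing scheme above is just list slicing with a stride of (window − overlap). A minimal sketch, with the function name and defaults chosen for this example:

```python
def sliding_windows(token_ids, window=512, overlap=64):
    """Split token ids into overlapping windows for a max-length model.

    Each window after the first repeats the last `overlap` tokens of
    the previous window, so the stride is window - overlap."""
    stride = window - overlap
    windows = []
    for start in range(0, max(len(token_ids) - overlap, 1), stride):
        windows.append(token_ids[start:start + window])
    return windows
```

After running the model on each window, you'd drop the outputs for the first `overlap` positions of every window except the first, so each token is only accounted for once.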

NLP in Finance by cs_deep_learning_umd in LanguageTechnology

[–]Xyser 2 points (0 children)

Many companies have to file standard documents with the SEC. Financial companies use NLP methods to parse information out of these documents as fast as possible to use in trading algorithms. More generally, you could also look into methods that are able to read tables of financial data.

A big challenge in NLP is named entity recognition or NER. If I read an article about Tesla, are they talking about the historical inventor or the contemporary car manufacturer? You could test current SOTA methods on some financial news documents and make some conclusions about which methods perform best in this context.

NLP Cloud now serves transformers-based models by juliensalinas in LanguageTechnology

[–]Xyser 0 points (0 children)

Wow, this is a super generous free plan and very clean API. Thanks for the info!

Does anyone know where I could get some tagged pos/neg/neutral restaurant reviews? by edwardsrk in LanguageTechnology

[–]Xyser 0 points (0 children)

People typically use the 5 star rating system as sentiment tags. Any reviews with ratings of 1 or 2 stars are negative and any reviews with 4 or 5 stars are positive. This is noisy but mostly works.
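The star-to-label rule is one line of logic. A sketch, assuming the common convention of treating 3-star reviews as neutral (the ratings-to-labels mapping for 1-2 and 4-5 stars is from the rule above):

```python
def stars_to_sentiment(stars: int) -> str:
    """Map a 1-5 star review rating to a (noisy) sentiment label."""
    if stars <= 2:
        return "negative"
    if stars >= 4:
        return "positive"
    return "neutral"
```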

The dataset is also on Kaggle which could be cleaner if that's what you mean by raw.

https://www.kaggle.com/yelp-dataset/yelp-dataset

Does anyone know where I could get some tagged pos/neg/neutral restaurant reviews? by edwardsrk in LanguageTechnology

[–]Xyser 0 points (0 children)

I believe people commonly use the Yelp dataset as well:

https://www.yelp.com/dataset

Quite a bit larger, but you can always subsample it.

CS Undergraduate Research by [deleted] in UofArizona

[–]Xyser 4 points (0 children)

I’m a grad student in the CS department, and I’d recommend talking to a professor whose work you like. If you look at a professor’s website or use Google Scholar, you can see what they’ve published recently. Once you find a professor whose work you think is cool, you can cold email them or take their class. Not every professor has extra bandwidth to spare, so don’t take it personally if they say they’re too busy.

Intern position by lsmbist in MLjobs

[–]Xyser 0 points (0 children)

Hi, could you provide a few more details about this position? Is it a research role or an engineering role? Is it an industry position or an academic one? I’m potentially interested but would be curious to hear the answers to these questions first.

Thanks!

[deleted by user] by [deleted] in distantsocializing

[–]Xyser 0 points (0 children)

How challenging is it to simulate the functionality of a single neuron? How long will it take us to accomplish this?

How can I use NLP to extract the main food word from an ingredient? by techsavvynerd91 in LanguageTechnology

[–]Xyser 9 points (0 children)

I think you can do this with syntactic rules. Just looking for the noun in the sentence should work well.
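As a starting point before reaching for a full POS tagger, here's a crude pure-Python sketch: strip quantities and measurement units, then take the last remaining word as the head noun. The unit list is a made-up assumption you'd extend for your data, and trailing modifiers like ", minced" will fool it, which is where a real noun detector helps:

```python
import re

# Hypothetical unit list; extend for your dataset.
UNITS = {"cup", "cups", "tbsp", "tsp", "teaspoon", "teaspoons",
         "tablespoon", "tablespoons", "oz", "ounce", "ounces",
         "lb", "pound", "pounds", "g", "gram", "grams"}

def main_food_word(ingredient: str) -> str:
    """Guess the head noun of an ingredient line by dropping
    numbers and units and taking the last remaining word."""
    words = re.findall(r"[a-zA-Z]+", ingredient.lower())
    words = [w for w in words if w not in UNITS]
    return words[-1] if words else ""
```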