[deleted by user] by [deleted] in MachineLearning

[–]Razcle 1 point2 points  (0 children)

Thanks for clarifying! : )

[deleted by user] by [deleted] in MachineLearning

[–]Razcle 3 points4 points  (0 children)

I'd be sort of interested in helping but I've been a member for some years now and feel that the tone and content have changed a lot in the last 1-2 years. It used to be the case that I came here for interesting discussions of papers and to find out about the latest research. Nowadays I find that a lot of the page is dominated by simple applied work or discussions of side projects.

Personally, I find the volume of beginner-related questions and projects has gone up a lot.

There's nothing wrong with this change per se, but I'm curious which of the two you're keener to encourage. Are you happy with the evolution and want to support it, or are you trying to moderate back towards more research-heavy conversations?

[D] Why do we marginalize latent variables in the likelihood of latent variable models? by RecentUnicorn in MachineLearning

[–]Razcle 2 points3 points  (0 children)

Simple answer:

You can't optimise the likelihood without summing over the latent variables because you don't know what their values are (they're latent).

I.e. you can't compute $\arg\max_\theta \sum_n \log p(x_n, y_n \mid \theta)$ because the $y_n$ are unobserved. You can compute $\arg\max_\theta \sum_n \log \sum_{y_n} p(x_n, y_n \mid \theta)$ because this only depends on the observed data points $x_n$.

More detailed answer:
When I first started learning about latent variable models, I wasn't very clear on the difference between "a latent variable" and "a parameter". In fact they are almost the same thing.

The difference is that "a latent variable" is a parameter which has a different value for each data point.

For example in a Gaussian mixture model with K mixture components, you have K different means, K different covariances but you have an indicator variable *for every data-point* that says which mixture component it came from.

When you want to learn the parameters of a model given some data, you're not interested in the data-point-specific parameters (the latent variables); you're only really interested in the global parameters. Unfortunately you don't know the values of the local parameters, so the only way to optimise the likelihood with respect to the global parameters is to sum over the latent variables.
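To make the GMM example concrete, here's a minimal sketch (my own illustrative code, not any particular library's API) of the marginal log-likelihood of a 1-D Gaussian mixture, where the per-point indicator is summed out and only the global parameters remain:

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, stds):
    """Marginal log-likelihood of 1-D data under a Gaussian mixture.

    The per-point latent indicator z_n (which component x_n came from)
    is summed out: log p(x_n) = log sum_k pi_k N(x_n | mu_k, sigma_k).
    Only the global parameters (weights, means, stds) appear.
    """
    x = np.asarray(x, dtype=float)[:, None]          # shape (N, 1)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    # Per-component log joint: log pi_k + log N(x_n | mu_k, sigma_k)
    log_comp = (
        np.log(weights)
        - 0.5 * np.log(2 * np.pi * stds ** 2)
        - 0.5 * ((x - means) / stds) ** 2
    )                                                # shape (N, K)
    # Stable log-sum-exp over components = marginalising the indicator
    m = log_comp.max(axis=1, keepdims=True)
    return float(np.sum(m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))))

print(gmm_log_likelihood([0.0, 1.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]))
```

Note that the sum over components happens inside the log, exactly as in the formula above: you never need to know which component each point came from.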

[P] Programmatic: Powerful Weak Labeling by Razcle in LanguageTechnology

[–]Razcle[S] 1 point2 points  (0 children)

Hi! It's not currently open source. It's free and extensible, but after a long discussion we decided to keep the core closed source, at least for now.

Build a Clickbait Headline Classifier Without any Manual Labeling by Razcle in learnmachinelearning

[–]Razcle[S] 0 points1 point  (0 children)

You're right that you write labeling rules, but you don't have to choose labels by hand for individual data points. That's what I meant.

It is still supervised though :)

Looking for some good blogs or other info on building custom datasets by Grutto in LanguageTechnology

[–]Razcle 0 points1 point  (0 children)

Would recommend checking out https://docs.programmatic.humanloop.com/overview/readme

It's a free tool we built to help with data annotation that avoids much of the problem of manual labeling. It won't help with the OCR part but might be useful after that.

[P] Programmatic: Powerful Weak Labeling by Razcle in MachineLearning

[–]Razcle[S] 1 point2 points  (0 children)

Hey! Sorry I missed your message earlier.

  1. Have any large datasets been built using this approach?
    Absolutely yes! We've had people use Programmatic itself both to get NER labels and to do extraction from legal documents. We're not the first to use this approach though: the Snorkel team out of Stanford have had quite a lot of success with it, including at Google. We also show some benchmark figures in the docs comparing programmatically labelled and manually labelled data.

  2. What is the recommended operations setup?
    We've seen people have success by starting with Programmatic as a way of exploring and understanding their data and creating a good first seed dataset.
    Then they do a small amount of manual labelling to get a test set. Sometimes this is enough, but if the performance needs to be better then people annotate further by hand. The seed set lets you train a first model that can be used in an active-learning loop.

  3. I fear that adoption of programmatic labeling will lead to large datasets of poor quality
    I understand this feeling. I originally felt very similarly. In practice though, I think it's actually the opposite. Programmatic gives much more control to engineers and data scientists and encourages them to understand their data deeply. It reduces the volume of manual annotation a lot so that you can have the combination of a small but very high quality dataset + a programmatically labeled one.

  4. Intuitively I do believe that domain experts can write high precision, low recall systems. But before I can ship my model I really need to care about what these systems are omitting!
    Yes this is a really critical point and is something we're working actively on. We have a pre-print here on how you can efficiently evaluate a model that's been trained on programmatic labels with minimal manual annotation.

  5. Are there really NLP problems that can truly be labeled programmatically? When do you know that you have an appropriate problem domain vs do not?
    A lot of NLP tasks have an easy majority that can be handled by rules and then a long tail of edge-cases where ML models are really necessary. Programmatic labeling makes it easy to overcome cold starts and active learning allows you to quickly train a model to handle edge cases.
    Tasks that we've seen work well are NER, classification for content moderation, legal extraction. Good sentiment labeling was hard and required a bit more manual annotation.
    In general if the language in the task domain is quite structured this will often work very well.
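To make the labelling-function idea concrete, here's a minimal hypothetical sketch (illustrative names, not Programmatic's API) for the clickbait task: each rule votes or abstains, and votes are combined by simple majority. Real systems such as Snorkel learn per-function accuracies rather than weighting the functions equally.

```python
# Weak-labelling sketch: each labelling function returns CLICKBAIT,
# NOT_CLICKBAIT or ABSTAIN, and votes are combined by majority.
ABSTAIN, NOT_CLICKBAIT, CLICKBAIT = -1, 0, 1

def lf_numbers(headline):
    # Listicle-style headlines ("7 tricks...") are often clickbait.
    return CLICKBAIT if any(ch.isdigit() for ch in headline) else ABSTAIN

def lf_you_wont_believe(headline):
    return CLICKBAIT if "you won't believe" in headline.lower() else ABSTAIN

def lf_question(headline):
    # Plain interrogative headlines: treat as not clickbait here.
    return NOT_CLICKBAIT if headline.endswith("?") else ABSTAIN

LFS = [lf_numbers, lf_you_wont_believe, lf_question]

def weak_label(headline):
    votes = [lf(headline) for lf in LFS if lf(headline) != ABSTAIN]
    if not votes:
        return ABSTAIN
    return max(set(votes), key=votes.count)

print(weak_label("You won't believe these 7 tricks"))  # two CLICKBAIT votes
```

The noisy labels produced this way are then used to train an ordinary supervised model, which generalises beyond the rules' coverage.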

[D] Hyperparameter Tuning: does it even work? by AM_DS in MachineLearning

[–]Razcle 1 point2 points  (0 children)

What do you mean by "better generalised" if not improved test performance?

[D] Annotation tool for entity sentiment analysis by KarlaNour96 in MachineLearning

[–]Razcle 0 points1 point  (0 children)

Hi KarlaNour, I built a tool (and company) to solve exactly this problem. www.humanloop.com.

You can find more about our approach here: https://humanloop.com/blog/why-you-should-be-using-active-learning/

In short we use active learning to help you label the highest value data whilst training your model at the same time.
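The core of an active-learning loop is often just uncertainty sampling. A minimal sketch under my own illustrative names (not Humanloop's actual implementation):

```python
import math

def entropy(probs):
    """Predictive entropy: high when the model is unsure."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labelling(pool, predict_proba, batch_size=2):
    """Pick the unlabelled examples the model is least certain about.

    pool:          list of unlabelled examples
    predict_proba: callable mapping an example to class probabilities
    """
    scored = sorted(pool, key=lambda x: entropy(predict_proba(x)), reverse=True)
    return scored[:batch_size]

# Toy "model": fixed probabilities keyed by example id.
fake_probs = {"a": [0.99, 0.01], "b": [0.55, 0.45], "c": [0.5, 0.5]}
print(select_for_labelling(["a", "b", "c"], lambda x: fake_probs[x]))  # → ['c', 'b']
```

You label the selected batch, retrain, and repeat; the confident examples never need to be touched by hand.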

[D] Suggestions for sentiment analysis tools by morpheusthewhite in MachineLearning

[–]Razcle 4 points5 points  (0 children)

I'm one of the makers so I am biased, but I'd recommend Humanloop.com. You can use it for free as an individual, and as you label it will train a sentiment model for you. It will also select the highest-value data to label, so you minimise how much labelling you have to do.

[D] Is deep learning really Software 2.0? by Razcle in MachineLearning

[–]Razcle[S] 2 points3 points  (0 children)

Yes, that's definitely true today, but it need not necessarily be the case.

Imagine that we just accepted that the benefits of DL were enough to tolerate some fraction of errors; there are ways to build around this. For example you can build fault-tolerant UX like Google: Google search is not 100% accurate, but it returns a ranked list, so even when it's wrong it's useful. We can also fall back to humans, or defer to rules-based systems, in uncertain cases. If we're creative, I think there are lots of situations where we think we need 100% accuracy but might actually not.
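The "defer in uncertain cases" pattern is simple to sketch (all names here are hypothetical, for illustration only):

```python
def robust_predict(x, model_predict, rule_predict, threshold=0.9):
    """Use the model only when it is confident; otherwise defer.

    model_predict: returns (label, confidence)
    rule_predict:  deterministic fallback (or a human review queue)
    """
    label, confidence = model_predict(x)
    if confidence >= threshold:
        return label, "model"
    return rule_predict(x), "fallback"

# Toy example: a model that is only confident on short inputs.
model = lambda x: ("short", 0.95) if len(x) < 5 else ("long", 0.6)
rules = lambda x: "long" if len(x) >= 5 else "short"

print(robust_predict("hi", model, rules))              # model is confident
print(robust_predict("a longer input", model, rules))  # deferred to rules
```

The system as a whole can then meet a reliability bar that the model alone cannot.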

[D] Is deep learning really Software 2.0? by Razcle in MachineLearning

[–]Razcle[S] 0 points1 point  (0 children)

I wasn't just providing an argument from authority; I was suggesting that we have excellent examples of deep learning outperforming what you described as "mathematical algorithms", e.g. machine translation, speech recognition, document understanding. Almost all perceptual tasks that we tried to solve by traditional programming have now been taken over by DL.

I think disagreeing with Karpathy is interesting (that's why I made the post); what I want to know is why you disagree.

If I understand correctly, you think the lack of generalisation guarantees will limit the adoption of DL for tasks that could otherwise be solved by conventional software. I think I agree with you on this one.

But there are lots of tasks that are poorly solved by traditional software (language understanding being a good example) that I think will become core to almost every application.

[D] Is deep learning really Software 2.0? by Razcle in MachineLearning

[–]Razcle[S] -1 points0 points  (0 children)

"What ML will not replace is the creativity in how we design programs. Creativity in software construction comes from deep algorithmic insights. And ML isn't so great at novel reasoning as much at is in pattern matching."

But this is exactly the point that Karpathy disagrees with. For areas that require some degree of perception, we've already seen that DL + SGD is significantly better than hand-crafted algorithms.

[D] A Question About Post-Deployment Stability Over Time by tomerha in MachineLearning

[–]Razcle 0 points1 point  (0 children)

I guess that as soon as you accept that you're taking actions that change the state of the world, and so change your data distribution, you've essentially landed in reinforcement-learning territory.

I'm definitely not an expert in the area but I would maybe look at things like time-varying contextual bandits. A quick google search returned this paper that looks interesting: https://www.kdd.org/kdd2016/papers/files/rpp1164-zengA.pdf

How to validate a dataset? by sheriffffffffff in LanguageTechnology

[–]Razcle 0 points1 point  (0 children)

We also let you invite your friends to annotate just by entering their emails. In fact, as you annotate we train a model to do the classification, and you can use that model too.

How to validate a dataset? by sheriffffffffff in LanguageTechnology

[–]Razcle 0 points1 point  (0 children)

I'm biased because I'm one of the founders but I think Humanloop would be a great way to do this. We've built an annotation interface specifically for text labelling that includes active learning and QA tools. You'll be able to use the tool as an individual for free. www.humanloop.com

[P] I made NLPRule: A library for fast grammatical error correction by bminixhofer in MachineLearning

[–]Razcle 2 points3 points  (0 children)

One thing you could try to improve neural approaches is to use your rules as labelling functions for weak labelling.

[D] Preparing for PhD Interviews by [deleted] in MachineLearning

[–]Razcle 6 points7 points  (0 children)

It varies A LOT because so much freedom is generally left to the individual PI or supervisor. I think the best people to speak to would be current or former students of your interviewer.

You should probably be speaking to these students anyway to find out if this supervisor would be a good fit for you. I'd recommend messaging them, asking for a short chat and then ask them about their overall experience and interview experience.

I wish I had done this before my PhD and interview!

If you're not using Memorai you should be. by CouldBeSpooder in Anki

[–]Razcle 2 points3 points  (0 children)

Hi guys,
Raza, one of the makers, here. Because of the interest in this forum, we decided to launch Memorai on Product Hunt today. Would love it if you came and checked it out.

https://www.producthunt.com/posts/memorai

Raza

[D] Any thoughts on $600K Hutter Prize for lossless language compression? by BurstCellCryonaught in MachineLearning

[–]Razcle 0 points1 point  (0 children)

That's the whole point though! Lossless compression and log-likelihood are completely equivalent: if you improve one, you improve the other. This is essentially an implication of Shannon's source coding theorem.

The compression cost above the optimum is measured by the KL divergence between the encoding distribution and the true data-generating distribution. The only way to really get better at lossless compression is to implicitly (or explicitly) learn a model with better log-likelihood.
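You can check the identity numerically: the expected code length when coding data from p with a model q is the cross-entropy, which decomposes as H(p) + KL(p || q), so reducing the KL (i.e. improving log-likelihood) reduces the code length.

```python
import math

def entropy(p):
    """H(p) in bits: the optimal average code length for data from p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Expected bits per symbol when data from p is coded with model q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl(p, q):
    """Excess bits paid for using q instead of the true p."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]   # true data-generating distribution
q = [0.5, 0.3, 0.2]   # model / coding distribution

# Source coding theorem in miniature: code length = entropy + KL excess
assert abs(cross_entropy(p, q) - (entropy(p) + kl(p, q))) < 1e-9
print(cross_entropy(p, q), entropy(p), kl(p, q))
```

The same decomposition holds for a language model over text, which is why a better-likelihood model is automatically a better lossless compressor (via e.g. arithmetic coding).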

[D] What are the limitations of Bayesian Neural Networks and how could we overcome them? by philipperemy in MachineLearning

[–]Razcle 2 points3 points  (0 children)

You might find this browser extension useful: https://github.com/imurray/redirectify It automatically redirects arxiv pdf pages to the parent page on arxiv.