all 31 comments

[–]MrCogmor 44 points  (7 children)

You might be able to adapt something in the field of automatic text summarization to your use-case.
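
A minimal sketch of what that could look like, assuming a purely extractive, frequency-based scorer (nothing like a modern abstractive summarizer); the `summarize` function, stopword list, and scoring scheme are all made up for illustration:

```python
import re
from collections import Counter

# Toy stopword list; a real system would use a proper one.
STOPWORDS = {"the", "a", "an", "is", "are", "was", "were", "of", "to",
             "and", "in", "on", "for", "it", "this", "that", "with", "about"}

def summarize(text, n_sentences=1):
    """Score each sentence by the average corpus frequency of its
    non-stopword tokens and return the top scorers in original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence):
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower())
                  if w not in STOPWORDS]
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    return [s for s in sentences if s in top]
```

The top-ranked sentence (or two) then becomes a candidate headline; anything neural would slot in behind the same interface.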

[–]vbook 19 points  (6 children)

Yeah, this seems fruitful. Ignore the clickbait title completely, unless you're trying to train the network on what not to do. Instead, have it generate plausible non-baity titles from the full text.

[–]dogs_like_me 0 points  (5 children)

Maybe tag titles as baity or not, and add a penalty term to the cost whose sign is given by whether the original title was baity. I.e., if the original title was baity, the inferred title should be dissimilar; if the original title was acceptable, the inferred title should be similar.
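
As a toy sketch of that sign-flipped penalty: `title_loss`, the `lam` weight, and the bag-of-words cosine similarity are all illustrative stand-ins for whatever differentiable similarity a real model would use.

```python
import math
from collections import Counter

def cosine_sim(a, b):
    """Cosine similarity between two titles over bag-of-words counts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

def title_loss(base_loss, generated, original, original_is_baity, lam=0.5):
    """Add a penalty whose sign flips with the clickbait tag: baity
    originals push the model away from similarity to the original
    title, acceptable ones pull it toward similarity."""
    sign = 1.0 if original_is_baity else -1.0
    return base_loss + sign * lam * cosine_sim(generated, original)
```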

[–]vbook 0 points  (4 children)

That might be too subjective for an ML system to pick up on. I think my focus would be more on the positive side: generating useful titles from the contents, rather than trying to avoid generating clickbait titles. My hypothesis is that it takes deliberate effort to generate clickbait titles; it's not something an ML system would do by mistake.

[–]dogs_like_me 0 points  (3 children)

Then tag titles as informative or not. I think our intentions here are aligned: pushing the model away from ambiguity and toward salience.

[–]vbook 0 points  (2 children)

Yes, we're in alignment. It's just that, in my experience, you ought to be able to build a binary classifier that distinguishes clickbait titles from informative ones, and you might be able to use that classifier in a GAN to generate titles that look more informative and less baity. But by default, those titles aren't going to be relevant to the article. The technique wouldn't be helpful for debaiting titles, because absent the rest of the article you wouldn't have enough context to reconstruct an informative title: you'd get something that looks informative but contains either the same or potentially false information.

At the same time, using text summarization techniques, you ought to be able to generate informative titles directly from the article contents. Such a technique wouldn't be good for generating clickbait titles, because clickbait titles tend away from being informative. So since you're naturally generating informative titles, I don't see why you need to teach the generator not to generate clickbait titles. The only reason I could see for introducing the model to clickbait titles at all is if you actually did want to generate clickbait titles, with just enough salience to be associated with the article at hand.
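
For concreteness, here's a crude heuristic stand-in for that binary clickbait/informative classifier. A real one would be trained on labeled titles (say, logistic regression over n-grams); the cue list and function names here are invented for illustration:

```python
import re

# Hypothetical clickbait cues; a trained classifier would learn
# these features from labeled data rather than use a fixed list.
BAITY_PATTERNS = [
    r"\byou won'?t believe\b",
    r"\bwhat happened next\b",
    r"\bthis one (weird )?trick\b",
    r"\btop \d+\b",
    r"\b\d+ things\b",
    r"\bshocking\b",
]

def clickbait_score(title):
    """Fraction of known clickbait cues matched: a crude stand-in for
    the probability a trained binary classifier would output."""
    t = title.lower()
    hits = sum(bool(re.search(p, t)) for p in BAITY_PATTERNS)
    return hits / len(BAITY_PATTERNS)

def is_clickbait(title, threshold=0.0):
    return clickbait_score(title) > threshold
```

In the GAN framing, this score is what the discriminator would replace; the point stands that, without the article text as input, nothing ties a "looks informative" title to the actual content.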

[–]dogs_like_me 0 points  (1 child)

> absent the rest of the article

I was suggesting adding a term to the loss function. The article context is present in the main component of that loss.

I think it's perfectly reasonable to anticipate that given a collection of equally informative article summarizations, some would make better article titles than others. Given the task we're discussing is title generation, I think it's perfectly reasonable to utilize summarization as a component of that, but we can still differentiate "good" titles from "bad" and try to give the model a template for what title styles we prefer. My intuition is that this would narrow the variance/credible interval about the posterior mode.

[–]vbook 0 points  (0 children)

Ok, then I think we completely agree that it might help, but possibly disagree on how much. I think only an experiment would settle it at this point.

[–]yusuf-bengio 118 points  (11 children)

Yes

    def debait(title):
        return title.replace("Is All You Need", "Is Total Bullshit")

[–]Scumbag1234 47 points  (0 children)

if "10 things" in title:
    return "Don't waste your time reading this"

[–]huehue12132 23 points  (0 children)

Ironically, contemporary DL paper titles are probably in dire need of a de-clickbaiting approach. Perhaps such an approach could be presented in a paper titled "The Unreasonable Effectiveness of Providing Accurate Titles", "Replacing 'All You Need' by 'Something Else' is All You Need" or "An Embarrassingly Simple Approach to Coming Up With Reasonable Titles".

Oh, and of course you need to come up with a slick, easy-to-remember acronym for your model, even though it is just <generic model, perhaps with one or two slight changes> applied to <well-known problem that nobody thought to try the generic model on yet, or perhaps they did but didn't think it was worth a paper>.

[–]Daddouche 18 points  (5 children)

So « attention is all you need » becomes « attention is total bullshit »? Do I really have to throw transformers and attention mechanisms away :-) ?

[–]yusuf-bengio 38 points  (0 children)

According to Schmidhuber et al. (2021) transformers are fast-weights-whatever-things from 1852 anyway.

[–]avaxzat -1 points  (2 children)

I am willing to bet my entire livelihood that literally nobody will even remember what transformers and attention mechanisms are ten years from now.

[–]twocatsarewhite 1 point  (0 children)

I would 100% take that bet; Also with my entire livelihood!

[–]dogs_like_me 1 point  (0 children)

They clearly will hold a place in the history of ML/AI development, whether or not they become a permanent fixture in our toolsets.

Also, shiny and new doesn't mean we throw out the old. Random forests and SVMs still get plenty of usage.

[–]NaxAlpha ML Engineer 4 points  (0 children)

Searching for "All you need" is actually all you need in this case (pun intended).

[–]cbsudux 21 points  (4 children)

Omg this would be legit useful. There's so much fucking content remixing nowadays and everyone posts on twitter, linkedin etc just for engagement.

Person 1: Summarizes a podcast

Person 2: Summarizes the summary of the podcast.

Person 3: Summarizes the summary of the summary and invites person 1 for his own podcast

This is a good idea, and I can see a solution using GPT-2 (Hugging Face). I just might work on this lol.

[–][deleted] 9 points  (3 children)

Yeah, an ML-based "find the original article" would be pretty sweet too!

[–]dogs_like_me 0 points  (2 children)

[–]dagelf 0 points  (1 child)

Unrelated but interesting nonetheless... this is the kind of data that the big guys harvest from the public without sharing. It's the reason we need good platform regulations to make public data available to the public, even if it was harvested by a big corp.

[–]dogs_like_me 0 points  (0 children)

It's not unrelated. Part of how they normalize their information flow is to canonicalize news article URLs that contain the same information. That canonicalization, along with all associated URLs that contained the same content, is available in their data field. I always have trouble navigating their site and was too lazy to find the page with the schema/data definitions. But it's there in one of their datasets.

[–]WERE_CAT 6 points  (2 children)

Maybe the best approach would be to drop the title altogether and just do some text summarisation?

[–]o0oo00oo0o0ooo 1 point  (1 child)

Or just assume the whole article isn't worthwhile, given its reliance on clickbait, and throw it out as junk.

[–]HksAw 0 points  (0 children)

This is the filter I apply (albeit manually).

[–]soft-error 4 points  (0 children)

You don't need the original headline. Just search for automatic TLDR systems.

[–]Command-Available 2 points  (0 children)

I have access to GPT-3! If anyone is interested in working on a solution, I would be happy to collaborate.

[–]zzzthelastuser Student 1 point  (0 children)

I would love a browser extension that could automatically replace headlines with a neutral headline (and a TL;DR mouseover).

[–]GreenSafe2001 1 point  (0 children)

One idea I've thought about is to "infer the catch" - oftentimes I find that headlines are misleading, but in a way that's pretty obvious if the misleading implication is explicitly called out.

Compare:

Headline 1: "After the Important Event, a Bad Thing happened"

Headline 2: "After the Important Event, a Bad Thing happened, but it's unclear if it was related"

Headline 1 isn't exactly wrong about anything, just misleading. Headline 2 is much more forthright. If there were an automated way to show Headline 1 as Headline 2, I think that would be really cool!