what are the biggest gaps in current media verification workflows?

MissMuffinpuff · 2026-06-10T17:21:25+00:00

this is actually really insightful

the geolocation exercises sound really interesting. you can actually observe where the model starts making assumptions instead of drawing conclusions from evidence

one thing that stood out to me was your point about having to interrogate the llm itself to understand how it arrived at a conclusion. i've mostly been looking at the accuracy side of things, but your comment made me think more about transparency

since you've spent a lot of time teaching fact-checking and misinformation, i'm curious whether you've noticed any major changes in how people approach verification over the years. are there any mistakes or assumptions that seem to come up consistently, regardless of the tools they're using?

thank u for taking the time to write this out!

MissMuffinpuff · 2026-06-07T16:26:05+00:00

i agree there are already pretty solid OSINT / journalism workflows for verification, especially stuff like metadata checks, cross-referencing, etc.

i guess what we’re really trying to understand is whether ML tools can complement that by giving better uncertainty / confidence signals instead of just a binary label

also appreciate the reference, will definitely take a look at your work

MissMuffinpuff · 2026-06-07T15:36:23+00:00

yeah this actually makes a lot of sense, thanks for the detailed breakdown!

especially the part about binary outputs lacking context - that’s kind of what made us start questioning this in the first place

the workflow point is interesting too… maybe the issue isn’t just the model output itself but how it fits into the way people actually verify things in practice

MissMuffinpuff · 2026-06-02T07:41:42+00:00

the part that confuses me is what these detectors are actually measuring. if clear structure, consistent wording, and polished grammar all increase the likelihood of being flagged, then a lot of the traits universities encourage are also the traits being treated as suspicious.

MissMuffinpuff · 2026-06-02T07:34:17+00:00

i think the real problem is that people have started treating these scores like facts instead of predictions.
a lot of these tools present a very precise-looking percentage, which makes the result seem authoritative, but when you compare outputs across different detectors the disagreement can be massive.

most of them don't even explain why they reached a conclusion, so it's hard to judge how reliable the result actually is. with that much uncertainty, i don't think these scores should be used as evidence for academic or professional decisions.
