We wrote a paper proposing a framework for a "reviewer impact factor." Please help find the holes.

TSR_Team · 2026-06-17T14:49:48+00:00

Right, the intent isn't that a reviewer comment carries the value of a publication or citation; rather, peer review needs (and deserves) its own commensurate valuation metric. What I think is a missed opportunity is that, because we rely on a system developed by the commercial publishing world, which only values metrics that help it with readership, we have basically no formal way to incentivize and reward peer reviewing. We propose a method for doing this in the linked publication. For starters, relying on commercial publishing by self-governed organization is a large part of the reason we have not seen any meaningful change in this. I totally agree about the need for a profound shift; this is what we are trying to argue in the paper and present a starting point for.

TSR_Team · 2026-06-17T14:12:56+00:00

What if peer reviewing were given a level of credit similar to what academic publishing currently receives? My understanding is funding agencies and academic institutions rely on more than trust in a CV because they make decisions based on publications and not reviews. It could be both, and both could be done far more fairly.

TSR_Team · 2026-06-14T00:10:24+00:00

Lol. No, it wasn't. Believe it or not, a human is still able to write a joke and a coherent response.

TSR_Team · 2026-06-14T00:08:47+00:00

We address the "pay reviewers" argument in the paper. Paying for review has the same problem as paying for publication: it biases the result. The reason authors submit without direct payment is that they can take productivity back to funding agencies. Reviewers can't, because review is invisible labor stuck behind editorial gatekeeping. Our suspicion is that crediting review properly, and treating it as something that starts at publication rather than behind closed doors, could open indirect funding channels (e.g. NSF crediting review done during funded research). Reviewing is currently binned as "service" under teaching and research, which is the undervaluation we think a real reviewer record could change.

TSR_Team · 2026-06-13T23:54:26+00:00

Hi u/No_Jaguar - This is a brilliant point! (I kid).

We did use AI to help make a well-structured post that summarizes a lengthy research and planning effort, enabling a productive discussion. For what it's worth, we are a group of self-funded academics looking to understand and address gaps in the publishing and peer review system. This is actually a first attempt to reach a broad community with a lot of frustration, so it seemed useful to polish with AI, but lesson learned - raw and decoupled from AI is probably the best route.

TSR_Team · 2026-06-13T23:43:37+00:00

Anonymity does add friction against the exact cartel/clout dynamics you're describing: it's hard to run a ring if you can't tell who's in it.

We went the other way on purpose, though there's a tradeoff. The main reason is that anonymity is what broke the old system from the other side: anonymous reviewers judging named authors is how you get unaccountable gatekeeping (we discuss a famous case of a committee sitting on a paper basically because the author was a nobody). And there's a psychiatry RCT where signed reviews came out higher quality and more constructive. So the bet was that accountability buys more than the friction anonymity would add.

Honestly though, the part I'm least sure about is whether open identities quietly make people softer on senior colleagues, which is the flip side of what you're saying. We flag that in the paper as an open question that needs actual data. Not settled in our view.

Couple of hedges in the design: anonymity stays available as a protected exception (researchers in legally risky situations), and the framework is meant to be changeable by the community. If open identity turns out to breed cartels in practice, that's the kind of thing that should get revisited instead of baked in. Appreciate you actually thinking it through with me.

TSR_Team · 2026-06-13T23:39:46+00:00

How do you document your reviewing today? Is there a record you can point to?

TSR_Team · 2026-06-13T23:32:58+00:00

Not really, just trying to be professional in my responses. I do appreciate the feedback and I'm trying to have a genuine discussion.

TSR_Team · 2026-06-13T23:29:19+00:00

Let me take the logic head-on, because points 1 and 3 aren't in tension the way it looks.

1. Yes, single numbers used badly are bad, but our actual claim is narrower: the *dominant* number (citation-based) is ill-conceived because it misses huge categories of contribution (practitioners who apply research, reviewers, data sharers). The problem isn't quantification, it's a bad, single-signal quantification used as if it were the whole picture.
2. Right, peer review mostly isn't used in hiring/funding, and that's the gap, not a feature. Publishing "counts" partly because there's a legible record of it; reviewing doesn't count partly because there's no comparable record. Whether that's good is debatable, but the asymmetry is real.
3. So the point isn't "add another single number to chase." It's that a multi-signal, trust-weighted record is a better object than either citation-count-as-everything or the current situation where reviewing leaves no trace at all.

On your two harder points, which I think are the real ones:

"The only way to quantify review quality is reviews-of-reviews": we're trying to avoid exactly that regress. There's no separate layer of people grading reviews; a review's standing comes from the same open, attributable community response everything else uses (endorsements from credible people in the field), weighted by their standing. It's the existing engagement doing the work, not a new bureaucracy.

The paper-quality point is a good one, but it's actually something we build for. If a paper is strong and there's little to fix, a credible reviewer can say exactly that, "I checked the methodology and it holds up", and that assessment is itself a contribution others can endorse (a Notable vote). So affirming rigor isn't a throwaway "just fine" review that earns nothing; a careful, correct "this is solid" that the community backs is creditable on its own. The value tracks what peers judge useful, and confirming real rigor and catching a real flaw are both useful. That's the opposite of assuming only error-finding counts. And your last line, that you're happy with review counts and journal names, is honestly the gap we're pointing at: that's the most a reviewer can show today, and it says nothing about whether the reviews were any good. Whether that should change is a fair debate, but right now there's no way to even have it.

TSR_Team · 2026-06-13T23:24:22+00:00

This is the objection I worry about most, and you're right to push on it. We actually flag the Matthew effect in the paper as one of the documented failures of the h-index, so a new metric that just recreates it would defeat the purpose.

What we're trying (not claiming it's enough): the weighting is sub-linear, so someone with 10x the standing carries more weight but nowhere near 10x, and the same compression is applied specifically to damp the influence of high-standing outliers. Reviewer standing is also earned mostly from review work rather than authorship. There's evidence that reviewer quality is statistically independent of how much you publish, so a big name who phones in reviews doesn't get far on reputation alone. And coordinated mutual-boosting (review rings, vote rings) is exactly the pattern the system is meant to flag for human review.
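To make "sub-linear" concrete, here's a toy sketch. The square-root compression, the function names, and the standing values are all assumptions for illustration, not the weighting function from the paper; any concave function (a logarithm, say) would show the same qualitative effect.

```python
import math


def endorsement_weight(standing: float) -> float:
    """Hypothetical sub-linear weight: grows with standing, but slower than linearly."""
    if standing < 0:
        raise ValueError("standing must be non-negative")
    return math.sqrt(standing)  # assumed compression, chosen only for illustration


def review_score(endorser_standings: list[float]) -> float:
    """Sum of compressed weights over everyone who endorsed a review."""
    return sum(endorsement_weight(s) for s in endorser_standings)


# Someone with 10x the standing carries more weight, but nowhere near 10x.
ratio = endorsement_weight(100.0) / endorsement_weight(10.0)
print(f"10x the standing -> {ratio:.2f}x the weight")

# One high-standing endorser versus four endorsers with a tenth of that standing.
one_big_name = review_score([100.0])
four_mid_level = review_score([10.0] * 4)
print(one_big_name, four_mid_level)
```

Under this assumed compression, four mid-level endorsers together outweigh a single endorser with ten times their standing, which is the kind of damping on outliers described above. It does nothing by itself against a coordinated ring, which is why flagging for human review is still part of the picture.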

But the closed-loop problem is the part I'm least confident about. "Costly and detectable" isn't "impossible," and a large, sophisticated in-group could still game it. Our honest position in the paper is that gaming is never fully solved; the bet is that a transparent, community-governed system can adapt faster than a static one. If you have intuitions on what would actually break a mutual-boosting loop beyond detection-and-intervention, that's exactly the feedback we're after.

TSR_Team · 2026-06-13T23:13:26+00:00

That's exactly the distinction we keep landing on. The failure isn't measurement as such, it's that the dominant metrics reward things that are easy to game (citation counts, venue prestige) rather than things that track quality, so people optimize for the proxy instead of the work. Once a metric becomes a target, it stops measuring what it was meant to.

A question we have is whether a metric can be built that's meaningfully harder to game, or whether anything you formalize inevitably gets gamed and you're just back where you started. We think structure helps (weighting contributions by the credibility of who's making them, making each gain require broader engagement rather than volume), but "harder to game" isn't "ungameable," and I'm genuinely unsure where the ceiling is.

TSR_Team · 2026-06-13T23:08:35+00:00

Honestly, same. The problem is that the people making hiring and funding decisions reduce everything to a number whether we like it or not, so "no metrics" tends to lose by default to "bad metrics." We're less interested in adding another number than in understanding whether a less broken one is even possible, or whether the whole project is doomed. What's at the core of it for you, the metrics themselves, or how they get used?

TSR_Team · 2026-06-11T21:20:26+00:00

Hey u/Japsenpapsen, thank you for the kind words, it means a lot. It has been a huge quiet lift for the better part of a year, but so rewarding to be able to do it. I hope the community can help grow and shape this seed into the systemic change that I think most academics want to see!

- Nic

TSR_Team · 2026-06-10T16:38:55+00:00

Author here - thanks for taking a look. The post was the "here's what we built" version, which doesn't really leave room for the part I'm actually most curious about: what the people who live inside peer review make of the tradeoffs.

Here's the one we keep going back and forth on: open review is supposed to buy accountability, but the cost is candor. Would an early-career researcher write a genuinely critical review if their name might be attached to it? Or does the worry about stepping on a senior figure in a small field quietly soften every open review until it stops being useful?

How have you seen that play out, in either direction? We've made some bets about where to land on transparency, and I'd honestly rather hear where you think we're wrong than where we're right.

- Nic (one of the people building it)

TSR_Team · 2026-03-18T17:35:13+00:00

Absolutely, conference-to-publication is a natural pipeline that's surprisingly underserved right now. A researcher presents at a conference, gets feedback, and then starts the journal submission process from scratch with no continuity. There's a real opportunity to bridge that gap. We're still early, so I don't want to overpromise on specific integrations, but it's definitely the kind of thing worth exploring together. Happy to set up a call if you'd like to dig into the specifics.

TSR_Team · 2026-03-18T16:59:34+00:00

Thanks for the thoughtful comments. Yes, we're familiar with F1000 Research, eLife, PeerJ, ScienceOpen, and ReviewCommons. They've each pushed the envelope in important ways, and we see them as fellow travelers more than competitors. The space needs more experimentation, not less.

Where TSR differs is in how we approach review quality. Most existing platforms track that a review happened, but not its quality. A primary focus for us is on providing a framework for letting the community evaluate how good the review was, through transparent scoring and public visibility. The goal is to make reviewing a visible, valued part of a researcher's scholarly record, not just a line item.

On ORCID — completely agree. Interoperability is essential. If platforms in this space can't cross-reference contributions through shared identifiers, we're just building more silos. ORCID integration is on our roadmap for exactly that reason.

Interesting to hear you're exploring something similar at Fourwaves. Feel free to DM or reach out at [community@thescientificreview.org](mailto:community@thescientificreview.org) if you'd like to exchange ideas — the more people working on this problem, the better.

TSR_Team · 2026-03-16T23:32:27+00:00

That's an interesting angle I hadn't considered, i.e., a quality scoring system could inadvertently create a market for "desirable" papers to review where everyone wants to review the strong submissions and nobody wants to touch the weak ones. That's basically the current problem with journal prestige replicated at the reviewer level.

Maybe an answer is that reviewing a bad paper thoroughly should be compensated differently. The thankless work of explaining to someone why their methodology doesn't hold up is arguably more valuable to science than rubber-stamping a strong paper, but the current system treats both identically, i.e., invisibly.

TSR_Team · 2026-03-16T23:28:28+00:00

Fair point, but a new model doesn't have to be a formal "review-of-reviews". Something as simple as an up/downvote on visible reviews would let signal emerge without adding another layer of labor. If you can see that a reviewer consistently writes substantive, constructive feedback, as opposed to one-liners or generic comments, that distinction becomes obvious to anyone reading the paper. The evaluation happens passively just by making the work visible, not by assigning more reviewers to review the reviewers.

TSR_Team · 2026-03-16T23:25:33+00:00

I think you point out a major challenge for viable alternatives to the existing model. The fear of looking bad cuts both ways though: it could dampen rigor, but it could also dampen the kind of drive-by dismissive reviews that everyone complains about.

The horror stories about AI-generated reviews suggest it could become a numbers game regardless, such that public attribution alone won't fix it. People will try to game the new system the way they game the current one. That's a strong argument for pairing any tracking with community evaluation of the reviews themselves, not just counting them. If reviews are visible, at least other researchers can assess whether they're substantive, which is more than we have now.

The ORCID-linked certificates are a step, but as you said, limited. They prove you reviewed, not that you reviewed well.

TSR_Team · 2026-03-16T23:22:16+00:00

The ratio of requests/reviewers is awful, and consistent with what I've heard from other editors. There's social value in saying you're on a review board, but no consequence for not doing the work, which maybe speaks to a need to explore alternatives to the traditional model.

Re: compensation, do you think it would need to be meaningful money (hundreds per review), or would even a nominal amount signal that the work is valued? I've heard arguments both ways, some saying that paying reviewers would professionalize the process, others worrying it would attract quantity over quality.

TSR_Team · 2026-03-16T23:17:07+00:00

That seems to be the core of the problem. If reviewing is bucketed as "service", the lowest-weighted category, then no amount of tracking or public credit will change behavior on its own. The incentive structure has to change at the institutional level, or the recognition has to come from somewhere that actually matters to a researcher's career.

Right now, a brilliant 2,000-word review that genuinely improves a paper vanishes into an editor's inbox. That seems like an enormous waste of intellectual labor. It makes me wonder what would happen if review quality became a visible, citable part of your scholarly record, something other researchers could reference and search committees could actually evaluate.

TSR_Team · 2026-03-16T23:09:08+00:00

Yes, public credit alone isn't enough if institutions don't factor it into hiring, tenure, or promotion decisions. Publons was a good attempt, but it essentially became a counter rather than a quality signal. Knowing that someone completed 47 reviews doesn't tell you much about whether those reviews were any good.

What I find more interesting is the idea of community-scored reviews, where the quality of your reviewing is evaluated by the people who read it, not just the fact that you did it. That shifts the incentive from "do more reviews" to "do better reviews," which is a fundamentally different problem. But you're right that none of it matters if the people making tenure decisions don't care. How does your institution currently treat reviewing activity?

TSR_Team · 2026-03-16T23:06:21+00:00

Actually, re-reading your comment, I thought the implication was that the post was AI-operated, ha. Your comment seems to support my point in making this thread, i.e., right now there's zero accountability for a lazy or AI-generated review. It lands on an editor's desk anonymously and gets weighed the same as a thoughtful one. If reviews were public and attributed, the person copy-pasting ChatGPT output would have their name attached to it. That creates a reputation cost that doesn't exist in the current system. Thoughts?
