Thursday, February 14, 2008

Falsifying Data II: Hits, Runs, Errors

Or, More Favorites

3. Someone does an experiment five times. It works once. They report it worked.
Problems: Irreproducible. But many, many perfectly valid results are more or less irreproducible, or can only be reproduced under exactly the same circumstances. These are what we call 'non-robust' results. So this kind of thing is probably more likely to slip through. I personally think cherry-picking is the worst: the person who does it knows that their result is probably not the real one, and maliciously chooses to hide a valid part of their data set. Somehow it seems nastier to cheat than to lie outright. I suppose that's because it's more devious than bald-faced. Of course, the fact that I spent six months trying to replicate someone's cherry-picked data has nothing to do with it, oh no...

4. Someone makes a mistake and doesn't catch it in time.

These fall into two camps: Honest (also Kind of Honest, Or We Were Really Sloppy), and Not. Good retraction: Oops, we made a mistake; we're sorry that we misinterpreted our data. Thanks to so-and-so for setting us straight. (Part of the problem is that high-profile papers take shortcuts, but that's another issue.) Bad retraction: Oops, we got caught.

My lab, on the other hand, just discovered that an artefact caused a 10-fold error in a value we reported. Erratum? No way.


Some of you fine readers maintain that outright fakery is worse. And I agree intellectually. But personally, I must say I find the other kinds much more inconvenient.