News: Crossref and Retraction Watch - Crossref

https://0-doi-org.pugwash.lib.warwick.ac.uk/10.13003/c23rw1d9


This is a companion discussion topic for the original entry at https://0-www-crossref-org.pugwash.lib.warwick.ac.uk/blog/news-crossref-and-retraction-watch
4 Likes

The blog mentions “a community call on 27th September at 1 p.m. UTC to discuss this new development in the pursuit of research integrity.” How is that accessed? Have details gone out already?

2 Likes

Hello! Thanks for reaching out. You can register here: Webinar Registration - Zoom

We are looking forward to seeing you online!

Rosa

1 Like

Will the retraction dataset be fully integrated into Crossref’s API? For example, will we be able to search for retractions by journal, author name, institution, etc.?

1 Like

Yes, that’s our plan. Geoffrey, our Director of Technology & Research, mentioned that very thing in a thread earlier today:

“Our recently announced opening of the RetractionWatch data will only ever be made available via the REST API.”

-Isaac

1 Like

Will the Retraction Watch Hijacked Journals Checker also be implemented somehow? That would be marvelous!

Hey, Crossref, I wanted to let you know that the CSV version of the database that can be downloaded directly from this article has mixed character encoding. It’s mostly UTF-8, but has embedded Windows-1252, which frequently happens when text is copy-pasted from different sources. Unfortunately, this makes importing the file into any program a real PITA to figure out.

I strongly encourage you to correct the encoding and replace the version at that link (and anywhere else it lives, if there are multiple endpoints). Heck, email me, and I will send you a version in pure UTF-8. You will save many people the time and frustration it takes to figure out why none of the normal encodings are working and to find an appropriate conversion tool.

In the meantime, a tip for anyone using Python: UnicodeDammit.detwingle(), from the Beautiful Soup library, can be used to fix this file.
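For anyone who would rather not add a dependency, roughly the same idea can be sketched with the standard library alone. This is not detwingle() itself, just an approximation under the assumption that every byte sequence that fails to decode as UTF-8 is really Windows-1252. The names cp1252_fallback and decode_mixed are made up for illustration, and the sample bytes are invented rather than taken from the actual file.

```python
import codecs


def cp1252_fallback(err):
    """Error handler: reinterpret bytes that are invalid UTF-8 as Windows-1252."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # A few bytes (e.g. 0x81) are undefined in Windows-1252; those become
    # U+FFFD instead of raising a second error.
    return bad.decode("cp1252", errors="replace"), err.end


# Register the handler under a hypothetical name so decode() can refer to it.
codecs.register_error("cp1252_fallback", cp1252_fallback)


def decode_mixed(raw: bytes) -> str:
    """Decode mostly-UTF-8 bytes, falling back to Windows-1252 per bad sequence."""
    return raw.decode("utf-8", errors="cp1252_fallback")


# Invented sample: a UTF-8 Cyrillic name followed by Windows-1252 bytes
# (0xED = í, 0x93/0x94 = curly quotes).
sample = "Иванов".encode("utf-8") + b" and Garc\xeda \x93quoted\x94"
print(decode_mixed(sample))
```

One caveat with this approach: if a run of Windows-1252 bytes happens to form a valid UTF-8 sequence, it will be read as UTF-8, so the result is worth spot-checking.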

1 Like

Thanks for that tip!

My colleagues on our technical team, who work most closely with the Retraction Watch data, tell me that the character encoding of the file Retraction Watch gives us is somewhat broken. We can either decode it as UTF-8, ignoring errors like the ones you’ve pointed out, or decode it as Latin1, which incorrectly displays Russian names (among other things).

In January, we switched the encoding from Latin1 to UTF-8, just ignoring the errors. Neither solution is ideal, but that’s the trade-off we have for the moment.

Another workaround in Python is to pass errors="ignore" to the decode function. There will still be a few problematic entries, but it should make using the file go more smoothly.
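To make the trade-off concrete, here is a small sketch on made-up bytes (not taken from the actual file) comparing the options mentioned in this thread. It assumes a record that mixes a UTF-8 Cyrillic name with a single Windows-1252 byte.

```python
# Invented sample: UTF-8 "Иванов" followed by "García" with a
# Windows-1252/Latin-1 encoded í (byte 0xED).
raw = "Иванов".encode("utf-8") + b", Garc\xeda"

# UTF-8, silently dropping bytes that are not valid UTF-8.
as_utf8_ignore = raw.decode("utf-8", errors="ignore")

# UTF-8, marking each bad byte with U+FFFD so the loss stays visible.
as_utf8_replace = raw.decode("utf-8", errors="replace")

# Latin-1 never fails, but it garbles every multi-byte UTF-8 character.
as_latin1 = raw.decode("latin-1")

print(as_utf8_ignore)
print(as_utf8_replace)
print(repr(as_latin1))
```

With errors="ignore" the Cyrillic name survives but the accented letter disappears without a trace. errors="replace" is worth considering if you want to be able to find the affected rows afterwards.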

We are working with Retraction Watch to eventually build a system that will produce the data without those encoding errors.

2 Likes