Subject codes, incomplete and unreliable, have got to go - Crossref

Subject classifications have been available via the REST API for many years, but they have never been complete or reliable and will soon be deprecated.

This is a companion discussion topic for the original entry at

Hello, I have previously conducted some research based on your past subject classifications, and I just saw this post. How significant are the issues with the past classifications, and can the classifications still be used in serious academic research? I have a large number of documents; can statistical methods be used to reduce the impact of the errors in them?

Hello. It’s hard to say whether the issues we observed would hinder your academic research, because the accuracy of our now-removed subject classifications is difficult to quantify. Given the first reason the blog post above gives for why they were problematic, I would be wary of relying on them:

They are misleadingly exposed in the API as a property of the work, when in fact they are a property of the container (e.g. a journal or conference proceeding). Just because a journal’s broad topic category is “X” doesn’t mean that a particular article in the journal is about “X.”

But again, without knowing more about how they were applied in your work, it’s hard to say.
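
One way to see the container-level point in an existing dataset is to group previously downloaded work records by journal and check whether the subject list ever varies within a journal. The following is only a minimal sketch: it assumes each record is a dict carrying the `container-title` and `subject` fields as they appeared in REST API work records, and the function names, sample records, and DOIs are hypothetical illustrations, not real data.

```python
from collections import defaultdict


def subjects_by_container(works):
    """Collect the distinct subject lists seen for each container title."""
    seen = defaultdict(set)
    for work in works:
        # "container-title" is a list in work records; take the first entry.
        container = (work.get("container-title") or ["<unknown>"])[0]
        # Sort so that the same subjects in a different order compare equal.
        subjects = tuple(sorted(work.get("subject") or []))
        seen[container].add(subjects)
    return dict(seen)


def containers_with_uniform_subjects(works):
    """Map each container to True if all of its works share one subject list."""
    grouped = subjects_by_container(works)
    return {container: len(variants) == 1 for container, variants in grouped.items()}


# Hypothetical records for illustration only (assumption: not real Crossref data).
sample_works = [
    {"DOI": "10.5555/example.1", "container-title": ["Journal A"],
     "subject": ["General Medicine"]},
    {"DOI": "10.5555/example.2", "container-title": ["Journal A"],
     "subject": ["General Medicine"]},
    {"DOI": "10.5555/example.3", "container-title": ["Journal B"],
     "subject": ["Ecology", "Plant Science"]},
    {"DOI": "10.5555/example.4", "container-title": ["Journal B"],
     "subject": ["Plant Science", "Ecology"]},
]

uniform = containers_with_uniform_subjects(sample_works)
```

If every container in your data comes back as uniform, the codes in that dataset distinguish journals rather than individual articles, so it may be safer to treat them as a journal-level variable in any analysis.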