Ticket of the month - November 2023 - Similarity Check URLs? Tell me more!

Here’s the situation: You’re a Crossref member interested in applying for the Similarity Check service. You look at our documentation about how to become eligible and are immediately stumped by the first sentence:

When you apply for the Similarity Check service, you must ensure you have full-text URLs for Similarity Check present in the metadata of at least 90% of your registered articles (across all your journal prefixes).

There are a couple of reasons why this requirement may be confusing, and we’re going to dive into them below.

Why do we need to provide this extra metadata?

The first thing to tackle is why this is a requirement for using the Similarity Check service. Turnitin (the company that runs and manages the iThenticate software on which the Similarity Check service runs) indexes the full-text URLs that you provide and stores that content in their database. This way, when other users check their manuscripts for similarity, the software compares them against your publications as well, helping ensure that your content is not being borrowed improperly. To make this comparison, Turnitin needs access to the full text of your articles.

Providing these full-text URLs helps Turnitin have a more complete database of scholarship to check against. It also helps you protect your publications.

Resolution URLs vs Similarity Check URLs

The next thing to look at is what exactly we’re talking about when we talk about “Similarity Check URLs”. To clear this up, we need to clarify the difference between resolution URLs and Similarity Check URLs.

Firstly, a resolution URL is the URL your DOI resolves to. This is usually the landing page (abstract only) of the article, but not always. For example, if your DOI is 10.53502/RAIL-138809, and you search for it on https://www.doi.org/, you will be directed to this: http://www.railvehicles.eu/Nowe-trendy-w-rozwoju-transportu-szynowego,138809,0,2.html.

When you look at this DOI through our REST API (api.crossref.org/works/10.53502/RAIL-138809), you can see what it looks like from the backend: the resolution URL appears in the record’s “resource” section, as the “primary” URL.

However, to be eligible for the Similarity Check service, you’ll need to include full-text URLs in your metadata. These full-text URLs live in a different field in your metadata than your resolution URLs.

Let’s go back to our example.

We already looked at what the resolution URL looked like from the backend, so let’s look at what the Similarity Check URL looks like:

As you can see, this is in a different section than the resolution URL. It’s tagged as being for “similarity-checking” and points to http://www.railvehicles.eu/pdf-138809-65690, which is the full-text of the article in a PDF download format.
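If you’d rather check this with a script than by reading the JSON yourself, here is a minimal sketch in Python. It assumes the REST API record exposes the resolution URL under "resource" and lists full-text URLs under "link", each tagged with an "intended-application". The function names and the hand-trimmed sample record are our own illustration, not part of any Crossref library.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API = "https://api.crossref.org/works/"


def fetch_work(doi):
    """Fetch one record from the Crossref REST API (needs network access)."""
    with urlopen(API + quote(doi)) as response:
        return json.load(response)["message"]


def resolution_url(work):
    """Return the URL the DOI resolves to, or None if the field is absent."""
    return work.get("resource", {}).get("primary", {}).get("URL")


def similarity_check_urls(work):
    """Return every full-text URL tagged for similarity checking."""
    return [
        link["URL"]
        for link in work.get("link", [])
        if link.get("intended-application") == "similarity-checking"
    ]


# Hand-trimmed illustration of the record discussed above; a real record
# contains many more fields. Swap in fetch_work("10.53502/RAIL-138809")
# to try it against the live API.
sample = {
    "DOI": "10.53502/rail-138809",
    "resource": {
        "primary": {
            "URL": "http://www.railvehicles.eu/Nowe-trendy-w-rozwoju-transportu-szynowego,138809,0,2.html"
        }
    },
    "link": [
        {
            "URL": "http://www.railvehicles.eu/pdf-138809-65690",
            "intended-application": "similarity-checking",
        }
    ],
}

print("Resolution URL:", resolution_url(sample))
print("Similarity Check URLs:", similarity_check_urls(sample))
```

An empty list from similarity_check_urls means the DOI has no Similarity Check URL yet, even if its resolution URL is in perfect shape.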

What if all my content is Open Access??

But now you ask one of our more common questions: What if all our content is Open Access and all our resolution URLs already point to the full-text of the article? Why aren’t we eligible?

The answer is still the same: that special Similarity Check URL field in your metadata still needs to be populated, even if it’s exactly the same as your resolution URL. These are two separate fields in your metadata, and we use and apply each of them differently.

Here’s an example of an Open Access article and what its metadata looks like:

As you can see, both URLs point to the same location: https://dx.plos.org/10.1371/journal.pwat.0000009. Even though the two URLs are identical, you’ll still need to include that URL in the “similarity-checking” field of the DOI’s metadata.
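To make the point concrete, here is a small Python sketch of the kind of check involved. It is our own illustration, not the actual eligibility checker, and the two trimmed records are hypothetical. The only thing the check looks at is whether a link tagged "similarity-checking" exists; it never compares that link with the resolution URL.

```python
def has_similarity_check_url(work):
    """True if the record carries at least one similarity-checking link."""
    return any(
        link.get("intended-application") == "similarity-checking"
        for link in work.get("link", [])
    )


oa_url = "https://dx.plos.org/10.1371/journal.pwat.0000009"

# Hypothetical trimmed record: full text is open, but only the
# resolution URL has been supplied.
resolution_only = {"resource": {"primary": {"URL": oa_url}}}

# Same article with the identical URL also supplied in the
# similarity-checking field.
both_fields = {
    "resource": {"primary": {"URL": oa_url}},
    "link": [{"URL": oa_url, "intended-application": "similarity-checking"}],
}

print(has_similarity_check_url(resolution_only))
print(has_similarity_check_url(both_fields))
```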

OK, but what if all my content is paywalled??

This then leads us to the next question we often get, which is what happens if your full-text content is behind a paywall or a login screen.

You would still provide those full-text URLs in the Similarity Check field of your metadata, but then you’d just need to make sure that you’re safelisting Turnitin’s IP address so that they can index that content. There are some specific instructions about how to do that in our documentation here.

Here is an example of an article that is behind a paywall and what its metadata looks like:

As you can see, there are two different URLs provided here. The resolution URL takes you to the landing page of the article: https://online.ucpress.edu/fq/article/54/3/53/41566/The-Circle. The Similarity Check URL is different: https://online.ucpress.edu/fq/article-pdf/54/3/53/637605/fq_2001_54_3_53.pdf. This is the URL that Turnitin will be able to access once their IP address is safelisted. However, if you click on it without a login, it will redirect you to that same landing page. So, even though the full-text PDF URL is added to the metadata, the full text stays behind the paywall and is only opened up to Turnitin for indexing.

This is all well and good, but how do I do it?

Now that we’ve sorted out the difference between these two types of URL, the next question is often, “How do I add these to my metadata?”

There are two different processes for adding these Similarity Check URLs to your metadata: one for DOIs you have already registered, and one for DOIs you plan on registering in the future. Most members will need both.

Luckily, we have extensive documentation on both processes. For step-by-step instructions on how to add these full-text URLs to your existing content, please visit us here. Many members find that using the web deposit form’s supplemental metadata upload is the easiest method. If you need assistance getting a list of your DOIs missing Similarity Check URLs or you need help with the upload, don’t hesitate to reach out to Crossref’s Support team below in this thread for a helping hand.
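If you’re comfortable with a little scripting, a rough way to find the DOIs that still need a Similarity Check URL is to page through your journal articles in the REST API and keep the ones with no “similarity-checking” link. The Python sketch below shows the idea. The helper names and the sample records (under the 10.5555 test prefix) are hypothetical, and the 90% figure is taken from the eligibility rule quoted at the top of this post. Treat the result as an estimate rather than an official eligibility decision.

```python
from urllib.parse import urlencode

THRESHOLD = 0.90  # "at least 90% of your registered articles"


def works_query_url(prefix, rows=1000, cursor="*"):
    """Build a REST API URL listing journal articles under one prefix."""
    params = {"filter": "type:journal-article", "rows": rows, "cursor": cursor}
    return f"https://api.crossref.org/prefixes/{prefix}/works?" + urlencode(params)


def has_similarity_check_url(work):
    """True if the record carries at least one similarity-checking link."""
    return any(
        link.get("intended-application") == "similarity-checking"
        for link in work.get("link", [])
    )


def missing_similarity_urls(works):
    """Return the DOIs that have no similarity-checking link."""
    return [w["DOI"] for w in works if not has_similarity_check_url(w)]


def coverage(works):
    """Fraction of records that carry a similarity-checking link."""
    if not works:
        return 0.0
    covered = len(works) - len(missing_similarity_urls(works))
    return covered / len(works)


# Hypothetical records standing in for the "items" returned by the API.
sc_link = {"URL": "https://example.org/full.pdf",
           "intended-application": "similarity-checking"}
works = [
    {"DOI": "10.5555/example.1", "link": [sc_link]},
    {"DOI": "10.5555/example.2", "link": [sc_link]},
    {"DOI": "10.5555/example.3", "link": [sc_link]},
    {"DOI": "10.5555/example.4"},  # nothing supplied yet
]

print(works_query_url("10.5555"))
print("Missing:", missing_similarity_urls(works))
print("Coverage:", coverage(works), "meets threshold:", coverage(works) >= THRESHOLD)
```

Remember that the requirement applies across all your journal prefixes, so you would combine the records from every prefix before working out the percentage.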

And, depending on how you register your content with Crossref, we have you covered for when you’d like to include these full-text Similarity Check URLs with your new registrations. These steps are covered in our documentation here.
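For members who deposit XML, here is a hedged Python sketch of what the relevant part of a deposit can look like. It assumes the Similarity Check URL sits inside the DOI’s doi_data section, in a “crawler-based” collection with an “iParadigms” item, alongside the usual resolution URL. Namespaces and the rest of the deposit are left out, the DOI and URLs are hypothetical, and build_doi_data is our own helper name, so please check the documentation for the exact structure that matches your schema version.

```python
import xml.etree.ElementTree as ET


def build_doi_data(doi, resolution_url, similarity_check_url):
    """Build a doi_data element carrying both kinds of URL."""
    doi_data = ET.Element("doi_data")
    ET.SubElement(doi_data, "doi").text = doi
    # The resolution URL: where the DOI sends readers.
    ET.SubElement(doi_data, "resource").text = resolution_url
    # The Similarity Check URL: the full text for Turnitin to index.
    collection = ET.SubElement(doi_data, "collection", {"property": "crawler-based"})
    item = ET.SubElement(collection, "item", {"crawler": "iParadigms"})
    ET.SubElement(item, "resource").text = similarity_check_url
    return doi_data


# Hypothetical DOI and URLs, for illustration only.
element = build_doi_data(
    "10.5555/example.1",
    "https://example.org/articles/1",
    "https://example.org/articles/1/full.pdf",
)
print(ET.tostring(element, encoding="unicode"))
```

For an Open Access article, you would simply pass the same URL for both arguments.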

Hopefully, this helps clear up some confusion as you begin the process of applying for the Similarity Check service. If you’d like to see if you’re eligible, please do use our eligibility checker found here. (Please note, this will only check for journal articles. So, if you register other content types, do reach out to us, and we can help.)

And, as always, if you have questions about this, or any of our other services or processes, please do reach out to us in Support for more guidance.