Share your thoughts on preprint metadata

mrittman · July 11, 2022, 2:15pm

Our Preprint Advisory Group was assembled last year to help Crossref collect preprint metadata and improve its quality. The group has developed recommendations in four key areas of the preprint metadata schema. These are:

Preprint withdrawal and removal
Preprints as an article type (rather than a subtype)
Versioning of preprints
Preprint relationship metadata

We invite your feedback on these recommendations before we begin implementation later this year. Please send any comments on this initial set of recommendations by the end of September 2022.

Please add your comments here and join the discussion (if you’re not sure how, please take a look at our guidelines and ‘how to’ instructions here).

mccurley · July 13, 2022, 5:22pm

I’ve seen many different reasons for withdrawal. In one case the authors claimed the results of the paper were valid, whereas the editors had been convinced of the opposite. I’ve seen other cases in which conflict-of-interest rules were violated, or there were other code-of-conduct violations. The action to withdraw a paper may therefore come from an author (and this may be disputed, even among multiple authors), or from an institution associated with the publication (e.g., employer or sponsoring agency), an editor, or the board of directors of a sponsoring society. It is actually difficult to imagine all of the possible social failure modes, because humans are creative. For this reason, when a withdrawal is initiated, it seems useful for the scholarly record to identify the organization or person who initiated it. This could of course be encoded in the free-text reason field, but it might be easier to use a boolean field to distinguish author-initiated from non-author-initiated withdrawals. My reasoning is that the scholarly record is a social construct, and disagreements are part of the process. We don’t need to insist on surfacing all of the ugly details behind a controversy, but there is a pretty big difference between an author-initiated withdrawal and a non-author-initiated one.
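To make the author vs non-author distinction concrete, here is a minimal Python sketch of a withdrawal record that keeps the free-text reason and records only the role of whoever initiated the withdrawal, not the individual. The class names, field names, role categories, and the placeholder DOI are hypothetical illustrations, not part of the Crossref schema; the boolean is derived from the role rather than stored separately.

```python
from dataclasses import dataclass
from enum import Enum


class Initiator(Enum):
    """Hypothetical roles that can initiate a withdrawal (not a Crossref vocabulary)."""
    AUTHOR = "author"
    EDITOR = "editor"
    INSTITUTION = "institution"  # e.g. employer or sponsoring agency
    SOCIETY_BOARD = "society-board"
    OTHER = "other"


@dataclass(frozen=True)
class Withdrawal:
    """Hypothetical withdrawal notice attached to a preprint DOI."""
    doi: str
    reason: str  # free text, so the nuance of each case is preserved
    initiator: Initiator  # records the role only, never the individual

    @property
    def author_initiated(self) -> bool:
        # The author vs non-author distinction, derived from the role.
        return self.initiator is Initiator.AUTHOR


notice = Withdrawal(
    doi="10.5555/example-preprint",  # placeholder DOI for illustration
    reason="Editors concluded the main result does not hold; authors disagree.",
    initiator=Initiator.EDITOR,
)
print(notice.author_initiated)  # False
```

Storing a role rather than a bare boolean leaves room for the cases listed above (institution, editor, society board) while still answering the author-initiated question directly.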

Estabraq · July 14, 2022, 9:00am

Dear Crossref,
I totally agree with you about identifying who initiated the withdrawal. But is it possible to define a specific way to distinguish the researcher who withdrew from the other researchers on the same paper?

mrittman · July 14, 2022, 9:22am

Yes, it’s very complex, and preprint servers usually don’t have the resources to investigate to the same extent as a journal publisher would. Some kind of broad taxonomy of reasons for withdrawal is an idea that could work, and the origin of the withdrawal could be part of it. Publishers usually note whether the authors (and sometimes which authors) agreed with a retraction. Formalizing that could be helpful. Getting it widely adopted would be a whole other problem.

zarnigor2006 · July 14, 2022, 9:33am

I don’t understand why you are sending this message. Please explain. I work at a research institute. I didn’t understand the information you wrote about whom to add to the article so that I can publish it. Sincerely, Khusniddin.

zarnigor2006 · July 14, 2022, 9:38am

I think the reason for these messages is my recent article. The person who worked with me and wanted to publish the article in a Scopus-indexed journal did not give a clear answer when I asked which journal they wanted to publish it in. That’s why I refused to publish the article.

mrittman · July 14, 2022, 10:22am

@zarnigor2006 This is a general discussion on a public forum. You probably received the message because of your notification settings; it has nothing to do with your research specifically. If you have questions about the forum, see here.

mccurley · July 14, 2022, 5:13pm

Most of my experience has been with eprint.iacr.org and arXiv. I was reluctant to define a whole taxonomy of reasons, because I think a text field is best suited to capturing the nuance. In most of the cases I have seen, the reason was a flaw in mathematical reasoning, and that is best described in text. In other fields it might be an experimental failure or the realization that a model is not descriptive enough. There are probably also cases of plagiarism, outright exaggeration, slander, or every other kind of human misbehavior. I was mostly focused on who made the decision to withdraw. I don’t see a need to identify the individual, since, for example, it may have been a vote among the editors, the board, or an ethics committee (IACR has had both in the 25 years of eprint).

One failure mode I had not anticipated is when an author wishes to remove their name from a multi-author paper, or an affiliated institution wishes to disclaim the work. I assume this is properly classified as a revision to the paper rather than a withdrawal, but that might not be strong enough, given that the previous versions still carry the author’s or affiliation’s name. It’s more like a half-withdrawal. Is that within the scope of the recommendations?

mrittman · July 15, 2022, 1:58pm

One author withdrawing consent is a really interesting case, and in my experience it is frustratingly common for preprints when not all authors are consulted about posting. Formally, the paper should be withdrawn, but there might be legitimate reasons to just remove an author (if they were added by mistake), and every case has its own nuances. With regard to capturing this in the metadata, a text explanation would be helpful, along with, obviously, an update to the author list.

I seem to recall that when we discussed terms for withdrawal and removal in the advisory group, one preprint server used a different term internally for author-initiated removals. These are much more straightforward to handle, and perhaps it is useful to distinguish them. My only counter-argument would be that if there’s a legitimate reason to withdraw, it isn’t affected by who initiated the withdrawal request. On the other hand, there’s no harm in having more information about the process. Definitely a useful topic to explore; thanks for bringing it up.

MarioMalicki · August 12, 2022, 4:30pm

Suggestion 1: Please consider adding a new relationship type that would indicate when a preprint (version) is the same as the manuscript version that was submitted to a journal or conference proceedings. Preprints can be posted to a server at various times, and sometimes they do not correspond to the version submitted to a journal; but for those that do, it is important to be able to identify the submitted version, to better understand the role of peer review and the changes that result from peer review and journal editing. With some journals and preprint servers automatically allowing submission of a preprint to a journal, and vice versa, this should not be difficult, and it might prove invaluable in studies of the role of peer review and of scholarly communication.
Suggestion 2: Please consider introducing metadata fields that capture a description of the changes from the previous version (see more on that at www.scholcommlab.ca/2020/04/08/preprint-recommendations/). Some servers, as well as F1000, capture this in different formats, but this information could be an important part of the metadata.

Kind regards,
Mario Malicki
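A per-version change description, as in the second suggestion above, could look something like the following minimal Python sketch. The `PreprintVersion` record, its `changes` field, and the example history are hypothetical illustrations only; the sketch renders a change log and falls back to a placeholder where a server supplied no description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PreprintVersion:
    """Hypothetical record for one posted version of a preprint."""
    version: int
    posted: str  # ISO 8601 date, e.g. "2022-01-10"
    changes: Optional[str] = None  # description of changes from the previous version


def change_log(versions: list[PreprintVersion]) -> list[str]:
    """Render one line per version, oldest first."""
    lines = []
    for v in sorted(versions, key=lambda item: item.version):
        note = v.changes if v.changes else "no change description supplied"
        lines.append(f"v{v.version} ({v.posted}): {note}")
    return lines


history = [
    PreprintVersion(2, "2022-03-01", "Added sensitivity analysis requested by reviewers"),
    PreprintVersion(1, "2022-01-10"),
]
for line in change_log(history):
    print(line)
```

Keeping the description as optional free text accommodates servers that already record changes in different formats, while making the absence of a description explicit.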

mccurley · August 12, 2022, 5:22pm

In my experience, the preprint version can vary from the “official published version” in several ways:

the preprint might be an early version
the preprint might be essentially the same except for the footnote with a link to the official version
the preprint might be identical to the official version
the preprint might be an expanded version. This happens for us sometimes if the official version has a page limit, in which case the preprint version ends up containing full proofs for mathematical results that are omitted in the “official version”.
the preprint might be an updated version posted after the official version, with errata to the official version.

Note that what actually gets posted may not be in compliance with what the publisher of the official version wants. We don’t police that. Maybe I am overlooking other ways in which the preprint could differ from the official version.
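One way to capture the cases listed above in metadata would be a small controlled vocabulary. The Python sketch below is hypothetical: the enum names are illustrative and do not correspond to existing Crossref relation types. It includes an `OTHER` member because, as noted, the list may not be exhaustive, plus a helper that flags the two cases where the content matches the official version.

```python
from enum import Enum


class PreprintVsOfficial(Enum):
    """Hypothetical labels for how a preprint relates to the official version."""
    EARLY_VERSION = "early version"
    SAME_PLUS_LINK = "same apart from a footnote linking to the official version"
    IDENTICAL = "identical to the official version"
    EXPANDED = "expanded, e.g. full proofs omitted from the official version"
    POST_PUBLICATION_UPDATE = "posted after the official version, with errata"
    OTHER = "other"


def content_matches_official(kind: PreprintVsOfficial) -> bool:
    """True when the preprint's scholarly content matches the official version."""
    return kind in {PreprintVsOfficial.IDENTICAL, PreprintVsOfficial.SAME_PLUS_LINK}


print(content_matches_official(PreprintVsOfficial.EXPANDED))  # False
```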

mrittman · August 24, 2022, 9:32am

Mario, thank you for the suggestion. We have in our schema a relation of ‘is-identical-to’ that we use in some specific cases. An issue is that identical could mean subtly different things: exactly the same document, the same content reformatted (e.g. PDF vs XML vs Word manuscript), or some minor typographical changes that wouldn’t need a version change.

I think the suggestion of a field describing changes between versions is an excellent one, and it is something we’re considering adding as metadata.

Another question is who determines whether something is identical. I don’t think platforms would take the time to check whether something (such as a conference paper) is exactly the same as, or simply similar to, a previous work (such as a preprint). If they do, then they could use the is-identical-to relationship. The best pragmatic solution is to provide links to the full text so that downstream users and services can make the comparison themselves.
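As a rough illustration of the kind of comparison a downstream service might run once full text is linked, here is a minimal Python sketch using the standard library's `difflib.SequenceMatcher` on word sequences. Splitting on whitespace ignores pure layout differences, but the 0.95 threshold and the three labels are arbitrary assumptions, and a real service would first need to extract and normalize text from PDF or XML.

```python
import difflib


def word_similarity(text_a: str, text_b: str) -> float:
    """Similarity ratio in [0, 1] between two texts, compared word by word."""
    matcher = difflib.SequenceMatcher(None, text_a.split(), text_b.split(), autojunk=False)
    return matcher.ratio()


def compare_versions(text_a: str, text_b: str, threshold: float = 0.95) -> str:
    """Label two full texts; the threshold is an arbitrary assumption."""
    ratio = word_similarity(text_a, text_b)
    if ratio == 1.0:
        return "identical"
    if ratio >= threshold:
        return "near-identical"
    return "different"


print(compare_versions("We prove the main theorem.", "We  prove the\nmain theorem."))  # identical
```

Even with such a heuristic, "near-identical" is a judgment call, which is why leaving the comparison to downstream users seems reasonable.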

MarioMalicki · August 24, 2022, 4:50pm

Thank you for considering the suggestions and working on them.

Just to clarify: is-identical-to is a link to another document. What I suggested is a field that would specify that this is THE version submitted to the journal, not a connection to any other document, paper, or proceeding. For example, on Springer Nature’s In Review platform for my journal (see the example below), the preprint can be the exact manuscript version that was submitted to my journal, i.e. the authors opted to also share that version as a preprint at submission time. Instead of “just” classifying these as preprints, I hope you will have a field that shows this is the “journal-submitted version”. So the metadata fields could be either:
venue-submitted: Yes.
and then you would have details for the venue:
e.g. journal-title: Name
date-submitted, and so on.
And then, since we publish the review reports for these, you could link those reviews to the preprint itself. Perhaps that is another option through which researchers could identify it as the submitted version, if the reviews come from a journal. But a dedicated field would be helpful for filtering, rather than relying on reviews, as some journals might share the submitted version but not the reviews.
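The fields suggested above could be modelled roughly as in this minimal Python sketch. The names (`SubmissionInfo`, `journal_title`, `date_submitted`, `PreprintRecord`) follow the suggestion loosely but are hypothetical, not part of any existing schema, and the journal title and DOIs are placeholders. Here the yes/no flag is derived from whether submission details are present, so the flag and the venue details cannot disagree, and filtering does not depend on whether a journal also shares its reviews.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SubmissionInfo:
    """Hypothetical details of the venue a preprint version was submitted to."""
    journal_title: str
    date_submitted: str  # ISO 8601 date


@dataclass(frozen=True)
class PreprintRecord:
    """Hypothetical preprint version record with optional submission details."""
    doi: str
    submission: Optional[SubmissionInfo] = None

    @property
    def venue_submitted(self) -> bool:
        # Derived flag: True only when submission details are recorded.
        return self.submission is not None


def submitted_to(records: list[PreprintRecord], journal_title: str) -> list[PreprintRecord]:
    """Return records flagged as the version submitted to the given journal."""
    return [
        r for r in records
        if r.submission is not None and r.submission.journal_title == journal_title
    ]


records = [
    PreprintRecord("10.5555/preprint-a", SubmissionInfo("Example Journal", "2021-06-01")),
    PreprintRecord("10.5555/preprint-b"),  # ordinary preprint, not flagged
]
print([r.doi for r in submitted_to(records, "Example Journal")])  # ['10.5555/preprint-a']
```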

Example:
Manuscript version (preprint) posted on In Review during the journal submission process: researchsquare.com/article/rs-582546/v1
Note: on In Review, you can see the whole review process timeline in the box on the right of the page.

Manuscript published (VoR): researchintegrityjournal.biomedcentral.com/articles/10.1186/s41073-021-00116-4

mrittman · August 25, 2022, 9:58am

@MarioMalicki Thanks for the clarification, this is a bit different from what I thought before. It seems like a good topic to discuss with our preprint advisory group. I have questions about how it would be implemented, and who would be responsible for updating the metadata when the state changes during the editorial process. On the other hand, I think there would be interest in having this data publicly available via metadata.

Eric · August 25, 2022, 12:55pm

I am a bit late to this thread, and this may have been mentioned before, but NISO has recently been updating its own standards for preprints and document versions, which might be relevant to this discussion.

Working group info is available at https://www.niso.org/standards-committees/journal-article-versions

mccurley · August 26, 2022, 7:30pm

The link to Sally Morris’ paper is broken on that niso page. One of the principles of a preprint server is to have permanent URIs.

mrittman · August 29, 2022, 6:31am

@Eric Thanks for noting this, I’m not aware of any recent (or forthcoming) output from this group, but it would be interesting to know. The context around preprints has changed significantly since the group was formed.

@mccurley The paper cited on the NISO page seems to be https://doi.org/10.1087/095315103322110941 (PDF available from Wiley), but it’s hard to tell without a more specific reference.

vincent · August 29, 2022, 5:55pm

The NISO JAV working group is actively working on revising and updating JAV. Some information about this working group is at https://www.niso.org/standards-committees/jav-revision. There are two representatives from Crossref involved (I’m also a member of this working group). Preprints are among the versioning scenarios that have been brought up for discussion in the JAV working group.

This recommendation for preprint metadata also addresses withdrawals and removals for preprints. This may relate to NISO CORREC (https://www.niso.org/standards-committees/correc), which is a new recommended practice that is being developed by an active working group.

mrittman · August 31, 2022, 1:35pm

Good to hear that there is ongoing work on JAV. Several people sit on both that committee and our preprints advisory group (including my colleague Patricia), so it’s more than likely that there has been a crossover of ideas. CORREC is also one we’ll be keeping a close eye on.

ISC_SG_SciPub · September 30, 2022, 1:03pm

Response by the International Science Council’s Steering Group on the Future of Scientific Publishing

The International Science Council’s Open Science and Future of Scientific Publishing Project has been advocating increased use of preprints as a recognized and effective route for the dissemination of research findings, and for preprints to become an integral part of the record of science (e.g. https://council.science/publications/normalization-preprints/).
Implementing such a system effectively requires appropriate metadata designed to support this functionality. We support and welcome the changes being proposed, specifically in:

Recognizing preprints as a mainstream publication type in their own right
Improving two-way connections between different versions of preprints
Allowing for withdrawal/correction of a preprint
Improving connectivity between reviews of a specific preprint and the preprint itself
Improving relationships between the preprint and associated data/programs etc

Our primary concern with the approaches proposed is that they do not sufficiently allow for interoperability between different platforms and DOI registration entities. Enabling the integration and discoverability of scholarly content published globally is of fundamental importance to the ISC. It is vital to be able to integrate research from different geographic regions. Whilst your report notes that enabling such interoperability is difficult and that responsibility for undertaking it is hard to assign, any possibility of successful interoperability requires metadata that recognises alternative systems, and we believe this should be a particularly important consideration for the metadata schema most commonly adopted within the western scholarly publishing community. We believe that the evolution of common standards governed by the global scientific community is essential to the future of scientific publishing.
In many cases this will be achieved via the various relationship entities within the CrossRef schema - but presently the scope for inclusion of alternative platforms within these tags is limited and not always fully functional. This, for example, has recently been highlighted by the difficulties faced in relating a review to an article posted on arXiv: https://crossref.atlassian.net/browse/CR-777.
We would encourage relationship entities to include as diverse a range of alternative platforms and identifiers as possible, to enable the eventual development of effective and inclusive cross-platform discovery processes either by CrossRef or other entities.
