DataCite launches first release of the Data Citation Corpus

First-of-its-kind aggregation brings together millions of data citations to advance understanding of data usage

https://doi.org/10.60804/r14z-mw10

DataCite, in partnership with the Chan Zuckerberg Initiative (CZI), is delighted to announce the first release of the Data Citation Corpus. A major milestone in the Make Data Count initiative, the release makes eight million data citations openly available and usable for the first time via an interactive dashboard and public data file. We invite the community to engage with the data and provide feedback on this collaborative effort.

As highlighted by Make Data Count, the lack of a centralized resource for citations to datasets has hindered the evaluation of how open data is being used. To address this gap, DataCite, with funding from the Wellcome Trust, has developed an innovative aggregation that, for the first time, brings together data citations from diverse sources into a comprehensive, publicly accessible resource for the global community.

“There is a pressing need to understand how open data is used, but we have lacked a resource to access this information in a centralized and open manner. The Data Citation Corpus will allow the community to gain access to critical insights on data usage,” said Iratxe Puebla, Director of Make Data Count. “We are thrilled to share the progress from our collaboration with CZI to bring together citations from different sources, and look forward to working with others in the community to expand the breadth and coverage of the corpus.”

The first release of the corpus includes data citations in DataCite and Crossref metadata as well as asserted data citations contributed by CZI, available to the community via a data citation store and dashboard developed by Coko. Leveraging accession numbers from Europe PMC, CZI applied a machine-learning model to a large set of full-text articles and preprints to extract mentions of datasets. This has made possible the first-ever aggregation of citations for datasets with DOIs and accession numbers into a single corpus, giving a more complete picture of data usage.

“As an organization that invests in research data and reference datasets, we believe it is critical to understand how data is shared and reused to enable new scientific discoveries,” said Patricia Brennan, Vice President of Science Technology at the Chan Zuckerberg Initiative. “DataCite has been a leader in this space, providing critical infrastructure for data citation and for tracking its reuse. We’re proud to support them in their vision to build a comprehensive global corpus of actionable data citations.”

The interactive dashboard of the corpus allows users to visualize and report on citations by a variety of facets, such as funder, data repository, or the journal where the article citing the data is published.

A complete data file of all of the citations is also available for additional analysis and evaluation. Request the data file via this form.

Forthcoming releases will focus on addressing existing metadata gaps, such as gaps in the disciplinary information for the datasets, and on incorporating feedback from early adopters. DataCite will also pursue new collaborations with additional citation aggregators to expand the breadth and scale of data citations in the corpus.

Community input is an integral part of this project, and DataCite invites researchers, institutions, funders and infrastructure providers to provide feedback on the first release of the corpus and on future development work. Please join us for a webinar on February 22 to learn more about the first release of the corpus and how to use it. Register now to participate in this interactive session.

About DataCite

DataCite is a global community that shares a common interest: to ensure that research outputs and resources are openly available and connected so that their reuse can advance knowledge across and between disciplines, now and in the future. 

About Chan Zuckerberg Initiative

The Chan Zuckerberg Initiative was founded in 2015 to help solve some of society’s toughest challenges — from eradicating disease and improving education, to addressing the needs of our communities. Through collaboration, providing resources and building technology, our mission is to help build a more inclusive, just and healthy future for everyone.