A key lesson from our initiative is that the community needs a clear picture of how data are reused in order to monitor impact, inform future funding, and improve the dissemination of research. Promoting best practice with publishers remains essential. However, a trusted, central aggregation of all references to research data across articles, preprints, government documents, and other outputs will also be needed to reach our goal of building responsible, meaningful data metrics.

In 2023, the Wellcome Trust awarded funds to build the Open Global Data Citation Corpus, with the aim of transforming the data citation landscape. Through this award, DataCite has partnered with the Chan Zuckerberg Initiative, EMBL-EBI, and other organizations that scrape and assert data citations. The corpus will store asserted data citations from a diverse set of sources, and any community stakeholder will be able to use it. Released under CC0, the corpus extends coverage beyond DataCite and Crossref metadata. It includes citations to data identified by DOIs as well as by other identifiers, such as accession IDs. This broader scope matters because many datasets are mentioned in research articles only in unstructured form (for example, as an accession number in the body text) rather than as formal references.

Publisher Data Citation Resources

Publisher Standards

Repository Standards