Blog

DataONE Implements New Usage and Citation Metrics to Make Your Data Count

Crossposted from DataONE blog: https://www.dataone.org/news/new-usage-metrics 

Publications have long maintained a citation standard for research papers, ensuring credit for cited work and ideas. Tracking use of data collections, however, has remained a challenge. DataONE is pleased to share our latest effort in overcoming this barrier and in demonstrating data reuse with new data usage and citation metrics.

The data usage and citation metrics available on our data search and discovery platform, https://search.dataone.org, include live counts of citations, downloads, and views for each dataset in our network of Earth and environmental repositories. These metrics are interactive and can be explored further to identify the articles citing the data or, in the case of downloads and views, scaled over specific time increments to see when the data were accessed. The implementation of these data usage and citation metrics results from our collaboration as part of the Make Data Count project and complements metrics recently implemented in Dash. The metrics are also in compliance with the COUNTER Code of Practice for Research Data.

The usage counts are acquired from views and downloads occurring directly through the DataONE interface, via activity at the individual repository, or through clients such as an R script. Additionally, researchers might cite a dataset having received it directly from a colleague. For these reasons, values for citations, downloads, and views do not always covary.

We encourage you to explore these new metrics on https://search.dataone.org. Click on each metric and pull up the interactive figures to visualize downloads and views across time for each dataset, or the full lists of citation information for research papers citing each dataset. You can also check out the DataONE documentation for further details on the metrics implementation.

Make Data Count & Scholix Join FORCE(2018)s

With Make Data Count now in its second year, the focus is shifting from building infrastructure to driving adoption of our open data-level metrics infrastructure. As described in previous blog posts, we built and released infrastructure for data-level metrics (views, downloads, citations). We developed a new COUNTER-endorsed Code of Practice for Research Data and new processing pipelines for data usage metrics, and we use the outputs of the Scholix working group to count data citations. The most important next step? Get people, repositories, and publishers to use it. We teamed up with Scholix at FORCE2018 to explain to publishers and data repositories how they can implement and support research data metrics.

Scholix: it’s not a thing 

Adrian Burton (ARDC), co-chair of the Scholix working group, started the session with a very important message: Scholix is not a thing, nobody is building it, and it’s not a piece of infrastructure. Scholix is an information model that was developed to allow organizations to exchange information. In practice, this now means that, through Crossref, DataCite, and OpenAIRE, you can exchange information about article-data links. These links form the basis for data citations that can be obtained by querying the APIs made available by these organizations. However, think not what Scholix can do for you, but what you can do for Scholix. The system is only useful if organizations also contribute information about article-data links.

Send citations today! 

Following Adrian’s talk, Patricia Feeney (Crossref) and Helena Cousijn (DataCite) discussed with publishers and data repositories how they can add information about citations to their metadata and thereby make these openly available to others. The discussion revealed that several data repositories already do a lot of work to make this happen. They hire curators for manual curation and text-mine articles to ensure all links to datasets are discovered and made available. When they deposit their metadata with DataCite, they add information about the related identifier, the type of related identifier, and indicate what the relationship is between the dataset and article. This ensures publishers also have access to this information and can display links between articles and datasets.

Publishers often depend on authors to add information about underlying datasets to the manuscript or in the submission system; when authors don’t do this, publishers aren’t aware of any links. Most of the publishers indicated they have started working on data citation but are still optimizing their workflows. Better tagging of data citations, both in their own XML and in the XML that’s sent to Crossref, is part of that. Patricia explained that Crossref accepts data citations submitted as relations, as well as DataCite DOIs in reference lists. This gives publishers two ways to make their data citations available.
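
To make the reference-list route concrete, here is a minimal sketch of the kind of citation entry a publisher might include in a Crossref deposit. The element names follow the public Crossref deposit schema, but the DOI and citation text are hypothetical and the surrounding deposit envelope is omitted; treat this as an illustration, not a definitive template.

```python
# Build a reference-list fragment containing one data citation (a DataCite DOI).
import xml.etree.ElementTree as ET

citation_list = ET.Element("citation_list")

citation = ET.SubElement(citation_list, "citation", key="ref12")
doi = ET.SubElement(citation, "doi")
doi.text = "10.5061/dryad.example"  # hypothetical dataset DOI
unstructured = ET.SubElement(citation, "unstructured_citation")
unstructured.text = "Smith J (2018) Example dataset. Dryad Digital Repository."

print(ET.tostring(citation_list, encoding="unicode"))
```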

The most important message? You can use your current (metadata) workflows to contribute article-data links. Both Crossref and DataCite are very happy to work with any organization that needs assistance in implementation or optimization of data citation workflows.

Make Data (Usage Metrics) Count

Following the discussion on why data citations are essential, and how publishers can contribute data citations to our infrastructure, we moved on to data usage metrics. Daniella Lowenberg (CDL), project lead for Make Data Count, explained how repositories can standardize their usage metrics (views, downloads) against the COUNTER Code of Practice for Research Data, contribute these usage logs to an open DataCite hub, and start displaying standardized usage metrics at the repository level. Repositories, check out our documentation here, and get in touch with Daniella if you would like to begin talking about implementation!
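
As a rough illustration of that contribution step, the sketch below posts a heavily abbreviated, COUNTER-style dataset report to a DataCite usage-reports endpoint. The endpoint URL, authentication, and field names are assumptions to verify against the DataCite and Code of Practice documentation, not a definitive recipe.

```python
import requests

# An abbreviated dataset report; the real SUSHI-based format has more header fields.
report = {
    "report-header": {
        "report-name": "dataset report",
        "report-id": "dsr",
        "created-by": "example-repo",  # hypothetical repository identifier
        "reporting-period": {"begin-date": "2018-09-01", "end-date": "2018-09-30"},
    },
    "report-datasets": [
        {
            "dataset-id": [{"type": "doi", "value": "10.5072/example"}],  # hypothetical DOI
            "performance": [
                {
                    "period": {"begin-date": "2018-09-01", "end-date": "2018-09-30"},
                    "instance": [
                        {"metric-type": "total-dataset-requests", "count": 42},
                        {"metric-type": "unique-dataset-investigations", "count": 17},
                    ],
                }
            ],
        }
    ],
}

resp = requests.post(
    "https://api.datacite.org/reports",              # assumed hub endpoint
    json=report,
    headers={"Authorization": "Bearer YOUR_TOKEN"},  # placeholder credentials
)
resp.raise_for_status()
```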

Event Data: for all your reuse questions

Closing out the workshop, Martin Fenner (DataCite) finished the session with an explanation of how you can consume data metrics. The links that are contributed following the Scholix framework are openly available and can therefore be used by all interested organizations. You can get these through Event Data, a service developed by Crossref and DataCite to capture references, mentions, and other events around DOIs that are not provided via DOI metadata. The Crossref and DataCite APIs support queries by DOI, DOI prefix, date range, relation type, and source. If you want to extract both usage statistics and data citations, you can obtain these through the DataCite API. For more information, take a look at the documentation!
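
For example, a minimal query against Crossref Event Data for one hypothetical dataset DOI might look like the sketch below. Parameter and field names follow the Event Data documentation as we understand it, so verify them before building on this.

```python
import requests

resp = requests.get(
    "https://api.eventdata.crossref.org/v1/events",
    params={
        "obj-id": "10.5061/dryad.example",  # hypothetical dataset DOI
        "rows": 100,
    },
)
resp.raise_for_status()

# Each event links a subject (e.g. a citing article) to the dataset.
for event in resp.json()["message"]["events"]:
    print(event["subj_id"], event["relation_type_id"], event["occurred_at"])
```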

What does it all look like?

DataONE: Example from a Dryad dataset, all data citations displayed

Dash: Example view of standardized usage metrics & citations

COUNTER Code of Practice for Research Data Usage Metrics Release 1

Crossposted from COUNTER on September 13, 2018

There is a need for the consistent and credible reporting of research data usage. Such usage metrics are required as an important component in understanding how publicly available research data are being reused.

To address this need, COUNTER and members of the Make Data Count team (California Digital Library, DataCite, and DataONE) collaborated in drafting the Code of Practice for Research Data Usage Metrics release 1.

The Code of Practice for Research Data Usage Metrics release 1 is aligned as much as possible with the COUNTER Code of Practice Release 5, which standardizes usage metrics for many scholarly resources, including journals and books. Many definitions, processing rules, and reporting recommendations apply to research data in the same way as they apply to the other resources the COUNTER Code of Practice covers. Some aspects of the processing and reporting of usage data are unique to research data, and the Code of Practice for Research Data Usage Metrics thus deviates from the Code of Practice Release 5 and specifically addresses them.

The Code of Practice for Research Data Usage Metrics release 1 provides a framework for comparable data by standardizing the generation and distribution of usage metrics for research data. Data repositories and platform providers can now report usage metrics following common best practices and using a standard report format.

COUNTER welcomes feedback from the data repositories that implement this first release of the Code of Practice. Their experiences will help to refine and improve it and inform a second release.

Make Data Count Summer 2018 Update

It’s been two exciting months since we released the first iteration of our data-level metrics infrastructure. We are energized by the interest and questions we’ve received, and we want to share a couple of highlights!


July Webinar

Soon after launch we hosted a webinar on how to make your data count. Thank you to the 100 attendees who joined us and asked such thoughtful questions. For those who could not make it, or who would like a recap, we have made all resources available on the “Resources” tab of the website. Check out the July 10th webinar recording, webinar slide deck, and a transcript of the Q&A session.

If you still have questions, we encourage you to get in touch with us directly so that we can set up a group call with our team and yours. We have found that our meetings with repositories and institutions to talk through the Code of Practice and log processing steps have been very helpful.

Zenodo Implemented the Code of Practice

A big congratulations and thank you go out to the Zenodo team for implementing standardized data usage metrics. Our project succeeds only if as many repositories as possible standardize their data usage metrics, so that we can have a truly comparable data metrics landscape. Zenodo, a popular global repository, followed the Code of Practice for Research Data that we authored and now standardizes and displays its views and downloads. We look forward to Zenodo displaying citations and contributing its usage metrics to our open hub.

In-Person Team Meeting

Last week, members from the DataCite, DataONE, and CDL teams met for a full day of planning the next quarter of the project. Prioritizing by project component, we agreed on where we would like to be by RDA Botswana. In broad terms: we would like citations integrated into the DataCite open hub (instead of living as a separate entity in Event Data), we plan to gather user feedback on which metrics are valued, and we would like to spend time analyzing the citation landscape and the reasons why citations are not making it to the hub. Follow along at our Github here.


Our biggest goal is to get as many repositories as possible to make their data count. But beyond repositories, there is a role for all of us here:

Repositories:

  • If you are on a home-grown platform, follow our How-To guide. Let us know if you are implementing, and share your experience with us. The more we can publicize repositories’ experiences and resources, the easier it will be for the community to adopt.
  • If you are part of a larger platform community (Fedora, Dataverse, bepress), help us advocate for implementation!
  • Send your data citations through DataCite metadata. DataCite collects citation metadata as part of the DOI registration process. Enrich your metadata with links between literature (related resources) and data using the relatedIdentifier property, as in the sketch below.
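
A minimal sketch of that enrichment, assuming the DataCite REST API’s JSON:API format; the dataset DOI, article DOI, and credentials below are hypothetical placeholders.

```python
import requests

# Add an article-data link to an existing dataset DOI record.
payload = {
    "data": {
        "type": "dois",
        "attributes": {
            "relatedIdentifiers": [
                {
                    "relatedIdentifier": "10.1000/journal.article.example",  # citing article
                    "relatedIdentifierType": "DOI",
                    "relationType": "IsCitedBy",
                }
            ]
        },
    }
}

resp = requests.put(
    "https://api.datacite.org/dois/10.5072/example-dataset",  # hypothetical dataset DOI
    json=payload,
    headers={"Content-Type": "application/vnd.api+json"},
    auth=("REPOSITORY_ID", "REPOSITORY_PASSWORD"),  # placeholder credentials
)
resp.raise_for_status()
```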

Publishers:

  • Index your data citations with Crossref. When we first implemented MDC at our repositories, we noticed that some known data citations were not appearing; looking in the Crossref API, we found that even when researchers added data citations, they were in some cases stripped from the XML. When depositing article metadata, please ensure data citations are included as references (in the case of DataCite DOIs) or as relationships (in the case of other PIDs).

Funders, Librarians, and other Scholarly Communications Stakeholders:

  • Help us advocate for the implementation of data level metrics! Catch us at 5AM Conference, ICSTI, FORCE2018, or at RDA Botswana/International Data Week to learn more about our project and better equip yourself as an advocate.

Follow us on Twitter, join our Newsletter, or contact us directly here.

It’s Time to Make Your Data Count!


One year into our Sloan funded Make Data Count project, we are proud to release Version 1 of standardized data usage and citation metrics!

As a community that values research data, it is important for us to have a standard and fair way to compare metrics for data sharing. We know of, and are involved in, a variety of initiatives around data citation infrastructure and best practices, including Scholix and Crossref and DataCite Event Data. But data usage metrics are tricky, and before now there had not been a group focused on processes for evaluating and standardizing data usage. Last June, members from the MDC team and COUNTER began talking through what a recommended standard could look like for research data.

Since the development of our COUNTER Code of Practice for Research Data we have implemented comparable, standardized data usage and citation metrics at Dash (CDL) and DataONE*, two project team repositories.


*DataONE UI coming soon

The repository pages above show how we Make Data Count:

  • Views and Downloads: Internal logs are processed against the Code of Practice, and standard-formatted usage logs are sent to a DataCite hub for public use and, eventually, aggregation (see the sketch after this list).
  • Citations: Citation information is pulled from Crossref Event Data.
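
To make the processing step concrete, here is a simplified sketch of one COUNTER rule, double-click filtering: repeat requests for the same dataset from the same session within 30 seconds collapse into a single countable request. The log format is invented for illustration, and the full Code of Practice also covers robot filtering and unique-versus-total counts.

```python
from datetime import datetime, timedelta

DOUBLE_CLICK_WINDOW = timedelta(seconds=30)

def count_requests(log_entries):
    """log_entries: iterable of (session_id, dataset_id, timestamp), sorted by time."""
    last_seen = {}  # (session_id, dataset_id) -> time of most recent request
    counts = {}     # dataset_id -> countable requests
    for session_id, dataset_id, ts in log_entries:
        key = (session_id, dataset_id)
        if key in last_seen and ts - last_seen[key] <= DOUBLE_CLICK_WINDOW:
            last_seen[key] = ts  # a double-click: refresh the window, don't count
            continue
        last_seen[key] = ts
        counts[dataset_id] = counts.get(dataset_id, 0) + 1
    return counts

logs = [
    ("s1", "10.5072/ds1", datetime(2018, 6, 1, 9, 0, 0)),
    ("s1", "10.5072/ds1", datetime(2018, 6, 1, 9, 0, 12)),  # filtered double-click
    ("s1", "10.5072/ds1", datetime(2018, 6, 1, 9, 5, 0)),   # counted again
]
print(count_requests(logs))  # {'10.5072/ds1': 2}
```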

The Make Data Count project team works in an agile “minimum viable product” methodology. This first release has focused on developing a standard recommendation, processing our logs against that Code of Practice to develop comparable data usage metrics, and displaying both usage and citation metrics at the repository level. We know from work done in the prototype NSF-funded Making Data Count project that the community values additional metrics. Hence, future versions will include features such as:

  • details about where the data are being accessed
  • volume of data being accessed
  • citation details
  • social media activity

We just released our first iteration of data-level metrics infrastructure. What next?

1) Get Repositories Involved

For this project to be effective and for us to compare and utilize data-level metrics we need as many repositories as possible to join the effort. This is an open call for every repository with research data to Make Data Count. A couple of important resources to do so:

  • Check out our How-To Guide as described by the California Digital Library implementation of Make Data Count. Tips and tools (e.g. a Python Log Processor) are detailed in this guide and available on our public Github. Links in this guide also point to the DataCite documentation necessary for implementation.
  • Join our project team for a webinar on how to implement Make Data Count at your repository and learn more about the project on Tuesday, July 10th at 8am PDT/11am EDT. Webinar link: http://bit.ly/2xJEA4n.

2) Build Advocacy for Data-Level Metrics

Publishers:

When implementing this infrastructure in our repositories we became aware of how few publishers are indexing data citations properly. Very few datasets are correctly receiving citation credit in articles. If you are a publisher or are interested in advocating for proper data citation practices, check out the Scholix initiative and our brief guide here as well as DataCite’s recent blog on the current state of data citations.

Researchers & the research stakeholder community:

For the academic research community to value research data, we need to talk about data-level metrics. This is a call for researchers to utilize data-level metrics as they would article-level metrics, and for academic governance to value these metrics as they do article metrics.

With the first version of our data-level metrics infrastructure released, we are excited to work as a community to further drive adoption of data metrics. For further updates, follow us on Twitter @makedatacount.

Publishers: Make Your Data Citations Count!

Many publishers have implemented open data policies and have publicly declared their support of data as a valuable component of the research process. But to give credit to researchers and incentivize data publishing, the community needs to promote proper citation of data. Many publishers have also endorsed the FORCE Data Citation Principles, Scholix, and other data citation initiatives, but we still have not seen implementation or the benefits of proper data citation indexing at the journal level. Make Data Count provides incentives and aims to show researchers the value of their research data by displaying data usage and citation metrics. However, to be able to expose citations, publishers need to promote and index data citations with Crossref so that repositories utilizing the Make Data Count infrastructure can pull citations, evaluate use patterns, and display them publicly.

So how, as a publisher, can you support open research data and incentivize researchers to think about data like articles?

  1. Implement policies that advise researchers to deposit data to a stable repository that gives a persistent, citable identifier for the dataset
  2. Guide researchers to cite their own data, or other data related to their article, in their reference list
  3. Acknowledge data citations in the article, data availability statement, and/or reference list, tag them as data citations, and send this in XML to Crossref via the reference list or in the relationship type. Crossref has put together a simple guide here.

Make Data Count Update: Spring, 2018

The Make Data Count team is rapidly approaching the first release of standardized and comparable data-level metrics (DLMs) on California Digital Library’s Dash and DataONE repositories. Resources on this release will be available shortly, but in the meantime the team would like to share updates on work completed in winter and our spring roadmap.

Berlin, March 2018

Before the Research Data Alliance (RDA) 11 Plenary in Berlin, MDC team members met for a day to map the work towards reaching our minimum viable product (MVP) by May. The focus of Fall 2017 – Winter 2018 was releasing a recommendation for counting data usage metrics. Now that this standard has been released, the team is utilizing it as guidance for processing logs at the repository level and sending these reports to a centralized open hub (at DataCite for access and aggregation).

Discussions of log processing, hub aggregation, and display at the repository level centered on the interactions between, and roles of, a repository, the DataCite hub, and Crossref Event Data (architecture map to be released this summer). The team also discussed how to tackle publication date (when repositories allow for delayed publication for peer-review reasons), how Scholix and Event Data work together, how citations will be pulled, and what resources should be produced for the community to implement MDC at their own repositories. For more information please check out our public Github.

Our goal is to release the first iteration of DLMs in May. With this release in Dash and DataONE we will also be providing:

  • A How-To Guide for repositories
  • Webinar (recorded) for repository implementation
  • A Log Processor (in Python) for repositories that would like to use this prebuilt tool
  • Explanatory architecture diagram of the push and pulls from repositories to the DataCite hub

We have been collecting a list of repositories that have expressed interest in processing and displaying comparable DLMs, but if you have not yet been in touch with us please do contact us.

While at RDA, members from the MDC team presented at the Scholix (Kristian Garza and Martin Fenner pictured below) and Data Usage Metrics Working Group sessions.

The Data Usage Metrics WG was recently formed to engage the community in discussions around needs and priorities for usage metrics (and not just following the path of article-level metrics). At this first session, Make Data Count was presented by John Chodacki (CDL), Kristian Garza (DataCite), and Dave Vieglais (DataONE), pictured below. Wouter Haak presented ongoing initiatives at Elsevier, and members of the audience shared their work in this space.

Full notes and a recording of this session are available. While the focus of the group is broader than the Make Data Count use case, we encourage anyone interested in this space to join the working group.

April, 2018 and Roadmap Forward

Beyond RDA, members from Make Data Count were invited to the AAMC/NEJM/Harvard MRCT meeting on “Credit for Data Sharing”, where MDC was briefly presented as an example of infrastructure for credit. Throughout spring and summer, MDC plans to direct outreach at repositories on how and why to get involved, and at publishers on how and why to index data citations. For repositories to be able to display citations, the citations need to be indexed by publishers with Crossref. A priority for the MDC project is to increase the number of publishers doing this. To do so we will be releasing a series of resources for repositories and publishers, virtually and in person at conferences.

Catch us this summer at:

Stay tuned for more updates, webinar dates, and resources!

Code of practice for research data usage metrics release 1

Kicking off Love Data Week 2018, the Make Data Count (MDC) team is pleased to announce that the first iteration of our Code of Practice for Research Data Usage Metrics Release 1 has now been posted as a preprint.

In June, members from the MDC team and COUNTER began conversations around what a standard for data usage metrics might look like. By September we were able to release an initial draft outline for community feedback. Comments from the community and further drafting spurred discussions around how data are different from articles and where the code of practice for data needed to deviate from the COUNTER Code of Practice Release 5.

Our first release has been posted as a preprint so that we can continue to receive community feedback and input. This Code of Practice will act as the framework for the MDC project goal of comparable data usage metrics across the repository landscape. As we begin to implement this standard in our own (CDL and DataONE) data repositories, we will adapt the Code of Practice based on our experiences. We hope that repositories interested in being early adopters of displaying standardized data-level metrics in accordance with this recommendation will also contribute to future iterations of the code of practice.

We look forward to utilizing this first release as a starting point for community discussion around data-level metrics, and urge anyone interested to get in touch with us or join our RDA Data Usage Metrics Working Group.

Make Data Count Winter 2018 Update

For the past few months, we have worked to garner interest and facilitate discussion about data usage metrics within the community. Internally, we are working to drive development toward comparable, standardized data usage metrics and data citations on repository interfaces. We are excited to share our progress and we want to thank those who have given us feedback or have gotten involved along the way!

December, 2017

Early in December we had a series of webinars for the DataONE and NISO communities. Although the recordings are not available, the discussions between the MDC team and institutions and repository communities were engaging and productive. Thank you to those who joined us. Slides for these webinars can be found here.

During the week of webinars we also had DataONE’s Matt Jones at AGU speaking on the “Receiving Credit Where It’s Due” panel about the MDC project.

Matt Jones (DataONE) presenting at AGU, 2017

January, 2018

In the new year we gathered an Advisory Group of community members that have expertise in driving adoption of open data, open source, and open access initiatives. Our first Advisory Group meeting proved to be energizing for our MDC team as we brainstormed our best path towards mass adoption, various initiatives we should be working in conjunction with, and projects that could expand on the MDC work.

We are also pleased to report that we have launched an RDA Working Group “Data Usage Metrics” led by CDL’s Daniella Lowenberg, DataONE’s Dave Vieglais, and Scopus Product Manager Eleonora Presani. We will be at the RDA11 Berlin Meeting and would love for you to join the group and help us spread the word. The focus will be research data usage metrics implementation, adoption strategies, and future metrics to be considered.

Closing out January, members from MDC met at PIDapalooza and had a meeting focused on mapping implementation of log processing in the CDL Dash Data Publication platform and DataONE repositories.

At the conference we gave a brief presentation on the progress of Make Data Count and spent much of our session time discussing what constitutes “usage metrics”. Questions arose around how stakeholders (e.g., institutions versus funders) may differ in how they would benefit from data usage metrics, how to understand impact from usage metrics, and how citations are indicators of usage.

Martin Fenner (DataCite), Daniella Lowenberg (CDL), and Trisha Cruse (DataCite) presenting at PIDapalooza

What’s in store for the rest of winter?

Instead of hibernating, we have one major priority: implementation in the Dash and DataONE repositories. Coordinating efforts between building an open and public hub (hosted at DataCite) and implementation in the repositories, we are documenting our questions, answers, and experiences to develop a “how-to” guide for the repository community. We are continually looking for early-adopter repositories that would like to process their logs and display standardized data usage metrics and citations. Please get in touch with us if your repository would like to be a part of this. To follow our implementation work, check out our public Github.

And lastly, we have been working to formalize our recommendation for research data usage metrics. Stay tuned next week for the release of our COUNTER Code of Practice for Research Data preprint.

Make Data Count Update: November, 2017

The Make Data Count (MDC) project is moving ahead with full force and the team wanted to take a moment to update the research stakeholder community on our project resources and roadmap.

In September, the MDC team sat down and mapped out the project plan for our two-year grant. Working in an agile manner, we defined a “minimum viable product” (MVP) that comprises a full ecosystem of data usage and citation metrics flowing in and out of the technical hub and displayed on the DataONE repositories, Dash (the California Digital Library data publishing platform), and DataCite by summer of 2018.


This fall the MDC team also spent time traveling to several conferences to gather early adopters and gauge interest in data usage metrics. Many energetic and thoughtful discussions occurred regarding what the MDC-envisioned full ecosystem of data usage metrics will look like and how various stakeholders can contribute. The main takeaway: there is a need for a comprehensive and standardized way to count and display data level metrics.


Coming up, representatives from the MDC team will be (and hope you can join us) at:

So, what is MDC working on outside of these presentations?

All of the MDC project work can be tracked on Github, and we encourage you to follow along.


  • MDC and COUNTER are gathering community feedback on the COUNTER Code of Practice for Research Data Draft and turning this outline into a full narrative to be posted as a preprint in December.
  • DataCite is working to build out a Data-Level Metrics Hub that will ingest data citations and data usage metrics, use the COUNTER recommendation as the standard for processing logs, and push out standardized usage metrics for display on repository interfaces.
  • Our first repositories, listed above, will be working to process their usage logs against the COUNTER recommendation and to integrate with the technical hub.
  • Designs for displayed data metrics on repository interfaces will be created and tested.
  • Conversations with any groups that may want to be involved will continue: the more community feedback and support, the better!

How can you help?

Everyone: We put out a COUNTER Code of Practice for Data Usage Draft and would appreciate community feedback. As stated above, this recommendation is what the usage metrics ecosystem will be standardized against. We also need help with mass outreach about our project, so please help us spread the word!

Repositories: We are collecting the names of repositories that would be interested in processing log files against our COUNTER recommendation and hub, and in being early adopters of data-level metrics; please get in touch if your repository supplies DOIs and would be interested.

Publishers: Support data citations! Data citation information comes from Crossref Event Data and DataCite, and the more publishers support data citations in article publication, the more data can be fed into our hub.

Researchers: We want to give you credit for your research data. We are always looking for beta testers of our system and would appreciate your input. Please get in touch if you or your labs are interested in getting involved.

Join our mailing list & follow us on Twitter (@MakeDataCount)