
Open Data Metrics: Lighting the Fire

The Make Data Count team has been working on various infrastructure and outreach projects focused on how to measure the reach and impact of research data. While we have been busy driving adoption of these frameworks and services, we have yet to discuss the high-level challenges we face and where we believe we need to go.

To clarify for the community our opinions on and approaches to open data metrics, members of the Make Data Count team (Daniella Lowenberg, John Chodacki, Martin Fenner, Matt Jones) sat down and wrote a book that we hope will jump-start a community conversation. We would love to hear your feedback and look forward to engaging with you on the topic.


http://opendatametrics.org 

Research data is at the center of science, yet to date it has been difficult to understand its impact. To assess the reach of open data, and to advance data-driven discovery, the research and research-supporting communities need open, trusted data metrics.

In Open Data Metrics: Lighting the Fire, the authors propose a path forward for the development of data metrics. They acknowledge historic players and milestones in the process and demonstrate the need for standardized, transparent, community-led approaches to establish open data metrics as the new normal.

Save The Date: MDC Spring Webinar

Following advice from our workshop attendees at RDA13, we invite you to join us for our spring webinar.

Join us on May 8th at 8am PDT/3pm GMT as we demo our new aggregation services at DataCite and DataONE. This webinar will spotlight the features and services, such as aggregated usage and citations, that we can build on top of our central infrastructure. The webinar will be recorded and posted on our website.

Webinar Registration: bit.ly/MDCSpringWebinar

Repository Implementation Webinar: March 26, 2019

Join us on March 26th at 8:00am PST/4:00pm GMT for a webinar on repository implementation of our COUNTER Code of Practice for Research Data and the Make Data Count recommendations. This webinar will feature a panel of developers from repositories that have implemented, or are about to release, standardized data metrics: Dataverse, Dryad, and Zenodo. We will interview each repository about its implementation process. This recorded discussion between our technical team and the repositories, offering various perspectives on implementation, should provide helpful guidance for any repository interested in implementing!

To register for our webinar, please fill out this form.

For those who cannot make it, a recording will be made available on our website. Please tweet us any questions you would like asked.

Save the Date: Make Data Count Pre-RDA13 Workshop

When: April 1st, 10:00am-12:00pm

Where: Loews Hotel, Philadelphia. Room: Congress C

Why: As we begin to wrap up our two-year grant, it is essential that we bring in the community to learn about our data metrics infrastructure and to gather community feedback on adoption. In this workshop, we'll demonstrate how repositories and publishers can contribute data usage statistics and citations. Several repository implementers will be present to share their experiences and explain best practices. In addition, we will show how anyone interested can consume these open metrics. Adoption of comparable usage statistics and data citations is a critical first step towards usable research data metrics, so we urge all data providers and data stakeholders to join our interactive session!

Questions? Tweet at @makedatacount or email us here

DataONE Implements New Usage and Citation Metrics to Make Your Data Count

Crossposted from DataONE blog: https://www.dataone.org/news/new-usage-metrics 

Publications have long maintained a citation standard for research papers, ensuring credit for cited work and ideas. Tracking use of data collections, however, has remained a challenge. DataONE is pleased to share our latest effort in overcoming this barrier and in demonstrating data reuse with new data usage and citation metrics.

The data usage and citation metrics available on our data search and discovery platform, https://search.dataone.org, include live counts of citations, downloads, and views for each dataset in our network of Earth and environmental repositories. These metrics are interactive and can be further explored to identify articles citing the data or, in the case of downloads and views, scaled over specific time increments to dive into the details of when the data was accessed. The implementation of these data usage and citation metrics results from our collaboration as part of the Make Data Count project and complements metrics recently implemented in Dash. The metrics also comply with the COUNTER Code of Practice for Research Data.

The usage counts are acquired from views and downloads occurring directly through the DataONE interface, via activity at the individual repository, or through clients such as an R script. Additionally, researchers might cite a dataset having received it directly from a colleague. For these reasons, values for citations, downloads, and views do not always covary.

We encourage you to explore these new metrics on https://search.dataone.org. Click on each metric and pull up the interactive figures to visualize downloads and views across time for each dataset, or the full lists of citation information for research papers citing each dataset. You can also check out the DataONE documentation for further details on the metrics implementation.

Make Data Count & Scholix Join FORCE(2018)s

With Make Data Count now in its second year, the focus is shifting from building infrastructure to driving adoption of our open data-level metrics infrastructure. As described in previous blog posts, we built and released infrastructure for data-level metrics (views, downloads, citations). While we developed a new COUNTER-endorsed Code of Practice for Research Data and new processing pipelines for data usage metrics, we are using the outputs of the Scholix working group to count data citations. The most important next step? Getting people, repositories, and publishers to use it. We teamed up with Scholix at FORCE2018 to explain to publishers and data repositories how they can implement and support research data metrics.

Scholix: it’s not a thing 

Adrian Burton (ARDC), co-chair of the Scholix working group, started the session with a very important message: Scholix is not a thing, nobody is building it, and it's not a piece of infrastructure. Scholix is an information model that was developed to allow organizations to exchange information. In practice, this now means that, through Crossref, DataCite, and OpenAIRE, you can exchange information about article-data links. These links form the basis for data citations and can be obtained by querying the APIs made available by these organizations. However, think not what Scholix can do for you, but what you can do for Scholix. The system is only useful if organizations also contribute information about article-data links.
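
To make the model concrete, here is a minimal sketch of what a single article-data link might look like under the Scholix framework. The field names follow our reading of the Scholix metadata schema, and both DOIs are placeholders, not real records.

```python
# A hedged sketch of one article-data link in the Scholix information model.
# Field names follow our reading of the Scholix schema; DOIs are placeholders.
scholix_link = {
    "LinkPublicationDate": "2018-10-11",
    "LinkProvider": [{"Name": "DataCite"}],
    "RelationshipType": {"Name": "References"},
    "Source": {  # the citing article
        "Identifier": {"ID": "10.5555/example-article", "IDScheme": "DOI"},
        "Type": {"Name": "literature"},
    },
    "Target": {  # the cited dataset
        "Identifier": {"ID": "10.5061/dryad.example", "IDScheme": "DOI"},
        "Type": {"Name": "dataset"},
    },
}
print(scholix_link["RelationshipType"]["Name"])  # -> References
```

Because every provider exchanges records with this common shape, a consumer can aggregate links from Crossref, DataCite, and OpenAIRE without caring who asserted them.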

Send citations today! 

Following Adrian's talk, Patricia Feeney (Crossref) and Helena Cousijn (DataCite) discussed with publishers and data repositories how they can add information about citations to their metadata and thereby make it openly available to others. The discussion revealed that several data repositories already do a lot of work to make this happen: they hire curators for manual curation and text-mine articles to ensure all links to datasets are discovered and made available. When they deposit their metadata with DataCite, they add the related identifier and its type, and indicate what the relationship is between the dataset and the article. This ensures publishers also have access to this information and can display links between articles and datasets.

Publishers often depend on authors to add information about underlying datasets to the manuscript or in the submission system; when authors don't do this, publishers aren't aware of any links. Most of the publishers indicated they have started working on data citation but are still optimizing their workflows. Better tagging of data citations, both in their own XML and in the XML that's sent to Crossref, is part of that. Patricia explained that Crossref accepts data citations submitted as relations, and also accepts DataCite DOIs in reference lists. This gives publishers two ways to make their data citations available.
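
To illustrate those two routes, here is a hedged sketch of the corresponding fragments of a Crossref deposit. The element names reflect our reading of the Crossref deposit and relations schemas, the DOIs are placeholders, and publishers should confirm the details against Crossref's own documentation before depositing.

```python
# Hedged sketch: the two ways a publisher might send a data citation to
# Crossref. Element names follow our reading of the Crossref schemas;
# DOIs are placeholders.

# Route 1: as a reference, by including the dataset's DataCite DOI in the
# article's citation list.
reference_fragment = """
<citation_list>
  <citation key="ref-data-1">
    <doi>10.5061/dryad.example</doi>
    <unstructured_citation>Example dataset, deposited in Dryad.</unstructured_citation>
  </citation>
</citation_list>
"""

# Route 2: as a relation, by asserting the article-data link explicitly.
relation_fragment = """
<rel:program xmlns:rel="http://www.crossref.org/relations.xsd">
  <rel:related_item>
    <rel:inter_work_relation relationship-type="isSupplementedBy"
                             identifier-type="doi">10.5061/dryad.example</rel:inter_work_relation>
  </rel:related_item>
</rel:program>
"""
```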

The most important message? You can use your current (metadata) workflows to contribute article-data links. Both Crossref and DataCite are very happy to work with any organization that needs assistance in implementation or optimization of data citation workflows.

Make Data (Usage Metrics) Count

Following the discussion on why data citations are essential, and how publishers can contribute data citations to our infrastructure, we moved on to data usage metrics. Daniella Lowenberg (CDL), project lead for Make Data Count, explained how repositories can standardize their usage metrics (views, downloads) against the COUNTER Code of Practice for Research Data, contribute these usage logs to an open DataCite hub, and start displaying standardized usage metrics at the repository level. Repositories, check out our documentation here, and get in touch with Daniella if you would like to begin talking about implementation!

Event Data: for all your reuse questions

Closing out the workshop, Martin Fenner (DataCite) finished the session with an explanation of how you can consume data metrics. The links that are contributed following the Scholix framework are openly available and can therefore be used by all interested organizations. You can get them through Event Data, a service developed by Crossref and DataCite to capture references, mentions, and other events around DOIs that are not provided via DOI metadata. The Crossref and DataCite APIs support queries by DOI, DOI prefix, date range, relation type, and source. If you want to extract both usage statistics and data citations, you can obtain these through the DataCite API. For more information, take a look at the documentation!
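
As an illustration, here is a minimal sketch of pulling events for a single dataset DOI from the DataCite API. The endpoint, parameter, and attribute names reflect our current understanding of the Event Data services, and the DOI is a placeholder.

```python
import requests

# Hedged sketch: fetch citation and usage events for one dataset DOI.
# Endpoint, parameters, and attribute names reflect our understanding of
# the DataCite Event Data API; the DOI is a placeholder.
DATASET_DOI = "10.5061/dryad.example"

resp = requests.get(
    "https://api.datacite.org/events",
    params={"doi": DATASET_DOI, "page[size]": 25},
)
resp.raise_for_status()

for event in resp.json()["data"]:
    attrs = event["attributes"]
    # Citation events carry relation types such as "is-referenced-by";
    # usage events carry COUNTER-style types such as
    # "unique-dataset-requests-regular".
    print(attrs["relation-type-id"], attrs["subj-id"], "->", attrs["obj-id"])
```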

What does it all look like?

DataONE: Example from a Dryad dataset, all data citations displayed

Dash: Example view of standardized usage metrics & citations

COUNTER Code of Practice for Research Data Usage Metrics Release 1

Crossposted from COUNTER on September 13, 2018

There is a need for the consistent and credible reporting of research data usage. Such usage metrics are required as an important component in understanding how publicly available research data are being reused.

To address this need, COUNTER and members of the Make Data Count team (California Digital Library, DataCite, and DataONE) collaborated in drafting the Code of Practice for Research Data Usage Metrics release 1.

The Code of Practice for Research Data Usage Metrics release 1 is aligned as much as possible with the COUNTER Code of Practice Release 5, which standardizes usage metrics for many scholarly resources, including journals and books. Many definitions, processing rules, and reporting recommendations apply to research data in the same way as they apply to the other resources covered by the COUNTER Code of Practice. Some aspects of the processing and reporting of usage data are unique to research data, however, and the Code of Practice for Research Data Usage Metrics thus deviates from the Code of Practice Release 5 to address them specifically.

The Code of Practice for Research Data Usage Metrics release 1 provides a framework for comparable data by standardizing the generation and distribution of usage metrics for research data. Data repositories and platform providers can now report usage metrics following common best practices and using a standard report format.
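
One processing rule carried over from Release 5 is double-click filtering: repeated requests for the same dataset by the same user within a 30-second window are counted once. The sketch below is our illustration of that rule, not reference code from the Code of Practice.

```python
from datetime import datetime, timedelta

# Our illustration of COUNTER-style double-click filtering: repeated
# requests for the same dataset by the same user within 30 seconds
# count as a single request.
WINDOW = timedelta(seconds=30)

def filter_double_clicks(events):
    """events: (timestamp, user_id, dataset_id) tuples, sorted by timestamp."""
    last_seen = {}   # (user_id, dataset_id) -> time of most recent request
    counted = []
    for ts, user, dataset in events:
        key = (user, dataset)
        if key not in last_seen or ts - last_seen[key] > WINDOW:
            counted.append((ts, user, dataset))
        last_seen[key] = ts  # a double-click extends the window
    return counted

events = [
    (datetime(2018, 9, 1, 12, 0, 0), "u1", "10.5061/dryad.example"),
    (datetime(2018, 9, 1, 12, 0, 10), "u1", "10.5061/dryad.example"),  # filtered
    (datetime(2018, 9, 1, 12, 5, 0), "u1", "10.5061/dryad.example"),   # counted
]
print(len(filter_double_clicks(events)))  # -> 2
```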

COUNTER welcomes feedback from the data repositories that implement this first release of the Code of Practice. Their experiences will help to refine and improve it and inform a second release.

Make Data Count Summer 2018 Update

It's been two exciting months since we released the first iteration of our data-level metrics infrastructure. We are energized by the interest garnered and the questions we've received, and we want to share a couple of highlights!


July Webinar

Soon after launch we hosted a webinar on how to make your data count. Thank you to the 100 attendees who joined us and asked such thoughtful questions. For those who could not make it, or those who would like a recap, we have made all resources available on the "Resources" tab of the website. Check out the July 10th webinar recording, the webinar slide deck, and a transcript of the Q&A session.

If you still have questions, we encourage you to get in touch with us directly so that we can set up a group call with our team and yours. We have found that our meetings with repositories and institutions, talking through the Code of Practice and log-processing steps, have been very helpful.

Zenodo Implemented the Code of Practice

A big congratulations and a thank you go out to the Zenodo team for their implementation of standardized data usage metrics. Our project is only successful if as many repositories as possible standardize their data usage metrics, so that we can truly have a comparable data metrics landscape. Zenodo, a popular global repository, followed the Code of Practice for Research Data that we authored and now standardizes and displays its views and downloads. We look forward to Zenodo displaying citations and contributing their usage metrics to our open hub.

In-Person Team Meeting

Last week, members of the DataCite, DataONE, and CDL teams met for a full day of planning the next quarter of the project. Prioritizing by project component, we agreed on where we would like to be by RDA Botswana. In broad terms: we would like to have citations integrated into the DataCite open hub (instead of as a separate entity in Event Data), we plan to gather user feedback on valued metrics, and we would like to spend time analyzing the citation landscape and the reasons why citations are not making it to the hub. Follow along at our Github here.


Our biggest goal is to get as many repositories as possible to make their data count. But beyond repositories, there is a role for all of us here:

Repositories:

  • If you are on a homegrown platform, follow our How-To guide. Let us know if you are implementing, and share your experience with us. The more we can publicize repositories' experiences and resources, the easier it will be for the community to adopt.
  • If you are part of a larger platform community (Fedora, Dataverse, bepress), help us advocate for implementation!
  • Send your data citations through DataCite metadata. DataCite collects citation metadata as part of the DOI registration process. Enrich your metadata with links between literature (related resources) and data using the relatedIdentifier property.
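
To make that last point concrete, here is a hedged sketch of the relevant DataCite metadata fragment, in the style of the DataCite kernel schema; the article DOI is a placeholder.

```python
# Hedged sketch of the DataCite metadata fragment linking a dataset to the
# article that cites it (kernel-style relatedIdentifier; DOI is a placeholder).
related_identifiers_fragment = """
<relatedIdentifiers>
  <relatedIdentifier relatedIdentifierType="DOI"
                     relationType="IsCitedBy">10.5555/example-article</relatedIdentifier>
</relatedIdentifiers>
"""
```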

Publishers:

  • Index your data citations with Crossref. When we first implemented MDC at our repositories, we noticed that some known data citations were not appearing; when we looked in the Crossref API, we found that even when researchers added data citations, they were in some cases stripped from the XML. When depositing article metadata, please ensure data citations are included as references (in the case of DataCite DOIs) or as relationships (in the case of other PIDs).

Funders, Librarians, and other Scholarly Communications Stakeholders:

  • Help us advocate for the implementation of data level metrics! Catch us at 5AM Conference, ICSTI, FORCE2018, or at RDA Botswana/International Data Week to learn more about our project and better equip yourself as an advocate.

Follow us on Twitter, join our Newsletter, or contact us directly here.

It’s Time to Make Your Data Count!


One year into our Sloan-funded Make Data Count project, we are proud to release Version 1 of standardized data usage and citation metrics!

As a community that values research data, it is important for us to have a standard and fair way to compare metrics for data sharing. We know of, and are involved in, a variety of initiatives around data citation infrastructure and best practices, including Scholix and the Crossref/DataCite Event Data service. But data usage metrics are tricky, and before now there had not been a group focused on processes for evaluating and standardizing data usage. Last June, members of the MDC team and COUNTER began talking through what a recommended standard could look like for research data.

Since the development of our COUNTER Code of Practice for Research Data, we have implemented comparable, standardized data usage and citation metrics at Dash (CDL) and DataONE*, two project team repositories.

*DataONE UI coming soon

The Dash and DataONE repository pages show how we Make Data Count:

  • Views and Downloads: Internal logs are processed against the Code of Practice, and standard-formatted usage reports are sent to a DataCite hub for public use and, eventually, aggregation (see the sketch after this list).
  • Citations: Citation information is pulled from Crossref Event Data.
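
For repositories curious about the submission step, below is a hedged sketch of pushing a COUNTER/SUSHI-style usage report to the DataCite hub. The endpoint, authentication, and field names follow our reading of the DataCite usage-reports documentation, and every value shown is a placeholder.

```python
import requests

# Hedged sketch: submit a COUNTER/SUSHI-style usage report to the DataCite
# hub. Endpoint and field names follow our reading of the DataCite
# usage-reports documentation; the token and all values are placeholders.
API_TOKEN = "YOUR_JWT_TOKEN"  # issued by DataCite; placeholder

report = {
    "report-header": {
        "report-name": "dataset report",
        "report-id": "dsr",
        "release": "rd1",
        "created": "2018-07-01T00:00:00Z",
        "created-by": "example-repo",
        "reporting-period": {"begin-date": "2018-06-01", "end-date": "2018-06-30"},
    },
    "report-datasets": [{
        "dataset-id": [{"type": "doi", "value": "10.5061/dryad.example"}],
        "performance": [{
            "period": {"begin-date": "2018-06-01", "end-date": "2018-06-30"},
            "instance": [{
                "metric-type": "unique-dataset-requests",
                "access-method": "regular",
                "count": 42,
            }],
        }],
    }],
}

resp = requests.post(
    "https://api.datacite.org/reports",
    json=report,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
)
resp.raise_for_status()
print("Report accepted:", resp.status_code)
```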

The Make Data Count project team works with an agile, "minimum viable product" methodology. This first release has focused on developing a standard recommendation, processing our logs against that Code of Practice to develop comparable data usage metrics, and displaying both usage and citation metrics at the repository level. We know from work done in the prototype NSF-funded Making Data Count project that the community values additional metrics. Hence, future versions will include features such as:

  • details about where the data are being accessed
  • volume of data being accessed
  • citation details
  • social media activity

We just released our first iteration of data-level metrics infrastructure. What next?

1) Get Repositories Involved

For this project to be effective and for us to compare and utilize data-level metrics we need as many repositories as possible to join the effort. This is an open call for every repository with research data to Make Data Count. A couple of important resources to do so:

  • Check out our How-To Guide, which describes the California Digital Library's implementation of Make Data Count. Tips and tools (e.g. a Python Log Processor) are detailed in this guide and available on our public Github. Links in this guide also point to the DataCite documentation necessary for implementation.
  • Join our project team for a webinar on how to implement Make Data Count at your repository, and learn more about the project, on Tuesday, July 10th at 8am PDT/11am EDT. Webinar link: http://bit.ly/2xJEA4n.

2) Build Advocacy for Data-Level Metrics

Publishers:

When implementing this infrastructure in our repositories, we became aware of how few publishers are indexing data citations properly; very few datasets are correctly receiving citation credit in articles. If you are a publisher, or are interested in advocating for proper data citation practices, check out the Scholix initiative and our brief guide here, as well as DataCite's recent blog on the current state of data citations.

Researchers & the research stakeholder community:

For the academic research community to value research data, we need to talk about data-level metrics. This is a call for researchers to utilize data-level metrics as they would article metrics, and for academic governance to value these metrics as they do article metrics.

With the first version of our data-level metrics infrastructure released, we are excited to work as a community to further drive adoption of data metrics. For further updates, follow us on Twitter at @makedatacount.

Publishers: Make Your Data Citations Count!

Many publishers have implemented open data policies and have publicly declared their support of data as a valuable component of the research process. But to give credit to researchers and incentivize data publishing, the community needs to promote proper citation of data. Many publishers have also endorsed the FORCE11 Data Citation Principles, Scholix, and other data citation initiatives, but we still have not seen implementation or the benefits of proper data citation indexing at the journal level. Make Data Count provides incentives and aims to show researchers the value of their research data by displaying data usage and citation metrics. However, to be able to expose citations, publishers need to promote and index data citations with Crossref so that repositories utilizing the Make Data Count infrastructure can pull citations, evaluate use patterns, and display them publicly.

So how, as a publisher, can you support open research data and incentivize researchers to think about data like articles?

  1. Implement policies that advise researchers to deposit data in a stable repository that provides a persistent, citable identifier for the dataset.
  2. Guide researchers to cite their own data, and other data related to their article, in their reference list.
  3. Acknowledge data citations in the article, data availability statement, and/or reference list; tag them as data citations; and send this in XML to Crossref via the reference list or the relationships type. Crossref has put together a simple guide here.