Join us for a 3-part series on key issues and opportunities for the community to prioritize on our journey to open data metrics.
By: John Chodacki, Martin Fenner, Daniella Lowenberg
Today, Zenodo announced its intention to remove the altmetrics.com badges from its landing pages, and we couldn't be more energized by this commitment to open infrastructure, which supports Zenodo's mission to make scientific information open and free.
“We strongly believe that metadata about records including citation data & other data used for computing metrics should be freely available without barriers” – Zenodo Leadership
In the scholarly communications space, many organizations rally around the idea that the world's knowledge should be discoverable, accessible, and auditable. However, we are not all playing by the same rules. While some groups work to build shared infrastructure, others work to build walls, erecting barriers to entry around information that should be open and free but isn't.
In light of emerging needs for metrics and our work at Make Data Count (MDC) to build open infrastructure for data metrics, we believe that it is necessary for corporations or entities that provide analytics and researcher tools to share the raw data sources behind their work. In short, if we trust these metrics enough to display on our websites or add to our CVs, then we should also demand that they be available for us to audit.
This isn't a new idea. The original movements to build Article Level Metrics (ALMs) and alternative metrics were founded on this principle. The challenge is that while infrastructure groups have continued working to capture these raw metrics, a lopsided ecosystem has allowed corporations to productize and sell them, whether or not they add true value on top of the open information.
We believe that the open metrics space should be supported, through contributions and usage, by everyone: non-profits, corporations, and community initiatives alike. In supporting open metrics, though, it is particularly important to acknowledge the projects and membership organizations that have moved the needle by networking research outputs through PIDs and rich metadata. We can acknowledge these organizations by advocating for open science graphs and bibliometrics research to be based on their data, so that others can reproduce and audit the assumptions made. Other ideals that we believe should guide the development of the open metrics space include:
- Publishers and/or products that build connections between research outputs should supply these assertions to community projects under the fully permissive CC0 license.
- Companies, projects, and products that collect and clean metrics data are doing hard work. We should applaud them. But we should also recognize that when metrics are factual assertions (e.g., counts, citations), they should be openly accessible.
- Innovation must continue, and productization can and should help drive it, but only as a value-add. Aggregating, reporting, making data consumption easier, building analysis tools, and creating impact indicators from open data can all be valuable. But we should not reward any project that provides these services at the expense of keeping the underlying data closed to auditing and reuse.
- Show our work. We ask researchers to explain their methods and protocols and to publish the data underlying their research. We can and must do the same for the metrics we use to judge them, and we must hold all actors in this space accountable as we work toward full transparency.
These principles are core to our mission to build the infrastructure for open data metrics. As emphasis in scholarly communication shifts toward "other research outputs" beyond the journal article, we believe it is important to build intentionally open infrastructure rather than repeat the mistakes made in the metrics systems developed for articles. We know it is possible for the community to come together and develop the future of open metrics in a non-prescriptive manner, and, importantly, one built on completely open and reproducible infrastructure.
Since 2014, the Make Data Count (MDC) initiative has focused on building the social and technical infrastructure for the development of research data metrics. With funding from the National Science Foundation, Gordon and Betty Moore Foundation, and Alfred P. Sloan Foundation, the initiative has transformed from a research project with an aim to understand what researchers value about their data, to an infrastructure development project, and now into a full-fledged adoption initiative. The team is proud to announce additional funding from the Sloan Foundation to focus on widespread adoption of standardized data usage and data citation practices, the building blocks for open research data metrics.
Expanded team & expanded scope
In broadening our scope and refining our adoption efforts, we are thrilled to announce new MDC team members. By including key community players in the adoption and research landscapes, we can look beyond infrastructure development and more effectively reach our publisher and repository stakeholders.
- Crossref: We welcome Crossref, who will help guide our data citation work at publishers in conjunction with existing data citation initiatives (e.g., Scholix). By having an increased presence at publisher meetings and building up support in the Crossref member community, we aim to see many more journals properly contributing to the data citation landscape.
- Bibliometricians: With increased pressure from research stakeholders to have data metrics at the ready, we are pleased to be working with a group of expert bibliometricians who will begin studies into researcher behavior around data reuse. It is essential that our driving motives for the development of data metrics are evidence-based, and we welcome Dr. Stefanie Haustein (University of Ottawa, Co-Director, ScholCommLab) and Dr. Isabella Peters (ZBW – Leibniz Information Centre for Economics) and their labs to our team.
“I am excited to join and work closely together with the MDC team on the development of data metrics. Our team at the ScholCommLab in Canada and Isabella’s research group in Germany will use a mixed-methods approach and apply bibliometric as well as qualitative methods to analyze discipline-specific data citation and reuse patterns. We hope to provide much-needed evidence to develop meaningful data metrics that can help researchers showcase the importance of data sharing.”
– Dr. Stefanie Haustein
Our goals for the MDC initiative going forward are three-fold:
- Increased adoption of standardized data usage across repositories through enhanced processing and reporting services
- Increased implementations of proper data citation practices at publishers by working in conjunction with publisher advocacy groups and societies
- Promotion of qualitative and quantitative bibliometrics studies of data usage and citation behaviors
“The responsible use and application of data metrics and data citation must become a community norm across all disciplines if data creation, curation, stewardship, reuse and discovery are to be properly valued. By partnering with key infrastructure providers and researchers, Make Data Count is ensuring that the adoption of data metrics and data citation are researcher led, discipline specific and evidence based. This is crucial if we are to avoid the perverse consequences created by the misuse of article citations and metrics, such as those based on journal rank and impact factor.”
– Dr. Catriona MacCallum, Director of Open Science, Hindawi
“MDC has put data metrics at the center of the debate on data sharing. Now, it is time to make data metrics a reality. The development of an ambitious infrastructure for data metrics, supported by the research of Stefanie Haustein, Isabella Peters and colleagues, creates the unique environment to turn data metrics into a tangible reality; expanding the analytical toolset for scientometric research and science policy making. Such transformation is meant to contribute not only to increase the importance of data sharing in scientific practice, but also to radically transform how science is being currently developed, measured and evaluated.”
– Dr. Rodrigo Costas, Senior Researcher, CWTS, Leiden University
Driven by two separate grants, one focused on the deployment of data usage services, a bibliometrics dashboard, and publisher data citation campaigns (PI Lowenberg) and the other on understanding what is meaningful for data metrics (PI Haustein), the MDC team is moving full steam ahead on these adoption goals. The MDC initiative can only be effective with broad and diverse community participation. Follow along for announcements of webinars and events for community involvement, and check out our announcement at the ScholCommLab blog for more details on the bibliometrics work ahead.
The Make Data Count team has been working on various infrastructure and outreach projects focused on how to measure the reach and impact of research data. While busy driving adoption of these frameworks and services, we have yet to discuss the high-level challenges we face and where we believe we need to go.
To clarify to the community what our opinions and approaches are in terms of open data metrics, members from the Make Data Count team (Daniella Lowenberg, John Chodacki, Martin Fenner, Matt Jones) sat down and wrote a book that we hope will jump start a community conversation. We would love to hear your feedback and look forward to engaging with you on the topic.
Research data is at the center of science, and to date it has been difficult to understand its impact. To assess the reach of open data, and to advance data-driven discovery, the research and research supporting communities need open, trusted data metrics.
In Open Data Metrics: Lighting the Fire, the authors propose a path forward for the development of data metrics. They acknowledge historic players and milestones in the process and demonstrate the need for standardized, transparent, community-led approaches to establish open data metrics as the new normal.
Following advice from our workshop attendees at RDA13, we invite you to join us for our spring webinar.
Join us on May 8th at 8am PST/3pm GMT as we demo our new aggregation services at DataCite and DataONE. This webinar is intended to spotlight the features and services we can build off of our central infrastructure such as aggregated usage and citations. This webinar will be recorded and posted on our website.
Webinar Registration: bit.ly/MDCSpringWebinar
Join us on March 26th at 8:00am PST/4:00pm GMT for a webinar on repository implementation of our COUNTER Code of Practice for Research Data and Make Data Count recommendations. This webinar will feature a panel of developers from repositories that have implemented or are about to release standardized data metrics: Dataverse, Dryad, and Zenodo. We will interview each repository on their implementation process. This recorded discussion between our technical team and the repositories, offering various perspectives on implementation, should provide helpful guidance for any repository interested in implementing!
To register for our webinar, please fill out this form.
For those who cannot make it, a recording will be made available on our website. Please tweet to us any questions that you may want asked.
When: April 1st, 10:00am-12:00pm
Where: Loews Hotel, Philadelphia. Room: Congress C
Why: As we begin to wrap up our two-year grant, it is essential that we bring in the community to learn about our data metrics infrastructure and understand community feedback on adoption. In this workshop, we’ll be demonstrating how repositories and publishers can contribute data usage statistics and citations. Several repository implementers will be present to share their experiences and explain best practices. In addition, we will show how anyone interested can consume these open metrics. Adoption of comparable usage statistics and data citations is a critical first step towards usable research data metrics, so we urge all data providers and data stakeholders to come join our interactive session!
Crossposted from DataONE blog: https://www.dataone.org/news/new-usage-metrics
Publications have long maintained a citation standard for research papers, ensuring credit for cited work and ideas. Tracking use of data collections, however, has remained a challenge. DataONE is pleased to share our latest effort in overcoming this barrier and in demonstrating data reuse with new data usage and citation metrics.
The data usage and citation metrics available on our data search and discovery platform, https://search.dataone.org, include live counts of citations, downloads, and views for each dataset in our network of Earth and environmental repositories. These metrics are interactive and can be further explored to identify articles citing the data or, in the case of downloads and views, scaled over specific time increments to dive into the details of when the data was accessed. The implementation of these data usage and citation metrics results from our collaboration as part of the Make Data Count project and complements metrics recently implemented in Dash. The metrics are also in compliance with the COUNTER Code of Practice for Research Data.
The usage counts are acquired from views and downloads occurring directly through the DataONE interface, via activity at the individual repository or through clients, such as an R script. Additionally, researchers might cite a dataset having received it directly from a colleague. For these reasons, values for citations, downloads and views do not always covary.
We encourage you to explore these new metrics on https://search.dataone.org. Click on each metric and pull up the interactive figures to visualize downloads and views across time for each dataset, or the full lists of citation information for research papers citing each dataset. You can also check out the DataONE documentation for further details on the metrics implementation.
With Make Data Count now in its second year, the focus is shifting from building infrastructure to driving adoption of our open data-level metrics infrastructure. As described in previous blog posts, we built and released infrastructure for data-level metrics (views, downloads, citations). We developed a new, COUNTER-endorsed Code of Practice for Research Data and new processing pipelines for data usage metrics, and we are using the outputs of the Scholix working group to count data citations. The most important next step? Get people, repositories, and publishers to use it. We teamed up with Scholix at FORCE2018 to explain to publishers and data repositories how they can implement and support research data metrics.
Scholix: it’s not a thing
Adrian Burton (ARDC), co-chair of the Scholix working group, started the session with a very important message: Scholix is not a thing; nobody is building it, and it is not a piece of infrastructure. Scholix is an information model developed to allow organizations to exchange information. In practice, this now means that, through Crossref, DataCite, and OpenAIRE, you can exchange information about article-data links. These links form the basis for data citations and can be obtained by querying the APIs made available by these organizations. However, think not what Scholix can do for you, but what you can do for Scholix: the system is only useful if organizations also contribute information about article-data links.
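As a rough sketch of what querying for these links might look like, the snippet below builds a request URL against the DataCite Event Data API. The endpoint and parameter names here are assumptions based on our reading of the public documentation; check the current API docs before relying on them.

```python
# Sketch: build a query for article-data links ("events") around a dataset DOI.
# Endpoint and parameter names are assumptions; consult the live API docs.
from urllib.parse import urlencode

BASE = "https://api.datacite.org/events"  # assumed Event Data endpoint


def build_events_query(doi, relation_type=None, page_size=25):
    """Return a URL that asks for events involving the given DOI."""
    params = {"doi": doi, "page[size]": page_size}
    if relation_type:
        # e.g. "is-cited-by" to restrict results to citation links
        params["relation-type-id"] = relation_type
    return BASE + "?" + urlencode(params)


url = build_events_query("10.5061/dryad.example", relation_type="is-cited-by")
print(url)
# The URL could then be fetched with any HTTP client, e.g. urllib or requests.
```

The same pattern applies to the Crossref Event Data API; only the base URL and some parameter names would differ.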
Send citations today!
Following Adrian’s talk, Patricia Feeney (Crossref) and Helena Cousijn (DataCite) discussed with publishers and data repositories how they can add information about citations to their metadata and thereby make these openly available to others. The discussion revealed that several data repositories already do a lot of work to make this happen. They hire curators for manual curation and text-mine articles to ensure all links to datasets are discovered and made available. When they deposit their metadata with DataCite, they add information about the related identifier, the type of related identifier, and indicate what the relationship is between the dataset and article. This ensures publishers also have access to this information and can display links between articles and datasets.
Publishers often depend on authors to add information about underlying datasets to the manuscript or in the submission system; when authors don't, the publisher isn't aware of any links. Most of the publishers indicated that they have started working on data citation but are still optimizing their workflows. Better tagging of data citations, both in their own XML and in the XML sent to Crossref, is part of that. Patricia explained that Crossref accepts data citations submitted as relations, as well as DataCite DOIs in reference lists. This gives publishers two ways to make their data citations available.
The most important message? You can use your current (metadata) workflows to contribute article-data links. Both Crossref and DataCite are very happy to work with any organization that needs assistance in implementation or optimization of data citation workflows.
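To make the repository side of this concrete, the fragment below sketches the related-identifier portion of a DataCite metadata record linking a dataset to a (hypothetical) citing article. Field names follow the DataCite metadata schema as we understand it; treat the exact structure as an assumption and verify against the current schema documentation before depositing.

```python
# Sketch of the relatedIdentifiers portion of a DataCite metadata record.
# Field names follow the DataCite schema as we understand it; the article
# DOI below is hypothetical. Verify against the current schema docs.
import json

dataset_metadata = {
    "doi": "10.5061/dryad.example",  # the dataset's own DOI
    "relatedIdentifiers": [
        {
            "relatedIdentifier": "10.1234/journal.article.5678",  # hypothetical
            "relatedIdentifierType": "DOI",
            "relationType": "IsCitedBy",  # the dataset is cited by the article
        }
    ],
}

print(json.dumps(dataset_metadata, indent=2))
```

Depositing the link from the dataset side like this is what lets publishers, and anyone else, discover the article-data connection without doing their own text mining.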
Make Data (Usage Metrics) Count
Following the discussion on why data citations are essential, and how publishers can contribute data citations to our infrastructure, we moved on to data usage metrics. Daniella Lowenberg (CDL), project lead for Make Data Count, explained how repositories can standardize their usage metrics (views, downloads) against the COUNTER Code of Practice for Research Data, contribute these usage logs to an open DataCite hub, and start displaying standardized usage metrics at the repository level. Repositories, check out our documentation here, and get in touch with Daniella if you would like to begin talking about implementation!
Event Data: for all your reuse questions
Closing out the workshop, Martin Fenner (DataCite) explained how you can consume these data metrics. The links contributed following the Scholix framework are openly available and can therefore be used by any interested organization. You can get them through Event Data, a service developed by Crossref and DataCite to capture references, mentions, and other events around DOIs that are not provided via DOI metadata. The Crossref and DataCite APIs support queries by DOI, DOI prefix, date range, relation type, and source. If you want to extract both usage statistics and data citations, you can obtain these through the DataCite API. For more information, take a look at the documentation!
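Once events are retrieved, separating citations from usage is a simple matter of grouping by relation type. The snippet below sketches that step; the record shape is a simplified assumption about the API's JSON response, not the exact payload.

```python
# Sketch: tally Event Data-style records by relation type, e.g. to separate
# citation events from usage events. The record shape below is a simplified
# assumption about the API's JSON response.
from collections import Counter

sample_events = [  # stand-in for the "data" array of an API response
    {"relation-type-id": "is-cited-by", "obj-id": "10.1234/a"},
    {"relation-type-id": "unique-dataset-requests", "obj-id": "10.1234/a"},
    {"relation-type-id": "is-cited-by", "obj-id": "10.1234/b"},
]


def tally_by_relation(events):
    """Count events per relation type."""
    return Counter(e["relation-type-id"] for e in events)


print(tally_by_relation(sample_events))
```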
What does it all look like?
DataONE: Example from a Dryad dataset, all data citations displayed
Dash: Example view of standardized usage metrics & citations
Crossposted from COUNTER on September 13, 2018
There is a need for consistent and credible reporting of research data usage. Such usage metrics are an important component in understanding how publicly available research data are being reused.
To address this need, COUNTER and members of the Make Data Count team (California Digital Library, DataCite, and DataONE) collaborated in drafting the Code of Practice for Research Data Usage Metrics release 1.
The Code of Practice for Research Data Usage Metrics release 1 is aligned as much as possible with the COUNTER Code of Practice Release 5, which standardizes usage metrics for many scholarly resources, including journals and books. Many definitions, processing rules, and reporting recommendations apply to research data in the same way as they apply to the other resources covered by the COUNTER Code of Practice. Some aspects of the processing and reporting of usage data are unique to research data, however, and the Code of Practice for Research Data Usage Metrics therefore deviates from Release 5 to address them specifically.
The Code of Practice for Research Data Usage Metrics release 1 provides a framework for comparable data by standardizing the generation and distribution of usage metrics for research data. Data repositories and platform providers can now report usage metrics following common best practices and using a standard report format.
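To give a feel for the standard report format, the snippet below sketches the shape of a single dataset entry in a COUNTER-style usage report. The field and metric-type names follow the Code of Practice for Research Data as we understand it; treat this as an illustrative sketch, not the normative report schema.

```python
# Illustrative shape of one entry in a COUNTER-style dataset usage report.
# Field and metric-type names follow the Code of Practice for Research Data
# as we understand it; this is a sketch, not the normative schema.
import json

report_item = {
    "dataset-id": [{"type": "doi", "value": "10.5061/dryad.example"}],
    "platform": "Example Repository",  # hypothetical platform name
    "performance": [
        {
            "period": {"begin-date": "2018-09-01", "end-date": "2018-09-30"},
            "instance": [
                {"metric-type": "total-dataset-investigations", "count": 142},
                {"metric-type": "unique-dataset-requests", "count": 37},
            ],
        }
    ],
}

print(json.dumps(report_item, indent=2))
```

Because every implementing repository reports against the same metric types and period structure, counts become comparable across platforms, which is the point of the Code of Practice.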
COUNTER welcomes feedback from the data repositories that implement this first release of the Code of Practice. Their experiences will help to refine and improve it and inform a second release.