Open Metrics Require Open Infrastructure

By: John Chodacki, Martin Fenner, Daniella Lowenberg

Today, Zenodo announced their intention to remove the altmetrics.com badges from their landing pages, and we couldn’t be more energized by their commitment to open infrastructure, supporting their mission to make scientific information open and free.

“We strongly believe that metadata about records including citation data & other data used for computing metrics should be freely available without barriers” – Zenodo Leadership

In the scholarly communications space, many organizations rally around the idea that the world’s knowledge should be discoverable, accessible, and auditable. However, we are not all playing by the same rules. While some groups work to build shared infrastructure, others work to build walls: barriers to entry around information that is, or should be, open and free.

In light of emerging needs for metrics and our work at Make Data Count (MDC) to build open infrastructure for data metrics, we believe it is necessary for corporations or entities that provide analytics and researcher tools to share the raw data sources behind their work. In short, if we trust these metrics enough to display them on our websites or add them to our CVs, then we should also demand that they be available for us to audit.

This isn’t a new idea. The original movement to build Article Level Metrics (ALMs) and alternative metrics was founded on this principle. The challenge is that while infrastructure groups have continued to work to capture these raw metrics, the lopsided ecosystem has allowed corporations to productize and sell them, regardless of whether they add any true value on top of the open information.

We believe that the open metrics space should be supported, through contributions and usage, by everyone: non-profits, corporations, and community initiatives alike. In supporting open metrics, though, it is particularly important to acknowledge the projects and membership organizations that have moved the needle by networking research outputs through PIDs and rich metadata. We can acknowledge these organizations by advocating for open science graphs and bibliometrics research to be based on their data, so that others can reproduce and audit the assumptions made. Other ideals that we believe should guide the development of the open metrics space include:

  • Publishers and/or products that build connections between research outputs should supply these assertions to community projects under a fully permissive CC0 license.
  • Companies, projects, and products that collect and clean metrics data are doing hard work. We should applaud them. But we should also recognize that when metrics are factual assertions (e.g., counts, citations), they should be openly accessible.
  • Innovation must continue, and productization can and should help drive innovation, but only as a value add. Aggregating, reporting, making data consumption easier, building analysis tools, and creating impact indicators from open data can all be valuable. But we should not reward any project that provides these services at the expense of keeping the underlying data closed to auditing and reuse.
  • Show our work. We ask researchers to explain their methods and protocols and to publish the data that underlies their research. We can and must do the same for the metrics we use to judge them, and we must hold all actors in this space accountable in this regard as we work toward full transparency.

These principles are core to our mission to build the infrastructure for open data metrics. As emphasis shifts in scholarly communication toward “other research outputs” beyond the journal article, we believe it is important to build intentionally open infrastructure rather than repeat the mistakes made in the metrics systems developed for articles. We know that it is possible for the community to come together and develop the future of open metrics in a non-prescriptive manner and, importantly, built on completely open and reproducible infrastructure.

Publishers: Make Your Data Citations Count!

Many publishers have implemented open data policies and have publicly declared their support of data as a valuable component of the research process. But to give credit to researchers and incentivize data publishing, the community needs to promote proper citation of data. Many publishers have also endorsed the FORCE11 Data Citation Principles, Scholix, and other data citation initiatives, yet we still have not seen implementation, or the benefits, of proper data citation indexing at the journal level. Make Data Count provides incentives and aims to show researchers the value of their research data by displaying data usage and citation metrics. However, to expose citations, publishers need to promote and index data citations with Crossref so that repositories utilizing the Make Data Count infrastructure can pull citations, evaluate use patterns, and display them publicly.

So how, as a publisher, can you support open research data and incentivize researchers to think about data the way they think about articles?

  1. Implement policies that advise researchers to deposit data to a stable repository that gives a persistent, citable identifier for the dataset
  2. Guide researchers to cite their own data, or other data related to their article, in the reference list
  3. Acknowledge data citations in the article, data availability statement, and/or reference list, tag them as data citations, and send this in XML to Crossref via the reference list or the relationship type. Crossref has put together a simple guide here. (A sketch of verifying such deposits follows this list.)
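
To make step 3 concrete, here is a minimal sketch, using only the public Crossref and DataCite REST APIs, of how a repository or publisher might check whether an article’s deposited references include dataset DOIs. The article DOI below is a hypothetical placeholder, and this is an illustration rather than an official Make Data Count tool.

```python
# Minimal sketch: check whether an article's Crossref-deposited references
# include dataset DOIs registered with DataCite. The article DOI below is a
# hypothetical placeholder, not a real record.
import requests

CROSSREF_API = "https://api.crossref.org/works/{}"
DATACITE_API = "https://api.datacite.org/dois/{}"

def deposited_reference_dois(article_doi):
    """Return DOIs found in the article's Crossref-deposited reference list."""
    resp = requests.get(CROSSREF_API.format(article_doi), timeout=30)
    resp.raise_for_status()
    refs = resp.json()["message"].get("reference", [])
    return [ref["DOI"] for ref in refs if "DOI" in ref]

def is_datacite_doi(doi):
    """True if the DOI resolves in DataCite's registry, as dataset DOIs do."""
    return requests.get(DATACITE_API.format(doi), timeout=30).status_code == 200

if __name__ == "__main__":
    article = "10.1234/example-article"  # hypothetical DOI
    data_dois = [d for d in deposited_reference_dois(article) if is_datacite_doi(d)]
    print(f"Dataset DOIs cited by {article}: {data_dois or 'none found'}")
```

Note that Crossref only exposes references that the publisher has deposited and made open, which is exactly why the steps above matter.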

How to engage with MDC?

Make Data Count (MDC) team members are at the center of many initiatives that focus on aspects of metrics, including data-level metrics (DLM). We leverage existing channels to build a new data usage standard and to promote integration and adoption among data centers and data consumers.

If you want to get in contact and start collaborating with us, please:

  • Join our mailing list
  • Follow us on Twitter
  • Contact us directly!

Understanding the problem

Journal articles are the currency of scholarly research. As a result, we as a community use sophisticated methods to gauge the impact of research and measure the attention it receives by analyzing article citations, article page views and downloads, and social media metrics. While imprecise, these metrics offer us a way to identify relationships and better understand relative impact. One of the many challenges with these efforts is that scholarly research is made up of a much larger and richer set of outputs beyond traditional publications, foremost among them research data. In order to track and report the reach of research data, we must build and maintain new, unique methods for collecting metrics on complex research data. Our project will build the metrics infrastructure required to elevate data to a first-class research output.

In 2014, members of this proposal group were involved in an NSF EAGER research grant entitled Making Data Count: Developing a Data Metrics Pilot. That effort surveyed scholars and publishers to determine which metrics and approaches would offer the most value to the research community. We spent one year researching the priorities of the community, exploring how ideas common to article-level metrics (ALM) could be translated into conventions for data-level metrics (DLM), and building a prototype DLM service. We determined that the community values data citation, data usage, and data download statistics more than metrics focused on social media. Based on this research, the project partners went a step further and isolated the gaps in existing data metrics efforts:

  • there are no community-driven standards for data usage stats;
  • there are no open source tools to collect usage stats according to such standards;
  • and there is no central place to store, index, and access data usage stats together with other DLM, in particular data citations.

This project proposes to fill these gaps by engaging in the following activities:

  1. We will work with COUNTER to develop and publish a code of practice recommending how data usage should be measured and reported
  2. We will deploy a central online DLM hub, based on the Lagotto software, for acquiring, managing, and presenting these metrics (a client-side sketch of consuming such metrics follows this list)
  3. We will integrate new data sources and clients of aggregated metrics to serve as exemplars for the integration of data repositories and discovery platforms into a robust DLM ecosystem
  4. We will encourage the growth and uptake of DLMs through an engaged stakeholder community that will advocate, grow, and help sustain DLM services
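
As one illustration of what the hub’s consumers could look like, here is a minimal client-side sketch against the public DataCite REST API, where Make Data Count usage and citation counts surface today. The viewCount/downloadCount/citationCount attribute names and the dataset DOI are assumptions for illustration, not a guaranteed contract.

```python
# Minimal sketch of a DLM hub client: fetch view, download, and citation
# counts for a dataset DOI from the public DataCite REST API. The attribute
# names and the DOI below are assumptions for illustration.
import requests

def dataset_metrics(doi):
    """Fetch usage and citation counts for a dataset DOI from DataCite."""
    resp = requests.get(f"https://api.datacite.org/dois/{doi}", timeout=30)
    resp.raise_for_status()
    attrs = resp.json()["data"]["attributes"]
    return {
        "views": attrs.get("viewCount"),
        "downloads": attrs.get("downloadCount"),
        "citations": attrs.get("citationCount"),
    }

if __name__ == "__main__":
    print(dataset_metrics("10.5061/dryad.example"))  # hypothetical DOI
```

A repository could display counts like these on its dataset landing pages, which is precisely the incentive loop the project aims to build.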

Throughout each of these activities, an engaged stakeholder community will be essential to advocating for, growing, and sustaining the services. As a result, the community will finally have the infrastructure needed to build relationships and better understand the relative impact of research data.