Data metrics is a journey and we are at a pivotal moment of needing broad adoption of data usage and data citation best practices

In a myriad of ways, researchers use data to build on their research. However, when a researcher publishes data, our community still lacks a clear way to measure its reach and impact. While efforts to assess this have grown, approaches have not been consolidated or agreed upon. This lack of auditable and understandable data metrics is one of the causes of the continued lack of credit for research data outputs. Without data metrics, researchers are left publishing data only to tick a box and meet compliance requirements, rather than being assessed or rewarded for their work.

While it is important that we find ways to evaluate the investments made in research data from the perspectives of both the scientist and the wider research data infrastructure communities, it is essential that these metrics be developed in an open and responsible way.

Open and responsibly created metrics

There have been several attempts to short-circuit the metrics space and build new indicators on top of faulty or opaque ones. These approaches usually rely on the familiar framing of the journal publishing space (data impact factors or data h-indexes) and share the flaw of being poor indicators. If data metrics are not developed by the community but are instead created by competing entities (i.e., commercial ones), we foresee metrics that do not actually measure dataset re-use and so will negatively impact the incentive to publish open data.

As mentioned in the post “Open Metrics Require Open Infrastructure”, metrics must be developed on open infrastructure, including auditable logs of the raw data feeding into the metrics. As explained further in “Open Data Metrics: Lighting the Fire”, if the raw counts are black-boxed and not transparent, we risk repeating mistakes made in the article world. Regardless of organization type, if the community ensures that data usage and citations are counted and aggregated in a traceable, open manner, we can instill much-needed trust in the developed data metrics.

Emphasis on the building blocks

To date, data metrics efforts have been primarily split between data usage and data citation initiatives. Data usage refers to the views, downloads, and counts of how often a dataset is accessed. Data citation is the reference of a dataset either as a formal citation (e.g., inclusion in a reference list for a journal article), or a reference to the use or re-use of a dataset (e.g., the mention of a dataset in the methods section of an article). While these two aspects of data metrics are distinct, they are often coupled under the assessment of a dataset’s impact and reach. We believe that though these counts are not in fact metrics, normalized approaches to counting and sharing usage and citation are an important first step in the development of research data assessment indicators.

In addition to normalizing our approaches to counting, we need higher quality metadata that allows for the proper contextualization of these counts (e.g., disciplinary information, funding, authorship attributions). This holds true for both data usage and citation, and repositories publishing data and journals citing data have responsibilities to move towards a state where they submit high quality metadata to centralized sources for aggregation and re-use.

Basic bibliometric research investigating data sharing, data reuse, and data citation practices remains relatively nascent and is required for this development. Using openly available and standardized counts for usage and citation, studies need to examine how various facets affect data reuse and citation (e.g., discipline, career status, country). Providing empirical evidence (through mixed-method studies) on the role data citation and data reuse play in the research process is essential to the development of appropriate and meaningful data metrics.

The Way Forward

We have the opportunity to create responsible, transparent, and open data metrics, and it is essential to bring the community on board with normalized, open approaches. As we build out support for measuring the reach of research data and advocate for the inclusion of open data in responsible research assessment, we need to ensure there is community accountability. Our success requires community engagement, and we hope you can join us.