by Natsuko Nicholls
More than 200 attendees gathered in San Francisco for the 9th International Digital Curation Conference (IDCC14, Feb 24-27) to discuss data-driven transformations in research, education, business, and society. IDCC14 began with a stimulating pre-conference workshop and an inspiring (as well as entertaining) keynote speech. Both served to foster a positive atmosphere and a sense of belonging in the world of data curation. In his keynote presentation, Atul Butte impressed the audience with the exponentially growing amount of clinical data that is already public and the potential research and business value of big data in biomedicine (video here and slides here). Since embarking on a career in digital curation, I have also been struck by accelerating increases in the size, diversity, and complexity of data across disciplines. Despite some differences among disciplines (for instance, in the understanding of what comprises ‘data’, which is influenced by one’s particular training and field of study), many scientific disciplines are having similar conversations about data sharing and research transparency.
As I was contemplating disciplinary similarities and dissimilarities, one tweet (#idcc14) caught my attention—“social science seems to be left out of these conversations.” This was tweeted during the panel presentation on preparing the workforce for digital curation, which shed light on emerging non-traditional employment opportunities in data science and engineering, data analysis, etc., some of which increasingly require strong backgrounds in the sciences. Apparently, many current leaders of data conversations and initiatives (e.g., another keynote speaker, a closing keynote speaker, a proponent of data sharing and open science, and a ‘rock star’ in the data world) all seem to be scientists by training. So, I asked myself: Are social scientists being left out of data conversations? If not, what are the on-going efforts and recent discussions concerning open data in my field by training—political science?
Short answer: Three-point summary
My short answer is no, social scientists are not being left out of the data conversation. IDCC14 attendees widely recognized the presence of information professionals, data librarians, and archivists in the social sciences (a.k.a. IASSISTers). Also, and perhaps more importantly, political scientists are increasingly paying attention to data sharing and research transparency. In particular, over the last few years, political scientists have been actively engaged in:
- Creating guidelines for data access and research transparency. Political scientists are tackling the primary challenge that both quantitative and qualitative political science research traditions have lacked explicit guidelines as to what kinds of data and research materials should be shared.
- Putting research guidelines into practice. Political scientists are tackling the challenge of putting guidelines into practice, including creating ways to incentivize data sharing and enabling research transparency via replication studies.
- Building an infrastructure that promotes data sharing. Political scientists are tackling the challenge of providing a technological framework to support data sharing, particularly for non-numeric, qualitative research data.
1. Creating guidelines for data access and research transparency
In October 2012, the American Political Science Association (APSA) Council adopted new policies requiring transparency in political science research. These policies were a result of several years of discussions on data access, production transparency, and analysis transparency. The underlying principle is that openness is an indispensable element of credible research and rigorous analysis and is hence essential to both making and demonstrating scientific progress in political science. By August 2013, two documents, Guidelines for Data Access and Research Transparency for Quantitative Research in Political Science and Guidelines for Data Access and Research Transparency for Qualitative Research in Political Science, had been drafted by the Data Access and Research Transparency (DA-RT) Ad Hoc Committee and circulated for comments (note that these are two separate guidelines). In January 2014, an issue of PS: Political Science & Politics (vol. 47, issue 1) featuring a series of articles on ‘Openness in Political Science’ was published. In this issue, nine scholars investigated the benefits of greater openness and offered ideas about how to make data access and research transparency more viable and desirable for all political scientists. A. Lupia and C. Elman make it clear that the new guidelines are “more consistent with current and emerging standards across the sciences.” C. Elman and D. Kapiszewski point out that political scientists’ shared interest in openness is best understood as a meta-standard that applies to all social inquiry. Indeed, the DA-RT project is an integral part of wider efforts in the social sciences to advance the cause of openness, transparency, legitimacy, and credibility. Yet, it is important to note that DA-RT focuses on political science and is developed by a political science community in which both specific challenges and opportunities have shaped research traditions in the field.
Because DA-RT takes a community approach, its ideals are not imposed on political scientists; rather, DA-RT is a movement that anyone interested in political science can join.
2. Putting research guidelines into practice (e.g., promoting research transparency via replication studies)
As stated by A. Lupia and C. Elman in this issue of PS, “recent discussions about openness are a rare and welcome example of dissimilar scholars finding opportunities for collaboration and common action.” Because formulating guidelines for data access and research transparency is a big step forward, it is interesting to consider the factors that instigated the shared commitment to openness. In particular, there was growing concern that researchers could not replicate a significant number of empirical claims being made in political science’s leading journals. The practice of replication is nothing new to me: more than a decade ago, it was part of my graduate methods training. Today, the same professor still teaches the same course for political science graduate students at Michigan with the same assignment, called a replication and extension (or R&E) paper, and the assignment is still 35% of the grade! However, it was not until later that I fully understood the importance of sharing research data and the reasons behind replication studies.
The Washington Post’s ‘The Monkey Cage’ recently reported the results of a research project in which N. Janz and her team distributed a survey to researchers on the Political Methodology mailing list to learn about replication assignments as part of graduate courses. They found that although replication is practiced frequently, replication studies may still be under-utilized resources. One way to advance replication in political science, they argue, might be to create a website where researchers can widely share the replication studies conducted in graduate courses. What about formal publication venues for replication studies? Unfortunately, unlike journals in the natural sciences that have a tradition of publishing replication studies, many political science or international relations journals still hesitate to publish replications, mostly due to scarce journal space, the lack of a place to store replication files, and underdeveloped journal policies regarding replication. In this issue of PS, J. Ishiyama cites a recent study revealing that out of 120 journals in political science and international relations, only 19 had explicit replication policies. Although I agree that some advances (e.g., developing publication venues for replication studies, adopting journal policies, promoting replication studies through graduate training) may improve reproducibility standards in political science, other issues, such as who is responsible for data provision and the enforcement of such provision, remain a challenge.
3. Building an infrastructure that promotes data sharing (e.g., building a new home for qualitative social science data)
Some readers may wonder why the APSA developed two separate sets of guidelines for data access and research transparency, one for quantitative and one for qualitative research, if the common goal is to fuel a culture of openness that promotes effective knowledge transfer. The reason is that political science is a diverse discipline comprising multiple research communities that differ in their methodologies (e.g., ethnographic field work, laboratory-based experiments, statistical analysis of pre-existing datasets). Although the notion of openness has become increasingly relevant to the qualitative research tradition, data sharing and replication have been more prominent concerns for quantitative research for many years. For the quantitative research tradition, data archives and repositories (e.g., ICPSR, Dataverse) and the promotion of data sharing and re-use through data quality review have long been established. It is only recently, however, that qualitative data found a home in a new searchable repository for data that are not numbers. To provide a venue for storing, sharing, and preserving research data generated through qualitative and multi-method research in the social sciences, the Qualitative Data Repository (QDR) was recently launched at Syracuse University. As described by C. Elman, the QDR accepts many types of qualitative data including (but not limited to) unpublished primary sources, published primary sources, primary sources cited in secondary sources, secondary sources, and other research materials. As the QDR has only begun to call for pilot projects, it is too early to make judgments about the scale of data collections, the quality of data, and the discoverability of such data. It is safe to say, however, that this advancement demonstrates the growing number of tools and infrastructures that have recently been developed for qualitative data sharing.
In the weeks following IDCC14, I have tracked numerous tweets, news reports, and blogs about data sharing, as well as announcements from the Public Library of Science (PLOS), Nature’s Scientific Data, and Figshare about data access policies, publication of data articles, and code and software sharing, respectively. Things are rapidly changing. Because data sharing and preservation tools and services are constantly evolving, I wanted to sit down quietly and recap the data conversations occurring in the social sciences, particularly political science. I hope this recap will help engage political scientists and social science data stewards in the same conversations and partnerships in emerging areas like data publishing, just as V. Mitchell and J. Baker demonstrated at IDCC14 with their collaborative data publishing pilot at the University of Oregon.