A Growing List of Data Journals

by Katherine Akers*

In one of my recent projects, I’ve been investigating the development and current state of data journals and other scholarly journals that publish data papers. Rather than forming arguments or drawing conclusions from data, data papers call attention to and describe particular research datasets, which may increase the likelihood that the datasets could be re-used or re-purposed by other researchers in the future. What I particularly like about data papers is that they are peer-reviewed, can be listed on CVs, and can accumulate citations just like traditional journal articles, thus providing an incentive for researchers to put time and effort into preparing their datasets for public access.

The Peer Review for Publication & Accreditation of Research Data in the Earth Sciences (PREPARDE) project’s list of data journals was really helpful to me as I started my investigation. However, the more I researched the topic, the more journals I found that publish articles describing datasets, databases, software, or models. To date, I have found 73 journals that either exclusively publish data (or database, software, or model) papers or that publish data papers as one article type. It seems that every time I scour the internet for information on data journals, I find another one, so this list is likely not exhaustive. Also, an increasing number of publishers are launching new data journals or starting to publish data papers, including Nature’s Scientific Data, suggesting that this list will only continue to grow.

  • Acta Crystallographica E: Structure Reports Online
  • Atomic Data and Nuclear Data Tables
  • Biodiversity Data Journal
  • BioRisk
  • BMC Bioinformatics
  • BMC Biology
  • BMC Biophysics
  • BMC Biotechnology
  • BMC Cancer
  • BMC Cell Biology
  • BMC Complementary & Alternative Medicine
  • BMC Dermatology
  • BMC Developmental Biology
  • BMC Ecology
  • BMC Evolutionary Biology
  • BMC Gastroenterology
  • BMC Genetics
  • BMC Genomics
  • BMC Geriatrics
  • BMC Immunology
  • BMC Infectious Diseases
  • BMC Medical Education
  • BMC Medical Genetics
  • BMC Medical Genomics
  • BMC Medical Imaging
  • BMC Medical Informatics & Decision Making
  • BMC Medical Research Methodology
  • BMC Microbiology
  • BMC Molecular Biology
  • BMC Musculoskeletal Disorders
  • BMC Nephrology
  • BMC Neuroscience
  • BMC Ophthalmology
  • BMC Palliative Care
  • BMC Pharmacology & Toxicology
  • BMC Plant Biology
  • BMC Pregnancy & Childbirth
  • BMC Psychiatry
  • BMC Public Health
  • BMC Pulmonary Medicine
  • BMC Research Notes
  • BMC Structural Biology
  • BMC Systems Biology
  • BMC Urology
  • Dataset Papers in Science
  • Earth System Science Data
  • Ecological Research
  • Ecology
  • F1000 Research
  • Genomics Data
  • Geochemistry, Geophysics, Geosystems
  • Geoscience Data Journal
  • Geoscientific Model Development
  • GigaScience
  • International Journal of Robotic Research
  • Internet Archaeology
  • Journal of Chemical and Engineering Data
  • Journal of Open Archaeology Data
  • Journal of Open Psychology Data
  • Journal of Open Research Software
  • Journal of Physical and Chemical Reference Data
  • Journal of Statistical Software
  • MycoKeys
  • Nature Conservation
  • NeoBiota
  • Neuroinformatics
  • Nuclear Data Sheets
  • Open Health Data
  • PhytoKeys
  • PLoS ONE
  • Scientific Data
  • Standards in Genomic Sciences
  • ZooKeys

* Katherine Akers is now a Biomedical Research Data Specialist at Wayne State University’s Shiffman Medical Library.


Conference Highlights: Research Data Access & Preservation Summit 2014

by Fe Sferdean

In March I attended the 2014 Research Data Access and Preservation (RDAP14) Summit in San Diego. The three-day Summit, focusing on research data management, access, and preservation, included invited panels and presentations, a poster session, lightning talks, hands-on workshops, resources, and tools developed by and for the community. With about 100 attendees, the Summit brought together research data managers and curators, librarians, researchers, and data scientists from the life sciences, physical sciences, social sciences, and humanities fields.

Focusing More on People, Less on Data

A recurring theme throughout the Summit, highlighted by MacKenzie Smith in her presentation “Choosing the Right Problem: An Institutional Library Perspective”, was focusing less on data and more on people, especially helping researchers with their career milestones by linking data curation to the processes involved in promotion and tenure. Her presentation considered questions such as: how can we coordinate with the research workflow and where does notification make sense? What ‘data events’ should the researcher notify us about? Focusing less on compliance and more on services supporting researchers was further discussed by the Summit panelists, emphasizing the people-centered approach both within the library and beyond.

A People-Centered Approach for Research Data Management Support

Day One panels, including “Building a data management and curation program on a shoestring budget” (with speakers from Tufts, UMass Medical, and Virginia Commonwealth) and “Collaboration and tension between institutions and units providing data management support”, discussed the importance of reaching out and building relationships with stakeholders on campus such as the Office of Research, proposal development office, central IT, data storage services, and big data centers. To help develop services for researchers, the panelists discussed the importance of training and supporting librarians for new roles in research data management in addition to their current teaching responsibilities. This seemed to resonate with a few of the attendees who voiced their concerns about suddenly having “data” attached to their current job titles. One panelist raised the question: where do the liaison librarian’s services end and the research data specialist’s services begin? It appears that the research data specialist must primarily take on the people-centered aspect of the data services by serving as a ‘bridge’ between the liaison librarians and the campus stakeholders. In one library service model, data-related consultations with faculty always include both the liaison librarian and the data specialist librarian. Another panelist referred to herself as a ‘data concierge’, as she is not a data expert herself but rather directs researchers to the data experts on campus.

Research data support services discussed by the panelists resembled an “embedded librarianship” approach that included outreach activities (e.g., brown bags, presentations) for faculty and researchers within their departments, and not “immediately putting researchers on tools (e.g., DMPTool)”, which helped knit stronger relationships that resulted in a higher number of consultations. Some panelists provided workshops for faculty and discussed infusing core competencies of research data management earlier in the researcher career by teaching graduate and undergraduate courses for credit.

The people-centered theme continued into the Day Two talks and poster sessions, which focused on the curation of research data. Maryann Martone began the day with a talk about the Neuroscience Information Framework, an inventory of global neuroscience resources that tackles the challenge of having “multiple data types, multiple scales, and multiple databases.” Jared Lyle of ICPSR led a panel about Learning to Curate, where library data curators from Emory, Duke, and UCLA shared their experiences of applying curation theories to practice through actual data processing using the ICPSR data workflow model. Presentations about the NSF DataNet projects, a panel on federal requirements for public access, and poster sessions with topics ranging from data identifier taxonomy to designing data services ended the day. To round out the people-centered theme of the Summit, Lisa Hinchliffe of UIUC hosted a workshop on Day Three called “Learning to Teach, Teach for Learning: Instructional Practices for Data Services”. She delved into proven and efficient strategies for developing and delivering instruction, to equip attendees involved with teaching research data management concepts to faculty, researchers, students, and colleagues.

In summary, the Summit provided the opportunity to interact with practitioners and researchers working in research data management, access, and preservation. The main theme focused on a people-centered approach that weaves together the different aspects of research data management from educating librarians to building relationships with campus stakeholders in order to provide the best support for researchers.

Resources about RDAP 2014


Deep Dive into Psychology Data

by Katherine Akers*

Continuing our research data education program for librarians at the University of Michigan Library, Susan Turkel (Psychology & Sociology Librarian) and I (eScience Librarian, CLIR Postdoctoral Fellow, and psychologist by training) recently led a Deep Dive into Psychology Data workshop that was attended by a total of 31 librarians. Similar to the Deep Dive into Ecology Data workshop led by Scott Martin (Biological Sciences Librarian), our workshop explored how librarians can become more familiar with the research data management landscape of a particular discipline, using psychology as an example.

Susan and I demonstrated how librarians can deepen their understanding of issues and current trends in research data management within the context of particular disciplines by investigating the data policies of three different entities: funding agencies, academic societies, and journals. Our deep dive into psychology data revealed several interesting findings that could be useful to psychology librarians who want to provide greater support for psychology research data management.

  • Funding agencies: At the University of Michigan, many psychology faculty receive funding from the National Science Foundation and the National Institutes of Health, which have clear policies on research data sharing. Many others, however, receive funding from the Department of Defense and smaller public or private funding agencies, which may have unclear or no data sharing policies.
  • Academic societies: The American Psychological Association (APA) expects psychologists to share their data with other competent professionals who wish to replicate or verify the claims made in publications, but only for that purpose. The Association for Psychological Science (APS) does not seem to have a formal data sharing policy.
  • Journals: Most psychology journals do not expect, encourage, or require authors to share the data underlying their articles, but there are some interesting exceptions. For instance, the Archives of Scientific Psychology is a new journal (published by APA) that requires authors to make the data underlying their analyses accessible to others. However, as potential data users must complete an extensive application for data access, some have questioned how much this journal actually supports the concept or practice of open data. Also, Psychological Science (published by APS) recently implemented a new system whereby authors can earn a digital badge affixed to their article if they make their data publicly available. Furthermore, the new Journal of Open Psychology Data is a data journal that publishes papers describing particular research datasets housed in data repositories.

We also found a paucity of repositories specifically designed for psychology research data. Some repositories that may be relevant to certain psychologists include the National Database for Autism Research, Inter-university Consortium for Political and Social Research (ICPSR), National Archive of Computerized Data on Aging (part of ICPSR), and OpenfMRI. Because psychologists have few disciplinary data repositories to turn to, they may need to rely more on institutional repositories and general repositories such as Harvard Dataverse Network or figshare.

Finally, a major issue that complicates data sharing and preservation in this discipline is that much psychology research involves human subjects, meaning that research data are likely to contain personal and/or sensitive information that cannot be openly shared. I’ve often heard researchers say that they can’t share their data because the Institutional Review Board (IRB) requires data to be destroyed at a certain time point after collection. But the University of Michigan’s IRB states that identifiable information, not all data, should be deleted or destroyed. This means that psychology datasets can potentially be shared and openly preserved if personal information can be removed, such as by following the US Department of Health & Human Services’ guidelines for de-identifying datasets. Alternatively, psychology datasets containing personal and/or sensitive information could be placed in repositories, such as ICPSR and Harvard Dataverse Network, that can restrict access to certain users.
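As a rough illustration of what removing personal information from a dataset can look like in practice, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the column names, the sample rows, and the `deidentify` helper are hypothetical, and the identifier list is not taken from the HHS guidance or from any IRB policy. Dropping direct identifiers is only a first step, since indirect identifiers (e.g., rare combinations of demographic values) can still point to individuals.

```python
import csv
import io

# Hypothetical column names treated as direct identifiers. A real project
# would derive this list from its IRB protocol and the applicable guidance.
IDENTIFIER_COLUMNS = {"name", "email", "date_of_birth"}


def deidentify(rows, identifier_columns=IDENTIFIER_COLUMNS):
    """Return copies of the rows without identifier columns.

    Each row also receives a new sequential study ID so that records
    remain distinguishable after the identifiers are dropped.
    """
    cleaned = []
    for index, row in enumerate(rows, start=1):
        kept = {key: value for key, value in row.items()
                if key not in identifier_columns}
        cleaned.append({"study_id": f"P{index:03d}", **kept})
    return cleaned


# Made-up sample data standing in for a small psychology dataset.
RAW = """name,email,date_of_birth,condition,score
Ada Example,ada@example.org,1990-01-01,control,12
Bo Example,bo@example.org,1985-06-15,treatment,17
"""

rows = list(csv.DictReader(io.StringIO(RAW)))
shareable = deidentify(rows)

for record in shareable:
    print(record)
```

Only the `condition` and `score` columns, plus the newly assigned `study_id`, survive in `shareable`; the original rows are left untouched so the identifiable copy can be handled separately.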

In summary, the research data management landscape of psychology is probably not altogether different from many other disciplines that do not have well-established cultures of data sharing. Although there may not be strict data sharing policies or a wealth of resources that support data sharing in psychology, our deep dive certainly enhanced our knowledge of current trends in psychology research data management and will help us raise awareness of options for the storage and dissemination of research data among psychology researchers.

* Katherine Akers is now a Biomedical Research Data Specialist at Wayne State University’s Shiffman Medical Library.


Post-Conference Thoughts: Re-capturing Data Conversations in Political Science

by Natsuko Nicholls

More than 200 attendees gathered in San Francisco for the 9th International Digital Curation Conference (IDCC14, Feb 24-27) to discuss data-driven transformations in research, education, business, and society. IDCC14 began with a stimulating pre-conference workshop and an inspiring (as well as entertaining) keynote speech. Both served to foster a positive atmosphere and a sense of belonging in the world of data curation. In his keynote presentation, Atul Butte struck the audience with the exponentially growing amount of clinical data that is already public and the potential research and business value of big data in biomedicine. Since embarking on a career in digital curation, I have also been struck by accelerating increases in the size, diversity, and complexity of data across disciplines. Despite some differences among disciplines (for instance, in the understanding of what comprises ‘data’, which is influenced by one’s particular training and field of study), many scientific disciplines are having similar conversations about data sharing and research transparency.

As I was contemplating disciplinary similarities and dissimilarities, one tweet (#idcc14) caught my attention—“social science seems to be left out of these conversations.” This was tweeted during the panel presentation on preparing the workforce for digital curation, which shed light on emerging non-traditional employment opportunities in data science and engineering, data analysis, etc., some of which increasingly require strong backgrounds in the sciences. Apparently, many current leaders of data conversations and initiatives (e.g., another keynote speaker, a closing keynote speaker, a proponent of data sharing and open science, and a ‘rock star’ in the data world) all seem to be scientists by training. So, I asked myself: Are social scientists being left out of data conversations? If not, what are the on-going efforts and recent discussions concerning open data in my field by training—political science?

Short answer: Three-point summary

My short answer is no, social scientists are not being left out of the data conversation. IDCC14 attendees widely recognized the presence of information professionals, data librarians, and archivists in the social sciences (a.k.a. IASSISTers). Also, and perhaps more importantly, political scientists are increasingly paying attention to data sharing and research transparency. In particular, over the last few years, political scientists have been actively engaged in:

  1. Creating guidelines for data access and research transparency. Political scientists are tackling the primary challenge that both quantitative and qualitative political science research traditions have lacked explicit guidelines as to what kinds of data and research materials should be shared.
  2. Putting research guidelines into practice. Political scientists are tackling the challenge of putting guidelines into practice, including creating ways to incentivize data sharing and enabling research transparency via replication studies.
  3. Building an infrastructure that promotes data sharing. Political scientists are tackling the challenge of providing a technological framework to support data sharing, particularly for non-numeric, qualitative research data.

Long answer

1. Creating guidelines for data access and research transparency

In October 2012, the American Political Science Association (APSA) Council adopted new policies requiring transparency in political science research. These policies were a result of several years of discussions on data access, production transparency, and analysis transparency. The underlying principle is that openness is an indispensable element of credible research and rigorous analysis and is hence essential to both making and demonstrating scientific progress in political science. By August 2013, two documents, Guidelines for Data Access and Research Transparency for Quantitative Research in Political Science and Guidelines for Data Access and Research Transparency for Qualitative Research in Political Science, were drafted by the Data Access and Research Transparency (DA-RT) Ad Hoc Committee after having been circulated for comments (note that these are two separate guidelines). In January 2014, an issue of PS: Political Science & Politics (vol. 47, issue 1) featuring a series of articles on ‘Openness in Political Science’ was published. In this issue, nine scholars investigated the benefits of greater openness and offered ideas about how to make data access and research transparency more viable and desirable for all political scientists. A. Lupia and C. Elman make it clear that the new guidelines are “more consistent with current and emerging standards across the sciences.” C. Elman and D. Kapiszewski point out that political scientists’ shared interest in openness is best understood as a meta-standard that applies to all social inquiry. Indeed, the DA-RT project is an integral part of wider efforts in the social sciences to advance the cause of openness, transparency, legitimacy, and credibility. Yet, it is important to note that DA-RT focuses on political science and is developed by a political science community in which both specific challenges and opportunities have shaped research traditions in the field.
Due to its community approach, DA-RT ideals are not imposed on political scientists—rather, DA-RT is a movement that anyone interested in political science can join.

2. Putting research guidelines into practice (e.g., promoting research transparency via replication studies)

As stated by A. Lupia and C. Elman in this issue of PS, “recent discussions about openness are a rare and welcome example of dissimilar scholars finding opportunities for collaboration and common action.” As formulating guidelines for data access and research transparency is a big step forward, it is interesting to consider the factors that instigated the shared commitment to openness. In particular, there was growing concern that researchers could not replicate a significant number of empirical claims being made in political science’s leading journals. The practice of replication is nothing new to me: more than a decade ago, it was part of my graduate methods training. Today, the same professor still teaches the same course for political science graduate students at Michigan with the same assignment, called a replication and extension (or R&E) paper, and the assignment is still 35% of the grade! However, it was not until later that I fully understood the importance of sharing research data and the reasons behind replication studies.

The Washington Post’s ‘The Monkey Cage’ recently reported the results of a research project in which N. Janz and her team distributed a survey to researchers on the Political Methodology mailing list to learn about replication assignments as part of graduate courses. They found that despite frequent replication practices, replication studies may still be under-utilized resources. One way to advance replication in political science, they argue, might be to create a website where researchers can widely share the replication studies conducted in graduate courses. What about formal publication venues for replicated studies? Unfortunately, unlike journals in the natural sciences that have a tradition of publishing replication studies, many political science or international relations journals still hesitate to publish replications, mostly due to scarce journal space, the lack of a place to store replication files, and underdeveloped journal policies regarding replication. In this issue of PS, J. Ishiyama cites a recent study revealing that out of 120 journals in political science and international relations, only 19 had explicit replication policies. Although I agree that some advances (e.g., developing publication venues for replication studies, adopting journal policies, promoting replication studies through graduate training) may improve reproducibility standards in political science, other issues, such as who is responsible for data provision and the enforcement of such provision, remain a challenge.

3. Building an infrastructure that promotes data sharing (e.g., building a new home for qualitative social science data)

I wonder if some readers question why the APSA developed two separate guidelines for data access and research transparency, one for quantitative and one for qualitative research, if the common goal is to fuel a culture of openness that promotes effective knowledge transfer. It is because political science is a diverse discipline comprising multiple research communities that differ in their methodologies (e.g., ethnographic field work, laboratory-based experiments, statistical analysis of pre-existing datasets). Although the notion of openness has become increasingly relevant to the qualitative research tradition, data sharing and replication have been more prominent concerns for quantitative research for many years. For the quantitative research tradition, data archives and repositories (e.g., ICPSR, Dataverse) and the promotion of data sharing and re-use through data quality review have long been established. It is only recently, however, that qualitative data found a home in a new searchable repository for data that are not numbers. To provide a venue for storing, sharing, and preserving research data generated through qualitative and multi-method research in the social sciences, the Qualitative Data Repository (QDR) was recently launched at Syracuse University. As described by C. Elman, the QDR accepts many types of qualitative data including (but not limited to) unpublished primary sources, published primary sources, primary sources cited in secondary sources, secondary sources, and other research materials. As the QDR has only begun to call for pilot projects, it is too early to make judgments about the scale of data collections, the quality of data, and the discoverability of such data. It is safe to say, however, that this advancement certainly demonstrates the growing number of tools and infrastructures that have been recently developed for qualitative data sharing.

Concluding remarks

In the weeks following IDCC14, I have tracked numerous tweets, news reports, blogs about data sharing, and announcements from the Public Library of Science (PLOS), Nature’s Scientific Data, and Figshare announcing data access policies, publication of data articles, and code and software sharing, respectively. Things are rapidly changing. Because data sharing and preservation tools and services are constantly evolving, I wanted to quietly sit down to re-capture the data conversations occurring in the social sciences, particularly political science. I hope this re-cap will help engage political scientists and social science data stewards in the same conversations and in partnerships in emerging areas like data publishing, just as V. Mitchell and J. Baker demonstrated at IDCC14 with their collaborative data publishing pilot at the University of Oregon.


Deep Dive into Data

by Scott Martin

Following our successful two-part workshop in Research Data Concepts for librarians at the University of Michigan Library, the Data Education Working Group wanted to follow up with a series of workshops exploring subject-specific data landscapes. This presented an interesting challenge: since individual liaison librarians are responsible for discrete subjects, how do you present a subject-specific workshop in such a way as to make it useful to librarians who serve other subjects?

Our Deep Dive into Data subgroup members (including myself) concluded that the best approach was to use these workshops to explore a ‘method’ for investigating disciplinary data landscapes, using individual subjects as exemplars. Using this approach, we would be able to lead workshop participants through various data resources available for a particular subject, illustrating the ways in which exploration of one part of the landscape can lead organically to other parts. Data policies for funding agencies or publications, for example, may suggest specific repositories as possible locations for data deposit, and those repositories may in turn suggest or require specific metadata formats.

Diving spots

In order to appeal to a broad variety of subjects, we initially decided to offer separate workshops using a STEM discipline (Ecology), a Social Science discipline (Psychology), and an Arts & Humanities discipline as exemplars. During the course of our planning, it became apparent that it would be difficult to find enough resources supporting any single discipline in the humanities to serve as an effective example of the methodology. We opted instead to pursue a project-based approach for Arts & Humanities data, showcasing a variety of data projects native to these disciplines, while proceeding as planned with our STEM and Social Science examples. As Biological Sciences Librarian, I crafted the outline of the Ecology session, which included the skeleton of the methodology we wished to present. Other members of the Deep Dive subgroup provided feedback, and I partnered with group members Katherine Akers, Natsuko Nicholls, Angie Oehrli, and Susan Turkel to refine the approach and develop supplementary materials, including a brief Repository Description Tool for taking a quick snapshot of the essential features of a repository, as well as a more formal Deep Dive Workflow document to articulate our methodology independent of the workshops.

Deep dive into Ecology data, and what’s next?

I presented the Ecology Deep Dive to a group of 15 colleagues on February 28, leading them through 90 minutes of interactive exploration of the data policies of leading Ecology journals (e.g., Ecology, Journal of Ecology), a few key ecological data repositories (e.g., Dryad, Knowledge Network for Biocomplexity (KNB), LTER Data Portal), and other facets of the Ecology data landscape. Initial feedback was very positive, and we look forward to our next Deep Dive, in which Katherine Akers and Susan Turkel will lead an excursion into the research data landscape of Psychology.

Curious about Ecology data?

Check out some of these articles that surfaced in the course of my own investigations:

Bendix, Nieschulze, and Michener (2012). Data platforms in integrative biodiversity research. Ecological Informatics 11: 1-4.

Enke et al. (2012). The user’s view on biodiversity data sharing – Investigating facts of acceptance and requirements to realize a sustainable use of research data. Ecological Informatics 11: 25-33.

Fegraus, Andelman, Jones, and Schildhauer (2005). Maximizing the value of ecological data with structured metadata: An introduction to Ecological Metadata Language (EML) and principles for metadata creation. Bulletin of the Ecological Society of America 86(3): 158-168.

Hampton et al. (2013). Big data and the future of ecology. Frontiers in Ecology and the Environment 11(3): 156-162.

Michener and Jones (2011). Ecoinformatics: Supporting ecology as a data-intensive science. Trends in Ecology & Evolution 27(2): 85-93. (See also other articles in this special issue on ecological and evolutionary informatics.)


Research Lifecycle Model at UM

by Fe Sferdean

As part of the library’s efforts to support campus research and identify areas of engagement, a Research Lifecycle Committee was created to investigate, define, implement, and sustain the service model for supporting the research community. The committee is composed of librarians from different areas such as Research, Learning Programs and Initiatives, Emerging Technologies, IT, Digital Preservation, Michigan Publishing, Technical Services, and the Health Sciences. With diverse expertise and perspectives, conversations during committee meetings often yielded many new insights. Examples of the topics discussed included research and data lifecycles for the social sciences, humanities, and basic and health sciences; strategies for assessing on-going campus needs; dealing with research data; data initiatives and projects; and ways of engaging more comprehensively with the campus by connecting researchers to the library services and other campus units throughout the research lifecycle.

Big task: Building and visualizing the research lifecycle model

One main focus of the committee was creating a generic research lifecycle model as a visual tool to communicate the research process and the library services to the U-M campus. Illustrating the research and data lifecycles helped the committee to analyze and discuss the definitions and overall layout of the different stages of research and how the data lifecycle is a part of the research process. Since it was challenging to agree on the best visualization, the committee decided to reconstruct the lifecycle model based on existing library services and consequently see the research process from the library’s perspective.

How did we tackle this task?

Three committee members formed a sub-group to tackle this challenge: Jeremy York (HathiTrust librarian from IT), Ye Li (Chemistry Librarian), and Fe Sferdean (CLIR postdoctoral fellow). After a list of services was compiled by the committee, the sub-group classified each one using two kinds of categories: service areas and research lifecycle stages or activities. Classifying services allowed us to identify research stages that the library specifically supports such as ideas and background research, find collaborators, seek funds, gather data, process and analyze data, manage data, create, publish, share, preserve, discovery, re-use, and entrepreneurship. Likewise, classifying services allowed us to identify the library’s specific service areas such as education and instruction, consultation, and infrastructure, the last of which was divided into sub-categories: collections, computing and reproduction, access, spaces, and accessibility.

Visualization helps raise awareness

From this work, the sub-group created different visualizations. One was a poster geared toward data-intensive researchers at the U-M Cyberinfrastructure (CI) Days Conference that listed the library’s data-related services offered at different stages of the research process. While the poster aimed to raise awareness of library services, another visual, a heatmap showing the number of services offered at different stages of research, was created with an internal library audience in mind in order to identify gaps and potential areas for increasing support.
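
The counting behind a heatmap like this can be sketched in a few lines of Python. This is only an illustration, not the sub-group's actual method or data: the stage and service-area labels come from the classification described above, but the service names and their tags are invented, and the function names are hypothetical.

```python
from collections import Counter

# Research lifecycle stages, as listed in the post.
STAGES = [
    "ideas and background research", "find collaborators", "seek funds",
    "gather data", "process and analyze data", "manage data", "create",
    "publish", "share", "preserve", "discovery", "re-use", "entrepreneurship",
]

# Hypothetical services: (name, service area, stages supported).
# These entries are invented for illustration only.
SERVICES = [
    ("Data management plan consultation", "consultation",
     ["seek funds", "manage data"]),
    ("Metadata consultation", "consultation", ["manage data", "share"]),
    ("Data repository", "infrastructure", ["share", "preserve"]),
    ("Data analysis workshop", "education and instruction",
     ["process and analyze data"]),
]


def count_services(services):
    """Count services for each (service area, stage) pair: the heatmap cells."""
    counts = Counter()
    for _name, area, stages in services:
        for stage in stages:
            counts[(area, stage)] += 1
    return counts


def stage_totals(counts):
    """Sum the cell counts over service areas to get one total per stage."""
    totals = Counter()
    for (_area, stage), n in counts.items():
        totals[stage] += n
    return totals


def find_gaps(counts, all_stages):
    """Return the stages that no service supports, in lifecycle order."""
    totals = stage_totals(counts)
    return [stage for stage in all_stages if totals[stage] == 0]


counts = count_services(SERVICES)
totals = stage_totals(counts)
gaps = find_gaps(counts, STAGES)
```

Here `counts` holds the values that would color each heatmap cell, and `gaps` lists the stages with no service at all, which is the kind of gap an internal audience would look for.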

Overall, the work has yielded insights into how the library can better engage with the research community. It can also serve as a referral tool that helps librarians identify and partner with colleagues who have different expertise across multiple departments in order to support researchers and their projects.


Exploring the Possibility of Partnering with Disciplinary Data Repositories

by Katherine Akers

Academic librarians motivated to preserve the scholarly record of their institutions may promote the deposit of research data into institutional repositories instead of relevant disciplinary repositories, resulting in competition between the two types of repositories. Rather than being a competition, however, the relationship between institutional and disciplinary data repositories could benefit both universities and larger research communities.

That is, institutional, national, and international data repositories could be considered as ascending tiers of a ‘Data Pyramid’, with institutional repositories collecting a large swath of datasets that might otherwise be discarded or lost, and national or international repositories committing to ensuring access to datasets with the highest value, thereby increasing the visibility of the data to relevant communities of interest. This positive outcome depends on active partnerships between academic institutions and disciplinary data repositories. For instance, by virtue of being close to the source of research data, academic librarians or other local data curators could work directly with researchers to process and review data, create metadata and provide contextual information, and ingest data into institutional repositories, after which ‘archive-ready’ data packages could be pushed into disciplinary repositories.

Here at the University of Michigan, we think that the most effective way to support data sharing and preservation is to provide an institutional data repository as well as to facilitate the use of disciplinary data repositories. At the upcoming International Digital Curation Conference (IDCC), Jen Green will be presenting a paper that she and I co-authored, entitled ‘Toward a Symbiotic Relationship between Academic Libraries and Disciplinary Data Repositories: A Dryad and University of Michigan Case Study’, in which we propose three concrete ways in which our library could partner with Dryad, a repository for data associated with articles published in scientific and medical journals.

First, the library could become a Dryad member organization, allowing us to nominate and elect members of the Dryad Board of Directors and vote on amendments to bylaws at the annual Dryad membership meeting. As the majority of current member organizations are journal publishers, the inclusion of academic libraries in the circle of Dryad members would widen the pool of stakeholder perspectives that guide the future of Dryad and shape the research data ecosystem.

Second, by becoming a Dryad member organization, the library could take advantage of a discounted pricing plan to financially assist researchers with submitting data to Dryad. That is, we could purchase vouchers that would cover the cost of future data submissions and establish a system for distributing those vouchers on our campus.

Third, the library could host a local Dryad curator. This curator could be trained in the Dryad data curation workflow, identify datasets that could be submitted to Dryad, and remotely ingest data into Dryad. Because this local curator could interact directly with campus researchers, he or she could also increase the likelihood that data can be used by others in the future by ensuring that data are adequately described in codebooks or ‘readme’ files, converting data files into non-proprietary formats, and verifying the completeness of data. 

Researchers tend to align with their disciplinary communities more than their institutions, and disciplinary repositories may make data more visible to relevant communities of interest. Therefore, libraries should go beyond promoting the deposit of data into institutional repositories and actively seek to partner with disciplinary data repositories. Such partnerships could make a significant contribution to capturing and preserving important research datasets that otherwise might be discarded or lost.
