Monday, November 30, 2015

Sharing the load: Jisc RDM Shared Services events

Sharing the load: Jisc RDM Shared Services events. Chris Awre. Digital Archiving blog. 25 November 2015.
     This post is a summary of the Jisc event he attended that was looking at shared services for research data management.  Most academic institutions are struggling to manage research data and some form of shared service provision will be of benefit.  The presentation "Digital Preservation Requirements for Research Data Management" that he and Jenny Mitcham gave "highlighted the importance of digital preservation as part of a full RDM service, stressing of how a lack of digital preservation planning has led to data loss over time, and how consideration of requirements has been based on long established principles from the OAIS Reference Model". Any RDM shared service should include digital preservation capabilities. There is a need to provide a suite of shared services, including a shared service platform for digital preservation and independent digital preservation tools.

Friday, November 27, 2015

High-speed digitization of historic artifacts

CultLab3D’s 3D scanning conveyor belt allows high-speed digitization of historic artifacts. Benedict. 3D printer and 3D printing news website. Nov 19, 2015.
     Researchers at the Fraunhofer Institute for Computer Graphics Research IGD have developed CultLab3D: a 3D scanning system that can create digital images of 3D objects. The project aim is to provide mass digitization, annotation and storage of historical artifacts for museums and other places of preservation. Quotes and notes from the article:
  • "Digital preservation is one of the most important methods of sustaining our cultural history."
  • digital preservation makes it possible to create and maintain scans of written texts
  • "Digital preservation of texts is one thing, but the preservation of physical artifacts is quite another."
  • while there is no real substitute for an authentic historical artifact, something should be done to preserve historical artifacts
This organization believes that the digital preservation of historical artifacts via 3D scanning is undoubtedly a worthwhile endeavor.

Wednesday, November 25, 2015

Tool Time, or a Discussion on Picking the Right Digital Preservation Tools for Your Program: An NDSR Project Update

Tool Time, or a Discussion on Picking the Right Digital Preservation Tools for Your Program: An NDSR Project Update. John Caldwell; Erin Engle. The Signal. November 17, 2015    
     "There are lots of tools out there, from checksum validators to digital forensics suites and wholesale preservation solutions." Instead of wanting the latest tool, ask whether the tool is right for you in this situation.  The NDSR project is looking at:

  •     studying current workflows;
  •     benchmarking current policies against best practices;
  •     reviewing and testing potential digital curation applications;
  •     proposing sustainable workflows that align with current digital curation standards; and
  •     producing a white paper to sum up current processes and propose next steps.

To determine the right tool, there are some things you need to know:
  1. know your records: how electronic records are being managed, how archivists are processing them, and what happens with the materials after.
  2. know what you want the end result to be.
  3. know what tool to use for the task:
    1. Placement: Where does the tool fit into your process?
    2. Purpose: What does the tool actually do?
    3. Utility: How easy is the tool to use and does its output make sense?
"The seemingly straightforward question of utility is fundamentally tied to the question of purpose, and also the viability question: is the tool a long-term solution or a quick fix for today?" They are finding that they need to add preservation metadata to the records and establish the record integrity as early in the lifecycle as possible.
 An interesting comment on the blog post: "Digital preservation systems are precisely that: systems. Systems are a complex set of elements (people, technologies) and the connections between them (policies, procedures). Without all of these pieces, there really isn’t a system. There is just a tool. A hammer isn’t a house, just as a tool isn’t a digital preservation system."

Tuesday, November 24, 2015

Five Takeaways from AOIR 2015

Five Takeaways from AOIR 2015. Rosalie Lack. Netpreserve blog. 18 November 2015. 
     A blog post on the annual Association of Internet Researchers (AOIR) conference in Phoenix, AZ. The key takeaways in the article:
  1. Digital Methods Are Where It’s At.  Researchers are recognizing that digital research skills are essential. And, if you have some basic coding knowledge, all the better. The Digital Methods Initiative from Europe has tons of great information, including an amazing list of tools.
  2. Twitter API Is also Very Popular
  3. Social Media over Web Archives. Researchers used social media more than web archived materials.  
  4. Fair Use Needs a PR Movement. There is a lot of misunderstanding or limited understanding of fair use, even for those scholars who had previously attended a fair use workshop. Many admitted that they did not conduct particular studies because of a fear of violating copyright. 
  5. Opportunities for Collaboration.  Many researchers were unaware of tools or services they can use and/or that their librarians/archivists have solutions.
There is a need for librarians/archivists to conduct more outreach to researchers and to talk with them about preservation solutions, good data management practices and copyright.

Monday, November 23, 2015

Introduction to Metadata Power Tools for the Curious Beginner

Introduction to Metadata Power Tools for the Curious Beginner. Maureen Callahan, Regine Heberlein, Dallas Pillen. SAA Archives 2015. August 20, 2015.   PowerPoint  Google Doc 
      "At some point in his or her career, EVERY archivist will have to clean up messy data, a task which can be difficult and tedious without the right set of tools." A few notes from the excellent slides and document:

Basic Principles of Working with Power Tools
  • Create a Sandbox Environment: have backups. It is ok to break things
  • Think Algorithmically: Break a big problem down into smaller steps
  • Choosing a Tool: The best tool works for your problem and skill set
  • Document: Successes, failures, procedures
Dare to Make Mistakes
  • as long as you know how to recognize and undo them!
  • view mistakes as an opportunity
  • mistakes can teach you as much about your data as about your tool
  • share your mistakes so others may benefit
  • realize that everybody makes them
General Principles
  • Know the applicable standards
  • Know your data
  • Know what you want
  • Normalize your data before you start a big project
  • The problem is intellectual, not technical
  • Use the tools available to you
  • Don’t do what a machine can do for you
  • Think about one-off operations vs. tools you might re-use or re-purpose
  • Think about learning tools in terms of raising the level of staff skill
  • XPath
  • Regex
  • XQuery
  • XQuery Update
  • XSLT
  • batch
  • Linux command line
  • Python
  • AutoIt
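
As a hedged illustration of the "think algorithmically" advice, the following Python sketch uses two of the listed tools, regex and XPath-style queries, to clean a small XML fragment. The element names, the sample data, and the assumption that slash-separated dates are month/day/year are all hypothetical and are not taken from the slides.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample record; element names are illustrative only.
SAMPLE = """
<collection>
  <item><title>  Letter to   J. Smith </title><date>3/7/1921</date></item>
  <item><title>Diary</title><date>1921-03-09</date></item>
  <item><title>Photograph</title><date>circa 1920</date></item>
</collection>
"""

# Assumption: slash-separated dates are month/day/year.
US_DATE = re.compile(r"^(\d{1,2})/(\d{1,2})/(\d{4})$")


def normalize_date(text):
    """Rewrite M/D/YYYY as YYYY-MM-DD; leave anything else untouched."""
    match = US_DATE.match(text.strip())
    if not match:
        return text.strip()
    month, day, year = match.groups()
    return f"{year}-{int(month):02d}-{int(day):02d}"


def clean(xml_text):
    """Return (cleaned tree root, list of dates that still need review)."""
    root = ET.fromstring(xml_text)
    needs_review = []
    # ElementTree supports a limited XPath subset such as ".//date".
    for date in root.findall(".//date"):
        date.text = normalize_date(date.text or "")
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date.text):
            needs_review.append(date.text)
    for title in root.findall(".//title"):
        # Collapse runs of whitespace and trim the ends.
        title.text = " ".join((title.text or "").split())
    return root, needs_review


root, needs_review = clean(SAMPLE)
dates = [d.text for d in root.findall(".//date")]
titles = [t.text for t in root.findall(".//title")]
```

Values the script cannot normalize are collected in `needs_review` rather than guessed at, which fits the "know your data" principle: the machine handles the routine cases and a person decides the rest.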

The Provenance of Web Archives

The Provenance of Web Archives. Andy Jackson; Jason Webber. UK Web Archive blog. 20 November 2015.
     More researchers are taking an interest in web archives.  The post author says their archive has "tried to our best to capture as much of our own crawl context as we can." In addition to the WARC request and response records, they store other information that can answer how and why a particular resource has been archived:
  • links that the crawler found when it analysed each resource 
  • the full crawl log, which records DNS results and other situations
  • the crawler configuration, including seed lists, scope rules, exclusions etc.
  • the versions of the software we used  
  • rendered versions of original seeds and home pages  and associated metadata.
The archive doesn't "document every aspect of our curatorial decisions, e.g. precisely why we choose to pursue permissions to crawl specific sites that are not in the UK domain. Capturing every mistake, decision or rationale simply isn’t possible, and realistically we’re only going to record information when the process of doing so can be largely or completely automated". In the future, there "will be practical ways of summarizing provenance information in order to describe the systematic biases within web archive collections, but it’s going to take a while to work out how to do this, particularly if we want this to be something we can compare across different web archives."

No archive is perfect. They "can only come to be understood through use, and we must open up to and engage with researchers in order to discover what provenance we need and how our crawls and curation can be improved." There are problems that need to be documented, but researchers "can’t expect the archives to already know what they need to know, or to know exactly how these factors will influence your research questions."

Saturday, November 21, 2015

How Much Of The Internet Does The Wayback Machine Really Archive?

How Much Of The Internet Does The Wayback Machine Really Archive? Kalev Leetaru. Forbes.  November 16, 2015.
     "The Internet Archive turns 20 years old next year, having archived nearly two decades and 23 petabytes of the evolution of the World Wide Web. Yet, surprisingly little is known about what exactly is in the Archive’s vaunted Wayback Machine." The article looks at how the Internet Archive archives sites and suggests "that far greater understanding of the Internet Archive’s Wayback Machine is required before it can be used for robust reliable scholarly research on the evolution of the web." It requires a more "systematic assessment of the collection’s holdings." Archiving the open web uses enormous technical resources.

Maybe the important lesson to learn is that we have little understanding of what is actually in the data we use and few researchers really explore the questions about the data.  The archival landscape of the Wayback Machine was far more complex than originally realized, and it is unclear how the Wayback Machine has been constructed. This insight is critical. "When archiving an infinite web with finite resources, countless decisions must be made as to which narrow slices of the web to preserve." The selection can be either random or prioritized by some element.  Each approach has distinct benefits and risks.
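
The two selection approaches can be contrasted with a minimal Python sketch. It is only an illustration and not how the Wayback Machine actually chooses pages: the candidate URLs, the crawl budget, and the use of inbound-link counts as the priority signal are all made-up assumptions.

```python
import random


def select_random(urls, budget, seed=0):
    """Pick `budget` URLs uniformly at random (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(urls, min(budget, len(urls)))


def select_prioritized(urls, budget, score):
    """Pick the `budget` URLs with the highest score."""
    return sorted(urls, key=score, reverse=True)[:budget]


# Hypothetical candidate pages with made-up inbound-link counts.
inbound_links = {
    "example.org/news": 120,
    "example.org/about": 15,
    "example.org/blog/old-post": 2,
    "example.org/data": 40,
}
candidates = list(inbound_links)

random_pick = select_random(candidates, budget=2)
priority_pick = select_prioritized(candidates, budget=2, score=inbound_links.get)
```

In this toy setup the prioritized pick always contains the two most-linked pages, so a rarely linked page is never captured, while the random pick gives every page the same chance but may miss the most heavily used ones.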

Libraries have formalized over time how they make collection decisions. Web archives must adopt similar processes.  The web is "disappearing before our very eyes", which can be seen in the fact that "up to 14% of all online news monitored by the GDELT Project is no longer accessible after two months".  We must "do a better job of archiving the online world and do it before this material is lost forever."

Friday, November 20, 2015

Hydra: Get a head on your repository

Hydra: Get a head on your repository.  Hydra Project website. November 2015.
  • Hydra is a Repository Solution:  Hydra is an open source software repository solution used by institutions worldwide to provide access to their digital content.  Hydra software provides a versatile and feature rich environment for end-users and repository administrators.
  • Hydra is a Community: Hydra is a large, multi-institutional collaboration that gives institutions the ability to combine their repository development efforts into a collective solution beyond the capacity of any individual institution to create, maintain or enhance on its own. The project motto is “if you want to go fast, go alone.  If you want to go far, go together.”
  • Hydra is a Technical Framework: Hydra is an ecosystem of components that lets institutions build and deploy robust and durable digital repositories supporting multiple “heads”, which are fully-featured digital asset management applications and tailored workflows.  Its principal platforms are the Fedora Commons repository software, Solr, Ruby on Rails and Blacklight.  Hydra does not yet support “out-of-the-box” deployments but the Community is working towards such “solution bundles”, particularly “Hydra in a Box” and Avalon.

Developing Best Practices in Digital Library Assessment: Year One Update

Developing Best Practices in Digital Library Assessment: Year One Update. Joyce Chapman, Jody DeRidder, Santi Thompson. D-Lib Magazine. November 2015.
     While research and cultural institutions have increased focus on online access to special collections in the past decade, methods for assessing digital libraries have yet to be standardized. Because of limited resources and increasing demands for online access, assessment has become increasingly important. Library staff do not know how to begin to assess the costs, impact, use, and usability of digital libraries. The Digital Library Federation Assessment Interest Group is working to develop best practices and guidelines in digital library assessment. The definition of a digital library used is "the collections of digitized or digitally born items that are stored, managed, serviced, and preserved by libraries or cultural heritage institutions, excluding the digital content purchased from publishers."

They are considering two basic questions:
  1.     What strategic information do we need to collect to make intelligent decisions?
  2.     How can we best collect, analyze, and share that information effectively?
There are no "standardized criteria for digital library evaluation. Several efforts that are devoted to developing digital library metrics have not produced, as yet, generalizable and accepted metrics, some of which may be used for evaluation. Thus, evaluators have chosen their own evaluation criteria as they went along. As a result, criteria for digital library evaluation fluctuate widely from effort to effort." Not much has changed in this area in the last 10 years with regard to digitized primary source materials and institutional repositories. "Development of best practices and guidelines requires a concerted engagement of the community to whom the outcome matters most: those who develop and support digital libraries". The article shares "what progress we have made to date, as well as to increase awareness of this issue and solicit participation in an evolving effort to develop viable solutions."

Thursday, November 19, 2015

Old formats, new challenges: preservation in the digital world

Old formats, new challenges: preservation in the digital world. Kevin Bunch. C & G News. November 13, 2015.
     Without proper preservation, digital materials are going to degrade and become useless. Digital preservation is "basically coming up with policies and procedures to address mostly the obsolescence that happens with digital content. We know file formats die, we know operating systems and platforms die at some point, so how do we sustain this digital content through time?” In addition to hardware and media failing, there are also difficulties in reading old formats. Archivists generally try to convert files from an old format to an “open” format that will hopefully be in use for some time into the future. Some people work at converting analog media, like audio and video recordings, to open digital formats. It can be challenging as older equipment is outdated and fails. Analog magnetic media formats like VHS and audio cassettes are also "at an ever-increasing risk of deterioration, especially those from the 1980s or 1990s, and should be digitized as soon as possible".

Wednesday, November 18, 2015

iPRES workshop report: Using Open-Source Tools to Fulfill Digital Preservation Requirements

iPRES workshop report: Using Open-Source Tools to Fulfill Digital Preservation Requirements.  Jenny Mitcham. Digital Archiving at the University of York. 12 November 2015.
     The ‘Using Open-Source Tools to Fulfill Digital Preservation Requirements’ workshop provided a place to talk about open-source software and share experiences about implementing open-source solutions. Archivematica, ArchivesSpace, Islandora and BitCurator (and BitCurator Access) were also discussed.

Sam Meister of the Educopia Institute talked about a project proposal called OSSArcFlow. "This project will attempt to help institutions combine open source tools in order to meet their institutional needs. It will look at issues such as how systems can be combined and how integration and hand-offs (such as transfer of metadata) can be successfully established". The lessons learned (including workflow models, guidance and training) will be available to others besides the 11 partners. 

Digital Preservation Videos for the Classroom

Back to School: Digital Preservation Videos for the Classroom. Erin Engle. The Signal, Library of Congress. August 30, 2013.
     There have been some educational programs created geared toward students and about the K-12 Web Archiving Program.  There is a Digital Preservation Video Series and here is a list of videos that educators may find most relevant. Some of those videos include:

Tuesday, November 17, 2015

Born Digital: Guidance for Donors, Dealers, and Archival Repositories

Born Digital: Guidance for Donors, Dealers, and Archival Repositories. Gabriela Redwine, et al. Council on Library and Information Resources. October 2013. [PDF]
     "Until recently, digital media and files have been included in archival acquisitions largely as an afterthought." People may not have understood how to deal with digital materials, or staff may not be prepared to manage digital acquisitions. The objective is to offer guidance to rare book and manuscript dealers, donors, repository staff, and other custodians to help ensure that digital materials are handled and documented appropriately and arrive at repositories in good condition. Each section provides recommendations for donors, dealers, and repository staff.

The sections of the report cover:
  • Initial Collection Review
  • Privacy and Intellectual Property
  • Key Stages in Acquiring Digital Materials
  • Post-Acquisition Review by the Repository
  • Appendices, which include: 
    • Potential Staffing Activities for the Repository
    • Preparing for the Unexpected: Recommendations
    • Checklist of Recommendations for Donors and Dealers, and Repositories
Some thoughts and quotes from the report:
  • it is vital to convince all parties to be mindful of how they handle, document, ship, and receive digital media and files.
  • Early communication also helps repository staff take preliminary steps to ensure the archival and file integrity, as well as the usability of digital materials over time.
  • A repository’s assessment criteria may include technical characteristics, nature of the relationship between born-digital and paper materials within a collection, information about context and content, possible transfer options, and particular preservation challenges.
  • Understand if there is a possibility that the digital records include the intellectual property of people besides the creator or donor of the materials.
  • Clarify in writing what digital materials will be transferred by a donor to a repository (e.g., hard drives, disks, e-mail archives, websites)
  • It is strongly recommended that donors and dealers seek the guidance of archival repositories before any transfer takes place.
  • To avoid changing the content, formatting, and metadata associated with the files, repositories must establish clear protocols for the staff’s handling of these materials.
The good practices in this report can help reduce archival problems with digital materials. "Early archival intervention in records and information management will help shape the impact on archives of user and donor idiosyncrasies around file management and data backup."

Monday, November 16, 2015

Fixity Architecting for Integrity

Fixity Architecting for Integrity. Scott Rife, Library of Congress, presentation. Designing Storage Architectures for Digital Collections 2015. September 2015. [PDF]
     The Problem: “This is an Archive. We can’t afford to lose anything!” They are custodians to the history of the United States and do not want to consider that the loss of data is likely to happen. The current solutions:
  • At least 2 copies of everything digital
  • Test and monitor for failures or errors
  • Refresh the damaged copy from the good copy
  • This process must be as automated as possible
  • Recognize that someday data loss will occur
Fixity is the process of verifying that a digital object has not been altered or corrupted. It is a function of the whole architecture of Archive/Long Term Storage (hardware, software, network, processes, people, budget).
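
The "two copies, test, refresh" cycle listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Library of Congress implementation: the choice of SHA-256, the file names, and the `check_and_repair` helper are all hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_and_repair(copy_a, copy_b, expected):
    """Compare two copies against a recorded checksum.

    Returns "ok", "repaired", or "lost" (both copies fail the check).
    """
    a_ok = sha256_of(copy_a) == expected
    b_ok = sha256_of(copy_b) == expected
    if a_ok and b_ok:
        return "ok"
    if a_ok:
        shutil.copyfile(copy_a, copy_b)  # refresh the damaged copy
        return "repaired"
    if b_ok:
        shutil.copyfile(copy_b, copy_a)
        return "repaired"
    return "lost"


# Demonstration with throwaway files.
workdir = Path(tempfile.mkdtemp())
copy_a = workdir / "copy_a.bin"
copy_b = workdir / "copy_b.bin"
copy_a.write_bytes(b"archival payload")
copy_b.write_bytes(b"archival payload")
recorded = sha256_of(copy_a)  # checksum recorded at ingest

first_check = check_and_repair(copy_a, copy_b, recorded)
copy_b.write_bytes(b"archival pay1oad")  # simulate silent corruption
second_check = check_and_repair(copy_a, copy_b, recorded)
```

The "lost" branch reflects the last point in the list: with only two copies, a check can still find that neither matches the recorded checksum.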
What costs are reasonable to reduce the loss of data?
Need to understand the possible solutions.  How much more secure will our customers' content be if:
  • There is a third, fourth or fifth copy?
  • All content is verified once a year versus every 5 years?
  • More money is spent on higher quality storage?
  • More staff are hired?
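
A rough way to reason about the first two questions is a back-of-the-envelope model, sketched here in Python. It assumes that copies fail independently and that nothing is repaired between checks. Both are strong simplifications that real storage systems often violate, and the failure rate used is made up for illustration.

```python
def failure_between_checks(annual_failure, years_between_checks):
    """Chance that one copy is damaged at some point between two checks."""
    return 1 - (1 - annual_failure) ** years_between_checks


def loss_probability(per_copy_failure, copies):
    """Chance that every copy fails in the same period, assuming independence."""
    return per_copy_failure ** copies


# Hypothetical 1% annual chance of damage to any single copy.
ANNUAL_FAILURE = 0.01

by_copies = {
    n: loss_probability(failure_between_checks(ANNUAL_FAILURE, 1), n)
    for n in (2, 3, 4, 5)
}
by_interval = {
    years: loss_probability(failure_between_checks(ANNUAL_FAILURE, years), 2)
    for years in (1, 5)
}
```

Under these assumptions each extra copy multiplies the chance of total loss by the per-copy failure probability, and a longer gap between checks raises that probability. The model says nothing about cost, which is the other half of the question.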
RAID, erasure encoding, is at risk due to larger disk sizes. With storage, there is a wide variation in price, performance and reliability. Performance and reliability are not always correlated with price. Choose hardware combinations to limit likely failures based on your duty cycle.

Background reading list for Designing Storage Architectures for Digital Collections

Background reading list. Designing Storage Architectures for Digital Collections. Library of Congress. September 9, 2015.
     A list of items that may be representative of materials and projects related to the meeting topics. They might be useful to provide context for the meeting topics:

Friday, November 13, 2015

Alternatives for Long-Term Storage Of Digital Information

Alternatives for Long-Term Storage Of  Digital Information. Chris Erickson, Barry Lunt. iPres 2015. November 2015.   Poster  Abstract
     This is the poster and abstract that Dr. Lunt and I created and presented at iPres 2015. The most fundamental component of digital preservation is storing the digital objects in archival repositories. Preservation Repositories must archive digital objects and associated metadata on an affordable and reliable type of digital storage. There are many storage options available; each institution should evaluate the available storage options in order to determine which options are best for their particular needs. This poster examines three criteria in order to help preservationists determine the best storage option for their institution:
  1. Cost
  2. Longevity
  3. Migration Time frame
Each institution may have different storage policies and environments. Not every situation will be the same. By considering the criteria above (the storage costs, the average lifespan of the media and the migration time frame), institutions can make a more informed choice about their archival digital storage environment. The poster has more recent cost information than what is in the abstract.
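
One hedged way to combine the three criteria is to estimate media purchase costs over a planning horizon, as in the Python sketch below. The option names, prices, lifespans, and migration intervals are placeholders rather than figures from the poster, and the model ignores labor, power, and price changes over time.

```python
import math


def media_cost(capacity_tb, cost_per_tb, lifespan_years,
               migration_years, horizon_years):
    """Total media purchase cost over the horizon.

    Media is bought at the start and again at each migration. Data is
    migrated at the planned interval or at the end of the media's
    lifespan, whichever comes first.
    """
    interval = min(lifespan_years, migration_years)
    purchases = math.ceil(horizon_years / interval)
    return purchases * capacity_tb * cost_per_tb


# Hypothetical options: (cost per TB, media lifespan, planned migration).
options = {
    "option_a": (30.0, 5, 7),
    "option_b": (80.0, 50, 20),
}
totals = {
    name: media_cost(10, cost, lifespan, migration, horizon_years=20)
    for name, (cost, lifespan, migration) in options.items()
}
cheapest = min(totals, key=totals.get)
```

With these placeholder numbers the cheaper medium has to be bought four times in twenty years while the more expensive one is bought once, so the option with the higher purchase price ends up costing less overall.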

Thursday, November 12, 2015

Digital Curation Decision Form

Digital Curation Decision Form. Chris Erickson. Harold B. Lee Library. November 13, 2015.
Latest version is found here: Policies and Procedures
     This is the [former] version of our Digital Curation Decision Form (old version). The form is used by subject specialists (curators, subject librarians, or faculty responsible for collections) to determine
  • what materials should be included in our Rosetta Digital Archive; 
  • whether additional copies are needed, including copies on M-Discs; and 
  • whether or not the digital collection is a preservation priority. 
Additional questions ask about access to the preservation copies; the preservation actions needed; and directions on content options if format migration is needed. The form was created to help subject specialists determine what should be preserved, even if they are unaware of digital preservation topics. In practice, we complete the form during an interview with new subject specialists. Documentation will be added when the final version is approved.

Monday, November 09, 2015

Web Archiving Questions for the Smithsonian Institution Archives

Five Questions for the Smithsonian Institution Archives’ Lynda Schmitz Fuhrig. Erin Engle. The Signal. October 6, 2015.   
     Article about the Smithsonian's Archives and what they are doing. It looks at how the Smithsonian Institution archives its own sites and the process involved. Many of the sites contain significant content of historical and research value that is now not found elsewhere. These are considered records of the Institution that evolve over time and they consider that it would be irresponsible as an archives to rely only upon other organizations to archive the websites. They use Archive-It to capture most of these sites and they retain copies of the files in their collections. Other tools are used to capture specific tweets or hashtags or sites that are a little more challenging due to the site construction and the dynamic nature of social media content.

Public-facing websites are usually captured every 12 to 18 months, though it may happen more frequently if a redesign is happening, in which case the archiving will happen before and after the update. An archivist appraises the content on the social media sites to determine if it has been replicated and captured elsewhere.

The network servers at the Smithsonian are backed up, but that is not the same as archiving. Web crawls provide a snapshot in time of the look and feel of a website. "Backups serve the purpose of having duplicate files to rely upon due to disaster or failure" and are only saved for a certain time period. The website archives are kept permanently. Typically, website captures may not have everything because of excluded content, blocked content, or dynamic content such as Flash elements or calendars that are generated by databases. Capturing the web is not perfect.

Monday, November 02, 2015

Emulation as a Tool. What Can Emulation Do for You?

Emulation as a Tool. What Can Emulation Do for You? Dr. Klaus Rechert. CurateGear 2015. January 7, 2015.
     Emulation can be used as a tool for:
  • Contextualization. To identify, describe and preserve object environments
  • Generalization. To allow the environment to be run everywhere
  • Preservation Planning. Prepare environments to run long term
  • Publication & Access. Provide citation of objects in context; allow reuse
Emulation as a Service (EaaS)
  • Encapsulation of different emulators and technology into a common component
  • Centralize technical services
  • Hide technical complexity of emulation through web interfaces
  • Browser-based access
Preservation of and access to inherited personal digital assets
  • Provides citation support
  • Available with simple browser-based access 
  • Make emulated content embeddable and shareable like Youtube videos 

The Shanghai Library Selects Ex Libris Rosetta

The Shanghai Library Selects Ex Libris Rosetta. Press release. Ex Libris. November 2, 2015.
     The Shanghai Library, the second largest library in China and one of the world’s largest public libraries, chose Rosetta to manage and preserve its vast collection of digitized records such as ancient books, sound recordings, manuscripts, genealogy resources, archives (such as the Sheng Xuanhuai Archives), books and journals published in the Republic period, and the North China Daily News. Rosetta’s support for multiple languages and its customized Chinese interface will enable library staff to deposit diverse content into the system and expose a wide range of rich Chinese heritage to the world. "Rosetta was the only solution on the market that supports the whole spectrum of digital asset management and preservation, from ingest and export, to collection management and publishing.”

Research data management: A case study

Research data management:  A case study. Gary Brewerton. Ariadne, 74. October 12, 2015.
Loughborough University faced a number of challenges in meeting the expectations of its research funders, especially in three areas:
  • publishing the metadata describing the research data that it holds
  • where appropriate providing access to the research data
  • preserving the research data for at least ten years since last accessed
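
The third expectation is a rolling retention rule: the clock restarts each time the data is accessed. A minimal Python sketch of that rule follows. The function names and the handling of 29 February are assumptions for illustration, not part of the Loughborough platform.

```python
from datetime import date

RETENTION_YEARS = 10


def retain_until(last_accessed, years=RETENTION_YEARS):
    """Earliest date at which the retention period has elapsed."""
    try:
        return last_accessed.replace(year=last_accessed.year + years)
    except ValueError:
        # 29 February in a non-leap target year: fall back to 1 March.
        return date(last_accessed.year + years, 3, 1)


def must_retain(last_accessed, today):
    """True while the dataset is still inside its retention period."""
    return today < retain_until(last_accessed)
```

For example, `must_retain(date(2015, 10, 12), date(2020, 1, 1))` is true, and recording a new access date simply pushes the result of `retain_until` further out. This is one reason the storage needed is hard to predict.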
They did a survey of their research groups to determine existing data management practices and storage requirements. The data could take a variety of formats and vary dramatically in size. Also, not all the data collected by the researchers would need to be preserved. This made it hard to predict the amount of storage needed. Instead of using the existing institutional repository, they looked at possible archiving and discovery solutions and decided on two:
  • Arkivum: a digital archiving service guaranteeing long-term preservation of data deposited
  • figshare: a cloud-based research sharing repository
Each of these answered a different need: "Arkivum could provide the storage and preservation required, whilst figshare addressed the light-touch deposit process and discoverability of the research data." Both suppliers were asked to work together to develop a platform to meet all the University’s needs, and a two-tier implementation followed. Faculty reaction to the platform's interface and deposit workflow has been very positive.  It "remains to be seen how researchers will engage with the platform in the mid- to long- term, but it is clear that advocacy will need to remain an ongoing process if the platform is going to achieve continued success."