This is the repository for the McMaster University Libraries presentation for OLA Superconference 2012, "A Digital Scholarship Centre? What is that!", presented on February 2nd, 2012 by Dale Askey, John Fink, and Nick Ruest.
z, ? | toggle help (this) |
space, → | next slide |
shift-space, ← | previous slide |
d | toggle debug mode |
## <ret> | go to slide # |
c, t | table of contents (vi) |
f | toggle footer |
r | reload slides |
n | toggle notes |
p | run preshow |
Relax and breathe. Intro yourself.
Could not do this with standard admin. Note that some work scenarios demand that outside parties have access (even root!) to inside machines: using VMs ensures that we can provide that if needed. Also, ~10-15 minutes to spin up a completely new machine ready to go; can't beat that.
KVM is a particular kind of Linux virtualization technology that requires hypervisor support in hardware. .deb is because I am a Debian fascist. Explain clusterssh. Designed for my comfort.
I am one guy and am not undergoing cellular mitosis. So at this point we are sort of OK, but scaling beyond this is going to get hard. What's stopping me? Laziness. Puppet/Chef are new. Laziness will always be a driver, so at the point where it's lazier to use Puppet or Chef than to do things with clusterssh, I will switch.
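The clusterssh pattern described above — fan the same command out to every VM — can be sketched in a few lines of Python. The hostnames here are made up for illustration; a real setup would use clusterssh itself or a config management tool like Puppet or Chef:

```python
import shlex
import subprocess

# Hypothetical VM hostnames, purely for illustration.
HOSTS = ["vm-drupal", "vm-fedora", "vm-omeka"]

def ssh_command(host, remote_cmd):
    """Build the argv for running remote_cmd on host over ssh."""
    return ["ssh", host, remote_cmd]

def run_on_all(hosts, remote_cmd, dry_run=False):
    """Run the same command on every host, clusterssh-style.

    With dry_run=True, return the commands as printable strings
    instead of actually connecting anywhere.
    """
    cmds = [ssh_command(h, remote_cmd) for h in hosts]
    if dry_run:
        return [" ".join(shlex.quote(part) for part in cmd) for cmd in cmds]
    # Sequential here for simplicity; clusterssh runs these in parallel.
    return [subprocess.run(cmd).returncode for cmd in cmds]
```

The dry-run path exists so you can eyeball what would be executed before letting it loose on a dozen machines.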
At McMaster, sadly enough, historically most of the attention has been on STEM. However, this is also AWESOME, really, because an underserved population = a market that will think you are wonderful for helping them out. Grad students especially.
This is hard. It's part of that plumbing thing that needs doing but is distinctly unsexy and a bit of a slog. But without it, people can (will?) just roll right over you. Having a structure document is important. Deciding on that structure doco has to (like a lot of stuff) be a collaborative endeavour. Documentation is (almost) always the thing that gets done as an afterthought.
Hey, man, this is a service profession, innit? Still here, even with this. So we're planning on having in-person consults, some species of regular office hours, as well as being available online.
We're still really figuring this out. It's both a good and a bad thing. Early on, it's really great to cast your net as wide as possible, get some projects under your belt so you have something to point to, but later on if (when!) you get super popular, this becomes less scalable and that's where the SLA comes in.
Source code as the new peer review. This is an easy way we can demonstrate our utility as well as give back and make a name for ourselves generally. Viz. CHNM, with Zotero, etc. VERY important that we have license flexibility (DFSG compliant) as we'd like to default to an open source license (as part of the SLA? maybe). GitHub is the pre-eminent source code hosting service: http://github.com/scds/ We did our slide deck there, f'rinstance. I sometimes elevator-speech it as "Google Docs on steroids".
We will try to be flexible on many fronts -- languages, operating systems, service hours and service levels, but I would like to stick with one standard for documenting and sharing code. We've been using git for a while. Having documented changes and distributed backups of stuff is a Good Idea. BUT doing version control -- not just git, any VC -- requires a major rethinking of workflow and may be a harder sell to more conservative types.
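As a minimal sketch of the "documented changes" point, here is git driven from Python. The file name and commit message are illustrative, and this assumes the git CLI is on the PATH:

```python
import subprocess
from pathlib import Path

def git(repo, *args):
    """Run a git command inside repo and return its stdout."""
    result = subprocess.run(["git", "-C", str(repo), *args],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

def commit_documented_change(repo, filename, content, message):
    """Write a file and record the change with a descriptive commit message."""
    (Path(repo) / filename).write_text(content)
    git(repo, "add", filename)
    git(repo, "commit", "-m", message)
    # The log is the documentation trail: who changed what, when, and why.
    return git(repo, "log", "--oneline")
```

This is exactly the rethink-your-workflow part: every change becomes an add/commit pair with a message, instead of an anonymous save.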
A lot of times clients think they know what they need, but maybe they're not 100% sure. That's part of what PM is: working out specs. Mention tribulations with the Ali/Sevigny project. Most (all?) of our projects also live on an internal bug tracker (Redmine) that also functions as project management software and might possibly function as a CRM system too.
I am an odd duck.
I don't really fit in anywhere, but I fit in everywhere?
A position like mine does not fit perfectly into any single traditional library department. I like to joke that I have a lot of feet since I have a foot in nearly every department. I have a natural connection to IT given the tech behind most of what I do. But, at its heart, digital collections/preservation is just another form of collection development. I've bounced around a few departments over my time at Mac, but I think I have finally found a home in the centre. It is the best possible scenario at this point.
One of my standard phrases at work is INFRASTRUCTURE! Given that when I started at Mac there was nearly zero infrastructure for digitization or repositories, we had to basically start from scratch. Over the last few years we have built some decent infrastructure for digitization. We have a hosted institutional repository from Bepress (DigitalCommons) and we have begun the work of building a solid local digital repository with Fedora Commons and Islandora. We're also slowly working on policy and best practices for digital preservation.
Digitization is one thing we are really good at. We have digitized an insane amount of material over the last few years. But the problem is a lot of it isn't publicly available. This should begin to change very soon with the new digital repository. These are some local projects we have completed or are still working on. Talk about each project briefly.
Besides our local projects, we have partnered with a couple of vendors to digitize entire collections. Normally, hearing the words "vendor partnership" is something I would immediately scoff at, but these partnerships are not too bad when you think about them. Talk about each partnership briefly.
As I previously touched on, in terms of digitization we have grown very quickly. This has caused a number of issues in regards to infrastructure. Yes, we may have scanned a couple thousand books, but are they publicly available? NO. What is needed is a healthy balance between digitization and infrastructure. What I mean by infrastructure is having a repository to put all this stuff in and make it publicly available. Thousands upon thousands of digital objects sitting on a storage array are of no use to anybody, and present any number of issues in regards to digital preservation. What is truly needed is proper staffing, hardware infrastructure, and a repository platform, combined with some robust workflows and policy. None of which is easy in a spartan environment.
architecting a digital repository
I don't have time to go into a full-blown digital preservation talk here, so I will keep it really brief. At its heart, digital preservation is a series of organizational changes. Your organization must be committed to these changes in practice, else all will be for naught. So, basically, there will be a lot of documentation. Document everything you do. Document every change to an object, and every transaction with that object. Policy-wise, all of your digital preservation practices must be tied to some sort of organizational policy. Something to fall back on. And, as for the technical stuff, you need a solid infrastructure: a repository, and microservices to take care of the tedious nitty-gritty stuff like automatically converting TIFFs to JPEG 2000s upon ingest. There is a wonderful community around all of this, and it is without a doubt your best friend.
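The TIFF-to-JPEG 2000 microservice mentioned above can be sketched roughly like this. The directory layout and the ImageMagick `convert` call are assumptions for illustration, not our actual ingest code:

```python
import subprocess
from pathlib import Path

def jp2_path(tif):
    """Derive the JPEG 2000 target path for a master TIFF."""
    return Path(tif).with_suffix(".jp2")

def plan_conversions(ingest_dir):
    """List (source, target) pairs for every TIFF still awaiting conversion."""
    pairs = []
    for tif in sorted(Path(ingest_dir).glob("*.tif")):
        target = jp2_path(tif)
        if not target.exists():  # skip objects already converted
            pairs.append((tif, target))
    return pairs

def convert(tif, jp2):
    """Shell out for the actual conversion (assumes ImageMagick is installed)."""
    subprocess.run(["convert", str(tif), str(jp2)], check=True)
```

A real microservice would also write a log entry for each conversion — that's the "document every transaction with the object" part.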
Give the case scenario of PW20C vs. the Virtual Museum of the Holocaust and Resistance.
Ingest policy - what is required to ingest a collection into the repository: the minimum requirements for objects and associated metadata.
Digital preservation - consulting on best practices for digital preservation for a project, or us taking ownership of the objects.
Harvesting - knowledge sharing about what harvesting is, and what can be done with it.
Metadata - best practices, and guidance for setting up metadata requirements for a project.
Best practices - a catch-all for everything that doesn't neatly fall into one of the other areas.
Project management - guidance and knowledge sharing on project management. We've all had our hands in a number of projects, and led our fair share. Here we share our best practices and can provide a sort of mentoring role, or an actual project lead.
Version control - John touched on this earlier, but we are huge version control advocates, specifically git. We can provide a lot of support and guidance in transitioning a project to version control, or from another system to git, or just the odd git question.
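On the harvesting point: in repository-land, harvesting usually means OAI-PMH, and a request is just a URL with a verb and a few parameters. A tiny, illustrative request builder (the endpoint and set name below are made up):

```python
from urllib.parse import urlencode

def oai_request(base_url, verb, **params):
    """Build an OAI-PMH request URL (e.g. Identify, ListSets, ListRecords)."""
    query = {"verb": verb, **params}
    return base_url + "?" + urlencode(query)
```

Fetching such a URL returns XML records (e.g. oai_dc metadata) that a harvester pages through with resumption tokens.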
Check out the infographic here: http://ruebot.net/macrepo-visual/index.html