Photographing Documents

Welcome back, everyone, to HIST 4006! Another semester beginning means that the blog posts are due to start once again. I’m not sure how many of you recall, but in my blog post reflecting on my seminar lead I mentioned wanting to start preparing for my next seminar lead over the Christmas break. That unfortunately did not happen, and as a result I largely repeated the same process over again, stressing myself out the entire way! So, here is my pre-seminar blog post.

This week we will be discussing photographing and digitizing documents. Our readings covered a variety of topics, from image file types to the digital technologies that can be used to capture these photos.

Our readings went over the definition and intended use of the file types TIFF, GIF, JPG, RAW, BMP, and PSD/PSP. They highlighted the pros and cons of each and detailed the kinds of compression each format is capable of. The readings also discussed things to keep in mind when undertaking a digitization project, including costs, legal issues, research, and preservation techniques, among other considerations.

Two types of compression were discussed in the readings: lossless and lossy. Lossless compression discards no information, whereas lossy compression accepts some loss of information in exchange for smaller file sizes.
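To make the difference concrete, here is a small sketch of my own (not from the readings) using Python’s Pillow library and a hypothetical master scan called folio.tif: saving it as a PNG keeps every pixel intact, while saving it as a JPEG trades some information away for a smaller file.

```python
# A minimal sketch of lossless vs. lossy saving, assuming the Pillow
# library (pip install Pillow) and a hypothetical master scan "folio.tif".
import os
from PIL import Image

scan = Image.open("folio.tif")

# Lossless: PNG keeps every pixel exactly as captured (TIFF with LZW would too).
scan.save("folio_lossless.png")

# Lossy: JPEG discards detail; a lower quality setting means a smaller file and more loss.
scan.convert("RGB").save("folio_lossy.jpg", quality=75)

for name in ("folio_lossless.png", "folio_lossy.jpg"):
    print(name, os.path.getsize(name), "bytes")
```

The JPEG usually comes out dramatically smaller, but it can never be turned back into the original, which is one reason lossless formats tend to be kept as archival masters while lossy ones serve as access copies.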

We were also required to watch an assortment of videos this week, which explained various functions of the high-tech scanners available at some institutions. In addition to the scanner demonstrations, some of the videos were essentially promotional pieces that institutions were using to spread the word about their digitization efforts to a wider audience. Some of these were done well and I found them very interesting, while others were quite dry.

Some questions to consider for class tomorrow:

  • RAW images are often not compatible when moving between devices and software. Do you believe they are a valuable format to consider when photographing documents? Why or why not?
  • Some of the videos we were required to watch for class today, despite attempting to generate interest in digitization initiatives, were quite dry. What are some things about these videos that we should aim to avoid when publicizing our own exhibit? What are some things we should consider adopting?
  • One of the articles we were required to read today stated that a major aim of the project was to bring these documents to audiences outside of academia, to be read ‘in coffee shops or on the bus’. Do you think this is something people will actually do? If not, what are some ways these institutions can encourage people to take an interest in these documents?

 

See you all in class,

EJG.

Preparing for Week 4

This week we are looking at GitHub and the implications of creating an Open Notebook on the World Wide Web! Here are some questions/ideas to consider when looking at this week’s readings.

1) What barriers are there to an open-source notebook? Consider a colleague or academic who does not want their work to be shared: what implications does this have? Consider Ian Milligan’s article and the idea that “it’s our data, we collected it, and if somebody else wants the data, they should collect it themselves.” How do the articles from WIRED relate to the creation of barriers?

2) In a world such as ours, with the explosion of social media and online presence, how should we think about moving forward with online collaboration? Much like the idea that we cannot live without Facebook, Twitter, or Instagram, will we eventually be unable to function in an academic world without online collaboration?

3) Is this view by Caleb McDaniel too optimistic? “The truth is that we often don’t realize the value of what we have until someone else sees it. By inviting others to see our work in progress, we also open new avenues of interpretation, uncover new linkages between things we would otherwise have persisted in seeing as unconnected, and create new opportunities for collaboration with fellow travelers. These things might still happen through the sharing of our notebooks after publication, but imagine how our publications might be enriched and improved if we lifted our gems to the sunlight before we decided which ones to set and which ones to discard?” Do you think that others (in the academic sphere) have similar views? If not, why do you think this is?

4) These articles ask us to imagine GitHub used widely across the world of education. Considering the challenges that we have faced in class (and on our own time), do you think GitHub will become widespread? What are some tools that could assist the push toward digitizing history?


Final Food for Thought!

Digitizing history can add a great deal of value to our work as historians, but consider the previous power outage. What will happen to our work if something happens to the internet? Furthermore, what will happen to our pre-existing institutions (i.e., the library and archives) if we move towards total internet collaboration and hosting?

Happy Reading!

One week left…

I’ve spent much of the summer preparing to teach this new course, and all too quickly the beginning of school is threatening. I still need to finalize the reading list, I haven’t submitted the syllabus to the department, and I am still putting together tutorials for the exercises we’ll be doing in class. So, I am somewhat panicking while also very excited to start a new class. My biggest fear is that tools will be updated between my writing up the exercises and our working through them – but I guess this is par for the course in DH.

The process of working on this course has challenged many of my expectations about what students should be learning and working on, and it has changed how I think and work. For example, having been wedded to WYSIWYG word processors since Word came into being (I still mourn WordPerfect’s blue background), I now have an appreciation for working with very simple text editors. Using Markdown or working with code (HTML, TEI) is easier than I thought it would be, and it offers a certain purity of expression in which the author has real control over defining the meaning of text elements, not just their appearance. I have found myself far less obsessed with layout and the perfect font than I used to be. I also find that I am more rational about the purpose of everything in a document (or in the class) and how it all fits together.
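As a small illustration of what I mean (my own sketch, not part of the course material), the third-party markdown package for Python turns a few characters of Markdown into semantic HTML: the marks declare what the text is, and how it looks is decided later by a stylesheet.

```python
# A tiny sketch of Markdown's meaning-first markup, assuming the
# third-party "markdown" package (pip install markdown).
import markdown

source = "# A Heading\n\nSome *emphasised* text and a [link](https://example.org)."

# The Markdown symbols declare structure (heading, emphasis, link);
# presentation is left to whatever stylesheet renders the resulting HTML.
print(markdown.markdown(source))
# Expected output (roughly):
# <h1>A Heading</h1>
# <p>Some <em>emphasised</em> text and a <a href="https://example.org">link</a>.</p>
```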

There is still much work to be done before next week and still much work to be done over the course of the year to make sure everything stays on track, but I think overall this preparation has helped motivate me to try to do more. Each stage that I thought was insurmountable began to seem, with time and practice, to be straightforward and reasonable. I hope this course pushes me to embrace this field and keep going with it. I already have so many ideas of how to integrate this with my research….

IIIF Toolkit: Trials and Tribulations

We’ve returned home to Ottawa and we’re ready to start uploading our IIIF content to our Omeka site. Nothing works; everything seems broken. With morale low, we began to frantically google our problems. We were missing one very important piece needed to make the toolkit work:

We needed to install and host a IIIF image server.

We first attempted to host our images on an online server, the Internet Archive. With the API still in alpha and the documentation still in progress, our results were far from perfect. Our images had their default resolutions set to thumbnail size, and we could not manage to create proper manifests through archivelab.org. Perhaps we need our own image server that we can point our Omeka install to?
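For anyone else wrestling with the toolkit, the thing we were failing to produce looks roughly like this: a minimal IIIF Presentation 2.x manifest, sketched below as a Python dict with entirely hypothetical URLs and identifiers (my own sketch, not our working configuration). The nested service block is the part that has to point at a live IIIF image server, which is exactly what we were missing.

```python
# A rough sketch of a minimal IIIF Presentation 2.x manifest, built as a
# Python dict. All URLs and identifiers below are hypothetical placeholders.
import json

BASE = "https://example.org/iiif"  # hypothetical image-server prefix

manifest = {
    "@context": "http://iiif.io/api/presentation/2/context.json",
    "@id": f"{BASE}/folio1/manifest.json",
    "@type": "sc:Manifest",
    "label": "Folio 1 (example)",
    "sequences": [{
        "@type": "sc:Sequence",
        "canvases": [{
            "@id": f"{BASE}/folio1/canvas/1",
            "@type": "sc:Canvas",
            "label": "f. 1r",
            "width": 2000,
            "height": 3000,
            "images": [{
                "@type": "oa:Annotation",
                "motivation": "sc:painting",
                "on": f"{BASE}/folio1/canvas/1",
                "resource": {
                    # Image API URL pattern: {server}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
                    "@id": f"{BASE}/2/folio1/full/full/0/default.jpg",
                    "@type": "dctypes:Image",
                    "format": "image/jpeg",
                    "width": 2000,
                    "height": 3000,
                    "service": {
                        "@context": "http://iiif.io/api/image/2/context.json",
                        "@id": f"{BASE}/2/folio1",  # must resolve to a live IIIF image server
                        "profile": "http://iiif.io/api/image/2/level2.json",
                    },
                },
            }],
        }],
    }],
}

print(json.dumps(manifest, indent=2))
```

Until that service @id resolves to a real Image API endpoint, a viewer has nothing to request deep-zoom tiles from, no matter how tidy the rest of the manifest is.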

The next IIIF image server I attempted to install was Loris. But without dedicated server space, it would have to run locally (and therefore not always be accessible). This wasn’t a viable solution for our project. We needed server space.

After discussing it with others in the department, we realized that we should get server space through our institution and have an experienced computer scientist from the university install our IIIF image server for us. This means far fewer challenges in the terminal for me, and less of a chance that the image server will cause us issues in the future (due to my modest installation attempts).

Post-DHSI brainstorming

After being at the DHSI at UVic last week, my idea of what we can accomplish over the course of a two-term class has changed. This realization comes in part from my own movement from someone who had never coded to someone increasingly comfortable with editing .json files and simple HTML. So, if a non-techy medievalist such as myself can learn it in a week (≈ 24 hours of instruction), I guess I can expect students to pick up a fair bit of it over two terms (in class, ≈ 24 × 3 hours = 72 hours). We do have a lot of book history to learn, mind you.

If we structure the class correctly, I think, then we can lead students on a quest – starting with social media and ending with simple programming. Students will be expected to use #medievaltwitter (for networking, watching for calls for papers, or finding out what is going on in medieval DH) and to blog about their thoughts on the exercises and readings, before moving over to the relatively user-friendly CMS Omeka as well as the more arcane worlds of GitHub and IIIF to display their work in progress.

The assignments are organized around a spiral curriculum, in which students address a topic and then return to it for at least one further pass. We will start by getting people used to the online environment through familiar platforms, Twitter and blogs, which allow the students to share their experience using other tools. But the real focus is the uncatalogued medieval material in the holdings of Carleton University. Over the course of the year, each student will work on a single folio, describing, transcribing, and analyzing it. That folio then becomes the central focus of the student’s work as they consider it from different perspectives and with different tools. What was it like, for example, transcribing a medieval document with pencil and paper? And then how did it differ when using a tool such as Recogito or Transkribus? What changed in moving from a catalogue record modelled on manuscript catalogues to entering that information into Omeka, which uses Dublin Core? By the end of the first term, students will have used Omeka to present a detailed description of the folio. In the second term, I will ask students to present much of the same information, but this time they will encode it as a TEI file that they will make available through a GitHub Jekyll site.

As I think about all the possibilities, I realize that the key to teaching this course is having a lot of material prepared beforehand. Having templates created and tested in advance, as well as the workflow sorted out, will mean that the class should run smoothly. To help with workflow, I created a Slack group for the class today and integrated a Google Drive (where all the manuscript images are currently located), a Google Calendar (to keep track of deadlines and the presentation schedule), as well as other things that may or may not prove useful (such as Polly, to allow easy polling of the Slack members, and Todo, to create to-do lists).

And so, we march on.