Digitization in the Humanities

Friday, April 5 – Sunday, April 7, 2013
Farnsworth Pavilion, Rice Student Center

The workshop offers hands-on introductions to some of the basic tools and methods in the emerging field of Digital Humanities. Collaborating with Oxford University, Rice University has invited six gurus of the field to teach markup, text-mining, network analysis and mega-data management. The workshop provides opportunities to foster collaboration between computational- and humanities-oriented researchers on campus, as well as with researchers outside of Rice University.

This workshop is organized by Anne Chao (Rice University), Hilde De Weerdt (King's College London), and Judith Pfeiffer (University of Oxford).

SPACE IS LIMITED. Register by sending an email to laurenk@rice.edu with the subject line "DH Workshop". The schedule and topic descriptions are listed below.

Each session will consist of a 2-hour workshop and a 1-hour open lab.


Friday, April 5

Derek Ruths, McGill University
Network Analysis and Graph Theory Fundamentals

In this session participants will be introduced to core concepts in graph theory and network analysis. We will discuss the basic vocabulary of network connectivity, clustering, centrality, and community structure. Several datasets ranging from road systems to character interaction networks will be used to show how network concepts can be used to qualitatively and quantitatively investigate questions of interest within the humanities.

Marcus Bingenheimer, Temple University
TEI - Markup

The TEI (Text Encoding Initiative) standard is the oldest and most widely used markup standard in the Digital Humanities. Highly expressive, it allows editors and researchers to produce high-end digital editions. Based on a robust, flexible data model, and with provisions to include structured metadata, TEI is often used as the master format for digital editions. XSLT, XQuery, or other scripting languages can be used to transform TEI editions into a variety of output formats (PDF, ODT, DOCX, HTML, ...). TEI is therefore a basic technology for all Digital Humanities projects dealing with structured text that needs to be preserved and exchanged over the long term.

Saturday, April 6

Timothy Tangherlini, University of California, Los Angeles

In this workshop, we will explore what Ian Gregory and Paul Ell, in their book “Historical GIS”, describe as “tools and techniques [that] allow [scholars] to re-examine radically the way that space is used” in the Humanities. The first part of the workshop will focus on some classic problems in GIS for the Humanities: geographic entity detection, placename disambiguation, and point data. The underlying question is: how can maps help us explore a Humanities problem? As an illustration, we will explore some applications of GIS in the study of folklore, medieval Icelandic manuscript production, and the biography of Henrik Ibsen. These projects are generalizable to other Humanities problems.

Dennis Tenen, Columbia University
Applied Network Analysis

In this workshop we will build on the fundamentals of network analysis and graph theory to explore a data set of social interactions on a popular book piracy forum. Participants will learn to use Gephi, an open-source platform for visualizing and exploring networks and complex systems. Topics covered will include the data cycle, exploratory data analysis, network metrics, clustering algorithms, and elements of visual design.

Sunday, April 7

Shih-Pei Chen, Institute for Quantitative Social Science, Harvard University   8:30am-11:00am
Text Extraction using Regular Expressions

In this workshop we will show how regular expressions, an effective way of extracting information that follows regular written patterns, such as dates, can help humanists search, organize, and extract data from text corpora.

David Mimno, Princeton University

It can be difficult to approach a large text collection. Reading every document carefully may be impossible, and simple word-counting tools can provide an incomplete or confusing picture of the corpus. Topic models provide a scalable alternative: using statistical techniques, we can automatically identify groups of words that correspond to themes or topics. Participants in this workshop will prepare a corpus for modeling, train models, and then evaluate and use the topic representation.


Humanities Research Center
Chao Center for Asian Studies
Fondren Library
Ken Kennedy Institute for Information Technology
Office of Graduate and Postdoctoral Studies
Ancient Mediterranean Civilizations
University of Oxford
King's College London
Texas A&M