Text-mining as a Research Tool in the Humanities and Social Sciences

These slides are from a presentation I gave at the Duke Libraries on September 20, 2012, as part of their Text > Data speaker series, and again on March 5, 2013, as part of their RCR Forum series. You may also be interested in this archive of Adeline Koh’s live-tweeting of the presentation.

Below is a list of resources suitable for further exploration of the topics I covered. This list is incomplete and heavily biased toward text analysis in the humanities rather than the social sciences more broadly. However, it should provide a good starting point for learning about text analysis research methods.

  • Sapping Attention is the blog of Ben Schmidt. Schmidt is a graduate student in history at Princeton University, and the Visiting Graduate Fellow at the Cultural Observatory at Harvard, where he helped create Bookworm. He uses text mining of large corpora to study the history of concepts, and blogs regularly about the techniques he uses.
  • The Stone and the Shell is the blog of Ted Underwood. Underwood is an English professor at the University of Illinois who uses text mining to study eighteenth- and nineteenth-century literature. He blogs about his experiments, often providing data and code as well.
  • Lisa @ Work is the blog of Lisa Rhody, a Ph.D. candidate in English at the University of Maryland who is applying computational text analysis to ekphrastic poetry (poems that take the visual arts as their subject) by contemporary women poets.
  • Scott Weingart is a doctoral student in the digital humanities with a knack for explaining complicated topics on his blog. See, for example, his post on Topic Modeling for Humanists.

Online Syllabi