A Digital Gazetteer of North Carolina
UNC SILS, INLS 490-186, Spring 2021
This schedule is still in flux. Specifically, the parts in gray are subject to change.
January 19
Introductions
Total amount of required reading for this meeting: 1,800 words
Today we’ll introduce ourselves and talk about:
- the idea for the course,
- what each of us hopes to get out of the course, and
- what a course-based research experience means.
In addition to the topics above, please come prepared to talk about:
- yourself, and
- a place in North Carolina that you either know well or wish you knew more about.
📖 To read before this meeting:
- Shaw, Ryan. “A Digital Gazetteer of North Carolina,” 2019. PDF.
January 21
Tools of the trade
Total amount of required reading for this meeting: 4,700 words
Today we’ll look at Loomio, the collaboration software that we’ll be using this semester, and we’ll start examining the dataset we compiled last spring. We’ll also start talking about some of the various tools for working with data that we’ll be using: text editors, regular expressions, command line utilities, scripting languages, makefiles, and OpenRefine.
📖 To read before this meeting:
- Powell, William S. Preface: North Carolina Gazetteer (1st Edition), 1968. https://www.ncpedia.org/gazetteer/prefaces#1st.
- Hill, Michael. Preface to the Second Edition, 2009. https://www.ncpedia.org/gazetteer/prefaces#2nd.
January 26
Organizing North Carolina places
Total amount of required reading for this meeting: 6,300 words
Today we’ll meet with Kristen Merryman, Digital Projects Librarian at the North Carolina Digital Heritage Center. We’ll try to get a sense of what it means to have a collection based on a place like “North Carolina.” And we’ll talk about how geographic metadata is integrated into catalog records and authority records, the process of georeferencing, and how places can be a challenge for curators of collections.
📖 To read before this meeting:
- Buckland, Michael, Aitao Chen, Fredric C. Gey, Ray R. Larson, Ruth Mostern, and Vivien Petras. “Geographic Search: Catalogs, Gazetteers, and Maps.” College and Research Libraries 68, no. 5 (2007): 376–387. https://doi.org/10.5860/crl.68.5.376.
January 28
OpenRefine
Today we’ll introduce OpenRefine, a tool for cleaning and enriching data.
Before coming to class, please install OpenRefine and complete one of the following OpenRefine tutorials:
- Getting started with OpenRefine by Miriam Posner
- Getting started with OpenRefine by Thomas Padilla
- Using OpenRefine to Clean Your Data by Jeremy Rue & Richard Koci Hernandez
- OpenRefine tutorial by the Open Data Literacy Project
If you run into trouble installing or running OpenRefine, ask for help on the Loomio site! Don’t wait until class to tell us you weren’t able to get it running.
February 2
Digital gazetteers
Total amount of required reading for this meeting: 10,400 words
Today we’ll introduce the concept of a digital gazetteer as a specific kind of networked knowledge organization system.
📖 To read before this meeting:
- Hill, Linda L. “Core Elements of Digital Gazetteers: Placenames, Categories, and Footprints.” In Research and Advanced Technology for Digital Libraries, edited by José Borbinha and Thomas Baker, 1923/2000:280–290. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer Berlin Heidelberg, 2000. PDF.
- Cope, Aaron Straup, and Nathaniel Vaughn Kelso. “Who’s On First · Mapzen.” Blog. Mapzen, August 18, 2015. https://web.archive.org/web/20200106201029/https://www.mapzen.com/blog/who-s-on-first/.
Reading tips
You might also explore the Who’s On First gazetteer: https://www.whosonfirst.org
February 4
Regular expressions
Today we will have a hands-on activity that will involve digging into the digital version of Powell’s gazetteer using regular expressions.
Before coming to class please complete this regular expression tutorial.
📖 To read before this meeting:
- RegexOne, 2019. https://regexone.com.
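As a warm-up for the hands-on activity, here is a minimal sketch of the kind of pattern matching involved. The sample entry is invented for illustration and is not quoted from Powell’s gazetteer. The layout it assumes (a name, a comma, a description containing “in ... County”) is only a guess at how entries might look.

```python
import re

# Hypothetical entry text, made up for this example (not from the gazetteer).
entry = "Example Creek, stream in central Orange County, flows south into Sample River."

# Capture the name before the first comma and the single capitalized word
# that appears immediately before "County".
pattern = re.compile(
    r"^(?P<name>[^,]+),.*?\bin (?:\w+ )*?(?P<county>[A-Z]\w+) County\b"
)

match = pattern.search(entry)
if match:
    print(match.group("name"), "|", match.group("county"))
```

Note that this pattern captures only one word of the county name, so a two-word county name would need a more careful expression.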
February 9
Historical gazetteers
Total amount of required reading for this meeting: 6,600 words
Historical gazetteers aim to record and describe not just place names currently in use, but also place names used in the past.
📖 To read before this meeting:
- Southall, Humphrey, Ruth Mostern, and Merrick Lex Berman. “On Historical Gazetteers.” International Journal of Humanities & Arts Computing 5, no. 2 (2011): 127–45. PDF.
February 11
Reconciliation
Today we’ll learn how to match (reconcile) records in OpenRefine with records in another database such as Wikidata.
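To give a feel for what matching involves, here is a toy sketch that scores a name against a list of candidate labels using string similarity. This is not how OpenRefine or Wikidata implement reconciliation. The function `best_match`, the threshold values, and the candidate labels are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def best_match(name, candidates, threshold=0.8):
    """Return (candidate, score) for the most similar candidate, or None."""
    if not candidates:
        return None
    scored = [
        (candidate, SequenceMatcher(None, name.lower(), candidate.lower()).ratio())
        for candidate in candidates
    ]
    candidate, score = max(scored, key=lambda pair: pair[1])
    # Reject weak matches so that unrelated names are left unreconciled.
    return (candidate, score) if score >= threshold else None


# Hypothetical labels standing in for records in an external database.
external_labels = ["Chapel Hill", "Carrboro", "Hillsborough"]

print(best_match("Chapel Hill (town)", external_labels, threshold=0.6))
print(best_match("Pittsboro", external_labels))
```

Real reconciliation services usually also compare types, locations, and other properties, because names alone are often ambiguous.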
Before coming to class, read the OpenRefine documentation page on reconciliation.
You may also want to check out these two videos:
February 16
Wellness day
Class will not meet.
February 18
GeoJSON
Total amount of required reading for this meeting: 4,500 words
There are several standard formats for recording the spatial “footprints” of places. The one that is easiest to work with is called GeoJSON. As its name implies, GeoJSON uses JSON (JavaScript Object Notation) to record geographic coordinates and polygons.
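A minimal sketch of what this looks like in practice, built as Python dictionaries and printed as JSON. The coordinates are approximate, and the bounding box and property names are made up for illustration.

```python
import json

# A Point footprint. GeoJSON coordinates are ordered [longitude, latitude].
point_feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-79.06, 35.91]},  # approximate
    "properties": {"name": "Chapel Hill"},
}

# A Polygon footprint. A polygon is a list of rings, and each ring must end
# on the same position it starts from. This box is hypothetical.
ring = [
    [-79.10, 35.88],
    [-79.00, 35.88],
    [-79.00, 35.95],
    [-79.10, 35.95],
    [-79.10, 35.88],
]
polygon_feature = {
    "type": "Feature",
    "geometry": {"type": "Polygon", "coordinates": [ring]},
    "properties": {"name": "Example bounding box"},
}

collection = {
    "type": "FeatureCollection",
    "features": [point_feature, polygon_feature],
}
print(json.dumps(collection, indent=2))
```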
📖 To read before this meeting:
- Shah, Raivat. “An Introduction to JSON.” Towards Data Science, 2019. https://towardsdatascience.com/an-introduction-to-json-c9acb464f43e. PDF.
- MacWright, Tom. “More than You Ever Wanted to Know about GeoJSON,” 2015. https://macwright.com/2015/03/23/geojson-second-bite.html.
February 23
Linked data gazetteers
Total amount of required reading for this meeting: 3,800 words
Some digital gazetteers specifically aim to link together disparate datasets that relate to the same places. An approach to publishing known as linked data is well suited to this purpose.
📖 To read before this meeting:
- Simon, Rainer, Leif Isaksen, Elton Barker, and Pau de Soto Cañamares. “The Pleiades Gazetteer and the Pelagios Project.” In Placing Names: Enriching and Integrating Gazetteers, 97–109, 2016. PDF.
February 25
Linked data
Total amount of required reading for this meeting: 3,600 words
Linked data is less a specific technology than a set of best practices for publishing data on the web. Today we’ll introduce the basic concepts of linked data, including the Resource Description Framework (RDF) data model, and we’ll learn to write RDF by hand using the Turtle syntax.
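To make the data model concrete, here is a small sketch that holds RDF statements as subject, predicate, object triples and writes them out one statement per line, which is a valid (if verbose) form of Turtle. The `example.org` identifiers and the `locatedIn` property are placeholders invented for this example. `rdfs:label` is a real RDF Schema property.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Subjects and predicates are IRIs. Objects are already written in Turtle
# form: a quoted literal, or an IRI in angle brackets.
triples = [
    ("http://example.org/place/chapel-hill", RDFS_LABEL, '"Chapel Hill"'),
    (
        "http://example.org/place/chapel-hill",
        "http://example.org/vocab/locatedIn",  # hypothetical property
        "<http://example.org/place/orange-county>",
    ),
]


def to_turtle(triples):
    """Serialize triples as one 'subject predicate object .' line each."""
    lines = []
    for subject, predicate, obj in triples:
        lines.append(f"<{subject}> <{predicate}> {obj} .")
    return "\n".join(lines)


print(to_turtle(triples))
```

Turtle also offers prefixes and other shorthand that make hand-written RDF much more compact than this.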
📖 To read before this meeting:
- Posner, Miriam. What Is Linked Open Data?, 2021. https://youtu.be/VZBpFiLbi-Y.
- Stardog. “Graph Data Model,” 2018. https://web.archive.org/web/20190306231928/https://www.stardog.com/tutorials/data-model/.
- Optional: Verborgh, Ruben, and Seth van Hooland. “Modelling.” In Linked Data for Libraries, Archives and Museums: How to Clean, Link and Publish Your Metadata, 11–70. Facet Publishing, 2014. http://book.freeyourmetadata.org/chapters/1/modelling.pdf.
March 2
Vague places
Total amount of required reading for this meeting: 5,300 words
One of the things that distinguishes gazetteers from more standard Geographic Information System (GIS) tools is that they can also be used to record information about places with ill-defined or unknown locations: ancient places, mythical places, or, as we will discuss today, vague places.
Content warning: “Perceptual Regions in Texas” contains discussion of ethnic slurs.
📖 To read before this meeting:
- Jordan, Terry G. “Perceptual Regions in Texas.” Geographical Review 68, no. 3 (July 1978): 293. https://doi.org/10.2307/215048.
- Chisholm, Matt, and Ross Cohen. The Neighborhood Project, 2005. https://hood.theory.org.
March 4
JSON-LD / Linked Places Format
GeoJSON gives us a convenient way to express geospatial information (“footprints”). RDF gives us a convenient way to express other kinds of statements about places, such as statements about the things (people, organizations, events, other places) to which they are related, or the categories (feature types) to which they belong. JSON-LD is a data format that allows us to combine the strengths of GeoJSON and RDF.
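The sketch below shows the general idea: a GeoJSON Feature that also carries a JSON-LD `@context` and `@id`. It is a simplified illustration and does not follow the Linked Places Format itself. The inline context, the `example.org` identifiers, and the choice of `rdfs:seeAlso` for the relation are all assumptions.

```python
import json

place = {
    # "@context" maps short keys to IRIs so that a JSON-LD processor can read
    # them as RDF statements. Keys that are not mapped here (such as
    # "geometry") are ignored by JSON-LD but still understood by GeoJSON tools.
    "@context": {
        "title": "http://www.w3.org/2000/01/rdf-schema#label",
        "relatedTo": {
            "@id": "http://www.w3.org/2000/01/rdf-schema#seeAlso",
            "@type": "@id",  # treat the value as an identifier, not a string
        },
    },
    "@id": "http://example.org/place/chapel-hill",  # placeholder identifier
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-79.06, 35.91]},  # approximate
    "properties": {},
    # GeoJSON allows extra top-level members, so the linked data lives here.
    "title": "Chapel Hill",
    "relatedTo": "http://example.org/place/orange-county",
}

print(json.dumps(place, indent=2))
```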
📖 To read before this meeting:
- Sporny, Manu. What Is JSON-LD?, 2012. https://www.youtube.com/watch?v=vioCbTo3C-4.
- Grossner, Karl. “The Linked Places Format,” January 10, 2021. https://github.com/LinkedPasts/linked-places#readme.
March 9
Querying linked data: SPARQL
Total amount of required reading for this meeting: 7,300 words
SPARQL is to RDF triplestores what SQL is to relational databases. A good way to get started with SPARQL is to try the Wikidata Query Service.
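As a first taste, here is a sketch of a query you could paste into the Wikidata Query Service, wrapped in a little Python that builds the request URL without sending it. The item and property identifiers (Q1454 for North Carolina, P131 for “located in the administrative territorial entity”, P625 for “coordinate location”) are given from memory and should be checked on Wikidata before you rely on them.

```python
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"

# Ask for up to ten items located directly in North Carolina, with coordinates.
QUERY = """
SELECT ?place ?placeLabel ?coords WHERE {
  ?place wdt:P131 wd:Q1454 ;   # located in: North Carolina (verify this ID)
         wdt:P625 ?coords .    # coordinate location
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""


def build_request_url(query):
    """Return a GET URL that asks the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


url = build_request_url(QUERY)
print(url)
```

Just as a SQL query describes rows to select from tables, the `WHERE` block here describes a pattern of triples to find in the graph.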
📖 To read before this meeting:
- “A Gentle Introduction to the Wikidata Query Service.” In Wikidata, n.d. https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/A_gentle_introduction_to_the_Wikidata_Query_Service.
- DuCharme, Bob. “Jumping Right in: Some Data and Some Queries.” In Learning SPARQL: Querying and Updating with SPARQL 1.1, 2nd ed., 1–17. Sebastopol: O’Reilly Media, 2013. PDF.
- Optional: DuCharme, Bob. “SPARQL Queries: A Deeper Dive.” In Learning SPARQL: Querying and Updating with SPARQL 1.1, 2nd ed., 47–102. Sebastopol: O’Reilly Media, 2013. PDF.
March 11
Wellness day
Class will not meet.
March 16
Knowledge infrastructure
Total amount of required reading for this meeting: 11,500 words
The motivating idea behind this class is that we can make new things possible by taking the contents of a book (Powell’s gazetteer) and putting it into a different form, using networked computers. This is an old idea. Creating new “knowledge infrastructure” is hard, despite a century of dreaming about it.
📖 To read before this meeting:
- Wells, H. G. “The Idea of a World Encyclopædia [Excerpt].” Nature 138 (1936): 920–924. https://doi.org/10.1038/138917a0. PDF.
Reading tips
H.G. Wells was an English writer now best known for his science fiction including The Time Machine and The War of the Worlds. During his own lifetime, however, he was prominent as a “futurist” who devoted his literary talents to the development of a progressive vision on a global scale.
This reading is an excerpt from a speech Wells gave in 1936 at The Royal Institution of Great Britain, an organization for scientific education and research founded in 1799.
- Licklider, J. C. R. “The View from the Half Way Point on a Journey to the Future: A Progress Report on the Interaction between Libraries and Information Technology [Excerpt].” In Large Libraries and New Technological Developments: Proceedings of a Symposium Held on the Occasion of the Inauguration of the New Building of the Royal Library, The Hague, 29 September-1 October 1982, edited by C. Reedijk, Carol K. Henry, and W. R. H. Koops, 13–34. München: K.G. Saur, 1984. PDF.
Reading tips
J.C.R. Licklider was an American psychologist and computer scientist who was one of the first to foresee modern-style interactive computing and its application to all manner of activities. As director of ARPA’s Information Processing Techniques Office he funded research which led to the canonical graphical user interface, and the ARPANET, the direct predecessor to the Internet.
In the early 1960s the Council on Library Resources recruited Licklider to address the question of how technology could help libraries gather, index, organize, store, and make accessible the growing body of recorded information. Licklider gathered a small team of engineers and psychologists to explore “concepts and problems of libraries of the future.” He wrote a summary report of the project, which appeared as the book Libraries of the Future in 1965.
This reading is an excerpt from a 1982 speech in which Licklider looks back at the changes he anticipated in 1965.
- Hillis, Danny. “A Short Introduction to the Underlay.” Notes from the Knowledge Futures Group (blog), August 2, 2020. https://notes.knowledgefutures.org/pub/underlay-short-intro.
Reading tips
Danny Hillis is an American computer scientist who pioneered parallel computers and their use in artificial intelligence. In 2005, Hillis founded Metaweb Technologies to develop a semantic data storage infrastructure for the Internet, and Freebase, an open, structured database of the world’s knowledge. That company was acquired by Google, and its technology became the basis of the Google Knowledge Graph.
This reading is a post giving an overview of Hillis’ latest project, the Underlay.
March 18
Geotagging text
Total amount of required reading for this meeting: 2,500 words
Gazetteers can be combined with annotation tools and/or natural language processing (NLP) tools to “geotag” text. Geotagging involves identifying place names mentioned in a text in order to get a sense of the spatial coverage of the content, or to link the text to other relevant texts, images, or data.
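The simplest form of geotagging is a dictionary lookup: scan the text for names that appear in a gazetteer. The sketch below assumes a tiny stand-in gazetteer with approximate coordinates, and the `geotag` function is a hypothetical helper. Real tools also have to cope with ambiguous names, spelling variants, and names that are not places at all.

```python
import re

# A tiny stand-in gazetteer: place name -> (longitude, latitude), approximate.
gazetteer = {
    "Chapel Hill": (-79.06, 35.91),
    "Raleigh": (-78.64, 35.78),
}


def geotag(text, gazetteer):
    """Return (name, start offset, coordinates) for each gazetteer name in text."""
    # Try longer names first so multi-word names win over shorter overlaps.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return [
        (m.group(1), m.start(), gazetteer[m.group(1)])
        for m in pattern.finditer(text)
    ]


sample = "She took the bus from Raleigh to Chapel Hill."
for name, offset, coords in geotag(sample, gazetteer):
    print(name, offset, coords)
```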
📖 To read before this meeting:
-
Rambsy, Kenton. “Geo-Tagging Edward P. Jones & Washington, DC.” In Lost in the City: An Exploration of Edward P. Jones’s Short Fiction, by Kenton Rambsy and Peace Ossom-Williamson. Publishing Without Walls, 2019. https://iopn.library.illinois.edu/scalar/lost-in-the-city-a-exploration-of-edward-p-joness-short-fiction-/chapter-1—-section-2?path=chapter-1—-data-mining-edward-p—joness-short-fiction.
Reading tips
You can find the raw data created for this project at https://doi.org/10.18738/T8/BB70Z2.
- Mallon, Kilian. “Review: Recogito: Visualizing, Mapping, and Annotating Ancient Texts.” Society for Classical Studies (blog), 2019. https://classicalstudies.org/scs-blog/kilian-mallon/review-recogito-visualizing-mapping-and-annotating-ancient-texts.