A Digital Gazetteer of North Carolina

UNC SILS, INLS 490-186, Spring 2021

This schedule is still in flux. Specifically, the parts in gray are subject to change.

January 19
Introductions

Total amount of required reading for this meeting: 1,800 words

Today we’ll introduce ourselves and talk about:

the idea for the course,
what each of us hopes to get out of the course, and
what a course-based research experience means.

In addition to the topics above, please come prepared to talk about:

yourself, and
a place in North Carolina that you either know well or wish you knew more about.

📖 To read before this meeting:

Shaw, Ryan. “A Digital Gazetteer of North Carolina,” 2019. PDF.

1,800 words

January 21
Tools of the trade

Total amount of required reading for this meeting: 4,700 words

Today we’ll look at Loomio, the collaboration software that we’ll be using this semester, and we’ll start examining the dataset we compiled last spring. We’ll also start talking about some of the various tools for working with data that we’ll be using: text editors, regular expressions, command line utilities, scripting languages, makefiles, and OpenRefine.

📖 To read before this meeting:

Powell, William S. Preface: North Carolina Gazetteer (1st Edition), 1968. https://www.ncpedia.org/gazetteer/prefaces#1st.

2,400 words
Hill, Michael. Preface to the Second Edition, 2009. https://www.ncpedia.org/gazetteer/prefaces#2nd.

2,300 words

January 26
Organizing North Carolina places

Total amount of required reading for this meeting: 6,300 words

Today we’ll meet with Kristen Merryman, Digital Projects Librarian at the North Carolina Digital Heritage Center. We’ll try to get a sense of what it means to have a collection based on a place like “North Carolina.” And we’ll talk about how geographic metadata is integrated into catalog records and authority records, the process of georeferencing, and how places can be a challenge for curators of collections.

📖 To read before this meeting:

Buckland, Michael, Aitao Chen, Fredric C. Gey, Ray R. Larson, Ruth Mostern, and Vivien Petras. “Geographic Search: Catalogs, Gazetteers, and Maps.” College and Research Libraries 68, no. 5 (2007): 376–387. https://doi.org/10.5860/crl.68.5.376.

6,300 words

January 28
OpenRefine

Today we’ll introduce OpenRefine, a tool for cleaning and enriching data.

Before coming to class, please install OpenRefine and complete one of the following OpenRefine tutorials:

Getting started with OpenRefine by Miriam Posner
Getting started with OpenRefine by Thomas Padilla
Using OpenRefine to clean Your data by Jeremy Rue & Richard Koci Hernandez
OpenRefine tutorial by the Open Data Literacy Project

If you run into trouble installing or running OpenRefine, ask for help on the Loomio site! Don’t wait until class to tell us you weren’t able to get it running.

February 2
Digital gazetteers

Total amount of required reading for this meeting: 10,400 words

Today we’ll introduce the concept of a digital gazetteer as a specific kind of networked knowledge organization system.

📖 To read before this meeting:

Hill, Linda L. “Core Elements of Digital Gazetteers: Placenames, Categories, and Footprints.” In Research and Advanced Technology for Digital Libraries, edited by José Borbinha and Thomas Baker, 1923/2000:280–290. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer Berlin Heidelberg, 2000. PDF.

4,800 words
Cope, Aaron Straup, and Nathaniel Vaughn Kelso. “Who’s On First · Mapzen.” Blog. Mapzen, August 18, 2015. https://web.archive.org/web/20200106201029/https://www.mapzen.com/blog/who-s-on-first/.

5,600 words

Reading tips

You might also explore the Who’s On First gazetteer: https://www.whosonfirst.org

February 4
Regular expressions

Today we will have a hands-on activity that will involve digging into the digital version of Powell’s gazetteer using regular expressions.
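
To give a flavor of the activity, here is a minimal sketch in Python of using a regular expression to pull county names out of gazetteer-style entries. The two entries are made up for illustration (they are not quoted from Powell), and the pattern is only one of many ways to do this.

```python
import re

# Made-up entries written in the style of a gazetteer; not real data.
entries = [
    "Example Creek rises in n Orange County and flows se into Sample River.",
    "Sample Inlet, s New Hanover County, connects Sample Sound with the Atlantic Ocean.",
]

# One or more capitalized words immediately followed by the word "County".
# The parentheses capture just the county name, without " County".
COUNTY_PATTERN = re.compile(r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)*) County\b")


def counties_in(entry):
    """Return the county names mentioned in one entry."""
    return COUNTY_PATTERN.findall(entry)


for entry in entries:
    print(counties_in(entry))
```

A pattern this simple will miss or mangle some real entries (abbreviations, hyphenated names, and so on), which is part of what we will explore in class.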

Before coming to class please complete this regular expression tutorial.

📖 To read before this meeting:

RegexOne, 2019. https://regexone.com.

February 9
Historical gazetteers

Total amount of required reading for this meeting: 6,600 words

Historical gazetteers aim to record and describe not just place names currently in use, but also place names used in the past.

📖 To read before this meeting:

Southall, Humphrey, Ruth Mostern, and Merrick Lex Berman. “On Historical Gazetteers.” International Journal of Humanities & Arts Computing 5, no. 2 (2011): 127–45. PDF.

6,600 words

February 11
Reconciliation

Today we’ll learn how to match (reconcile) records in OpenRefine with records in another database such as Wikidata.
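
The basic idea behind reconciliation can be sketched in a few lines of Python: for each name in our data, look for the closest name in a reference list. This is only an illustration using Python's standard `difflib` module; OpenRefine's reconciliation services do their own matching and scoring, and the names and the `reconcile` helper below are made up for the example.

```python
import difflib

# Hypothetical reference list, standing in for names in another database.
reference_names = ["Chapel Hill", "Durham", "Raleigh"]


def reconcile(name, candidates, cutoff=0.8):
    """Return the closest candidate name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Names as they might appear, with typos, in our own data.
for messy in ["Chapel Hil", "Durhamm", "Zzz"]:
    print(messy, "->", reconcile(messy, reference_names))
```

In practice a human still reviews the proposed matches, because similar spellings do not guarantee that two records describe the same place.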

Before coming to class, read the OpenRefine documentation page on reconciliation.

You may also want to check out these two videos:

February 16
Wellness day

Class will not meet.

February 18
GeoJSON

Total amount of required reading for this meeting: 4,500 words

There are several standard formats for recording the spatial “footprints” of places. The one that is easiest to work with is called GeoJSON. As its name implies, GeoJSON uses JSON (JavaScript Object Notation) to record geographic coordinates and polygons.
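
As a small preview, here is a sketch in Python that builds two GeoJSON features, a point and a polygon, and prints them as JSON. The coordinates are approximate and the rectangle is made up; they are here only to show the shape of the format.

```python
import json

# A Feature with a Point geometry. GeoJSON positions are written
# [longitude, latitude]. These coordinates are approximate.
chapel_hill = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-79.06, 35.91]},
    "properties": {"name": "Chapel Hill"},
}

# A Polygon is a list of rings, and each ring ends by repeating its
# first position. This rectangle is made up for illustration.
rectangle = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-79.1, 35.9],
            [-79.0, 35.9],
            [-79.0, 36.0],
            [-79.1, 36.0],
            [-79.1, 35.9],
        ]],
    },
    "properties": {"name": "A made-up rectangle"},
}

collection = {"type": "FeatureCollection", "features": [chapel_hill, rectangle]}
geojson_text = json.dumps(collection, indent=2)
print(geojson_text)
```

Pasting the printed output into a GeoJSON viewer is a quick way to check that the footprints land where you expect.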

📖 To read before this meeting:

Shah, Raivat. “An Introduction to JSON.” Towards Data Science, 2019. https://towardsdatascience.com/an-introduction-to-json-c9acb464f43e. PDF.

1,200 words
MacWright, Tom. “More than You Ever Wanted to Know about GeoJSON,” 2015. https://macwright.com/2015/03/23/geojson-second-bite.html.

3,300 words

February 23
Linked data gazetteers

Total amount of required reading for this meeting: 3,800 words

Some digital gazetteers specifically aim to link together disparate datasets that relate to the same places. An approach to publishing known as linked data is well suited to this purpose.

📖 To read before this meeting:

Simon, Rainer, Leif Isaksen, Elton Barker, and Pau de Soto Cañamares. “The Pleiades Gazetteer and the Pelagios Project.” In Placing Names: Enriching and Integrating Gazetteers, 97–109, 2016. PDF.

3,800 words

February 25
Linked data

Total amount of required reading for this meeting: 3,600 words

Linked data is less a specific technology than a set of best practices for publishing data on the web. Today we’ll introduce the basic concepts of linked data, including the Resource Description Framework (RDF) data model, and we’ll learn to write RDF by hand using the Turtle syntax.
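
To make the data model concrete, here is a minimal sketch in Python that treats RDF statements as (subject, predicate, object) triples and prints them in Turtle syntax. The `ex:` namespace, the `ex:locatedIn` property, and the `to_turtle` helper are hypothetical, chosen only for illustration; `rdfs:label` is a standard property.

```python
# Prefixes let us abbreviate long IRIs.
PREFIXES = {
    "ex": "http://example.org/places/",  # hypothetical namespace
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

# Each RDF statement is a triple: (subject, predicate, object).
triples = [
    ("ex:chapel-hill", "rdfs:label", '"Chapel Hill"'),
    ("ex:chapel-hill", "ex:locatedIn", "ex:orange-county"),  # hypothetical property
    ("ex:orange-county", "rdfs:label", '"Orange County"'),
]


def to_turtle(prefixes, statements):
    """Format prefixes and triples as a simple Turtle document."""
    lines = [f"@prefix {name}: <{iri}> ." for name, iri in prefixes.items()]
    lines.append("")
    lines += [f"{s} {p} {o} ." for s, p, o in statements]
    return "\n".join(lines)


turtle_text = to_turtle(PREFIXES, triples)
print(turtle_text)
```

In class we will write Turtle like this by hand; the script just shows that every line is one statement ending in a period.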

📖 To read before this meeting:

Posner, Miriam. What Is Linked Open Data?, 2021. https://youtu.be/VZBpFiLbi-Y.
Stardog. “Graph Data Model,” 2018. https://web.archive.org/web/20190306231928/https://www.stardog.com/tutorials/data-model/.

3,600 words
Optional

Verborgh, Ruben, and Seth van Hooland. “Modelling.” In Linked Data for Libraries, Archives and Museums: How to Clean, Link and Publish Your Metadata, 11–70. Facet Publishing, 2014. http://book.freeyourmetadata.org/chapters/1/modelling.pdf.

22,000 words

March 2
Vague places

Total amount of required reading for this meeting: 5,300 words

One of the things that distinguishes gazetteers from more standard Geographic Information System (GIS) tools is that they can also be used to record information about places with ill-defined or unknown locations: ancient places, mythical places, or, as we will discuss today, vague places.

Content warning: “Perceptual Regions in Texas” contains discussion of ethnic slurs.

📖 To read before this meeting:

Jordan, Terry G. “Perceptual Regions in Texas.” Geographical Review 68, no. 3 (July 1978): 293. https://doi.org/10.2307/215048.

5,300 words
Chisholm, Matt, and Ross Cohen. The Neighborhood Project, 2005. https://hood.theory.org.

March 4
JSON-LD / Linked Places Format

GeoJSON gives us a convenient way to express geospatial information (“footprints”). RDF gives us a convenient way to express other kinds of statements about places, such as statements about the things (people, organizations, events, other places) to which they are related, or the categories (feature types) to which they belong. JSON-LD is a data format that allows us to combine the strengths of GeoJSON and RDF.
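
Here is a rough sketch in Python of what that combination can look like: one JSON document that carries both RDF-style statements (through a `@context`) and a GeoJSON geometry. This is not the Linked Places Format itself, just the general idea; the `example.org` IRIs and the `partOf` term are hypothetical and the coordinates are approximate.

```python
import json

place = {
    # The @context maps plain JSON keys to IRIs, which is what lets a
    # JSON-LD processor read the document as RDF statements.
    "@context": {
        "name": "http://www.w3.org/2000/01/rdf-schema#label",
        # hypothetical property; "@type": "@id" says its value is an IRI
        "partOf": {"@id": "http://example.org/vocab/partOf", "@type": "@id"},
    },
    "@id": "http://example.org/places/chapel-hill",  # hypothetical identifier
    "name": "Chapel Hill",
    "partOf": "http://example.org/places/orange-county",
    # "geometry" has no mapping in the @context, so a JSON-LD processor
    # drops it when expanding to RDF, but a GeoJSON-aware tool can
    # still read the footprint.
    "geometry": {"type": "Point", "coordinates": [-79.06, 35.91]},
}

jsonld_text = json.dumps(place, indent=2)
print(jsonld_text)
```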

📖 To read before this meeting:

Sporny, Manu. What Is JSON-LD?, 2012. https://www.youtube.com/watch?v=vioCbTo3C-4.
Grossner, Karl. “The Linked Places Format,” January 10, 2021. https://github.com/LinkedPasts/linked-places#readme.

March 9
Querying linked data: SPARQL

Total amount of required reading for this meeting: 7,300 words

SPARQL is to RDF triplestores what SQL is to relational databases. A good way to get started with SPARQL is to try the Wikidata Query Service.
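
For those who want to try it from a script rather than the web interface, here is a minimal sketch in Python that prepares a query for the Wikidata Query Service. The query looks up items whose English label is "Chapel Hill". The User-Agent string and the helper names are made up, and the network call is left commented out so that the script runs offline.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# Find items labeled "Chapel Hill" in English, with a readable label for each.
QUERY = """
SELECT ?place ?placeLabel WHERE {
  ?place rdfs:label "Chapel Hill"@en .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""


def build_url(query):
    """Encode the query as a GET request URL that asks for JSON results."""
    return ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})


def run_query(query):
    """Send the query and return the list of result rows (needs a network connection)."""
    request = urllib.request.Request(
        build_url(query),
        headers={"User-Agent": "gazetteer-class-example/0.1"},  # made-up name
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


url = build_url(QUERY)
print(url)

# Uncomment to actually send the query:
# for row in run_query(QUERY):
#     print(row["place"]["value"], row["placeLabel"]["value"])
```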

📖 To read before this meeting:

“A Gentle Introduction to the Wikidata Query Service.” In Wikidata, n.d. https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/A_gentle_introduction_to_the_Wikidata_Query_Service.

2,200 words
DuCharme, Bob. “Jumping Right in: Some Data and Some Queries.” In Learning SPARQL: Querying and Updating with SPARQL 1.1, 2nd ed., 1–17. Sebastopol: O’Reilly Media, 2013. PDF.

5,100 words
Optional

DuCharme, Bob. “SPARQL Queries: A Deeper Dive.” In Learning SPARQL: Querying and Updating with SPARQL 1.1, 2nd ed., 47–102. Sebastopol: O’Reilly Media, 2013. PDF.

14,100 words

March 11
Wellness day

Class will not meet.

March 16
Knowledge infrastructure

Total amount of required reading for this meeting: 11,500 words

The motivating idea behind this class is that we can make new things possible by taking the contents of a book (Powell’s gazetteer) and putting it into a different form, using networked computers. This is an old idea. Creating new “knowledge infrastructure” is hard, despite a century of dreaming about it.

📖 To read before this meeting:

Wells, H. G. “The Idea of a World Encyclopædia [Excerpt].” Nature 138 (1936): 920–924. https://doi.org/10.1038/138917a0. PDF.

4,300 words

Reading tips

H.G. Wells was an English writer now best known for his science fiction including The Time Machine and The War of the Worlds. During his own lifetime, however, he was prominent as a “futurist” who devoted his literary talents to the development of a progressive vision on a global scale.

This reading is an excerpt from a speech Wells gave in 1936 at The Royal Institution of Great Britain, an organization for scientific education and research founded in 1799.
Licklider, J. C. R. “The View from the Half Way Point on a Journey to the Future: A Progress Report on the Interaction between Libraries and Information Technology [Excerpt].” In Large Libraries and New Technological Developments: Proceedings of a Symposium Held on the Occasion of the Inauguration of the New Building of the Royal Library, The Hague, 29 September-1 October 1982, edited by C. Reedijk, Carol K. Henry, and W. R. H. Koops, 13–34. München: K.G. Saur, 1984. PDF.

5,300 words

Reading tips

J.C.R. Licklider was an American psychologist and computer scientist who was one of the first to foresee modern-style interactive computing and its application to all manner of activities. As director of ARPA’s Information Processing Techniques Office he funded research which led to the canonical graphical user interface, and the ARPANET, the direct predecessor to the Internet.

In the early 1960s the Council on Library Resources recruited Licklider to address the question of how technology could help libraries gather, index, organize, store, and make accessible the growing body of recorded information. Licklider gathered a small team of engineers and psychologists to explore “concepts and problems of libraries of the future”. He wrote a summary report of the project, which appeared as the book Libraries of the Future in 1965.

This reading is an excerpt from a 1982 speech in which Licklider looks back at the changes he anticipated in 1965.
Hillis, Danny. “A Short Introduction to the Underlay.” Notes from the Knowledge Futures Group (blog), August 2, 2020. https://notes.knowledgefutures.org/pub/underlay-short-intro.

1,900 words

Reading tips

Danny Hillis is an American computer scientist who pioneered parallel computers and their use in artificial intelligence. In 2005, Hillis founded Metaweb Technologies to develop a semantic data storage infrastructure for the Internet, and Freebase, an open, structured database of the world’s knowledge. That company was acquired by Google, and its technology became the basis of the Google Knowledge Graph.

This reading is a post giving an overview of Hillis’ latest project, the Underlay.

March 18
Geotagging text

Total amount of required reading for this meeting: 2,500 words

Gazetteers can be combined with annotation tools and/or natural language processing (NLP) tools to “geotag” text. Geotagging involves identifying place names mentioned in a text in order to get a sense of the spatial coverage of the content, or to link the text to other relevant texts, images, or data.
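
The simplest form of geotagging is a lookup: scan a text for names that appear in a gazetteer and attach each name's coordinates. Here is a minimal sketch in Python. The three-entry gazetteer, its approximate coordinates, the sample sentence, and the `geotag` helper are all made up for illustration; real tools also have to deal with ambiguous and misspelled names.

```python
import re

# A toy gazetteer: place name -> (longitude, latitude), approximate values.
GAZETTEER = {
    "Chapel Hill": (-79.06, 35.91),
    "Durham": (-78.90, 35.99),
    "Raleigh": (-78.64, 35.78),
}


def geotag(text, gazetteer):
    """Return (name, character offset, coordinates) for each gazetteer name found."""
    # Try longer names first so multi-word names are not cut short.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return [
        (match.group(1), match.start(), gazetteer[match.group(1)])
        for match in pattern.finditer(text)
    ]


text = "She took the bus from Durham to Chapel Hill."  # made-up sentence
tags = geotag(text, GAZETTEER)
for name, offset, coordinates in tags:
    print(name, offset, coordinates)
```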

📖 To read before this meeting:

Rambsy, Kenton. “Geo-Tagging Edward P. Jones & Washington, DC.” In Lost in the City: An Exploration of Edward P. Jones’s Short Fiction, by Kenton Rambsy and Peace Ossom-Williamson. Publishing Without Walls, 2019. https://iopn.library.illinois.edu/scalar/lost-in-the-city-a-exploration-of-edward-p-joness-short-fiction-/chapter-1—-section-2?path=chapter-1—-data-mining-edward-p—joness-short-fiction.

1,200 words

Reading tips

You can find the raw data created for this project at https://doi.org/10.18738/T8/BB70Z2.
Mallon, Kilian. “Review: Recogito: Visualizing, Mapping, and Annotating Ancient Texts.” Society for Classical Studies (blog), 2019. https://classicalstudies.org/scs-blog/kilian-mallon/review-recogito-visualizing-mapping-and-annotating-ancient-texts.

1,300 words

March 23
Finalizing project proposals

March 25
Finalizing project proposals

March 30
To be decided (based on what projects we decide to pursue)

April 1
To be decided (based on what projects we decide to pursue)

April 6
To be decided (based on what projects we decide to pursue)

April 8
To be decided (based on what projects we decide to pursue)

April 13
To be decided (based on what projects we decide to pursue)

April 15
To be decided (based on what projects we decide to pursue)

April 20
To be decided (based on what projects we decide to pursue)

April 22
To be decided (based on what projects we decide to pursue)

April 27
To be decided (based on what projects we decide to pursue)

April 29
Project presentations

May 4
Project presentations

May 14

Group project due
