A Digital Gazetteer of North Carolina

UNC SILS, INLS 490-186, Spring 2022

January 11
Introductions

Total amount of required reading for this meeting: 1,800 words

Today we’ll introduce ourselves and talk about:

  • the idea for the course,
  • what each of us hopes to get out of the course, and
  • what a course-based research experience means.

In addition to the topics above, please come prepared to talk about:

  • yourself, and
  • a place in North Carolina that you either know well or wish you knew more about.

📖 To read before this meeting:

  1. Shaw, Ryan. “A Digital Gazetteer of North Carolina,” 2019. PDF.
    1,800 words

January 13
Tools of the trade

Total amount of required reading for this meeting: 4,700 words

Today we’ll look at Loomio, the collaboration software that we’ll be using this semester, and we’ll start examining the dataset we compiled last spring. We’ll also start talking about some of the various tools for working with data that we’ll be using: text editors, regular expressions, command line utilities, scripting languages, makefiles, and OpenRefine.

📖 To read before this meeting:

  1. Powell, William S. Preface: North Carolina Gazetteer (1st Edition), 1968. https://www.ncpedia.org/gazetteer/prefaces#1st.
    2,400 words
  2. Hill, Michael. Preface to the Second Edition, 2009. https://www.ncpedia.org/gazetteer/prefaces#2nd.
    2,300 words

January 18
Catalogs, gazetteers, and maps

Total amount of required reading for this meeting: 10,300 words

Today we’ll talk about how geographic metadata is integrated into catalog records and authority records, the process of georeferencing, and how places can be a challenge for curators of collections.

📖 To read before this meeting:

  1. Buckland, Michael, Aitao Chen, Fredric C. Gey, Ray R. Larson, Ruth Mostern, and Vivien Petras. “Geographic Search: Catalogs, Gazetteers, and Maps.” College and Research Libraries 68, no. 5 (2007): 376–387. https://doi.org/10.5860/crl.68.5.376.
    6,300 words
  2. Buchel, Olha, and Linda L Hill. “Treatment of Georeferencing in Knowledge Organization Systems: North American Contributions to Integrated Georeferencing.” In Proceedings from the North American Symposium on Knowledge Organization, 2009. http://journals.lib.washington.edu/index.php/nasko/article/view/12807/11289.
    4,000 words

January 20
OpenRefine

Today we’ll introduce OpenRefine, a tool for cleaning and enriching data.

Before coming to class, please install OpenRefine and complete one of the following OpenRefine tutorials:

If you run into trouble installing or running OpenRefine, ask for help on the Loomio site! Don’t wait until class to tell us you weren’t able to get it running.

January 25
Digital gazetteers

Total amount of required reading for this meeting: 10,400 words

Today we’ll introduce the concept of a digital gazetteer as a specific kind of networked knowledge organization system.

📖 To read before this meeting:

  1. Hill, Linda L. “Core Elements of Digital Gazetteers: Placenames, Categories, and Footprints.” In Research and Advanced Technology for Digital Libraries, edited by José Borbinha and Thomas Baker, 1923/2000:280–290. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer Berlin Heidelberg, 2000. PDF.
    4,800 words
  2. Cope, Aaron Straup, and Nathaniel Vaughn Kelso. “Who’s On First · Mapzen.” Blog. Mapzen, August 18, 2015. https://web.archive.org/web/20200106201029/https://www.mapzen.com/blog/who-s-on-first/.
    5,600 words
    Reading tips

    You might also explore the Who’s On First gazetteer: https://www.whosonfirst.org

January 27
Regular expressions

Today we will have a hands-on activity that will involve digging into the digital version of Powell’s gazetteer using regular expressions.
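
As a rough illustration of the kind of pattern matching involved, the sketch below pulls a name, a feature type, and a county out of one gazetteer-style entry. The sample entry and its layout are invented for this example and are not quoted from Powell's text; real entries vary far more, so treat the pattern as a starting point rather than a working parser.

```python
import re

# Hypothetical entry written to resemble a gazetteer entry; not a quotation.
entry = "Example Creek, stream, rises in central Orange County and flows south."

# Name up to the first comma, then a feature type up to the next comma,
# then the first capitalized word that is followed by " County".
pattern = re.compile(
    r"^(?P<name>[^,]+),\s*(?P<feature>[^,]+),.*?(?P<county>[A-Z][a-z]+) County"
)

match = pattern.search(entry)
if match:
    print(match.group("name"), "|", match.group("feature"), "|", match.group("county"))
```

Named groups such as (?P<name>...) keep the pattern readable when several pieces are extracted at once.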

Before coming to class, please complete this regular expression tutorial.

📖 To read before this meeting:

  1. RegexOne, 2019. https://regexone.com.

January 27
OpenRefine and regular expressions assignment handed out

February 1
GeoJSON

Total amount of required reading for this meeting: 8,800 words

There are several standard formats for recording the spatial “footprints” of places. The one that is easiest to work with is called GeoJSON. As its name implies, GeoJSON uses JSON (JavaScript Object Notation) to record geographic coordinates and polygons.
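
To make this concrete, here is a minimal sketch that builds a point and a polygon as GeoJSON using Python's standard json module. The coordinates are approximate and the rectangle is a made-up bounding box, not a real boundary; both are assumptions for illustration only.

```python
import json

# Approximate location of Chapel Hill, NC (illustrative assumption).
# GeoJSON positions are ordered [longitude, latitude].
point = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-79.06, 35.91]},
    "properties": {"name": "Chapel Hill"},
}

# A polygon is a list of linear rings; each ring repeats its first
# position as its last. This rectangle is invented for the example.
box = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-79.10, 35.88], [-79.00, 35.88], [-79.00, 35.96],
            [-79.10, 35.96], [-79.10, 35.88],
        ]],
    },
    "properties": {"name": "Example bounding box"},
}

collection = {"type": "FeatureCollection", "features": [point, box]}
print(json.dumps(collection, indent=2))
```

The printed text can be pasted into any GeoJSON viewer to check that the shapes land where expected.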

📖 To read before this meeting:

  1. Shaw, Ryan. “The Forms of Descriptions (Part 1),” 2022. PDF.
    4,300 words
  2. Shah, Raivat. “An Introduction to JSON.” Towards Data Science, 2019. https://towardsdatascience.com/an-introduction-to-json-c9acb464f43e. PDF.
    1,200 words
  3. NEON. “About Vector Data,” 2020. PDF.
  4. MacWright, Tom. “More than You Ever Wanted to Know about GeoJSON,” 2015. https://macwright.com/2015/03/23/geojson-second-bite.html.
    3,300 words

February 3
No meeting today

Today we won’t meet since I will be (virtually) attending Graphs and Networks in the Humanities 2022. The amount of reading for next week is the largest of the semester (25,000 words total), so use this time to get a head start on it.

February 3
OpenRefine and regular expressions assignment due

February 8
Historical gazetteers

Total amount of required reading for this meeting: 12,900 words

Historical gazetteers aim to record and describe not just place names currently in use, but also place names used in the past.

📖 To read before this meeting:

  1. Southall, Humphrey, Ruth Mostern, and Merrick Lex Berman. “On Historical Gazetteers.” International Journal of Humanities & Arts Computing 5, no. 2 (2011): 127–45. PDF.
    6,600 words
  2. Shaw, Ryan. “Gazetteers Enriched: A Conceptual Basis for Linking Gazetteers with Other Kinds of Information.” In Placing Names: Enriching and Integrating Gazetteers, 51–64. Indiana University Press, 2016. PDF.
    6,300 words

February 10
Linked data

Total amount of required reading for this meeting: 12,100 words

Linked data is less a specific technology than a set of best practices for publishing data on the web. Today we’ll introduce the basic concepts of linked data, including the Resource Description Framework (RDF) data model, and we’ll learn to write RDF by hand using the Turtle syntax.

📖 To read before this meeting:

  1. Shaw, Ryan. “The Forms of Descriptions (Part 2),” 2022. PDF.
    8,500 words
  2. Posner, Miriam. What Is Linked Open Data?, 2021. https://youtu.be/VZBpFiLbi-Y.
  3. Optional
    Verborgh, Ruben, and Seth van Hooland. “Modelling.” In Linked Data for Libraries, Archives and Museums: How to Clean, Link and Publish Your Metadata, 11–70. Facet Publishing, 2014. http://book.freeyourmetadata.org/chapters/1/modelling.pdf.
    22,000 words

February 15
Linked data gazetteers

Total amount of required reading for this meeting: 10,400 words

Some digital gazetteers specifically aim to link together disparate datasets that relate to the same places. An approach to publishing known as linked data is well suited to this purpose.

📖 To read before this meeting:

  1. Elliott, Tom. “The Pleiadic Gaze: Looking at Archaeology from the Perspective of a Digital Gazetteer.” In Classical Archaeology in the Digital Age – The AIAC Presidential Panel: Panel 12.1. Heidelberg: Propylaeum, 2021. PDF.
    3,700 words
  2. Simon, Rainer, Leif Isaksen, Elton Barker, and Pau de Soto Cañamares. “The Pleiades Gazetteer and the Pelagios Project.” In Placing Names: Enriching and Integrating Gazetteers, 97–109, 2016. PDF.
    3,800 words
  3. Grossner, Karl, and Ruth Mostern. “Linked Places in World Historical Gazetteer.” In Proceedings of the 5th ACM SIGSPATIAL International Workshop on Geospatial Humanities, 40–43. GeoHumanities’21. New York, NY, USA: Association for Computing Machinery, 2021. https://doi.org/10.1145/3486187.3490203.
    2,900 words
  4. Grossner, Karl, Ruth Mostern, Susan Grunewald, and Ali Straub. “Linked Traces: Connecting Places via Historical Events, People, Objects and Concepts,” 2020. http://blog.whgazetteer.org/pubs/traces_poster_Dec2020_a4_600.pdf.
  5. Optional
    Regalia, Blake, Krzysztof Janowicz, Gengchen Mai, Dalia Varanka, and E. Lynn Usery. “GNIS-LD: Serving and Visualizing the Geographic Names Information System Gazetteer as Linked Data.” In The Semantic Web, edited by Aldo Gangemi, Roberto Navigli, Maria-Esther Vidal, Pascal Hitzler, Raphaël Troncy, Laura Hollink, Anna Tordai, and Mehwish Alam, 528–40. Lecture Notes in Computer Science. Cham: Springer International Publishing, 2018. https://doi.org/10.1007/978-3-319-93417-4_34.
    4,900 words

February 17
JSON-LD / Linked Places Format

GeoJSON gives us a convenient way to express geospatial information (“footprints”). RDF gives us a convenient way to express other kinds of statements about places, such as statements about the things (people, organizations, events, other places) to which they are related, or the categories (feature types) to which they belong. JSON-LD is a data format that allows us to combine the strengths of GeoJSON and RDF.
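
The sketch below shows the general idea in Python: a GeoJSON Feature that also carries a JSON-LD "@context" and "@id". It is a simplified illustration, not the Linked Places Format itself. The example.org identifiers are hypothetical, and the choice of schema.org terms for the context is an assumption made for this example.

```python
import json

place = {
    # JSON-LD part: maps short keys to IRIs so they can be read as RDF.
    "@context": {
        "name": "http://schema.org/name",
        "sameAs": {"@id": "http://schema.org/sameAs", "@type": "@id"},
    },
    "@id": "https://example.org/places/chapel-hill",  # hypothetical identifier
    # GeoJSON part: an ordinary Feature with a point footprint.
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-79.06, 35.91]},
    "properties": {"title": "Chapel Hill"},
    # Statements about the place, interpreted through the context above.
    "name": "Chapel Hill",
    "sameAs": "https://example.org/other-gazetteer/12345",  # hypothetical link
}

print(json.dumps(place, indent=2))
```

A GeoJSON tool reads "type", "geometry", and "properties" and can ignore the rest, while a JSON-LD processor uses "@context" to turn "name" and "sameAs" into statements about the place identified by "@id".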

📖 To read before this meeting:

  1. Sporny, Manu. What Is JSON-LD?, 2012. https://www.youtube.com/watch?v=vioCbTo3C-4.
  2. Grossner, Karl. “The Linked Places Format,” January 10, 2021. https://github.com/LinkedPasts/linked-places#readme.

February 17
Linked geographical data assignment handed out

February 22
Vague places

Total amount of required reading for this meeting: 12,900 words

One of the things that distinguishes gazetteers from more standard Geographic Information System (GIS) tools is that they can also be used to record information about places with ill-defined or unknown locations: ancient places, mythical places, or, as we will discuss today, vague places.

Content warning: “Perceptual Regions in Texas” contains discussion of ethnic slurs.

📖 To read before this meeting:

  1. Jordan, Terry G. “Perceptual Regions in Texas.” Geographical Review 68, no. 3 (July 1978): 293. https://doi.org/10.2307/215048.
    5,300 words
  2. Zelinsky, Wilbur. “North America’s Vernacular Regions.” Annals of the Association of American Geographers 70, no. 1 (March 1980): 1–16. https://doi.org/10.1111/j.1467-8306.1980.tb01293.x.
    7,600 words
  3. Chisholm, Matt, and Ross Cohen. The Neighborhood Project, 2005. https://hood.theory.org.
  4. Optional
    Rossum, Sonja, and Stephen Lavin. “Where Are the Great Plains? A Cartographic Analysis.” The Professional Geographer 52, no. 3 (August 1, 2000): 543–52. https://doi.org/10.1111/0033-0124.00245.
    5,300 words

February 24
Querying linked data: SPARQL

Total amount of required reading for this meeting: 7,300 words

SPARQL is to RDF triplestores what SQL is to relational databases. A good way to get started with SPARQL is to try the Wikidata Query Service.
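
For a sense of what a query looks like, the Python sketch below holds a small SPARQL query as a string and builds the request URL for the Wikidata endpoint without sending anything. The identifiers are recalled from memory and should be checked on Wikidata before use: P131 is assumed to mean "located in the administrative territorial entity" and Q1454 is assumed to be North Carolina.

```python
from urllib.parse import urlencode

# Ask for up to ten items located in North Carolina, with English labels.
# P131 and Q1454 are assumptions to verify on Wikidata.
query = """
SELECT ?place ?placeLabel WHERE {
  ?place wdt:P131 wd:Q1454 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
"""

# Build the request URL only; no network request is made here.
url = "https://query.wikidata.org/sparql?" + urlencode(
    {"query": query, "format": "json"}
)
print(url)
```

The same query text can be pasted directly into the Wikidata Query Service to run it in the browser.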

📖 To read before this meeting:

  1. “A Gentle Introduction to the Wikidata Query Service.” In Wikidata, n.d. https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/A_gentle_introduction_to_the_Wikidata_Query_Service.
    2,200 words
  2. DuCharme, Bob. “Jumping Right in: Some Data and Some Queries.” In Learning SPARQL: Querying and Updating with SPARQL 1.1, 2nd ed., 1–17. Sebastopol: O’Reilly Media, 2013. PDF.
    5,100 words

February 24
Linked geographical data assignment due

March 1
Knowledge infrastructure

Total amount of required reading for this meeting: 11,500 words

The motivating idea behind this class is that we can make new things possible by taking the contents of a book (Powell’s gazetteer) and putting it into a different form, using networked computers. This is an old idea. Creating new “knowledge infrastructure” is hard, despite a century of dreaming about it.

📖 To read before this meeting:

  1. Wells, H. G. “The Idea of a World Encyclopædia [Excerpt].” Nature 138 (1936): 920–924. https://doi.org/10.1038/138917a0. PDF.
    4,300 words
    Reading tips

    H.G. Wells was an English writer now best known for his science fiction, including The Time Machine and The War of the Worlds. During his own lifetime, however, he was prominent as a “futurist” who devoted his literary talents to the development of a progressive vision on a global scale.

    This reading is an excerpt from a speech Wells gave in 1936 at The Royal Institution of Great Britain, an organization for scientific education and research founded in 1799.

  2. Licklider, J. C. R. “The View from the Half Way Point on a Journey to the Future: A Progress Report on the Interaction between Libraries and Information Technology [Excerpt].” In Large Libraries and New Technological Developments: Proceedings of a Symposium Held on the Occasion of the Inauguration of the New Building of the Royal Library, The Hague, 29 September-1 October 1982, edited by C. Reedijk, Carol K. Henry, and W. R. H. Koops, 13–34. München: K.G. Saur, 1984. PDF.
    5,300 words
    Reading tips

    J.C.R. Licklider was an American psychologist and computer scientist who was one of the first to foresee modern-style interactive computing and its application to all manner of activities. As director of ARPA’s Information Processing Techniques Office he funded research which led to the canonical graphical user interface, and the ARPANET, the direct predecessor to the Internet.

    In the early 1960s the Council on Library Resources recruited Licklider to address the question of how technology could help libraries gather, index, organize, store, and make accessible the growing body of recorded information. Licklider gathered a small team of engineers and psychologists to explore “concepts and problems of libraries of the future.” He wrote a summary report of the project, which appeared as the book Libraries of the Future in 1965.

    This reading is an excerpt from a 1982 speech in which Licklider looks back at the changes he anticipated in 1965.

  3. Hillis, Danny. “A Short Introduction to the Underlay.” Notes from the Knowledge Futures Group (blog), August 2, 2020. https://notes.knowledgefutures.org/pub/underlay-short-intro.
    1,900 words
    Reading tips

    Danny Hillis is an American computer scientist who pioneered parallel computers and their use in artificial intelligence. In 2005, Hillis founded Metaweb Technologies to develop a semantic data storage infrastructure for the Internet, and Freebase, an open, structured database of the world’s knowledge. That company was acquired by Google, and its technology became the basis of the Google Knowledge Graph.

    This reading is a post giving an overview of Hillis’ latest project, the Underlay.

March 3
Assignment 2 review / SPARQL

Today we’ll go over assignment 2, and look at SPARQL some more if there is time.

March 8
Wikidata

Total amount of required reading for this meeting: 14,800 words

Wikidata is the open, collaboratively edited knowledge graph that underpins Wikipedia and other Wikimedia projects. For better or for worse, it is becoming a centralized hub for most linked data openly published on the Web.

📖 To read before this meeting:

  1. Hyman, Malcolm D., and Jürgen Renn. “Toward an Epistemic Web.” In The Globalization of Knowledge in History. MPRL – Studies. Berlin: Max-Planck-Gesellschaft zur Förderung der Wissenschaften, 2012. https://doi.org/10.34663/9783945561232-36.
    7,800 words
  2. Vrandečić, Denny, and Markus Krötzsch. “Wikidata: A Free Collaborative Knowledgebase.” Communications of the ACM 57, no. 10 (September 23, 2014): 78–85. https://doi.org/10.1145/2629489.
    5,100 words
  3. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, et al. “Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage.” OCLC, December 27, 2021. PDF.
    1,900 words
    Reading tips

    This is an excerpt from a longer report. If you’re interested in reading the whole report, you can find it on the OCLC website.

  4. Optional
    Vanderbilt University. “Learn Wikidata.” Accessed January 7, 2022. https://www.learnwikidata.net/.

March 10
More SPARQL

Total amount of required reading for this meeting: 14,100 words

Today we’ll continue learning SPARQL and get a little more advanced.

📖 To read before this meeting:

  1. DuCharme, Bob. “SPARQL Queries: A Deeper Dive.” In Learning SPARQL: Querying and Updating with SPARQL 1.1, 2nd ed., 47–102. Sebastopol: O’Reilly Media, 2013. PDF.
    14,100 words

March 10
SPARQL assignment handed out

March 15
Spring break

March 17
Spring break

March 22
Geotagging text

Total amount of required reading for this meeting: 16,500 words

Gazetteers can be combined with annotation tools and/or natural language processing (NLP) tools to “geotag” text. Geotagging involves identifying place names mentioned in a text in order to get a sense of the spatial coverage of the content, or to link the text to other relevant texts, images, or data.
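
In its simplest form, geotagging can be sketched as looking up known names in a text. The Python example below does this with a three-entry stand-in gazetteer. The sentence is invented and the coordinates are approximate, and this plain string matching is not how annotation or NLP tools work internally: it cannot handle ambiguous names, variant spellings, or names missing from the list.

```python
import re

# Tiny stand-in gazetteer: name -> (longitude, latitude), approximate values.
gazetteer = {
    "Chapel Hill": (-79.06, 35.91),
    "Durham": (-78.90, 35.99),
    "Raleigh": (-78.64, 35.78),
}

# Invented sentence for illustration.
text = "She took the bus from Durham to Chapel Hill, then drove on to Raleigh."

# Put longer names first so multi-word names are matched whole.
names = sorted(gazetteer, key=len, reverse=True)
pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")

# Each tag records the name, its character offset, and its coordinates.
tags = [(m.group(1), m.start(), gazetteer[m.group(1)]) for m in pattern.finditer(text)]
for name, offset, coords in tags:
    print(name, offset, coords)
```

Keeping the character offset with each match is what lets an annotation be tied back to its exact position in the text.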

📖 To read before this meeting:

  1. Rambsy, Kenton. “Geo-Tagging Edward P. Jones & Washington, DC.” In Lost in the City: An Exploration of Edward P. Jones’s Short Fiction, by Kenton Rambsy and Peace Ossom-Williamson. Publishing Without Walls, 2019. https://iopn.library.illinois.edu/scalar/lost-in-the-city-a-exploration-of-edward-p-joness-short-fiction-/chapter-1—-section-2?path=chapter-1—-data-mining-edward-p—joness-short-fiction.
    1,200 words
    Reading tips

    You can find the raw data created for this project at https://doi.org/10.18738/T8/BB70Z2.

  2. Foka, Anna, Osman Cenk Demiroglu, Elton Barker, Nasrin Mostofian, Kyriaki Konstantinidou, Brady Kiesling, Linda Talatas, and Kajsa Palm. “Visualizing Pausanias’s Description of Greece with Contemporary GIS.” Digital Scholarship in the Humanities, November 25, 2021, fqab093. https://doi.org/10.1093/llc/fqab093.
    4,400 words
  3. Cooper, David, and Ian N Gregory. “Mapping the English Lake District: A Literary GIS.” Transactions of the Institute of British Geographers 36, no. 1 (2011): 89–108. https://doi.org/10.1111/j.1475-5661.2010.00405.x.
    10,900 words

March 24
Recogito / spaCy

Total amount of required reading for this meeting: 1,300 words

Geotagging or mining references to places from texts can be done through manual annotation, or by using natural language processing (NLP) software, or by combining the two. Recogito is a popular tool for manual annotation, while spaCy is a powerful NLP tool.

📖 To read before this meeting:

  1. Mallon, Kilian. “Review: Recogito: Visualizing, Mapping, and Annotating Ancient Texts.” Society for Classical Studies (blog), 2019. https://classicalstudies.org/scs-blog/kilian-mallon/review-recogito-visualizing-mapping-and-annotating-ancient-texts.
    1,300 words
  2. “Recogito in 10 Minutes.” Accessed January 7, 2022. https://recogito.pelagios.org/help/tutorial.
  3. Hiippala, Tuomo. “Processing Texts Using SpaCy.” In Applied Language Technology, 2020. https://applied-language-technology.mooc.fi/html/notebooks/part_ii/03_basic_nlp.html.
  4. Mattingly, W.J.B. “The Basics of SpaCy.” In Introduction to SpaCy 3, 2021. http://spacy.pythonhumanities.com/01_01_install_and_containers.html.

March 24
SPARQL assignment due

March 29
Project brainstorming

March 31
Choosing projects

April 5
Project working groups

April 7
Rosenwald Schools

April 12
Cultural Industries

April 14
Wellness Day

Class will not meet today.

April 19
NC High Schools

April 21
Language / Dialect

April 26
Project presentations

May 3
Group project due