A Digital Gazetteer of North Carolina
UNC SILS, INLS 490-186, Spring 2022
January 11
Introductions
Total amount of required reading for this meeting: 1,800 words
Today we’ll introduce ourselves and talk about:
- the idea for the course,
- what each of us hopes to get out of the course, and
- what a course-based research experience means.
In addition to the topics above, please come prepared to talk about:
- yourself, and
- a place in North Carolina that you either know well or wish you knew more about.
📖 To read before this meeting:
-
Shaw, Ryan. “A Digital Gazetteer of North Carolina,” 2019. PDF.
January 13
Tools of the trade
Total amount of required reading for this meeting: 4,700 words
Today we’ll look at Loomio, the collaboration software that we’ll be using this semester, and we’ll start examining the dataset we compiled last spring. We’ll also start talking about the tools for working with data that we’ll be using: text editors, regular expressions, command line utilities, scripting languages, makefiles, and OpenRefine.
📖 To read before this meeting:
-
Powell, William S. Preface: North Carolina Gazetteer (1st Edition), 1968. https://www.ncpedia.org/gazetteer/prefaces#1st.
-
Hill, Michael. Preface to the Second Edition, 2009. https://www.ncpedia.org/gazetteer/prefaces#2nd.
January 18
Catalogs, gazetteers, and maps
Total amount of required reading for this meeting: 10,300 words
Today we’ll talk about how geographic metadata is integrated into catalog records and authority records, the process of georeferencing, and how places can be a challenge for curators of collections.
📖 To read before this meeting:
-
Buckland, Michael, Aitao Chen, Fredric C. Gey, Ray R. Larson, Ruth Mostern, and Vivien Petras. “Geographic Search: Catalogs, Gazetteers, and Maps.” College and Research Libraries 68, no. 5 (2007): 376–387. https://doi.org/10.5860/crl.68.5.376.
-
Buchel, Olha, and Linda L Hill. “Treatment of Georeferencing in Knowledge Organization Systems: North American Contributions to Integrated Georeferencing.” In Proceedings from the North American Symposium on Knowledge Organization, 2009. http://journals.lib.washington.edu/index.php/nasko/article/view/12807/11289.
January 20
OpenRefine
Today we’ll introduce OpenRefine, a tool for cleaning and enriching data.
Before coming to class, please install OpenRefine and complete one of the following OpenRefine tutorials:
- Getting started with OpenRefine by Miriam Posner
- Getting started with OpenRefine by Thomas Padilla
- Using OpenRefine to Clean Your Data by Jeremy Rue & Richard Koci Hernandez
- OpenRefine tutorial by the Open Data Literacy Project
If you run into trouble installing or running OpenRefine, ask for help on the Loomio site! Don’t wait until class to tell us you weren’t able to get it running.
January 25
Digital gazetteers
Total amount of required reading for this meeting: 10,400 words
Today we’ll introduce the concept of a digital gazetteer as a specific kind of networked knowledge organization system.
📖 To read before this meeting:
-
Hill, Linda L. “Core Elements of Digital Gazetteers: Placenames, Categories, and Footprints.” In Research and Advanced Technology for Digital Libraries, edited by José Borbinha and Thomas Baker, 1923/2000:280–290. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer Berlin Heidelberg, 2000. PDF.
-
Cope, Aaron Straup, and Nathaniel Vaughn Kelso. “Who’s On First · Mapzen.” Blog. Mapzen, August 18, 2015. https://web.archive.org/web/20200106201029/https://www.mapzen.com/blog/who-s-on-first/.
Reading tips
You might also explore the Who’s On First gazetteer: https://www.whosonfirst.org
January 27
Regular expressions
Today we will have a hands-on activity that will involve digging into the digital version of Powell’s gazetteer using regular expressions.
Before coming to class, please complete the regular expression tutorial listed below.
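As a warm-up, here is a minimal sketch of the kind of pattern matching we might try. The sample entry is invented for illustration and is not quoted from Powell's gazetteer, and the entry layout the pattern expects (name, feature type, compass direction, county) is an assumption; real entries will vary and need more forgiving patterns.

```python
import re

# Hypothetical entry, invented for illustration; not quoted from the gazetteer.
SAMPLE_ENTRY = "Exampleville, town in NW Sample County."

# Assumed layout: "<name>, <feature type> in <direction> <county> County"
ENTRY_PATTERN = re.compile(
    r"^(?P<name>[^,]+), "              # everything up to the first comma
    r"(?P<feature_type>[a-z ]+?) in "  # lowercase words before " in "
    r"(?P<direction>[NSEW]{1,2}) "     # one- or two-letter compass direction
    r"(?P<county>[A-Z][A-Za-z]+) County"
)


def parse_entry(line):
    """Return the named groups as a dict, or None if the line does not match."""
    match = ENTRY_PATTERN.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_entry(SAMPLE_ENTRY))
```

Named groups such as `(?P<county>...)` keep the pattern readable and let us pull each captured piece out by name.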
📖 To read before this meeting:
-
RegexOne, 2019. https://regexone.com.
January 27
OpenRefine and regular expressions assignment handed out
February 1
GeoJSON
Total amount of required reading for this meeting: 8,800 words
There are several standard formats for recording the spatial “footprints” of places. The one that is easiest to work with is called GeoJSON. As its name implies, GeoJSON uses JSON (JavaScript Object Notation) to record geographic coordinates and polygons.
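To make this concrete, here is a minimal sketch that builds a point footprint and a polygon footprint as Python dictionaries and prints them as GeoJSON. The helper names are hypothetical, the coordinates are rough values chosen for illustration, and the polygon is a made-up box rather than a real boundary.

```python
import json


def point_feature(name, longitude, latitude):
    """Build a GeoJSON Feature with a Point geometry.

    GeoJSON positions are ordered [longitude, latitude].
    """
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [longitude, latitude]},
        "properties": {"name": name},
    }


def polygon_feature(name, ring):
    """Build a GeoJSON Feature with a single-ring Polygon geometry.

    The ring is closed by repeating the first position at the end if needed.
    """
    ring = [list(position) for position in ring]
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"name": name},
    }


# Approximate coordinates, for illustration only.
point = point_feature("Chapel Hill", -79.06, 35.91)

# A made-up box listed counterclockwise: SW, SE, NE, NW.
box = polygon_feature(
    "Illustrative box",
    [(-79.1, 35.9), (-79.0, 35.9), (-79.0, 36.0), (-79.1, 36.0)],
)

if __name__ == "__main__":
    print(json.dumps({"type": "FeatureCollection", "features": [point, box]}, indent=2))
```

The most common mistake with GeoJSON is swapping the coordinate order: longitude comes first, then latitude.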
📖 To read before this meeting:
-
Shaw, Ryan. “The Forms of Descriptions (Part 1),” 2022. PDF.
-
Shah, Raivat. “An Introduction to JSON.” Towards Data Science, 2019. https://towardsdatascience.com/an-introduction-to-json-c9acb464f43e. PDF.
-
NEON. “About Vector Data,” 2020. PDF.
-
MacWright, Tom. “More than You Ever Wanted to Know about GeoJSON,” 2015. https://macwright.com/2015/03/23/geojson-second-bite.html.
February 3
No meeting today
Today we won’t meet since I will be (virtually) attending Graphs and Networks in the Humanities 2022. The amount of reading for next week is the largest of the semester (25,000 words total), so use this time to get a head start on it.
February 3
OpenRefine and regular expressions assignment due
February 8
Historical gazetteers
Total amount of required reading for this meeting: 12,900 words
Historical gazetteers aim to record and describe not just place names currently in use, but also place names used in the past.
📖 To read before this meeting:
-
Southall, Humphrey, Ruth Mostern, and Merrick Lex Berman. “On Historical Gazetteers.” International Journal of Humanities & Arts Computing 5, no. 2 (2011): 127–45. PDF.
-
Shaw, Ryan. “Gazetteers Enriched: A Conceptual Basis for Linking Gazetteers with Other Kinds of Information.” In Placing Names: Enriching and Integrating Gazetteers, 51–64. Indiana University Press, 2016. PDF.
February 10
Linked data
Total amount of required reading for this meeting: 12,100 words
Linked data is less a specific technology than a set of best practices for publishing data on the web. Today we’ll introduce the basic concepts of linked data, including the Resource Description Framework (RDF) data model, and we’ll learn to write RDF by hand using the Turtle syntax.
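As a preview, here is a minimal sketch of the RDF data model as subject-predicate-object triples, written out in Turtle syntax by a few lines of Python. The `ex:` namespace, the `ex:locatedIn` property, and the helper names are hypothetical; only the `rdfs:` namespace is a real vocabulary. The sketch handles only prefixed names and simple string literals, far less than a real RDF library would.

```python
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/",  # hypothetical namespace for illustration
}


def literal(text):
    """Format a plain string literal (assumes the text has no line breaks)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_turtle(triples, prefixes):
    """Write prefix declarations, then one Turtle statement per triple."""
    lines = [f"@prefix {prefix}: <{iri}> ." for prefix, iri in prefixes.items()]
    lines.append("")
    for subject, predicate, obj in triples:
        lines.append(f"{subject} {predicate} {obj} .")
    return "\n".join(lines)


# Each triple is one statement: subject, predicate, object.
TRIPLES = [
    ("ex:chapel-hill", "rdfs:label", literal("Chapel Hill")),
    ("ex:chapel-hill", "ex:locatedIn", "ex:orange-county"),
    ("ex:orange-county", "rdfs:label", literal("Orange County")),
]

if __name__ == "__main__":
    print(to_turtle(TRIPLES, PREFIXES))
```

Notice that the object of a triple can be either a literal value (a label) or another resource (a county), and linking to other resources is what makes the data "linked."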
📖 To read before this meeting:
-
Shaw, Ryan. “The Forms of Descriptions (Part 2),” 2022. PDF.
-
Posner, Miriam. What Is Linked Open Data?, 2021. https://youtu.be/VZBpFiLbi-Y.
-
Stardog. “Graph Data Model,” 2018. https://web.archive.org/web/20190306231928/https://www.stardog.com/tutorials/data-model/.
-
Optional: Verborgh, Ruben, and Seth van Hooland. “Modelling.” In Linked Data for Libraries, Archives and Museums: How to Clean, Link and Publish Your Metadata, 11–70. Facet Publishing, 2014. http://book.freeyourmetadata.org/chapters/1/modelling.pdf.
February 15
Linked data gazetteers
Total amount of required reading for this meeting: 10,400 words
Some digital gazetteers specifically aim to link together disparate datasets that relate to the same places. An approach to publishing known as linked data is well suited to this purpose.
📖 To read before this meeting:
-
Elliott, Tom. “The Pleiadic Gaze: Looking at Archaeology from the Perspective of a Digital Gazetteer.” In Classical Archaeology in the Digital Age – The AIAC Presidential Panel: Panel 12.1. Heidelberg: Propylaeum, 2021. PDF.
-
Simon, Rainer, Leif Isaksen, Elton Barker, and Pau de Soto Cañamares. “The Pleiades Gazetteer and the Pelagios Project.” In Placing Names: Enriching and Integrating Gazetteers, 97–109, 2016. PDF.
-
Grossner, Karl, and Ruth Mostern. “Linked Places in World Historical Gazetteer.” In Proceedings of the 5th ACM SIGSPATIAL International Workshop on Geospatial Humanities, 40–43. GeoHumanities’21. New York, NY, USA: Association for Computing Machinery, 2021. https://doi.org/10.1145/3486187.3490203.
-
Grossner, Karl, Ruth Mostern, Susan Grunewald, and Ali Straub. “Linked Traces: Connecting Places via Historical Events, People, Objects and Concepts,” 2020. http://blog.whgazetteer.org/pubs/traces_poster_Dec2020_a4_600.pdf.
-
Optional: Regalia, Blake, Krzysztof Janowicz, Gengchen Mai, Dalia Varanka, and E. Lynn Usery. “GNIS-LD: Serving and Visualizing the Geographic Names Information System Gazetteer as Linked Data.” In The Semantic Web, edited by Aldo Gangemi, Roberto Navigli, Maria-Esther Vidal, Pascal Hitzler, Raphaël Troncy, Laura Hollink, Anna Tordai, and Mehwish Alam, 528–40. Lecture Notes in Computer Science. Cham: Springer International Publishing, 2018. https://doi.org/10.1007/978-3-319-93417-4_34.
February 17
JSON-LD / Linked Places Format
GeoJSON gives us a convenient way to express geospatial information (“footprints”). RDF gives us a convenient way to express other kinds of statements about places, such as statements about the things (people, organizations, events, other places) to which they are related, or the categories (feature types) to which they belong. JSON-LD is a data format that allows us to combine the strengths of GeoJSON and RDF.
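Here is a minimal sketch of the idea: a single document that is a GeoJSON Feature and also carries a JSON-LD `@context` mapping a few extra keys to RDF properties. The `ex:` namespace, the `locatedIn` key, the coordinates, and the `expand_term` helper are all hypothetical, and the helper is a drastic simplification of what a real JSON-LD processor does. This is not the Linked Places Format itself, which defines its own keys and context.

```python
import json

CONTEXT = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/",  # hypothetical namespace for illustration
    "label": "rdfs:label",
    "locatedIn": {"@id": "ex:locatedIn", "@type": "@id"},
}

PLACE = {
    "@context": CONTEXT,
    "@id": "ex:chapel-hill",
    # GeoJSON part: the footprint (approximate coordinates, for illustration).
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-79.06, 35.91]},
    "properties": {},
    # RDF part: statements about the place, given meaning by the context.
    "label": "Chapel Hill",
    "locatedIn": "ex:orange-county",
}


def expand_term(term, context):
    """Resolve a key or prefixed name to a full IRI (simplified illustration)."""
    definition = context.get(term, term)
    if isinstance(definition, dict):
        definition = definition["@id"]
    prefix, separator, local = definition.partition(":")
    if separator and isinstance(context.get(prefix), str):
        return context[prefix] + local
    return definition


if __name__ == "__main__":
    print(json.dumps(PLACE, indent=2))
    for key in ("label", "locatedIn"):
        print(key, "->", expand_term(key, CONTEXT))
```

In this sketch the GeoJSON keys (`type`, `geometry`, `properties`) are not mapped in the context, so a JSON-LD processor would ignore them, while a GeoJSON tool would ignore the extra keys. A fuller context would be needed to express the footprint as RDF too.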
📖 To read before this meeting:
-
Sporny, Manu. What Is JSON-LD?, 2012. https://www.youtube.com/watch?v=vioCbTo3C-4.
-
Grossner, Karl. “The Linked Places Format,” January 10, 2021. https://github.com/LinkedPasts/linked-places#readme.
February 17
Linked geographical data assignment handed out
February 22
Vague places
Total amount of required reading for this meeting: 12,900 words
One of the things that distinguishes gazetteers from more standard Geographic Information System (GIS) tools is that they can also be used to record information about places with ill-defined or unknown locations: ancient places, mythical places, or (as we will discuss today) vague places.
Content warning: “Perceptual Regions in Texas” contains discussion of ethnic slurs.
📖 To read before this meeting:
-
Jordan, Terry G. “Perceptual Regions in Texas.” Geographical Review 68, no. 3 (July 1978): 293. https://doi.org/10.2307/215048.
-
Zelinsky, Wilbur. “North America’s Vernacular Regions.” Annals of the Association of American Geographers 70, no. 1 (March 1980): 1–16. https://doi.org/10.1111/j.1467-8306.1980.tb01293.x.
-
Chisholm, Matt, and Ross Cohen. The Neighborhood Project, 2005. https://hood.theory.org.
-
Optional: Rossum, Sonja, and Stephen Lavin. “Where Are the Great Plains? A Cartographic Analysis.” The Professional Geographer 52, no. 3 (August 1, 2000): 543–52. https://doi.org/10.1111/0033-0124.00245.
February 24
Querying linked data: SPARQL
Total amount of required reading for this meeting: 7,300 words
SPARQL is to RDF triplestores what SQL is to relational databases. A good way to get started with SPARQL is to try the Wikidata Query Service.
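Besides the web interface, the query service can be called from a script. Below is a minimal sketch using only Python's standard library. The query asks for ten items located in (`wdt:P131`) the item `wd:Q1454`, which is assumed here to be Wikidata's identifier for North Carolina; check it on Wikidata before relying on it. The function names and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# wd:Q1454 is assumed to be North Carolina; verify the identifier on Wikidata.
QUERY = """
SELECT ?place ?placeLabel WHERE {
  ?place wdt:P131 wd:Q1454 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""


def build_url(query, endpoint=ENDPOINT):
    """Encode the query as URL parameters and ask for JSON results."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})


def run_query(query):
    """Send the query over the network and return the parsed JSON results."""
    request = urllib.request.Request(
        build_url(query),
        headers={"User-Agent": "inls490-example/0.1 (hypothetical)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def labels(results):
    """Pull the ?placeLabel values out of SPARQL JSON results."""
    return [row["placeLabel"]["value"] for row in results["results"]["bindings"]]


if __name__ == "__main__":
    print(labels(run_query(QUERY)))
```

Just as a SQL query returns rows of columns, a SPARQL SELECT query returns rows of variable bindings, which is what the `labels` function unpacks.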
📖 To read before this meeting:
-
“A Gentle Introduction to the Wikidata Query Service.” In Wikidata, n.d. https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/A_gentle_introduction_to_the_Wikidata_Query_Service.
-
DuCharme, Bob. “Jumping Right in: Some Data and Some Queries.” In Learning SPARQL: Querying and Updating with SPARQL 1.1, 2nd ed., 1–17. Sebastopol: O’Reilly Media, 2013. PDF.
February 24
Linked geographical data assignment due
March 1
Knowledge infrastructure
Total amount of required reading for this meeting: 11,500 words
The motivating idea behind this class is that we can make new things possible by taking the contents of a book (Powell’s gazetteer) and putting it into a different form, using networked computers. This is an old idea. Creating new “knowledge infrastructure” is hard, despite a century of dreaming about it.
📖 To read before this meeting:
-
Wells, H. G. “The Idea of a World Encyclopædia [Excerpt].” Nature 138 (1936): 920–924. https://doi.org/10.1038/138917a0. PDF.
Reading tips
H.G. Wells was an English writer now best known for his science fiction, including The Time Machine and The War of the Worlds. During his own lifetime, however, he was prominent as a “futurist” who devoted his literary talents to the development of a progressive vision on a global scale.
This reading is an excerpt from a speech Wells gave in 1936 at The Royal Institution of Great Britain, an organization for scientific education and research founded in 1799.
-
Licklider, J. C. R. “The View from the Half Way Point on a Journey to the Future: A Progress Report on the Interaction between Libraries and Information Technology [Excerpt].” In Large Libraries and New Technological Developments: Proceedings of a Symposium Held on the Occasion of the Inauguration of the New Building of the Royal Library, The Hague, 29 September-1 October 1982, edited by C. Reedijk, Carol K. Henry, and W. R. H. Koops, 13–34. München: K.G. Saur, 1984. PDF.
Reading tips
J.C.R. Licklider was an American psychologist and computer scientist who was one of the first to foresee modern-style interactive computing and its application to all manner of activities. As director of ARPA’s Information Processing Techniques Office he funded research that led to the canonical graphical user interface and to the ARPANET, the direct predecessor of the Internet.
In the early 1960s the Council on Library Resources recruited Licklider to address the question of how technology could help libraries gather, index, organize, store, and make accessible the growing body of recorded information. Licklider gathered a small team of engineers and psychologists to explore “concepts and problems of libraries of the future.” He wrote a summary report of the project, which appeared as the book Libraries of the Future in 1965.
This reading is an excerpt from a 1982 speech in which Licklider looks back at the changes he anticipated in 1965.
-
Hillis, Danny. “A Short Introduction to the Underlay.” Notes from the Knowledge Futures Group (blog), August 2, 2020. https://notes.knowledgefutures.org/pub/underlay-short-intro.
Reading tips
Danny Hillis is an American computer scientist who pioneered parallel computers and their use in artificial intelligence. In 2005, Hillis founded Metaweb Technologies to develop a semantic data storage infrastructure for the Internet, and Freebase, an open, structured database of the world’s knowledge. That company was acquired by Google, and its technology became the basis of the Google Knowledge Graph.
This reading is a post giving an overview of Hillis’ latest project, the Underlay.
March 3
Assignment 2 review / SPARQL
Today we’ll go over assignment 2, and look at SPARQL some more if there is time.
March 8
Wikidata
Total amount of required reading for this meeting: 14,800 words
Wikidata is the open, collaboratively edited knowledge graph that underpins Wikipedia and other Wikimedia projects. For better or for worse, it is becoming a centralized hub for most linked data openly published on the Web.
📖 To read before this meeting:
-
Hyman, Malcolm D., and Jürgen Renn. “Toward an Epistemic Web.” In The Globalization of Knowledge in History. MPRL – Studies. Berlin: Max-Planck-Gesellschaft zur Förderung der Wissenschaften, 2012. https://doi.org/10.34663/9783945561232-36.
-
Vrandečić, Denny, and Markus Krötzsch. “Wikidata: A Free Collaborative Knowledgebase.” Communications of the ACM 57, no. 10 (September 23, 2014): 78–85. https://doi.org/10.1145/2629489.
-
Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, et al. “Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage.” OCLC, December 27, 2021. PDF.
Reading tips
This is an excerpt from a longer report. If you’re interested in reading the whole report, you can find it on the OCLC website.
-
Optional: Vanderbilt University. “Learn Wikidata.” Accessed January 7, 2022. https://www.learnwikidata.net/.
March 10
More SPARQL
Total amount of required reading for this meeting: 14,100 words
Today we’ll continue learning SPARQL and get a little more advanced.
📖 To read before this meeting:
-
DuCharme, Bob. “SPARQL Queries: A Deeper Dive.” In Learning SPARQL: Querying and Updating with SPARQL 1.1, 2nd ed., 47–102. Sebastopol: O’Reilly Media, 2013. PDF.
March 10
SPARQL assignment handed out
March 15
Spring break
March 17
Spring break
March 22
Geotagging text
Total amount of required reading for this meeting: 16,500 words
Gazetteers can be combined with annotation tools and/or natural language processing (NLP) tools to “geotag” text. Geotagging involves identifying place names mentioned in a text in order to get a sense of the spatial coverage of the content, or to link the text to other relevant texts, images, or data.
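The simplest form of geotagging is a plain gazetteer lookup, sketched below using only Python's standard library. This is string matching, not NLP: the tiny gazetteer, its `ex:` identifiers, the sample sentence, and the function names are all hypothetical, and the approach cannot tell a place from a person with the same name or choose between two places that share a name.

```python
import re

# Hypothetical mini-gazetteer mapping place names to made-up identifiers.
GAZETTEER = {
    "Chapel Hill": "ex:chapel-hill",
    "Durham": "ex:durham",
    "Raleigh": "ex:raleigh",
}

SAMPLE_TEXT = "She drove from Chapel Hill to Durham, then on to Raleigh."


def build_matcher(gazetteer):
    """Compile one pattern matching any place name, trying longer names first."""
    names = sorted(gazetteer, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in names)
    return re.compile(r"\b(?:" + alternatives + r")\b")


def geotag(text, gazetteer):
    """Return (start, end, name, identifier) for each place name in the text."""
    matcher = build_matcher(gazetteer)
    return [
        (match.start(), match.end(), match.group(0), gazetteer[match.group(0)])
        for match in matcher.finditer(text)
    ]


if __name__ == "__main__":
    for start, end, name, identifier in geotag(SAMPLE_TEXT, GAZETTEER):
        print(start, end, name, identifier)
```

Recording character offsets along with each match is what lets an annotation tool highlight the place name in the text and link it to its gazetteer entry.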
📖 To read before this meeting:
-
Rambsy, Kenton. “Geo-Tagging Edward P. Jones & Washington, DC.” In Lost in the City: An Exploration of Edward P. Jones’s Short Fiction, by Kenton Rambsy and Peace Ossom-Williamson. Publishing Without Walls, 2019. https://iopn.library.illinois.edu/scalar/lost-in-the-city-a-exploration-of-edward-p-joness-short-fiction-/chapter-1—-section-2?path=chapter-1—-data-mining-edward-p—joness-short-fiction.
Reading tips
You can find the raw data created for this project at https://doi.org/10.18738/T8/BB70Z2.
-
Foka, Anna, Osman Cenk Demiroglu, Elton Barker, Nasrin Mostofian, Kyriaki Konstantinidou, Brady Kiesling, Linda Talatas, and Kajsa Palm. “Visualizing Pausanias’s Description of Greece with Contemporary GIS.” Digital Scholarship in the Humanities, November 25, 2021, fqab093. https://doi.org/10.1093/llc/fqab093.
-
Cooper, David, and Ian N Gregory. “Mapping the English Lake District: A Literary GIS.” Transactions of the Institute of British Geographers 36, no. 1 (2011): 89–108. https://doi.org/10.1111/j.1475-5661.2010.00405.x.
March 24
Recogito / spaCy
Total amount of required reading for this meeting: 1,300 words
Geotagging or mining references to places from texts can be done through manual annotation, or by using natural language processing (NLP) software, or by combining the two. Recogito is a popular tool for manual annotation, while spaCy is a powerful NLP tool.
📖 To read before this meeting:
-
Mallon, Kilian. “Review: Recogito: Visualizing, Mapping, and Annotating Ancient Texts.” Society for Classical Studies (blog), 2019. https://classicalstudies.org/scs-blog/kilian-mallon/review-recogito-visualizing-mapping-and-annotating-ancient-texts.
-
“Recogito in 10 Minutes.” Accessed January 7, 2022. https://recogito.pelagios.org/help/tutorial.
-
Hiippala, Tuomo. “Processing Texts Using SpaCy.” In Applied Language Technology, 2020. https://applied-language-technology.mooc.fi/html/notebooks/part_ii/03_basic_nlp.html.
-
Mattingly, W.J.B. “The Basics of SpaCy.” In Introduction to SpaCy 3, 2021. http://spacy.pythonhumanities.com/01_01_install_and_containers.html.
March 24
SPARQL assignment due
March 29
Project brainstorming
March 31
Choosing projects
April 5
Project working groups
April 7
Rosenwald Schools
April 12
Cultural Industries
April 14
Wellness Day
Class will not meet today.