Foundations of Information Science

UNC SILS, INLS 201, Fall 2021

Week starting August 24
First meeting

View slides Updated Monday 8/23 7:16 PM

Our first meeting of the semester is at 11AM on Tuesday August 24th in Chapman Hall 211.

During our first meeting, we instructors will introduce ourselves, and we’ll go over the structure of the course, how to access resources such as the weekly readings, and guidelines for success.

August 24
Recitations begin meeting

All recitation sections will begin meeting this week. See the recitation schedule.

What is information?

Information is the result of a process that begins with a bunch of meaningful stuff and ends with something usable. The “bunch of meaningful stuff” can be practically anything: words on pages, recorded sounds, photographic images, tables of numbers, records of transactions, 3D models… the list goes on. The “something usable” is what we call information.

The process that leads from “stuff” to “information” can vary widely, but in this course we’ll focus on situations in which there is “too much stuff” for one person to carry out the process on their own. When that’s the case, people need to work together and construct systems to carry out the process of producing usable information. We’ll call systems that select usable information out of a mass of too much stuff “selecting systems.”

In the first part of the course we’ll introduce some concepts and terminology that will allow us to be a bit more precise when thinking and talking about what it is that selecting systems do.

Week starting August 31
Meaningful stuff: documents

View slides Updated Monday 8/30 4:04 PM

Total amount of required reading for this week: 9,200 words

There’s a lot of stuff in the world, but only some of it is meaningful. If you see someone eating soup, you’re unlikely to ask them, “What does that soup mean?” On the other hand, if you see someone painting a mural, asking them what it means would be perfectly appropriate.

Not all meaningful stuff sticks around. Spoken words or messages written in sand will disappear without a trace unless they are somehow captured and made persistent. We’ll refer to persistent, meaningful stuff as “documents.”

The word “documents” might bring to mind things like college applications or tax forms. Those are certainly documents, but many other things can be documents too: photographs, pop songs, tweets, video games, even zoo animals. What makes some thing a document is not some special property it has, but the way it is used: how it is created, exchanged, understood, modified, collected, described, stored, etc.

A selecting system consists of various operations involving documents: collecting and creating them, transforming them, and arranging them.

📖 To read before this meeting:

  1. Buckland, Michael. “Introduction.” In Information and Society, 1–19. MIT Press, 2017. PDF.
    3,800 words
  2. Buckland, Michael. “Document and Evidence.” In Information and Society, 21–49. MIT Press, 2017. PDF.
    5,400 words
  3. Optional
    Brown, John Seely, and Paul Duguid. “Reading the Background.” In The Social Life of Information, 173–205. Boston: Harvard Business School Press, 2000. PDF.
    10,600 words
    Reading tips

    In this chapter from their book The Social Life of Information, John Seely Brown and Paul Duguid explain why, despite 50+ years of digital computers and networks, we still use a lot of paper documents.

Week starting September 7
Producing meaning: semiosis

View slides Updated Monday 9/6 6:36 PM

Total amount of required reading for this week: 6,500 words

Documents are persistent meaningful stuff. The process through which something comes to have meaning is known as semiosis, and semiotics is the study of that process.

Semiotics does not provide a theory for explaining semiosis. What it provides are conceptual tools for thinking more precisely about the production of meaning.

Selecting systems take a bunch of meaningful stuff as input and produce as output usable information—the meaning of which is somehow related to the meaning of the stuff that was input. Semiotic concepts are thus particularly useful for thinking about what selecting systems do.

📖 To read before this meeting:

  1. Shaw, Ryan. “Semiosis at an Intersection.” In Selecting Systems, 2021. PDF.
    2,600 words
  2. Shaw, Ryan. “Semiosis on the Front Page.” In Selecting Systems, 2021. PDF.
    3,900 words
  3. Optional
    Daylight, Russell. “The Semiotic Abstraction.” Semiotica 2017, no. 218 (January 26, 2017). PDF.
    3,700 words
    Reading tips

    This article compares how the concept of abstraction is understood by computer scientists and semioticians. The author argues that semiotic systems should be understood as “machines for creating differences,” of which computers are one kind.

Week starting September 14
Measuring patterns: information theory

View slides Updated Monday 9/13 9:34 PM

Total amount of required reading for this week: 6,600 words

Semiotics provides conceptual tools for analyzing the meaning of “meaningful stuff.” Information theory provides conceptual tools for analyzing the stuff.

Information theory starts from the recognition that in order for stuff to be potentially meaningful, it has to be patterned in some way. Information theory is the study of those patterns, and it provides mathematical tools for comparing and measuring those patterns.

Those mathematical tools have turned out to be useful for many purposes, including for the construction of selecting systems. But unlike semiotics, information theory has nothing at all to say about meaning—it is concerned only with patterns, not with what those patterns might mean.

In other words, “information theory” is a misleading name. The word “information” in “information theory” does not mean “the result of a process that begins with a bunch of meaningful stuff and ends with something usable.” A better name for information theory would be “pattern theory.”
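To make the “pattern theory” idea concrete, here is a minimal sketch in Python. It is only an illustration, not something taken from the readings: it measures entropy from single-character frequencies, which is a simplifying assumption, and the example message and substitution table are made up. It shows that the measure depends only on the pattern of symbols, so a readable phrase and a scrambled relabeling of it score exactly the same.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Shannon entropy, in bits per symbol, based on symbol frequencies alone."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# A readable message and a one-to-one relabeling of its symbols (hypothetical example).
message = "to be or not to be"
relabeled = message.translate(str.maketrans("tobern ", "QWXYZK_"))

# Two strings over four symbols: one evenly patterned, one dominated by a single symbol.
uniform = "abcdabcdabcdabcd"
skewed = "aaaaaaaaaaaaaaab"

print(entropy_bits(message), entropy_bits(relabeled))  # identical values
print(entropy_bits(uniform))                           # 2.0
print(entropy_bits(skewed))                            # well below 2.0
```

The relabeled string means nothing to a reader, yet its entropy matches that of the original, because the calculation never looks at what the symbols stand for.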

📖 To read before this meeting:

  1. Weaver, Warren. “Recent Contributions to The Mathematical Theory of Communication,” September 1949. PDF.
    6,000 words
    Reading tips

    Claude Shannon, an engineer who worked at Bell Labs, developed a mathematical theory of communication that came to be known as “information theory.” The papers in which Shannon developed his theory were originally published in 1948 in two parts in the Bell System Technical Journal. A year later, Warren Weaver published this summary of Shannon’s work.

    There is some math in this report. If you’re not mathematically inclined, just skip over it—it isn’t necessary to understand the math in order to understand the basic ideas.

  2. Shannon, Claude. “The Bandwagon.” IRE Transactions on Information Theory 2, no. 3 (1956): 3. PDF.
    600 words
    Reading tips

    About eight years after information theory made its debut, Shannon wrote this one-page editorial.

  3. Optional
    Eckersley, Peter. “A Primer on Information Theory and Privacy.” Electronic Frontier Foundation, August 10, 2020. https://www.eff.org/deeplinks/2010/01/primer-information-theory-and-privacy.
    700 words
    Reading tips

    This short article uses the information-theoretic concept of entropy to explain why it is so easy to identify individual people based on their web browsing activity.

  4. Optional
    Gleick, James. “Information Theory.” In The Information, 1st ed., 204–232. New York: Pantheon Books, 2011. PDF.
    9,100 words
    Reading tips

    This chapter from science writer James Gleick’s book The Information is an engaging mini-biography of Claude Shannon, but it is also an accessible introduction to information theory.

September 17
Exam #1 handed out

Week starting September 21
Exam #1

There is no new material this week, as you will be working on exam #1.

During our usual lecture time, there will be an open Q&A, before and during which you can submit questions about difficulties you might be having with the exam.

The recitations this week will also focus on discussing and helping each other think about how to answer the exam questions.

September 24
Exam #1 due

Systematically grouping documents

Both semiotics and information theory provide tools for understanding how documents are built up out of groups of more basic meaningful things: a text is a group of words, an image is a group of figures and grounds, an electronic record is a group of keys and values…

A selecting system carries out various operations on these groups: collecting, arranging, and transforming them into new groups (and groups of groups, and groups of groups of groups…). The goal is to select out of a mass of stuff some specific group: the group of videos that will keep you watching, the group of students that are likely to succeed in college, the group of hypotheses consistent with the data.

Designing and implementing a selecting system typically requires:

  1. a systematic way to define and describe groups, and
  2. a way to formally describe and reason about operations on those groups.

Building upon what we learned in the first part of the course, in the second part we’ll examine these two requirements.

We’ll start by considering how we draw distinctions and group things as we think and communicate about the world around us, and how the desire to coordinate these activities across broader scales motivates standardization and systematization.

Then, we’ll consider and contrast two different ways of formally describing operations on groups: Boolean algebra (deductively describing and reasoning about operations on groups) and Bayesian inference (inductively describing and reasoning about operations on groups).

Week starting September 28
Establishing systematic groups

View slides Updated Tuesday 9/28 8:57 AM

Total amount of required reading for this week: 10,100 words

Categories are groups that have names. This week we’ll examine how loose, everyday categories become standardized and systematized into classifications, in order to support some kind of collective action.

For example, scientists seek to develop universal classifications rather than relying on locally-specific categories. Establishing and maintaining universal classifications is difficult, as the history of the scientific classification of clouds demonstrates. It’s not just a matter of agreeing on categories, but also a matter of establishing and documenting observational practices that make clouds classifiable.

Science is not the only institution that seeks to systematically classify things in order to coordinate collective action across great distances and over long periods of time. Law, medicine, trade and finance, engineering—every variety of large-scale coordination has its own techniques of making things classifiable (though we can identify some common features).

📖 To read before this meeting:

  1. Daston, Lorraine. “Cloud Physiognomy.” Representations 135, no. 1 (August 1, 2016): 45–71. https://doi.org/10.1525/rep.2016.135.1.45.
    10,100 words
  2. Optional
    Dupré, John. “Scientific Classification.” Theory, Culture & Society 23, no. 2–3 (May 1, 2006): 30–32. PDF.
    1,200 words
  3. Optional
    Glushko, Robert J, Paul P Maglio, Teenie Matlock, and Lawrence W Barsalou. “Categorization in the Wild.” Trends in Cognitive Sciences 12, no. 4 (April 2008): 129–35. http://dx.doi.org/10.1016/j.tics.2008.01.007.
    5,000 words

Week starting October 5
Deductively reasoning about groups: Boolean algebra

View slides Updated Monday 10/4 8:03 PM

Total amount of required reading for this week: 14,400 words

This week we will look at one common way of formally describing operations on groups: Boolean algebra.

Boolean algebra relies on the following “common-sense” assumptions:

  • The world consists of individual objects or entities.
  • These entities have attributes that can be counted and described.
  • Entities can be sorted into groups based on the presence or absence or values of their attributes.

If we make these assumptions, we can define groups using Boolean algebraic expressions. We can then manipulate these expressions according to the rules of Boolean algebra to deductively reason about operations on those groups (for example combining, intersecting, and negating them).

We call this formal reasoning because it depends only on the forms (the symbols and operators) of the mathematical expressions—the actual groups of things that those symbols represent are irrelevant.
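As a rough illustration of these ideas, the sketch below uses Python sets to stand in for groups. The catalog, the entity names, and the attribute names are all hypothetical. Groups are defined by the presence of an attribute, then combined (OR), intersected (AND), and negated (NOT).

```python
# Hypothetical catalog: each entity has attributes that are either present or absent.
books = {
    "A": {"fiction", "illustrated"},
    "B": {"fiction"},
    "C": {"illustrated"},
    "D": set(),
}

everything = set(books)


def group(attribute):
    """The group of entities that have the given attribute."""
    return {name for name, attrs in books.items() if attribute in attrs}


fiction = group("fiction")
illustrated = group("illustrated")

either = fiction | illustrated      # OR: combining two groups
both = fiction & illustrated        # AND: intersecting two groups
not_fiction = everything - fiction  # NOT: negating a group

# De Morgan's law: NOT (A OR B) is the same group as (NOT A) AND (NOT B).
assert everything - either == (everything - fiction) & (everything - illustrated)

print(sorted(either), sorted(both), sorted(not_fiction))
```

The final check is an instance of formal reasoning: De Morgan's law holds for any two groups, so it would pass no matter which books or attributes the catalog happened to contain.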

📖 To read before this meeting:

  1. Hunter, Eric. “What Is Classification? / Classification in an Information System / Faceted Classification.” In Classification Made Simple, 3rd ed. Farnham: Ashgate, 2009. PDF.
    5,600 words
  2. Berkeley, Edmund C. “Boolean Algebra (the Technique for Manipulating AND, OR, NOT and Conditions).” The Record 26 part II, no. 54 (1937): 373–414. PDF.
    3,400 words
    Reading tips

    This article is by Edmund Berkeley, a pioneer of computer science and co-founder of the Association for Computing Machinery, which is still the primary scholarly association for computer scientists. But he wrote this article in 1937, before he became a computer scientist—because computers had yet to exist. At the time he was a mathematician working at the Prudential life insurance company, where he recognized the usefulness of Boolean algebra for modeling insurance data. He published this article in a professional journal for actuaries (people who compile and analyze statistics and use them to calculate insurance risks and premiums).

    Berkeley uses some frightening-looking mathematical notation in parts of this article, but everything he discusses is actually quite simple. The most important parts are:

    pages 373–374, where he gives a simple explanation of Boolean algebra,

    pages 380–381, where he considers practical applications of Boolean algebra, and

    pages 383 on, where he pays close attention to translation back and forth between Boolean algebra and English.

  3. Kent, William. “Attributes / Types and Categories and Sets / Models.” In Data and Reality, 77–94. Amsterdam: North-Holland, 1978. PDF.
    5,400 words
    Reading tips

    This is an excerpt from one of my favorite books, Data and Reality by Bill Kent. Kent was a computer programmer and database designer at IBM and Hewlett-Packard, during the era when the database technologies we use today were first being developed. He thought deeply and carefully about the challenges of data modeling and management, which he recognized were not primarily technical challenges.

    The fixed-width typewriter font makes this reading look old-fashioned, but nothing in it is out-of-date. These are precisely the same issues data modelers and “data scientists” struggle with today.

  4. Optional
    Evans, Eric. “Crunching Knowledge.” In Domain-Driven Design. Boston: Addison-Wesley, 2004. PDF.
    3,000 words

Week starting October 12
Wellness break

Please take time this week to care for yourself and those you love.

Week starting October 19
Fall break

Due to Fall Break, neither the lecture nor recitations will meet.

In lieu of coming to lecture, please watch the first 37 minutes of this lecture from the Spring semester, which will inform you about what you’ll be working on during the last part of this course.

Week starting October 26
Inductively reasoning about groups: Bayesian inference

View slides Updated Monday 10/25 5:27 PM

Total amount of required reading for this week: 18,400 words

Boolean algebra makes it possible to formally specify precise rules for grouping. Yet it’s often the case that we are able to distinguish different groups, but we cannot precisely specify rules for doing so.

An example is the grouping of texts by subject. Grouping together books or journal articles that are about the same things doesn’t seem so difficult, assuming that we can read and understand them. But it turns out to be difficult to precisely specify rules for doing this.

As an alternative, one can approach the problem statistically: perhaps there are patterns of correlation between the attributes of texts (for example, the words that appear in them) and the way that they are grouped by subject. In order to find such patterns, we need some evidence: a collection of texts that have already been grouped, which we can then analyze to look for correlations between their attributes and the groups they’ve been assigned to.

Bayesian inference is the mathematical formalization of this process of inductively reasoning about groups: identifying patterns of correlation in existing groups, and then applying these patterns to sort new things into those groups.
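The sketch below shows one simple way this process can look in code. It is a simplified naive Bayes classifier with add-one smoothing, offered as an assumption-laden illustration rather than as the method described in any of the readings. The already-grouped texts, the group names, and the function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical evidence: texts that have already been grouped by subject.
labeled_texts = [
    ("clouds rain wind storm", "weather"),
    ("rain storm flood", "weather"),
    ("court law judge trial", "law"),
    ("judge law statute", "law"),
]


def train(examples):
    """Count how often each word appears in the texts of each group."""
    word_counts = defaultdict(Counter)
    group_counts = Counter()
    for text, group in examples:
        group_counts[group] += 1
        word_counts[group].update(text.split())
    return word_counts, group_counts


def classify(text, word_counts, group_counts):
    """Return the group whose word patterns best match the new text."""
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_texts = sum(group_counts.values())
    scores = {}
    for group in group_counts:
        # Prior: how common this group is among the already-grouped texts.
        score = math.log(group_counts[group] / total_texts)
        total_words = sum(word_counts[group].values())
        for word in text.split():
            # Add-one smoothing keeps an unseen word from ruling a group out entirely.
            likelihood = (word_counts[group][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        scores[group] = score
    return max(scores, key=scores.get)


word_counts, group_counts = train(labeled_texts)
print(classify("storm clouds", word_counts, group_counts))  # weather
print(classify("the judge", word_counts, group_counts))     # law
```

Notice that no rule for what counts as a “weather” text was ever written down. The sorting depends entirely on correlations between words and groups in the evidence, so different evidence would produce different results.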

📖 To read before this meeting:

  1. Wilson, Patrick. “Subjects and the Sense of Position.” In Two Kinds of Power, 69–92. Berkeley: University of California Press, 1968. PDF.
    11,900 words
    Reading tips

    In this chapter Patrick Wilson considers the problems that arise when one tries to come up with systematic rules for classifying texts by subject.

    Wilson can be a bit long-winded, but his insights are worth it. (You can skip the very long footnotes, so this reading is actually shorter than it looks.) What Wilson calls a “writing” is more typically referred to as a text. In this chapter he is criticizing the assumptions librarians make when cataloging texts by subject. The “sense of position” in the title of the chapter refers to the librarian’s sense of where in a classification scheme a text should be placed. Although he is talking about library classification, everything Wilson says is also applicable to state-of-the-art machine classification of texts today.

  2. Maron, M. E. “Automatic Indexing: An Experimental Inquiry.” Journal of the ACM 8, no. 3 (July 1961): 404–17. https://doi.org/10.1145/321075.321084.
    6,500 words
    Reading tips

    Bill Maron was an engineer at missile manufacturer Ramo-Wooldridge when he began investigating statistical methods for classifying and retrieving documents. In this paper he describes a method for statistically modeling the subject matter of texts. He introduces the basic ideas behind what is now known as a Bayesian classifier, a technique that is still widely used today for a variety of automatic classification tasks from spam filtering to face recognition.

    Trigger warning: math. The math is relatively basic, and if you’ve studied any probability, you should be able to follow it. But if not, just skip it: Maron explains everything important about his experiment in plain English. Pay extra attention to what he says about “clue words.”

  3. Optional
    Smucker, Mark D. “Information Representation.” In Interactive Information Seeking, Behaviour and Retrieval, edited by Ian Ruthven and Diane Kelly, 77–93. London: Facet Pub., 2011. PDF.

October 27
Exam #2 handed out

November 1
Exam #2 due

Selecting systems in the wild

During the last part of the course, you and your classmates will work together on identifying and analyzing selecting systems “in the wild.”

We’ll begin by reviewing and refining our model of how selecting systems work by carrying out various operations on groups of documents: collecting, arranging, and transforming them into new groups.

Then we’ll take another look at Boolean algebra and Bayesian inference. We’ll think about how these two different formal techniques for reasoning about groups can be used to produce different kinds of selecting systems.

Next, we’ll consider the trade-offs between using human and machine labor in selecting systems.

Finally, we’ll reflect on the relationship between selecting systems and society. Do new kinds of selecting systems cause changes in culture, politics, and society? Or do social, political, and cultural norms and practices determine the kind of selecting systems we create?

Week starting November 2
Selecting systems

View slides Updated Monday 11/1 7:06 PM

Total amount of required reading for this week: 8,100 words

This week we’ll look at examples of selecting systems and try to analyze them, reviewing and refining our model of how selecting systems work by carrying out various operations on groups of documents: collecting, arranging, and transforming them into new groups.

This will also be the week that the class splits into teams of investigators, each of which will choose a selecting system to analyze.

📖 To read before this meeting:

  1. Buckland, Michael, and Christian Plaunt. “On the Construction of Selection Systems.” Library Hi Tech 12, no. 4 (1994): 15–28. PDF.
    8,100 words
    Reading tips

    An examination of the structure and components of information storage and retrieval systems and information filtering systems. Argues that all selection systems can be represented in terms of combinations of a set of basic components. The components are of only two types: representations of data objects and functions that operate on them.

Week starting November 9
Comparing selecting techniques

View slides Updated Tuesday 11/9 9:20 AM

Total amount of required reading for this week: 13,600 words

Boolean algebra and Bayesian inference are two different formal techniques for reasoning about groups. These techniques can be applied to produce different kinds of selecting systems. Why might one technique be used rather than the other? How and why might the two techniques be combined in a selecting system?

📖 To read before this meeting:

  1. Hjørland, Birger. “Classical Databases and Knowledge Organization: A Case for Boolean Retrieval and Human Decision-Making during Searches.” Journal of the Association for Information Science and Technology 66, no. 8 (August 1, 2015): 1559–75. PDF.
    2,800 words
    Reading tips

    This is an excerpt from an article arguing that, though they are perceived as outdated, selection systems based on Boolean algebra (more commonly referred to as Boolean retrieval systems) are preferable for some purposes because they offer more opportunities for human decision-making during searches.

  2. Rieder, Bernhard. “Interested Learning.” In Engines of Order, 235–64. Amsterdam: Amsterdam University Press, 2020. PDF.
    10,800 words
    Reading tips

    This reading scrutinizes Bill Maron’s Bayesian classifier, identifying it as an example of a technique that is now applied for many purposes that differ quite a bit from Maron’s.

November 9
Project proposals due

Project proposals must be submitted to your recitation instructor before your recitation meets this week.

Week starting November 16
Automation of selecting labor

View slides Updated Monday 11/15 7:05 PM

Total amount of required reading for this week: 6,000 words

Selecting usable information from a mass of material involves labor. This week we’ll consider the question of automation: what kinds of selecting labor can be done by people, and what kinds can be done by machines? What kinds of selecting labor should be done by people, and what kinds should be done by machines?

📖 To read before this meeting:

  1. Irani, Lilly. “Justice for ‘Data Janitors.’” Public Books, January 15, 2015. https://www.publicbooks.org/zaloom-tribute-2021-justice-for-data-janitors/.
    4,400 words
  2. Resnikoff, Jason. “How ‘Automation’ Made America Work Harder.” Zócalo Public Square, September 2, 2021. https://www.zocalopublicsquare.org/2021/09/02/automation-revolution-america-labor-work-history/ideas/essay/.
    1,600 words
  3. Optional
    Roberts, Sarah T. “Understanding Commercial Content Moderation.” In Behind the Screen, 33–72. New Haven: Yale University Press, 2019. PDF.
    20,200 words
    Reading tips

    In this chapter from her book Behind the Screen, Sarah Roberts provides an overview of commercial content moderation at companies like Facebook. She explains what commercial content moderation is, who does it, and the conditions under which they work.

  4. Optional
    Seligman, Ben B. “The Social Cost of Cybernation.” In The Evolving Society: The Proceedings of the First Annual Conference on the Cybercultural Revolution—Cybernetics and Automation, edited by Alice Mary Hilton, 159–66. New York: Institute for Cybercultural Research, 1966. PDF.
    2,600 words
  5. Optional
    Boggs, James. “The Negro and Cybernation.” In The Evolving Society: The Proceedings of the First Annual Conference on the Cybercultural Revolution—Cybernetics and Automation, edited by Alice Mary Hilton, 167–72. New York: Institute for Cybercultural Research, 1966. PDF.
    1,900 words

Week starting November 23
Thanksgiving

Neither lecture nor recitations will meet this week due to the Thanksgiving holiday.

Week starting November 30
Project check-ins

As classes end this week, neither lecture nor recitations will meet. However, each project group must schedule a meeting with one of the instructors to discuss their progress so far.

December 2
Project check-in deadline

Your team must meet with one of the instructors by Thursday, December 2 to discuss progress on your project.

December 9
Project presentation videos due

Project presentation videos must be submitted via Panopto by 11:59PM on Thursday, December 9.

December 9
Selecting system analysis due