Foundations of Information Science

UNC SILS, INLS 201, Spring 2021

Week starting January 18
Recitation sections begin meeting

All recitation sections will begin meeting the week starting January 18.

See the recitation schedule and Zoom links.

January 19
Synchronous all hands meeting

On Tuesday, January 19, starting at 9:30AM Eastern time, we will have our only synchronous meeting of all the INLS 201 sections.

The instructors will introduce ourselves, and we’ll go over the syllabus, class policies, and how to use the course website.

Zoom link: https://unc.zoom.us/j/92073259583

The password for the meeting will be posted to Sakai.

The meeting will be recorded.

After the meeting is over, I’ll post any slides I showed to this website, and (if you are logged in) you will see a link to a PDF of them below.

What is information?

Information is the result of a process that begins with a mass of material and ends with something usable. The “mass of material” can be practically anything: words on pages, recorded sounds, photographic images, tables of numbers, and so on. The “something usable” is what we call information.

The process that leads from the former to the latter can vary widely, but in this course we’ll focus on situations in which there seems to be “too much” material for one person to carry out the process on their own. When that’s the case, people need to work together and construct systems to carry out the process of producing information. We’ll call systems that select usable information out of a mass of too much material “selection systems.”

In the first part of the course we’ll consider the social practices that produce “too much” material, motivating the development of systems to select usable information from that material. We’ll then look at two complementary theories for understanding how that selection process works.

Week starting January 25
Masses of material: documentation

View slides Updated Thursday 1/21 2:57 PM

Total amount of required reading for this week: 9,200 words

In everyday English, to document usually means to provide material evidence that serves as a record, as in: “If you want to apply for a passport, you will have to document your citizenship status.”

In an information technology context, documentation typically means the written instructions that accompany software or hardware, as in: “Please consult the user documentation before using this software.”

But we can also use the word documentation to refer more broadly to all kinds of human practices of “documenting” that produce masses of material: words on pages, recorded sounds, photographic images, tables of numbers, and so on. The perspective of documentation emphasizes the various things people do with this stuff, and how these practices fit into large social and historical contexts.

📖 To read before this meeting:

  1. Buckland, Michael. “Introduction.” In Information and Society, 1–19. MIT Press, 2017. PDF.
    3,800 words
  2. Buckland, Michael. “Document and Evidence.” In Information and Society, 21–49. MIT Press, 2017. PDF.
    5,400 words
  3. Optional
    Brown, John Seely, and Paul Duguid. “Reading the Background.” In The Social Life of Information, 173–205. Boston: Harvard Business School Press, 2000. PDF.
    10,600 words
    Reading tips

    In this chapter from their book The Social Life of Information, John Seely Brown and Paul Duguid explain why, despite 50+ years of digital computers and networks, we still use a lot of paper documents.

January 26
Open Q&A sessions begin

On Tuesday, January 26, 9:30–10:30AM Eastern time, we will have the first of our weekly open Q&A sessions.

These are an opportunity for you to ask questions about the readings, lectures, or assignments. Drop in anytime.

Zoom link: https://unc.zoom.us/j/804236122

The password will be posted to Sakai.

Week starting February 1
Producing meaning and significance: semiotics

View slides Updated Thursday 1/28 4:18 PM

Total amount of required reading for this week: 17,800 words

Usable information is meaningful. How does some mass of material come to have meaning? Semiotics is the study of how phenomena take on meaning or signify.

There is a wide range of semiotic theories, but all of them are characterized by a focus on how structural relations produce meaning. Semioticians are interested in how perceptible signs take on meaning as a function of the structural role they play in a larger system. They look at how signs are organized into languages or codes, and how signs and codes operate within our broader cultures.

Ideas from semiotics are useful for thinking about selection systems. Data scientists, “user experience” designers, computer programmers and other information professionals employ semiotic concepts all the time, though many of them are unaware of it.

📖 To read before this meeting:

  1. Johansen, Jørgen Dines, and Svend Erik Larsen. “Code and Structure: From Difference to Meaning.” In Signs in Use, 7–23. Routledge, 2002. https://ebookcentral.proquest.com/lib/unc/reader.action?docID=240576&ppg=16.
    5,100 words
    Reading tips

    Pay attention to the italicized terms in this reading. You may find it useful to look them up in the book’s glossary (see the optional reading below).

    If you find yourself getting confused, try to focus on understanding how “codes” work in the examples of the tic-tac-toe game and the pedestrian trying to cross at an intersection.

    You can stop reading this chapter on page 16, before the section entitled “Code and structure.” After that point, the chapter gets into some advanced topics that we won’t be covering in this course.

  2. Johansen, Jørgen Dines, and Svend Erik Larsen. “Signs: From Tracks to Words.” In Signs in Use, 24–52. Routledge, 2002. https://ebookcentral.proquest.com/lib/unc/reader.action?docID=240576&ppg=33.
    12,700 words
    Reading tips

    Pay attention to the italicized terms in this reading. You may find it useful to look them up in the book’s glossary (see the optional reading below).

  3. Optional
    Johansen, Jørgen Dines, and Svend Erik Larsen. “Glossary.” In Signs in Use, 199–222. Routledge, 2002. https://ebookcentral.proquest.com/lib/unc/reader.action?docID=240576&ppg=208.
  4. Optional
    Daylight, Russell. “The Semiotic Abstraction.” Semiotica 2017, no. 218 (January 26, 2017). PDF.
    3,700 words
    Reading tips

    This article compares how the concept of abstraction is understood by computer scientists and semioticians. The author argues that semiotic systems should be understood as “machines for creating differences,” of which computers are one kind.

Week starting February 8
Statistically measuring pattern: information theory

View slides Updated Thursday 2/4 7:04 PM

Total amount of required reading for this week: 6,600 words

Engineers building systems for transmitting signals through wires and over radio waves developed techniques for compression (eliminating redundancy) and clarification (eliminating noise). Clarification allows signals to be communicated despite flaws in the transmission process. Compression allows more signal to be transmitted in a smaller amount of time. The mathematical theory behind these techniques is known as information theory.

The process of selecting usable information out of a mass of material can also be looked at from the perspective of information theory. An advantage of doing this is that it allows us to characterize the selection process mathematically, opening the door to powerful techniques of formalization and automation. However, unlike semiotics, information theory has nothing to say about significance: it is concerned only with repeating patterns, not with what those patterns mean.
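
As a rough illustration of what it means to measure pattern without regard to meaning, the sketch below computes the entropy of a message (the average number of bits per symbol) from nothing but its character frequencies. The function name `entropy_bits` and the sample strings are illustrative assumptions and do not come from the readings.

```python
from collections import Counter
from math import log2


def entropy_bits(message: str) -> float:
    """Average bits per symbol, computed only from how often each symbol occurs."""
    counts = Counter(message)
    total = len(message)
    # Each symbol with probability p contributes p * log2(1 / p) bits.
    return sum((n / total) * log2(total / n) for n in counts.values())


# A repetitive message is predictable, so each symbol carries little information.
print(entropy_bits("aaaaaaab") < 1.0)   # True

# Four symbols used equally often need two bits each.
print(entropy_bits("abcdabcd"))         # 2.0

# Rearranging the letters changes the meaning but not the measure.
print(entropy_bits("listen") == entropy_bits("silent"))   # True
```

The last line is the point of the comparison with semiotics: two different words built from the same symbols are indistinguishable to this measure.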

📖 To read before this meeting:

  1. Weaver, Warren. “Recent Contributions to The Mathematical Theory of Communication,” September 1949. PDF.
    6,000 words
    Reading tips

    Claude Shannon, an engineer who worked at Bell Labs, developed a mathematical theory of communication that came to be known as “information theory.” The papers in which Shannon developed his theory were originally published in 1948 in two parts in the Bell System Technical Journal. A year later, Warren Weaver published this summary of Shannon’s work.

    There is some math in this report. If you’re not mathematically inclined, just skip over it—it isn’t necessary to understand the math in order to understand the basic ideas.

  2. Shannon, Claude. “The Bandwagon.” IRE Transactions on Information Theory 2, no. 3 (1956): 3. PDF.
    600 words
    Reading tips

    About eight years after information theory made its debut, Shannon wrote this one-page editorial.

  3. Optional
    Gleick, James. “Information Theory.” In The Information, 1st ed., 204–232. New York: Pantheon Books, 2011. PDF.
    9,100 words
    Reading tips

    This chapter from science writer James Gleick’s book The Information is an engaging mini-biography of Claude Shannon, but it is also an accessible introduction to information theory.

  4. Optional
    Eckersley, Peter. “A Primer on Information Theory and Privacy.” Electronic Frontier Foundation, August 10, 2020. https://www.eff.org/deeplinks/2010/01/primer-information-theory-and-privacy.
    700 words
    Reading tips

    This short article uses the information-theoretic concept of entropy to explain why it is so easy to identify individual people based on their web browsing activity.

Week starting February 15
Spring break I

Due to the University wellness days, there will be no lecture posted this week, and recitations will not meet. Relax!

February 17
Exam 1 handed out

Building patterns out of distinctions

In the first part of the course we looked at the practices of documentation that produce “too much” material, motivating the development of systems to select usable information from that material. We then introduced two complementary theories for understanding how that selection process works: semiotics and information theory.

Both semiotics and information theory show how the process of selecting usable information from a mass of material involves distinguishing differences, using these differences to separate things into groups, and then using these groups to build “informative” patterns or structures.

In the second part of this course we’ll look at that process more closely. We’ll start by considering how we draw distinctions, group things, and build patterns in everyday life as we think and communicate about the world around us. Then we’ll look at how collective action motivates the formalization and systematization of those activities, taking the collective pursuit of scientific knowledge as a paradigmatic example.

Finally, we'll examine and contrast two different ways of formally modeling the process of building patterns out of basic distinctions: Boolean algebra (putting things into groups based on their attributes) and Bayesian inference (hypothesizing about the groups to which things might belong, based on past evidence).

Week starting February 22
Drawing distinctions in everyday life

View slides Updated Thursday 2/18 3:12 PM

Total amount of required reading for this week: 16,900 words

Thinking and communicating require drawing boundaries and making distinctions—categorizing and classifying the world around us. Categorization and classification are both cognitive operations and social processes. How we think is inseparable from what we do with others.

📖 To read before this meeting:

  1. Zerubavel, Eviatar. “Introduction / Islands of Meaning / The Great Divide / The Social Lens.” In The Fine Line, 1–17, 21–24, 61–80. New York: Free Press, 1991. PDF.
    16,900 words
    Reading tips

    Eviatar Zerubavel is a cognitive sociologist, meaning that he studies how social processes shape our thinking, and he’s written a number of fascinating and accessible books on the topic. These are selections from his book The Fine Line about making distinctions in everyday life.

  2. Optional
    Lakoff, George. “The Importance of Categorization / From Wittgenstein to Rosch.” In Women, Fire, and Dangerous Things: What Categories Reveal about the Mind. Chicago: University of Chicago Press, 1987. PDF.
    21,600 words

February 26
Exam 1 due

Week starting March 1
Systematically making things classifiable

View slides Updated Thursday 2/25 2:30 PM

Total amount of required reading for this week: 10,100 words

Last week we looked at categorization and classification in everyday life. This week we’ll look at efforts to systematically make things categorizable and classifiable at a scale beyond everyday life, in order to engage in some kind of collective action.

For example, scientists seek to develop universal classifications rather than relying on locally-specific ones. Establishing and maintaining universal classifications is difficult, as the history of the scientific classification of clouds demonstrates. It is not only a matter of classifying clouds, but also a matter of making clouds classifiable.

Science is not the only institution that seeks to systematically classify things in order to coordinate across great distances and over long periods of time. Law, trade and finance, engineering—every variety of large-scale coordination has its own techniques of making things classifiable (though we can identify some common features).

📖 To read before this meeting:

  1. Daston, Lorraine. “Cloud Physiognomy.” Representations 135, no. 1 (August 1, 2016): 45–71. https://doi.org/10.1525/rep.2016.135.1.45.
    10,100 words
  2. Optional
    Dupré, John. “Scientific Classification.” Theory, Culture & Society 23, no. 2–3 (May 1, 2006): 30–32. PDF.
    1,200 words
  3. Optional
    Glushko, Robert J, Paul P Maglio, Teenie Matlock, and Lawrence W Barsalou. “Categorization in the Wild.” Trends in Cognitive Sciences 12, no. 4 (April 2008): 129–35. http://dx.doi.org/10.1016/j.tics.2008.01.007.
    5,000 words

Week starting March 8
Spring break II

Due to the University wellness days, there will be no lecture posted this week, and recitations will not meet. Relax!

Week starting March 15
Calculating with classes: Boolean algebra

View slides Updated Thursday 3/11 2:32 PM

Total amount of required reading for this week: 14,400 words

This week we will look at one common way of formally modeling the process of building patterns out of basic distinctions: Boolean algebra.

Boolean algebra relies on the following “common-sense” assumptions:

  • The world consists of individual objects or entities.
  • These entities have attributes that can be counted and described.
  • Entities can be sorted into classes based on the presence or absence or values of their attributes.

If we make these assumptions, we can translate our classifications of the world into mathematical expressions using Boolean algebra.
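
A minimal sketch of that translation, assuming a handful of made-up entities: each class becomes a set, and the Boolean operators AND, OR, and NOT become set intersection, union, and complement. The book records and class names below are hypothetical and are not drawn from the readings.

```python
# Hypothetical entities, each described by attributes (illustrative data only).
books = {
    "A": {"language": "English", "illustrated": True, "year": 1937},
    "B": {"language": "French", "illustrated": True, "year": 1978},
    "C": {"language": "English", "illustrated": False, "year": 2009},
    "D": {"language": "English", "illustrated": True, "year": 2017},
}

everything = set(books)

# Classes are defined by the presence, absence, or value of an attribute.
english = {key for key, book in books.items() if book["language"] == "English"}
illustrated = {key for key, book in books.items() if book["illustrated"]}
recent = {key for key, book in books.items() if book["year"] >= 2000}

# AND is intersection, OR is union, NOT is the complement within `everything`.
not_recent = everything - recent

print(sorted(english & illustrated))                 # ['A', 'D']
print(sorted(illustrated | recent))                  # ['A', 'B', 'C', 'D']
print(sorted((english & illustrated) & not_recent))  # ['A']
```

The last expression reads in English as "illustrated English books that are not recent," which is the kind of back-and-forth translation between algebra and English that this week's readings discuss.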

📖 To read before this meeting:

  1. Hunter, Eric. “What Is Classification? / Classification in an Information System / Faceted Classification.” In Classification Made Simple, 3rd ed. Farnham: Ashgate, 2009. PDF.
    5,600 words
  2. Berkeley, Edmund C. “Boolean Algebra (the Technique for Manipulating AND, OR, NOT and Conditions).” The Record 26 part II, no. 54 (1937): 373–414. PDF.
    3,400 words
    Reading tips

    This article is by Edmund Berkeley, a pioneer of computer science and co-founder of the Association for Computing Machinery, which is still the primary scholarly association for computer scientists. But he wrote this article in 1937, before he became a computer scientist—because computers had yet to exist. At the time he was a mathematician working at the Prudential life insurance company, where he recognized the usefulness of Boolean algebra for modeling insurance data. He published this article in a professional journal for actuaries (people who compile and analyze statistics and use them to calculate insurance risks and premiums).

    Berkeley uses some frightening-looking mathematical notation in parts of this article, but everything he discusses is actually quite simple. The most important parts are:

    pages 373–374, where he gives a simple explanation of Boolean algebra,

    pages 380–381, where he considers practical applications of Boolean algebra, and

    pages 383 onward, where he pays close attention to translation back and forth between Boolean algebra and English.

  3. Kent, William. “Attributes / Types and Categories and Sets / Models.” In Data and Reality, 77–94. Amsterdam: North-Holland, 1978. PDF.
    5,400 words
    Reading tips

    This is an excerpt from one of my favorite books, Data and Reality by Bill Kent. Kent was a computer programmer and database designer at IBM and Hewlett-Packard, during the era when the database technologies we use today were first being developed. He thought deeply and carefully about the challenges of data modeling and management, which he recognized were not primarily technical challenges.

    The fixed-width typewriter font makes this reading look old-fashioned, but nothing in it is out-of-date. These are precisely the same issues data modelers and “data scientists” struggle with today.

  4. Optional
    Evans, Eric. “Crunching Knowledge.” In Domain-Driven Design. Boston: Addison-Wesley, 2004. PDF.
    3,000 words

Week starting March 22
Learning to distinguish: Bayesian inference

View slides Updated Thursday 3/18 2:49 PM

Total amount of required reading for this week: 18,400 words

Boolean algebra makes it possible to formally specify precise rules for distinguishing between groups of things. But it’s often the case that we are able to distinguish between groups of things even though we cannot precisely specify rules for doing so.

An example is the classification of texts by subject. Grouping together books or journal articles that are about the same things doesn’t seem so difficult, assuming that we can read and understand them. But it turns out to be difficult to precisely specify rules for doing this.

As an alternative, one can approach the problem statistically: perhaps there are patterns of correlation between the attributes of texts (for example, the words that appear in them) and the way that we group them by subject. In order to find such patterns, we need some evidence: a collection of texts that have already been classified, which we can then analyze to look for correlations between their attributes and the groups they’ve been assigned to.

Bayesian inference is the mathematical formalization of this process of identifying patterns of correlation based on past evidence, and then applying these patterns to classify new things.
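
The process can be sketched with a tiny "naive Bayes" classifier, a common simplification that treats each word as independent evidence and uses add-one smoothing for words it has not seen in a subject. This is an illustrative sketch under those assumptions, not the exact method described in the readings; the training texts, subject labels, and function names are all made up.

```python
from collections import Counter, defaultdict
from math import log

# Hypothetical past evidence: texts that have already been classified by subject.
training = [
    ("clouds rain wind storm", "weather"),
    ("storm clouds forecast rain", "weather"),
    ("signal noise channel code", "communication"),
    ("code message signal transmission", "communication"),
]


def train(examples):
    """Count how often each subject occurs and how often each word occurs in it."""
    subject_counts = Counter()
    word_counts = defaultdict(Counter)
    for text, subject in examples:
        subject_counts[subject] += 1
        word_counts[subject].update(text.split())
    return subject_counts, word_counts


def classify(text, subject_counts, word_counts):
    """Return the subject with the highest (log) posterior score for the text."""
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_texts = sum(subject_counts.values())
    scores = {}
    for subject, n_texts in subject_counts.items():
        # Prior: how common this subject was in the past evidence.
        score = log(n_texts / total_texts)
        n_words = sum(word_counts[subject].values())
        for word in text.split():
            # Likelihood, with add-one smoothing so an unseen word
            # does not rule a subject out entirely.
            score += log((word_counts[subject][word] + 1) / (n_words + len(vocabulary)))
        scores[subject] = score
    return max(scores, key=scores.get)


subject_counts, word_counts = train(training)
print(classify("rain and wind", subject_counts, word_counts))   # weather
print(classify("noisy signal", subject_counts, word_counts))    # communication
```

Note that no rule anywhere says what makes a text "about" weather. The classifier only reuses correlations between words and subjects found in texts that someone has already classified.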

📖 To read before this meeting:

  1. Wilson, Patrick. “Subjects and the Sense of Position.” In Two Kinds of Power, 69–92. Berkeley: University of California Press, 1968. PDF.
    11,900 words
    Reading tips

    In this chapter Patrick Wilson considers the problems that arise when one tries to come up with systematic rules for classifying texts by subject.

    Wilson can be a bit long-winded, but his insights are worth it. (You can skip the very long footnotes, so this reading is actually shorter than it looks.) What Wilson calls a “writing” is more typically referred to as a text. In this chapter he is criticizing the assumptions librarians make when cataloging texts by subject. The “sense of position” in the title of the chapter refers to the librarian’s sense of where in a classification scheme a text should be placed. Although he is talking about library classification, everything Wilson says is also applicable to state-of-the-art machine classification of texts today.

  2. Maron, M. E. “Automatic Indexing: An Experimental Inquiry.” Journal of the ACM 8, no. 3 (July 1961): 404–17. https://doi.org/10.1145/321075.321084.
    6,500 words
    Reading tips

    Bill Maron was an engineer at missile manufacturer Ramo-Wooldridge when he began investigating statistical methods for classifying and retrieving documents. In this paper he describes a method for statistically modeling the subject matter of texts. He introduces the basic ideas behind what is now known as a Bayesian classifier, a technique that is still widely used today for a variety of automatic classification tasks from spam filtering to face recognition.

    Trigger warning: math. The math is relatively basic, and if you’ve studied any probability, you should be able to follow it. But if not, just skip it: Maron explains everything important about his experiment in plain English. Pay extra attention to what he says about “clue words.”

  3. Optional
    Smucker, Mark D. “Information Representation.” In Interactive Information Seeking, Behaviour and Retrieval, edited by Ian Ruthven and Diane Kelly, 77–93. London: Facet Pub., 2011. PDF.

Week starting March 29
Exam review

There is no new lecture or reading this week. Recitations will focus on review in preparation for the second exam.

March 30
Exam 2 handed out

April 2
Exam 2 due

Selection systems in society

Selection systems produce usable information from masses of material. Boolean algebra and Bayesian inference are two different formal techniques with which selection systems can be constructed (and of course it is possible to construct systems that combine these two techniques, and possibly other techniques as well).

During the last part of this course, we'll look at selection systems as technologies that function in the context of our broader society.

First, we'll reflect on the relationship between technology and society. Does technological change cause social, political, and cultural change? Or do technologies simply reflect social, political, and cultural practices?

Next, we'll consider the trade-offs between using human and machine labor in selection systems.

Finally, we'll look again at Boolean algebra and Bayesian inference, and consider the broader social consequences of their differences.

Week starting April 5
Selection systems / Technology and society

View slides Updated Thursday 4/1 5:07 PM

Total amount of required reading for this week: 12,700 words

This week we’ll start by reviewing what we’ve learned so far about how selection systems work, before turning to the question of the relationship between technology and society.

There are various positions one might take regarding the relationship between technology and society. Sometimes people talk about technology as an external force that exerts influence on society, pushing us in certain directions. Other times people insist that technologies are “just tools” that can be used in different ways, for better or for worse.

📖 To read before this meeting:

  1. Buckland, Michael, and Christian Plaunt. “On the Construction of Selection Systems.” Library Hi Tech 12, no. 4 (1994): 15–28. PDF.
    8,100 words
    Reading tips

    An examination of the structure and components of information storage and retrieval systems and information filtering systems. Argues that all selection systems can be represented in terms of combinations of a set of basic components. The components are of only two types: representations of data objects and functions that operate on them.

  2. Slack, Jennifer Daryl, and J. Macgregor Wise. “Determinism.” In Culture and Technology: A Primer, 2nd ed., 49–57. Peter Lang, 2014. https://ebookcentral.proquest.com/lib/unc/reader.action?docID=2011077&ppg=65.
    4,600 words
  3. Optional
    Campanella, Thomas J. “Robert Moses and His Racist Parkway, Explained.” Bloomberg CityLab, July 9, 2017. https://web.archive.org/web/20200719205411/https://www.bloomberg.com/news/articles/2017-07-09/robert-moses-and-his-racist-parkway-explained.
    1,400 words
    Reading tips

    The parkway bridges of Long Island, built by city planner Robert Moses, provide an illustrative example of how people talk about technology and society.

Week starting April 12
Selection labor by people and machines

View slides Updated Friday 4/9 1:58 PM

Total amount of required reading for this week: 20,200 words

Selecting usable information from a mass of material involves labor. This week we’ll consider the question of automation: what kinds of labor can be done by people, and what kinds can be done by machines? What kinds of labor should be done by people, and what kinds should be done by machines?

📖 To read before this meeting:

  1. Roberts, Sarah T. “Understanding Commercial Content Moderation.” In Behind the Screen, 33–72. New Haven: Yale University Press, 2019. PDF.
    20,200 words
    Reading tips

    In this chapter from her book Behind the Screen, Sarah Roberts provides an overview of commercial content moderation at companies like Facebook. She explains what commercial content moderation is, who does it, and the conditions under which they work.

  2. Optional
    Seligman, Ben B. “The Social Cost of Cybernation.” In The Evolving Society: The Proceedings of the First Annual Conference on the Cybercultural Revolution—Cybernetics and Automation, edited by Alice Mary Hilton, 159–66. New York: Institute for Cybercultural Research, 1966. PDF.
    2,600 words
  3. Optional
    Boggs, James. “The Negro and Cybernation.” In The Evolving Society: The Proceedings of the First Annual Conference on the Cybercultural Revolution—Cybernetics and Automation, edited by Alice Mary Hilton, 167–72. New York: Institute for Cybercultural Research, 1966. PDF.
    1,900 words

April 12
Investigation proposals due this week

Week starting April 19
Comparing selection techniques

View slides Updated Friday 4/16 2:55 PM

Total amount of required reading for this week: 13,600 words

Boolean algebra and Bayesian inference are two different—possibly complementary—techniques for building selection systems. We’ve looked at how these techniques work in the abstract, but what social consequences do they have?

📖 To read before this meeting:

  1. Hjørland, Birger. “Classical Databases and Knowledge Organization: A Case for Boolean Retrieval and Human Decision-Making during Searches.” Journal of the Association for Information Science and Technology 66, no. 8 (August 1, 2015): 1559–75. PDF.
    2,800 words
    Reading tips

    This is an excerpt from an article arguing that, though they are perceived as outdated, selection systems based on Boolean algebra (more commonly referred to as Boolean retrieval systems) are preferable for some purposes because they offer more opportunities for human decision-making during searches.

  2. Rieder, Bernhard. “Interested Learning.” In Engines of Order, 235–64. Amsterdam: Amsterdam University Press, 2020. PDF.
    10,800 words
    Reading tips

    This reading scrutinizes Bill Maron’s Bayesian classifier, identifying it as an example of a technique that is now applied for many purposes that differ quite a bit from Maron’s.

Week starting April 26
Last week of recitations

April 26
Progress reports on investigative findings

May 12
Selection system investigation due