Foundations of Information Science
UNC SILS, INLS 201, Spring 2022
January 11
Recitations begin meeting
All recitation sections will begin meeting this week. See the recitation schedule.
Note that all recitations will meet online via Zoom for the first three weeks of the semester.
Week starting
January 11
First meeting
View
slides
Our first meeting of the semester is at 9:30 AM on Tuesday, January 11, via Zoom (see the announcements on Sakai for the password).
During our first meeting, the instructors will introduce themselves, and we’ll go over the structure of the course, access to resources such as the weekly readings, and guidelines for success.
It is highly recommended that you (virtually) attend this meeting “live,” so that you can ask any questions you may have. However, if you are unable to attend due to a time conflict, the meeting will be recorded.
What is information?
Information is the result of a process that begins with a bunch of meaningful stuff and ends with something usable. The “bunch of meaningful stuff” can be practically anything: words on pages, recorded sounds, photographic images, tables of numbers, records of transactions, 3D models… the list goes on. The “something usable” is what we call information.
The process that leads from “stuff” to “information” can vary widely, but in this course we’ll focus on situations in which there is “too much stuff” for one person to carry out the process on their own. When that’s the case, people need to work together and construct systems to carry out the process of producing usable information. We’ll call systems that select usable information out of a mass of too much stuff “selecting systems.”
In the first part of the course we’ll introduce some concepts and terminology that will allow us to be a bit more precise when thinking and talking about what it is that selecting systems do.
Week starting
January 18
Meaningful stuff: documents
View
slides
Total amount of required reading for this week: 9,200 words
There’s a lot of stuff in the world, but only some of it is meaningful. If you see someone eating soup, you’re unlikely to ask them, “What does that soup mean?” On the other hand, if you see someone painting a mural, asking them what it means would be perfectly appropriate.
Not all meaningful stuff sticks around. Spoken words or messages written in sand will disappear without a trace unless they are somehow captured and made persistent. We’ll refer to persistent, meaningful stuff as “documents.”
The word “documents” might bring to mind things like college applications or tax forms. Those are certainly documents, but many other things can be documents too: photographs, pop songs, tweets, video games, even zoo animals. What makes some thing a document is not some special property it has, but the way it is used: how it is created, exchanged, understood, modified, collected, described, stored, etc.
A selecting system consists of various operations involving documents: collecting and creating them, transforming them, and arranging them.
📖 To read before this meeting:
-
Buckland, Michael. “Introduction.” In Information and Society, 1–19. MIT Press, 2017. PDF.
-
Buckland, Michael. “Document and Evidence.” In Information and Society, 21–49. MIT Press, 2017. PDF.
-
Optional: Brown, John Seely, and Paul Duguid. “Reading the Background.” In The Social Life of Information, 173–205. Boston: Harvard Business School Press, 2000. PDF.
Reading tips
In this chapter from their book The Social Life of Information, John Seely Brown and Paul Duguid explain why despite 50+ years of digital computers and networks, we still use a lot of paper documents.
Week starting
January 25
Producing meaning: semiosis
View
slides
Total amount of required reading for this week: 6,500 words
Documents are persistent meaningful stuff. The process through which something comes to have meaning is known as semiosis, and semiotics is the study of that process.
Semiotics does not provide a theory for explaining semiosis. What it provides are conceptual tools for thinking more precisely about the production of meaning.
Selecting systems take a bunch of meaningful stuff as input and produce as output usable information—the meaning of which is somehow related to the meaning of the stuff that was input. Semiotic concepts are thus particularly useful for thinking about what selecting systems do.
📖 To read before this meeting:
-
Shaw, Ryan. “Semiosis at an Intersection.” In Selecting Systems, 2021. PDF.
-
Shaw, Ryan. “Semiosis on the Front Page.” In Selecting Systems, 2021. PDF.
-
Optional: Daylight, Russell. “The Semiotic Abstraction.” Semiotica 2017, no. 218 (January 26, 2017). PDF.
Reading tips
This article compares how the concept of abstraction is understood by computer scientists and semioticians. The author argues that semiotic systems should be understood as “machines for creating differences,” of which computers are one kind.
Week starting
February 1
Measuring patterns: information theory
View
slides
Total amount of required reading for this week: 6,600 words
Semiotics provides conceptual tools for analyzing the meaning of “meaningful stuff.” Information theory provides conceptual tools for analyzing the stuff.
Information theory starts from the recognition that in order for stuff to be potentially meaningful, it has to be patterned in some way. Information theory is the study of those patterns, and it provides mathematical tools for comparing and measuring those patterns.
Those mathematical tools have turned out to be useful for many purposes, including for the construction of selecting systems. But unlike semiotics, information theory has nothing at all to say about meaning—it is concerned only with patterns, not with what those patterns might mean.
In other words, “information theory” is a misleading name. The word “information” in “information theory” does not mean “the result of a process that begins with a bunch of meaningful stuff and ends with something usable.” A better name for information theory would be “pattern theory.”
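To make the idea of "measuring patterns" concrete, here is a minimal sketch of one such measurement, Shannon entropy, which counts how many bits per symbol are needed on average to encode a sequence. The function name `entropy_bits` and the sample strings are assumptions made for illustration; they do not come from the readings.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Shannon entropy, in bits per symbol, of a sequence of symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    # Each symbol with relative frequency p contributes p * log2(1/p).
    # Sorting the counts makes the result depend only on how often symbols
    # occur, not on the order in which they appear.
    return sum((n / total) * math.log2(total / n) for n in sorted(counts.values()))


# A perfectly regular pattern has no variety to measure; more variety means more bits.
print(entropy_bits("aaaaaaaa"))  # 0.0
print(entropy_bits("abababab"))  # 1.0
print(entropy_bits("abcdabcd"))  # 2.0

# Two sentences with opposite meanings but the same letters get the same score.
print(entropy_bits("dog bites man") == entropy_bits("man bites dog"))  # True
```

The last line illustrates the point made above: the measurement depends only on how often each symbol occurs, so it cannot tell apart two messages that mean very different things.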
📖 To read before this meeting:
-
Weaver, Warren. “Recent Contributions to The Mathematical Theory of Communication,” September 1949. PDF.
Reading tips
Claude Shannon, an engineer who worked at Bell Labs, developed a mathematical theory of communication that came to be known as “information theory.” The papers in which Shannon developed his theory were originally published in 1948 in two parts in the Bell System Technical Journal. A year later, Warren Weaver published this summary of Shannon’s work.
There is some math in this report. If you’re not mathematically inclined, just skip over it—it isn’t necessary to understand the math in order to understand the basic ideas.
-
Shannon, Claude. “The Bandwagon.” IRE Transactions on Information Theory 2, no. 3 (1956): 3. PDF.
Reading tips
About eight years after information theory made its debut, Shannon wrote this one-page editorial.
-
Optional: Eckersley, Peter. “A Primer on Information Theory and Privacy.” Electronic Frontier Foundation, August 10, 2020. https://www.eff.org/deeplinks/2010/01/primer-information-theory-and-privacy.
Reading tips
This short article uses the information-theoretic concept of entropy to explain why it is so easy to identify individual people based on their web browsing activity.
-
Optional: Gleick, James. “Information Theory.” In The Information, 1st ed., 204–232. New York: Pantheon Books, 2011. PDF.
Reading tips
This chapter from science writer James Gleick’s book The Information is an engaging mini-biography of Claude Shannon, but it is also an accessible introduction to information theory.
February 4
Exam #1 handed out
Week starting
February 8
Exam #1
There is no new material this week, as you will be working on exam #1.
During our usual lecture time, there will be an open Q&A, before and during which you can submit questions about difficulties you might be having with the exam.
The recitations this week will also focus on discussing and helping each other think about how to answer the exam questions.
February 11
Exam #1 due
Systematically grouping documents
Both semiotics and information theory provide tools for understanding how documents are built up out of groups of more basic meaningful things: a text is a group of words, an image is a group of figures and grounds, an electronic record is a group of keys and values…
A selecting system carries out various operations on these groups: collecting, arranging, and transforming them into new groups (and groups of groups, and groups of groups of groups…). The goal is to select out of a mass of stuff some specific group: the group of videos that will keep you watching, the group of students that are likely to succeed in college, the group of hypotheses consistent with the data.
Designing and implementing a selecting system typically requires:
- a systematic way to define and describe groups, and
- a way to formally describe and reason about operations on those groups.
Building upon what we learned in the first part of the course, in the second part we’ll examine these two requirements.
We’ll start by considering how we draw distinctions and group things as we think and communicate about the world around us, and how the desire to coordinate these activities across broader scales motivates standardization and systematization.
Then, we'll consider and contrast two different ways of formally describing operations on groups: Boolean algebra (deductively describing and reasoning about operations on groups) and Bayesian inference (inductively describing and reasoning about operations on groups).
Week starting
February 15
Establishing systematic groups
View
slides
Total amount of required reading for this week: 10,100 words
Categories are groups that have names. This week we’ll examine how loose, everyday categories become standardized and systematized into classifications, in order to support some kind of collective action.
For example, scientists seek to develop universal classifications rather than relying on locally-specific categories. Establishing and maintaining universal classifications is difficult, as the history of the scientific classification of clouds demonstrates. It’s not just a matter of agreeing on categories, but also a matter of establishing and documenting observational practices that make clouds classifiable.
Science is not the only institution that seeks to systematically classify things in order to coordinate collective action across great distances and over long periods of time. Law, medicine, trade and finance, engineering—every variety of large-scale coordination has its own techniques of making things classifiable (though we can identify some common features).
📖 To read before this meeting:
-
Daston, Lorraine. “Cloud Physiognomy.” Representations 135, no. 1 (August 1, 2016): 45–71. https://doi.org/10.1525/rep.2016.135.1.45.
-
Optional: Dupré, John. “Scientific Classification.” Theory, Culture & Society 23, no. 2–3 (May 1, 2006): 30–32. PDF.
-
Optional: Glushko, Robert J, Paul P Maglio, Teenie Matlock, and Lawrence W Barsalou. “Categorization in the Wild.” Trends in Cognitive Sciences 12, no. 4 (April 2008): 129–35. http://dx.doi.org/10.1016/j.tics.2008.01.007.
Week starting
February 22
Deductively reasoning about groups: Boolean algebra
View
slides
Total amount of required reading for this week: 14,400 words
This week we will look at one common way of formally describing operations on groups: Boolean algebra.
Boolean algebra relies on the following “common-sense” assumptions:
- The world consists of individual objects or entities.
- These entities have attributes that can be counted and described.
- Entities can be sorted into groups based on the presence or absence or values of their attributes.
If we make these assumptions, we can define groups using Boolean algebraic expressions. We can then manipulate these expressions according to the rules of Boolean algebra to deductively reason about operations on those groups (for example combining, intersecting, and negating them).
We call this formal reasoning because it depends only on the forms (the symbols and operators) of the mathematical expressions—the actual groups of things that those symbols represent are irrelevant.
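As a rough illustration of these assumptions, the sketch below treats groups as Python sets, so that AND, OR, and NOT become intersection, union, and complement. The four book records, their attribute names, and the helper `group` are all invented for this example.

```python
# Hypothetical entities, each described by attributes (all values invented).
books = {
    "b1": {"format": "print", "year": 1978, "subject": "databases"},
    "b2": {"format": "ebook", "year": 2004, "subject": "software design"},
    "b3": {"format": "print", "year": 2009, "subject": "classification"},
    "b4": {"format": "ebook", "year": 2017, "subject": "classification"},
}

universe = set(books)


def group(predicate):
    """Sort entities into a group based on the values of their attributes."""
    return {key for key, attrs in books.items() if predicate(attrs)}


print_books = group(lambda a: a["format"] == "print")
recent = group(lambda a: a["year"] >= 2000)
classification = group(lambda a: a["subject"] == "classification")

# AND is intersection, OR is union, NOT is complement within the universe.
print(sorted(print_books & recent))   # ['b3']
print(sorted(print_books | recent))   # ['b1', 'b2', 'b3', 'b4']
print(sorted(universe - recent))      # ['b1']

# A rule of Boolean algebra (De Morgan): NOT (A OR B) equals (NOT A) AND (NOT B).
left = universe - (print_books | classification)
right = (universe - print_books) & (universe - classification)
print(left == right)                  # True
```

The final check holds no matter which books are in the groups, which is what makes the reasoning formal: the rule depends on the form of the expression, not on what the symbols stand for.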
📖 To read before this meeting:
-
Hunter, Eric. “What Is Classification? / Classification in an Information System / Faceted Classification.” In Classification Made Simple, 3rd ed. Farnham: Ashgate, 2009. PDF.
-
Berkeley, Edmund C. “Boolean Algebra (the Technique for Manipulating AND, OR, NOT and Conditions).” The Record 26 part II, no. 54 (1937): 373–414. PDF.
Reading tips
This article is by Edmund Berkeley, a pioneer of computer science and co-founder of the Association for Computing Machinery, which is still the primary scholarly association for computer scientists. But he wrote this article in 1937, before he became a computer scientist—because computers had yet to exist. At the time he was a mathematician working at the Prudential life insurance company, where he recognized the usefulness of Boolean algebra for modeling insurance data. He published this article in a professional journal for actuaries (people who compile and analyze statistics and use them to calculate insurance risks and premiums).
Berkeley uses some frightening-looking mathematical notation in parts of this article, but everything he discusses is actually quite simple. The most important parts are:
- pages 373–374, where he gives a simple explanation of Boolean algebra,
- pages 380–381, where he considers practical applications of Boolean algebra, and
- page 383 onward, where he pays close attention to translation back and forth between Boolean algebra and English.
-
Kent, William. “Attributes / Types and Categories and Sets / Models.” In Data and Reality, 77–94. Amsterdam: North-Holland, 1978. PDF.
Reading tips
This is an excerpt from one of my favorite books, Data and Reality by Bill Kent. Kent was a computer programmer and database designer at IBM and Hewlett-Packard, during the era when the database technologies we use today were first being developed. He thought deeply and carefully about the challenges of data modeling and management, which he recognized were not primarily technical challenges.
The fixed-width typewriter font makes this reading look old-fashioned, but nothing in it is out-of-date. These are precisely the same issues data modelers and “data scientists” struggle with today.
-
Optional: Evans, Eric. “Crunching Knowledge.” In Domain-Driven Design. Boston: Addison-Wesley, 2004. PDF.
Week starting
March 1
Inductively reasoning about groups: Bayesian inference
View
slides
Total amount of required reading for this week: 18,400 words
Boolean algebra makes it possible to formally specify precise rules for grouping. Yet it’s often the case that we are able to distinguish different groups, but we cannot precisely specify rules for doing so.
An example is the grouping of texts by subject. Grouping together books or journal articles that are about the same things doesn’t seem so difficult, assuming that we can read and understand them. But it turns out to be difficult to precisely specify rules for doing this.
As an alternative, one can approach the problem statistically: perhaps there are patterns of correlation between the attributes of texts (for example, the words that appear in them) and the way that they are grouped by subject. In order to find such patterns, we need some evidence: a collection of texts that have already been grouped, which we can then analyze to look for correlations between their attributes and the groups they’ve been assigned to.
Bayesian inference is the mathematical formalization of this process of inductively reasoning about groups: identifying patterns of correlation in existing groups, and then applying these patterns to sort new things into those groups.
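The two steps described above (find correlations in texts that are already grouped, then use them to sort a new text) can be sketched as a small "naive Bayes" classifier. This is a simplified illustration under stated assumptions, not the method from any of the readings: the training texts, the group names, and the functions `train` and `classify` are all invented, and add-one smoothing is one common choice among several.

```python
import math
from collections import Counter, defaultdict

# Hypothetical evidence: short texts that have already been grouped by subject.
training = [
    ("clouds rain weather forecast", "meteorology"),
    ("storm clouds wind pressure", "meteorology"),
    ("insurance premium risk policy", "insurance"),
    ("actuary risk premium statistics", "insurance"),
]


def train(examples):
    """Count how often each word appears in each group."""
    group_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, group in examples:
        group_counts[group] += 1
        for word in text.split():
            word_counts[group][word] += 1
            vocabulary.add(word)
    return group_counts, word_counts, vocabulary


def classify(text, model):
    """Return the group with the highest (log) probability for this text."""
    group_counts, word_counts, vocabulary = model
    total_docs = sum(group_counts.values())
    scores = {}
    for group, n_docs in group_counts.items():
        # Start from how common the group is overall (the prior).
        score = math.log(n_docs / total_docs)
        n_words = sum(word_counts[group].values())
        for word in text.split():
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing keeps an unseen word from zeroing out a group.
            likelihood = (word_counts[group][word] + 1) / (n_words + len(vocabulary))
            score += math.log(likelihood)
        scores[group] = score
    return max(scores, key=scores.get)


model = train(training)
print(classify("risk of storm clouds", model))  # meteorology
print(classify("premium policy", model))        # insurance
```

Note that nobody wrote a rule saying "texts about clouds belong to meteorology." The sorting comes entirely from word counts in the already-grouped examples, so different training evidence would produce different groupings.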
📖 To read before this meeting:
-
Wilson, Patrick. “Subjects and the Sense of Position.” In Two Kinds of Power, 69–92. Berkeley: University of California Press, 1968. PDF.
Reading tips
In this chapter Patrick Wilson considers the problems that arise when one tries to come up with systematic rules for classifying texts by subject.
Wilson can be a bit long-winded, but his insights are worth it. (You can skip the very long footnotes, so this reading is actually shorter than it looks.) What Wilson calls a “writing” is more typically referred to as a text. In this chapter he is criticizing the assumptions librarians make when cataloging texts by subject. The “sense of position” in the title of the chapter refers to the librarian’s sense of where in a classification scheme a text should be placed. Although he is talking about library classification, everything Wilson says is also applicable to state-of-the-art machine classification of texts today.
-
Maron, M. E. “Automatic Indexing: An Experimental Inquiry.” Journal of the ACM 8, no. 3 (July 1961): 404–17. https://doi.org/10.1145/321075.321084.
Reading tips
Bill Maron was an engineer at missile manufacturer Ramo-Wooldridge when he began investigating statistical methods for classifying and retrieving documents. In this paper he describes a method for statistically modeling the subject matter of texts. He introduces the basic ideas behind what is now known as a Bayesian classifier, a technique that is still widely used today for a variety of automatic classification tasks from spam filtering to face recognition.
Trigger warning: math. The math is relatively basic, and if you’ve studied any probability, you should be able to follow it. But if not, just skip it: Maron explains everything important about his experiment in plain English. Pay extra attention to what he says about “clue words.”
-
Optional: Smucker, Mark D. “Information Representation.” In Interactive Information Seeking, Behaviour and Retrieval, edited by Ian Ruthven and Diane Kelly, 77–93. London: Facet Pub., 2011. PDF.
Week starting
March 8
Exam #2
There is no new lecture or reading this week.
During our usual lecture time, there will be an open Q&A, before and during which you can submit questions about the material we’ve covered during the first two units.
Recitations will focus on review in preparation for the second exam.
March 8
Exam #2 handed out
March 13
Exam #2 due
Week starting
March 15
Spring break
Due to Spring Break, neither the lecture nor recitations will meet.
Selecting systems in the wild
During the last part of the course, you and your classmates will work together on identifying and analyzing selecting systems “in the wild.”
We’ll begin by reviewing and refining our model of how selecting systems work: they carry out various operations on groups of documents, collecting, arranging, and transforming them into new groups.
Then we’ll take another look at Boolean algebra and Bayesian inference. We’ll think about how these two different formal techniques for reasoning about groups can be used to produce different kinds of selecting systems.
Next, we'll consider the trade-offs between using human and machine labor in selecting systems.
Finally, we'll reflect on the relationship between selecting systems and society. Do new kinds of selecting systems cause changes in culture, politics, and society? Or do social, political, and cultural norms and practices determine the kind of selecting systems we create?
Week starting
March 22
Selecting systems
View
slides
Total amount of required reading for this week: 8,100 words
This week we’ll look at examples of selecting systems and try to analyze them. Along the way we’ll review and refine our model of how selecting systems work: they carry out various operations on groups of documents, collecting, arranging, and transforming them into new groups.
This will also be the week that the class splits into teams of investigators, each of which will choose a selecting system to analyze.
📖 To read before this meeting:
-
Buckland, Michael, and Christian Plaunt. “On the Construction of Selection Systems.” Library Hi Tech 12, no. 4 (1994): 15–28. PDF.
Reading tips
An examination of the structure and components of information storage and retrieval systems and information filtering systems. Argues that all selection systems can be represented in terms of combinations of a set of basic components. The components are of only two types: representations of data objects and functions that operate on them.
March 29
Selecting system analysis proposals due
Selecting system analysis proposals must be submitted to your recitation instructor before your recitation meets this week.
Week starting
March 29
Comparing selecting techniques
View
slides
Total amount of required reading for this week: 13,600 words
Boolean algebra and Bayesian inference are two different formal techniques for reasoning about groups. These techniques can be applied to produce different kinds of selecting systems. Why might one technique be used rather than the other? How and why might the two techniques be combined in a selecting system?
📖 To read before this meeting:
-
Hjørland, Birger. “Classical Databases and Knowledge Organization: A Case for Boolean Retrieval and Human Decision-Making during Searches.” Journal of the Association for Information Science and Technology 66, no. 8 (August 1, 2015): 1559–75. PDF.
Reading tips
This is an excerpt from an article arguing that, though they are perceived as outdated, selection systems based on Boolean algebra (more commonly referred to as Boolean retrieval systems) are preferable for some purposes because they offer more opportunities for human decision-making during searches.
-
Rieder, Bernhard. “Interested Learning.” In Engines of Order, 235–64. Amsterdam: Amsterdam University Press, 2020. PDF.
Reading tips
This reading scrutinizes Bill Maron’s Bayesian classifier, identifying it as an example of a technique that is now applied for many purposes that differ quite a bit from Maron’s.
Week starting
April 5
Automation of selecting labor
View
slides
Total amount of required reading for this week: 6,000 words
Selecting usable information from a mass of material involves labor. This week we’ll consider the question of automation: what kinds of selecting labor can be done by people, and what kinds can be done by machines? What kinds of selecting labor should be done by people, and what kinds should be done by machines?
📖 To read before this meeting:
-
Irani, Lilly. “Justice for ‘Data Janitors.’” Public Books, January 15, 2015. https://www.publicbooks.org/zaloom-tribute-2021-justice-for-data-janitors/.
-
Resnikoff, Jason. “How ‘Automation’ Made America Work Harder.” Zócalo Public Square, September 2, 2021. https://www.zocalopublicsquare.org/2021/09/02/automation-revolution-america-labor-work-history/ideas/essay/.
-
Optional: Roberts, Sarah T. “Understanding Commercial Content Moderation.” In Behind The Screen, 33–72. New Haven: Yale University Press, 2019. PDF.
Reading tips
In this chapter from her book Behind the Screen, Sarah Roberts provides an overview of commercial content moderation at companies like Facebook. She explains what commercial content moderation is, who does it, and the conditions under which they work.
-
Optional: Seligman, Ben B. “The Social Cost of Cybernation.” In The Evolving Society: The Proceedings of the First Annual Conference on the Cybercultural Revolution—Cybernetics and Automation, edited by Alice Mary Hilton, 159–66. New York: Institute for Cybercultural Research, 1966. PDF.
-
Optional: Boggs, James. “The Negro and Cybernation.” In The Evolving Society: The Proceedings of the First Annual Conference on the Cybercultural Revolution—Cybernetics and Automation, edited by Alice Mary Hilton, 167–72. New York: Institute for Cybercultural Research, 1966. PDF.
Week starting
April 12
Selecting systems in society
View
slides
There are various positions one might take regarding the relationship between technology and society. Sometimes people talk about technology as an external force that exerts influence on society, pushing us in certain directions. Other times people insist that technologies are “just tools” that can be used in different ways, for better or for worse.
The same questions can be raised about selecting systems. Do new kinds of selecting systems cause changes in culture, politics, and society? Or do social, political, and cultural norms and practices determine the kind of selecting systems we create?
This week’s readings are all optional. We’ll spend the lecture and recitation considering selecting systems at work in the world around us.
📖 To read before this meeting:
-
Optional: Slack, Jennifer Daryl, and J. Macgregor Wise. “Determinism.” In Culture and Technology: A Primer, 2nd ed., 49–57. Peter Lang, 2014. https://ebookcentral.proquest.com/lib/unc/reader.action?docID=2011077&ppg=65.
-
Optional: Pinch, Trevor J., and Wiebe E. Bijker. “The Social Construction of Facts and Artefacts: Or How the Sociology of Science and the Sociology of Technology Might Benefit Each Other.” Social Studies of Science 14, no. 3 (1984): 411–428. PDF.
Reading tips
The authors are attacking what they describe as “linear” models of technological development, which focus on a series of “technological breakthroughs” leading inevitably to where we are today. They argue that looking at the actual historical development of a technology like the bicycle shows that what seem in retrospect to be obvious “technological breakthroughs” were not at all obvious at the time.
It may help to consult these pages to get a sense of the different bicycle models discussed in the reading:
-
Optional: Winner, Langdon. “Do Artifacts Have Politics?” Daedalus 109, no. 1 (1980): 121–136. https://www.jstor.org/stable/20024652.
April 19
Selecting system analysis progress reports
During this week’s recitation your group will give a 5–7 minute presentation in class highlighting what progress you’ve made on your selecting system analysis.
Week starting
April 19
Progress reports
There is no new lecture or reading this week.
Week starting
April 26
Work on your final project
As classes end this week, neither lecture nor recitations will meet. However, each project group is encouraged to schedule a meeting with one of the instructors to discuss their progress so far and any issues they are having.