Foundations of Information Science

UNC SILS, INLS 201, Fall 2018

August 21
Introduction

View slides Updated Friday 2/27 4:03 PM

Today we’ll meet each other, and I’ll go over the syllabus, class policies, and how to use the course website. You’ll also tell me a little about yourself, and we’ll probably finish early.

After class is over, I’ll post any slides I showed to this website, and (if you are logged in) you will see a link to a PDF of them below.

August 23
Document and evidence

Total amount of required reading for this meeting: 5,400 words

Our lives and our societies are structured by, and constituted through, documents—and this has been true for a long time.

Today’s reading is the second chapter of Michael Buckland’s book on Information and Society. Buckland is a professor at the Berkeley School of Information, and he was my doctoral advisor.

📖 To read before this meeting:

Buckland, Michael. “Document and Evidence.” In Information and Society, 21–49. MIT Press, 2017. PDF.

5,400 words

August 28
The conduit metaphor

The way we talk about communication, and particularly communication via documents, does not reflect how communication actually works. It’s good to keep this in mind to avoid some common traps of thinking about information.

📖 To read before this meeting:

Reddy, Michael. “The Conduit Metaphor.” In Metaphor and Thought, edited by Andrew Ortony, 284–310. Cambridge University Press, 1980. PDF.

August 30
Genres of information

Total amount of required reading for this meeting: 4,600 words

What is information? We hear the word a lot, but it’s surprisingly hard to pin down what it means. For today we’ll read an article that attempts to explain why, written by the computer-scientist-turned-information-scholar Philip Agre. Agre is an advocate of what he calls “critical technical practice,” which he suggests requires cultivating a “split identity” as both a problem-solving engineer and problem-finding critic. In this article, Agre brings that technically informed critical perspective to bear on the idea of “information.”

📖 To read before this meeting:

Agre, Philip E. “Institutional Circuitry: Thinking about the Forms and Uses of Information.” Information Technology and Libraries 14, no. 4 (December 1995): 225. PDF.

4,600 words

September 4
The information universe

When Google asserts that its mission is to organize the world’s information, to what is it referring? What does “the world’s information” consist of?

Philosopher of librarianship Patrick Wilson wrestled with the same question in 1968, when he attempted to define the limits of “the bibliographical universe.”

📖 To read before this meeting:

Wilson, Patrick. “The Bibliographical Universe.” In Two Kinds of Power, 6–19. Berkeley: University of California Press, 1968. PDF.

September 6
Information science?

Total amount of required reading for this meeting: 2,500 words

Can there be a science of information? It depends on what you mean by “science,” and also on what you mean by “information.”

Information scientist Marcia Bates often, and influentially, reflected on the nature of both information and information science. In 1999 she argued that information science should be understood as a “meta-science.”

📖 To read before this meeting:

Bates, Marcia J. “The Invisible Substrate of Information Science.” Journal of the American Society for Information Science 50, no. 12 (October 1999): 1043–50. PDF.

2,500 words

September 11
Thinking with our eyes and hands

Total amount of required reading for this meeting: 9,200 words

For today we’ll read an article by Bruno Latour, a French philosopher, anthropologist and sociologist. Latour wrote this article to persuade his colleagues in the social sciences that they need to pay more attention to documents and processes of documentation.

📖 To read before this meeting:

Latour, Bruno. “Visualisation and Cognition: Thinking with Eyes and Hands.” Knowledge and Society: Studies in the Sociology of Culture Past and Present 6 (1986): 1–40. PDF.

9,200 words

Reading tips

Latour uses some unusual terminology in this article. He refers to documents as inscriptions and practices of documentation as inscription procedures. He also refers to documents as immutable mobiles, highlighting what he considers to be two of their most important qualities: immutability and mobility.

Latour is interested in the relationship between practices of documentation and thinking (cognition). His basic argument is that what may seem like great advances in thought are actually better understood as the emergence of new practices of documentation. Latour focuses primarily on documents as aids to visualization rather than as carriers of information. Thus he begins by discussing the emergence of new visualization techniques, such as linear perspective.

September 13

Hurricane Florence

September 18
Explaining how documents work

Total amount of required reading for this meeting: 5,100 words

One way to explain how communication through documents works is to focus on “signs”: how signs are organized into codes or languages, and how signs and codes operate in our broader culture. The study of signs and processes involving signs is known as “semiotics.”

📖 To read before this meeting:

Johansen, Jørgen Dines, and Svend Erik Larsen. “Code and Structure: From Difference to Meaning.” In Signs in Use, 7–23. Routledge, 2002. https://ebookcentral.proquest.com/lib/unc/reader.action?docID=240576&ppg=16.

5,100 words

Reading tips

Pay attention to the italicized terms in this reading. You may find it useful to look them up in the book’s glossary (see the optional reading below).

If you find yourself getting confused, try to focus on understanding how “codes” work in the examples of the tic-tac-toe game and the pedestrian trying to cross at an intersection.

You can stop reading this chapter on page 16, before the section entitled “Code and structure.” After that point, the chapter gets into some advanced topics that we won’t be covering in this course.

September 20
Explaining how comics work

Cartoonist Scott McCloud uses semiotic analysis to describe how we understand comics.

📖 To read before this meeting:

McCloud, Scott. “The Vocabulary of Comics.” In Understanding Comics, 1st HarperPerennial ed., 24–59. New York: HarperPerennial, 1994. PDF.

September 25
Drawing distinctions

Total amount of required reading for this meeting: 16,900 words

Until now we’ve mainly focused on documents and the marks on them, and how we understand and interpret those marks. This week we change our focus a bit, to look at how our understanding of the world is structured.

We begin with some excerpts from a book by Eviatar Zerubavel about how we categorize and classify the world around us. Zerubavel is a cognitive sociologist, meaning that he studies how social processes shape our thinking, and he’s written a number of fascinating and accessible books on the topic.

📖 To read before this meeting:

Zerubavel, Eviatar. “Introduction / Islands of Meaning / The Great Divide / The Social Lens.” In The Fine Line, 1–17, 21–24, 61–80. New York: Free Press, 1991. PDF.

16,900 words

Reading tips

These are selections from Zerubavel’s book The Fine Line, which is about making distinctions in everyday life.

September 27
Classifying: drawing distinctions systematically

Total amount of required reading for this meeting: 5,600 words

We all categorize and classify all the time, but we don’t always do it intentionally and systematically. Today we’ll try out a form of systematic classification known as faceted classification.

📖 To read before this meeting:

Hunter, Eric. “What Is Classification? / Classification in an Information System / Faceted Classification.” In Classification Made Simple, 3rd ed. Farnham: Ashgate, 2009. PDF.

5,600 words

September 27

Take-home exam 1 due

October 2
Classifying clouds

Total amount of required reading for this meeting: 10,100 words

Most of us would readily agree that our everyday “folk” classifications are historically contingent and somewhat arbitrary. Yet scientific classification presumably is different: science is the study of reality, and so scientific classifications are “real” in a way that other classifications are not.

Today we’ll discuss historian of science Lorraine Daston’s history of scientists’ attempts to classify clouds.

📖 To read before this meeting:

Daston, Lorraine. “Cloud Physiognomy.” Representations 135, no. 1 (August 1, 2016): 45–71. https://doi.org/10.1525/rep.2016.135.1.45.

10,100 words

October 4
Classifying what writings are about

Total amount of required reading for this meeting: 11,900 words

Clouds are hard to classify, and so are books and other writings—for similar reasons.

📖 To read before this meeting:

Wilson, Patrick. “Subjects and the Sense of Position.” In Two Kinds of Power, 69–92. Berkeley: University of California Press, 1968. PDF.

11,900 words

Reading tips

In this chapter Patrick Wilson considers the problems that arise when one tries to come up with systematic rules for classifying texts by subject.

Wilson can be a bit long-winded, but his insights are worth it. (You can skip the very long footnotes, so this reading is actually shorter than it looks.) What Wilson calls a “writing” is more typically referred to as a text. In this chapter he is criticizing the assumptions librarians make when cataloging texts by subject. The “sense of position” in the title of the chapter refers to the librarian’s sense of where in a classification scheme a text should be placed. Although he is talking about library classification, everything Wilson says is also applicable to state-of-the-art machine classification of texts today.

October 9
Automating semiotic labor

The past couple of weeks we’ve looked at how people categorize, classify, and name things of interest. As we’ve seen, this can be hard work, and like other kinds of hard work, people have sought to escape it through automation.

To what extent can the organization of information be automated? Information scholar Julian Warner looks at this question by drawing a distinction between different kinds of semiotic labor.

📖 To read before this meeting:

Warner, Julian. “Forms of Labour in Information Systems.” Information Research 7, no. 4 (2002). http://www.informationr.net/ir/7-4/paper135.html.

October 11
Computation

People were building systems to automate information organization and retrieval long before the invention of the computer, but the digital computer made possible many techniques that were previously unfeasible. The invention of computing also gave birth to a theory of computation, which gives us a mathematical framework for characterizing and measuring syntactic labor. Today we’ll look at one of the earliest computational techniques to be applied to information organization: Boolean logic.

📖 To read before this meeting:

Hillis, W. Daniel. “Nuts and Bolts / Universal Building Blocks.” In The Pattern on the Stone, 1–38. New York: Basic Books, 1998. PDF.

October 16
The logic of distinctions and classes

Total amount of required reading for this meeting: 3,400 words

Boolean logic (and ultimately, set theory) is the mathematical formalization upon which many of the techniques of information organization are built. In 1937 Edmund Berkeley, a mathematician working at the Prudential life insurance company, recognized the usefulness of Boolean logic for modeling insurance data—even though at the time there were no digital computers to assist with the calculations, only punched card tabulators.

Berkeley would later go on to be a pioneer of computer science, co-founding the Association for Computing Machinery, which is still the primary scholarly association for computer scientists.

📖 To read before this meeting:

Berkeley, Edmund C. “Boolean Algebra (the Technique for Manipulating AND, OR, NOT and Conditions).” The Record 26 part II, no. 54 (1937): 373–414. PDF.

3,400 words

Reading tips

Berkeley wrote this article in 1937, before digital computers existed and so before he became a computer scientist. He published it in a professional journal for actuaries (people who compile and analyze statistics and use them to calculate insurance risks and premiums).

Berkeley uses some frightening-looking mathematical notation in parts of this article, but everything he discusses is actually quite simple. The most important parts are:

pages 373–374, where he gives a simple explanation of Boolean algebra,

pages 380–381, where he considers practical applications of Boolean algebra, and

pages 383 on, where he pays close attention to translation back and forth between Boolean algebra and English.
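
If it helps to see Boolean algebra in action before reading, here is a minimal sketch in Python. It treats each condition as a class (a set) of policies, so that AND, OR, and NOT become intersection, union, and complement. The policy records and field names are invented for illustration and are not taken from Berkeley’s article.

```python
# Hypothetical policy records; the field names are invented for illustration.
POLICIES = [
    {"id": "A", "smoker": True,  "over_60": False},
    {"id": "B", "smoker": False, "over_60": True},
    {"id": "C", "smoker": True,  "over_60": True},
    {"id": "D", "smoker": False, "over_60": False},
]

# Each condition picks out a class (set) of policy IDs.
everyone = {p["id"] for p in POLICIES}
smokers = {p["id"] for p in POLICIES if p["smoker"]}
over_60 = {p["id"] for p in POLICIES if p["over_60"]}

# AND, OR, and NOT correspond to intersection, union, and complement.
smoker_and_over_60 = smokers & over_60
smoker_or_over_60 = smokers | over_60
not_smoker = everyone - smokers

# De Morgan's law: NOT (a OR b) is the same class as (NOT a) AND (NOT b).
left = everyone - (smokers | over_60)
right = (everyone - smokers) & (everyone - over_60)

print(sorted(smoker_and_over_60))  # ['C']
print(sorted(smoker_or_over_60))   # ['A', 'B', 'C']
print(sorted(not_smoker))          # ['B', 'D']
print(left == right)               # True
```

Note that NOT only makes sense relative to some universe of things under consideration, which is why the sketch defines `everyone` first.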

October 18

Fall break

October 23

Ryan was sick

October 25
Boolean retrieval

When using Boolean retrieval, we treat writings as simple sets of words. This allows us to obtain lists of writings in response to queries consisting of words combined with the operators AND, OR, and NOT.

📖 To read before this meeting:

Manning, Christopher D, Prabhakar Raghavan, and Hinrich Schütze. “Boolean Retrieval.” In Introduction to Information Retrieval. Cambridge, UK: Cambridge University Press, 2008. http://nlp.stanford.edu/IR-book/pdf/01bool.pdf.

Reading tips

Introduces inverted indexes and shows how simple Boolean queries can be processed using such indexes.
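
To make the idea concrete, here is a minimal sketch of an inverted index and three Boolean queries in Python. The three tiny “writings,” their ID numbers, and the function names are all invented for illustration. Real systems, including those described in the reading, handle tokenization and index storage far more carefully.

```python
from collections import defaultdict

# Hypothetical toy collection: document IDs mapped to their text.
DOCS = {
    1: "clouds are hard to classify",
    2: "books are hard to classify by subject",
    3: "clouds drift over the library",
}


def build_inverted_index(docs):
    """Map each word to the set of IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


INDEX = build_inverted_index(DOCS)
ALL_IDS = set(DOCS)


def postings(word):
    """Return the set of documents containing the word (empty if unseen)."""
    return INDEX.get(word.lower(), set())


# AND is intersection, OR is union, NOT is complement within the collection.
clouds_and_classify = postings("clouds") & postings("classify")
clouds_or_books = postings("clouds") | postings("books")
classify_not_clouds = postings("classify") & (ALL_IDS - postings("clouds"))

print(sorted(clouds_and_classify))  # [1]
print(sorted(clouds_or_books))      # [1, 2, 3]
print(sorted(classify_not_clouds))  # [2]
```

The index is built once, and after that each query only touches the sets for the words it mentions. It never rereads the writings themselves.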

October 30
A case for Boolean retrieval

Total amount of required reading for this meeting: 2,800 words

Boolean retrieval is sometimes characterized as hopelessly outdated. But there is something to be said for the division of labor—and power—that Boolean retrieval organizes.

📖 To read before this meeting:

Hjørland, Birger. “Classical Databases and Knowledge Organization: A Case for Boolean Retrieval and Human Decision-Making during Searches.” Journal of the Association for Information Science and Technology 66, no. 8 (August 1, 2015): 1559–75. PDF.

2,800 words

Reading tips

This is an excerpt from an article arguing that, though they are perceived as outdated, selection systems based on Boolean algebra (more commonly referred to as Boolean retrieval systems) are preferable for some purposes because they offer more opportunities for human decision-making during searches.

November 1
Probability and inductive logic

Information science took a major turn when the designers of information retrieval systems began to explore the statistical modeling of language.

Statistics is hard. Most people don’t intuitively understand probability, including me, and including the vast majority of scientists who rely on statistical methods. So today we’ll review some of the basics, so we know just enough to be dangerous.

📖 To read before this meeting:

Hacking, Ian. An Introduction to Probability and Inductive Logic. Cambridge: Cambridge University Press, 2001. PDF.

November 6

No water

November 8
Selection systems and automatic classification

Total amount of required reading for this meeting: 6,500 words

The shift to statistical modeling in information science can be traced to the work of Bill Maron. Maron was an engineer at missile manufacturer Ramo-Wooldridge when he began investigating statistical methods for classifying and retrieving documents. For today we’ll read a classic paper of Maron’s in which he develops the basic ideas behind the Bayesian classifier, a technique that is still widely used today for a variety of automatic classification tasks from spam filtering to face recognition.

📖 To read before this meeting:

Maron, M. E. “Automatic Indexing: An Experimental Inquiry.” Journal of the ACM 8, no. 3 (July 1961): 404–17. https://doi.org/10.1145/321075.321084.

6,500 words

Reading tips

In this paper Maron describes a method for statistically modeling the subject matter of texts. In doing so he introduces the basic ideas behind what is now known as a Bayesian classifier.

Trigger warning: math. The math is relatively basic, and if you’ve studied any probability, you should be able to follow it. But if not, just skip it: Maron explains everything important about his experiment in plain English. Pay extra attention to what he says about “clue words.”
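
If you’d rather see the idea as code than as math, here is a minimal sketch of a naive Bayes text classifier in Python. It is not Maron’s exact procedure: the training texts and category names are invented, and the add-one smoothing is a common textbook choice assumed here for illustration. The point is only to show how words can act as clues that shift the odds toward one category or another.

```python
import math
from collections import Counter

# Hypothetical training data: (category, text) pairs invented for illustration.
TRAINING = [
    ("spam", "win cash prize now"),
    ("spam", "cash prize waiting"),
    ("ham", "meeting notes attached"),
    ("ham", "lecture notes and reading"),
]


def train(examples):
    """Count documents per category and word occurrences per category."""
    doc_counts = Counter()
    word_counts = {}
    vocabulary = set()
    for category, text in examples:
        doc_counts[category] += 1
        counts = word_counts.setdefault(category, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocabulary.add(word)
    return doc_counts, word_counts, vocabulary


def classify(text, doc_counts, word_counts, vocabulary):
    """Return the category with the highest log posterior score."""
    total_docs = sum(doc_counts.values())
    scores = {}
    for category in doc_counts:
        # Prior: the share of training documents in this category.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen word/category pairs above zero.
            count = word_counts[category][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[category] = score
    return max(scores, key=scores.get)


model = train(TRAINING)
print(classify("cash prize", *model))     # spam
print(classify("reading notes", *model))  # ham
```

The classifier never “understands” the texts. It only counts which words have tended to appear in which categories, which is worth keeping in mind when we start scrutinizing such systems.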

November 13
Evaluating selection systems

A selection system is any system that selects things from a larger collection. Selection systems may involve varying amounts of automated labor. Deciding whether and how to automate requires some way to evaluate the effects of automation on the quality of the selection system.

📖 To read before this meeting:

Buckland, Michael. “Evaluation of Selection Methods.” In Information and Society, 153–164. MIT Press, 2017. PDF.
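
As a preview, here is a minimal sketch in Python of two measures commonly used to evaluate selection systems, precision and recall. I’m assuming these two measures for illustration, and the document IDs and relevance judgments are invented. Precision is the share of selected items that were relevant. Recall is the share of relevant items that were selected.

```python
def precision_and_recall(selected, relevant):
    """Compare what a system selected against what was actually relevant."""
    selected, relevant = set(selected), set(relevant)
    hits = selected & relevant
    # Guard against dividing by zero when either set is empty.
    precision = len(hits) / len(selected) if selected else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: document IDs invented for illustration.
selected_by_system = {1, 2, 3, 4}
judged_relevant = {2, 4, 5}

precision, recall = precision_and_recall(selected_by_system, judged_relevant)
print(precision)         # 0.5
print(round(recall, 2))  # 0.67
```

Both numbers depend on someone having judged which items are relevant, so the evaluation is only as good as those judgments.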

November 13

Take-home exam 2 due

November 15

Ryan at UCLA

November 20
Scrutinizing selection systems

Total amount of required reading for this meeting: 10,300 words

For the remainder of the semester we’ll be “scrutinizing” some of the selection systems currently organizing us. Bernhard Rieder gets us started by scrutinizing Maron’s Bayes classifier.

📖 To read before this meeting:

Rieder, Bernhard. “Scrutinizing an Algorithmic Technique: The Bayes Classifier as Interested Reading of Reality.” Information, Communication & Society 20, no. 1 (January 2, 2017): 100–117. https://doi.org/10.1080/1369118X.2016.1181195.

10,300 words

November 22

Thanksgiving

November 27
Scrutinizing recommendation systems

What are the consequences of the shift from 1) information systems that allow us to precisely specify the properties of the things we seek, to 2) information systems that attempt to anticipate our needs or desires and recommend things to us? If a YouTube video, a search result, a fashion brand, a scientific paper, or a restaurant that people discover via a recommendation service becomes popular and successful, is it because that video, result, brand, paper, or restaurant is of high quality, or is it perhaps due in part to the way the recommendation service works? Sociologists Matthew Salganik and Duncan Watts sought to investigate this question by building their own streaming music service.

📖 To read before this meeting:

Salganik, Matthew J., and Duncan J. Watts. “Leading the Herd Astray: An Experimental Study of Self-Fulfilling Prophecies in an Artificial Cultural Market.” Social Psychology Quarterly 71, no. 4 (December 1, 2008): 338–55. https://doi.org/10.1177/019027250807100404.

November 29
Scrutinizing large-scale recommendation systems

All selection systems, including recommendation systems, organize how we think about the things selected or recommended. But only a few selection systems become so large-scale that they begin to organize the production of those things. One of these few is YouTube.

📖 To read before this meeting:

Bridle, James. “Something Is Wrong on the Internet.” James Bridle, November 6, 2017. https://medium.com/@jamesbridle/something-is-wrong-on-the-internet-c39c471271d2.

December 4
Scrutinizing prediction systems

Recommendation can be viewed as a prediction problem: the system is trying to predict whether a given user will like, or find relevant, or be satisfied with some thing. If it predicts that the user will like it, or find it relevant, or be satisfied with it, then it recommends it to the user; otherwise it does not.

Thus the same techniques used for automatic recommendation can be used to automate other kinds of decisions, such as whether to grant bail to a criminal defendant.

📖 To read before this meeting:

Angwin, Julia, Jeff Larson, Surya Mattu, and Lauren Kirchner. “Machine Bias.” ProPublica, May 23, 2016. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.
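
The “predict, then decide” rule described above can be sketched in a few lines of Python. The item names, the predicted scores, and the 0.5 threshold are all invented for illustration. In a real system the scores would come from a statistical model, and choosing the threshold is itself a consequential decision.

```python
def recommend(predicted_scores, threshold=0.5):
    """Recommend every item whose predicted score meets the threshold."""
    return [item for item, score in predicted_scores.items() if score >= threshold]


# Hypothetical predicted probabilities that a user will like each item.
predictions = {"video_a": 0.9, "video_b": 0.2, "video_c": 0.6}

print(recommend(predictions))                  # ['video_a', 'video_c']
print(recommend(predictions, threshold=0.95))  # []
```

Swap “will like this video” for “will miss a court date” and the same rule becomes a bail decision, which is the situation today’s reading examines.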

December 11

Take-home exam 3 due
