Foundations of Information Science

UNC School of Information and Library Science, INLS 201, Spring 2018

January 11
Introduction

Today we’ll meet each other, and I’ll explain the plan for the class and how to use the course website. Finally we’ll try out our federated wiki.

If you feel like it, check out the federated wiki videos.

January 16
Genres of information

This class is about the study, or science, of information. OK, but what is information? We hear the word a lot, but it’s surprisingly hard to pin down what it means. For today we’ll read an article that attempts to explain why, written by the computer-scientist-turned-information-scholar Philip Agre. Agre is an advocate of what he calls “critical technical practice,” which he suggests requires cultivating a “split identity” as both a problem-solving engineer and problem-finding critic. In this article, Agre brings that technically-informed critical perspective to bear on the idea of “information.”

To read before this class:

  1. Agre, Philip E. “Institutional Circuitry: Thinking about the Forms and Uses of Information.” Information Technology and Libraries 14, no. 4 (December 1995): 225. https://search.proquest.com/docview/215834010/abstract/D4ABCDE862CC4B56PQ/2.

January 18
Document society

Our lives and our societies are structured by and constituted through documents. We’ll look at some examples.

Today’s reading is the first chapter of Michael Buckland’s book on Information and Society. Buckland is a professor at the Berkeley School of Information, and he was my doctoral advisor.

Optional, but highly recommended, is an excerpt from Alva Noë’s book Strange Tools: Art and Human Nature about how playing baseball requires documents. Noë is a philosopher, also at Berkeley, who writes about human consciousness, neuroscience, and art.

To read before this class:

  1. Buckland, Michael. “Introduction.” In Information and Society, 1–19. MIT Press, 2017. PDF.

  2. Noë, Alva. “Art Loops and the Garden of Eden.” In Strange Tools, 29–48. New York: Hill and Wang, a division of Farrar, Straus and Giroux, 2015. PDF.

January 23
Thinking with our eyes and hands

For today we’ll read an article by Bruno Latour, a French philosopher, anthropologist and sociologist. Latour wrote this article to persuade his colleagues in the social sciences that they need to pay more attention to documents and processes of documentation.

This is the first of our more difficult readings, which will mostly be assigned for Tuesdays, giving you five days to read them. On the Thursdays before, I will give you some tips for reading these slightly more difficult texts.

To read before this class:

  1. Latour, Bruno. “Visualisation and Cognition: Thinking with Eyes and Hands.” Knowledge and Society: Studies in the Sociology of Culture Past and Present 6 (1986): 1–40. PDF.

    This is a long article, and can be difficult to read in parts. If you’re struggling, focus on sections IIIV, especially Section IV where he summarizes his arguments.

January 25
Information theory

As we began to communicate through wires and over radio waves, engineers sought to understand and describe how this communication happens, in order to design better communication systems. Claude Shannon, an engineer who worked at Bell Labs, developed an influential theory that came to be known as “information theory.” Today we’ll investigate some of the phenomena he described.

Before class you should read the excerpt from Edgar Allan Poe’s The Gold-Bug, and optionally you may also read a short historical account of the development of Shannon’s theory by science writer James Gleick.

To read before this class:

  1. Poe, Edgar Allan. “The Cryptograph / The Solution Begun / The Cipher Read.” In The Gold Bug. Chicago, New York [etc.] Rand, McNally & Company, 1902. http://archive.org/details/goldbug00poee_1. PDF.

  2. Gleick, James. “Information Theory.” In The Information, 1st ed., 204–32. New York: Pantheon Books, 2011. PDF.
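
Poe’s narrator begins cracking the cipher by counting how often each symbol occurs, and Shannon’s measure of information (entropy) is computed from the same kind of counts. The following is a minimal sketch of both steps, not something taken from either reading. The function names and the sample sentence are chosen for illustration only.

```python
import math
from collections import Counter

def letter_frequencies(text):
    """Count how often each letter occurs, ignoring case and non-letters."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    return Counter(letters)

def entropy_bits(counts):
    """Shannon entropy, in bits per symbol, of a table of symbol counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# A sample sentence, used here only as illustrative input.
sample = "a good glass in the bishop's hostel in the devil's seat"
freqs = letter_frequencies(sample)

print(freqs.most_common(3))            # the three most frequent letters
print(round(entropy_bits(freqs), 2))   # average bits per letter in the sample
```

A text in which every letter is equally likely has the highest possible entropy for its alphabet, while a text that repeats a single letter has an entropy of zero.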

January 30
Meaning, signs and codes

Another approach to understanding communication through documents (in addition to Shannon’s theory) is to focus on “signs,” the organization of signs into codes or languages, and the cultures within which signs and codes operate. This approach is known as semiotics. Media scholar John Fiske provides a good basic explanation of what semiotics is and how it differs from information theory.

To read before this class:

  1. Fiske, John. “Communication Theory / Meanings, Signs, and Codes.” In Introduction to Communication Studies, 2nd ed., 6–12, 39–46, 56–58, 64–65. London ; New York: Routledge, 1990. PDF.

February 1
Understanding graphics and images

Semiotics, the study of signs, isn’t limited to texts: we can also use it to describe how we understand graphics and images. Cartoonist Scott McCloud shows how.

To read before this class:

  1. McCloud, Scott. “The Vocabulary of Comics.” In Understanding Comics, 1st HarperPerennial ed., 24–59. New York: HarperPerennial, 1994. PDF.

February 6
Making distinctions

Until now we’ve mainly focused on documents and the marks on them, and how we understand and interpret those marks. This week we change our focus a bit, to look at how our understanding of the world is structured.

We begin with some excerpts from a book by Eviatar Zerubavel about how we categorize and classify the world around us. Zerubavel is a cognitive sociologist, meaning that he studies how social processes shape our thinking, and he’s written a number of fascinating and accessible books on the topic.

To read before this class:

  1. Zerubavel, Eviatar. “Introduction / Islands of Meaning / The Great Divide / The Social Lens.” In The Fine Line, 1–17, 21–24, 61–80. New York: Free Press, 1991. PDF.

February 8
Classification in everyday life

We all categorize and classify all the time, but we don’t always do it intentionally and systematically. Today we’ll try out a form of systematic classification known as faceted classification.

To read before this class:

  1. Hunter, Eric. “What Is Classification? / Classification in an Information System / Faceted Classification.” In Classification Made Simple, 3rd ed. Farnham: Ashgate, 2009. PDF.
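
To give a rough sense of what faceted classification looks like in practice, here is a small sketch in which each item is described by a value for each of several independent facets, and items are retrieved by combining facet values. The facets, items, and function name are hypothetical and are not drawn from Hunter’s book.

```python
# Hypothetical items, each described along three independent facets.
items = {
    "red wool sweater":   {"material": "wool",   "color": "red",  "garment": "sweater"},
    "blue wool scarf":    {"material": "wool",   "color": "blue", "garment": "scarf"},
    "red cotton shirt":   {"material": "cotton", "color": "red",  "garment": "shirt"},
    "blue cotton sweater": {"material": "cotton", "color": "blue", "garment": "sweater"},
}

def find(items, **wanted):
    """Return the names of items whose facets match every requested value."""
    return sorted(
        name
        for name, facets in items.items()
        if all(facets.get(facet) == value for facet, value in wanted.items())
    )

print(find(items, material="wool"))                  # everything made of wool
print(find(items, color="red", garment="sweater"))   # combine two facets
```

The point of the design is that no single hierarchy has to be chosen in advance: any facet, or any combination of facets, can serve as the way in.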

February 13
Scientific classification

Most of us would readily agree that our everyday “folk” classifications are historically contingent and somewhat arbitrary. Yet scientific classification presumably is different: science is the study of reality, and so scientific classifications are “real” in a way that other classifications are not. Today we’ll discuss the extent to which this is true.

The required reading is by Lorraine Daston, a historian of science. She traces the history of scientists’ attempts to classify clouds.

Optionally, you may also read a short (1.5 pages) article on scientific classification by the philosopher of science John Dupré.

To read before this class:

  1. Daston, Lorraine. “Cloud Physiognomy.” Representations 135, no. 1 (August 1, 2016): 45–71. https://doi.org/10.1525/rep.2016.135.1.45.

  2. Dupré, John. “Scientific Classification.” Theory, Culture & Society 23, no. 2–3 (May 1, 2006): 30–32. https://doi.org/10.1177/026327640602300201.

February 15
Naming

We can’t talk or write about things or kinds of things without giving them names. Unfortunately naming isn’t as easy as it sometimes may seem. Today we’ll investigate the difficulties of agreeing on names.

The required reading is another chapter from Buckland’s Information and Society, this time on the topic of naming.

If you have time, I also highly recommend the second book chapter on naming, by Bill Kent. Kent was a computer programmer and database designer at IBM and Hewlett-Packard, during the era when the database technologies we use today were first being developed. He thought deeply and carefully about the challenges of data management, which he recognized were not primarily technical challenges.

To read before this class:

  1. Buckland, Michael. “Naming.” In Information and Society, 89–110. MIT Press, 2017. PDF.

  2. Kent, William. “Naming.” In Data and Reality, 41–61. Amsterdam: North-Holland, 1978. PDF.

February 20
Automation

The past couple of weeks we’ve looked at how people categorize, classify, and name things of interest. As we’ve seen, this can be hard work, and like other kinds of hard work, people have sought to escape it through automation.

To what extent can the organization of information be automated? Information scholar Julian Warner looks at this question by drawing a distinction between different kinds of semiotic labor.

To read before this class:

  1. Warner, Julian. “Forms of Labour in Information Systems.” Information Research 7, no. 4 (2002). http://www.informationr.net/ir/7-4/paper135.html.

February 22
Computation

People were building systems to automate information organization and retrieval long before the invention of the computer, but the digital computer made possible many techniques that were previously unfeasible. The invention of computing also gave birth to a theory of computation, which gives us a mathematical framework for characterizing and measuring syntactic labor. Today we’ll look at one of the earliest computational techniques to be applied to information organization: Boolean logic.

To read before this class:

  1. Hillis, W. “Nuts and Bolts / Universal Building Blocks.” In The Pattern on the Stone, 1–38. New York: Basic Books, 1998. PDF.

February 27
The logic of distinctions and sets

Boolean logic (and ultimately, set theory) is the mathematical formalization upon which many of the techniques of information organization are built. In 1937 Edmund Berkeley, a mathematician working at the Prudential life insurance company, recognized the usefulness of Boolean logic for modeling insurance data—even though at the time there were no digital computers to assist with the calculations, only punched card tabulators.

Berkeley would later go on to be a pioneer of computer science, co-founding the Association for Computing Machinery, which is still the primary scholarly association for computer scientists.

To read before this class:

  1. Berkeley, Edmund C. “Boolean Algebra (the Technique for Manipulating AND, OR, NOT and Conditions).” The Record 26 part II, no. 54 (1937): 373–414. PDF.
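
The correspondence between Boolean logic and set theory can be sketched with a few sets of policyholders: AND corresponds to intersection, OR to union, and NOT to the complement within the full set. The names and categories below are invented for illustration and are not Berkeley’s own examples.

```python
# Hypothetical insurance data: the full set of policyholders and two conditions.
policyholders = {"Ames", "Baker", "Chen", "Diaz", "Evans"}
smokers = {"Baker", "Diaz"}
over_50 = {"Ames", "Baker", "Evans"}

# AND: policyholders who satisfy both conditions (set intersection).
smokers_and_over_50 = smokers & over_50

# OR: policyholders who satisfy at least one condition (set union).
smokers_or_over_50 = smokers | over_50

# NOT: policyholders who do not satisfy a condition (complement).
non_smokers = policyholders - smokers

# De Morgan's law: NOT (A OR B) equals (NOT A) AND (NOT B).
neither = policyholders - (smokers | over_50)
same_thing = (policyholders - smokers) & (policyholders - over_50)

print(sorted(smokers_and_over_50))
print(sorted(smokers_or_over_50))
print(sorted(non_smokers))
print(neither == same_thing)
```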

March 1
Ryan is at US2TS

No class.

March 6
Two minute madness

Assignment #1: Midterm class presentation due

Assignment #2: Midterm paper due

Today your midterm papers are due, and each of you will give a two-minute, one-slide presentation briefly explaining the topic of your paper.

March 8
Midterm exam

The midterm exam will be given in class, and it will cover all the concepts we’ve discussed so far.

March 13
Spring break

No class.

March 15
Spring break

No class.

March 20
Correctness

In computer science, correctness refers to the degree of correspondence between what a computer program actually does, and what it is supposed to do. A “correct” program is one that does what it is supposed to. But what is a computer program “supposed” to do? It may be relatively straightforward to check that a program is correct with respect to a formal model or specification—but there is still the problem of whether that formal model corresponds with the understandings of reality that the program’s designers and users have. Philosopher and computer scientist Brian Cantwell Smith considers these issues in a paper presented to International Physicians for the Prevention of Nuclear War.

To read before this class:

  1. Smith, Brian Cantwell. “The Limits of Correctness.” In Symposium on Unintentional Nuclear War, Fifth Congress of the International Physicians for the Prevention of Nuclear War. Budapest, 1985. PDF.

March 22
Statistical models

Information science took a major turn when the designers of information retrieval systems for the military and weapons manufacturers began to explore how to automatically classify and index texts. These explorations led to a new form of modeling: the statistical modeling of language. Once we had the ability to create texts digitally and to digitize existing texts, we could use these texts to build statistical language models, a process that was greatly accelerated by the advent of the World Wide Web, which made the collection of large numbers of texts much easier than it had been before.

Text just happened to be one of the first kinds of data that we were able to collect large amounts of. But the same techniques used to statistically model language can also be used to model other phenomena—provided that one can collect large amounts of data generated by these other phenomena. Once people began using the Web for all kinds of things beyond publishing texts, these other kinds of data suddenly became available, opening the door to statistical modeling of nearly everything. Data scientist Cathy O’Neil gives an account of our present-day modeling fever.

To read before this class:

  1. O’Neil, Cathy. “Bomb Parts: What Is a Model?” In Weapons of Math Destruction, 15–31. New York: Crown, 2016. PDF.

March 27
Modeling text for computation

Computationally analyzing text first requires representing the text in a form that can be computationally manipulated. This form is quite different from the forms we are used to interpreting as readers.

To read before this class:

  1. Manning, Christopher, Prabhakar Raghavan, and Hinrich Schütze. “Boolean Retrieval / The Term Vocabulary and Postings Lists.” In Introduction to Information Retrieval, 1–34. New York: Cambridge University Press, 2008.
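
One common computable representation of text is the inverted index, which maps each term to the list of documents that contain it (its postings list). The sketch below builds such an index and answers a Boolean AND query. It is a simplified illustration under stated assumptions: the three documents and the very crude tokenizer are made up, and the function names are not from the textbook.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to a sorted list of the IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search_and(index, *terms):
    """Return IDs of documents that contain every one of the given terms."""
    postings = [set(index.get(term.lower(), [])) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []

# Hypothetical documents, invented for illustration.
docs = {
    1: "Clouds are classified by shape",
    2: "Documents are classified by subject",
    3: "Clouds drift over the library",
}
index = build_index(docs)

print(index["clouds"])                            # postings list for one term
print(search_and(index, "clouds", "classified"))  # documents containing both terms
```

Note how much is thrown away: word order, punctuation, and capitalization are all gone, leaving only which terms occur in which documents.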

March 29
Probability and inductive logic

Statistics is hard. Most people don’t intuitively understand probability, including me, and including the vast majority of scientists who rely on statistical methods. So today we’ll review some of the basics, so we know just enough to be dangerous.

To read before this class:

  1. Hacking, Ian. An Introduction to Probability and Inductive Logic. Cambridge: Cambridge University Press, 2001. PDF.

April 3
Automatically classifying text

The shift to statistical modeling in information science can be traced to the work of Bill Maron. Maron was an engineer at missile manufacturer Ramo-Wooldridge when he began investigating statistical methods for classifying and retrieving documents. For today we’ll read a classic paper of Maron’s in which he develops the basic ideas behind the Bayesian classifier, a technique that is still widely used today for a variety of automatic classification tasks from spam filtering to face recognition.

To read before this class:

  1. Maron, M. E. “Automatic Indexing: An Experimental Inquiry.” Journal of the ACM 8, no. 3 (July 1961): 404–17. https://doi.org/10.1145/321075.321084.
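
To make the idea of a Bayesian classifier more concrete, here is a minimal sketch of a naive Bayes text classifier applied to a toy spam-filtering task. It is not Maron’s own procedure: it assumes add-one smoothing and a tiny made-up training set, and all of the names are chosen for illustration.

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def classify(model, words):
    """Return the class with the highest (log) posterior score for the words."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Start from the prior: how common is this class overall?
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in words:
            if word in vocab:  # words never seen in training are ignored
                # Add-one smoothing keeps unseen word/class pairs from scoring zero.
                likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
                score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data, invented for illustration.
training = [
    ("win cash prize now".split(), "spam"),
    ("cheap prize offer now".split(), "spam"),
    ("reading for class tomorrow".split(), "ham"),
    ("midterm paper due tomorrow".split(), "ham"),
]
model = train(training)

print(classify(model, "cash prize offer".split()))
print(classify(model, "paper due tomorrow".split()))
```

The “naive” part is the assumption that each word contributes to the score independently of the others, which is rarely true of real language but often works well enough for classification.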

April 5
Ryan was sick

No class.

April 10
Modeling topics

Topic modeling is a technique for classifying text that does not require one to specify a set of categories ahead of time. For that reason it has become particularly popular among humanities scholars and social scientists interested in exploring large collections of text, such as archival collections or social media platforms. Today we’ll try out some simple topic models.

To read before this class:

  1. Underwood, Ted. “Topic Modeling Made Just Simple Enough.” The Stone and the Shell, April 7, 2012. https://tedunderwood.com/2012/04/07/topic-modeling-made-just-simple-enough/.

April 12
Modeling everything

Once a technique for statistical modeling has been developed, it can usually be applied to problems other than those for which it was initially developed. Thus topic modeling, initially developed for the unsupervised classification of text, is easily modified to classify other things like people and organizations.

For today, please read chapter 1 of Applications of Topic Models, “The What and Wherefore of Topic Models.” In addition, please skim one of the following chapters to get a sense of how topic modeling gets used: “Historical Documents,” “Understanding Scientific Publications,” “Fiction and Literature,” or “Computational Social Science.”

To read before this class:

  1. Boyd-Graber, Jordan, Yuening Hu, and David Mimno. “Applications of Topic Models.” Foundations and Trends in Information Retrieval 11, no. 2–3 (July 20, 2017): 143–296. https://doi.org/10.1561/1500000030.

April 17
The impact of recommendation

What are the consequences of the shift from 1) information systems that allow us to precisely specify the properties of the things we seek, to 2) information systems that attempt to anticipate our needs or desires and recommend things to us? If a YouTube video, a search result, a fashion brand, a scientific paper, or a restaurant that people discover via a recommendation service becomes popular and successful, is it because that video, result, brand, paper, or restaurant is of high quality, or is it perhaps due in part to the way the recommendation service works? Sociologists Matthew Salganik and Duncan Watts sought to investigate this question by building their own streaming music service.

To read before this class:

  1. Salganik, Matthew J., and Duncan J. Watts. “Leading the Herd Astray: An Experimental Study of Self-Fulfilling Prophecies in an Artificial Cultural Market.” Social Psychology Quarterly 71, no. 4 (December 1, 2008): 338–55. https://doi.org/10.1177/019027250807100404.

April 19
Gaming recommendations

There is reason to believe that recommendation services which rely on historical data are biased toward popular items, creating a “rich-get-richer” effect. This can also result in an overall homogenization of consumption—less overall diversity in what people read, watch, buy, eat, etc. This can be true even if individuals find that their use of recommendation services is introducing them to new things!
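
The “rich-get-richer” dynamic can be sketched with a toy simulation in which each new listener picks an item with probability proportional to how often it has already been picked. This is an assumed, highly simplified model of a popularity-based recommender, not a description of any real service. The function name, parameters, and default values are all hypothetical.

```python
import random

def simulate(n_items=5, n_listeners=1000, seed=0):
    """Simulate listeners who choose items in proportion to past popularity."""
    rng = random.Random(seed)   # seeded so that a run can be repeated exactly
    plays = [1] * n_items       # every item starts out equally popular
    for _ in range(n_listeners):
        # The more plays an item already has, the more likely it is chosen.
        chosen = rng.choices(range(n_items), weights=plays)[0]
        plays[chosen] += 1
    return plays

print(simulate(seed=0))
print(simulate(seed=1))
```

Runs of this kind typically end with an uneven split of plays even though the items start out identical, and a different seed typically puts different items ahead. In this model, early luck rather than any difference in quality determines which items end up popular.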

But a separate issue is that recommendation services which rely on historical data may be fooled into believing that unpopular items are actually popular. In other words, the services can be “gamed” by small groups who are strongly motivated to make something seem popular, in the hopes that this will become a self-fulfilling prophecy.

To read before this class:

  1. Butler, Oobah. “I Made My Shed the Top Rated Restaurant On TripAdvisor.” Vice, December 6, 2017. https://www.vice.com/en_uk/article/434gqw/i-made-my-shed-the-top-rated-restaurant-on-tripadvisor.

April 24
Human decisions and machine predictions

The powerful techniques that information scientists developed for classifying and ranking texts are now being applied to every aspect of our lives. What effects is this having? How can we determine whether information technologies are aiding our decision-making or harming it? Judges make high-impact, life-altering, and world-altering decisions daily. One such decision is whether to grant bail to persons accused of crimes. What is the potential impact of judges being guided in these decisions by algorithms trained on historical data?

To read before this class:

  1. Kleinberg, Jon, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan. “Human Decisions and Machine Predictions.” Working Paper. National Bureau of Economic Research, February 2017. https://doi.org/10.3386/w23180.

April 26
Looking back / looking ahead

Assignment #3: Final paper due

Today your final papers are due. We’ll review the ground we covered this semester and look ahead to more advanced information science classes, and information science careers.

May 7
Final exam

The final exam is scheduled for 12 noon on Monday, May 7. It will cover all the concepts from this course.