Foundations of Information Science

UNC SILS, INLS 201, Spring 2018

January 11
Introduction

View slides Updated Sunday 12/22 12:20 PM

Today we’ll meet each other, and I’ll explain the plan for the class and how to use the course website. Finally we’ll try out our federated wiki.

If you feel like it, check out the federated wiki videos.

January 16
Genres of information

View slides Updated Sunday 12/22 12:20 PM

Total amount of required reading for this meeting: 4,600 words

This class is about the study, or science, of information. OK, but what is information? We hear the word a lot, but it’s surprisingly hard to pin down what it means. For today we’ll read an article that attempts to explain why, written by the computer-scientist-turned-information-scholar Philip Agre. Agre is an advocate of what he calls “critical technical practice,” which he suggests requires cultivating a “split identity” as both a problem-solving engineer and problem-finding critic. In this article, Agre brings that technically-informed critical perspective to bear on the idea of “information.”

📖 To read before this meeting:

  1. Agre, Philip E. “Institutional Circuitry: Thinking about the Forms and Uses of Information.” Information Technology and Libraries 14, no. 4 (December 1995): 225. PDF.
    4,600 words

January 18
Document society

Total amount of required reading for this meeting: 3,800 words

Our lives and our societies are structured by and constituted through documents. We’ll look at some examples.

Today’s reading is the first chapter of Michael Buckland’s book on Information and Society. Buckland is a professor at the Berkeley School of Information, and he was my doctoral advisor.

Optional, but highly recommended, is an excerpt from Alva Noë’s book Strange Tools: Art and Human Nature about how playing baseball requires documents. Noë is a philosopher, also at Berkeley, who writes about human consciousness, neuroscience, and art.

📖 To read before this meeting:

  1. Buckland, Michael. “Introduction.” In Information and Society, 1–19. MIT Press, 2017. PDF.
    3,800 words
  2. Noë, Alva. “Art Loops and the Garden of Eden.” In Strange Tools, 29–48. New York: Hill and Wang, a division of Farrar, Straus and Giroux, 2015. PDF.

January 23
Thinking with our eyes and hands

View slides Updated Sunday 12/22 12:20 PM

Total amount of required reading for this meeting: 9,200 words

For today we’ll read an article by Bruno Latour, a French philosopher, anthropologist and sociologist. Latour wrote this article to persuade his colleagues in the social sciences that they need to pay more attention to documents and processes of documentation.

This is the first of our more difficult readings, which will mostly be assigned for Tuesdays, giving you five days to read them. On the Thursdays before, I will give you some tips for reading these slightly more difficult texts.

📖 To read before this meeting:

  1. Latour, Bruno. “Visualisation and Cognition: Thinking with Eyes and Hands.” Knowledge and Society: Studies in the Sociology of Culture Past and Present 6 (1986): 1–40. PDF.
    9,200 words
    Reading tips

    Latour uses some unusual terminology in this article. He refers to documents as inscriptions and practices of documentation as inscription procedures. He also refers to documents as immutable mobiles, highlighting what he considers to be two of their most important qualities: immutability and mobility.

    Latour is interested in the relationship between practices of documentation and thinking (cognition). His basic argument is that what may seem like great advances in thought are actually better understood as the emergence of new practices of documentation. Latour focuses primarily on documents as aids to visualization rather than as carriers of information. Thus he begins by discussing the emergence of new visualization techniques, such as linear perspective.

January 25
Information theory

View slides Updated Sunday 12/22 12:20 PM

Total amount of required reading for this meeting: 9,100 words

As we began to communicate through wires and over radio waves, engineers sought to understand and describe how it happens, in order to design better communication systems. Claude Shannon, an engineer who worked at Bell Labs, developed an influential theory that came to be known as “information theory.” Today we’ll investigate some of the phenomena he described.

Before class you should read the excerpt from Edgar Allan Poe’s The Gold-Bug, and optionally you may also read a short historical account of the development of Shannon’s theory by science writer James Gleick.
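One quantity at the center of Shannon's theory is entropy, the average information per symbol of a message, which can be estimated from how often each symbol occurs. The sketch below is only an illustration and is not taken from the course materials; the function name `entropy_bits` and the sample messages are assumptions made for the example.

```python
import math
from collections import Counter


def entropy_bits(message: str) -> float:
    """Estimate Shannon entropy (bits per symbol) from observed symbol frequencies."""
    counts = Counter(message)
    total = len(message)
    # H = -sum(p * log2(p)) over the symbols that actually occur in the message.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# A message with four equally likely symbols carries more information per
# symbol than a message that alternates between only two symbols.
two_symbols = entropy_bits("abab")
four_symbols = entropy_bits("abcd")
print(two_symbols, four_symbols)
```

Uneven letter frequencies lower the entropy of a text, and the same unevenness is what makes frequency analysis of a simple substitution cipher possible.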

📖 To read before this meeting:

  1. Poe, Edgar Allan. “The Cryptograph / The Solution Begun / The Cipher Read.” In The Gold Bug. Chicago, New York [etc.] Rand, McNally & Company, 1902. http://archive.org/details/goldbug00poee_1. PDF.
  2. Gleick, James. “Information Theory.” In The Information, 1st ed., 204–232. New York: Pantheon Books, 2011. PDF.
    9,100 words
    Reading tips

    This chapter from science writer James Gleick’s book The Information is an engaging mini-biography of Claude Shannon, but it is also an accessible introduction to information theory.

January 30
Meaning, signs and codes

View slides Updated Sunday 12/22 12:20 PM

Another approach to understanding communication through documents (in addition to Shannon’s theory) is to focus on “signs,” the organization of signs into codes or languages, and the cultures within which signs and codes operate. This approach is known as semiotics. Media scholar John Fiske provides a good basic explanation of what semiotics is and how it differs from information theory.

📖 To read before this meeting:

  1. Fiske, John. “Communication Theory / Meanings, Signs, and Codes.” In Introduction to Communication Studies, 2nd ed., 6–12, 39–46, 56–58, 64–65. London ; New York: Routledge, 1990. PDF.

February 1
Understanding graphics and images

View slides Updated Sunday 12/22 12:20 PM

Semiotics, the study of signs, isn’t limited to texts: we can also use it to describe how we understand graphics and images. Cartoonist Scott McCloud shows how.

📖 To read before this meeting:

  1. McCloud, Scott. “The Vocabulary of Comics.” In Understanding Comics, 1st HarperPerennial ed., 24–59. New York: HarperPerennial, 1994. PDF.

February 6
Making distinctions

View slides Updated Sunday 12/22 12:20 PM

Total amount of required reading for this meeting: 16,900 words

Until now we’ve mainly focused on documents and the marks on them, and how we understand and interpret those marks. This week we change our focus a bit, to look at how our understanding of the world is structured.

We begin with some excerpts from a book by Eviatar Zerubavel about how we categorize and classify the world around us. Zerubavel is a cognitive sociologist, meaning that he studies how social processes shape our thinking, and he’s written a number of fascinating and accessible books on the topic.

📖 To read before this meeting:

  1. Zerubavel, Eviatar. “Introduction / Islands of Meaning / The Great Divide / The Social Lens.” In The Fine Line, 1–17, 21–24, 61–80. New York: Free Press, 1991. PDF.
    16,900 words
    Reading tips

    These are selections from Zerubavel’s book The Fine Line, about making distinctions in everyday life.

February 8
Classification in everyday life

View slides Updated Sunday 12/22 12:20 PM

Total amount of required reading for this meeting: 5,600 words

We all categorize and classify all the time, but we don’t always do it intentionally and systematically. Today we’ll try out a form of systematic classification known as faceted classification.
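As a rough sketch of the idea behind faceted classification, each item can be described independently along several facets, and items are then selected by combining facet values instead of being filed under a single heading. The facets (`course`, `cuisine`, `diet`), the sample items, and the helper `select` are all hypothetical and chosen only for illustration.

```python
# Hypothetical items, each described along three independent facets.
recipes = [
    {"name": "lentil soup", "course": "main", "cuisine": "Indian", "diet": "vegan"},
    {"name": "mango lassi", "course": "drink", "cuisine": "Indian", "diet": "vegetarian"},
    {"name": "minestrone", "course": "main", "cuisine": "Italian", "diet": "vegan"},
    {"name": "carbonara", "course": "main", "cuisine": "Italian", "diet": "omnivore"},
]


def select(items, **facets):
    """Return the names of items whose facet values match every requested facet."""
    return [
        item["name"]
        for item in items
        if all(item.get(facet) == value for facet, value in facets.items())
    ]


# Any combination of facets can be used as an entry point into the collection.
vegan_mains = select(recipes, diet="vegan", course="main")
italian = select(recipes, cuisine="Italian")
print(vegan_mains, italian)
```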

📖 To read before this meeting:

  1. Hunter, Eric. “What Is Classification? / Classification in an Information System / Faceted Classification.” In Classification Made Simple, 3rd ed. Farnham: Ashgate, 2009. PDF.
    5,600 words

February 13
Scientific classification

View slides Updated Sunday 12/22 12:20 PM

Total amount of required reading for this meeting: 11,300 words

Most of us would readily agree that our everyday “folk” classifications are historically contingent and somewhat arbitrary. Yet scientific classification presumably is different: science is the study of reality, and so scientific classifications are “real” in a way that other classifications are not. Today we’ll discuss the extent to which this is true.

The required reading is by Lorraine Daston, a historian of science. She traces the history of scientists’ attempts to classify clouds.

Optionally, you may also read a short (1.5 pages) article on scientific classification by the philosopher of science John Dupré.

📖 To read before this meeting:

  1. Daston, Lorraine. “Cloud Physiognomy.” Representations 135, no. 1 (August 1, 2016): 45–71. https://doi.org/10.1525/rep.2016.135.1.45.
    10,100 words
  2. Dupré, John. “Scientific Classification.” Theory, Culture & Society 23, no. 2–3 (May 1, 2006): 30–32. PDF.
    1,200 words

February 15
Naming

View slides Updated Sunday 12/22 12:20 PM

We can’t talk or write about things or kinds of things without giving them names. Unfortunately naming isn’t as easy as it sometimes may seem. Today we’ll investigate the difficulties of agreeing on names.

The required reading is another chapter from Buckland’s Information and Society, this time on the topic of naming.

If you have time, I also highly recommend the second reading, a book chapter on naming by Bill Kent. Kent was a computer programmer and database designer at IBM and Hewlett-Packard, during the era when the database technologies we use today were first being developed. He thought deeply and carefully about the challenges of data management, which he recognized were not primarily technical challenges.

📖 To read before this meeting:

  1. Buckland, Michael. “Naming.” In Information and Society, 89–110. MIT Press, 2017. PDF.
  2. Kent, William. “Naming.” In Data and Reality, 41–61. Amsterdam: North-Holland, 1978. PDF.

February 20
Automation

View slides Updated Sunday 12/22 12:20 PM

The past couple of weeks we’ve looked at how people categorize, classify, and name things of interest. As we’ve seen, this can be hard work, and like other kinds of hard work, people have sought to escape it through automation.

To what extent can the organization of information be automated? Information scholar Julian Warner looks at this question by drawing a distinction between different kinds of semiotic labor.

📖 To read before this meeting:

  1. Warner, Julian. “Forms of Labour in Information Systems.” Information Research 7, no. 4 (2002). http://www.informationr.net/ir/7-4/paper135.html.

February 22
Computation

View slides Updated Sunday 12/22 12:20 PM

People were building systems to automate information organization and retrieval long before the invention of the computer, but the digital computer made possible many techniques that were previously unfeasible. The invention of computing also gave birth to a theory of computation, which gives us a mathematical framework for characterizing and measuring syntactic labor. Today we’ll look at one of the earliest computational techniques to be applied to information organization: Boolean logic.

📖 To read before this meeting:

  1. Hillis, W. “Nuts and Bolts / Universal Building Blocks.” In The Pattern on the Stone, 1–38. New York: Basic Books, 1998. PDF.

February 27
The logic of distinctions and sets

View slides Updated Sunday 12/22 12:20 PM

Total amount of required reading for this meeting: 3,400 words

Boolean logic (and ultimately, set theory) is the mathematical formalization upon which many of the techniques of information organization are built. In 1937 Edmund Berkeley, a mathematician working at the Prudential life insurance company, recognized the usefulness of Boolean logic for modeling insurance data—even though at the time there were no digital computers to assist with the calculations, only punched card tabulators.

Berkeley would later go on to be a pioneer of computer science, co-founding the Association for Computing Machinery, which is still the primary scholarly association for computer scientists.

📖 To read before this meeting:

  1. Berkeley, Edmund C. “Boolean Algebra (the Technique for Manipulating AND, OR, NOT and Conditions).” The Record 26 part II, no. 54 (1937): 373–414. PDF.
    3,400 words
    Reading tips

    This article is by Edmund Berkeley, a pioneer of computer science and co-founder of the Association for Computing Machinery, which is still the primary scholarly association for computer scientists. But he wrote this article in 1937, before he became a computer scientist—because computers had yet to exist. At the time he was a mathematician working at the Prudential life insurance company, where he recognized the usefulness of Boolean algebra for modeling insurance data. He published this article in a professional journal for actuaries (people who compile and analyze statistics and use them to calculate insurance risks and premiums).

    Berkeley uses some frightening-looking mathematical notation in parts of this article, but everything he discusses is actually quite simple. The most important parts are:

    pages 373–374, where he gives a simple explanation of Boolean algebra,

    pages 380–381, where he considers practical applications of Boolean algebra, and

    pages 383 on, where he pays close attention to translation back and forth between Boolean algebra and English.
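The link between Boolean logic and set theory can be sketched with Python sets, where AND, OR, and NOT correspond to intersection, union, and complement. This is only an illustration and does not reproduce Berkeley's notation or examples; the policy records and their fields (`smoker`, `age`) are invented.

```python
# Hypothetical insurance records; the fields are invented for illustration.
policies = {
    101: {"smoker": True, "age": 34},
    102: {"smoker": False, "age": 52},
    103: {"smoker": True, "age": 61},
    104: {"smoker": False, "age": 29},
}

all_ids = set(policies)
smokers = {pid for pid, p in policies.items() if p["smoker"]}
over_40 = {pid for pid, p in policies.items() if p["age"] > 40}

smokers_and_over_40 = smokers & over_40  # AND is set intersection
smokers_or_over_40 = smokers | over_40   # OR is set union
non_smokers = all_ids - smokers          # NOT is complement within all_ids

# De Morgan's law: NOT (A OR B) selects the same records as (NOT A) AND (NOT B).
neither = all_ids - (smokers | over_40)
de_morgan_holds = neither == (all_ids - smokers) & (all_ids - over_40)
print(sorted(smokers_and_over_40), sorted(neither), de_morgan_holds)
```

Reading each expression aloud ("policyholders who smoke and are over 40") is a small version of the translation between Boolean algebra and English that the article works through.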

March 1
Ryan is at US2TS

March 6
Two minute madness

View slides Updated Sunday 12/22 12:20 PM

Today your midterm papers are due, and each of you will give a two-minute, one-slide presentation briefly explaining the topic of your paper.

March 6
Midterm class presentation due

March 6
Midterm paper due

March 8
Midterm exam

The midterm exam will be given in class, and it will cover all the concepts we’ve discussed so far.

March 13
Spring break

March 15
Spring break

March 20
Correctness

View slides Updated Sunday 12/22 12:20 PM

In computer science, correctness refers to the degree of correspondence between what a computer program actually does, and what it is supposed to do. A “correct” program is one that does what it is supposed to. But what is a computer program “supposed” to do? It may be relatively straightforward to check that a program is correct with respect to a formal model or specification—but there is still the problem of whether that formal model corresponds with the understandings of reality that the program’s designers and users have. Philosopher and computer scientist Brian Cantwell Smith considers these issues in a paper presented to International Physicians for the Prevention of Nuclear War.
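The gap between "correct with respect to a specification" and "correct with respect to reality" can be shown with a toy example. This sketch is not drawn from Smith's paper; the deliberately incomplete specification and both function names are assumptions made for illustration.

```python
import calendar


def is_leap_year_spec(year: int) -> bool:
    """Hypothetical, deliberately incomplete specification: every fourth year is a leap year."""
    return year % 4 == 0


def is_leap_year_program(year: int) -> bool:
    """The program under test, which implements the specification with a bit mask."""
    return year & 3 == 0


years = range(1, 3000)

# The program is correct with respect to its formal specification...
matches_spec = all(is_leap_year_program(y) == is_leap_year_spec(y) for y in years)

# ...but the specification does not match the Gregorian calendar, in which
# century years such as 1900 are leap years only when divisible by 400.
matches_calendar = all(is_leap_year_program(y) == calendar.isleap(y) for y in years)

print(matches_spec, matches_calendar)
```

Checking the program against the specification cannot reveal the problem, because the error lies in the specification's model of the world.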

📖 To read before this meeting:

  1. Smith, Brian Cantwell. “The Limits of Correctness.” In Symposium on Unintentional Nuclear War, Fifth Congress of the International Physicians for the Prevention of Nuclear War. Budapest, 1985. PDF.

March 22
Statistical models

View slides Updated Sunday 12/22 12:20 PM

Information science took a major turn when the designers of information retrieval systems for the military and weapons manufacturers began to explore how to automatically classify and index texts. These explorations led to a new form of modeling: the statistical modeling of language. Once we had the ability to create texts digitally and to digitize existing texts, we could use these texts to build statistical language models, a process that was greatly accelerated by the advent of the World Wide Web, which made the collection of large numbers of texts much easier than it had been before.

Text just happened to be one of the first kinds of data that we were able to collect large amounts of. But the same techniques used to statistically model language can also be used to model other phenomena—provided that one can collect large amounts of data generated by these other phenomena. Once people began using the Web for all kinds of things beyond publishing texts, these other kinds of data suddenly became available, opening the door to statistical modeling of nearly everything. Data scientist Cathy O’Neil gives an account of our present-day modeling fever.

📖 To read before this meeting:

  1. O’Neil, Cathy. “Bomb Parts: What Is a Model?” In Weapons of Math Destruction, 15–31. New York: Crown, 2016. PDF.

March 27
Modeling text for computation

View slides Updated Sunday 12/22 12:20 PM

Computationally analyzing text first requires representing the text in a form that can be computationally manipulated. This form is quite different from the forms we are used to interpreting as readers.
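One common computational representation of text is an inverted index, which maps each term to the set of documents containing it (its postings). The sketch below assumes a very crude tokenizer (lowercasing and splitting on whitespace); the sample documents and the function names `build_index` and `search_and` are hypothetical.

```python
from collections import defaultdict

# Hypothetical three-document collection.
docs = {
    1: "The gold bug",
    2: "The information",
    3: "Gold and information",
}


def build_index(documents):
    """Map each lowercased term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_and(index, *terms):
    """Boolean AND query: ids of documents containing every one of the terms."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []


index = build_index(docs)
both = search_and(index, "gold", "information")
print(both)
```

Word order and sentence structure are discarded in this form, which is part of what makes it so different from the text a reader sees.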

📖 To read before this meeting:

  1. Manning, Christopher, Prabhakar Raghavan, and Hinrich Schütze. “Boolean Retrieval / The Term Vocabulary and Postings Lists.” In Introduction to Information Retrieval, 1–34. New York: Cambridge University Press, 2008.

March 29
Probability and inductive logic

Statistics is hard. Most people don’t intuitively understand probability, including me, and including the vast majority of scientists who rely on statistical methods. So today we’ll review some of the basics, so we know just enough to be dangerous.

📖 To read before this meeting:

  1. Hacking, Ian. An Introduction to Probability and Inductive Logic. Cambridge: Cambridge University Press, 2001. PDF.

April 3
Automatically classifying text

View slides Updated Sunday 12/22 12:20 PM

Total amount of required reading for this meeting: 6,500 words

The shift to statistical modeling in information science can be traced to the work of Bill Maron. Maron was an engineer at missile manufacturer Ramo-Wooldridge when he began investigating statistical methods for classifying and retrieving documents. For today we’ll read a classic paper of Maron’s in which he develops the basic ideas behind the Bayesian classifier, a technique that is still widely used today for a variety of automatic classification tasks from spam filtering to face recognition.

📖 To read before this meeting:

  1. Maron, M. E. “Automatic Indexing: An Experimental Inquiry.” Journal of the ACM 8, no. 3 (July 1961): 404–17. https://doi.org/10.1145/321075.321084.
    6,500 words
    Reading tips

    Bill Maron was an engineer at missile manufacturer Ramo-Wooldridge when he began investigating statistical methods for classifying and retrieving documents. In this paper he describes a method for statistically modeling the subject matter of texts. He introduces the basic ideas behind what is now known as a Bayesian classifier, a technique that is still widely used today for a variety of automatic classification tasks from spam filtering to face recognition.

    Trigger warning: math. The math is relatively basic, and if you’ve studied any probability, you should be able to follow it. But if not, just skip it: Maron explains everything important about his experiment in plain English. Pay extra attention to what he says about “clue words.”
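For readers who want to see the general idea in code, here is a minimal sketch of a modern naive Bayes text classifier with add-one smoothing. It is not Maron's exact procedure, and the tiny training set, the labels, and the function names are all invented for illustration. Words seen in training play a role loosely similar to clue words: each one shifts the score of every category up or down.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled training texts.
training = [
    ("spam", "win money now"),
    ("spam", "free money offer"),
    ("ham", "meeting notes attached"),
    ("ham", "lunch meeting tomorrow"),
]


def train(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, text in examples:
        class_counts[label] += 1
        for word in text.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def classify(text, model):
    """Return the class with the highest log-probability score for the text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)  # class prior
        for word in text.split():
            if word in vocab:  # words never seen in training are ignored
                # Add-one smoothing keeps unseen word/class pairs above zero.
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)


model = train(training)
print(classify("free money", model), classify("lunch meeting", model))
```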

April 5
Ryan was sick

April 10
Modeling topics

View slides Updated Sunday 12/22 12:20 PM

Topic modeling is a technique for classifying text that does not require one to specify a set of categories ahead of time. For that reason it has become particularly popular among humanities scholars and social scientists interested in exploring large collections of text, such as archival collections or social media platforms. Today we’ll try out some simple topic models.

📖 To read before this meeting:

  1. Underwood, Ted. “Topic Modeling Made Just Simple Enough.” The Stone and the Shell, April 7, 2012. https://tedunderwood.com/2012/04/07/topic-modeling-made-just-simple-enough/.

April 12
Modeling everything

View slides Updated Sunday 12/22 12:20 PM

Once a technique for statistical modeling has been developed, it can usually be applied to problems other than those for which it was initially developed. Thus topic modeling, initially developed for the unsupervised classification of text, is easily modified to classify other things like people and organizations.

For today, please read chapter 1 of Applications of Topic Models, “The What and Wherefore of Topic Models.” In addition, please skim one of the following chapters, to get a sense of how topic modeling gets used: “Historical Documents,” “Understanding Scientific Publications,” “Fiction and Literature,” or “Computational Social Science.”

📖 To read before this meeting:

  1. Boyd-Graber, Jordan, Yuening Hu, and David Mimno. “Applications of Topic Models.” Foundations and Trends in Information Retrieval 11, no. 2–3 (July 20, 2017): 143–296. https://doi.org/10.1561/1500000030.

April 17
The impact of recommendation

View slides Updated Sunday 12/22 12:20 PM

What are the consequences of the shift from 1) information systems that allow us to precisely specify the properties of the things we seek, to 2) information systems that attempt to anticipate our needs or desires and recommend things to us? If a YouTube video, a search result, a fashion brand, a scientific paper, or a restaurant that people discover via a recommendation service becomes popular and successful, is it because that video, result, brand, paper, or restaurant is of high quality, or is it perhaps due in part to the way the recommendation service works? Sociologists Matthew Salganik and Duncan Watts sought to investigate this question by building their own streaming music service.

📖 To read before this meeting:

  1. Salganik, Matthew J., and Duncan J. Watts. “Leading the Herd Astray: An Experimental Study of Self-Fulfilling Prophecies in an Artificial Cultural Market.” Social Psychology Quarterly 71, no. 4 (December 1, 2008): 338–55. https://doi.org/10.1177/019027250807100404.

April 19
Gaming recommendations

There is reason to believe that recommendation services which rely on historical data are biased toward popular items, creating a “rich-get-richer” effect. This can also result in an overall homogenization of consumption—less overall diversity in what people read, watch, buy, eat, etc. This can be true even if individuals find that their use of recommendation services is introducing them to new things!

But a separate issue is that recommendation services which rely on historical data may be fooled into believing that unpopular items are actually popular. In other words, the services can be “gamed” by small groups who are strongly motivated to make something seem popular, in the hopes that this will become a self-fulfilling prophecy.
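The rich-get-richer effect can be explored with a toy simulation in which each new user picks an item with probability proportional to its past popularity. This is a simplified model under stated assumptions, not a description of how any real recommendation service works; the function `simulate` and its parameters are hypothetical.

```python
import random


def simulate(num_items=10, num_users=1000, seed=0):
    """Each user picks one item, weighted by how often it has been picked before."""
    rng = random.Random(seed)   # seeded so that a run can be repeated exactly
    plays = [1] * num_items     # every item starts with the same popularity
    for _ in range(num_users):
        chosen = rng.choices(range(num_items), weights=plays)[0]
        plays[chosen] += 1
    return sorted(plays, reverse=True)


result = simulate()
print(result)  # play counts, most popular item first
```

Although the items start out identical, early random choices get amplified, so the final counts tend to be quite uneven. Changing the seed changes which items end up on top.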

📖 To read before this meeting:

  1. Butler, Oobah. “I Made My Shed the Top Rated Restaurant On TripAdvisor.” Vice, December 6, 2017. https://www.vice.com/en_uk/article/434gqw/i-made-my-shed-the-top-rated-restaurant-on-tripadvisor.

April 24
Human decisions and machine predictions

View slides Updated Sunday 12/22 12:20 PM

The powerful techniques that information scientists developed for classifying and ranking texts are now being applied to every aspect of our lives. What effects is this having? How can we determine whether information technologies are aiding our decision-making or harming it? Judges make high-impact, life-altering and world-altering decisions daily. One kind of high-impact decision judges make is whether to grant bail to persons accused of crimes. What is the potential impact of judges being guided in these decisions by algorithms trained on historical data?

📖 To read before this meeting:

  1. Kleinberg, Jon, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan. “Human Decisions and Machine Predictions.” Working Paper. National Bureau of Economic Research, February 2017. https://doi.org/10.3386/w23180.

April 26
Looking back / looking ahead

View slides Updated Sunday 12/22 12:20 PM

Today your final papers are due. We’ll review the ground we covered this semester and look ahead to more advanced information science classes, and information science careers.

April 26
Final paper due

May 7
Final exam

The final exam is scheduled for 12 noon on Monday, May 7. It will cover all the concepts from this course.