Foundations of Information Science

UNC School of Information and Library Science, INLS 201, Spring 2020

What is information science? What are the information professions?

During the first part of this course, we'll try to understand what “information science” and the “information professions” might be.

We'll start the first week with a brief history of the information professions in the 20th century United States.

Each of the next three weeks we will examine a different paradigm for understanding what information is: 1) information theory, 2) semiotics, and 3) documentation. Each of these three paradigms produces different answers to the questions “What should information professionals do?” and “Can there be a science of information?”

Finally, we'll wrap up the first part of the course by considering three prominent information scholars' different views on what “information science” is.

January 14
The information professions

Total amount of reading for this week: 13,100 words

SILS is a professional school, so we’ll begin by examining the “information professions.” What are they? How do they relate to “information schools,” or “information science”? The story is complicated. In 1988, sociologist Andrew Abbott, who was interested in how professions emerge and change, tried to sort it all out.

For this week, please read Abbott’s “The Information Professions,” a chapter in his book The System of Professions.

To read before this class:

  1. Abbott, Andrew. “The Information Professions.” In The System of Professions, 215–246. University of Chicago Press, 1988. PDF.
    13,100 words
    Reading tips

    This is an excerpt from a book that advances a theory about how professions change over time, so there is some discussion of that theory here. Don’t worry too much about that—focus on what Abbott means by the qualitative and quantitative information professions, and especially his discussion of the attempt to create a combined jurisdiction that would unify quantitative and qualitative information.

    Abbott’s story ends in 1988, so there is obviously more to say about what has happened to the “information professions” since then.

January 21
Information theory

Total amount of reading for this week: 17,100 words

The first of the three paradigms for thinking about information that we will examine is information theory.

As we began to communicate through wires and over radio waves, engineers sought to understand and describe how it happens, in order to design better communication systems. Claude Shannon, an engineer who worked at Bell Labs, developed an influential theory that came to be known as “information theory.”

The papers in which Claude Shannon developed his mathematical theory of communication—commonly referred to as information theory—were originally published in 1948 in two parts in the Bell System Technical Journal. A year later, Warren Weaver published a summary of Shannon’s work (part 2 of the first reading below), alongside his own speculations regarding its implications (parts 1 and 3).

About six years after information theory made its debut, Shannon wrote the one-page editorial that is the second reading below.

The third reading is a chapter from a standard introductory textbook in communications, explaining information theory (or communication theory) in more modern and accessible terms.

To read before this class:

  1. Weaver, Warren. “Recent Contributions to The Mathematical Theory of Communication,” September 1949. PDF.
    9,900 words
    Reading tips

    There is some math in this report. If you’re not mathematically inclined, just skip over it—it isn’t necessary to understand the math in order to understand the basic ideas. Focus on how Shannon and Weaver define communication, and the problems they identify. Try to understand what Weaver means by his three levels (A, B, C) of communication.

  2. Shannon, Claude. “The Bandwagon.” IRE Transactions on Information Theory 2, no. 3 (1956): 3. PDF.
    600 words
  3. Fiske, John. “Communication Theory.” In Introduction to Communication Studies, 3rd ed., 5–21. London ; New York: Routledge, 2010.
    6,600 words

January 28
Semiotics: meaning, signs, and codes

Total amount of reading for this week: 16,000 words

The second of the three paradigms for thinking about information that we will examine is semiotics.

While the mathematical theory of information ignores meaning and focuses on communication as a process, semiotics focuses on the construction of meaning through the use of signs. Semioticians look at how signs are organized into languages or codes, and how signs and codes operate within our broader cultures.

Both of this week’s readings are from the same introductory communications textbook that we read a chapter from last week.

To read before this class:

  1. Fiske, John. “Communication, Meaning, and Signs.” In Introduction to Communication Studies, 3rd ed., 37–60. London ; New York: Routledge, 2010.
    8,400 words
  2. Fiske, John. “Codes.” In Introduction to Communication Studies, 3rd ed., 61–79. London ; New York: Routledge, 2010.
    7,600 words

February 4

Total amount of reading for this week: 19,800 words

Assignment 1 handed out

The third of the three paradigms for thinking about information that we will examine is documentation.

In everyday speech, documentation usually means material that provides official evidence or that serves as a record, as in: “If you want to apply for a passport, you will have to provide documentation of your citizenship status.” In an information technology context, documentation typically means the written instructions that accompany software or hardware, as in: “Please consult the user documentation before using this software.”

But we can also use the word documentation to refer more broadly to all kinds of human practices involving documents: creating them, annotating them, classifying them, aggregating them, etc. This is what we will mean by documentation in this course.

The first reading for this week is from the book The Social Life of Information by John Seely Brown and Paul Duguid. In this chapter, Brown and Duguid explain why, despite 50+ years of digital computers and networks, we still use a lot of paper documents.

The second reading is an excerpt from an article by Bruno Latour, a French philosopher, anthropologist and sociologist. Latour wrote this article to persuade his colleagues in the social sciences that they need to pay more attention to documents and practices of documentation.

To read before this class:

  1. Brown, John Seely, and Paul Duguid. “Reading the Background.” In The Social Life of Information, 173–205. Boston: Harvard Business School Press, 2000. PDF.
    10,600 words
  2. Latour, Bruno. “Visualisation and Cognition: Thinking with Eyes and Hands.” Knowledge and Society: Studies in the Sociology of Culture Past and Present 6 (1986): 1–40. PDF.
    9,200 words
    Reading tips

    Latour uses some unusual terminology in this article. He refers to documents as inscriptions and practices of documentation as inscription procedures. He also refers to documents as immutable mobiles, highlighting what he considers to be two of their most important qualities: immutability and mobility.

    Latour is interested in the relationship between practices of documentation and thinking (cognition). His basic argument is that what may seem like great advances in thought are actually better understood as the emergence of new practices of documentation. Latour focuses primarily on documents as aids to visualization rather than as carriers of information. Thus he begins by discussing the emergence of new visualization techniques, such as linear perspective.

February 11
Information science?

Assignment 1 due

Total amount of reading for this week: 19,000 words

Can there be a science of information? It depends on what you mean by “science,” and also on what you mean by “information.” This week we’ll read and discuss three different takes—all by current or former faculty of the UCLA Department of Information Studies—on the possibility of “information science.”

The first take is by information scientist Marcia Bates, who often, and influentially, reflected on the nature of both information and information science. In this article she argues that information science should be understood as a “meta-science.”

The second take is by Philip Agre, a computer-scientist-turned-information-scholar, who is more skeptical about the possibility of information science. He argues that treating different genres of documents all as “information” is not necessarily the best way to think about the content or meaning of those documents. He urges greater attention to the practices of documentation that constitute the “circuitry” of various institutions.

The final take is by Jonathan Furner. Furner argues that information science is not about information, nor is it a science.

To read before this class:

  1. Bates, Marcia J. “The Invisible Substrate of Information Science.” Journal of the American Society for Information Science 50, no. 12 (October 1999): 1043–50.
    6,900 words
  2. Agre, Philip E. “Institutional Circuitry: Thinking about the Forms and Uses of Information.” Information Technology and Libraries 14, no. 4 (December 1995): 225.
    4,600 words
  3. Furner, Jonathan. “Information Science Is Neither.” Library Trends 63, no. 3 (2015): 362–77.
    7,500 words

Worldviews and data models

A worldview is a way of looking at the world that shapes one's perception of the world. Worldviews make it easy to see some things and difficult or impossible to see other things. Indeed, one's understanding of what “things” are is part of one's worldview.

It is not possible to communicate, or even to think, without a worldview. For anything to have meaning, to be thinkable and communicable, it must somehow be represented, and representation inevitably means drawing boundaries and making distinctions to establish what “things” there are and how they relate to one another. Collectively, these boundaries and distinctions constitute a worldview.

When we represent things using computers, we encode worldviews into data models. Different approaches to modeling data are different ways of turning the world into computable “information.” The choice of one way of data modeling over another can be consequential.

During the second part of this course, we'll first look at how we draw boundaries and make distinctions, both in everyday life and as part of the construction of scientific knowledge.

We'll then examine and contrast two different ways of formally modeling these boundaries and distinctions so that they can be represented and manipulated mathematically: Boolean algebra (putting things into groups based on their attributes) and Bayesian inference (hypothesizing about the groups to which things might belong, based on past evidence).

February 18
Drawing boundaries, making distinctions

Total amount of reading for this week: 16,900 words

Making things meaningful involves drawing boundaries and making distinctions—categorizing and classifying the world around us. Eviatar Zerubavel is a cognitive sociologist, meaning that he studies how social processes shape our thinking, and he’s written a number of fascinating and accessible books on the topic. For this week we’ll read some selections from his book The Fine Line about making distinctions in everyday life.

To read before this class:

  1. Zerubavel, Eviatar. “Introduction / Islands of Meaning / The Great Divide / The Social Lens.” In The Fine Line, 1–17, 21–24, 61–80. New York: Free Press, 1991. PDF.
    16,900 words

February 25
Scientific classification

Total amount of reading for this week: 11,300 words

Last week we looked at categorization and classification in everyday life. This week we’ll look at scientific categorization and classification.

Most of us would readily agree that our everyday “folk” classifications are somewhat arbitrary. Scientific classification presumably is different: science is the study of reality, and so scientific classifications are “real” in a way that other classifications are not. This week we’ll consider whether we can really draw such a clean distinction between “everyday” and “scientific” classification.

The first reading is a very short article by the philosopher of science John Dupré.

The second reading is by Lorraine Daston, a historian of science. She traces the history of scientists’ attempts to classify clouds.

To read before this class:

  1. Dupré, John. “Scientific Classification.” Theory, Culture & Society 23, no. 2–3 (May 1, 2006): 30–32.
    1,200 words
  2. Daston, Lorraine. “Cloud Physiognomy.” Representations 135, no. 1 (August 1, 2016): 45–71.
    10,100 words
    Reading tips

    Things to focus on in this reading:

    • What’s the difference between variety and variability, and why are both problems for classification?

    • What are some of the possible different approaches that might be taken to classify clouds?

    • What motivated the creation of cloud atlases?

    • What role do images play in cloud atlases?

March 3

Midterm due

Midterm exam

The midterm exam will be given in Manning 209, 9:30 AM – 10:45 AM, unless you have made other arrangements.

March 10
Spring break

No class.

March 17
Spring break

No class.

March 24
Data modeling: objects, attributes, and types

Total amount of reading for this week: 16,500 words

This week we will consider a common way of modeling the world so as to turn it into “information.” It is so common that most of us take it for granted.

This way of modeling the world relies on the following “common-sense” assumptions:

  • The world consists of individual objects or entities.
  • These entities have attributes that can be counted and described.
  • Entities can be sorted into types based on the presence or absence or values of their attributes.

If we make these assumptions, we can translate our models of the world into mathematical expressions using Boolean algebra.
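To make the three assumptions above concrete, here is a minimal sketch in Python. It is only an illustration: the insurance-style records, the attribute names, and the two type definitions are all invented for this example and are not taken from any of the readings. Each entity is a record of attributes, and each type is defined by a Boolean expression (AND, OR, NOT) over those attributes.

```python
# Hypothetical entities, each described by a few attributes.
# All names and values here are invented for illustration.
policies = [
    {"id": 1, "smoker": True,  "age": 34, "term_life": True},
    {"id": 2, "smoker": False, "age": 61, "term_life": True},
    {"id": 3, "smoker": False, "age": 45, "term_life": True},
    {"id": 4, "smoker": False, "age": 29, "term_life": False},
]

def is_high_risk(policy):
    """Type defined by the Boolean expression: smoker OR (age over 60)."""
    return policy["smoker"] or policy["age"] > 60

def is_standard_term(policy):
    """Type defined by the Boolean expression: term_life AND NOT high_risk."""
    return policy["term_life"] and not is_high_risk(policy)

# Sorting entities into types means selecting those for which
# the defining expression is true.
high_risk_ids = [p["id"] for p in policies if is_high_risk(p)]
standard_term_ids = [p["id"] for p in policies if is_standard_term(p)]

print(high_risk_ids)
print(standard_term_ids)
```

Notice that every entity either belongs to a type or does not. There is no room in this kind of model for an entity that "sort of" belongs, which is one of the limitations we will return to next week.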

Three short readings each explore this way of modeling, from slightly different perspectives.

The first reading is an excerpt from one of my favorite books, Data and Reality by Bill Kent. Kent was a computer programmer and database designer at IBM and Hewlett-Packard, during the era when the database technologies we use today were first being developed. He thought deeply and carefully about the challenges of data modeling and management, which he recognized were not primarily technical challenges.

The second reading is an excerpt from a very useful and easy-to-read (and very British) textbook on how to classify things.

The final reading is by Edmund Berkeley, a pioneer of computer science and co-founder of the Association for Computing Machinery, which is still the primary scholarly association for computer scientists. But he wrote this article in 1937, before he became a computer scientist—because computers had yet to exist. At the time he was a mathematician working at the Prudential life insurance company, where he recognized the usefulness of Boolean algebra for modeling insurance data. He published this article in a professional journal for actuaries (people who compile and analyze statistics and use them to calculate insurance risks and premiums).

To read before this class:

  1. Kent, William. “Attributes / Types and Categories and Sets / Models.” In Data and Reality, 77–94. Amsterdam: North-Holland, 1978. PDF.
    5,400 words
    Reading tips

    The fixed-width typewriter font makes this reading look old-fashioned, but nothing in it is out-of-date. These are precisely the same issues data modelers and “data scientists” struggle with today.

  2. Hunter, Eric. “What Is Classification? / Classification in an Information System / Faceted Classification.” In Classification Made Simple, 3rd ed. Farnham: Ashgate, 2009. PDF.
    5,600 words
  3. Berkeley, Edmund C. “Boolean Algebra (the Technique for Manipulating AND, OR, NOT and Conditions).” The Record 26 part II, no. 54 (1937): 373–414. PDF.
    5,500 words
    Reading tips

    Berkeley uses some frightening-looking mathematical notation in parts of this article, but everything he discusses is actually quite simple. If the notation turns you off, just skip over it. The most important parts are:

    • pages 373–375, where he gives a simple explanation of Boolean algebra,
    • pages 380–381, where he considers practical applications of Boolean algebra, and
    • pages 383 on, where he pays close attention to translation back and forth between Boolean algebra and English.

March 31
Classifying texts, modeling subject matter

Total amount of reading for this week: 18,400 words

The limitations of the kind of modeling we looked at last week become clear if we try to apply it to classify the subject matter of texts. Texts include things like books and news articles, but could also include things like movies and video games: anything for which it makes sense to ask, “What is it about?”

In the first reading, Patrick Wilson considers the problems that arise if one tries to treat the subject of a text as an attribute of that text.

The second reading introduces a way of modeling the world that is radically different from the one we looked at last week. Bill Maron was an engineer at missile manufacturer Ramo-Wooldridge when he began investigating statistical methods for classifying and retrieving documents. In this paper he describes a method for statistically modeling the subject matter of texts. He introduces the basic ideas behind what is now known as a Bayesian classifier, a technique that is still widely used today for a variety of automatic classification tasks from spam filtering to face recognition.
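For those who find code easier to read than prose, here is a rough sketch in Python of the general idea behind a Bayesian text classifier. It is not a reconstruction of Maron's experiment: the training texts, clue words, and subject categories are invented, and the add-one smoothing used to avoid zero probabilities is a common textbook choice assumed here for simplicity. The classifier guesses the category whose past texts make the observed clue words most probable.

```python
import math
from collections import Counter

# Hypothetical training data: (clue words found in a text, subject category).
# Words and categories are invented for illustration.
training = [
    (["circuit", "transistor", "switching"], "hardware"),
    (["transistor", "memory", "circuit"], "hardware"),
    (["program", "compiler", "language"], "software"),
    (["language", "program", "memory"], "software"),
]

# Past evidence: how many texts fall in each category, and how often
# each clue word has appeared in texts of that category.
category_counts = Counter(category for _, category in training)
word_counts = {category: Counter() for category in category_counts}
for words, category in training:
    word_counts[category].update(words)
vocabulary = {word for words, _ in training for word in words}

def score(words, category):
    """Return log P(category) + sum of log P(word | category).

    Uses add-one smoothing (an assumption of this sketch) so that a word
    never seen in a category does not force the probability to zero.
    """
    total_words = sum(word_counts[category].values())
    log_p = math.log(category_counts[category] / len(training))
    for word in words:
        if word in vocabulary:  # ignore words that are not clue words
            count = word_counts[category][word]
            log_p += math.log((count + 1) / (total_words + len(vocabulary)))
    return log_p

def classify(words):
    """Pick the category with the highest score for the given clue words."""
    return max(category_counts, key=lambda category: score(words, category))

print(classify(["transistor", "circuit"]))
print(classify(["compiler", "program"]))
```

Unlike the Boolean approach, nothing here says definitively what a text is about. The classifier only ranks hypotheses about the category, and its ranking would change if the past evidence were different.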

To read before this class:

  1. Wilson, Patrick. “Subjects and the Sense of Position.” In Two Kinds of Power, 69–92. Berkeley: University of California Press, 1968. PDF.
    11,900 words
    Reading tips

    Wilson can be a bit long-winded, but his insights are worth it. (You can skip the very long footnotes, so this reading is actually shorter than it looks.) What Wilson calls a “writing” is more typically referred to as a text. In this chapter he is criticizing the assumptions librarians make when cataloging texts by subject. The “sense of position” in the title of the chapter refers to the librarian’s sense of where in a classification scheme a text should be placed. Although he is talking about library classification, everything Wilson says is also applicable to state-of-the-art machine classification of texts today.

  2. Maron, M. E. “Automatic Indexing: An Experimental Inquiry.” Journal of the ACM 8, no. 3 (July 1961): 404–417.
    6,500 words
    Reading tips

    Trigger warning: math. The math is relatively basic and if you’ve studied any probability, you should be able to follow it. But if not, just skip it: Maron explains everything important about his experiment in plain English. Pay extra attention to what he says about “clue words.”

Selection systems in society

Information professionals, along with the technological systems that they build, can be understood as constituting selection systems that attempt to extract usable information from masses of documents. Boolean algebra and Bayesian inference are two different logics according to which selection systems can be constructed (and of course it is possible to construct systems that combine these two logics and possibly other logics as well).

During the last part of this course, we'll look at selection systems in the context of our broader society.

First, we'll reflect on the relationship between technology and society. Does technological change cause social, political, and cultural change? Or do technologies simply reflect social, political, and cultural practices?

Then, we'll consider the trade-offs between using human and machine labor in selection systems.

Finally, we'll look again at Boolean algebra and Bayesian inference, and consider the broader consequences of their differing logics.

April 7
Technology and society

Total amount of reading for this week: 15,000 words

Assignment 2 handed out

There are various positions one might take regarding the relationship between technology and society. This week we’ll read some influential papers investigating this relationship.

The first reading is not an influential paper, but rather a review of an influential book on technology, Lynn White’s Medieval Technology and Social Change. White became famous for persuasively arguing that two technologies, the stirrup and the plough, determined the course of medieval history. The author of this review does not agree. (I’ve only included the first part of the review, which addresses the stirrup.)

The second reading is an excerpt from an influential paper introducing the theory of social construction of technology (SCOT), which holds that technological change is determined by social factors rather than properties inherent to technologies. The theory is illustrated using the history of the bicycle.

The last reading is a very famous and influential paper by Langdon Winner, in which he explores whether certain technologies can be understood as “embodying” political ideas.

To read before this class:

  1. Sawyer, P. H. “Technical Determinism: The Stirrup and the Plough.” Past & Present, no. 24 (1963): 90–95. PDF.
    2,600 words
  2. Pinch, Trevor J., and Wiebe E. Bijker. “The Social Construction of Facts and Artefacts: Or How the Sociology of Science and the Sociology of Technology Might Benefit Each Other.” Social Studies of Science 14, no. 3 (1984): 411–428. PDF.
    3,400 words
    Reading tips

    The authors are attacking what they describe as “linear” models of technological development, which focus on a series of “technological breakthroughs” leading inevitably to where we are today. They argue that looking at the actual historical development of a technology like the bicycle shows that what seem in retrospect to be obvious “technological breakthroughs” were not at all obvious at the time.

    It may help to consult these pages to get a sense of the different bicycle models discussed in the reading:

  3. Winner, Langdon. “Do Artifacts Have Politics?” Daedalus 109, no. 1 (1980): 121–136.
    9,000 words

April 14
Selection labor by people and machines

Assignment 2 due

Total amount of reading for this week: 13,600 words

Regardless of the type of data modeling employed, turning the world into information involves labor. This week we’ll consider the question of automation: what kinds of labor should be done by people, and what kinds should be done by machines?

The first reading simply introduces the concept of a selection system.

The second reading is by Julian Warner, who looks at description and search labor in selection systems through a Marxist lens.

The third reading is an excerpt from a recent article that considers how various biases reflected in human-built classification schemes are incorporated into machine-driven selection systems.

To read before this class:

  1. Buckland, Michael. “Discovery and Selection.” In Information and Society, 135–152. MIT Press, 2017. PDF.
    2,900 words
  2. Warner, Julian. “Description and Search Labor for Information Retrieval.” Journal of the American Society for Information Science and Technology 58, no. 12 (2007): 1783–1790.
    6,800 words
    Reading tips

    Warner’s writing can be hard to follow at times. If you’re getting bogged down, focus on trying to understand the various categories of labor that Warner identifies, and how they relate to one another. What does he mean by “the dynamic compelling the transfer of human syntactic labor to technology stemming from the costs of direct human labor” (page 1789)?

  3. Broughton, Vanda. “The Respective Roles of Intellectual Creativity and Automation in Representing Diversity: Human and Machine Generated Bias.” Knowledge Organization 46, no. 8 (2019): 596–601. PDF.
    3,900 words
    Reading tips

    This article provides a good review of the ethics of machine labor, and the close look at how religions are represented in WordNet is useful.

    I’ve left out the latter part of the article, which muses about how robots might be given a sense of morality.

April 21
Applying selection techniques: Boolean vs. Bayesian

Total amount of reading for this week: 13,100 words

Boolean algebra and Bayesian inference are different—possibly complementary—techniques for building selection systems. We’ve looked at how these techniques work in the abstract, but what consequences do they have?

The first reading for this week is an excerpt from an article arguing that, though they are perceived as outdated, selection systems based on Boolean algebra (more commonly referred to as Boolean retrieval systems) are preferable for some purposes because they offer more opportunities for human decision-making during searches.

The second reading “scrutinizes” Bill Maron’s Bayesian classifier, identifying it as an example of an algorithmic technique that is now applied for many different purposes that differ quite a bit in their particulars from Maron’s “library problem.”

To read before this class:

  1. Hjørland, Birger. “Classical Databases and Knowledge Organization: A Case for Boolean Retrieval and Human Decision-Making during Searches.” Journal of the Association for Information Science and Technology 66, no. 8 (August 1, 2015): 1559–75. PDF.
    2,800 words
  2. Rieder, Bernhard. “Scrutinizing an Algorithmic Technique: The Bayes Classifier as Interested Reading of Reality.” Information, Communication & Society 20, no. 1 (January 2, 2017): 100–117.
    10,300 words

April 27
Final exam available

The exam will be available starting 12:01am EDT on Monday, April 27.

May 3
Final exam due

The exam must be completed by 11:59pm EDT on Sunday, May 3.