Foundations of Information Science
UNC SILS, INLS 201, Fall 2017
August 22
Introduction
Today we’ll meet each other, and I’ll explain the plan for the class and how to use the course website. Finally we’ll try out our federated wiki.
If you feel like it, check out the federated wiki videos.
August 24
Document society
Total amount of required reading for this meeting: 3,800 words
Our lives and our societies are structured by and constituted through documents. We’ll look at some examples.
Today’s reading is the first chapter of Michael Buckland’s book on Information and Society. Buckland is a professor at the Berkeley School of Information, and he was my doctoral advisor.
Optional, but highly recommended, is an excerpt from Alva Noë’s book Strange Tools: Art and Human Nature about how playing baseball requires documents. Noë is a philosopher, also at Berkeley, who writes about human consciousness, neuroscience, and art.
📖 To read before this meeting:
August 29
Thinking with our eyes and hands
View slides
Total amount of required reading for this meeting: 9,200 words
For today we’ll read an article by Bruno Latour, a French philosopher, anthropologist and sociologist. Latour wrote this article to persuade his colleagues in the social sciences that they need to pay more attention to documents and processes of documentation.
This is the first of our more difficult readings, which will mostly be assigned for Tuesdays, giving you five days to read them. On the Thursdays before, I will give you some tips for reading these slightly more difficult texts.
📖 To read before this meeting:
- Latour, Bruno. “Visualisation and Cognition: Thinking with Eyes and Hands.” Knowledge and Society: Studies in the Sociology of Culture Past and Present 6 (1986): 1–40. PDF.
Reading tips
Latour uses some unusual terminology in this article. He refers to documents as inscriptions and practices of documentation as inscription procedures. He also refers to documents as immutable mobiles, highlighting what he considers to be two of their most important qualities: immutability and mobility.
Latour is interested in the relationship between practices of documentation and thinking (cognition). His basic argument is that what may seem like great advances in thought are actually better understood as the emergence of new practices of documentation. Latour focuses primarily on documents as aids to visualization rather than as carriers of information. Thus he begins by discussing the emergence of new visualization techniques, such as linear perspective.
August 31
Information theory
View slides
Total amount of required reading for this meeting: 9,100 words
As we began to communicate through wires and over radio waves, engineers sought to understand and describe how such communication happens, in order to design better communication systems. Claude Shannon, an engineer who worked at Bell Labs, developed an influential theory that came to be known as “information theory.” Today we’ll investigate some of the phenomena he described.
Before class you should read the excerpt from Edgar Allan Poe’s The Gold-Bug, and optionally you may also read a short historical account of the development of Shannon’s theory by science writer James Gleick.
📖 To read before this meeting:
- Poe, Edgar Allan. “The Cryptograph / The Solution Begun / The Cipher Read.” In The Gold Bug. Chicago, New York [etc.] Rand, McNally & Company, 1902. http://archive.org/details/goldbug00poee_1. PDF.
- Gleick, James. “Information Theory.” In The Information, 1st ed., 204–232. New York: Pantheon Books, 2011. PDF.
Reading tips
This chapter from science writer James Gleick’s book The Information is an engaging mini-biography of Claude Shannon, but it is also an accessible introduction to information theory.
September 5
Meaning, signs and codes
View slides
Another approach to understanding communication through documents (in addition to Shannon’s theory) is to focus on “signs,” the organization of signs into codes or languages, and the cultures within which signs and codes operate. This approach is known as semiotics. Media scholar John Fiske provides a good basic explanation of what semiotics is and how it differs from information theory.
📖 To read before this meeting:
- Fiske, John. “Communication Theory / Meanings, Signs, and Codes.” In Introduction to Communication Studies, 2nd ed., 6–12, 39–46, 56–58, 64–65. London ; New York: Routledge, 1990. PDF.
September 7
Understanding graphics and images
View slides
Semiotics, the study of signs, isn’t limited to texts: we can also use it to describe how we understand graphics and images. Cartoonist Scott McCloud shows how.
📖 To read before this meeting:
- McCloud, Scott. “The Vocabulary of Comics.” In Understanding Comics, 1st HarperPerennial ed., 24–59. New York: HarperPerennial, 1994. PDF.
September 12
Making distinctions
View slides
Total amount of required reading for this meeting: 16,900 words
Until now we’ve mainly focused on documents and the marks on them, and how we understand and interpret those marks. This week we change our focus a bit, to look at how our understanding of the world is structured.
We begin with some excerpts from a book by Eviatar Zerubavel about how we categorize and classify the world around us. Zerubavel is a cognitive sociologist, meaning that he studies how social processes shape our thinking, and he’s written a number of fascinating and accessible books on the topic.
📖 To read before this meeting:
- Zerubavel, Eviatar. “Introduction / Islands of Meaning / The Great Divide / The Social Lens.” In The Fine Line, 1–17, 21–24, 61–80. New York: Free Press, 1991. PDF.
Reading tips
These are selections from Zerubavel’s book The Fine Line, which is about making distinctions in everyday life.
September 14
Classification in everyday life
Total amount of required reading for this meeting: 5,600 words
We all categorize and classify all the time, but we don’t always do it intentionally and systematically. Today we’ll try out a form of systematic classification known as faceted classification.
📖 To read before this meeting:
- Hunter, Eric. “What Is Classification? / Classification in an Information System / Faceted Classification.” In Classification Made Simple, 3rd ed. Farnham: Ashgate, 2009. PDF.
September 19
Scientific classification
View slides
Total amount of required reading for this meeting: 11,300 words
Most of us would readily agree that our everyday “folk” classifications are historically contingent and somewhat arbitrary. Yet scientific classification presumably is different: science is the study of reality, and so scientific classifications are “real” in a way that other classifications are not. Today we’ll discuss the extent to which this is true.
The required reading is by Lorraine Daston, a historian of science. She traces the history of scientists’ attempts to classify clouds.
Optionally, you may also read a short (1.5 pages) article on scientific classification by the philosopher of science John Dupré.
📖 To read before this meeting:
- Daston, Lorraine. “Cloud Physiognomy.” Representations 135, no. 1 (August 1, 2016): 45–71. https://doi.org/10.1525/rep.2016.135.1.45.
- Dupré, John. “Scientific Classification.” Theory, Culture & Society 23, no. 2–3 (May 1, 2006): 30–32. PDF.
September 21
Naming
We can’t talk or write about things or kinds of things without giving them names. Unfortunately naming isn’t as easy as it sometimes may seem. Today we’ll investigate the difficulties of agreeing on names.
The required reading is another chapter from Buckland’s Information and Society, this time on the topic of naming.
If you have time, I also highly recommend the second book chapter on naming, by Bill Kent. Kent was a computer programmer and database designer at IBM and Hewlett-Packard, during the era when the database technologies we use today were first being developed. He thought deeply and carefully about the challenges of data management, which he recognized were not primarily technical challenges.
📖 To read before this meeting:
September 26
Automation
View slides
The past couple of weeks we’ve looked at how people categorize, classify, and name things of interest. As we’ve seen, this can be hard work, and like other kinds of hard work, people have sought to escape it through automation.
To what extent can the organization of information be automated? Information scholar Julian Warner looks at this question by drawing a distinction between different kinds of semiotic labor.
📖 To read before this meeting:
- Warner, Julian. “Forms of Labour in Information Systems.” Information Research 7, no. 4 (2002). http://www.informationr.net/ir/7-4/paper135.html.
September 28
Computation
View slides
People were building systems to automate information organization and retrieval long before the invention of the computer, but the digital computer made possible many techniques that were previously unfeasible. The invention of computing also gave birth to a theory of computation, which gives us a mathematical framework for characterizing and measuring syntactic labor. Today we’ll look at one of the earliest computational techniques to be applied to information organization: Boolean logic.
📖 To read before this meeting:
- Hillis, W. “Nuts and Bolts / Universal Building Blocks.” In The Pattern on the Stone, 1–38. New York: Basic Books, 1998. PDF.
October 3
The logic of distinctions and sets
View slides
Total amount of required reading for this meeting: 3,400 words
Boolean logic (and ultimately, set theory) is the mathematical formalization upon which many of the techniques of information organization are built. In 1937 Edmund Berkeley, a mathematician working at the Prudential life insurance company, recognized the usefulness of Boolean logic for modeling insurance data—even though at the time there were no digital computers to assist with the calculations, only punched card tabulators.
Berkeley would later go on to be a pioneer of computer science, co-founding the Association for Computing Machinery, which is still the primary scholarly association for computer scientists.
📖 To read before this meeting:
- Berkeley, Edmund C. “Boolean Algebra (the Technique for Manipulating AND, OR, NOT and Conditions).” The Record 26 part II, no. 54 (1937): 373–414. PDF.
Reading tips
This article is by Edmund Berkeley, a pioneer of computer science and co-founder of the Association for Computing Machinery, which is still the primary scholarly association for computer scientists. But he wrote this article in 1937, before he became a computer scientist—because computers had yet to exist. At the time he was a mathematician working at the Prudential life insurance company, where he recognized the usefulness of Boolean algebra for modeling insurance data. He published this article in a professional journal for actuaries (people who compile and analyze statistics and use them to calculate insurance risks and premiums).
Berkeley uses some frightening-looking mathematical notation in parts of this article, but everything he discusses is actually quite simple. The most important parts are:
- pages 373–374, where he gives a simple explanation of Boolean algebra,
- pages 380–381, where he considers practical applications of Boolean algebra, and
- pages 383 on, where he pays close attention to translation back and forth between Boolean algebra and English.
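To get a feel for translating between English conditions and Boolean algebra, here is a minimal Python sketch in which AND, OR, and NOT become set intersection, union, and complement. The policy records and field names are hypothetical, invented for illustration; they are not taken from Berkeley's article.

```python
# Hypothetical insurance records, invented for illustration only.
policies = {
    101: {"smoker": True, "over_60": False},
    102: {"smoker": False, "over_60": True},
    103: {"smoker": True, "over_60": True},
    104: {"smoker": False, "over_60": False},
}

# Each condition picks out a set of policy numbers.
everyone = set(policies)
smokers = {pid for pid, p in policies.items() if p["smoker"]}
over_60 = {pid for pid, p in policies.items() if p["over_60"]}

# "smokers AND over 60" corresponds to set intersection.
smokers_and_over_60 = smokers & over_60
# "smokers OR over 60" corresponds to set union.
smokers_or_over_60 = smokers | over_60
# "NOT smokers" corresponds to the complement relative to all policies.
not_smokers = everyone - smokers

print(sorted(smokers_and_over_60))
print(sorted(smokers_or_over_60))
print(sorted(not_smokers))
```

Reading each line aloud ("policyholders who smoke and are over 60") is a small version of the back-and-forth translation the article practices.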
October 5
Modeling knowledge
View slides
Total amount of required reading for this meeting: 3,000 words
By the 1970s, computer engineers had successfully built powerful and efficient databases, which they called “relational” databases because of their basis in the way relations are modeled by set theory. (This was Codd’s famous relational model of data.)
But database designers soon realized that having relational database technology was useless without a method for translating real-world situations and processes into the relational model. What they needed was a method for modeling knowledge relationally, and this is what the computer scientist Peter Chen provided in 1976 with his entity-relationship model.
In addition to the Chen article, please read database designer Eric Evans’ short account of what it is like to engage in entity-relationship modeling. For a slightly different account, you can optionally read Stephen Wolfram’s blog post about trying to model chemistry.
📖 To read before this meeting:
- Chen, Peter Pin-Shan. “The Entity-Relationship Model—toward a Unified View of Data.” ACM Trans. Database Syst. 1, no. 1 (March 1976): 9–36. https://doi.org/10.1145/320434.320440.
- Evans, Eric. “Crunching Knowledge.” In Domain-Driven Design. Boston: Addison-Wesley, 2004. PDF.
- Wolfram, Stephen. “The Practical Business of Ontology: A Tale from the Front Lines.” Stephen Wolfram Blog, July 2017. http://blog.stephenwolfram.com/2017/07/the-practical-business-of-ontology-a-tale-from-the-front-lines/.
October 10
Correctness
View slides
In computer science, correctness refers to the degree of correspondence between what a computer program actually does and what it is supposed to do. A “correct” program is one that does what it is supposed to. But what is a computer program “supposed” to do? It may be relatively straightforward to check that a program is correct with respect to a formal model or specification, but there is still the problem of whether that formal model corresponds with the understandings of reality that the program’s designers and users have. Philosopher and computer scientist Brian Cantwell Smith considers these issues in a paper presented to the International Physicians for the Prevention of Nuclear War.
📖 To read before this meeting:
- Smith, Brian Cantwell. “The Limits of Correctness.” In Symposium on Unintentional Nuclear War, Fifth Congress of the International Physicians for the Prevention of Nuclear War. Budapest, 1985. PDF.
October 12
Two minute madness
View slides
Today your midterm papers are due, and each of you will give a two-minute, one-slide presentation briefly explaining the topic of your paper.
October 12
Midterm paper due
October 17
Midterm exam
The midterm exam will be given in class, and it will cover the formal concepts we’ve covered so far: information theory, semiotics, faceted classification, Boolean logic, and entity-relationship modeling.
October 19
Fall break
October 24
From individuals to populations
There is no reading for today. I’ll return your midterm papers and exams, and we’ll review the first half of the course and look ahead to the second half.
October 26
Statistical models
View slides
Information science took a major turn when the designers of information retrieval systems for the military and weapons manufacturers began to explore how to automatically classify and index texts. These explorations led to a new form of modeling: the statistical modeling of language. Once we had the ability to create texts digitally and to digitize existing texts, we could use these texts to build statistical language models, a process that was greatly accelerated by the advent of the World Wide Web, which made the collection of large numbers of texts much easier than it had been before.
Text just happened to be one of the first kinds of data that we were able to collect large amounts of. But the same techniques used to statistically model language can also be used to model other phenomena—provided that one can collect large amounts of data generated by these other phenomena. Once people began using the Web for all kinds of things beyond publishing texts, these other kinds of data suddenly became available, opening the door to statistical modeling of nearly everything. Data scientist Cathy O’Neil gives an account of our present-day modeling fever.
📖 To read before this meeting:
- O’Neil, Cathy. “Bomb Parts: What Is a Model?” In Weapons of Math Destruction, 15–31. New York: Crown, 2016. PDF.
October 31
Modeling text for computation
View slides
Computationally analyzing text first requires representing the text in a form that can be computationally manipulated. This form is quite different from the forms we are used to interpreting as readers.
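As a rough illustration of such a form, the sketch below builds a small inverted index: a mapping from each term to the list of documents that contain it (its "postings list"). The three example documents and the very simple tokenizer are assumptions made for illustration, not material from the reading.

```python
import re
from collections import defaultdict

# Hypothetical toy documents, invented for illustration.
docs = {
    1: "Documents structure our lives.",
    2: "Boolean logic structures retrieval of documents.",
    3: "Our lives are full of signs.",
}

def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())

# Map each term to the set of document ids in which it occurs.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in tokenize(text):
        index[term].add(doc_id)

# A postings list is the sorted list of document ids for a term.
postings = {term: sorted(ids) for term, ids in index.items()}

def boolean_and(term_a, term_b):
    """Return ids of documents containing both terms."""
    return sorted(index.get(term_a, set()) & index.get(term_b, set()))

print(postings["documents"])        # [1, 2]
print(boolean_and("our", "lives"))  # [1, 3]
```

Note how much is thrown away: word order, punctuation, and capitalization all disappear, which is part of what makes this form so different from the one a reader sees.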
📖 To read before this meeting:
- Manning, Christopher, Prabhakar Raghavan, and Hinrich Schütze. “Boolean Retrieval / The Term Vocabulary and Postings Lists.” In Introduction to Information Retrieval, 1–34. New York: Cambridge University Press, 2008.
Reading tips
November 2
Probability and inductive logic
Statistics is hard. Most people don’t intuitively understand probability, including me, and including the vast majority of scientists who rely on statistical methods. So today we’ll review some of the basics, so we know just enough to be dangerous.
📖 To read before this meeting:
- Hacking, Ian. An Introduction to Probability and Inductive Logic. Cambridge: Cambridge University Press, 2001. PDF.
November 7
Automatically classifying text
View slides
Total amount of required reading for this meeting: 6,500 words
The shift to statistical modeling in information science can be traced to the work of Bill Maron. Maron was an engineer at missile manufacturer Ramo-Wooldridge when he began investigating statistical methods for classifying and retrieving documents. For today we’ll read a classic paper of Maron’s in which he develops the basic ideas behind the Bayesian classifier, a technique that is still widely used today for a variety of automatic classification tasks from spam filtering to face recognition.
📖 To read before this meeting:
- Maron, M. E. “Automatic Indexing: An Experimental Inquiry.” Journal of the ACM 8, no. 3 (July 1961): 404–17. https://doi.org/10.1145/321075.321084.
Reading tips
Bill Maron was an engineer at missile manufacturer Ramo-Wooldridge when he began investigating statistical methods for classifying and retrieving documents. In this paper he describes a method for statistically modeling the subject matter of texts. He introduces the basic ideas behind what is now known as a Bayesian classifier, a technique that is still widely used today for a variety of automatic classification tasks from spam filtering to face recognition.
Trigger warning: math. The math is relatively basic, and if you’ve studied any probability, you should be able to follow it. But if not, just skip it: Maron explains everything important about his experiment in plain English. Pay extra attention to what he says about “clue words.”
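If code is easier for you to follow than formulas, here is a minimal sketch of a present-day naive Bayes text classifier. It is a simplified modern variant, not Maron's exact procedure: the training texts and category names are invented, every word is treated as a potential clue word, and add-one smoothing is an assumption added so that unseen words do not produce zero probabilities.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled training texts, invented for illustration.
training = [
    ("circuit amplifier voltage", "electronics"),
    ("voltage transistor circuit", "electronics"),
    ("orbit rocket trajectory", "astronautics"),
    ("rocket fuel orbit", "astronautics"),
]

# Count documents per category and word occurrences per category.
category_counts = Counter()
word_counts = defaultdict(Counter)
vocabulary = set()
for text, category in training:
    category_counts[category] += 1
    for word in text.split():
        word_counts[category][word] += 1
        vocabulary.add(word)

def classify(text):
    """Return the category with the highest log-probability score."""
    total_docs = sum(category_counts.values())
    scores = {}
    for category in category_counts:
        # Start from the prior: how common the category is overall.
        score = math.log(category_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for word in text.split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one smoothing (an assumption of this sketch).
            count = word_counts[category][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[category] = score
    return max(scores, key=scores.get)

print(classify("rocket orbit"))     # astronautics
print(classify("voltage circuit"))  # electronics
```

Each word nudges the score toward the categories where it appeared most often in training, which is the intuition behind clue words.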
November 9
Modeling topics
Topic modeling is a technique for classifying text that does not require one to specify a set of categories ahead of time. For that reason it has become particularly popular among humanities scholars and social scientists interested in exploring large collections of text, such as archival collections or social media platforms. Today we’ll try out some simple topic models.
📖 To read before this meeting:
- Sievert, Carson. “A Topic Model for Movie Reviews.” Accessed August 20, 2017. https://ldavis.cpsievert.me/reviews/reviews.html.
November 14
Modeling everything
View slides
Once a technique for statistical modeling has been developed, it can usually be applied to problems other than those for which it was initially developed. Thus topic modeling, initially developed for the unsupervised classification of text, is easily modified to classify other things like people and organizations.
For today, please read chapter 1 of Applications of Topic Models, “The What and Wherefore of Topic Models.” In addition, please read one of the following chapters: “Historical Documents,” “Understanding Scientific Publications,” “Fiction and Literature,” or “Computational Social Science.”
📖 To read before this meeting:
- Boyd-Graber, Jordan, Yuening Hu, and David Mimno. “Applications of Topic Models.” Foundations and Trends in Information Retrieval 11, no. 2–3 (July 20, 2017): 143–296. https://doi.org/10.1561/1500000030.
November 16
Ranking, rating and recommending
View slides
One of Maron’s motivations for developing statistical methods of information retrieval was the desire to provide ranked results. Ranking results involves not only matching documents to a query, but also ordering those documents from most “relevant” to least “relevant”.
Sixty years later, there are algorithmically-generated ranked lists for nearly everything. Today we’ll look at one example—university rankings—and discuss possible algorithms for another kind of ranking: your grades in this class.
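To make the difference between matching and ranking concrete, here is a minimal sketch that scores each document by how often the query terms occur in it and then orders the matches from highest to lowest score. The documents and the scoring rule are assumptions chosen for simplicity; this is not Maron's method or the method of any real search engine.

```python
from collections import Counter

# Hypothetical toy documents, invented for illustration.
docs = {
    "a": "information retrieval ranks documents",
    "b": "retrieval of information about information theory",
    "c": "cloud classification",
}

def score(query, text):
    """Count how many times the query terms occur in the text."""
    counts = Counter(text.split())
    return sum(counts[term] for term in query.split())

def rank(query):
    """Return ids of matching documents, most 'relevant' first."""
    scored = [(score(query, text), doc_id) for doc_id, text in docs.items()]
    # Matching: keep only documents containing at least one query term.
    matches = [(s, d) for s, d in scored if s > 0]
    # Ranking: order by descending score, breaking ties by document id.
    ordered = sorted(matches, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in ordered]

print(rank("information retrieval"))  # ['b', 'a']
```

Changing the scoring rule changes the order, which is why the choice of ranking algorithm is worth arguing about.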
📖 To read before this meeting:
- Ramage, Daniel, Christopher D Manning, and Daniel A McFarland. “Which Universities Lead and Lag? Toward University Rankings Based on Scholarly Output.” In Proc. of NIPS Workshop on Computational Social Science and the Wisdom of the Crowds, 2010. https://people.cs.umass.edu/~wallach/workshops/nips2010css/papers/ramage.pdf.
November 21
Cancelled
November 23
Thanksgiving
November 28
Being ranked and rated
View slides
The powerful techniques that information scientists developed for classifying and ranking texts are now being applied to every aspect of our lives. What effects is this having? Sociologist Wendy Espeland examines the effects of one very influential ranking system: the U.S. News & World Report college rankings.
📖 To read before this meeting:
- Espeland, Wendy. “Reverse Engineering and Emotional Attachments as Mechanisms Mediating the Effects of Quantification.” Historical Social Research / Historische Sozialforschung 41, no. 2 (156) (2016): 280–304. https://doi.org/10.12759/hsr.41.2016.2.280-304.
November 30
Grading algorithm proposals
50% of your grade in this class will be based on my evaluation of your midterm and final papers. Today you will make proposals for how to determine the other 50% of your grade.
December 5
Looking back / looking ahead
Today your final papers are due. We’ll review the ground we covered this semester and look ahead to more advanced information science classes and information science careers.
December 5
Final paper due
December 9
Final exam
The final exam is scheduled for 12 noon on Saturday, December 9. It will cover all the formal concepts from this course: information theory, semiotics, faceted classification, Boolean logic, entity-relationship modeling, and probabilistic modeling.