Organization of Information

UNC SILS, INLS 520, Fall 2013

August 20
Introduction

First, we get to know each other a bit. Then, the basics: how the class meetings will be run, how you’ll be evaluated, expectations regarding readings and assignments, and so on. Finally, a brief and high-level overview of the topics that will be covered in the course, and how they are related.

August 22
The Organizing System

View slides Updated Thursday 11/21 4:59 PM

This course is an introduction to the conceptual foundations of information organization and retrieval: identifying things, describing things, grouping things, relating things, and selecting things. Traditionally these things have been textual documents in the narrow sense: books, periodicals, letters, administrative records, etc.—the kinds of things organized by libraries and archives. But the principles that underlie organization in libraries and archives can be generalized and applied to organize documents and information more broadly, in a variety of contexts. To emphasize what these contexts have in common, rather than how they differ, we will use the abstract notion of an organizing system.

An organizing system is an intentionally arranged collection of resources and the interactions they support. Explicitly or by default, an organizing system makes many interdependent decisions about the identities of things of interest and the ways they are represented as “information.” The organizing system defines how things will be named and described, how they can be grouped and related, and how people or software can create, transform, combine, compare and otherwise use these names, descriptions, groups and relations. When considering how to make these decisions, we can ask five questions: What is being organized? Why is it being organized? How much is it being organized? When is it being organized? By whom (or by what computational processes) is it being organized?

📖 To read before this meeting:

  1. Glushko, Robert J. “1. Foundations for Organizing Systems.” In The Discipline of Organizing, edited by Robert J. Glushko, 3rd ed. O’Reilly, 2015.
    Reading tips

    Introduction to the concept of an organizing system and the five facets along which one can analyze organizing systems.

August 27
Analyzing Organizing Systems I

View slides Updated Thursday 11/21 4:59 PM

For today you will read about or examine directly six different organizing systems. As you do the readings below (each one is quite short), think about how you would locate each one in the “design space” introduced by Glushko in chapter one of TDO.

📖 To read before this meeting:

  1. Baron, Richard J., Elizabeth L. Fabens, Melissa Schiffman, and Erica Wolf. “Electronic Health Records: Just Around the Corner? Or over the Cliff?” Annals of Internal Medicine 143, no. 3 (August 2, 2005): 222–226. http://annals.org/article.aspx?volume=143&page=222.
  2. McGrath, Sean, and Fergal Murray. Principles of E-Government Architecture. Propylon, July 7, 2003. PDF.
  3. Manzini, Ezio, and Carlo Vezzoli. Product-Service Systems and Sustainability. United Nations Environment Program, 2002. http://www.unep.org/resourceefficiency/Portals/24147/scp/design/pdf/pss-imp-7.pdf.
  4. Miner, Edward A., and Cliff Missen. “‘Internet in a Box’: Augmenting Bandwidth with the eGranary Digital Library.” Africa Today 52, no. 2 (December 1, 2005): 21–37. http://www.jstor.org/stable/4187701.
  5. Qvenild, Marte. “Svalbard Global Seed Vault: A ‘Noah’s Ark’ for the World’s Seeds.” Development in Practice 18, no. 1 (February 1, 2008): 110–116. http://www.jstor.org/stable/27751880.
  6. Roy, Hugo, Michiel de Jong, Jan-Christoph Borchardt, and Unhosted. “ToS;DR.” Terms of Service; Didn’t Read, n.d. https://tosdr.org.

August 29
Analyzing Organizing Systems II

View slides Updated Thursday 11/21 4:59 PM

We will continue analyzing different kinds of organizing systems by situating them in a five-dimensional design space.

September 3
Activities in Organizing Systems I

View slides Updated Thursday 11/21 4:59 PM

📖 To read before this meeting:

  1. Glushko, Robert J., Erik Wilde, Jess Hemerly, Isabelle Sperano, and Robyn Perry. “2. Activities in Organizing Systems.” In The Discipline of Organizing, edited by Robert J. Glushko, 3rd ed. O’Reilly, 2015.
    Reading tips

    When we take an expansive view of organizing systems we can identify four activities that all organizing systems support or perform: selecting resources, organizing resources, designing resource-based interactions, and maintaining resources. These four activities are deeply ingrained in curricula and practice for organizing systems like libraries and museums, and they can be extended to other kinds of organizing systems employed by individuals, groups and enterprises in various domains.

  2. Glushko, Robert J. “10.1 Introduction & 10.2 The Organizing System Lifecycle.” In The Discipline of Organizing, edited by Robert J. Glushko, 3rd ed. O’Reilly, 2015.

September 5
Activities in Organizing Systems II

View slides Updated Thursday 11/21 4:59 PM

Note: You only need to read sections 3.1–3.4 of “Models of the Information Seeking Process,” and sections 5.3–5.4 of “Organization Systems.”

📖 To read before this meeting:

  1. Hearst, Marti. “Models of the Information Seeking Process.” In Search User Interfaces. Cambridge, UK: Cambridge University Press, 2009. http://searchuserinterfaces.com/book/sui_ch3_models_of_information_seeking.html.
  2. Morville, Peter, and Louis Rosenfeld. “Organization Systems.” In Information Architecture for the World Wide Web, 53–81. 3rd ed. Sebastopol, California: O’Reilly, 2006. http://proquestcombo.safaribooksonline.com/book/web-development/0596527349/basic-principles-of-information-architecture/i86131__chapterstart__chapter_5.
    Reading tips

    Broad overview of the ways organizing schemes and structures are deployed on Web sites.

  3. Doan, Anhai, Raghu Ramakrishnan, and Alon Y. Halevy. “Crowdsourcing Systems on the World-Wide Web.” Communications of the ACM 54, no. 4 (April 1, 2011): 86. http://cacm.acm.org/magazines/2011/4/106563-crowdsourcing-systems-on-the-world-wide-web/fulltext.
  4. Marshall, Catherine C. “Rethinking Personal Digital Archiving, Part 1.” D-Lib Magazine 14, no. 3/4 (2008). http://www.dlib.org/dlib/march08/marshall/03marshall-pt1.html.

September 10
Resources in Organizing Systems I

View slides Updated Thursday 11/21 4:59 PM

An organizing system reflects (or produces or enforces) a specific view of the world by defining what the resources being organized are. This involves making decisions about when things are to be considered the same or different, i.e. how they are to be identified. Decisions about identity and identification define the basic units of organization, and these decisions have consequences for every other aspect of the organizing system.

📖 To read before this meeting:

  1. Glushko, Robert J., Daniel D. Turner, Kimra McPherson, and Jess Hemerly. “3. Resources in Organizing Systems.” In The Discipline of Organizing, edited by Robert J. Glushko, 3rd ed. O’Reilly, 2015.
    Reading tips

    An organizing system either explicitly creates, or assumes the existence of, a framework for identifying things.

  2. Coyle, Karen. “Identifiers: Unique, Persistent, Global.” The Journal of Academic Librarianship 32, no. 4 (2006): 428–431. http://kcoyle.net/jal-32-4.html.
  3. Berners-Lee, Tim. Cool URIs Don’t Change. W3C Style. W3C, 1998. http://www.w3.org/Provider/Style/URI.html.
  4. The Echo Nest. “Announcing Echoprint.” The Echo Nest Blog, June 23, 2011. http://blog.echonest.com/post/6824753703/announcing-echoprint.

September 12
Resources in Organizing Systems II

View slides Updated Thursday 11/21 4:59 PM

📖 To read before this meeting:

  1. Kent, William. “Entities.” In Data and Reality, v–19. Amsterdam: North-Holland, 1978. PDF.
    Reading tips

    Through its (explicit or implicit) framework of identity and identification, an organizing system defines a set of entities. These entities are a model, not of reality, but of how some people or organizations process information about reality.

  2. Brisbane, Arthur S. “On NYTimes.com, Now You See It, Now You Don’t.” The New York Times, June 25, 2011, sec. Opinion / Sunday Review. http://www.nytimes.com/2011/06/26/opinion/sunday/26pubed.html.
  3. Smith, Abby. “Authenticity in Perspective.” In Authenticity in a Digital Environment. Council on Library and Information Resources, 2000. http://www.clir.org/pubs/reports/pub92/smith.html.

September 17
eXtensible Markup Language

View slides Updated Thursday 11/21 4:59 PM

Please do the readings below, and spend some time familiarizing yourself with XML. In addition to the readings you may find the XML tutorial at W3Schools helpful.

You may already have some familiarity with XML, but perhaps mostly as a data format for applications or programming. In IO and IR it is essential to take a more abstract and intellectual view of XML and understand how it represents structured information models. XML encourages the separation of content from presentation, which is an important principle of information architecture. Encoding information in XML is an investment in information organization that pays off “downstream” in IR and language processing applications.
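To make the separation of content from presentation concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The record and its element names (book, title, author, year) are invented for illustration and do not follow any particular schema. The point is only that one XML encoding can feed several presentations without being changed.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the element names are assumptions made for this example.
RECORD = """
<book>
  <title>Data and Reality</title>
  <author>William Kent</author>
  <year>1978</year>
</book>
"""

def parse_book(xml_text):
    """Extract the content of a <book> record into a plain dict."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: child.text for child in root}

def as_citation(book):
    """One presentation: a short citation string."""
    return f"{book['author']}. {book['title']}. {book['year']}."

def as_shelf_label(book):
    """Another presentation of the same content: a shelf label."""
    return f"{book['title'].upper()} ({book['year']})"

book = parse_book(RECORD)
print(as_citation(book))
print(as_shelf_label(book))
```

Neither presentation function touches the XML itself, so a third presentation could be added later without re-encoding anything.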

📖 To read before this meeting:

  1. Birnbaum, David J. “What is XML and why should humanists care? An even gentler introduction to XML”, January 5, 2012. http://dh.obdurodon.org/what-is-xml.xhtml.
  2. Glushko, Robert J. “XML Foundations.” In Document Engineering, 42-72. Cambridge, Massachusetts: MIT Press, 2005. http://people.ischool.berkeley.edu/~glushko/DocumentEngineeringBookDraft/DEBook/ch2_FINAL.pdf.

September 17
Scoping & Identifying Resources due

September 19
Resource Description and Metadata I

View slides Updated Thursday 11/21 4:59 PM

📖 To read before this meeting:

  1. Glushko, Robert J., Kimra McPherson, Ryan Greenberg, Matthew Mayernik, Graham Freeman, and Carl Lagoze. “4. Resource Description and Metadata.” In The Discipline of Organizing, edited by Robert J. Glushko, 3rd ed. O’Reilly, 2015.
    Reading tips

    What is the purpose of resource description? What resource properties should be described? How are resource descriptions created? What makes a good resource description?

  2. Wheatley, Malcolm. “Operation Clean Data.” CIO, September 10, 2005. http://www.cio.com.au/article/166533/operation_clean_data/.
  3. Whitman, Brian. “Why Music ID Resolution Matters to Every Music Fan on Facebook.” Variogr.am, 2011. http://notes.variogr.am/post/10733372290/music-resolving-facebook.

September 24
Resource Description and Metadata II

View slides Updated Thursday 11/21 4:59 PM

📖 To read before this meeting:

  1. Kent, William. “The Nature of an Information System.” In Data and Reality, 21–40. Amsterdam: North-Holland, 1978. PDF.
  2. Kent, William. “Naming.” In Data and Reality, 41–61. Amsterdam: North-Holland, 1978. PDF.

September 26
Describing Multimedia Resources

View slides Updated Thursday 11/21 4:59 PM

📖 To read before this meeting:

  1. Harpring, Patricia. “The Language of Images: Enhancing Access to Images by Applying Metadata Schemas and Structured Vocabularies.” In Introduction to Art Image Access: Issues, Tools, Standards, and Strategies, edited by Murtha Baca. Los Angeles: Getty Publications, 2002. http://www.getty.edu/research/publications/electronic_publications/intro_aia/harpring.pdf.
    Reading tips

    How metadata schemas and controlled vocabularies are used to describe, catalogue, and index works of art and architecture, and images of them.

  2. Bailer, Werner, Susanne Boll, Oscar Celma, Michael Hausenblas, and Yves Raimond. “Use Case Scenarios.” In Multimedia Semantics, edited by Raphael Troncy, Benoit Huet, and Simon Schenk, 7–19. West Sussex: Wiley, 2011. PDF.
  3. Corthaut, Nik, Sten Govaerts, Katrien Verbert, and Erik Duval. “Connecting the Dots: Music Metadata Generation, Schemas and Applications.” In Proceedings of the 9th International Conference on Music Information Retrieval. Philadelphia, 2008. http://ismir2008.ismir.net/papers/ISMIR2008_213.pdf.

October 1
Relationships and Structures I

View slides Updated Thursday 11/21 4:59 PM

Organizing systems do not simply describe and enable interactions with resources in isolation: they provide frameworks for relating resources to one another in useful ways.

📖 To read before this meeting:

  1. Glushko, Robert J., Matthew Mayernik, Alberto Pepe, and Murray Maloney. “5. Describing Relationships and Structures.” In The Discipline of Organizing, edited by Robert J. Glushko, 3rd ed. O’Reilly, 2015.
    Reading tips

    Defines “relationship” and introduces five perspectives for analyzing relationships among resources: semantic, lexical, structural, architectural, and implementation.

  2. Kent, William. “Relationships.” In Data and Reality, 63–76. Amsterdam: North-Holland, 1978. PDF.
  3. Fellbaum, Christiane. “WordNet.” In Theory and Applications of Ontology: Computer Applications, edited by Roberto Poli, Michael Healy, and Achilles Kameas, 231–243. Springer Netherlands, 2010. http://www.springerlink.com/content/n2516j53k5p26x76/abstract/.

October 1
Creating a Vocabulary & Descriptions due

October 3
Relationships and Structures II

View slides Updated Thursday 11/21 4:59 PM

📖 To read before this meeting:

  1. Pepper, Steve. The TAO of Topic Maps: Finding the Way in the Age of Infoglut, 2000. http://www.ontopia.net/topicmaps/materials/tao.html. PDF.
    Reading tips

    Topic maps are an ISO standard for describing knowledge structures and associating them with information resources. Topic maps are grounded in a basic model consisting of Topics, Associations, and Occurrences (TAO).

    The ontopia.net site may be down, so don’t overlook the alternative PDF link above.

  2. Ray, Kate. Web 3.0, 2010. http://vimeo.com/11529540.
    Reading tips

    A video about the Semantic Web.

  3. Heath, Tom, and Christian Bizer. “Introduction.” In Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on the Semantic Web: Theory and Technology. Morgan & Claypool, 2011. http://linkeddatabook.com/editions/1.0/#htoc1.
  4. Weinberger, David. The Molecule of Data. Mp3. Library Lab, n.d. https://soundcloud.com/harvard/008-the-molecule-of-data.
    Reading tips

    In this podcast Karen Coyle explains why libraries are keen on the idea of using Linked Data to produce more value from their cataloging efforts.

October 8
Relationships and Structures III

View slides Updated Thursday 11/21 4:59 PM

Structure-based IR models combine representations of terms with information about structures within documents (i.e., hierarchical organization) and between documents (i.e., hypertext links and other explicit relationships). This structural information tells us what documents and parts of documents are most important and relevant, and provides additional justification for determining relevance and ordering a result set. The nature and pattern of links between documents has been studied for almost a century by “bibliometricians” who measured patterns of scientific citation to quantify the influence of specific documents or authors. The concepts and techniques of citation analysis seem applicable to the web since we can view it as a network of interlinked articles, and Google’s PageRank algorithm is now the classic example. With the advent of “social media” there is now a wealth of new potential sources of structural metadata.
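As a rough illustration of how link structure alone can yield a ranking, here is a minimal sketch of the textbook power-iteration form of PageRank in Python. It is a simplified teaching version, not a description of how any search engine actually works. The four-page link graph, the damping factor of 0.85, and the iteration count are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by link structure.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative defaults, not tuned values.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives a small baseline share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical link graph: A, B and D all link to C; C links back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, C comes out on top because three pages link to it, and A comes second only because the highly ranked C links to it. That second effect, where a link counts for more when it comes from an important page, is what distinguishes this approach from simply counting citations.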

📖 To read before this meeting:

  1. Diaz, Alejandro M. “Through the Google Goggles: Sociopolitical Bias in Search Engine Design”. Stanford University, 2005. http://epl.scu.edu/~stsvalues/readings/Diaz_thesis_final.pdf#page=55.
    Reading tips

    The most famous and influential exploitation of “structural metadata” is PageRank, the secret sauce behind Google search (and now all other major search engines). While the idea behind PageRank is simple, its implications as a system for mediating access to information are not. Read only chapters 4 and 5.

  2. MacRoberts, M. H, and Barbara R MacRoberts. “Problems of Citation Analysis.” Scientometrics 36, no. 3 (July 1996): 435–444. http://www.springerlink.com/index/10.1007/BF02129604.
    Reading tips

    As this examination of citation analysis shows, interpretations can vary widely as to what “links” in a given structure mean.

  3. Mislove, Alan, Krishna P. Gummadi, and Peter Druschel. “Exploiting Social Networks for Internet Search.” In Record of the Fifth Workshop on Hot Topics in Networks: HotNets V. Irvine, CA: ACM SIGCOMM, 2006. http://www.read.cs.ucla.edu/hotnets5/mislove06exploiting.pdf.

October 10
Midterm Review

October 15
Midterm

The midterm will be given during regular class time. It will be distributed as a Word document, so you’ll need to bring a laptop to work on it. It is open-book, open-notes.

October 15
Midterm Exam due

October 17
Fall break

October 22
Categories: Describing Resource Classes and Types

View slides Updated Thursday 11/21 4:59 PM

Total amount of required reading for this meeting: 5,000 words

We impose meaning on the world by “carving it up” into concepts and categories. The conceptual and category boundaries we impose treat some things or instances as equivalent and others as different. Sometimes we do this implicitly and sometimes we do it explicitly. We do this as members of a culture and language community, as individuals, and as members of organizations or institutions. Across these different contexts the mechanisms and outcomes of our categorization efforts differ. In most cases the resulting categories are messier than our information systems and applications can handle, and understanding why and what to do about it are essential skills for information professionals.

📖 To read before this meeting:

  1. Glushko, Robert J., Rachelle Annechino, Jess Hemerly, and Longhao Wang. “6. Categorization: Describing Resource Classes and Types.” In The Discipline of Organizing, edited by Robert J. Glushko, 3rd ed. O’Reilly, 2015.
    Reading tips

    What categories are, how they are used in information management, and how changes in the understanding of human cognitive processes have altered theories of categorization over the years.

  2. Glushko, Robert J, Paul P Maglio, Teenie Matlock, and Lawrence W Barsalou. “Categorization in the Wild.” Trends in Cognitive Sciences 12, no. 4 (April 2008): 129–35. http://dx.doi.org/10.1016/j.tics.2008.01.007.
    5,000 words

October 24
Classification I: Assigning Resources to Categories

View slides Updated Thursday 11/21 4:59 PM

A classification is a system of categories, ordered according to a pre-determined set of principles and used to organize a set of instances or entities. This doesn’t mean that the principles are always good or equitable or robust: every classification is biased in one way or another. Classifications are embodied in every information-intensive activity or application.

📖 To read before this meeting:

  1. Glushko, Robert J., Jess Hemerly, Vivien Petras, Michael Manoochehri, Longhao Wang, Jordan Shedlock, and Daniel Griffin. “7. Classification: Assigning Resources to Categories.” In The Discipline of Organizing, 3rd ed. O’Reilly, 2015.
    Reading tips

    The terms “classification” and “categorization” are often used interchangeably, but they are not the same. Having a set of categories is not sufficient to create a classification. A classification must be principled so that we know where to place new items and entities in accordance with our system.

  2. Kent, William. “Attributes.” In Data and Reality, 77–84. Amsterdam: North-Holland, 1978. PDF.
  3. Kent, William. “Types and Categories and Sets.” In Data and Reality, 85–91. Amsterdam: North-Holland, 1978. PDF.

October 29
Classification II: Classification Structures

View slides Updated Thursday 11/21 4:59 PM

📖 To read before this meeting:

  1. Lambe, Patrick. “Taxonomies can take many forms.” In Organising Knowledge: Taxonomies, Knowledge and Organisational Effectiveness, 4-48. Oxford: Chandos, 2007. PDF.
  2. Hearst, Marti. “UIs for Faceted Navigation: Recent Advances and Remaining Open Problems.” In Proceedings of the Workshop on Computer Interaction and Information Retrieval (HCIR 2008). Redmond, Washington, 2008. http://flamenco.berkeley.edu/papers/hcir08.pdf.
  3. Golder, Scott, and Bernardo A. Huberman. “The Structure of Collaborative Tagging Systems.” arXiv:cs/0508082 (August 18, 2005). http://arxiv.org/abs/cs/0508082.

October 31
Interactions with Organizing Systems I

View slides Updated Thursday 11/21 4:59 PM

Read sections 9.1 to 9.3 of Chapter 9, “Interactions with Resources” for today.

📖 To read before this meeting:

  1. Petras, Vivien, Robert J. Glushko, Karen Joy Nomorosa, J. J. M. Ekaterin, Hyunwoo Park, Sean Marimpietri, Ian MacFarland, and Robyn Perry. “9. Interactions with Resources.” In The Discipline of Organizing, 3rd ed. O’Reilly, 2015.
  2. Williams, Ashley. “User-centered Design, Activity-centered Design, and Goal-directed Design: a Review of Three Methods for Designing Web Applications.” In Proceedings of the 27th ACM International Conference on Design of Communication, 1–8. SIGDOC ’09. New York, NY, USA: ACM, 2009. http://doi.acm.org/10.1145/1621995.1621997.
  3. Norman, Donald A. “Logic Versus Usage: The Case for Activity-centered Design.” Interactions 13, no. 6 (November 2006): 45–ff. http://doi.acm.org/10.1145/1167948.1167978.

November 5
Ryan at ASIS&T Annual Meeting

November 5
Building a Taxonomy due

November 7
Interactions with Organizing Systems II

View slides Updated Thursday 11/21 4:59 PM

Read sections 9.4 to 9.6 of Chapter 9, “Interactions with Resources” for today.

📖 To read before this meeting:

  1. Petras, Vivien, Robert J. Glushko, Karen Joy Nomorosa, J. J. M. Ekaterin, Hyunwoo Park, Sean Marimpietri, Ian MacFarland, and Robyn Perry. “9. Interactions with Resources.” In The Discipline of Organizing, 3rd ed. O’Reilly, 2015.
  2. Trant, Jennifer. “Emerging Convergence? Thoughts on Museums, Archives, Libraries, and Professional Training.” Museum Management and Curatorship 24, no. 4 (2009): 369–387. http://www.archimuse.com/papers/trantConvergence0908-final.pdf.
  3. Brunnermeier, Smita B., and Sheila A. Martin. “Interoperability Costs in the US Automotive Supply Chain.” Supply Chain Management: An International Journal 7, no. 2 (May 1, 2002): 71–82. http://www.emeraldinsight.com/journals.htm?articleid=858244&show=abstract.

November 12
Standards for Organizing I

View slides Updated Thursday 11/21 4:59 PM

Until now we’ve focused on developing a conceptual understanding of how to define and describe entities and types of entities when organizing information. However, to progress further we must familiarize ourselves with some of the various (and constantly evolving) methods and standards for formally expressing these concepts in machine-readable ways, and for guiding information organization processes to ensure consistency and interoperability. Today we’ll look at two kinds of standards: standardized syntaxes for data interchange and standardized conceptual or structural models.

In addition to the required reading below (“The Forms of Resource Descriptions”), here are some additional resources you may find useful:

Standard syntaxes for data interchange

Syntax governs the arrangement of symbols to create properly formed (but not necessarily meaningful) messages.

The dominant syntax standard for encoding data so that it can be exchanged among different organization systems is the eXtensible Markup Language (XML). Review the XML Foundations reading from 9/17, and the XML tutorials at ZVON and W3Schools if you’ve forgotten what you learned about XML.

An increasingly popular alternative syntax standard is JavaScript Object Notation (JSON). Read JSON: The Fat-Free Alternative to XML.
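To see that XML and JSON are two syntaxes for the same kind of content, here is a minimal sketch in Python using only the standard library. The record and its field names (author, title, year) are invented for illustration. The sketch also ignores features that only one syntax offers, such as XML attributes and JSON numbers.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record, encoded in two different syntaxes.
XML_TEXT = (
    "<reading>"
    "<author>Kent, William</author>"
    "<title>Naming</title>"
    "<year>1978</year>"
    "</reading>"
)
JSON_TEXT = '{"author": "Kent, William", "title": "Naming", "year": "1978"}'

# Parse each syntax into the same in-memory structure: a flat dict of strings.
from_xml = {child.tag: child.text for child in ET.fromstring(XML_TEXT)}
from_json = json.loads(JSON_TEXT)

# The symbols are arranged differently, but the content is identical.
print(from_xml == from_json)
```

Once both encodings are parsed they are indistinguishable, which is why the choice between syntaxes is often made on practical grounds such as tooling and verbosity.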

Standard conceptual or structural models

Conceptual or structural models aim to standardize the way information is conceptualized. They can range from very abstract to very specific. Unlike syntax standards, they do not specify how symbols are arranged but instead specify basic concepts and how they are related to one another. However, conceptual or structural models often specify how their concepts should be represented in one or more syntaxes.

As we discussed in class two weeks ago, the Resource Description Framework (RDF) is the conceptual model at the foundation of the Semantic Web. It is a very abstract conceptual model because it aims to standardize concepts suitable for modeling any kind of data. Watch Jenn Riley’s RDF for Librarians presentation for a more detailed explanation of RDF.
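RDF can model any kind of data because it reduces every statement to a subject-predicate-object triple. Here is a minimal sketch of that idea in plain Python, deliberately avoiding any RDF library. The example.org URIs are hypothetical, while the title and creator properties come from the Dublin Core terms vocabulary. Real RDF also distinguishes URIs from literal values, which this sketch glosses over by using strings for both.

```python
# Namespaces: DC is the Dublin Core terms namespace; EX is a made-up example namespace.
DC = "http://purl.org/dc/terms/"
EX = "http://example.org/"

BOOK = EX + "book/data-and-reality"
KENT = EX + "person/william-kent"

# Each statement is one (subject, predicate, object) triple.
triples = {
    (BOOK, DC + "title", "Data and Reality"),
    (BOOK, DC + "creator", KENT),
    (KENT, EX + "vocab/name", "William Kent"),  # hypothetical property
}

def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# Everything stated about the book:
for triple in match(triples, s=BOOK):
    print(triple)

# Who created it? Follow the creator link, then look up that resource's name.
creator = match(triples, s=BOOK, p=DC + "creator")[0][2]
print(match(triples, s=creator, p=EX + "vocab/name")[0][2])
```

Because the object of one triple can be the subject of another, a set of triples forms a graph, and answering a question amounts to following links through it.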

A higher-level yet still rather abstract conceptual model is the Functional Requirements for Bibliographic Records (FRBR). Read What is FRBR?

The Atom Syndication Format is a model for describing the structure of blog feeds, or any kind of data that can be expressed as a list of time-stamped items. Atom is an example of a structural model that is relatively tightly tied to a specific syntax (XML).

Google recently released the Dataset Publishing Language (DSPL), a new conceptual model for describing quantitative datasets such as demographic statistics. Skim through the DSPL Tutorial.

Finally, there are conceptual or structural models for relatively concrete, well-understood kinds of things such as contact information, calendar events, postal addresses, and recipes. Recently the three major search engines agreed on a set of conceptual models for these types of information and published them at schema.org. Skim the schema.org documentation and take a look at the model for structuring recipes.

📖 To read before this meeting:

  1. Shaw, Ryan, and Murray Maloney. “8. The Forms of Resource Descriptions.” In The Discipline of Organizing, edited by Robert J. Glushko, 3rd ed. O’Reilly, 2015.

November 14
Standards for Organizing II

View slides Updated Thursday 11/21 4:59 PM

Today we’ll look at two more kinds of standards: standardized values or names and standardized processes. We’ll wrap up by considering how technical standards and transformation techniques can help achieve integration and interoperability, acknowledging that interoperability is not always possible and that non-technical factors play a huge role in determining the approach. In addition to the required reading (“Why Standardization Efforts Fail,” listed under November 19), here are some additional resources you may find useful:

Standard values or names: Controlled vocabularies & thesauri

Conceptual or structural models usually define the kinds of attributes that entities have, but may not specify the actual values that those attributes can take. This is the role of value standards, which are usually lists or hierarchies of names or identifiers that can be used as values for certain kinds of attributes.

A very simple example of a value standard is ISO 3166-1, which standardizes two- and three-letter codes for identifying countries.
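To show what a value standard does in practice, here is a minimal sketch in Python that restricts a "country" attribute to values drawn from ISO 3166-1. The table holds only a four-country excerpt of the standard's two- and three-letter codes, and the function name normalize_country is made up for this example.

```python
# Excerpt of ISO 3166-1: two-letter (alpha-2) code -> three-letter (alpha-3) code.
# The full standard covers every country; four entries are enough to illustrate.
ALPHA2_TO_ALPHA3 = {"US": "USA", "FR": "FRA", "JP": "JPN", "BR": "BRA"}

def normalize_country(value):
    """Return the two-letter code for a two- or three-letter country code.

    Raises ValueError for anything outside the controlled list, which is
    the whole point of a value standard: free text is not accepted.
    """
    code = value.strip().upper()
    if code in ALPHA2_TO_ALPHA3:
        return code
    for alpha2, alpha3 in ALPHA2_TO_ALPHA3.items():
        if code == alpha3:
            return alpha2
    raise ValueError(f"not a recognized country code: {value!r}")

print(normalize_country("fr"))
print(normalize_country(" usa "))
```

A free-text value such as "France" is rejected rather than guessed at, so two systems that both enforce the standard can exchange country values without ambiguity.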

More complex value standards resemble (or are) classifications, with faceted and/or hierarchical structure. Browse through the Art & Architecture Thesaurus, the AGROVOC agricultural vocabulary, and the Medical Subject Headings (MeSH).

Standard processes: Rules & best practices

Finally, rules or best practices seek to standardize the processes by which people organize information. Among other things, they may specify when and how the other kinds of standards should be used to describe and organize particular kinds of information.

Although not an official standard, the database guidelines at Discogs are a good example of what rules for cataloging look like. Read the Quick Start Guide and skim through some of the other database guidelines such as Genres/Styles and Master Release.

An example of a more official standard is Graphic Materials: Rules for Describing Original Items and Historical Collections, which provides rules for describing photographs, posters, cartoons, prints and drawings. Skim through the standard to get a sense of the variety of aspects of the description process that it attempts to standardize.

November 19
Standards Development & Governance

View slides Updated Thursday 11/21 4:59 PM

Today we’ll consider the vocabulary problem as it manifests itself across organizational contexts. Within an organization, different information systems might use data models that are incomplete or incompatible with respect to each other, and between organizations these differences can be even greater. Structural, syntactic, and semantic mismatches cause problems when processes and services attempt to span these system and organizational boundaries (for example, to create a complete model of a “customer” or to conduct a business transaction). We’ll consider how technical standards and transformation techniques can help achieve integration and interoperability, but we’ll acknowledge that interoperability is not always possible and that non-technical factors play a huge role in determining the approach.

📖 To read before this meeting:

  1. Cargill, Carl F. “Why Standardization Efforts Fail.” Journal of Electronic Publishing 14 (2011). http://dx.doi.org/10.3998/3336451.0014.103.
    Reading tips

    The ostensible failure of a standard has to be examined not so much from the focus of whether the standard or specification was written or even implemented (the usual metric), but rather from the viewpoint of whether the participants achieved their goals from their participation in the standardization process.

  2. Mazzocchi, Stefano. “Interoperability by Friction.” Stefano’s Linotype, 2008. http://web.archive.org/web/20080521183013/http://www.betaversion.org/~stefano/linotype/news/143/.
    Reading tips

    Stable standards are dead standards.

November 21
Guest Speaker: Jane Greenberg

Professor Jane Greenberg will give an overview of cataloging, and will discuss how she is experimenting with “crowdsourcing of semantics” in her SeaIce project.

For today, read Metadictionary: Advocating for a Community-driven Metadata Vocabulary Application.

📖 To read before this meeting:

  1. Greenberg, Jane, Angela Murillo, Rob Guralnick, Greg Janee, John Kunze, Nassib Nassar, Christopher Patton, Sarah Callaghan, and Karthik Ram. “Metadictionary: Advocating for a Community-driven Metadata Vocabulary Application.” In Proc. Int’l Conf. on Dublin Core and Metadata Applications. Lisbon, 2013. http://aeshin.org/files/courses/readings/paperSeaIceCAMP-4-DATA.pdf.

November 26
Computational Approaches to Categorization & Classification

View slides Updated Thursday 11/21 4:59 PM

For today, read sections 6.5 (“Implementing Categories”) and 7.6 (“Computational Classification”) of TDO, and chapters 3 (“Learning”) and 4 (“Types of Machine Learning”) of A First Encounter with Machine Learning.

📖 To read before this meeting:

  1. Glushko, Robert J., Rachelle Annechino, Jess Hemerly, and Longhao Wang. “6.5 Implementing Categories.” In The Discipline of Organizing, edited by Robert J. Glushko, 3rd ed. O’Reilly, 2015.
  2. Glushko, Robert J., Jess Hemerly, Vivien Petras, Michael Manoochehri, Longhao Wang, Jordan Shedlock, and Daniel Griffin. “7.6 Computational Classification.” In The Discipline of Organizing, 3rd ed. O’Reilly, 2015.
  3. Welling, Max. “Learning.” In A First Encounter with Machine Learning, 11–16, 2011. https://www.ics.uci.edu/~welling/teaching/ICS273Afall11/IntroMLBook.pdf.
  4. Welling, Max. “Types of Machine Learning.” In A First Encounter with Machine Learning, 17–20, 2011. https://www.ics.uci.edu/~welling/teaching/ICS273Afall11/IntroMLBook.pdf.

November 26
Classifying with Facets due

November 28
Thanksgiving

December 3
Final review & wrap-up

December 10
Final exam due

The final exam is due 24 hours after you download it, or at 11 AM sharp on Tuesday, December 10, whichever comes first. Note that when you download the exam, the download time is recorded, so it is easy to check whether you have kept the exam longer than 24 hours. So don’t download it until you are ready to start working on it!