Organization of Information

UNC SILS, INLS 520, Spring 2011

Organization of Information in the News

Due January 31.

Issues of information organization pervade our daily lives. Once you learn to notice them you will see them everywhere. For this assignment you will:

  1. Find a recent (published since January 2009) news item (article, video, whatever) that illustrates how an issue related to information organization impacts our lives as individuals, as members or employees of organizations, as citizens, and so on.

  2. Using the class blog, write a brief blog entry about your story, summarizing it and highlighting how it relates to issues of information organization. Write two to four paragraphs that contextualize the story: that is, explain why the story is related to the organization of information. Also, see if you can answer the five questions about organizing systems in Chapter 1 of Intellectual Foundations for Information Organization and Information Retrieval.

Note: This assignment will not be graded as an individual assignment; it will be included in your participation grade.

Creating A Vocabulary & Descriptions

Due February 14.

Assignment Overview

In this assignment, you will:

  1. Create a vocabulary for describing a restaurant dining experience.
  2. Describe a dining experience using your vocabulary.
  3. Swap vocabularies with a classmate.
  4. Describe a dining experience using your classmate’s vocabulary.
  5. Reflect on your experience creating and using vocabularies.

Deadline

You must submit your work by uploading your zip archive of assignment files before 12:30PM on Monday, February 14, 2011. Late assignments will not be accepted unless you have an exceptionally good excuse. Even though you have two weeks for this assignment, you’ll want to start early, as you’ll need to swap vocabularies with a classmate in order to successfully complete all parts of the assignment.

Submission Requirements

You will submit a total of 3 text files (zipped).

The first file should be named report.txt. It should have the following sections:

  1. Scoping the Vocabulary
  2. Defining the Terms
  3. Reflection

The second file will be a sample dining “event” described using your vocabulary. Name this file mine.txt.

The third file will be a sample dining “event” described using your classmate’s vocabulary. Name this file swapped.txt.

Detailed Instructions

Your task is to develop a vocabulary for describing the experience of dining in a restaurant. Creating a vocabulary involves scoping the problem, making decisions about key trade-offs, choosing meaningful names and descriptions, and being aware of any biases you might be bringing to the vocabulary. This assignment will take you through that process.

Part 1: Scoping the Vocabulary

There are many different types of restaurants and many different types of dining experiences. The first thing you must do is settle on a scope for the type of experience you intend to model. Are you looking at five-star restaurants or fast food? Or do you want to try to model a universal dining experience?

In your report.txt document, under the section title “Scoping the Vocabulary,” write one or two sentences that define the scope of the vocabulary you intend to design. For example, if you were developing a vocabulary for describing course syllabi, you might define its scope by saying, “The Syllabus Vocabulary is a vocabulary for describing a syllabus for a university course. It will describe the topics, readings, assignments, and exams or other important events and the dates associated with each.”

Be sure to carefully consider not only what falls within your scope but also what is outside of it. The Syllabus Vocabulary might want to include some notion of date or time—but making the vocabulary fully compatible with Google Calendar or another calendar format would fall outside the scope.

Part 2: Defining the Terms

Identify and define the terms your vocabulary requires to describe events within your scope. For example, some possible terms in the Syllabus Vocabulary might be Topic, Reading, Date, Instructor, etc.

For each term, write a one or two sentence definition. These definitions are your instructions for people using your vocabulary to encode an instance—as your classmate will do later in this assignment—so strive for precision.

Write these terms in the “Defining the Terms” section of your report.txt document.

Remember the tradeoffs we’ve discussed so far in the semester. They’ll help you think through some key issues, such as:

  • How many “levels” of terms do you want or need? Everything can be at the same hierarchical level, or you may have some terms that are “containers” for others. There’s no right answer; just think about the implications.
  • How many terms do you need to cover everything in your scope?
  • What are the benefits/drawbacks of a simpler vocabulary? A more complex one?
  • Are you being consistent with the levels of abstraction and granularity of your terms?
  • There’s no required upper or lower boundary on the number of terms that will be in your vocabulary. That said, in the past, we’ve found that a useful vocabulary can be developed with somewhere between 10 and 20 terms.

Part 3: Using Your Vocabulary

You might have developed your vocabulary by thinking of a specific restaurant experience or set of experiences. This step will make that connection explicit, testing your vocabulary by having you create a description of a particular dining event.

Create a description of your dining event using your vocabulary terms in the file mine.txt. Make sure the event falls within the scope you defined in Part 1.

As you do this step, you may find you want to change some of your terms, make things less (or more) granular, remove (or add) levels of hierarchy, or alter your vocabulary in other ways. That’s OK! You should revisit Parts 1-3 and revise your scope, terms, definitions, and description as needed. Make some notes on any changes you make; you’ll want them for your reflection later.

Part 4: Swapping Vocabularies

Now comes the fun part: You’ll be swapping vocabularies with a classmate in your section. You should receive an email via the class mailing list that tells you who you’ll be swapping with.

For this part of the assignment, you’ll be sharing parts 1 and 2 of your report.txt. DO NOT share the description you created using your vocabulary, just the scope statement and the definition of terms.

Once you have your classmate’s vocabulary, attempt to create a description using only the scope and definitions they provide. Save your description as swapped.txt. As with your own description, make sure your instance is well-formed. Send this file back to your classmate.

Part 5: Reflecting on the vocabulary modeling experience

Write a few paragraphs reflecting on your vocabulary modeling experience in your report.txt document under the section “Reflection.” This doesn’t need to be longer than a few hundred words, but it does need to include, at minimum, the following elements:

  1. Your thoughts on the challenges of creating your own vocabulary, what it was like to scope and create your terms, and what changes—if any—you needed to make after you tried to encode your own instance.

  2. What it was like to try to create a description using your classmate’s vocabulary. Did any terms confuse you? Was anything particularly clear? What did looking at someone else’s terms and definitions teach you about creating a vocabulary?

  3. What was it like to see the description your classmate created using your vocabulary? Did it match the expectations you had for your vocabulary? And what does that tell you about the success of your scope, terms, and definitions?

Note: The “No Busy Work” Principle

Assignments are meant to challenge you intellectually in some way, not to see how much “busy work” you can put up with. So I’ll never intentionally ask you to do something that takes time but that doesn’t give you more insights. Put another way, if you find yourself doing busy work in an assignment, re-read the instructions to see if they advise against doing what you’re doing. If that doesn’t end the busy work, ask me if you’re doing what is expected.

In this assignment there are ample opportunities to do “busy work,” so be careful not to do it. For example, a “Syllabus Vocabulary” would probably have some notion of “Topic,” and a description of a syllabus might have many of them. In Part 3 of this assignment, your task is to demonstrate how your vocabulary works by creating a description. If you were creating a syllabus description, it would be sufficient to include just two or three topics, not 29.

Submit this assignment.

Classifying

Due March 14.

Assignment Overview

In this assignment you will:

  1. Familiarize yourself with an online tool for facet creation.
  2. Design a faceted classification for a provided set of instances.
  3. Adjust your classification given additional instances.
  4. Build and upload your faceted classification through the online tool.
  5. Reflect on your experience designing and iterating your classification.

Deadline

You must submit your work by uploading your facet map file to the FacetMap website (see below) and uploading your zip archive of assignment files to the course website before 12:30PM on Monday, March 14th, 2011. Late assignments will not be accepted unless you have an exceptionally good excuse.

Submission Requirements

You will submit a zip archive containing a copy of your facet map file and a report text file, as detailed in the assignment instructions below. The files should be named facetmap.txt and report.txt respectively. You will also need to make sure that your facet map file has successfully uploaded to the FacetMap website, and that it is browsable online.

Part 1. Getting Acquainted with FacetMap

Get acquainted with the FacetMap website at http://facetmap.com. In particular, look at the wine demo to see how the three facets of Varietals, Region, and Price combine to organize hundreds of wine instances. The demo will start up in the “Commentary Track” that explains what is going on as you specify facet values to select wines in the collection. Notice that there are three different user interface styles for the display of the facets and the items that are selected by different facet values. Think about the pros and cons of each style of display.

Part 2. Design A Faceted Classification

Once you have familiarized yourself with FacetMap, it’s time to start working on your classification. Review pages 12–16 in Chapter 6 of IFIOIR, noting Ranganathan’s dimensions, the set of general criteria for facet design, and the principles guiding facet ordering. You will use these to design your own faceted classification to organize this set of things. You’ll notice that every instance is an animal or some representation of an animal. Remember that you are classifying the animal instance shown in the picture, NOT the type or class to which the animal belongs. You can see picture titles by holding your mouse over the picture. The titles provide important clues regarding how to classify your instance. “Mosquito” should not be in the same class as “Painting of a Mosquito,” nor are we looking for taxonomic classes (reptiles, insects, etc.).

You can have as many facets as you want, but you must have at least four. You will need to be creative, but make sure that anybody else could understand your facets without your help. They can be abstract or practical or a mix of both, so long as they classify the instances in a way that other people could understand. Also make sure that your facets are flexible enough to handle additions to the list of instances.

Part 3. Testing Your Facets with New Instances

A good set of facets should be able to accommodate new instances without adjustment. Please do not look at these sets until you’ve built the classification for the original 10 instances. Choose one of these five groups of new instances (5 new instances in each one):

Revise your facets if necessary so that your system can classify all 15 (10 original plus 5 extra) instances.

Part 4. Encoding Your Work for FacetMap

Once you have a system that handles all your instances, encode it and all your instances using the text file format shown on FacetMap.

FacetMap’s Limitations

You may run into quirks, bugs, and limitations while using FacetMap. Be patient, and make note of any compromises or design changes you had to make because of limitations of FacetMap or of facets in general. If you are having difficulties that other students might encounter, send email to the class list.

FacetMap enforces a strict “occurrence exclusivity” principle in assigning values in a facet. This means that if (for example) a library facet map had a “subject” facet that included both a “War” and a “Peace” heading, your copy of Tolstoy’s renowned work about both war and peace could not be listed under both. Keep this in mind when designing your facets and facet values. This principle is what distinguishes faceted classification from “tagging” systems where any tag can be assigned to any information resource or object regardless of other tags already associated with it.
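
The contrast between occurrence exclusivity and tagging can be pictured as two tiny data structures. This is a minimal Python sketch under our own assumptions, not FacetMap’s implementation; the class names FacetedItem and TaggedItem are hypothetical. A faceted item holds at most one value per facet, while a tagged item holds an unconstrained set of tags.

```python
from dataclasses import dataclass, field


@dataclass
class FacetedItem:
    """Hypothetical model of occurrence exclusivity: one value per facet."""
    name: str
    facets: dict[str, str] = field(default_factory=dict)

    def assign(self, facet: str, value: str) -> None:
        current = self.facets.get(facet)
        if current is not None and current != value:
            # A second, different value in the same facet is rejected.
            raise ValueError(
                f"{self.name!r} already has {facet}={current!r}; "
                f"it cannot also be {value!r}"
            )
        self.facets[facet] = value


@dataclass
class TaggedItem:
    """Hypothetical model of tagging: any tag, regardless of existing tags."""
    name: str
    tags: set[str] = field(default_factory=set)

    def tag(self, label: str) -> None:
        self.tags.add(label)


if __name__ == "__main__":
    book = FacetedItem("War and Peace")
    book.assign("subject", "War")
    try:
        book.assign("subject", "Peace")
    except ValueError as err:
        print(err)  # the faceted model refuses a second subject value

    tagged = TaggedItem("War and Peace")
    tagged.tag("War")
    tagged.tag("Peace")
    print(sorted(tagged.tags))  # both tags are kept
```

In the faceted model, getting Tolstoy’s book under both headings would require a different facet design, which is the kind of compromise worth noting in your report.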

Part 5. Submitting Your Work

Upload your facet map file to facetmap.com. Name your facet map sils2011_username, where username is the username you use to log in to the course website. (For example, my facet map would be named sils2011_ryanshaw.)

After you’ve uploaded, verify that your facet map is browsable at http://facetmap.com/browse/sils2011_username (again, where username is whatever your actual username is). Note that subsequent uploads will overwrite previous ones with the same name, so you can fix any problems you might notice after uploading.

Your uploaded facet map will be available for 7 days, but after that it may expire, so be sure to keep a copy of what you submitted. (And make sure you don’t upload your facet map file too early, as I won’t be looking at them until after the assignment is due.)

In addition to uploading your facet map file to the FacetMap website, you’ll need to upload it to the course website. Upload a zip archive including your facet map file (name it facetmap.txt) and a short report (name it report.txt). The report should have two sections, and each should be a paragraph or two:

  1. The first section should describe your use of the FacetMap program, especially any compromises you made in your design because of perceived limitations.

  2. The second section should answer the following questions:

    • Were there sequence effects? That is, did you find that you had to work through the facets in a particular order to get to a classification you felt comfortable with?
    • How did the context of the image affect your placement within a facet?
    • Did you use Ranganathan’s PMEST in conceiving your facets? Was it useful?
    • What were your biggest challenges in designing the classification?
    • How well did your initial facet design handle the additional instances? What changes did you need to make to accommodate the new instances?

Submit this assignment.

Building a Taxonomy

Due March 28.

Assignment Overview

In this assignment you will:

  1. Define types for 15 animal instances.
  2. Sort those classes into a hierarchy of animal types.
  3. Create a diagram of your taxonomy.
  4. Write definitions for each part of your taxonomy using hypernyms and hyponyms.
  5. Reflect on your experience.

Deadline

You must submit your work by uploading your zip archive of assignment files to the course website before 12:30PM on Monday, March 28th, 2011. Late assignments will not be accepted unless you have an exceptionally good excuse.

Submission Requirements

You will submit a zip archive containing two text files, one named reflection.txt and one named urls.txt. reflection.txt should contain your reflection on the assignment (see part 5). urls.txt should contain the published URLs for your Google Docs spreadsheet and drawing (see parts 1-4). To get your published URL for each document, select Publish as a web page from the Share menu in the upper right-hand corner of the Google Docs interface. For your spreadsheet, make sure you choose to publish All sheets. Copy and paste your two published URLs into the urls.txt file.

Detailed Instructions

In this assignment, you’ll be returning to your “ark” from assignment #3: the set of 10 original animal depictions plus whichever bonus set you selected. This time, you’ll be developing a hierarchical classification scheme.

The goal of this assignment is to give you more practice thinking about categories and category membership, abstraction, classification, and taxonomy. You’ll also learn a technique for naming and describing a system of categories so that you can clearly convey their meaning to others.

Part 1. Identify your types

Round up your animals by returning to your 15 instances from assignment #3. Create a Google Docs spreadsheet by making a copy of this template. (You can do this by selecting Make a copy ... under the File menu.) In the first sheet (“Instances”) of your spreadsheet, create a list of your animal depiction instances. (Hint: Remember that each of the instances came with a name attached for assignment #3. You’ll probably want to continue using these names.)

In this part of the assignment, you’re going to start generalizing away from the specific instance you were given. For each animal instance, identify a type to which that instance belongs. For example, if you were classifying musical instruments, and you’d been given a picture of a drum set, you might pick something like “rhythm instrument” as the type. Or you might choose something more granular or more abstract than that. Remember, as always, you’re making a choice about the level of abstraction you use.

One thing you don’t want to do here is make your types so specific that they can’t describe anything but the instance you were given. (Making your type “yaks” for an instance called “yak” seems awfully convenient, but it’s a little too easy.) You should be able to think of some common features held by all members of your type, as well as some other instances that would fit into the class.

As you’re making your first pass through the instances, do not stress out too much about naming these types. You’re likely to go back to them and revise them as you progress through the assignment. If it’s starting to make you feel crazy, my advice is to come up with something temporary and move on; new ideas might pop up once you’ve started to arrange your hierarchy.

Part 2. Organize your types into a hierarchy

Now that you’ve taken a crack at identifying types for each of your animals, begin arranging them into a hierarchy. The top or root element of your hierarchy will be “animals.” The bottom level of your hierarchy will be your instances. When you created your types in Part 1, you added a second level to the hierarchy, more abstract than your instances but less abstract than “animals.” What you’re doing now is adding one more level of abstraction, a new level between your types and “animals.” These are hypernyms or “super-types.”

Think of this as a sorting task. (Sometimes it’s even helpful to write your type names down on pieces of paper or sticky notes and physically sort them.) As you sort, you may discover that some of your original types are too narrow. You may also realize that they’re too broad and don’t leave you enough room to insert another level before getting to “animals.” That’s OK! Revise your types as many times as you need to and record them in your spreadsheet.

At this phase of the assignment, it’s important that you strive for a consistent level of abstraction among your “super-types.” Again, try to think of some common characteristics that would be shared by all members of that super-type. For example, if we had “musical instruments” as our root element and our next level down included both “clarinets” and “stringed instruments,” that might be a sign that the classification wasn’t maintaining a consistent level of abstraction.

When you’re satisfied with your assignment of types to super-types, record them in the second sheet (“Types”) of your spreadsheet.

Part 3. Create a diagram of your hierarchy

Create a Google Docs drawing showing the structure of your classification hierarchy. This does not have to be fancy. Start with “animals” at the top or root of your diagram, then your “super-types,” then your types. You don’t need to include your instances.

Part 4. Define your types and “super-types”

Now, you’re going to write definitions for your types such that an ordinary person would be able to categorize new instances. You’ll be following this formula for definitions:

Hyponym = { adjective } hypernym { distinguishing clause }

For example, suppose you’re classifying instruments. “Instruments” is your root element.

Your first instance was a bass clarinet, and you assigned it to a type called “clarinets.” Then, as you created your taxonomy, you sorted clarinets, saxophones, and flutes together into a “super-type” called “woodwinds.”

Your definitions might then look something like this:

  • clarinets = { reeded } woodwinds { that are approximately cylindrical in shape and have numerous keys }
  • woodwinds = { reed or flute } instruments { that produce sound when air is blown into them }

Remember that your definitions should reflect things that are true for all members of a type. A good sanity check at this stage is to make sure you can think of a hypothetical second instance for each type.
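
The definition formula and the “second instance” sanity check can be sketched in Python using the instruments example above. This is an illustrative sketch, not a required tool: the Definition record and the functions render, path_to_root, and types_lacking_second_instance are hypothetical names, and “B-flat clarinet” is just an assumed second instance.

```python
from typing import NamedTuple


class Definition(NamedTuple):
    adjective: str  # the { adjective } part that narrows the hypernym
    hypernym: str   # the parent type ("super-type") one level up
    clause: str     # the { distinguishing clause }


# The two definitions from the instruments example; the root is "instruments".
DEFINITIONS = {
    "clarinets": Definition(
        "reeded", "woodwinds",
        "that are approximately cylindrical in shape and have numerous keys"),
    "woodwinds": Definition(
        "reed or flute", "instruments",
        "that produce sound when air is blown into them"),
}


def render(hyponym: str, d: Definition) -> str:
    """Format a definition as: Hyponym = { adjective } hypernym { clause }."""
    return f"{hyponym} = {{ {d.adjective} }} {d.hypernym} {{ {d.clause} }}"


def path_to_root(term: str, defs: dict[str, Definition]) -> list[str]:
    """Follow hypernym links upward until reaching a term with no definition."""
    chain = [term]
    while chain[-1] in defs:
        parent = defs[chain[-1]].hypernym
        if parent in chain:
            raise ValueError(f"cycle detected at {parent!r}")
        chain.append(parent)
    return chain


def types_lacking_second_instance(members: dict[str, list[str]]) -> list[str]:
    """Return types with fewer than two known (real or hypothetical) instances."""
    return sorted(t for t, instances in members.items() if len(instances) < 2)


if __name__ == "__main__":
    for name, definition in DEFINITIONS.items():
        print(render(name, definition))
    print(" -> ".join(path_to_root("clarinets", DEFINITIONS)))
    print(types_lacking_second_instance(
        {"clarinets": ["bass clarinet", "B-flat clarinet"]}))
```

Listing each type’s path to the root makes the levels explicit (type, super-type, root), which can help when checking that your super-types sit at a consistent level of abstraction.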

Record each definition in your spreadsheet. Put your type definitions in the second sheet (“Types”) and your super-type definitions in the third sheet (“Super-types”).

Part 5. Reflect on your experience

In a text file called reflection.txt, write a paragraph or two about the approaches you used to identify types and organize them into “super-types.” Be sure to include your name at the top of the file.

Some questions to guide your reflection:

  • What was your thought process like?
  • What specific tips from the readings or lectures did you draw on?
  • Were there any “outliers” that you had to work especially hard to fit in?
  • Were you able to keep your “super-types” to a consistent level of abstraction, and how did you do so?
  • Was this harder or easier for you than the faceted classification of assignment #3, and why?

Submit this assignment.

Computationally Representing Text

Due April 18.

Assignment Overview

In this assignment, you will:

  • Learn how to use a toolkit for text analysis
  • Use the toolkit to analyze various texts related to the topics of the class
  • Reflect on your experiences

Deadline

You must submit your work by creating a new assignment submission page before 12:30PM on Monday, April 18. Late assignments will not be accepted unless you have an exceptionally good excuse.

Submission Requirements

You will submit a (zipped) text file called report.txt. The file will include short answers (50-100 words) to the 8 reflections of this assignment.

Detailed Instructions

You will use some text tools to analyze a collection of documents and you will be asked to answer some questions. Please include short answers (50-100 words) to every question marked as “Reflection” in your assignment submission.

Part 1: Voyeur

Voyeur is a web-based text analysis tool designed to work on text collections.

Note: there are two versions of Voyeur, which can be found at the following URLs:

  • http://voyeur.hermeneuti.ca/
  • http://voyeurtools.org/

They are mostly the same, but unfortunately each has different bugs that affect this assignment. Please pay close attention to the instructions below regarding which version to use for which part of the assignment.

  1. Go to http://voyeur.hermeneuti.ca/
  2. In the Add Texts section you can put the links to the documents you want to analyze. Paste the following links (one per row). You will probably recognize them.

    • http://people.ischool.berkeley.edu/~glushko/IFIOIR/Chapter2-20100908.pdf
    • http://people.ischool.berkeley.edu/~glushko/IFIOIR/Chapter3-20100909.pdf
    • http://people.ischool.berkeley.edu/~glushko/IFIOIR/Chapter4-20100917.pdf
    • http://people.ischool.berkeley.edu/~glushko/IFIOIR/Chapter5-20100917.pdf
    • http://people.ischool.berkeley.edu/~glushko/IFIOIR/Chapter6-20100917.pdf
  3. Click on reveal.
  4. On the right, you should see a Summary tab with a list of statistics. At the bottom of this list, look at Distinctive words (compared to the rest of the corpus). Take note of the five most distinctive words for each document. Why do you think they are distinctive? What do they tell you about the document? Which of them could be used as the “big concepts” of each chapter? Which of them look more accidental? What is the benefit of seeing the term frequency compared to other documents in the collection? (Reflection 1)
  5. Go to http://voyeurtools.org/, add the same texts, and click reveal.
  6. You will see different Distinctive words in this version of the tool. Ignore these; that feature appears to be broken in this version.
  7. On the left, you should see a list of all the words found in the texts, in descending order by frequency. Using the checkboxes, select the words classification and descriptions. (Re-sorting the words alphabetically by clicking on the Word column header may help you find them in the list.) New panels will appear at the bottom and on the far right. In the lower left corner you should see a graph based on the term frequency in every document. What does this graph tell you? (Reflection 2)
  8. Now select the words people and information. The graph will change. What does this graph tell you? (Reflection 3)
  9. Voyeur does not remove stop-words. What would be the effect of stop-word removal? (Reflection 4)
  10. Voyeur does not stem words. What is the effect of this? (Reflection 5)
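
If it helps to experiment outside Voyeur, the three ideas in these steps (comparing a document’s term frequencies to the rest of the corpus, removing stop-words, and stemming) can be sketched in a few lines of Python. This is an assumption-laden illustration, not how Voyeur works: the tiny STOP_WORDS list, the naive crude_stem function (not the Porter algorithm), and the smoothed frequency-ratio score in distinctive_words are all our own choices.

```python
import re
from collections import Counter

# Tiny illustrative stop list (an assumption; real tools use much longer lists).
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "is", "that", "for", "are"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word: str) -> str:
    """Naive suffix stripping for illustration only; NOT the Porter algorithm."""
    for suffix in ("ations", "ation", "ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_counts(text: str, remove_stop_words: bool = False,
                stem: bool = False) -> Counter:
    """Count terms, optionally dropping stop-words and/or stemming."""
    words = tokenize(text)
    if remove_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    if stem:
        words = [crude_stem(w) for w in words]
    return Counter(words)


def distinctive_words(doc: Counter, rest: Counter, n: int = 5) -> list[str]:
    """Rank words by relative frequency in doc versus the rest of the corpus.

    Uses add-one smoothing on the rest of the corpus so that words absent
    there do not divide by zero. One common approach, assumed here.
    """
    doc_total = sum(doc.values())
    rest_total = sum(rest.values())
    vocab_size = len(set(doc) | set(rest))

    def score(word: str) -> float:
        p_doc = doc[word] / doc_total
        p_rest = (rest[word] + 1) / (rest_total + vocab_size)
        return p_doc / p_rest

    return sorted(doc, key=lambda w: (-score(w), w))[:n]
```

Running term_counts on the same text with and without remove_stop_words or stem, then comparing the resulting counts, is a quick way to see what those two processing choices change.
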

Part 2: Tagfight

Tagfight compares tagging by humans to the tags automatically extracted by computational algorithms.

  1. Go to http://tagfight.appspot.com/
  2. In the URL box you can put the link to the web page you want to analyze. Paste the following link and click on Go:

    • http://www.shirky.com/writings/ontology_overrated.html
  3. If the graph doesn’t work with that URL, do the assignment with the following URL:

    • http://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/3881/
  4. You will see a graph showing the number of users who have used a given tag to describe the URL. You can hover over any dot to see detailed information about the tag.

  5. The graph also shows the raw count of the terms in the document. Do you see some overlap between the most frequent words in the text and the tags used by Delicious users? What types of tags can be inferred from the specific content of the text, and what types cannot? (Reflection 6)
  6. The second graph shows the distribution of tags if the tags had been run through a stemmer. What is the effect of stemming on the long tail (the shape of the curve)? What kinds of benefits or problems does it introduce? (Reflection 7)
  7. In the right column (Top Machine Tags) you will see some entities that were recognized by entity extraction services (these services use dictionaries and NLP techniques to identify named entities like persons, places, and organizations in the document). Do you see overlap between the extracted entities and the tags used by humans? What types of tags can be inferred by these services, and what types cannot? (Reflection 8)
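
The two comparisons Tagfight draws (frequent words in the text versus human tags, and tag counts before and after stemming) can be sketched in Python. This is a hedged illustration, not Tagfight’s code: crude_stem is a deliberately naive suffix stripper rather than a real stemmer, and the function names overlap_with_tags and stem_tag_counts are hypothetical.

```python
from collections import Counter


def crude_stem(tag: str) -> str:
    """Illustrative suffix stripping only; a real stemmer (e.g. Porter) is more careful."""
    tag = tag.lower()
    for suffix in ("ies", "ing", "s", "y"):
        if tag.endswith(suffix) and len(tag) - len(suffix) >= 3:
            return tag[: -len(suffix)]
    return tag


def overlap_with_tags(text_counts: Counter, tag_counts: Counter,
                      top_n: int = 20) -> set[str]:
    """Words among the text's top_n most frequent that also appear as human tags."""
    top_words = {word.lower() for word, _ in text_counts.most_common(top_n)}
    return top_words & {tag.lower() for tag in tag_counts}


def stem_tag_counts(tag_counts: Counter) -> Counter:
    """Merge tags that share a stem, summing how many users applied each."""
    merged: Counter = Counter()
    for tag, count in tag_counts.items():
        merged[crude_stem(tag)] += count
    return merged


if __name__ == "__main__":
    # Toy counts for illustration only; these are not real Tagfight or Delicious data.
    tags = Counter({"ontology": 5, "ontologies": 2, "tagging": 4, "tags": 1})
    merged = stem_tag_counts(tags)
    print(len(tags), "distinct tags before stemming,", len(merged), "after")
    print(merged.most_common())
```

With these toy counts, “ontology” and “ontologies” merge into one stem while “tagging” and “tags” stay separate, so merging changes both the number of distinct tags and how counts are distributed among them.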

Have fun with the assignment, and make sure you answer the 8 reflections in your assignment submission. We hope you’ll find these tools useful for exploring other text collections that you want to analyze.

Submit this assignment.

Final Exam

Due April 29.

Download the final exam. When you are finished, zip it up, and submit it using the link below.

Submit this assignment.