Faceted Classification
Complex subjects from simpler components

Outline
• Prequel: Arrangement of concepts within hierarchies.
• Goals of faceted classification.
• Basic design of faceted classifications.
• Facet analysis of complex subjects (factoring).
• Determination of facet structure.
• Faceted browsing on the Web.

Arrangement within hierarchies
Two forms:
• Showing multiple principles of division that relate children to their parent node (subfacets).
• Ordering via appropriate principles at each array (Ranganathan’s canons and such).

Showing principles of division (subfacets)

shoes
    high heels
    hiking boots
    mary-janes
    pumps
    running shoes
    sandals
    slingbacks
    stilettos
    wedges
    winter boots

shoes
    (by season)
        winter
        spring
    (by function)
        hiking
        running
    (by style)
        boots
        pumps
        sandals
    (by feature)
        slingbacks
        mary-janes
    (by heel type)
        stilettos
        wedges
    (by heel height)
        high heels

Ordering concepts at each level
An “array” is a group of siblings (descriptors at the same level of hierarchy). These should be ordered, even if you don’t need subfacets.
Possible orders:
• General to specific.
• Chronological.
• Close to far away.
• Order of a process.

Example: music tempos

(alphabetical)    (slowest to fastest)
Allegro           Largo
Andante           Andante
Largo             Moderato
Moderato          Allegro
Presto            Vivace
Vivace            Presto

Motivations for faceted classification
• The sheer number of documents keeps growing.
• The subjects of the documents are both more specific and more complex.
• Knowledge itself is rapidly expanding: new subjects are constantly being created.
It’s not helpful to put huge numbers of documents in general subject categories (British History, Nuclear Physics). And yet we can’t possibly enumerate all the possible subjects that either currently exist or may soon exist. What to do?

Goals of faceted classification
If we can create a classification scheme that lists subject components, then we can build complex subjects out of the components as necessary.
We facilitate the construction of complex subjects by organizing the component concepts that make up our classification into facets, or potential aspects of the subject.

From compound to components
Example of complex subject: The history of Japanese tea-drinking etiquette
Components (or isolates, or factors): History + Japan + Tea + Drinking + Etiquette
Potential fundamental categories (facets) for the components:
• Disciplines (History)
• Locations (Japan)
• Beverages (Tea)
• Activities (Drinking)
• Values (Etiquette)

Building subjects from components
A traditional faceted classification for libraries includes both the facet structure of components and syntax rules for combining the components into complex subjects. These rules are necessary to ensure that documents are filed consistently on shelves. (In an online environment, these rules become superfluous.)
To “mechanize” the subject-building process and simplify filing, components are given a notation (such as “soil acidity – sag”) that clarifies the component’s position within a facet.

Structure of faceted classifications
While a facet may be a simple list, components within a facet are typically arranged hierarchically (using a stricter or looser sense of hierarchy as appropriate).

Organic farming classification

Crops
    Fruits
        (by origin)
            Vines
                Grapes
            Bushes
            Trees
    Vegetables
    Herbs
Processes
    Planting
    Controlling pests
    Fertilizing
Materials
    Natural soil amendments
        Compost
        Mulch
    Natural pesticides

Designing faceted classifications
1. Decompose complex concepts (which you have gathered via your research into the subject literature) into component parts, via syntactic or semantic factoring.
2. Group the simple components into fundamental categories.
3. Organize the components in each facet (with hierarchical relationships, subfacets that indicate multiple principles of division, order within arrays, and so on).
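
The facet structure and combination rules described above can be sketched in a few lines of Python. This is only an illustration, not part of any standard scheme: the nested dictionaries mirror the organic farming example (with the "(by origin)" subfacet label left out for brevity), while `CITATION_ORDER`, `index_facets`, `build_subject`, and the " : " separator are hypothetical names and choices made up for this sketch.

```python
# Facets from the organic farming example, as nested dicts (label -> children).
FACETS = {
    "Crops": {
        "Fruits": {"Vines": {"Grapes": {}}, "Bushes": {}, "Trees": {}},
        "Vegetables": {},
        "Herbs": {},
    },
    "Processes": {"Planting": {}, "Controlling pests": {}, "Fertilizing": {}},
    "Materials": {
        "Natural soil amendments": {"Compost": {}, "Mulch": {}},
        "Natural pesticides": {},
    },
}

# Assumed syntax rule: components are always cited in this facet order.
CITATION_ORDER = ["Crops", "Processes", "Materials"]


def index_facets(facets):
    """Map each component label to (facet name, path within that facet)."""
    index = {}

    def visit(facet, subtree, path):
        for label, children in subtree.items():
            index[label] = (facet, path + (label,))
            visit(facet, children, path + (label,))

    for facet, subtree in facets.items():
        visit(facet, subtree, ())
    return index


def build_subject(components, facets=FACETS, order=CITATION_ORDER):
    """Combine components into one subject string, in citation order."""
    index = index_facets(facets)
    unknown = [c for c in components if c not in index]
    if unknown:
        raise KeyError(f"not in the classification: {unknown}")
    # Sort by the position of each component's facet in the citation order.
    ranked = sorted(components, key=lambda c: order.index(index[c][0]))
    return " : ".join(ranked)


if __name__ == "__main__":
    # The input order does not matter; the syntax rule fixes the output order.
    print(build_subject(["Compost", "Fertilizing", "Grapes"]))
    # prints: Grapes : Fertilizing : Compost
```

Because the order comes from the facet each component belongs to, two catalogers who list the same components in different sequences still arrive at the same subject string, which is the consistency that filing rules are meant to guarantee.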
Understanding complex concepts
There are two kinds of compounds:
• A multi-word unit (which may be a simple concept, such as stained glass, or a complex concept, such as glass cutting).
• A multi-concept unit (which may be a single word, such as sourdough).

Syntactic and semantic factoring
Syntactic factoring: A term with multiple words is divided into smaller components.
Example: rye bread into rye + bread; Irish emigration into emigration + Irish
Semantic factoring: A term is divided into multiple elementary concepts.
Example: apartment into dwelling + rental + shared building.

Semantic factoring
Most standards/authorities don’t recommend semantic factoring, and there aren’t rules you can use to help with it. But semantic factoring can sometimes help you discover missing concepts in your subject language.
It might be extreme to describe Passover as “holiday + Jewish + commemoration + Exodus,” but doing so might make us consider both religion and commemoration of events as aspects common to many holidays.

Parsing compounds
A compound term consists of a focus (the class of things or events) and a difference, which modifies the class and makes a subclass.
Examples:
• Car tires: Focus is tires, difference is cars.
• Opera singing: Focus is singing, difference is opera.
• Mushroom hunter: Focus is hunter, difference is mushroom.

Action/patient factoring
If the term contains an action (focus) modified by the recipient of the action (difference), factor. But if the term refers to a material (focus) as modified by an action (difference), don’t factor.
Examples:
Hair dyeing: hair + dyeing
Bronze engraving: bronze + engraving
But don’t factor: dyed hair, engraved bronzes

Part/whole factoring
If the focus refers to a part or property, and the difference refers to the whole or the possessor of the part or property, factor. But if the focus is the whole, and the difference is the part or property, don’t factor.
Examples:
Soil acidity: soil + acidity
Library shelves: libraries + shelves
Don’t factor: acid soils, spare tires.

Action/performer factoring
If the term contains an intransitive action (focus) modified by the performer (difference), factor. If the performer (focus) is modified by its performance of an intransitive action (difference), don’t factor.
Examples:
Student meeting: students + meetings
Lemur migration: lemurs + migrations
But don’t factor: migratory birds

Determination of facet structure
Ranganathan started from the top down: describing fundamental categories (PMEST) for all subjects and organizing components into those universal facets.
The Classification Research Group (CRG), as described by Vickery, advocates beginning from the bottom up: reviewing components and assigning preliminary fundamental categories based on the concept’s definition within the classification’s domain, then looking for commonalities in these preliminary choices. Facets are specific to each classification.

Principles for creating facets
Spiteri (1998) synthesized the following facet design principles from Ranganathan and the CRG:
• Differentiation.
• Relevance.
• Ascertainability.
• Permanence.
• Homogeneity.
• Mutual exclusivity.

Differentiation principle
When creating facets that split a group of entities, choose a principle of division that cleanly splits the group into component parts.
For example, dividing people by gender creates two generally unambiguous categories. However, dividing socks according to color can cause problems when considering socks with multiple colors; color does not provide the same level of differentiation for socks as gender does for people.

More facet design principles
Relevance: Choose facets based on the purpose of the classification. A classification of gardening might divide terrain by sun exposure, but a classification of cycling might divide terrain by elevation.
Ascertainability: When possible, choose facets that can be reliably measured.
Permanence: When possible, choose facets that will not change over time.

Final facet design principles
Homogeneity: Each facet (or subfacet) should represent a single principle of division. For example, if we are classifying socks, we should not see colors and patterns in the same array. We would need to separate patterns and colors.
Mutual Exclusivity: The contents of any two facets (or subfacets) should not overlap (that is, they should be mutually exclusive). If we are dividing shoes by heel height and by form, we should not find any mixing of values for either facet (for example, we should not see “high-heeled pumps” in the form facet, but merely “pumps”).

Faceted browsing on the Web
Hearst’s Flamenco is an interface to support browsing of faceted structures on the Web.
The Hearst article that you read describes how users preferred the faceted browsing interface to a search engine when exploring the collection.
(Note that the facets that Hearst used in the Flamenco system are semi-automatically generated and not, perhaps, the best that one might create...)

Your continuing mission
• Begin compiling a list of potential concepts for your classification.
• Define an audience and purpose for your classification, and use this, as well as your subject knowledge, to more clearly define the scope of your classification, its boundaries and its central and peripheral areas.
• Begin defining each concept’s meaning in the context of your classification.

Assignment progress checks
• No class on April 12; no office hours on April 11.
• Extra office hours will be scheduled for April 15, 16, and 19. Sign up for a 15-minute slot (not required but recommended).
• In these meetings, be prepared to tell me your subject, audience, and purpose in a few sentences, and explain how your classified structure represents your theory of the subject.
• Bring assignment drafts to class on April 20 for peer feedback sessions.