Classification and categorization: a difference that makes a difference
Library Trends, Winter 2004, by Elin K. Jacob
ABSTRACT
Examination of the systemic properties and forms of interaction that characterize classification and categorization reveals fundamental syntactic differences between the structure of classification systems and the structure of categorization systems. These distinctions lead to meaningful differences in the contexts within which information can be apprehended and influence the semantic information available to the individual. Structural and semantic differences between classification and categorization are differences that make a difference in the information environment by influencing the functional activities of an information system and by contributing to its constitution as an information environment.
INTRODUCTION
Many different and sometimes conflicting responses can be made to the question "What is information?" Floridi (in press) identifies three broad categories intended to elucidate the predominant approaches to understanding the ambiguous phenomenon called information: information as reality (or ecological information), information for reality (or instructional information), and information about reality (or semantic information). The approach adopted here is that information is "differences that make a difference" (Bateson, 1979, p. 99). It is an emergent property--the result of meaningful differences--inherently semantic and therefore about reality.
Analysis of the syntactic differences that distinguish systems of classification from systems of categorization can contribute to a philosophy of information (PI) because these distinctions portend significant consequences for the processes that contribute to what Floridi (2002) describes as the "dynamics of information": "(i) the constitution and modelling of information environments, including their systemic properties, forms of interaction, internal developments etc.; (ii) information life cycles, i.e., the series of various stages in form and functional activity through which information can pass ... and (iii) computation, both in the Turing-machine sense of algorithmic processing and in the wider sense of information processing" (p. 15, emphasis in original). Examination of the systemic properties and forms of interaction that characterize classification and categorization reveals fundamental differences in their respective organizational structures--differences that influence the functional activities of an information system and contribute to its constitution as an information environment.
The argument elaborated here is that fundamental syntactic distinctions exist between the structure of classification systems and the structure of categorization systems; that these distinctions lead to meaningful differences in the contexts within which information can be apprehended; and that these differences, in turn, influence the semantic information--the information about reality--that is available to the individual.
INFORMATION SYSTEMS
Shera (1960/1965) has observed that retrieval must be the focus of a theory of library and information science (LIS) and thus "the end toward which all our efforts are directed" (p. 136). Unfortunately, retrieval is too often viewed not as one component in an information system but as a self-contained and independent process. This emphasis on the end product--the retrieval of resources--tends to obscure the fact that effective retrieval depends on both the representation and the organization of a collection of information resources.
Soergel (1985) points out that, because information is used for problem-solving, information systems are developed and extended in response to the problems that confront society. Although this problem-solving view of information is not universally accepted, it is useful in understanding the complex set of processes that contribute to the ultimate effectiveness of an information system. Such a system identifies information resources that may be of use in addressing a particular problem; represents the attributes of resources that are relevant to the problem area; organizes these resource representations or the resources themselves for efficient access; and ultimately retrieves a set of resources in response to queries presented to the system by the individual. It would appear, then, that a more productive approach to the problem of retrieval would be to view an information system as a multidimensional whole composed of several interrelated processes, including, at a minimum, collection development, representation, organization, and retrieval.
Retrieval is the final and therefore the most obvious of the processes that contribute to an information system. Because it is the only process in which an individual actively participates, it is frequently the only process to which she gives serious consideration. When the individual is seeking information on a particular topic, her attention is focused on the set of resources retrieved by the information system. If these resources appear to be pertinent to the immediate problem, she may not give a second thought to the appropriateness of the terms used to query the information system. Nonetheless, it is the processes of selection, representation, and organization that provide the foundation without which information retrieval (IR) is less than effective, if not impossible. How resources are represented constrains the organizational structure(s) that can be imposed on a collection of information resources; the organizational structure of the collection dictates the search strategies that can be used for retrieval; and the representations themselves determine the set of resources that will be retrieved by the system.
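The dependency described above, in which representation constrains organization and organization in turn constrains retrieval, can be illustrated with a minimal sketch. Everything in it is a hypothetical assumption introduced for illustration only: the three-item collection, the indexing vocabularies, and the function names represent, organize, and retrieve do not come from the article or from any particular information system.

```python
from collections import defaultdict

# Hypothetical collection: each resource is a (title, text) pair.
COLLECTION = [
    ("Bird Atlas", "maps of bird and waterfowl ranges"),
    ("Pond Life", "ducks, geese and other waterfowl of small ponds"),
    ("Garden Notes", "planting vegetables and herbs"),
]


def represent(resource, vocabulary):
    """Representation: record only the attributes (terms) the indexer chose."""
    title, text = resource
    words = set(text.replace(",", " ").split())
    return title, words & vocabulary


def organize(representations):
    """Organization: build an inverted index from recorded terms to titles."""
    index = defaultdict(set)
    for title, terms in representations:
        for term in terms:
            index[term].add(title)
    return index


def retrieve(index, query_term):
    """Retrieval: a query can only match terms that were represented."""
    return sorted(index.get(query_term, set()))


def build_system(vocabulary):
    """Run representation and organization over the whole collection."""
    return organize(represent(r, vocabulary) for r in COLLECTION)


# Two systems over the same resources, differing only in representation.
broad = build_system({"waterfowl", "ducks", "vegetables"})
narrow = build_system({"waterfowl"})

print(retrieve(broad, "ducks"))   # the term was represented, so it can match
print(retrieve(narrow, "ducks"))  # the term was never represented: no match
```

In this sketch the two systems hold identical resources, yet a query on "ducks" succeeds in one and fails in the other solely because of the indexing vocabulary chosen at the representation stage. The individual who sees only the retrieved set has no direct view of that choice.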