The Organization of Information

Taylor and Joudrey (2012) concluded their book, The Organization of Information, by stating that there is much work to be done in both information organization and the development of retrieval systems. With the diffusion of information in today’s world, the effort to analyze, arrange, classify, and make readily available millions of resources is a task that requires sophisticated programming of bibliographic networks, as well as endless hours of critical and analytical work from trained catalogers or indexers. Taylor and Joudrey showed that, despite advances in technology, the human mind is still needed to interpret a myriad of information resources by providing subject analysis, controlled vocabulary, and classification schemes to the descriptive practice of knowledge management.

We have now witnessed almost two centuries of bibliographic control, with many of the foundational principles of cataloging and description still in use today. For example, collocation – the intellectual and proximal ordering of bibliographic materials – was an invention of nineteenth-century founders such as Anthony Panizzi, Charles Ammi Cutter, and Melvil Dewey. These individuals saw the importance of creating subject headings and classification rules, which libraries adopted shortly thereafter in the form of dictionary catalogues, indexes, thesauri, and subject lists. The goal of these systems was to classify the entirety of knowledge. This all started with the Dewey Decimal Classification system, which had ten main discipline classes with 10,000 subdivisions in which books could be classified. This system was expanded upon by Cutter in his Expansive Classification system, which included letters to represent subject classes. Cutter’s system ultimately found its way into the Library of Congress Classification system, rather to the chagrin of Dewey.

The development of computerized systems to aid in the structuring and retrieval of knowledge occurred in the late 1960s. Machine-Readable Cataloging (MARC) was introduced in 1968. MARC formatting allowed computers to read and encode bibliographic records by utilizing a numeric coding system that corresponded to the areas of description in a written catalog record. These codes contained “variable fields” for areas of variable length (such as a book title or author name); “control fields” for numeric data points (call numbers, ISBNs, LCCNs, etc.); and “fixed fields” for bibliographic data of a predetermined length and format, such as a three-letter language abbreviation.
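To make the idea of numeric field codes concrete, here is a minimal sketch in Python. It is not a real MARC parser, only a simplified dictionary that borrows a few MARC 21 tags (008 for fixed-length data, 020 for ISBN, 100 for the author, 245 for the title). The record contents, the ISBN value, and the helper names `language_code` and `title` are my own assumptions for illustration.

```python
# A simplified, MARC-like record held in a plain dictionary.
# The tags follow MARC 21 conventions; the contents are illustrative only.

# Field 008 is a fixed-length field of 40 characters. Positions 35-37
# carry the three-letter language code; everything else is left blank here.
fixed_008 = " " * 35 + "eng" + " " * 2

record = {
    "008": fixed_008,                   # fixed-length data elements
    "020": {"a": "9780000000002"},      # ISBN (placeholder, not a real book)
    "100": {"a": "Melville, Herman"},   # main entry, personal name
    "245": {"a": "Moby Dick"},          # title statement
}


def language_code(rec):
    """Read the language code from positions 35-37 of field 008."""
    return rec["008"][35:38]


def title(rec):
    """Read subfield a of the variable-length title field (245)."""
    return rec["245"]["a"]


print(language_code(record))  # eng
print(title(record))          # Moby Dick
```

The point of the sketch is the contrast: the title is found by asking for a tag and subfield, whatever its length, while the language is found by counting to a fixed position inside a field of known length.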

Bibliographic networks were built to accommodate the MARC format. The first major network to emerge was the Ohio College Library Center, which morphed into OCLC (Online Computer Library Center), still in use today. OCLC allows catalogers to import bibliographic records from a shared network of libraries and information resource centers. This importing is referred to as copy cataloging. A cataloger will add an already-cataloged record to their system, engaging in authority work by ensuring their record was copied from a reliable source like the Library of Congress authority files. Almost all public and academic libraries use OCLC, and this system has streamlined the work of cataloging in technical service departments. But it is important to note that this technology is almost fifty years old now. There are nascent trends in the world of information science that go beyond the reach of time-honored bibliographic networks.

The classical arrangement of knowledge mentioned above was based on a narrow set of information resources, primarily books. But not all resources that users need to be able to search and retrieve are biblio-centric. For example, an information seeker may need to find an artifact. Knowledge artifacts are as varied as the name implies. They can include sound recordings, historical objects, websites, performance art pieces, even concepts. This last example of “concepts” perhaps best illustrates the point. Indeed, a knowledge artifact can be purely conceptual or abstract in nature. Yet, as an artifact, it still needs to be described and collocated for information retrieval. This is done through a “technical reading” of the artifact: a process of critical analysis whereby the cataloger or indexer attempts to define the aboutness of a work.

The process of defining aboutness, referred to as subject analysis by Taylor and Joudrey, is at the heart of information organization. Subject analysis is arguably the most important part of cataloging work, and it is certainly the trickiest. In order to determine the aboutness of a work, the cataloger must be able to accurately represent a knowledge artifact. But the artifact in question may not contain any lexical content. In other words, it may be a nontextual information resource, and thus completely intangible intellectually without the creator’s original insight. Yet, as a cultural information resource, the knowledge artifact still has meaning, which requires it to be abstracted and indexed. How is this to be done? Well, there is still debate among LIS professionals regarding the best practices for subject analysis. The common practice is to isolate subject keywords in an aboutness statement. However, aboutness statements impose the cataloger’s perceptions onto a work, classifying the artifact in a hierarchical manner which may not be culturally precise. Herein lies the danger of subject analysis.

This creates a dilemma for classification of knowledge artifacts. For instance, in order to make an information resource readily retrievable, controlled vocabulary is required. A controlled vocabulary is a set of specific terms used for describing all “like” resources. But, as we have seen, describing knowledge artifacts can be difficult. Indeed, sometimes during subject analysis, the cataloger can only describe the of-ness of an artifact (Taylor & Joudrey, 309). As a general rule, controlled vocabulary makes it easier to find resources in an information system. But if an original cataloger incorrectly represents a knowledge artifact, any surrogate record for that artifact will invariably be misrepresented. Surrogate records can number into the hundreds of thousands. So if the goal of bibliographic networks is to create standardized subject headings in an interoperable system, then hundreds of thousands of inaccurate records could be created. Conversely, if controlled vocabulary is not used in the representation of a knowledge artifact, then that artifact will be made all but impossible to retrieve in an information system. This is the dilemma of subject analysis.
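Both halves of this dilemma can be shown in a few lines of Python. This is only a sketch under my own assumptions: the `AUTHORIZED` mapping, the three records, and the `normalize` and `search` helpers are hypothetical and do not come from any real authority file or catalog system.

```python
# Hypothetical mapping from variant terms to one authorized heading.
AUTHORIZED = {
    "whaling": "Whaling",
    "whale hunting": "Whaling",
    "whale fishery": "Whaling",
}


def normalize(term):
    """Return the authorized heading for a term, or None if it is uncontrolled."""
    return AUTHORIZED.get(term.strip().lower())


# Surrogate records: (title, subject term as the cataloger entered it).
records = [
    ("Record A", "Whale hunting"),
    ("Record B", "whaling"),
    ("Record C", "cetacean harvest"),  # a term outside the vocabulary
]


def search(heading):
    """Return the titles whose subject term resolves to the given heading."""
    return [title for title, subject in records if normalize(subject) == heading]


print(search("Whaling"))  # ['Record A', 'Record B']
```

Records A and B are collocated under one heading even though their catalogers used different words. Record C, described with an uncontrolled term, never surfaces. And if the mapping itself were wrong, every record that relies on it would inherit the mistake.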

Another argument against classification schemes of the past is that they contain restrictive rules which hinder knowledge discovery. Knowledge discovery is the ability to make connections between wide-ranging subjects that otherwise would not be related in a traditional classification system. For example, we have entered an era where almost all data can be linked together in novel and entertaining ways. This is the basis for the Semantic Web. Internet users can link and categorize anything they want by creating tags or folksonomies that showcase niche interests and new subject matter. By analyzing the content of the Semantic Web, information scientists are working to harness these folksonomies to improve search engine functionality and retrieval tools. It is an exciting time, but it is also a daunting time. Intellectual mastery of the Semantic Web is necessary to preserve entrenched disciplines that contain thousands of years of knowledge.

In the future, newer forms of information systems will be tried and tested. These will include natural language processors and artificial intelligence systems. But bibliographic data will still be entered by humans through the process of cataloging and resource description. This task may become easier for catalogers and indexers as information systems improve their ability to offer suggestions or provide prepopulated subject headings. But just the same, the work will continue. Taylor and Joudrey illustrated that knowledge management is not perfect. There are flaws and implicit biases in subject analysis. But where data integrity for abstract and philosophical content is concerned, human intervention is still required. Indeed, knowledge is still the province of human beings, not machines.

Intellectual Freedom

In Chapter 9 of Foundations of Library and Information Science, Richard Rubin explores the concept of Intellectual Freedom (IF). IF has its roots in nineteenth-century political theory as well as the First Amendment of the U.S. Constitution. Indeed, the right to free speech invariably means having the freedom to think, believe, and form new ideas without outside interference from the government or any institution under the law. Therefore, it would seem that IF in libraries is a foregone conclusion. But Rubin admits that this is “one of the most difficult aspects of library work” (491).

The four factors discussed by Rubin that tend to restrict access to IF are (1) personal values, (2) community values, (3) the desire to protect children from harm, and (4) the need to ensure survival of the library. Conversely, three factors that promote IF in libraries are (1) the need to educate future generations, (2) adhering to professional standards, and (3) protecting democratic rights. These last three are easy enough to abide by if an LIS professional assumes an impartial attitude toward various ethical situations that may arise within the library. But this is not easy to do if one is concerned with the physical and emotional safety of others, especially children.

Rubin noted that a higher level of formal education among library staff correlated with less censorship. Censorship is obviously more of a problem in public and elementary school libraries, where many patrons are minors. The trickiest issue for a librarian in this instance is setting aside their personal values and allowing minors to check out library materials that have morally questionable subject matter (violence, language, sexual content). Religious background is usually a big factor in what many deem appropriate.

Even in the adult world, however, there can be instances of hesitancy to allow some materials to circulate. For example, we just passed the 22nd anniversary of the Oklahoma City bombing. I read about a connection between William Powell’s The Anarchist Cookbook and Timothy McVeigh, who reportedly used it to perpetrate the attack. The cookbook was written by Powell as an experimental foray into chemistry, as he was a chemistry teacher, but it had deadly implications. With any material that has practical or ideological advice that can be used for evil purposes, it is immensely difficult to reconcile the right to read against possible evil outcomes. This is why the USA PATRIOT Act, passed after 9/11, made it possible for the government to have access to patron library records.

These are tired examples. Still, I believe in the principle of intellectual freedom. Society is made better when it has unrestricted access to ideas, for the sake of recreation, enlightenment, and progress.

Open Access Publishing

In Authors and Open Access Publishing, Swan and Brown (2004) surveyed a large sample of journal authors and attempted to determine these authors’ preferences and rationales in choosing either Open Access journals (OA) or traditional subscription-based journals (NOA) for publication. Swan and Brown sent out invitations to 3,000 authors who had published in OA journals, and 5,000 authors who had only published in NOA. Journal subjects were closely matched between each group to ensure common publishing-area interests among respondents.

I was surprised by the small percentage of respondents compared to how many invitations were actually sent out. Only 154 responses were received from the first group and 160 from the second (response rates of about 5.1% and 3.2%, respectively). I imagine Swan and Brown had a predetermined number of authors they wanted to use in their study. But the invitations greatly exceeded the number of participants. It seemed like the majority of authors did not wish to take part in a study of either this magnitude or perhaps this level of contention (as the open access concept started out as a bit of a moral tickler). That was my initial reaction: that the majority of invitees probably declined because they wished to keep their publishing preferences anonymous.

Regarding the study itself, Swan and Brown found that OA authors valued their preferred mode of publication because of the principle of free access, faster publication times, larger perceived readership, and higher perceived citation counts than NOA. For NOA authors, reasons included unfamiliarity with OA journals, fear of career-changing consequences, lack of grant money to be awarded through OA, and the belief that OA journals were inferior in terms of readership and prestige.

But justifications for using one platform over another were surprisingly similar between the two groups. Respondents on both sides of the aisle justified their choices using essentially the same answers: “Ours has more citations,” “Ours has a larger readership,” “Ours has distinguished names,” etc. This, I believe, was due to a proclivity for defending what was familiar to these authors. It definitely demonstrates bias on the part of the authors, but I think it is just as likely that this bias came from institutional resistance to OA. Indeed, I imagine that academics choose to publish in a specific journal because it is sponsored by or identified with given institutions of higher learning. Institutional resistance to OA, therefore, is a likely factor.

This study is the second one done by Swan and Brown on the subject of open access. Their previous study was conducted in 2002, just two years before this one, and the majority of respondents in the earlier study were not even familiar with the concept of open access. This seems to indicate that the publishing industry had changed dramatically in those two short years. We are seeing a continuation of this change with the greater and more diverse number of options becoming available in the open access publishing market.

Of special note is the fact that both study groups thought that their preferred publication method carried with it more prestige. It seems, then, that those authors publishing in OA journals have done the legwork of investigating their journals and have come to learn that there are open access publishers who are just as respectable and endowed monetarily as some traditional journals. Therefore, the fear that OA is inferior seems to be rooted in simple ignorance of the options available. More than half of the NOA authors indicated that they could not find an OA journal to publish in. I think that a lot of them just went with what they knew or had used since the start of their careers.

This study was lacking in one major area, though. The age groups of the respondents were not given. It would have been interesting to see if the OA authors were younger than the NOA authors. I have a sneaking suspicion that there is a Digital Native/Digital Immigrant dynamic at play here.

Transformations – Digital Libraries

In Chapter 4 of Foundations of Library and Information Science, Richard Rubin explores the history of technology in libraries from microform to early bibliographic retrieval systems on through the development of the Internet, Web 2.0, and finally the emergence of digital libraries. This last Rubin neglects to really define. We are not given a concise definition of digital libraries. Instead, we are treated to explanations of the characteristics of a digital library, mostly drawn from the work of Karen Calhoun, author of Exploring Digital Libraries.

Calhoun defines digital libraries as basically the extension of physical library services into digital space. In other words, digital libraries are meant to be freely accessible like traditional libraries, as well as structured similarly in terms of bibliographic storage and retrieval. Furthermore, digital libraries – according to Calhoun and Rubin – should be interoperable, focused on community engagement, aware of intellectual property issues, and sustainable. Drilling down into these issues a bit more…

  • Interoperable. Interoperability refers to the ability to search the digital library’s collection on a variety of technological devices, as well as being able to integrate with other library systems.
  • Community engagement. This simply refers to the need to base the digital library around a specific user group, ensuring that the digital library’s collection is useful to its users, as well as intuitive and user-friendly. The digital library cannot be mystifying, especially since reference help via a chat function may not be available during all operational hours, or at all.
  • Intellectual property rights. Out of the four key elements identified by Calhoun, intellectual property issues are a bugbear for digital libraries. Indeed, the digital environment creates new challenges to the areas of licensing and use rights. Out of all the issues confronting digital libraries, this is liable to be the trickiest after the digital library is online and functional.
  • Sustainability. This refers to the ability to manage the digital library in much the same ways as an institutional library. Management roles, budgeting, managing subscriptions, curating content, database maintenance (including hardware and software development, and webmastering), and proper oversight in terms of rules and regulations for users are all things that a digital library “staff” will have to address.

Rubin goes from early online collections of images, or images of artifacts, to the born-digital resources of today. This vague idea plays out across the field of emerging LIS. I am not quite sure why Rubin talks about early online collections of photos as a precursor to his discussion of digital libraries. I think we can easily distinguish between mere collections of something, like photos for example, and a “library of photos.” Rubin himself said that there were no standards in these early collections for searching and retrieving. There was a lot of entropy involved instead. A library collection, on the other hand, is a collection that is ordered, described, and made easily accessible when searched.

I think Rubin was closer to hitting the mark for a concise definition of digital libraries in his previous chapter, Chapter 3, on libraries as institutions. At the end of that chapter, Rubin talked about embedded librarians. Indeed, I am wondering if a digital library can even be considered a “library” unless it has an embedded library staff available during operational hours. I know we have been seeing a trend toward self-sufficiency when it comes to users and library services, but if there is not an embedded librarian present in a digital library to assist users, we are looking at more of a third-party service than an institutional model. At that point, even referring to a digital library as a library is questionable in my opinion.

It is difficult to determine what Rubin thinks of these transformations, and in particular, of digital libraries. He writes in such a straightforward style that the facts are presented to us with little opinion or bias. A good thing. However, this chapter ends with more questions than solutions, and the tone feels quite cautionary. Indeed, it seems that the concern with digital libraries revolves around the fear of data volatility and the ever-changing nature of digital technology. Are digital libraries a viable model for the long-term preservation of a collection? Will they last hundreds (maybe thousands) of years like their traditional counterparts? Or will digital libraries not even make it halfway to the 22nd century? Digital obsolescence remains a frightful possibility, even after all the advancements in storage and computer back-up technology.

The Organization of Knowledge: Then and Now

There are two traditional classification schemes for the organization of information. These we know fairly well. They are the Dewey Decimal System and the Library of Congress Classification System (LCCS). They are still used where the physical organization of library materials is concerned. But these systems and the logic they are based on have become problematized in our Internet age.

It is the growth of the Internet and the ever-increasing diversity of electronic resources that have spurred the need for change in organizing knowledge resources. Traditional catalogs built around the LCCS relied on Library of Congress Subject Headings (LCSH) that were confined to subject disciplines. LCSH were relatively static, and they greatly restricted the number of access points that subject searches would yield. Thus, LCSH do not facilitate the greater resource and knowledge discovery that our twenty-first century explosion of information demands.

Information science professionals have turned toward basing the creation of bibliographic records on an entity-relationship model. By grouping all like-resources together in terms of a derivative concept or title, the process of resource discovery can be greatly enhanced.

New organizational methods and standards put in place by the Functional Requirements for Bibliographic Records (FRBR) and Resource Description and Access (RDA) are meant to streamline the process of resource discovery and make the bibliographic universe more easily navigable. The new languages designed by these frameworks embody a larger network of resources, not just traditional analog materials like books. By cataloging the “work, expression, manifestation, or item,” bibliographic subjects take on a whole new meaning and gain interrelationships with other format-specific materials.

FRBR and RDA are quite ingenious. For example, databases that have these four types of entities cataloged can yield search results that cross institutional boundaries. Indeed, if a user is looking for something related to, let us say, “Moby Dick,” they will encounter an entire family of works containing that title or subject when further narrowing results. This could lead the user not only to the original print work from Herman Melville, but also to other related items in the library like audiobooks, motion pictures, etc. Or perhaps other resources will be found in digital libraries, consortia, museums, or archives, such as first editions, manuscripts, artworks, plays, and other ephemera relating to Moby Dick, the historical enterprise of “whaling,” illegal whaling activity in contemporary society, or the biology of these ocean majesties.
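The four FRBR entities nest inside one another, which a small Python sketch can show. This is a rough illustration under my own assumptions, not how any catalog system actually stores FRBR data: the class fields, the shelf locations, and the `carriers` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    location: str  # one physical or digital copy


@dataclass
class Manifestation:
    carrier: str  # the published form, such as a print book
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    form: str  # how the work is realized, such as text or spoken word
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    creator: str
    expressions: List[Expression] = field(default_factory=list)


moby = Work(
    title="Moby Dick",
    creator="Herman Melville",
    expressions=[
        Expression("text", [Manifestation("print book", [Item("Main stacks")])]),
        Expression("spoken word", [Manifestation("audiobook", [Item("Media desk")])]),
    ],
)


def carriers(work):
    """List every published form that descends from a single work."""
    return [m.carrier for e in work.expressions for m in e.manifestations]


print(carriers(moby))  # ['print book', 'audiobook']
```

Because the print book and the audiobook hang off the same work, a search that lands on the work can offer both, which is the "family of works" effect described above.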

Electronic retrieval systems have made it possible to retrieve bibliographic records from anywhere around the globe. As today’s information resources are shared, and indeed, born on the World Wide Web, the work of today’s catalogers and subject analysts is of a global scale. This is why it is so important to create and maintain systems like Resource Description Framework (RDF) and XML applications that can aggregate and logically order web resources. This is also why linked data and metadata are so important. These technologies illustrate how far the library profession has come in cataloging and making available bibliographic resources.
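For readers who have not seen RDF, its core idea is that every statement about a resource is a three-part "triple" of subject, predicate, and object. The Python sketch below imitates that with plain tuples rather than a real RDF library. The example.org addresses are placeholders I made up, and the `objects` and `subjects` helpers are hypothetical; the `dcterms:` predicates are borrowed from the Dublin Core vocabulary.

```python
# Each statement is a (subject, predicate, object) triple.
# The example.org URIs are placeholders, not real catalog addresses.
BOOK = "http://example.org/work/moby-dick"
FILM = "http://example.org/work/moby-dick-film"

triples = [
    (BOOK, "dcterms:title", "Moby Dick"),
    (BOOK, "dcterms:creator", "Herman Melville"),
    (FILM, "dcterms:title", "Moby Dick (film adaptation)"),
    (FILM, "dcterms:source", BOOK),  # the film is derived from the book
]


def objects(subject, predicate):
    """Return everything stated about a subject under one predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj):
    """Return every resource that points at obj through one predicate."""
    return [s for s, p, o in triples if p == predicate and o == obj]


print(objects(BOOK, "dcterms:creator"))  # ['Herman Melville']
print(subjects("dcterms:source", BOOK))
```

The last line is where the "linked" in linked data shows up: starting from the book, we can find the film simply because one triple points from one to the other, with no shared catalog required.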