The Organization of Information

Taylor and Joudrey (2012) concluded their book, The Organization of Information, by stating that there is much work to be done in both information organization and the development of retrieval systems. With the diffusion of information in today’s world, the effort to analyze, arrange, classify, and make readily available millions of resources is a task that requires sophisticated programming of bibliographic networks, as well as endless hours of critical and analytical work from trained catalogers or indexers. Taylor and Joudrey showed that, despite advances in technology, the human mind is still needed to interpret a myriad of information resources by providing subject analysis, controlled vocabulary, and classification schemes to the descriptive practice of knowledge management.

We have now witnessed almost two centuries of bibliographic control, with many of the foundational principles of cataloging and description still in use today. For example, collocation – the intellectual and proximal ordering of bibliographic materials – was an invention of nineteenth-century founders such as Anthony Panizzi, Charles Ammi Cutter, and Melvil Dewey. These individuals saw the importance of creating subject headings and classification rules, which libraries adopted shortly thereafter in the form of dictionary catalogues, indexes, thesauri, and subject lists. The goal of these systems was to classify the entirety of knowledge. This all started with the Dewey Decimal Classification system, which had ten main discipline classes with 10,000 subdivisions in which books could be classified. This system was expanded by Cutter in his Expansive Classification system, which included letters to represent subject classes. Cutter’s system ultimately found its way into the Library of Congress Classification system, rather to the chagrin of Dewey.

The development of computerized systems to aid in the structuring and retrieval of knowledge began in the late 1960s. Machine-Readable Cataloging (MARC) was introduced in 1968. MARC formatting allowed computers to read and encode bibliographic records by utilizing a numeric coding system that corresponded to the areas of description in a written catalog record. These codes contained “variable fields” for areas of variable length (such as a book title or author name); “control fields” for numeric data points (call numbers, ISBNs, LCCNs, etc.); and “fixed fields” for bibliographic data of a predetermined length and format, such as a three-letter language abbreviation.
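To make the idea of tagged fields more concrete, here is a minimal Python sketch of a MARC-like record. It is only an illustration, not a real MARC parser: the class and function names (`MarcField`, `MarcRecord`, `language_code`) are hypothetical, and the sample 008 value is a blank placeholder. It assumes the MARC 21 conventions that tag 100 holds the main author name, tag 245 holds the title, and the fixed-length 008 field is 40 characters long with the language code in positions 35-37.

```python
from dataclasses import dataclass, field


@dataclass
class MarcField:
    tag: str    # three-digit numeric code identifying the area of description
    value: str  # the data recorded under that tag


@dataclass
class MarcRecord:
    fields: list = field(default_factory=list)

    def add(self, tag: str, value: str) -> None:
        self.fields.append(MarcField(tag, value))

    def get(self, tag: str) -> list:
        """Return every value stored under the given tag."""
        return [f.value for f in self.fields if f.tag == tag]


def language_code(record: MarcRecord) -> str:
    """Read the three-letter language code from the fixed-length 008 field.

    Assumes the MARC 21 layout: 40 characters, language in positions 35-37.
    """
    fixed = record.get("008")[0]
    return fixed[35:38]


record = MarcRecord()
record.add("100", "Taylor, Arlene G.")                # variable length: author name
record.add("245", "The organization of information")  # variable length: title
# Placeholder 008: blanks everywhere except the language code (assumption).
record.add("008", " " * 35 + "eng" + " " * 2)

print(record.get("245"))
print(language_code(record))
```

The point of the sketch is the contrast described above: a title can be any length, whereas the language code is always found at the same position in a field of predetermined size.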

Bibliographic networks were built to accommodate the MARC format. The first major network to emerge was the Ohio College Library Center, which morphed into OCLC (Online Computer Library Center), still in use today. OCLC allows catalogers to import bibliographic records from a shared network of libraries and information resource centers. This importing is referred to as copy cataloging. A cataloger will add an already-cataloged record to their system, engaging in authority work by ensuring the record was copied from a reliable source like the Library of Congress authority files. Almost all public and academic libraries use OCLC, and this system has streamlined the work of cataloging in technical service departments. But it is important to note that this technology is almost fifty years old now. There are nascent trends in the world of information science that go beyond the reach of time-honored bibliographic networks.

The classical arrangement of knowledge mentioned above was based on a narrow set of information resources, primarily books. But not all resources that users need to be able to search and retrieve are biblio-centric. For example, an information seeker may need to find an artifact. Knowledge artifacts are as varied as the name implies. They can include sound recordings, historical objects, websites, performance art pieces, even concepts. This last example of “concepts” perhaps best illustrates the point. Indeed, a knowledge artifact can be purely conceptual or abstract in nature. Yet, as an artifact, it still needs to be described and collocated for information retrieval. This is done through a “technical reading” of the artifact, a process of critical analysis whereby the cataloger or indexer attempts to define the aboutness of a work.

The process of defining aboutness, referred to as subject analysis by Taylor and Joudrey, is at the heart of information organization. Subject analysis is arguably the most important part of cataloging work, and it is certainly the trickiest. In order to determine the aboutness of a work, the cataloger must be able to accurately represent a knowledge artifact. But the artifact in question might not contain any lexical content. In other words, it may be a nontextual information resource, and thus intellectually inaccessible without the creator’s original insight. Yet, as a cultural information resource, the knowledge artifact still has meaning, which requires it to be abstracted and indexed. How is this to be done? Well, there is still debate among LIS professionals regarding the best practices for subject analysis. The common practice is to isolate subject keywords in an aboutness statement. However, aboutness statements impose the cataloger’s perceptions onto a work, classifying the artifact in a hierarchical manner which may not be culturally precise. Herein lies the danger of subject analysis.

This creates a dilemma for the classification of knowledge artifacts. For instance, in order to make an information resource readily retrievable, controlled vocabulary is required. A controlled vocabulary consists of specific terms which are used for describing all “like” resources. But, as we have seen, describing knowledge artifacts can be difficult. Indeed, sometimes during subject analysis, the cataloger can only describe the of-ness of an artifact (Taylor & Joudrey, 309). As a general rule, controlled vocabulary makes it easier to find resources in an information system. But if an original cataloger incorrectly represents a knowledge artifact, any surrogate record for that artifact will invariably misrepresent it as well. Surrogate records can number into the hundreds of thousands. So if the goal of bibliographic networks is to create standardized subject headings in an interoperable system, then hundreds of thousands of inaccurate records could be created. Conversely, if controlled vocabulary is not used in the representation of a knowledge artifact, then that artifact will be made all but impossible to retrieve in an information system. This is the dilemma of subject analysis.
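Both halves of this dilemma can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any real bibliographic network works: the authority file, the heading, the record layout, and the function names (`authorized_heading`, `copy_record`, `search`) are all hypothetical and chosen only for illustration.

```python
# Hypothetical authority file: variant terms mapped to one authorized heading.
AUTHORITY_FILE = {
    "movies": "Motion pictures",
    "films": "Motion pictures",
    "cinema": "Motion pictures",
}


def authorized_heading(term):
    """Return the authorized heading for a search term, or None if uncontrolled."""
    return AUTHORITY_FILE.get(term.strip().lower())


def copy_record(original, copies):
    """Simulate copy cataloging: every surrogate inherits the original's heading."""
    return [dict(original) for _ in range(copies)]


def search(catalog, term):
    """Retrieve records whose subject matches the authorized form of the term."""
    heading = authorized_heading(term)
    if heading is None:
        return []  # uncontrolled term: nothing can be collocated
    return [r for r in catalog if r["subject"] == heading]


# One original record, copied by three other libraries (illustrative data).
original = {"title": "Untitled artifact", "subject": "Motion pictures"}
catalog = copy_record(original, 3)

print(len(search(catalog, "films")))    # variant term still finds every copy
print(search(catalog, "moving image"))  # term outside the vocabulary finds nothing
```

If the heading on `original` were wrong, every copy would carry the same error, which is the propagation problem described above. The empty result for an uncontrolled term shows the opposite risk.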

Another argument against classification schemes of the past is that they contain restrictive rules which hinder knowledge discovery. Knowledge discovery is the ability to make connections between wide-ranging subjects that otherwise would not be related in a traditional classification system. For example, we have entered an era where almost all data can be linked together in novel and entertaining ways. This is the basis for the Semantic Web. Internet users can link and categorize anything they want by creating tags or folksonomies that showcase niche interests and new subject matter. By analyzing the content of the Semantic Web, information scientists are working to harness these folksonomies to improve search engine functionality and retrieval tools. It is an exciting time, but it is also a daunting time. Intellectual mastery of the Semantic Web is necessary to preserve entrenched disciplines that contain thousands of years of knowledge.

In the future, newer forms of information systems will be tried and tested. These will include natural language processors and artificial intelligence systems. But bibliographic data will still be entered by humans through the process of cataloging and resource description. This task may become easier for catalogers and indexers as information systems improve in their ability to offer suggestions or provide prepopulated subject headings. But just the same, the work will continue. Taylor and Joudrey illustrated that knowledge management is not perfect. There are flaws and implicit biases in subject analysis. But where data integrity for abstract and philosophical content is concerned, human intervention is still required. Indeed, knowledge is still the province of human beings, not machines.

Net Neutrality

In Chapter 8 of Foundations of Library and Information Science, Richard Rubin discusses information policy issues. One of the more pressing issues discussed is that of Internet (Network) Neutrality. Net neutrality is basically the extension of the Intellectual Freedom argument to the realm of the Internet. It is the open design principle of the Internet, and it encourages freedom of speech, as well as unhindered access to information and entertainment. Technically speaking, net neutrality means that Internet Service Providers (ISPs) cannot discriminate among sources of Internet traffic or control its speed. In other words, all Internet companies or domains across the Web should be able to offer unhindered access to their web content. We are not talking explicitly about blocking sites or filtering, though these are extreme possibilities. We are discussing the unjust regulation of the Internet in the form of slowing down or accelerating data packets to websites.

Network neutrality is an important issue for librarians because of the many digital services the library provides. In addition to the services which Rubin mentioned at the beginning of this chapter, the library offers free Internet to its patrons. In fact, free access to the Internet through a library network has become the most popular feature of the contemporary library. If ISPs controlled access to the Internet, then the free flow of ideas would be inhibited, if not outright restricted, for all citizens. This would have a negative impact on public discourse and creative expression.

Because this issue so obviously impacts libraries, the American Library Association issued a statement on network neutrality in 2006 called the Resolution Affirming Network Neutrality. This was an attempt to safeguard the rights of library users in searching for information online, ensuring that they get free and equitable access. Importantly, the statement also called for protection from information monopolized by commercial interests.

Commercialization is a scary prospect. A noncommercial network is crucial to upholding the democratic values of the Web. But the commercial or “tiered access” model puts small companies and public institutions, such as libraries, at risk of having to deal with reduced Internet speeds or worse.

Tim Wu, Professor of Law at Columbia University, is credited with coining the term net neutrality. Wu believes many popular content providers and Web applications would not exist without net neutrality. (Can you imagine a world without Netflix? I can’t.) The Obama Administration did well to safeguard democratic access to the Web, but again we are seeing businesses trying to merge and monopolize informational content. Wu talked about the possibility of a large-scale media collapse, similar to the financial crisis of 2008. In this scenario, Internet companies that go bust will disappear, and big corporate entities like AT&T or Google will approve content for everybody. The worst-case scenario is that the Internet might turn into something akin to a vacation package, in which consumers pay an exorbitant fee to access information. These are scary possibilities which should compel individuals to campaign for network neutrality.

The notorious case of self-censorship in the Fiske Report

Self-censorship in libraries occurred in the 1950s due to the fear of being “blacklisted,” an outcome produced by the McCarthy era and the House Un-American Activities Committee in their maniacal efforts to root out Soviet conspiratorial activity. On the heels of this shameful period in American history, libraries transitioned from a period of careful, “patriotic” book selection to the more enlightened practice of collecting materials on diverse and even controversial subject matters.

The “Fiske Report,” conducted by Marjorie Fiske between 1956 and 1958, is a 1500-page study which focused on book selection and censorship practices in California libraries. The damning conclusion of that report was that librarians censored themselves, often shamelessly and habitually. Curiously, however, the Intellectual Freedom Committee of the California Library Association (CLA IFC) had already secured victory over McCarthyism, opposing blacklisting and loyalty oaths. Yet, self-censorship was still a reality in library selection processes. For the CLA IFC, the question became: Why is censorship continuing unabated after the pressure of McCarthyism has subsided? Indeed, after these late victories, the CLA IFC attempted to unravel the mystery. There was a search on for a new and unmasked “enemy,” as it were, of intellectual freedom who threatened the newly minted freedom-model of California libraries (if not libraries across the nation). This was the basis of the Fiske Report.

In outlining the goals of the report, intellectual freedom was established as a “sacred” principle that librarians were exhorted to uphold going forward in defiance of what happened during the McCarthy period. Why, then, did this professional call to action not permeate the institutional practice of librarianship?

Until recently, the research data from the Fiske Report went unquestioned. But Latham (2014) points out that there were many problems in Fiske’s original research strategy, and in Fiske’s assumptions about the leverage librarians had to effect real change. Latham has reinterpreted this data using a feminist approach. This makes perfect sense considering the nature of the research data. For instance, the entire report was predicated on female service-oriented librarians in the 1950s, when women were considered “timid” and “mousy.” Indeed, the notion of “women’s work” was still prevalent, a concept that goes back to the Cult of Domesticity and beyond.

The original report consisted of interview material with California public and school librarians. The gender ratio of the respondents was very unbalanced, with 87% of those interviewed being female librarians. Curiously, interview respondents occupying higher, more “elite” positions, like school administrators, were predominantly male: 47 out of 48 individuals, in fact (Latham 58). Therefore, what we have here in the Fiske Report is not just a random gender ratio imbalance, but – given the social context of the day – a deeply gendered and sexist politics. This structural feature of 1950s librarianship went overlooked, and this is what Latham addresses, informed by the evidence of earlier studies from Serebnick (1979) and Stivers (2002).

Gender norms of the 1950s suggested that men had more authority than women in matters of social importance (religion, morality, politics, etc.). This is reflected in the statement made by Max Lerner, speaker at the UC Berkeley School of Librarianship symposium cited in the article. Lerner said, “Having only petticoats among teachers and, perhaps, among librarians, too is not entirely healthy…” (66). This unfair and glaringly sexist statement reflects the consciousness of the day, with which the symposium was rife. Indeed, the pre-eminent sociologist Talcott Parsons concluded from the Fiske Report that self-censorship was still an issue in libraries because female librarians could not handle the intellectual rigors of reestablishing the authority of the library’s intellectual freedom.

If librarian autonomy and the role of the board or school administrators were compared in the report, why did the reality of the situation escape Fiske, a woman herself? Admittedly, she was a bit of a high-brow. But her attitudes toward women should have been gentler than those of Lerner or Parsons. Moreover, bias should have been tempered by her research support, as Katherine G. Thayer joined the research team to provide a perspective on librarianship. Thayer was the head of the library school at UC Berkeley, and she surely would have had a more intimate understanding of the field at the time. But the sad reality is that the librarians who participated in the study derived little support from the male administrative hierarchy when it came to figuring out the best practices for reversing restrictive collection policies.

Finally, as was already mentioned, Fiske’s research was deeply flawed. Latham writes, “None of [Fiske’s] interviews were taped, and notes were handwritten. When interviewees objected to handwritten notes, the interviewer used memory to reconstruct the data after leaving the interview” (64). This is a big red flag. One does not simply – and certainly does not ethically – “fill in the blanks” when doing ethnographic research. When in doubt, clarification from the respondents should have been sought and obtained, with careful attention to the re-recording of participant perspectives. Therefore, the report suffers from short-sightedness both in the integrity of its data and in its understanding of the cultural milieu of a male-dominated society.

Open Access Publishing

In Authors and Open Access Publishing, Swan and Brown (2004) surveyed a large sample of journal authors and attempted to determine these authors’ preferences and rationales in choosing either Open Access journals (OA) or traditional subscription-based journals (NOA) for publication. Swan and Brown sent out invitations to 3,000 authors who had published in OA journals, and 5,000 authors who had only published in NOA. Journal subjects were closely matched between each group to ensure common publishing-area interests among respondents.

I was surprised by the small percentage of respondents compared to how many invitations were actually sent out. Only 154 responses were received from the first group and 160 from the second. I imagine Swan and Brown had a predetermined number of authors they wanted to use in their study. But the invitations greatly exceeded the number of participants. It seemed like the majority of authors did not wish to partake in a study of either this magnitude or perhaps this level of contention (as the open access concept started out as a bit of a moral tickler). That was my initial reaction… that the majority of invitees probably declined because they wished to keep their publishing preferences anonymous.
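To put a number on that impression, the response rates can be worked out from the invitation and response counts given above. The snippet below is just a back-of-the-envelope calculation; the function name `response_rate` is my own and nothing here comes from Swan and Brown's paper beyond the four counts.

```python
def response_rate(responses, invitations):
    """Return the share of invitations that produced a response, as a percentage."""
    return 100 * responses / invitations


oa = response_rate(154, 3000)                    # authors who had published in OA journals
noa = response_rate(160, 5000)                   # authors who had only published in NOA
overall = response_rate(154 + 160, 3000 + 5000)  # both groups combined

print(f"OA: {oa:.1f}%  NOA: {noa:.1f}%  overall: {overall:.1f}%")
```

That works out to roughly 5.1% for the OA group, 3.2% for the NOA group, and about 3.9% overall, so fewer than one invitee in twenty responded in either group.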

Regarding the study itself, Swan and Brown found that OA authors chose their mode of publication because of the principle of free access, faster publication times, a larger perceived readership, and higher perceived citation counts than NOA. For NOA authors, reasons included unfamiliarity with OA journals, fear of career-changing consequences, lack of grant money to be awarded through OA, and the belief that OA journals were inferior in terms of readership and prestige.

But justifications for using one platform over another were surprisingly similar between the groups. Respondents on both sides of the aisle justified their reasons using essentially the same answers: “Ours has more citations”; “Ours has a larger readership”; “Ours has distinguished names,” etc. This, I believe, was due to a proclivity for defending what was familiar and not alien to these authors. It definitely demonstrates bias on the part of the authors, but I think it is just as likely that this bias came from institutional resistance to OA. Indeed, I imagine that academics choose to publish in a specific journal because it is sponsored by or identified with given institutions of higher learning. Institutional resistance to OA, therefore, is a likely factor.

This study is the second one done by Swan and Brown on the subject of open access. Their previous study was conducted just two years before this one, in 2002, and the majority of respondents in the earlier study were not even familiar with the concept of open access. This seems to indicate that the publishing industry had changed dramatically in those two short years. We are seeing a continuation of this change with the greater and more diverse number of options becoming available in the open access publishing market.

Of special note is the fact that both study groups thought that their preferred publication method carried with it more prestige. It seems, then, that those authors publishing in OA journals have done the legwork of investigating their journals and have come to learn that there are open access publishers who are just as respectable and endowed monetarily as some traditional journals. Therefore, the fear that OA is inferior seems to be rooted in simple ignorance of the options available. More than half of the NOA authors indicated that they could not find an OA journal to publish in. I think that a lot of them just went with what they knew or had used since the start of their careers.

This study was lacking in one major area, though. The age groups of the respondents were not given. It would have been interesting to see if the OA authors were younger than the NOA authors. I have a sneaking suspicion that there is a Digital Native/Digital Immigrant dynamic at play here.

Transformations – Digital Libraries

In Chapter 4 of Foundations of Library and Information Science, Richard Rubin explores the history of technology in libraries from microform to early bibliographic retrieval systems on through the development of the Internet, Web 2.0, and finally the emergence of digital libraries. This last Rubin neglects to really define. We are not given a concise definition of digital libraries. Instead, we are treated to explanations of the characteristics of a digital library, and mostly from the work of Karen Calhoun, author of Exploring Digital Libraries.

Calhoun defines digital libraries as basically the extension of physical library services into digital space. In other words, digital libraries are meant to be freely accessible like traditional libraries, as well as structured similarly in terms of bibliographic storage and retrieval. Furthermore, digital libraries – according to Calhoun and Rubin – should be interoperable, focused on community engagement, aware of intellectual property issues, and sustainable. Drilling down into these issues a bit more…

  • Interoperable. Interoperability refers to the ability to search the digital library’s collection on a variety of technological devices, as well as being able to integrate with other library systems.
  • Community engagement. This simply refers to the need to base the digital library around a specific user group, ensuring that the digital library’s collection is useful to its users, as well as intuitive and user-friendly. The digital library cannot be mystifying, especially since reference help via a chat function may not be available during all operational hours.
  • Intellectual property rights. Of the four key elements identified by Calhoun, intellectual property is the bugbear for digital libraries. Indeed, the digital environment creates new challenges in the areas of licensing and use rights. Out of all the issues confronting digital libraries, this is liable to be the trickiest once the digital library is online and functional.
  • Sustainability. This refers to the ability to manage the digital library in much the same way as an institutional library. This includes management roles, budgeting, managing subscriptions, curating content, database maintenance (including hardware and software development, and webmastering), and providing proper oversight in terms of rules and regulations for users. These are all things that a digital library “staff” will have to address.

Rubin goes from the early online digital collections of images, or images of artifacts, to the born-digital resources of today. This vague idea plays out across the field of emerging LIS. I am not quite sure why Rubin talks about early online collections of photos as a precursor to his discussion of digital libraries. I think we can easily distinguish between mere collections of something, like photos for example, and a “library of photos.” Rubin himself said that there were no standards in these early collections for searching and retrieving. There was a lot of entropy involved instead. A library collection, on the other hand, is a collection that is ordered, described, and made easily accessible when searched.

I think Rubin was closer to hitting the mark for a concise definition of digital libraries in his previous chapter, Chapter 3 on libraries as institutions. At the end of that chapter, Rubin talked about embedded librarians. Indeed, I am wondering if a digital library can even be considered a “library” unless it has an embedded library staff available during operational hours. I know we have been seeing a trend toward self-sufficiency when it comes to users and library services, but if there is not an embedded librarian present in a digital library to assist users, we are looking at more of a third-party service than an institutional model. At that point, even referring to a digital library as a library is questionable in my opinion.

It is difficult to determine what Rubin thinks of these transformations, and in particular, of digital libraries. He writes in such a straightforward style that the facts are presented to us with little opinion or bias. A good thing. However, this chapter ends with more questions than solutions, and the tone feels quite cautionary. Indeed, it seems that the concern with digital libraries revolves around the fear of data volatility and the ever-changing nature of digital technology. Are digital libraries a viable model for the long-term preservation of a collection? Will they last hundreds (maybe thousands) of years like their traditional counterparts? Or will digital libraries not even make it halfway to the 22nd century? Digital obsolescence remains a frightful possibility, even after all the advancements in storage and computer back-up technology.