Considering Web Classification

For those with a more traditional background in library science, or simply with experience in cataloging departments, I think it may be too easy to feel that cataloging has to be a manual process, controlled by the human cataloger. This may be the case with books, because they have physical dimensions and cataloging-in-publication data that need to be entered into a cataloging system, through either copy cataloging or original cataloging. Moreover, some libraries may take the liberty of adding subject headings to catalog records to fit the criteria of their own hand-selected collections. However, web resources are a different beast. Classifying web resources can seem like a daunting task because there is such a proliferation of content on the Web, including not just static webpages but also blogs, wikis, and videos. The discussion of cataloging web resources once revolved around deciding how to classify just webpages, but now it is a question of classifying web content, which relies increasingly on metadata standards like Dublin Core. The Dublin Core element set describes not only standard bibliographic attributes but also those especially relevant to the Web, such as creator(s), format, and type of resource.

I think for a while now we have seen a move away from Library of Congress Classification (LCC) and Dewey Decimal Classification (DDC), especially with regard to classifying the Semantic Web. In fact, I have not seen any earnest discussion of applying these classification schemes to web resources. The two projects that earnestly tried to apply LCC or DDC were the CyberStacks Project out of Iowa State University and OCLC’s NetFirst. These projects seem all but dead now. I think the reason is that applying the class numbers of LCC and DDC relies on human matching of subject disciplines, which is just too Sisyphean a task when it comes to web resources. At the same time, it is still too difficult for artificial intelligence and machine learning to pin down subject disciplines based on keyword analysis. That being said, we are not without commercial software to aid in the classification of web resources. There are automated tools that index just about anything they are programmed to index, such as web-based keywords or metatags.

These tools make the bibliographic management of the Web possible. Bibliometric mapping of the Web can produce large databases of indexed material, which puts the Internet in the crosshairs of catalogers. Ideally, then, the best “system” for classifying web materials is to use the many tools available to digital librarians for taking bibliographic snapshots of the Web, such as web crawlers designed for the purpose.
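To make the idea of indexing keywords and metatags a little more concrete, here is a minimal sketch in Python of what one step of such a tool might look like. It is only an illustration, not how any particular product works: the names `MetaTagIndexer`, `index_page`, and `SAMPLE_PAGE` are my own inventions, the sample page is made up, and it assumes the page embeds Dublin Core fields as `<meta name="DC.creator" ...>` style tags. It uses only Python's standard `html.parser` module and reads from a string rather than crawling the live Web.

```python
from html.parser import HTMLParser


class MetaTagIndexer(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from one HTML page.

    Hypothetical helper for illustration; not part of any cataloging product.
    """

    def __init__(self):
        super().__init__()
        # Maps a lowercased meta name (e.g. "dc.creator") to its values.
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values can be None when the attribute has no value.
        name = (attributes.get("name") or "").strip().lower()
        content = (attributes.get("content") or "").strip()
        if name and content:
            self.fields.setdefault(name, []).append(content)


def index_page(html_text):
    """Return a dict of meta-tag fields found in the given HTML text."""
    parser = MetaTagIndexer()
    parser.feed(html_text)
    parser.close()
    return parser.fields


# Made-up sample page; a real tool would fetch pages with a crawler.
SAMPLE_PAGE = """
<html><head>
<title>Sample page</title>
<meta name="keywords" content="cataloging, metadata, web resources">
<meta name="DC.creator" content="Example Author">
<meta name="DC.format" content="text/html">
<meta name="DC.type" content="Text">
</head><body><p>Body text.</p></body></html>
"""

if __name__ == "__main__":
    record = index_page(SAMPLE_PAGE)
    for field, values in sorted(record.items()):
        print(field, values)
```

The result is a small record keyed by field name, which is roughly the raw material a cataloger would then match against a subject scheme. The hard part discussed above, deciding the subject discipline, is exactly what this sketch does not attempt.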

As far as the ephemeral nature of the Web goes, I do not think LIS professionals need to concern themselves too much with cataloging web material that eventually disappears due to link rot. Canonical webpages, or webpages whose content is sponsored, will provide enough material for catalogers to work on. I see this as being no different from cataloging books that have gone through the publication process. There has always been a certain authority that measures bibliographic worth. Of course, I am aware of the implications of leaving out self-created folk content. But the original purpose of cataloging was to capture the whole of knowledge as nearly as possible, and there is enough information out there to catalog, in print form and on the Web, to accomplish this objective.

At any rate, indexing the Semantic Web with automated products produces large and numerous digital libraries. My ideal system for classifying web resources would involve, for starters, a greater emphasis on this endeavor, but also the application of useful digital tools to aid the cataloger in matching content to a knowledge base.

