Monday, June 9, 2014

IR (Information Retrieval) Languages

According to the Linguistic Society of America, there were “6,909 distinct languages [in the world] . . . as of 2009” (LSA, 2014).  Since 2009, the count has risen by nearly 200: Ethnologue, an international reference catalog, now lists “7,106 known living languages” (Ethnologue, 2014).  Language is basic to the social infrastructure of global communities and is the central element of IR; without common everyday language, IR would not exist as a sub-discipline of Library and Information Science (LIS).  We are political people, and politics requires explicit communication.
Chu states that “natural and controlled vocabulary” is intertwined in IR.  What is the difference between the two vocabularies?  Natural vocabulary is spoken throughout societies on a daily basis; it’s the status quo of communication.  Controlled vocabulary, by contrast, is drawn from a set standard of terminology, such as that used by the Library of Congress (LOC) to store and retrieve data and information (via subject headings, titles, etc.).  Controlled vocabulary provides consistency of terms when searching the LOC’s database for specific information: the terms used in data compilation, storage, and retrieval are uniform and standardized, as defined by LIS authorities.  When people search the Internet using language that’s familiar to them, they use natural vocabulary.
Chu illustrates the problem of synonyms with the example of computer, desktop, and laptop.  Most people use one or all of these terms to discuss a distinctive electronic device, and any of them may be entered in an Internet search.  A controlled vocabulary, however, would use only one of these terms to stay consistent and uniform, with “computer” being the most inclusive.  I searched the LOC using “computer” and received a total of 160,777 results; “desktop” returned 3,358 results; and “laptop” returned 798 hits.  These counts suggest that “computer” is the preferred controlled-vocabulary term within the LOC database structure.
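The synonym problem can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the mapping table and the function name to_preferred_term are hypothetical and are not taken from Chu or from the Library of Congress's actual subject headings.

```python
# Hypothetical controlled vocabulary (an assumption for illustration):
# each natural-language variant maps to a single preferred term.
CONTROLLED_VOCABULARY = {
    "computer": "computer",
    "desktop": "computer",
    "laptop": "computer",
}


def to_preferred_term(query_term):
    """Return the preferred term for a natural-language search term.

    Terms that are not in the vocabulary are returned trimmed and
    lowercased, but otherwise unchanged.
    """
    normalized = query_term.strip().lower()
    return CONTROLLED_VOCABULARY.get(normalized, normalized)


if __name__ == "__main__":
    for term in ["Laptop", "desktop", "computer", "tablet"]:
        print(term, "->", to_preferred_term(term))
```

A catalog that indexes every record under the preferred term can then answer a search for any of the three words with the same set of results, which is the consistency that controlled vocabulary is meant to provide.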
The advancement of technologies is the catalyst for our new and improved digital societies, and metadata (loosely defined as "data about data") is everywhere.  When search engines process the varied terminology describing documents, books and journals, websites and webpages, visuals (e.g., images, photos), and audio (songs, music, etc.), they are using metadata.  Marketing and advertising agencies are top users of metadata, as are social media platforms such as Twitter and Facebook.  The creation and use of metadata will only increase over time; therefore, we must learn to use it effectively if we are to be successful researchers.
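To make "data about data" concrete, here is a minimal Python sketch of metadata records and a search over them.  The records, field names, and the function search_by_subject are all hypothetical, chosen only for illustration; real systems use established schemas and far richer indexing.

```python
# Hypothetical metadata records: each one describes an item
# (its title, type, and subjects) without containing the item itself.
RECORDS = [
    {"title": "Sample Song", "type": "audio", "subjects": ["music", "jazz"]},
    {"title": "Sample Photo", "type": "image", "subjects": ["travel"]},
    {"title": "Sample Article", "type": "text", "subjects": ["travel", "food"]},
]


def search_by_subject(records, subject):
    """Return the titles of records tagged with the given subject.

    Matching is case-insensitive and looks only at the metadata,
    never at the content of the items themselves.
    """
    wanted = subject.strip().lower()
    return [
        record["title"]
        for record in records
        if wanted in (s.lower() for s in record["subjects"])
    ]


if __name__ == "__main__":
    print(search_by_subject(RECORDS, "Travel"))
```

The search never opens the song, photo, or article; it matches only the descriptive fields, which is why the quality of the metadata determines what a researcher can find.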

Have a chocolate smoothie with your IR!

References

Chu, H. (2014). Information representation and retrieval in the digital age. Medford, New Jersey: Information Today, Inc.

Ethnologue. (2014, June 8). Languages of the World. Retrieved from Ethnologue: http://www.ethnologue.com/

LSA. (2014, June 8). How many languages are there in the world? Retrieved from Linguistic Society of America | Advancing the Scientific Study of Languages: http://www.linguisticsociety.org/content/how-many-languages-are-there-world