Monday, June 2, 2014

IR Techniques: Alternative Model - BOOLEAN



In our never-ending quest for knowledge and information, we search the Internet and other electronic databases.  The behind-the-scenes processing of a search depends on the search model used.  Most models build their results with a ranking function, which assigns a score to each item retrieved.  The higher the score, the more likely the retrieved item is what we were looking for with our search query.  The indexing process (the procedure for extracting key words and phrases from text documents) gives each model the terms it uses to judge the relevance of each search result, in a systematic attempt to predict the end user’s needs.

For example, Google, the most popular search engine in the world, has engineered and fine-tuned its retrieval mechanism to help the user find relevant information.  While performing Internet searches, I’ve often used Google and drawn on the Classic models.  And what are the Classic models?  They are the Boolean, Vector and Probabilistic models of searching.  The Boolean model is the simplest: it is based on set theory and Boolean algebra, and it treats a document as either matching the query or not, rather than giving it a degree of relevance.  As I formulated my queries, I would combine essential words (key words) with the operators AND, OR and NOT to home in on specific data within documents.  AND and NOT contract the information retrieved by narrowing the query results, while OR widens them.
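Here is a minimal sketch of how the Boolean model could work behind the scenes. It assumes a tiny made-up document collection and hypothetical helper names (build_index, term, q_and, q_or, q_not); it is an illustration of the idea, not how Google or any real search engine processes a query. Each key word maps to the set of documents that contain it, and the three operators become set intersection, union and complement.

```python
from collections import defaultdict

# Hypothetical toy collection; the IDs and text are made up for illustration.
DOCS = {
    1: "government contracts for small business owners",
    2: "small business loans for veterans",
    3: "government contracts awarded to disabled veterans",
    4: "chocolate recipes for the weekend",
}


def build_index(docs):
    """Map each lowercase word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


INDEX = build_index(DOCS)
ALL_IDS = set(DOCS)


def term(word):
    """Documents containing the word (empty set if the word is unknown)."""
    return INDEX.get(word.lower(), set())


def q_and(a, b):
    return a & b          # both conditions must hold: narrows the results


def q_or(a, b):
    return a | b          # either condition may hold: widens the results


def q_not(a):
    return ALL_IDS - a    # everything except the given documents


print(sorted(q_and(term("government"), term("veterans"))))       # [3]
print(sorted(q_or(term("government"), term("veterans"))))        # [1, 2, 3]
print(sorted(q_and(term("veterans"), q_not(term("disabled")))))  # [2]
```

Notice that a document is simply in the result set or out of it. Nothing here scores or ranks the matches, which is the main difference between the Boolean model and the ranking-based models.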

So that you will understand the usefulness of the Boolean model, I will provide an example.  I queried Google using the following terms: 

1st: government contracts "small business" veterans.  Note that I also used quotation marks (“”) to further define the query.  It took Google 0.35 seconds to produce 9,550,000 results, and that’s more websites than I could possibly review!

2nd: To narrow (contract) my results, I then put more of my query in quotes, "government contracts" "small business" veterans, and this generated 711,000 results in 0.28 seconds.

3rd: In reviewing some of the previous results, I found hits referring specifically to disabled veterans and decided to limit my search further.  I revised my query with Boolean operators, "government contracts" AND "small business" AND "disabled-veterans", which produced 56,900 results.




As you can see, the more limitations you add to the query, the fewer hits (results) you will obtain.  This is an effective way to utilize the Boolean model.
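The narrowing effect can be sketched in a few lines. This is only an illustration under stated assumptions: the five documents are made up, the hits function is a hypothetical helper, and a quoted phrase is treated as a plain substring match, which is far simpler than what a real search engine does. Each added requirement acts as another AND, so the result list can only stay the same size or shrink.

```python
# Hypothetical toy collection; the text is made up for illustration.
DOCS = [
    "government contracts for small business owners",
    "small business loans for veterans",
    "government contracts set aside for small business owned by disabled-veterans",
    "small business guide to government contracts for veterans",
    "chocolate recipes for the weekend",
]


def hits(docs, required):
    """Return the docs that contain every required word or phrase (implicit AND)."""
    return [d for d in docs if all(p.lower() in d.lower() for p in required)]


# Each query adds one more requirement to the one before it.
queries = [
    ["small business"],
    ["small business", "government contracts"],
    ["small business", "government contracts", "disabled-veterans"],
]

counts = [len(hits(DOCS, q)) for q in queries]
print(counts)  # [4, 3, 1]
```

The counts drop from 4 to 3 to 1, the same pattern as the Google result totals above, only on a much smaller scale.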





So, the next time you search for information, throw in some Boolean operators to home in on specific information.


Eat chocolate and stimulate your brain!