It does not matter how good a search engine is if the person doing a search does not ask for the desired information in the right way. So far, a great deal of the research on information retrieval has aimed to develop search algorithms and powerful search engines. Yet, a new doctoral thesis on natural language processing from the University of Gothenburg shows that it is also important to look at the terms people type into the search box.
'Users usually know what kind of information they are looking for, but they don't know what question to ask. The problem these days is not for the search engine to locate the right documents but to make the most relevant texts end up towards the top of the list,' says the author of the thesis, Karin Friberg Heppin.
Friberg Heppin used a database of medical texts written in Swedish to explore what makes a search term effective or ineffective. What are the features of good search terms and what characterises bad ones?
Today patients often find their own information on the internet, both before and after seeing a doctor. However, not all documents are easily understood by a lay person. Doctors surf for information too, but won't find much new in popular science texts.
'The language differs between texts written for doctors and texts written for patients. People can use these differences to find the types of documents they want, with respect to both subject and target group,' says Friberg Heppin.
Her point is that if a doctor does a search for, say, the word flu, he or she will not find many texts of interest. Yet, a search for the word influenza will yield more texts that suit the needs of doctors.
Another difficulty arises when the search term appears in a text only as part of a compound word, or vice versa. For example, if a Swedish user types in the word diabetes (=diabetes), the search engine will not catch a text that only includes the compound word diabetesbehandling (=diabetes treatment).
'This type of problem is more common in Swedish than in English, since compound words are far less common in English than in Swedish. The fact that almost all information retrieval research has focused on English, a language with entirely different inherent problems, suggests that more Swedish research in the area is essential,' says Friberg Heppin, who points to the importance of the field of linguistics in this context.
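The compound-word mismatch described above can be illustrated with a small sketch. It is not taken from the thesis: the function names, the tiny lexicon and the sample sentence are all made up for illustration, and the splitter is a deliberately naive dictionary-based one that ignores Swedish linking elements. It only shows why an exact-token search for diabetes misses a text containing diabetesbehandling, and how splitting compounds before matching can recover it.

```python
import re

# Hypothetical mini-lexicon of known word parts (an assumption for this sketch).
LEXICON = {"diabetes", "behandling"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def exact_match(query, text):
    """True if the query occurs as a whole token in the text."""
    return query.lower() in tokenize(text)


def split_compound(word, lexicon):
    """Split a word into lexicon parts, trying the longest head first.

    Returns a list of parts, or None if the word cannot be fully covered.
    Linking elements between parts are not handled in this sketch.
    """
    if word in lexicon:
        return [word]
    for i in range(len(word) - 1, 0, -1):
        head = word[:i]
        if head in lexicon:
            rest = split_compound(word[i:], lexicon)
            if rest is not None:
                return [head] + rest
    return None


def compound_match(query, text, lexicon):
    """True if the query matches a token or any part of a split compound."""
    query = query.lower()
    for token in tokenize(text):
        parts = split_compound(token, lexicon) or [token]
        if query == token or query in parts:
            return True
    return False


# Made-up sample sentence containing only the compound form.
sample = "Ny diabetesbehandling testas"
found_exact = exact_match("diabetes", sample)
found_compound = compound_match("diabetes", sample, LEXICON)
```

In this sketch found_exact is False while found_compound is True. A real system would need a far larger lexicon and a way to handle ambiguous splits, which is part of why the problem calls for linguistic knowledge.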
'Information retrieval is a multidisciplinary subject where the focus has traditionally been on information and computer science. It's time for linguists to start contributing to improved search effectiveness,' says Friberg Heppin.
The thesis has been successfully defended.