A few months ago Microsoft was asking people for ideas to improve MSN Search, so yours truly, show-off that he is, sent them the following message:
A search-index problem that neither Google nor you, as it seems, have solved to users’ satisfaction is the problem of accents in various languages. For example, in French you have opted to disregard the accents. A search for ‘tassé’ (=packed) will bring up lots of ‘tasse’, which is a cup. I’m sure the French get very perplexed over this. On the other hand, no one seems to do this for the Greek language. The problem here is the inverse. We Greeks often have two different accented forms for the same word (usually depending on whether it is a colloquial or literary form), e.g. the genitive of the Greek form for Charles may be Καρόλου or Κάρολου, and in upper-case (or if careless people fail to include the accent) ΚΑΡΟΛΟΥ / Καρολου. So anyone looking for pages with this word actually has to conduct three (!) different searches. (Google brought up 4340, 825 and 554 occurrences respectively; yours was temporarily unavailable).
So what happens here is that the rule that has been applied to French has not been applied to Greek. But neither solution is entirely satisfactory and there should actually be two indexing approaches: one that disregards accents (in both French and Greek, as well as other accented languages) and one that distinguishes between accented and unaccented forms. For example, results may display depending on whether the user enters terms in upper case (TASSE) or lower case (‘tasse’ or ‘tassé’); in Greek, ΚΑΡΟΛΟΥ or Καρόλου or Κάρολου. In the former case, all forms will be displayed; in the latter, the search results are restricted only to the corresponding accented or unaccented forms.
Users don’t really care if the search takes 0.34 instead of 0.17, as long as they get a more intelligent result. And this is one way of demonstrating more intelligence.
A similar message had been sent to Google earlier.
So the news is that, as of a few days ago, Google has put this system into practice. That is, it will give you the same results whether you type "ανθρωπου", "ανθρώπου" or "άνθρωπου". So with a single search you get all the forms you want.
If, on the other hand, you want to find pages with the form "ανθρώπου" and only that form, you will have to resort to MSN Search...
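To make the two behaviours concrete, here is a minimal Python sketch of the dual-index idea from the message above: one index keyed on the word as written, and one keyed on an accent-folded form. The names `fold` and `AccentIndex` are hypothetical, and the rule "an all-upper-case query means ignore accents" is simply the convention suggested in the message, not how Google or MSN Search actually work.

```python
import unicodedata
from collections import defaultdict


def fold(word: str) -> str:
    """Return the word without accents and without case distinctions."""
    # NFD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


class AccentIndex:
    """Toy inverted index with an exact view and an accent-folded view."""

    def __init__(self):
        self.exact = defaultdict(set)    # word as written -> document ids
        self.folded = defaultdict(set)   # accent-folded word -> document ids

    def add(self, doc_id, text):
        for word in text.split():
            # NFC keeps precomposed and decomposed spellings comparable.
            self.exact[unicodedata.normalize("NFC", word)].add(doc_id)
            self.folded[fold(word)].add(doc_id)

    def search(self, query):
        # Assumed convention: an all-upper-case query matches every form,
        # anything else matches only the form exactly as typed.
        if query.isupper():
            return set(self.folded.get(fold(query), set()))
        return set(self.exact.get(unicodedata.normalize("NFC", query), set()))


index = AccentIndex()
index.add(1, "του Καρόλου")
index.add(2, "του Κάρολου")
index.add(3, "ΤΟΥ ΚΑΡΟΛΟΥ")

all_forms = index.search("ΚΑΡΟΛΟΥ")   # documents 1, 2 and 3
one_form = index.search("Καρόλου")    # document 1 only
```

Keeping both views costs extra storage, but it lets a single query syntax choose between the "all forms" and the "this form only" behaviour.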
And incidentally, something else I read on Google that looks interesting says: "Google now uses stemming technology. Thus, when appropriate, it will search not only for your search terms, but also for words that are similar to some or all of those terms. If you search for 'pet lemur dietary needs', Google will also search for 'pet lemur diet needs', and other related variations of your terms. Any variants of your terms that were searched for will be highlighted in the snippet of text accompanying each result."
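As a rough illustration of what stemming means, the Python sketch below strips a few English suffixes so that "dietary" and "diet" end up with the same stem. It is only a toy under stated assumptions: the suffix list and the names `toy_stem` and `matches` are made up for this example, and Google's actual stemming method is not described in the quoted text.

```python
# Hypothetical suffix list, checked in this order; real stemmers use far richer rules.
SUFFIXES = ("ary", "ing", "es", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def matches(query: str, text: str) -> bool:
    """True if every query word shares a stem with some word in the text."""
    text_stems = {toy_stem(w) for w in text.split()}
    return all(toy_stem(w) in text_stems for w in query.split())


found = matches("pet lemur dietary needs", "pet lemur diet needs")
```

Here `toy_stem("dietary")` and `toy_stem("diet")` both give `"diet"`, so the query from Google's example matches the variant text.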