purl.org/peter.turney
Synonyms and Attributional Similarity - Applications
- Definition of Attributional Similarity
- The attributional similarity of two words is the degree of similarity in
their meanings. When two words have a very high degree of attributional similarity,
we call them synonyms.
- Recognizing Synonyms
- A measure of attributional similarity can be used to recognize synonyms,
although antonyms also have a high degree of attributional similarity.
(Antonyms can be distinguished by their semantic orientation.)
- Generating Synonyms
- The words in a large corpus can be ranked in order of their attributional
similarity to a given target word. The top ranked words are likely to be
synonyms of the target word. This can be used for automatic thesaurus generation.
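
The ranking step can be sketched as follows. This is only an illustration: the word vectors are hypothetical toy values (real ones would be derived from a large corpus), and cosine similarity is assumed as the measure of attributional similarity, which the text above does not prescribe.

```python
import math

# Hypothetical toy vectors; a real system would build these from corpus statistics.
VECTORS = {
    "car": [8.0, 1.0, 0.0],
    "automobile": [7.0, 2.0, 0.0],
    "truck": [5.0, 3.0, 1.0],
    "banana": [0.0, 1.0, 9.0],
}

def cosine(u, v):
    """Cosine of the angle between two vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_by_similarity(target, vectors):
    """Return (word, score) pairs for every other word, most similar first."""
    scores = [
        (word, cosine(vectors[target], vec))
        for word, vec in vectors.items()
        if word != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

ranked = rank_by_similarity("car", VECTORS)
candidates = [word for word, _ in ranked]  # top of the list = synonym candidates
```

With these toy vectors, "automobile" ranks first for the target "car". As noted under Recognizing Synonyms, antonyms can also rank highly, so the top of the list is a set of candidates rather than confirmed synonyms.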
- Determining Semantic Orientation
- Some words have negative associations ("immature") and others
have positive associations ("wise"). Suppose we wish to automatically
determine whether a given word or phrase is positive or negative. One approach
is to measure the similarity between the given word and a word that is known to
be positive ("excellent") and a word that is known to be negative ("poor").
We can assign a numerical rating to the given word, based on whether it
is more similar to the positive word or the negative word. This can be extended
from words and phrases to whole documents. For example, we can determine whether
a document, such as a movie review, is positive or negative by calculating the
average semantic orientation of the adjectives and adverbs in the document.
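
A minimal sketch of this rating scheme is given below. The similarity scores are made-up values in a hypothetical lookup table, and the list of adjectives and adverbs is assumed to have been extracted already (for example, by a part-of-speech tagger, which is not shown).

```python
# Hypothetical similarity scores; a real system would measure these from a corpus.
SIMILARITY = {
    ("wise", "excellent"): 0.6, ("wise", "poor"): 0.2,
    ("immature", "excellent"): 0.1, ("immature", "poor"): 0.5,
    ("brilliant", "excellent"): 0.8, ("brilliant", "poor"): 0.1,
}

def orientation(word):
    """Positive if the word is closer to "excellent", negative if closer to "poor"."""
    return SIMILARITY[(word, "excellent")] - SIMILARITY[(word, "poor")]

def document_orientation(words):
    """Average orientation of the given adjectives and adverbs."""
    if not words:
        return 0.0
    return sum(orientation(w) for w in words) / len(words)

# Adjectives and adverbs assumed to come from a hypothetical movie review.
review_words = ["wise", "brilliant", "immature"]
label = "positive" if document_orientation(review_words) > 0 else "negative"
```

Here the two positive words outweigh the single negative one, so the review is labelled positive.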
- Lexical Cohesion
- In a cohesive document, we expect many of the words to be
semantically similar to each other. Automatically generated summaries of a
document often lack cohesiveness. A measure of semantic similarity can be
used to identify outliers in an automatically generated summary. Removing
these outliers improves the quality of the summary by making it more
cohesive.
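
One way to sketch outlier detection is to score each item by its average similarity to the rest of the summary and drop the items that fall below a cutoff. The example below makes several assumptions that are not in the text above: it works on individual words rather than sentences, it uses cosine similarity over hypothetical toy vectors, and the threshold of 0.5 is arbitrary.

```python
import math

# Hypothetical toy vectors for the words of a generated summary.
VECTORS = {
    "river": [9.0, 1.0],
    "stream": [8.0, 2.0],
    "water": [7.0, 1.0],
    "invoice": [1.0, 9.0],
}

def cosine(u, v):
    """Cosine similarity, assumed here as the similarity measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def find_outliers(words, vectors, threshold=0.5):
    """Return words whose mean similarity to the other words is below threshold."""
    outliers = []
    for word in words:
        others = [w for w in words if w != word]
        if not others:
            continue  # a single word has nothing to be compared with
        mean_sim = sum(cosine(vectors[word], vectors[w]) for w in others) / len(others)
        if mean_sim < threshold:
            outliers.append(word)
    return outliers

summary = ["river", "stream", "water", "invoice"]
outliers = find_outliers(summary, VECTORS)
cohesive = [w for w in summary if w not in outliers]
```

With these values only "invoice" falls below the cutoff. The same scoring could be applied to sentences by replacing the word vectors with sentence-level representations.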
- Word Sense Disambiguation
- Consider the ambiguous word "bank". Suppose we encode in a
computer the knowledge that "bank" could be interpreted as "river bank" or
"financial bank". Given the phrase "bank account", the computer can measure the
semantic similarity between "account" and "financial" and compare it to the semantic
similarity between "account" and "river". Since "account" is more similar to
"financial", the computer can infer that "bank" in the phrase "bank account"
probably refers to "financial bank". Thus a good measure of attributional similarity
can facilitate word sense disambiguation.
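
The "bank account" example can be sketched as a comparison of two similarity scores. The scores below are invented for illustration, and the names `SIMILARITY`, `BANK_SENSES`, and `disambiguate` are hypothetical.

```python
# Hypothetical similarity scores between a context word and each sense's cue word.
SIMILARITY = {
    ("account", "financial"): 0.7,
    ("account", "river"): 0.1,
}

# Encoded knowledge: each sense of "bank" is tied to one cue word.
BANK_SENSES = {
    "river bank": "river",
    "financial bank": "financial",
}

def disambiguate(context_word, senses):
    """Pick the sense whose cue word is most similar to the context word."""
    return max(
        senses,
        key=lambda sense: SIMILARITY.get((context_word, senses[sense]), 0.0),
    )

sense = disambiguate("account", BANK_SENSES)  # "financial bank"
```

Pairs missing from the table default to a similarity of zero, which is a simplification; a real measure would return a score for any pair of words.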
- Information Retrieval
- Given a query, a search engine produces a ranked list of matching
documents. The list is sorted by the degree of attributional similarity
between the words in the query and the words in the documents. Typically
the similarity is simply exact matching, but more sophisticated approaches
use stemming (so that "ski" will match with "skiing") or a thesaurus
(so that "car" will match with "automobile"). Latent Semantic Indexing (LSI)
uses a statistical measure of attributional similarity to improve search
engine performance.
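
The stemming and thesaurus variants of exact matching can be sketched as a normalization step applied to both the query and the documents. Everything here is a simplifying assumption: the suffix rule is a crude stand-in for a real stemmer, the thesaurus has a single hypothetical entry, and documents are scored by counting matching query terms. LSI is not shown.

```python
# Hypothetical one-entry thesaurus mapping a word to its canonical form.
THESAURUS = {"automobile": "car"}

def normalize(word):
    """Lowercase, strip a trailing "ing" (crude stemming), then apply the thesaurus."""
    word = word.lower()
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
    return THESAURUS.get(word, word)

def score(query, document):
    """Number of query terms that match a document term after normalization."""
    doc_terms = {normalize(w) for w in document.split()}
    return sum(1 for w in query.split() if normalize(w) in doc_terms)

def search(query, documents):
    """Return the matching documents, best score first."""
    scored = [(score(query, doc), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for s, doc in scored if s > 0]

DOCS = [
    "Skiing in the Alps",
    "Automobile repair manual",
    "Cooking pasta at home",
]
ski_hits = search("ski", DOCS)
car_hits = search("car", DOCS)
```

With plain exact matching neither query would find anything in these documents; after normalization "ski" matches "Skiing" and "car" matches "Automobile".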
- Grading Student Essays
- Automatic grading of student essays typically involves the use
of a statistical measure of attributional similarity, such as
Latent Semantic Analysis (LSA).
Updated: February 3, 2007.