A lemmatizer removes inflections, such as plural endings, pronoun case, and verb endings, to reduce a word to its base form (its lemma). To use the Lemmatizer node, a POS (part-of-speech) tagger, e.g. the Stanford Tagger node or the POS Tagger node, has to be applied beforehand, because lemmatization relies heavily on the POS tag of each term.
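To illustrate why the POS tag matters, here is a minimal Python sketch. It is not the Stanford Lemmatizer and not a KNIME API: the lookup table `TOY_LEMMAS`, the function `toy_lemmatize`, and the tagged tokens are all hypothetical, chosen only to show that the same surface form can map to different lemmas depending on its tag (Penn Treebank style tags are assumed).

```python
# Toy lookup table keyed by (lowercased token, POS tag).
# Purely illustrative; a real lemmatizer uses rules and a full lexicon.
TOY_LEMMAS = {
    ("saw", "VBD"): "see",       # past-tense verb
    ("saw", "NN"): "saw",        # singular noun (the tool)
    ("leaves", "NNS"): "leaf",   # plural noun
    ("leaves", "VBZ"): "leave",  # third-person singular verb
    ("mice", "NNS"): "mouse",    # irregular plural noun
}


def toy_lemmatize(token: str, pos_tag: str) -> str:
    """Return the lemma for a tagged token, or the lowercased token if unknown."""
    return TOY_LEMMAS.get((token.lower(), pos_tag), token.lower())


# Hypothetical tagger output: the same word appears with different tags.
tagged = [("saw", "VBD"), ("saw", "NN"), ("leaves", "NNS"), ("leaves", "VBZ")]
lemmas = [toy_lemmatize(token, tag) for token, tag in tagged]
print(lemmas)  # ['see', 'saw', 'leaf', 'leave']
```

Without the tag, "saw" and "leaves" would be ambiguous, which is why the tagger has to run before the lemmatizer.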
This workflow shows a simple example of how to lemmatize terms in documents using the Stanford Lemmatizer node. It also shows what exactly the lemmatizer does to the terms of the input documents, compared with other preprocessing nodes such as the Snowball Stemmer.
Stemming and lemmatization are both commonly used natural language processing techniques in the field of information retrieval. Let's look at the example below and assume the result is the index of a search engine. If we now query the word "mouse", only the first document will be returned using the stemmedTerms. However, if we use the lemmatizedTerms, the second document will also be returned, because "mice" is the plural form of "mouse" and the lemmatizer maps it to that lemma, whereas the stemmer does not relate the two forms.
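The search-index scenario can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the two documents in `DOCS`, the naive suffix-stripping `toy_stem` (a stand-in, not the Snowball algorithm), and the dictionary-based `toy_lemma` (a stand-in, not the Stanford Lemmatizer).

```python
from collections import defaultdict

# Two hypothetical documents, keyed by document id.
DOCS = {
    1: "The mouse ran across the floor",
    2: "Two mice were hiding in the barn",
}


def toy_stem(token: str) -> str:
    """Naive suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


# Tiny hand-written lemma dictionary; a stand-in for a real lemmatizer.
TOY_LEMMA_TABLE = {"mice": "mouse", "were": "be", "ran": "run", "hiding": "hide"}


def toy_lemma(token: str) -> str:
    """Look the token up in the toy table, or return it unchanged."""
    return TOY_LEMMA_TABLE.get(token, token)


def build_index(docs, normalize):
    """Build an inverted index: normalized term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[normalize(token)].add(doc_id)
    return index


stemmed_index = build_index(DOCS, toy_stem)
lemma_index = build_index(DOCS, toy_lemma)

# Normalize the query with the same function that built each index.
print(sorted(stemmed_index.get(toy_stem("mouse"), set())))  # [1]
print(sorted(lemma_index.get(toy_lemma("mouse"), set())))   # [1, 2]
```

Suffix stripping cannot connect the irregular plural "mice" to "mouse", so the stemmed index finds only document 1, while the lemma index finds both.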
Workflow
Stanford Lemmatizer Example
Used extensions & nodes
Created with KNIME Analytics Platform version 4.1.0
Legal
By using or downloading the workflow, you agree to our terms and conditions.