Using the yake library (YAKE, Yet Another Keyword Extractor), we can extract the top keywords from a text.
First, install the yake library. You can install it with pip using the following command.
python3 -m pip install yake
Sample Code:
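import yake

# Arguments, in order: language, maximum n-gram size, deduplication threshold,
# deduplication algorithm, window size, number of keywords to return.
keywordExtractor = yake.KeywordExtractor("en", 2, 0.9, 'lev', 2, 5, features=None)
text = "JAVA Program is same as Python Program. So, go with Python Program. I love Python Program and JAVA Program"
# Each result is a pair whose first element is the keyword; keep only the keywords.
matchedKeywords = keywordExtractor.extract_keywords(text)
topKeywords = [keyword[0] for keyword in matchedKeywords]
print(topKeywords)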
Parameters used:
"en" is the language: the language of the input text.
2 is the maximum n-gram size: the maximum number of consecutive words that a candidate keyword may contain.
0.9 is the deduplication threshold: a candidate whose similarity to an already selected keyword is higher than this threshold is dropped as a duplicate.
lev is the deduplication algorithm: the string-similarity function used to compare keywords during deduplication, here Levenshtein. Other supported values include seqm and jaro.
2 is the window size: the size (in words) of the window used when collecting the words that co-occur with each term.
5 is the number of keywords: the number of top keywords to return.
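To make the deduplication threshold and the number of keywords more concrete, here is a minimal standard-library sketch of threshold-based deduplication. It is only an illustration of the idea and not yake's actual implementation: the helper names levenshtein_distance, similarity and deduplicate, and the ranked candidate list, are all hypothetical.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalised similarity in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def deduplicate(candidates, threshold, top):
    """Walk the candidates in ranked order, skip any candidate that is more
    similar than `threshold` to a keyword already kept, and stop once `top`
    keywords have been kept."""
    kept = []
    for candidate in candidates:
        if all(similarity(candidate, existing) <= threshold for existing in kept):
            kept.append(candidate)
            if len(kept) == top:
                break
    return kept


# Hypothetical ranked candidates (best first); this is not real yake output.
ranked = ["python program", "python programs", "java program", "program", "python"]
print(deduplicate(ranked, threshold=0.9, top=5))
```

With a threshold of 0.9, "python programs" is skipped because its similarity to "python program" is about 0.93, while the remaining candidates are different enough to be kept. Lowering the threshold makes deduplication stricter, and lowering top cuts the list off earlier.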