Extract Top Keywords using Python

Using the yake (Yet Another Keyword Extractor) library, we can extract the top keywords from a text.

First, install the yake library. You can use the following command to install it with pip:

python3 -m pip install yake

Sample Code:

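import yake

text = "JAVA Program is same as Python Program. So, go with Python Program. I love Python Program and JAVA Program"
# language, max n-gram size, dedup threshold, dedup function, window size, number of keywords
keywordExtractor = yake.KeywordExtractor(lan="en", n=2, dedupLim=0.9, dedupFunc="lev", windowsSize=2, top=5, features=None)
# extract_keywords returns (keyword, score) tuples; a lower score means a more relevant keyword
matchedKeywords = keywordExtractor.extract_keywords(text)
topKeywords = [keyword for keyword, score in matchedKeywords]
print(topKeywords)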

Parameters used:

"en" (lan): the language of the input text. yake uses it to choose the stopword list for that language.
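To make the language parameter more concrete, here is a minimal sketch of stopword-based filtering. It assumes, as we understand yake's behavior, that candidate phrases beginning or ending with a stopword are discarded. The tiny `ENGLISH_STOPWORDS` set and the `is_valid_candidate` helper are hypothetical and not part of yake's API.

```python
# Hypothetical, deliberately tiny stopword set; yake ships full per-language lists.
ENGLISH_STOPWORDS = frozenset({"a", "and", "as", "i", "is", "so", "the", "with"})


def is_valid_candidate(phrase: str, stopwords: frozenset = ENGLISH_STOPWORDS) -> bool:
    """Return True if the phrase is non-empty and neither starts nor ends with a stopword."""
    words = phrase.lower().split()
    return bool(words) and words[0] not in stopwords and words[-1] not in stopwords


print(is_valid_candidate("Python Program"))  # → True
print(is_valid_candidate("I love"))          # → False ("i" is a stopword)
```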

2 (n): the maximum n-gram size, i.e. the longest sequence of words that can be returned as a single keyword. With 2, keywords can be single words ("Python") or two-word phrases ("Python Program").
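The effect of the maximum n-gram size can be sketched with a hypothetical `candidate_ngrams` helper that enumerates every run of 1 to `max_n` consecutive words. This is a simplification: yake's own candidate generation also deals with punctuation and stopwords.

```python
from typing import List


def candidate_ngrams(sentence: str, max_n: int) -> List[str]:
    """Enumerate every run of 1..max_n consecutive words (hypothetical helper)."""
    words = sentence.split()
    candidates = []
    for size in range(1, max_n + 1):
        for start in range(len(words) - size + 1):
            candidates.append(" ".join(words[start:start + size]))
    return candidates


print(candidate_ngrams("I love Python Program", 2))
# → ['I', 'love', 'Python', 'Program', 'I love', 'love Python', 'Python Program']
```

Raising `max_n` lets longer phrases compete as keywords, at the cost of many more candidates to score.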

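0.9 (dedupLim): the deduplication threshold. A candidate whose similarity to an already selected keyword is above this threshold is dropped as a duplicate. Values close to 1 remove only near-identical keywords; lower values deduplicate more aggressively.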
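The deduplication step can be sketched as a greedy pass over candidates ordered from best (lowest) to worst score, keeping a candidate only if its similarity to every kept keyword stays at or below the threshold. This sketch uses the standard library's `difflib.SequenceMatcher` as the similarity measure; it illustrates the idea and is not yake's internal code. The `similarity` and `deduplicate` names and the sample scores are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]; 1.0 means identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def deduplicate(scored: List[Tuple[str, float]], threshold: float, top: int) -> List[Tuple[str, float]]:
    selected: List[Tuple[str, float]] = []
    # Lower score = more relevant, so walk the candidates from best to worst.
    for keyword, score in sorted(scored, key=lambda pair: pair[1]):
        # Skip the candidate if it is too similar to any keyword already kept.
        if any(similarity(keyword, kept) > threshold for kept, _ in selected):
            continue
        selected.append((keyword, score))
        if len(selected) == top:
            break
    return selected


candidates = [("Python Program", 0.02), ("Python Programs", 0.03), ("JAVA Program", 0.05)]
print(deduplicate(candidates, threshold=0.9, top=5))
# → [('Python Program', 0.02), ('JAVA Program', 0.05)]
```

"Python Programs" is about 0.97 similar to "Python Program", so it is dropped at a threshold of 0.9, while "JAVA Program" (about 0.62) survives.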

'lev' (dedupFunc): the similarity function used for deduplication. Supported values include seqm (the default), jaro and leve (Levenshtein); 'lev' here selects the Levenshtein-based measure.
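For the Levenshtein option, a similarity score can be derived from the edit distance by normalizing it to [0, 1]. The sketch below uses the classic dynamic-programming edit distance and the normalization 1 - distance / length of the longer string. The function names are hypothetical, and yake's exact formula may differ.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Normalize the edit distance to [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / max(len(a), len(b))


print(levenshtein_similarity("Python Program", "Python Programs"))  # → about 0.933
```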

2 (windowsSize): the context window size, i.e. how many neighbouring words around a term are taken into account when computing its co-occurrence (context) features. 1 looks only at immediately adjacent words.
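To illustrate what the window size controls, the following sketch counts ordered (left word, right word) pairs that occur within `window` positions of each other. The `cooccurrences` helper is hypothetical and simplified: yake builds a comparable co-occurrence structure internally and derives context features from it, but its exact bookkeeping is not reproduced here.

```python
from collections import Counter
from typing import List, Tuple


def cooccurrences(words: List[str], window: int) -> "Counter[Tuple[str, str]]":
    """Count (left, right) word pairs that are at most `window` positions apart."""
    counts: Counter = Counter()
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            counts[(word, words[j])] += 1
    return counts


words = "I love Python Program".lower().split()
print(sum(cooccurrences(words, window=1).values()))  # → 3
print(sum(cooccurrences(words, window=2).values()))  # → 5
```

A larger window links each word to more distant neighbours, which changes how strongly a term appears tied to its context.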

5 (top): the number of top keywords to return.
