Project Status: Active. The project has reached a stable, usable state and is being actively developed.


lexicon is a collection of lexical hash tables, dictionaries, and word lists. Each data set's name begins with a prefix that indicates its data type:

Prefix Meaning
key_ A data.frame with a lookup and return value
hash_ A keyed data.table hash table
freq_ A data.table of terms with frequencies
profanity_ A vector of profane words
pos_ A part-of-speech vector
pos_df_ A part-of-speech data.frame
sw_ A stopword vector
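As a rough illustration of these conventions, the sketch below inspects one object of each kind. It assumes lexicon and data.table are installed and that `hash_sentiment_jockers_rinker` is keyed on its term column; the words looked up are arbitrary examples.

```r
# Sketch: check how the name prefixes map onto object types.
# Assumes the lexicon and data.table packages are installed.
library(lexicon)
library(data.table)

# hash_ objects: keyed data.tables
is.data.table(hash_sentiment_jockers_rinker)
key(hash_sentiment_jockers_rinker)  # the column the table is keyed on

# Keyed join: look up values for a few terms via the key column.
# Terms absent from the table come back as rows with NA values.
hash_sentiment_jockers_rinker[c("good", "terrible")]

# key_ objects: data.frames pairing a lookup value with a return value
is.data.frame(key_contractions)
head(key_contractions)

# sw_ and profanity_ objects: plain character vectors
is.character(sw_fry_25)
is.character(profanity_alvarez)
```

Because the hash tables are keyed, a lookup like the one above is a binary-search join rather than a linear scan, which is what makes them suitable as hash-style lookup tables.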


Data Description
cliches Common Cliches
common_names First Names (U.S.)
constraining_loughran_mcdonald Loughran-McDonald Constraining Words
emojis_sentiment Emoji Sentiment Data
freq_first_names Frequent U.S. First Names
freq_last_names Frequent U.S. Last Names
function_words Function Words
grady_augmented Augmented List of Grady Ward’s English Words and Mark Kantrowitz’s Names List
hash_emojis Emoji Description Lookup Table
hash_emojis_identifier Emoji Identifier Lookup Table
hash_emoticons Emoticons
hash_grady_pos Grady Ward’s Moby Parts of Speech
hash_internet_slang List of Internet Slang and Corresponding Meanings
hash_lemmas Lemmatization List
hash_nrc_emotions NRC Emotion Table
hash_sentiment_emojis Emoji Sentiment Polarity Lookup Table
hash_sentiment_huliu Hu Liu Polarity Lookup Table
hash_sentiment_jockers Jockers Sentiment Polarity Table
hash_sentiment_jockers_rinker Combined Jockers & Rinker Polarity Lookup Table
hash_sentiment_loughran_mcdonald Loughran-McDonald Polarity Table
hash_sentiment_nrc NRC Sentiment Polarity Table
hash_sentiment_senticnet Augmented SenticNet Polarity Table
hash_sentiment_sentiword Augmented Sentiword Polarity Table
hash_sentiment_slangsd SlangSD Sentiment Polarity Table
hash_sentiment_socal_google SO-CAL Google Polarity Table
hash_valence_shifters Valence Shifters
key_contractions Contraction Conversions
key_corporate_social_responsibility Nadra Pencle and Irina Malaescu’s Corporate Social Responsibility Dictionary
key_grade Grades Data Set
key_rating Ratings Data Set
key_regressive_imagery Colin Martindale’s English Regressive Imagery Dictionary
key_sentiment_jockers Jockers Sentiment Data Set
modal_loughran_mcdonald Loughran-McDonald Modal List
nrc_emotions NRC Emotions
pos_action_verb Action Word List
pos_df_irregular_nouns Irregular Nouns Word Dataframe
pos_df_pronouns Pronouns
pos_interjections Interjections
pos_preposition Preposition Words
profanity_alvarez Alejandro U. Alvarez’s List of Profane Words
profanity_arr_bad Stack Overflow user2592414’s List of Profane Words
profanity_banned bannedwordlist.com’s List of Profane Words
profanity_racist Titus Wormer’s List of Racist Words
profanity_zac_anger Zac Anger’s List of Profane Words
sw_dolch Leveled Dolch List of 220 Common Words
sw_fry_100 Fry’s 100 Most Commonly Used English Words
sw_fry_1000 Fry’s 1000 Most Commonly Used English Words
sw_fry_200 Fry’s 200 Most Commonly Used English Words
sw_fry_25 Fry’s 25 Most Commonly Used English Words
sw_jockers Matthew Jockers’s Expanded Topic Modeling Stopword List
sw_loughran_mcdonald_long Loughran-McDonald Long Stopword List
sw_loughran_mcdonald_short Loughran-McDonald Short Stopword List
sw_lucene Lucene Stopword List
sw_mallet MALLET Stopword List
sw_python Python Stopword List
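The vector-type lists (sw_, profanity_, pos_) work directly with base R set operations. A minimal stopword-removal sketch, assuming lexicon is installed and the text has already been split into tokens; `remove_stopwords()` and the `tokens` vector are hypothetical names made up for this example:

```r
# Sketch: drop stopwords from a pre-tokenized text using a lexicon list.
library(lexicon)

# Hypothetical helper: case-insensitive filter that keeps token order
# and duplicates (unlike setdiff(), which returns unique values).
remove_stopwords <- function(tokens, stopwords) {
  tokens[!tolower(tokens) %in% tolower(stopwords)]
}

tokens <- c("The", "cat", "sat", "on", "a", "mat", "in", "the", "sun")

remove_stopwords(tokens, sw_fry_25)   # short list: removes only very common words
remove_stopwords(tokens, sw_jockers)  # longer list: typically removes more
```

The same pattern applies to the profanity_ vectors, for example `tokens[tolower(tokens) %in% profanity_alvarez]` to flag rather than drop matches.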


To install the development version of lexicon, download the zip ball or tar ball, decompress it, and run R CMD INSTALL on it, or use the pacman package:

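if (!require("pacman")) install.packages("pacman")  # install pacman if it is missing
pacman::p_load_gh("trinker/lexicon")  # install (if needed) and load lexicon from GitHub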
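Once lexicon is installed, one way to see which data sets it ships, and to filter them by the name prefixes, is base R's `data()` function. This is a sketch that assumes each data set is listed under its own name; `items` is just an illustrative variable name.

```r
# Sketch: list the data sets shipped with lexicon, then filter by prefix.
# Assumes lexicon is already installed.
items <- data(package = "lexicon")$results[, "Item"]
length(items)  # number of data sets in the installed version

# All stopword vectors (sw_ prefix)
grep("^sw_", items, value = TRUE)

# All keyed hash tables (hash_ prefix)
grep("^hash_", items, value = TRUE)
```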


You are welcome to:
- submit suggestions and bug reports at: https://github.com/trinker/lexicon/issues
- send a pull request on: https://github.com/trinker/lexicon/
- compose a friendly e-mail to: tyler.rinker@gmail.com