A library for topic modeling based on the algorithm: Generative Text Compression with Agglomerative Clustering Summarization (GTCACS)

andrealenzi11/gen-text-compr-aggl-clust-sum

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

A library for topic modeling based on the algorithm: Generative Text Compression with Agglomerative Clustering Summarization (GTCACS)

Installation

Use the package manager pip to install gtcacs:

pip3 install gtcacs

Tested Python version:

python3.8

Tested dependencies:

numpy==1.19.5
scikit-learn==0.24.1
scipy==1.6.1
tensorflow==2.4.1
tqdm==4.58.0
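The tested versions above are exact pins. As a minimal sketch (not part of gtcacs; `parse_pins` and `compare_with_installed` are hypothetical helper names), the snippet below uses the standard library's `importlib.metadata`, available from Python 3.8, to report where the current environment differs from those pins.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Tuple

# Pins copied from the "Tested dependencies" list above.
TESTED_PINS = [
    "numpy==1.19.5",
    "scikit-learn==0.24.1",
    "scipy==1.6.1",
    "tensorflow==2.4.1",
    "tqdm==4.58.0",
]


def parse_pins(pins: List[str]) -> Dict[str, str]:
    """Turn 'name==version' strings into a {name: version} dict."""
    parsed = {}
    for pin in pins:
        name, sep, ver = pin.partition("==")
        if not sep:
            raise ValueError(f"not an exact pin: {pin!r}")
        parsed[name.strip()] = ver.strip()
    return parsed


def compare_with_installed(pins: List[str]) -> List[Tuple[str, str, str]]:
    """Return (package, tested_version, installed_version) for every mismatch.

    A package that is not installed is reported as "missing".
    """
    mismatches = []
    for name, tested in parse_pins(pins).items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = "missing"
        if installed != tested:
            mismatches.append((name, tested, installed))
    return mismatches


if __name__ == "__main__":
    for name, tested, installed in compare_with_installed(TESTED_PINS):
        print(f"{name}: tested {tested}, installed {installed}")
```

An empty report means the environment matches the tested setup; a mismatch does not necessarily mean gtcacs will fail, only that the combination was not listed as tested.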

Usage

```python
from sklearn.datasets import fetch_20newsgroups
from gtcacs.topic_modeling import GTCACS

# load dataset (download_if_missing=True fetches the data on first use)
corpus, labels = fetch_20newsgroups(subset='all', return_X_y=True, download_if_missing=True)

# set stop words
eng_stopwords = {
    'i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're",
    "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he',
    'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's",
    'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which',
    'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are',
    'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do',
    'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because',
    'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against',
    'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to',
    'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again',
    'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all',
    'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no',
    'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't',
    'can', 'will', 'just', 'don', "don't", 'should', "should've", 'now', 'd', 'll',
    'm', 'o', 're', 've', 'y', 'ain', 'aren', "aren't", 'couldn', "couldn't",
    'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't",
    'haven', "haven't", 'isn', "isn't", 'ma', 'mightn', "mightn't", 'mustn',
    "mustn't", 'needn', "needn't", 'shan', "shan't", 'shouldn', "shouldn't",
    'wasn', "wasn't", 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't",
}

# instantiate the GTCACS object
gtcacs_obj = GTCACS(
    num_topics=20,                 # number of topics
    max_num_words=50,              # maximum number of terms to consider
    max_df=0.95,                   # maximum document frequency
    min_df=15,                     # minimum document frequency
    stopwords=eng_stopwords,       # stop-word set
    ngram_range=(1, 2),            # range for n-grams
    max_features=None,             # maximum number of terms to consider (max vocabulary size)
    lowercase=True,                # flag to convert text to lowercase
    num_epoches=5,                 # number of epochs
    batch_size=128,                # number of documents in a batch
    gen_learning_rate=0.005,       # learning rate for optimizing the generative part
    discr_learning_rate=0.005,     # learning rate for optimizing the discriminative part
    random_seed_size=100,          # dimension of the generator input layer
    generator_hidden_dim=512,      # dimension of the generator hidden layer
    document_dim=None,             # dimension of the generator output layer and of the discriminator's input/output layer
    latent_space_dim=64,           # dimension of the discriminator latent space
    discriminator_hidden_dim=256,  # dimension of the discriminator hidden layer
)

# computation on the corpus (dimensionality reduction, clustering, summarization)
gtcacs_obj.extract_topics(corpus=corpus)

# get the extracted clusters of words
topics = gtcacs_obj.get_topics_words()
for i, topic in enumerate(topics):
    print(">>> TOPIC", i + 1, topic)

# get the topic distribution scores for each document
corpus_transf = gtcacs_obj.get_topics_distribution_scores()
print(corpus_transf)
```
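The example prints `corpus_transf` but does not say how to read it. As a minimal sketch, assuming `get_topics_distribution_scores()` returns an array-like of shape (number of documents, `num_topics`) where a higher score means a stronger association (both assumptions, not confirmed by this README), the hypothetical helpers below use NumPy to pick each document's highest-scoring topic and count how many documents fall under each topic.

```python
import numpy as np


def dominant_topics(scores):
    """Return the index of the highest-scoring topic for each document.

    ASSUMPTION: `scores` is array-like with shape (n_documents, n_topics).
    """
    scores = np.asarray(scores, dtype=float)
    if scores.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {scores.shape}")
    return scores.argmax(axis=1)


def topic_sizes(scores):
    """Count how many documents have each topic as their dominant one."""
    scores = np.asarray(scores, dtype=float)
    return np.bincount(dominant_topics(scores), minlength=scores.shape[1])


# Toy scores for 4 documents and 3 topics (made-up numbers, not GTCACS output).
toy_scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
print(dominant_topics(toy_scores))  # [0 1 2 0]
print(topic_sizes(toy_scores))      # [2 1 1]
```

With the objects from the usage example this would be `dominant_topics(corpus_transf)`. Matching column `i` to `topics[i]` (printed as `TOPIC i+1`) further assumes that both follow the same topic order.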

License

MIT

About

A library for topic modeling based on the algorithm: Generative Text Compression with Agglomerative Clustering Summarization (GTCACS)

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages


[8]ページ先頭

©2009-2025 Movatter.jp