andrealenzi11/gen-text-compr-aggl-clust-sum


A library for topic modeling based on the algorithm: Generative Text Compression with Agglomerative Clustering Summarization (GTCACS)


gen-text-compr-aggl-clust-sum


Installation

Use the package manager pip to install gtcacs.

pip3 install gtcacs

Tested Python version:

python3.8

Tested dependencies:

numpy==1.19.5
scikit-learn==0.24.1
scipy==1.6.1
tensorflow==2.4.1
tqdm==4.58.0
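To see whether a local environment matches the tested versions listed above, a small check along these lines may help. It is only a sketch: `TESTED_VERSIONS` and `find_version_mismatches` are hypothetical names introduced here, not part of gtcacs, and it relies solely on the standard-library `importlib.metadata` module (Python 3.8+).

```python
from importlib import metadata

# Versions listed under "Tested dependencies" in this README.
TESTED_VERSIONS = {
    "numpy": "1.19.5",
    "scikit-learn": "0.24.1",
    "scipy": "1.6.1",
    "tensorflow": "2.4.1",
    "tqdm": "4.58.0",
}


def find_version_mismatches(tested, get_version=metadata.version):
    """Return {package: (tested, installed)} for packages that differ.

    `installed` is None when the package is not installed at all.
    `get_version` is injectable so the function can be tried out
    without touching the real environment (hypothetical helper).
    """
    mismatches = {}
    for name, tested_version in tested.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != tested_version:
            mismatches[name] = (tested_version, installed)
    return mismatches


if __name__ == "__main__":
    found = find_version_mismatches(TESTED_VERSIONS)
    for name, (tested, installed) in found.items():
        print(f"{name}: tested with {tested}, found {installed or 'not installed'}")
```

A mismatch does not necessarily mean the library will fail; it only means the combination differs from the one reported as tested.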

Usage

from sklearn.datasets import fetch_20newsgroups
from gtcacs.topic_modeling import GTCACS

# load dataset
corpus, labels = fetch_20newsgroups(subset='all', return_X_y=True, download_if_missing=False)

# set stop words
eng_stopwords = {
    'i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll",
    "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's",
    'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs',
    'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am',
    'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does',
    'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while',
    'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during',
    'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over',
    'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all',
    'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only',
    'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't",
    'should', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', "aren't",
    'couldn', "couldn't", 'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't",
    'haven', "haven't", 'isn', "isn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn',
    "needn't", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", 'won',
    "won't", 'wouldn', "wouldn't"
}

# instantiate the GTCACS object
gtcacs_obj = GTCACS(
    num_topics=20,                 # number of topics
    max_num_words=50,              # maximum number of terms to consider
    max_df=0.95,                   # maximum document frequency
    min_df=15,                     # minimum document frequency
    stopwords=eng_stopwords,       # stopwords set
    ngram_range=(1, 2),            # n-gram range
    max_features=None,             # maximum number of terms to consider (max vocabulary size)
    lowercase=True,                # flag to convert text to lowercase
    num_epoches=5,                 # number of epochs
    batch_size=128,                # number of documents in a batch
    gen_learning_rate=0.005,       # learning rate for optimizing the generative part
    discr_learning_rate=0.005,     # learning rate for optimizing the discriminative part
    random_seed_size=100,          # dimension of generator input layer
    generator_hidden_dim=512,      # dimension of generator hidden layer
    document_dim=None,             # dimension of generator output layer and discriminator's input/output layer
    latent_space_dim=64,           # dimension of discriminator latent space
    discriminator_hidden_dim=256   # dimension of discriminator hidden layer
)

# computation on corpus (dimensionality reduction, clustering, summarization)
gtcacs_obj.extract_topics(corpus=corpus)

# get the extracted clusters of words
topics = gtcacs_obj.get_topics_words()
for i, topic in enumerate(topics):
    print(">>> TOPIC", i + 1, topic)

# get the topics distribution scores for each document
corpus_transf = gtcacs_obj.get_topics_distribution_scores()
print(corpus_transf)
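The `max_df` and `min_df` arguments in the usage example carry the same names as scikit-learn's vectorizer options, where a float is read as a proportion of documents and an integer as an absolute document count. Whether GTCACS forwards them to scikit-learn unchanged is an assumption, not something this README states. Under that assumption, the effect of the two thresholds can be sketched in plain Python (`filter_vocabulary` is a hypothetical helper, not part of gtcacs):

```python
def filter_vocabulary(tokenized_docs, min_df=1, max_df=1.0):
    """Keep terms whose document frequency lies within [min_df, max_df].

    An int threshold is an absolute number of documents; a float is a
    proportion of the corpus (scikit-learn's convention, assumed here).
    """
    n_docs = len(tokenized_docs)

    # Document frequency: in how many documents each term occurs.
    doc_freq = {}
    for tokens in tokenized_docs:
        for term in set(tokens):
            doc_freq[term] = doc_freq.get(term, 0) + 1

    # Convert proportional thresholds to document counts.
    low = min_df if isinstance(min_df, int) else min_df * n_docs
    high = max_df if isinstance(max_df, int) else max_df * n_docs
    return sorted(t for t, df in doc_freq.items() if low <= df <= high)


docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
    ["the", "bird", "flew"],
]

# "the" occurs in all 4 documents (above 95%), and "dog", "ran",
# "bird", "flew" occur in only 1 (below min_df=2), so they are dropped.
print(filter_vocabulary(docs, min_df=2, max_df=0.95))  # ['cat', 'sat']
```

With the values from the usage example (`min_df=15`, `max_df=0.95`), this reading would drop terms that appear in fewer than 15 documents or in more than 95% of the corpus.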

License

MIT


Releases

Latest release: 0.0.6 (Mar 4, 2021), plus 3 earlier releases.
