cltk/enm_models_cltk


Models for Middle English provided by CLTK



CLTK models for Middle English

How the word2vec model was trained

We first need to install a few scientific computing and machine learning packages.

```shell
pip install scipy scikit-learn gensim
```
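The training step later in this README passes `vector_size`, a parameter name introduced in gensim 4.0 (gensim 3.x called it `size`). A minimal stdlib-only check of the installed gensim version, using `importlib.metadata`, might look like this; the helper name `gensim_major_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def gensim_major_version():
    """Return the installed gensim major version as an int, or None if gensim is missing."""
    try:
        return int(version("gensim").split(".")[0])
    except PackageNotFoundError:
        return None


major = gensim_major_version()
if major is None:
    print("gensim is not installed")
elif major < 4:
    print(f"gensim {major}.x found: use size= instead of vector_size=")
else:
    print(f"gensim {major}.x found: vector_size= is supported")
```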

We load a pickled corpus of Middle English texts:

```python
import pickle

with open('./middle_english_txt-.p', 'rb') as p:
    corpus = pickle.load(p)
```
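To try the pipeline without the original corpus file, a toy corpus can be pickled and loaded the same way. This sketch assumes the pickle holds a list of texts, each a list of word tokens; the file name `toy_middle_english.p` and the sample lines are illustrative only. Note that `pickle.load` can execute arbitrary code, so only load corpus files from a trusted source.

```python
import os
import pickle
import tempfile

# Hypothetical toy corpus in the assumed format: a list of texts,
# where each text is a list of word tokens.
toy_corpus = [
    ["Whan", "that", "Aprille", "with", "his", "shoures", "soote"],
    ["The", "droghte", "of", "March", "hath", "perced", "to", "the", "roote"],
]

# Write the toy corpus to a temporary file (illustrative name).
path = os.path.join(tempfile.mkdtemp(), "toy_middle_english.p")
with open(path, "wb") as f:
    pickle.dump(toy_corpus, f)

# Load it back with the same pattern used for the real corpus.
with open(path, "rb") as p:
    corpus = pickle.load(p)

print(len(corpus), "texts,", sum(len(text) for text in corpus), "tokens")
```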

Normalization lowercases each word and removes punctuation, special characters and numbers.

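```python
# The full set of characters to strip is abbreviated as "..." here.
CHARACTERS_TO_REMOVE = "..."


def remove_punc(text):
    """Lowercase a word and delete every character listed in CHARACTERS_TO_REMOVE."""
    text = text.lower()
    for ele in CHARACTERS_TO_REMOVE:
        text = text.replace(ele, "")
    return text


# Normalize every word of every text, keeping the list-of-lists structure.
new_data = [[remove_punc(word) for word in texts] for texts in corpus]
```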
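The normalization described above can also be done in a single pass per word with `str.maketrans` and `str.translate`. This is a minimal sketch under one assumption: since the real character set is abbreviated, it strips ASCII punctuation and digits (`string.punctuation + string.digits`). Unlike the loop above, it also drops tokens that become empty, such as a bare comma or a number; the helper names are hypothetical.

```python
import string

# Assumed character set: ASCII punctuation plus digits.
CHARS_TO_STRIP = string.punctuation + string.digits
_STRIP_TABLE = str.maketrans("", "", CHARS_TO_STRIP)


def normalize_token(token: str) -> str:
    """Lowercase a token and delete every character in CHARS_TO_STRIP in one pass."""
    return token.lower().translate(_STRIP_TABLE)


def normalize_corpus(corpus):
    """Normalize every token and drop tokens that end up empty."""
    return [
        [norm for norm in (normalize_token(word) for word in text) if norm]
        for text in corpus
    ]


print(normalize_corpus([["Whan", "that,", "Aprille", "1387", ";"]]))
# [['whan', 'that', 'aprille']]
```

Middle English letters such as þ and ȝ are not in `string.punctuation`, so they survive normalization (`"Þe,"` becomes `"þe"`).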

The result is a list of lists of strings, that is, a list of sentences where each sentence is a list of words. This is the shape of input that gensim's `Word2Vec` expects.
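A quick sanity check on that shape can catch a common mistake: passing a flat list of strings, in which case each string is iterated character by character instead of word by word. The helper `is_sentence_list` is a hypothetical sketch and is stricter than gensim itself, which accepts any iterable of token lists.

```python
def is_sentence_list(data) -> bool:
    """Return True if data is a list of lists of strings, i.e. tokenized sentences."""
    return isinstance(data, list) and all(
        isinstance(sentence, list)
        and all(isinstance(token, str) for token in sentence)
        for sentence in data
    )


print(is_sentence_list([["þe", "knyght"], ["rood"]]))  # True
print(is_sentence_list(["þe knyght", "rood"]))         # False: flat list of strings
```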

We then train the embeddings with the `Word2Vec` class from gensim:

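```python
# Train word2vec embeddings with gensim (see NLP in Action, 6.2.4).
# vector_size is the gensim 4.x parameter name (gensim 3.x used size).
from gensim.models.word2vec import Word2Vec

num_features = 50     # dimensionality of each word vector
min_word_count = 30   # ignore words that occur fewer times than this
num_workers = 2       # number of worker threads
window_size = 20      # maximum distance between a target word and a context word
subsampling = 1e-3    # threshold for downsampling very frequent words

model = Word2Vec(
    new_data,
    workers=num_workers,
    vector_size=num_features,
    min_count=min_word_count,
    window=window_size,
    sample=subsampling,
)

# L2-normalize the vectors in place; replace=True discards the raw vectors.
# init_sims is deprecated in gensim 4.x.
model.init_sims(replace=True)

me_w2v_model = "me_word_embeddings_model.bin"
model.save(me_w2v_model)
```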
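The `min_count=30` and `sample=1e-3` settings above both shrink what the model actually sees: `min_count` removes rare words from the vocabulary, and `sample` randomly skips occurrences of very frequent words during training. The following is a simplified sketch of that downsampling rule, following the formula from the original word2vec C code, which gensim also uses, for `0 < sample < 1`. It only computes per-word keep probabilities and does not perform the random draws; `keep_probabilities` is a hypothetical helper and the toy counts are made up.

```python
from collections import Counter
from math import sqrt


def keep_probabilities(sentences, min_count, sample):
    """Return {word: keep probability} for the words that survive min_count."""
    if not 0 < sample < 1:
        raise ValueError("this sketch only handles 0 < sample < 1")
    counts = Counter(word for sentence in sentences for word in sentence)
    # min_count: words rarer than this are dropped from the vocabulary entirely.
    retained = {w: c for w, c in counts.items() if c >= min_count}
    # The threshold scales with the total count of retained words.
    threshold = sample * sum(retained.values())
    probs = {}
    for word, count in retained.items():
        p = (sqrt(count / threshold) + 1) * (threshold / count)
        # Probabilities above 1 are capped: such words are never downsampled.
        probs[word] = min(p, 1.0)
    return probs


# Toy corpus: one very frequent word, one moderately frequent, one rare.
toy = [["and"] * 900, ["knyght"] * 50, ["rare"] * 5]
probs = keep_probabilities(toy, min_count=30, sample=1e-3)
print({word: round(p, 4) for word, p in probs.items()})
```

In this toy run, "rare" falls below `min_count` and disappears, while "and" is kept far less often than "knyght".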

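The model is now saved in the file `me_word_embeddings_model.bin`. Despite the `.bin` extension, `model.save()` writes gensim's native pickle-based format, not the binary format of the original word2vec C tool, so the file should be reopened with `Word2Vec.load()` rather than `KeyedVectors.load_word2vec_format()`.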
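Since the training code calls `init_sims(replace=True)` before saving, the stored vectors should already have unit length, so cosine similarity between two words reduces to a plain dot product. With gensim installed, `Word2Vec.load("me_word_embeddings_model.bin").wv.similarity(w1, w2)` returns that similarity. The pure-Python sketch below shows the equivalence on toy 3-dimensional vectors (the real model uses 50); `l2_normalize` and `cosine` are illustrative helpers, not gensim functions, and neither handles zero vectors.

```python
from math import sqrt


def l2_normalize(vec):
    """Scale a vector to unit length, as init_sims(replace=True) does per word vector."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine(u, v):
    """Full cosine similarity: dot product divided by the product of both norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


# Toy vectors with norms 5 and 3.
u, v = [3.0, 0.0, 4.0], [1.0, 2.0, 2.0]
nu, nv = l2_normalize(u), l2_normalize(v)
dot_of_unit = sum(a * b for a, b in zip(nu, nv))
print(round(cosine(u, v), 4), round(dot_of_unit, 4))  # 0.7333 0.7333
```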
