Models for Middle English provided by CLTK


cltk/enm_models_cltk


How the word2vec model was trained

We first need to install some mathematics and machine learning packages.

```shell
pip install scipy scikit-learn gensim
```

We load a pickled corpus of Middle English texts:

```python
import pickle

with open('./middle_english_txt-.p', 'rb') as p:
    corpus = pickle.load(p)
```
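For readers who do not have the pickle file, here is a minimal sketch of the layout the loader appears to expect, assuming the corpus is a list of sentences where each sentence is a list of word strings. The toy sentences and the temporary file name are illustrative only and are not part of the original corpus.

```python
import os
import pickle
import tempfile

# Toy stand-in for the real corpus: a list of sentences, each a list of words.
# These sentences are illustrative only.
toy_corpus = [
    ["Whan", "that", "Aprille", "with", "his", "shoures", "soote"],
    ["The", "droghte", "of", "March", "hath", "perced", "to", "the", "roote"],
]

# Write the toy corpus to a temporary pickle file, then read it back the same
# way the README loads './middle_english_txt-.p'.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_corpus.p")
    with open(path, "wb") as f:
        pickle.dump(toy_corpus, f)
    with open(path, "rb") as p:
        loaded = pickle.load(p)

print(len(loaded), "sentences; first word:", loaded[0][0])
```

Because `pickle.load` can execute arbitrary code while deserializing, only load pickle files from a source you trust.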

Normalization lowercases each word and removes punctuation, special characters, and numbers.

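```python
CHARACTERS_TO_REMOVE = "..."


def remove_punc(text):
    """Lowercase a word and strip every character in CHARACTERS_TO_REMOVE."""
    text = text.lower()
    for ele in CHARACTERS_TO_REMOVE:
        text = text.replace(ele, "")
    return text


# Normalize every word of every sentence in the corpus.
new_data = []
for texts in corpus:
    part_text = []
    for word in texts:
        part_text.append(remove_punc(word))
    new_data.append(part_text)
```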

The data is now a list of sentences, where each sentence is a list of words (that is, a list of lists of strings).
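As an illustration of the normalization step and the resulting data shape, the sketch below uses `str.translate` on a toy corpus. The character set (`string.punctuation` plus `string.digits`) is an assumption, since the actual `CHARACTERS_TO_REMOVE` value is elided in this README, and the function names are hypothetical. Unlike the README's loop, this sketch also drops words that become empty after cleaning.

```python
import string

# Assumed character set: ASCII punctuation and digits. The README elides the
# real CHARACTERS_TO_REMOVE value, so this is only a stand-in.
ASSUMED_CHARACTERS_TO_REMOVE = string.punctuation + string.digits
_REMOVAL_TABLE = str.maketrans("", "", ASSUMED_CHARACTERS_TO_REMOVE)


def normalize_word(word):
    """Lowercase a word and delete every character in the assumed set."""
    return word.lower().translate(_REMOVAL_TABLE)


def normalize_corpus(sentences):
    """Normalize each word, dropping words that become empty."""
    return [
        [w for w in (normalize_word(word) for word in sentence) if w]
        for sentence in sentences
    ]


# Illustrative input only: a list of sentences, each a list of words.
toy_corpus = [["Whan", "that", "Aprille,", "1387"], ["soote;", "roote."]]
normalized = normalize_corpus(toy_corpus)
print(normalized)  # [['whan', 'that', 'aprille'], ['soote', 'roote']]
```

Building the translation table once and reusing it avoids rescanning the character set for every word, which matters on a large corpus.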

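Then we use the `Word2Vec` class from `gensim`:

```python
# time to try gensim to create word2vecs
# see NLP in action, 6.2.4
from gensim.models.word2vec import Word2Vec

num_features = 50      # dimensionality of the word vectors
min_word_count = 30    # ignore words that occur fewer than 30 times
num_workers = 2        # number of training threads
window_size = 20       # context window size
subsampling = 1e-3     # downsampling threshold for frequent words

model = Word2Vec(
    new_data,
    workers=num_workers,
    vector_size=num_features,
    min_count=min_word_count,
    window=window_size,
    sample=subsampling,
)

# L2-normalize the vectors in place (init_sims is deprecated in gensim 4.x).
model.init_sims(replace=True)

me_w2v_model = "me_word_embeddings_model.bin"
model.save(me_w2v_model)
```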

The model is now saved in the file `me_word_embeddings_model.bin`.
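A saved model can be reloaded with `Word2Vec.load` and queried through its `wv` attribute. The sketch below assumes gensim 4.x and that the file sits in the current directory. The helper name `load_embeddings` and the query word are hypothetical, and the query word may well be missing from the vocabulary given `min_count=30`.

```python
import os

MODEL_PATH = "me_word_embeddings_model.bin"  # file name from the README


def load_embeddings(path):
    """Load a saved gensim Word2Vec model, failing early if the file is missing."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"No saved model at {path!r}")
    # Imported here so the missing-file check does not require gensim.
    from gensim.models.word2vec import Word2Vec
    return Word2Vec.load(path)


if os.path.exists(MODEL_PATH):
    model = load_embeddings(MODEL_PATH)
    query = "kyng"  # hypothetical query word; it may be absent from the vocabulary
    if query in model.wv.key_to_index:
        # Each result is a (word, cosine similarity) pair.
        for word, score in model.wv.most_similar(query, topn=5):
            print(word, round(score, 3))
    else:
        print(f"{query!r} is not in the model vocabulary")
else:
    print(f"{MODEL_PATH} not found; train and save the model first")
```

Checking `key_to_index` before calling `most_similar` avoids a `KeyError` for words that were filtered out by the minimum word count.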
