\$\begingroup\$

This is a Python script that generates n-grams using a set of rules, stored in a dictionary, for which letters can follow a given letter.

The output is first preprocessed by another script and then filtered further, using an API of sorts, by the number of words containing each n-gram. The result will be used for pseudoword generation.

This is the generation part:

from string import ascii_lowercase
import sys

LETTERS = set(ascii_lowercase)
VOWELS = set('aeiouy')
CONSONANTS = LETTERS - VOWELS

BASETAILS = {
    'a': CONSONANTS,
    'b': 'bjlr',
    'c': 'chjklr',
    'd': 'dgjw',
    'e': CONSONANTS,
    'f': 'fjlr',
    'g': 'ghjlrw',
    'h': '',
    'i': CONSONANTS,
    'j': '',
    'k': 'hklrvw',
    'l': 'l',
    'm': 'cm',
    'n': 'gn',
    'o': CONSONANTS,
    'p': 'fhlprst',
    'q': '',
    'r': 'hrw',
    's': 'chjklmnpqstw',
    't': 'hjrstw',
    'u': CONSONANTS,
    'v': 'lv',
    'w': 'hr',
    'x': 'h',
    'y': 'sv',
    'z': 'hlvw'
}

tails = dict()

for i in ascii_lowercase:
    v = BASETAILS[i]
    if type(v) == set:
        v = ''.join(sorted(v))
    tails.update({i: ''.join(sorted('aeiou' + v))})

def makechain(invar, target, depth=0):
    depth += 1
    if type(invar) == str:
        invar = set(invar)
    chain = invar.copy()
    if depth == target:
        return sorted(chain)
    else:
        for i in invar:
            for j in tails[i[-1]]:
                chain.add(i + j)
        return makechain(chain, target, depth)

if __name__ == '__main__':
    invar = sys.argv[1]
    target = int(sys.argv[2])
    if invar in globals():
        invar = eval(invar)
    print(*makechain(invar, target), sep='\n')

I want to ask about the makechain function. I used sets because the results can somehow contain duplicates if I use lists (though the result could be cast to a set afterwards). I used a nested for loop and a recursive function to simulate a variable number of for loops.

For example, makechain(LETTERS, 4) is equivalent to:

chain = set()

for a in LETTERS:
    chain.add(a)

for a in LETTERS:
    for b in tails[a]:
        chain.add(a + b)

for a in LETTERS:
    for b in tails[a]:
        for c in tails[b]:
            chain.add(a + b + c)

for a in LETTERS:
    for b in tails[a]:
        for c in tails[b]:
            for d in tails[c]:
                chain.add(a + b + c + d)

Obviously, makechain(LETTERS, 4) is much better than the nested for loop approach because it is much more flexible.

I want to know: is there any way I can use a function from itertools instead of the nested for loop to generate the same results more efficiently?

I am thinking about itertools.product and itertools.combinations, but I just can't figure out how to do it.

Any help will be appreciated.

asked Jun 21, 2021 at 6:58
Ξένη Γήινος
\$\endgroup\$

2 Answers

\$\begingroup\$

Type checking

Don't do if type(var) == cls; instead do:

if isinstance(var, cls):
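A small illustration of the difference, as a sketch: isinstance also accepts subclasses, which an exact type(...) == comparison rejects. The TaggedSet class and the normalize helper below are hypothetical and exist only for this demonstration.

```python
class TaggedSet(set):
    """Hypothetical set subclass, used only for this demonstration."""


def normalize(value):
    """Return the members of a set (or set subclass) as a sorted string;
    return any other value unchanged."""
    if isinstance(value, set):
        return ''.join(sorted(value))
    return value


print(type(TaggedSet('cab')) == set)      # False: exact type comparison
print(isinstance(TaggedSet('cab'), set))  # True: subclass-aware
print(normalize(TaggedSet('cab')))        # abc
```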

tails

This can be built with a dictionary comprehension rather than iterating over ascii_lowercase and looking up each letter in BASETAILS every time:

tails = {
    # ''.join(v) accepts both the set and the str values in BASETAILS
    i: ''.join(sorted('aeiou' + ''.join(v)))
    for i, v in BASETAILS.items()
}

makechain vs loops

Your n-depth loops are really just a Cartesian product of LETTERS and tails, which can be accomplished, as you suspected, with itertools.product:

from itertools import product

def build_chain(letters, tails, depth, chain=None):
    chain = chain if chain is not None else set()

    if not depth - 1:
        return chain.union(letters)

    # One pool of candidate letters per position in the n-gram
    num_tails = [letters]

    for _ in range(depth - 1):
        chain.update(
            ''.join(c) for c in product(*num_tails)
        )

        # Pool for the next position: the tails of every letter in the last pool
        last = num_tails[-1]
        num_tails.append(''.join(map(tails.get, last)))

    return chain

This assumes that every entry in tails.values() is a string and that letters is also a string.
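One point worth checking against the question's nested loops: itertools.product combines its pools independently, so on its own it cannot express a rule where the allowed next letter depends on the previous one. A minimal sketch that keeps that dependency, building each length from the previous one and merging the levels with itertools.chain, might look like the following. The name build_chain_dependent and the tiny demo_tails rule set are assumptions made for illustration, not part of the original code.

```python
from itertools import chain


def build_chain_dependent(letters, tails, depth):
    """Return the sorted n-grams of length 1..depth in which every letter
    after the first is drawn from tails[previous letter]."""
    level = list(letters)   # all n-grams of the current length
    levels = [level]
    for _ in range(depth - 1):
        # Extend each n-gram only with the tails of its own last letter.
        level = [gram + nxt for gram in level for nxt in tails[gram[-1]]]
        levels.append(level)
    return sorted(chain.from_iterable(levels))


# Tiny hypothetical rule set, just for the demonstration.
demo_tails = {'a': 'b', 'b': 'ab'}
print(build_chain_dependent('ab', demo_tails, 3))
# ['a', 'ab', 'aba', 'abb', 'b', 'ba', 'bab', 'bb', 'bba', 'bbb']
```

Because each n-gram is extended from a distinct shorter n-gram, no duplicates arise as long as letters and each tails value contain no repeated characters, so a list suffices here.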

answered Jul 25, 2023 at 1:50
C.Nivs
\$\endgroup\$
\$\begingroup\$

Any help will be appreciated

A few suggestions based on what I noticed:

  • In the function makechain, the else after the return is not necessary.

  • Typo in VOWELS = set('aeiouy'): there is an extra y.

  • This part:

    LETTERS = set(ascii_lowercase)
    VOWELS = set('aeiou')
    CONSONANTS = LETTERS - VOWELS

    BASETAILS = {
        'a': CONSONANTS,
        'b': 'bjlr',
        'c': 'chjklr',
        'd': 'dgjw',
        ....
    }

    tails = dict()

    for i in ascii_lowercase:
        v = BASETAILS[i]
        if type(v) == set:
            v = ''.join(sorted(v))
        tails.update({i: ''.join(sorted('aeiou' + v))})

    seems to do the following:

    1. Create a dictionary with values of mixed types (strings and sets)
    2. Convert all values to strings
    3. Sort the dictionary's values

    It could be simplified to:

    1. Create a dictionary where all values are strings
    2. Sort the dictionary's values

    Additionally, having VOWELS as a set and CONSONANTS as a string is a bit confusing. It would be better to use only one type.

    Code with suggestions above:

    LETTERS = ascii_lowercase
    VOWELS = 'aeiou'
    CONSONANTS = ''.join(set(LETTERS) - set(VOWELS))

    BASETAILS = {
        'a': CONSONANTS,
        'b': 'bjlr',
        'c': 'chjklr',
        'd': 'dgjw',
        ....
    }

    tails = dict()

    for i in ascii_lowercase:
        v = BASETAILS[i]
        tails.update({i: ''.join(sorted(VOWELS + v))})

    In this way, you also avoid sorting twice.
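As a sketch of the first suggestion, makechain without the redundant else could look like the following. Passing tails as a parameter is an assumption made here to keep the example self-contained (the original reads a module-level tails), the type check uses isinstance, and demo_tails is a tiny hypothetical rule set.

```python
def makechain(invar, target, tails, depth=0):
    """Return the sorted n-grams of length 1..target built from invar."""
    depth += 1
    if isinstance(invar, str):
        invar = set(invar)
    chain = invar.copy()
    if depth == target:
        return sorted(chain)
    # No else needed: the branch above always returns.
    for i in invar:
        for j in tails[i[-1]]:
            chain.add(i + j)
    return makechain(chain, target, tails, depth)


# Tiny hypothetical rule set, just for the demonstration.
demo_tails = {'a': 'b', 'b': 'ab'}
print(makechain('ab', 2, demo_tails))
# ['a', 'ab', 'b', 'ba', 'bb']
```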

answered Jun 21, 2021 at 11:05
Marc
\$\endgroup\$
  • \$\begingroup\$Correction: the y in 'aeiouy' is not a typo. I know y isn't a full vowel, but it is a half-vowel that acts as a vowel in many English words, such as many, cyber, psycho, and dry, so it can be used as a vowel for pseudoword generation.\$\endgroup\$ Commented Aug 2, 2023 at 15:40
