textgain/grasp


Essential NLP & ML, short & fast pure Python code

License

BSD-3-Clause license


Grasp.py – Explainable AI

Grasp is a lightweight AI toolkit for Python, with tools for data mining, natural language processing (NLP), machine learning (ML) and network analysis. It has 300+ fast and essential algorithms, with ~25 lines of code per function, self-explanatory function names, no dependencies, bundled into one well-documented file: grasp.py (250KB). Or install with pip, including language models (50MB):

$ pip install git+https://github.com/textgain/grasp

Tools for Data Mining

Download stuff with download(url) (or dl), with built-in caching and logging:

src = dl('https://www.textgain.com', cached=True)

Parse HTML with dom(html) into an Element tree and search it with CSS Selectors:

for e in dom(src)('a[href^="http"]'):  # external links
    print(e.href)

Strip HTML with plain(Element) to get a plain text string:

for word, count in wc(plain(dom(src))).items():
    print(word, count)
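To see what stripping and counting involve, here is a minimal standard-library sketch. It is not Grasp's implementation: TextExtractor, strip_html and word_count are hypothetical names, and the lowercase `\w+` tokenization is an assumption.

```python
import re
from collections import Counter
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def strip_html(html):
    """Return the visible text of an HTML string, whitespace-collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return ' '.join(' '.join(parser.parts).split())

def word_count(text):
    """Return a {word: count} Counter of lowercased words."""
    return Counter(re.findall(r'\w+', text.lower()))

html = ('<html><body><h1>Cats</h1><p>Cats love <b>catnip</b>.</p>'
        '<script>var x = 1;</script></body></html>')
counts = word_count(strip_html(html))
print(counts.most_common(1))  # [('cats', 2)]
```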

Find articles with wikipedia(str), in HTML:

for e in dom(wikipedia('cat', language='en'))('p'):
    print(plain(e))

Find opinions with bluesky(str):

for post in first(10, bluesky('cats')):  # latest 10
    print(post.id, post.text, post.date)

Deploy APIs with App. Works with WSGI and Nginx:

app = App()

@app.route('/')
def index(*path, **query):
    return 'Hi! %s %s' % (path, query)

app.run('127.0.0.1', 8080, debug=True)

Once this app is up, go check http://127.0.0.1:8080/app?q=cat.
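For readers new to WSGI, the sketch below shows roughly what happens to such a request at the protocol level, using only the standard library. It assumes (not confirmed by Grasp's source here) that the URL path is passed as positional arguments and the query string as keyword arguments; application is a hypothetical name.

```python
from urllib.parse import parse_qsl
from wsgiref.util import setup_testing_defaults

def index(*path, **query):
    return 'Hi! %s %s' % (path, query)

def application(environ, start_response):
    """A bare WSGI callable that routes every request to index()."""
    # '/app' becomes ('app',) and 'q=cat' becomes {'q': 'cat'}.
    path = tuple(p for p in environ.get('PATH_INFO', '/').split('/') if p)
    query = dict(parse_qsl(environ.get('QUERY_STRING', '')))
    body = index(*path, **query).encode('utf-8')
    start_response('200 OK', [
        ('Content-Type', 'text/plain; charset=utf-8'),
        ('Content-Length', str(len(body))),
    ])
    return [body]

# Call the app directly with a fake request, no server needed.
environ = {}
setup_testing_defaults(environ)
environ['PATH_INFO'] = '/app'
environ['QUERY_STRING'] = 'q=cat'
status = []
response = b''.join(application(environ, lambda s, h: status.append(s)))
print(response.decode('utf-8'))  # Hi! ('app',) {'q': 'cat'}
```

Any WSGI server can host a callable like this, which is why such apps also sit comfortably behind Nginx.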

Tools for Natural Language Processing

Get language with lang(str) for 40+ languages and ~92.5% accuracy:

print(lang('The cat sat on the mat.'))  # {'en': 0.99}

Get locations with loc(str) for 25K+ EU cities:

print(loc('The cat lives in Catena.'))  # {('Catena', 'IT', 43.8, 11.0): 1}

Get words & sentences with tok(str) (tokenize) at ~125K words/sec:

print(tok("Mr. etc. aren't sentence breaks! ;) This is:.", language='en'))

Get word polarity with pov(str) (point-of-view). Is it a positive or negative opinion?

print(pov(tok('Nice!', language='en')))  # +0.6
print(pov(tok('Dumb.', language='en')))  # -0.4

For de, en, es, fr, nl, with ~75% accuracy.
You'll need the language models in grasp/lm.

Tag word types with tag(str) in 10+ languages using robust ML models from UD:

for word, pos in tag(tok('The cat sat on the mat.'), language='en'):
    print(word, pos)

Parts-of-speech include NOUN, VERB, ADJ, ADV, DET, PRON, PREP, ...
For ar, da, de, en, es, fr, it, nl, no, pl, pt, ru, sv, tr, with ~95% accuracy.
You'll need the language models in grasp/lm.

Tag keywords with trie, a compiled dict that scans ~250K words/sec:

t = trie({'cat*': 1, 'mat': 2})

for i, j, k, v in t.search('Cats love catnip.', etc='*'):
    print(i, j, k, v)
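The idea behind a keyword trie can be sketched with nested dicts, one level per character. This is an illustration only: build_trie and search are hypothetical names, matching is case-insensitive and word-by-word by assumption, and the (start, end, word, value) tuples are a guess at what i, j, k, v might hold, not Grasp's documented output.

```python
import re

END = object()  # marker key: "a keyword ends at this node"

def build_trie(keywords):
    """Nested dicts, one level per character; a trailing '*' marks a prefix."""
    root = {}
    for key, value in keywords.items():
        prefix = key.endswith('*')
        node = root
        for ch in key.rstrip('*').lower():
            node = node.setdefault(ch, {})
        node[END] = (value, prefix)
    return root

def search(root, text):
    """Yield (start, end, word, value) for each word that matches a keyword."""
    for m in re.finditer(r'\w+', text):
        word = m.group().lower()
        node = root
        for n, ch in enumerate(word, 1):
            node = node.get(ch)
            if node is None:
                break  # no keyword continues with this character
            if END in node:
                value, prefix = node[END]
                # Exact keywords must consume the whole word; prefixes need not.
                if prefix or n == len(word):
                    yield m.start(), m.end(), m.group(), value
                    break

root = build_trie({'cat*': 1, 'mat': 2})
for hit in search(root, 'Cats love catnip.'):
    print(hit)
# (0, 4, 'Cats', 1)
# (10, 16, 'catnip', 1)
```

Lookup cost depends on the length of each word rather than on the number of keywords, which is what makes tries attractive for large keyword lists.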

Get answers with gpt(). You'll need an OpenAI API key.

print(gpt("Why do cats sit on mats? (you're a psychologist)", key='...'))

Tools for Machine Learning

Machine Learning (ML) algorithms learn by example. If you show them 10K spam and 10K real emails (i.e., train a model), they can predict whether other emails are also spam or not.

Each training example is a {feature: weight} dict with a label. For text, the features could be words, the weights could be word count, and the label might be real or spam.

Quantify text with vec(str) (vectorize) into a {feature: weight} dict:

v1 = vec('I love cats! 😀', features=('c3', 'w1'))
v2 = vec('I hate cats! 😡', features=('c3', 'w1'))

c1, c2, c3 count consecutive characters. For c2, cats → 1x ca, 1x at, 1x ts.
w1, w2, w3 count consecutive words.
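The counting described above can be sketched in a few lines. char_ngrams and word_ngrams are hypothetical helpers, not Grasp functions, and Grasp may normalize text (case, punctuation, boundaries) differently.

```python
from collections import Counter

def char_ngrams(text, n):
    """Count runs of n consecutive characters."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def word_ngrams(text, n):
    """Count runs of n consecutive whitespace-separated words."""
    words = text.split()
    return Counter(' '.join(words[i:i + n])
                   for i in range(len(words) - n + 1))

print(dict(char_ngrams('cats', 2)))         # {'ca': 1, 'at': 1, 'ts': 1}
print(dict(word_ngrams('I love cats', 2)))  # {'I love': 1, 'love cats': 1}
```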

Train models with fit(examples), save as JSON, predict labels:

m = fit([(v1, '+'), (v2, '-')], model=Perceptron)  # DecisionTree, KNN, ...

m.save('opinion.json')

m = fit(open('opinion.json'))

print(m.predict(vec('She hates dogs.')))  # {'+': 0.4, '-': 0.6}

Once trained, Model.predict(vector) returns a dict with label probabilities (0.0–1.0).
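As a rough picture of what a perceptron does with {feature: weight} dicts, here is a minimal multiclass sketch. It is not Grasp's Perceptron: train, score, predict and probabilities are hypothetical names, and turning scores into probabilities with a softmax is an assumption made for illustration.

```python
import math

def score(weights, v, label):
    """Dot product of a sparse vector with one label's weights."""
    w = weights[label]
    return sum(w.get(f, 0.0) * x for f, x in v.items())

def predict(weights, v):
    """Return the best-scoring label (ties go to the first label, sorted)."""
    return max(sorted(weights), key=lambda label: score(weights, v, label))

def train(examples, epochs=10):
    """examples: list of ({feature: weight}, label) pairs."""
    weights = {label: {} for _, label in examples}
    for _ in range(epochs):
        for v, label in examples:
            guess = predict(weights, v)
            if guess != label:
                # Pull the right label toward v, push the wrong one away.
                for f, x in v.items():
                    weights[label][f] = weights[label].get(f, 0.0) + x
                    weights[guess][f] = weights[guess].get(f, 0.0) - x
    return weights

def probabilities(weights, v):
    """Softmax over label scores (an assumption, for illustration)."""
    scores = {label: score(weights, v, label) for label in weights}
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

examples = [({'love': 1, 'cats': 1}, '+'), ({'hate': 1, 'cats': 1}, '-')]
weights = train(examples)
print(predict(weights, {'hate': 1, 'dogs': 1}))  # -
```

The shared feature cats ends up with zero weight for both labels, so only love and hate drive the decision.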

Tools for Network Analysis

Map networks with Graph, a {node1: {node2: weight}} dict subclass:

g = Graph(directed=True)

g.add('a', 'b')  # a → b
g.add('b', 'c')  # b → c
g.add('b', 'd')  # b → d
g.add('c', 'd')  # c → d

print(g.sp('a', 'd'))  # shortest path: a → b → d

print(top(pagerank(g)))  # strongest node: d, 0.8
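Both operations work naturally on a plain {node1: {node2: weight}} dict. The sketch below uses Dijkstra's algorithm and a basic power-iteration PageRank; shortest_path and pagerank here are hypothetical stand-ins, edge weights of 1.0 are assumed, and the scores are scaled to sum to 1, so they will not match the 0.8 shown above.

```python
import heapq

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm on a {node: {neighbor: weight}} dict."""
    queue = [(0.0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in seen:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None  # goal is unreachable

def pagerank(graph, damping=0.85, iterations=50):
    """Power iteration; every node must appear as a key in graph."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = graph[v]
            if out:
                total = sum(out.values())
                for u, w in out.items():
                    new[u] += damping * rank[v] * w / total
            else:
                # A node without outgoing edges spreads its rank evenly.
                for u in nodes:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank

graph = {'a': {'b': 1.0}, 'b': {'c': 1.0, 'd': 1.0}, 'c': {'d': 1.0}, 'd': {}}
print(shortest_path(graph, 'a', 'd'))  # ['a', 'b', 'd']
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))       # d
```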

See networks with viz(graph):

with open('g.html', 'w') as f:
    f.write(viz(g, src='graph.js'))

You'll need to set src to the grasp/graph.js lib.

Tools for Comfort

Easy date handling with date(v), where v is an int, a str, or another date:

print(date('Mon Jan 31 10:00:00 +0000 2000', format='%Y-%m-%d'))
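For comparison, the same conversion with the standard library needs the input pattern spelled out. In the Grasp call above, format appears to be the output format, with the input layout recognized automatically; that reading is an assumption based on the example.

```python
from datetime import datetime

stamp = 'Mon Jan 31 10:00:00 +0000 2000'

# Input pattern: weekday, month, day, time, UTC offset, year.
parsed = datetime.strptime(stamp, '%a %b %d %H:%M:%S %z %Y')

print(parsed.strftime('%Y-%m-%d'))  # 2000-01-31
```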

Easy path handling with cd(...), which always points to the script's folder:

print(cd('kb', 'en-loc.csv'))

Easy CSV handling with csv([path]), a list of lists of values:

for code, country, _, _, _, _, _ in csv(cd('kb', 'en-loc.csv')):
    print(code, country)

data = csv()
data.append(('cat', 'Kitty'))
data.append(('cat', 'Simba'))
data.save(cd('cats.csv'))
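For comparison, a round trip through the standard library's csv module looks like this. Note that Grasp's csv() shares its name with that module but is a different object; the in-memory buffer stands in for a file on disk.

```python
import csv
import io

rows = [('cat', 'Kitty'), ('cat', 'Simba')]

# Write the rows as CSV text (a file opened with newline='' works the same).
buffer = io.StringIO(newline='')
csv.writer(buffer).writerows(rows)

# Read them back; csv.reader yields one list of strings per row.
buffer.seek(0)
loaded = [tuple(row) for row in csv.reader(buffer)]
print(loaded)  # [('cat', 'Kitty'), ('cat', 'Simba')]
```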

Tools for Good

A challenge in AI is bias introduced by human trainers. Remember the Model trained earlier? Grasp has tools to explain how & why it makes decisions:

print(explain(vec('She hates dogs.'), m))  # why so negative?

In the returned dict, the model's explanation is: “you wrote hat + ate (hate)”.
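One common way to explain a linear model is to rank features by how much each one contributes to the score of a label. The sketch below illustrates that idea with made-up numbers; contributions is a hypothetical helper, the weights are invented, and Grasp's explain() may work differently.

```python
def contributions(vector, label_weights, top=3):
    """Rank features by |feature value * learned weight| for one label."""
    scored = {f: x * label_weights.get(f, 0.0) for f, x in vector.items()}
    return sorted(scored.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top]

# Hypothetical character trigram counts for 'She hates dogs.'
vector = {'she': 1, 'hat': 1, 'ate': 1, 'dog': 1}

# Invented weights for the '-' label, for illustration only.
negative_weights = {'hat': 0.6, 'ate': 0.5, 'dog': -0.1}

print(contributions(vector, negative_weights))
# [('hat', 0.6), ('ate', 0.5), ('dog', -0.1)]
```

Here hat and ate dominate, which matches the kind of explanation quoted above.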
