senzing-garage/sz-semantics
If you are beginning your journey with [Senzing], please start with the [Senzing Quick Start guides].

You are in the [Senzing Garage] where projects are "tinkered" on. Although this GitHub repository may help you understand an approach to using Senzing, it's not considered to be "production ready" and is not considered to be part of the Senzing product. Heck, it may not even be appropriate for your application of Senzing!
Transform JSON output from the Senzing SDK for use with graph technologies, semantics, and downstream LLM integration.

This library uses `poetry` for demos:

```bash
poetry update
```
Otherwise, to use the library:
```bash
pip install sz_semantics
```

For the gRPC server, if you don't already have Senzing and its gRPC server installed, pull the latest Docker container:

```bash
docker pull senzing/serve-grpc:latest
```
Mask the PII values within Senzing JSON output with tokens which can be substituted back later. For example, mask PII values before calling a remote service (such as an LLM-based chat) then unmask the returned text after the roundtrip, to maintain data privacy.

```python
import json

from sz_semantics import Mask

data: dict = { "ENTITY_NAME": "Robert Smith" }

sz_mask: Mask = Mask()

masked_data: dict = sz_mask.mask_data(data)
masked_text: str = json.dumps(masked_data)
print(masked_text)

unmasked: str = sz_mask.unmask_text(masked_text)
print(unmasked)
```
For an example, run the `demo1.py` script with a data file which captures Senzing JSON output:

```bash
poetry run python3 demo1.py data/get.json
```

The two lists `Mask.KNOWN_KEYS` and `Mask.MASKED_KEYS` enumerate respectively the:
- keys for known elements which do not require masking
- keys for PII elements which require masking
Any other keys encountered will be masked by default and reported as warnings in the logging. Adjust these lists as needed for a given use case.
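For instance, a minimal sketch of adjusting the lists before constructing `Mask`, assuming both are plain class-level Python lists of JSON key names (the key names below are placeholders, not part of the library):

```python
from sz_semantics import Mask

# hypothetical key names, used only for illustration
Mask.MASKED_KEYS.append("PASSPORT_NUMBER")  # always treat as PII and mask
Mask.KNOWN_KEYS.append("RECORD_TYPE")       # treat as known metadata, never mask

sz_mask: Mask = Mask()
```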
For work with large numbers of entities, subclass `KeyValueStore` to provide a distributed key-value store (other than the Python built-in `dict` default) to use for scale-out.
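As a rough sketch only: the actual `KeyValueStore` interface should be checked in the source code, and the `get`/`set` method names plus the Redis backing store below are assumptions, not part of this library:

```python
import redis  # assumes a Redis server and the redis-py package are available

from sz_semantics import KeyValueStore


class RedisKeyValueStore(KeyValueStore):
    """Hypothetical scale-out store, assuming a simple get/set interface."""

    def __init__(self, url: str = "redis://localhost:6379/0"):
        self.client = redis.Redis.from_url(url, decode_responses=True)

    def set(self, key: str, value: str) -> None:
        self.client.set(key, value)

    def get(self, key: str) -> str | None:
        return self.client.get(key)
```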
To use `SzClient` to simplify access to the Senzing SDK, first launch the `serve-grpc` container and run it in the background:

```bash
docker run -it --publish 8261:8261 --rm senzing/serve-grpc
```
For example code which runs entity resolution on the "truthset" collection of datasets:

```python
import pathlib
import tomllib
import typing

from sz_semantics import SzClient

with open(pathlib.Path("config.toml"), mode="rb") as fp:
    config: dict = tomllib.load(fp)

data_sources: typing.Dict[str, str] = {
    "CUSTOMERS": "data/truth/customers.json",
    "WATCHLIST": "data/truth/watchlist.json",
    "REFERENCE": "data/truth/reference.json",
}

sz: SzClient = SzClient(config, data_sources)
sz.entity_resolution(data_sources)

for ent_json in sz.sz_engine.export_json_entity_report_iterator():
    print(ent_json)
```
For a demo of running entity resolution on the "truthset", run the `demo2.py` script:

```bash
poetry run python3 demo2.py
```
This produces the `export.json` file which is JSONL representing the results of a "get entity" call on each resolved entity.
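As a minimal sketch, the export can be scanned line by line with the standard library; this assumes each JSONL record follows the usual Senzing "get entity" layout with a top-level `RESOLVED_ENTITY` object:

```python
import json
import pathlib

export_path: pathlib.Path = pathlib.Path("export.json")

# print the ID and name of each resolved entity in the JSONL export
with open(export_path, "r", encoding="utf-8") as fp:
    for line in fp:
        entity: dict = json.loads(line).get("RESOLVED_ENTITY", {})
        print(entity.get("ENTITY_ID"), entity.get("ENTITY_NAME"))
```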
Note: to show the redo processing, be sure to restart the container each time before re-running the `demo2.py` script -- although the entity resolution results will be the same even without a container restart.
Starting with a small SKOS-based taxonomy in the `domain.ttl` file, parse the Senzing entity resolution (ER) results to generate an RDFlib semantic graph.

In other words, generate the "backbone" for constructing an Entity Resolved Knowledge Graph, as a core component of a semantic layer.

The example code below serializes the thesaurus generated from Senzing ER results as `thesaurus.ttl`, combined with the Senzing taxonomy definitions, which can be used for constructing knowledge graphs:

```python
import pathlib

from sz_semantics import Thesaurus

thesaurus: Thesaurus = Thesaurus()
thesaurus.load_source(Thesaurus.DOMAIN_TTL)

export_path: pathlib.Path = pathlib.Path("data/truth/export.json")

with open(export_path, "r", encoding="utf-8") as fp_json:
    for line in fp_json:
        for rdf_frag in thesaurus.parse_iter(line, language="en"):
            thesaurus.load_source_text(
                Thesaurus.RDF_PREAMBLE + rdf_frag,
                format="turtle",
            )

thesaurus_path: pathlib.Path = pathlib.Path("thesaurus.ttl")
thesaurus.save_source(thesaurus_path, format="turtle")
```

For an example, run the `demo3.py` script to process the JSON file `data/truth/export.json` which captures Senzing ER exported results:

```bash
poetry run python3 demo3.py data/truth/export.json
```
Check the resulting RDF definitions in the generated `thesaurus.ttl` file.
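For instance, a quick way to inspect the results is to load the file with RDFlib and query for SKOS concepts; this sketch assumes the generated thesaurus models resolved entities as `skos:Concept` resources with `skos:prefLabel` values:

```python
from rdflib import Graph

graph: Graph = Graph()
graph.parse("thesaurus.ttl", format="turtle")

# list each SKOS concept and its preferred label
query: str = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label
WHERE {
    ?concept a skos:Concept ;
        skos:prefLabel ?label .
}
"""

for row in graph.query(query):
    print(row.concept, row.label)
```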
License and Copyright
Source code for `sz_semantics` plus any logo, documentation, and examples have an Apache license which is succinct and simplifies use in commercial applications.
All materials herein are Copyright © 2025 Senzing, Inc.
Kudos to @brianmacy, @jbutcher21, @docktermj, @cj2001, @jesstalisman-ia, and the kind folks at GraphGeeks for their support.