Example implementation of the NLP Sandbox Person Name Annotator API
nlpsandbox/person-name-annotator-example
NLPSandbox.io is an open platform for benchmarking modular natural language processing (NLP) tools on both public and private datasets. Academics, students, and industry professionals are invited to browse the available tasks and participate by developing and submitting an NLP Sandbox tool.
This repository provides an example implementation of the NLP Sandbox Person Name Annotator API written in Python-Flask. An NLP Sandbox person name annotator takes a clinical note (text) as input and outputs a list of predicted person name annotations found in that note. Here, person names are identified using a dictionary.
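As a rough illustration of dictionary-based annotation, the sketch below looks up each dictionary entry in a note and reports where it occurs. It is not the code shipped in this repository: the function name `annotate_person_names`, the tiny `NAME_DICTIONARY`, the fixed confidence score and the annotation fields (`start`, `length`, `text`, `confidence`) are assumptions made for the example.

```python
import re
from typing import Dict, List

# Hypothetical dictionary of names, used only for this example.
NAME_DICTIONARY = ["Chloe", "Price", "Smith"]


def annotate_person_names(text: str, names: List[str]) -> List[Dict]:
    """Return one annotation per whole-word, case-insensitive dictionary match."""
    annotations = []
    for name in names:
        # \b anchors keep "Price" from matching inside a longer word.
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            annotations.append({
                "start": match.start(),        # character offset in the note
                "length": len(match.group()),  # number of matched characters
                "text": match.group(),         # matched text as it appears
                "confidence": 95,              # assumed fixed score
            })
    # Sort by position so the output follows the order of the note.
    return sorted(annotations, key=lambda a: a["start"])


note = "Patient Chloe Price was seen by Dr. Smith."
annotations = annotate_person_names(note, NAME_DICTIONARY)
```

A dictionary lookup like this is simple to package and test, but it misses names that are not in the list and flags common words that happen to be names, which is one reason to expect modest benchmark scores.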
This tool is provided to NLP developers who develop in Python as a starting point to package their own person name annotator as an NLP Sandbox tool (see section Development). That section also describes how to generate a tool "stub" using openapi-generator for 50+ programming languages-frameworks. This repository includes a GitHub CI/CD workflow that lints, tests, builds and pushes a Docker image of this tool to Synapse Docker Registry. The image of this example tool can be submitted as-is on NLPSandbox.io to benchmark its performance; just don't expect high performance!
- NLP Sandbox schemas version: 1.2.0
- NLP Sandbox tool version: 1.2.0
- Docker image: docker.synapse.org/syn22277123/person-name-annotator-example
- Docker Engine >=19.03.0
The command below starts this NLP Sandbox person name annotator locally.
docker compose up --build
You can stop the running container with Ctrl+C, followed by docker compose down.
Create a Conda environment.
conda create --name person-name-annotator python=3.9
conda activate person-name-annotator
Install and start this NLP Sandbox person name annotator.
cd server && pip install -r requirements.txt
python -m openapi_server
This NLP Sandbox tool provides a web interface that you can use to annotate clinical notes. This web client has been automatically generated by openapi-generator. To access the UI, open a new tab in your browser and navigate to one of the following addresses, depending on whether you are running the tool using Docker (production) or Python (development).
- Using Docker: http://localhost/ui
- Using Python: http://localhost:8080/ui
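The tool can also be called programmatically instead of through the UI. The sketch below only builds the HTTP request, so it runs without a server. The endpoint path `/api/v1/textPersonNameAnnotations`, the payload fields and the helper name `build_annotation_request` are assumptions for illustration; check the UI above for the actual routes and request schema.

```python
import json
import urllib.request


def build_annotation_request(base_url: str, note_text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the tool to annotate a note."""
    # Assumed payload shape; the real schema is shown in the tool's UI.
    payload = {
        "note": {
            "identifier": "note-1",
            "patientId": "patient-1",
            "text": note_text,
        }
    }
    return urllib.request.Request(
        # Assumed endpoint path, not confirmed by this README.
        url=base_url.rstrip("/") + "/api/v1/textPersonNameAnnotations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_annotation_request("http://localhost:8080", "Chloe Price met Dr. Smith.")

# To send the request to a running tool, uncomment the lines below:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```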
This section describes how to develop your own NLP Sandbox person name annotator in Python-Flask and other programming languages-frameworks. This example tool is also available in Java in the GitHub repository nlpsandbox/person-name-annotator-example-java.
- Node >=14
- Java >=1.8 (required by openapi-generator)
- Conda >=4 and/or Python >=3.7
- Synapse.org user account to push the image to docker.synapse.org
Depending on the language-framework you want to develop with:
- Python-Flask: create a new repository from this GitHub template.
- Other languages-frameworks: create a brand-new GitHub repository before generating an NLP Sandbox tool stub in section Generating a new NLP Sandbox tool using openapi-generator.
You can also use a different code repository hosting service like GitLab or Bitbucket.
This repository includes a GitHub CI/CD workflow that lints, tests, builds and pushes a Docker image of this tool to Synapse Docker Registry. For now, only images that have been pushed to Synapse Docker Registry can be submitted to NLPSandbox.io benchmarks.
After creating your GitHub repository, you need to configure the CI/CD workflow if you want to benefit from automatic lint checks, tests and Docker builds.
- Create two GitHub secrets:
  - SYNAPSE_USERNAME: Your Synapse.org username.
  - SYNAPSE_TOKEN: A personal access token that has the permissions View, Download and Modify.
- In the CI/CD workflow, update the environment variable docker_repository with the value docker.synapse.org/<synapse_project_id>/<docker_image> where:
  - <synapse_project_id>: the Synapse ID of a project you have created on Synapse.org.
  - <docker_image>: the name of your image/tool.
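The value of docker_repository can be assembled and sanity-checked before it is pasted into the workflow. The helper below is a hypothetical sketch, not part of this repository. It assumes that Synapse project IDs look like `syn` followed by digits (as in `syn22277123`) and that image names use lowercase letters, digits and the separators `.`, `_` and `-`.

```python
import re


def docker_repository(synapse_project_id: str, docker_image: str) -> str:
    """Compose docker.synapse.org/<synapse_project_id>/<docker_image>."""
    # Assumption: Synapse IDs are "syn" followed by one or more digits.
    if not re.fullmatch(r"syn\d+", synapse_project_id):
        raise ValueError(f"Unexpected Synapse project ID: {synapse_project_id!r}")
    # Assumption: lowercase alphanumerics separated by ".", "_" or "-".
    if not re.fullmatch(r"[a-z0-9]+(?:[._-][a-z0-9]+)*", docker_image):
        raise ValueError(f"Unexpected Docker image name: {docker_image!r}")
    return f"docker.synapse.org/{synapse_project_id}/{docker_image}"


repository = docker_repository("syn22277123", "person-name-annotator-example")
```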
This repository includes a Dependabot configuration that instructs GitHub to let you know when an update is available for one of your dependencies (e.g. Python, Node, Docker). Dependabot will automatically open a PR when an update is available. If you have configured the CI/CD workflow that comes with this repository, the workflow will automatically run and notify you if the update breaks your code. You can then resolve the issue before merging the PR, which makes the update effective.
For more information on Dependabot, please visit the GitHub page Enabling and disabling version updates.
The development of new NLP Sandbox tools is streamlined by using the openapi-generator to generate tool "stubs" for more than 50 programming languages and frameworks. Here a person name annotator stub refers to an initial implementation that has been automatically generated by openapi-generator from the NLP Sandbox person name annotator API specification.
Run the command below to get the list of languages-frameworks supported by the openapi-generator (under the section SERVER generators).
npx @openapitools/openapi-generator-cli list
Generate the person name annotator stub from an empty GitHub repository (here in Python-Flask):
mkdir server
npx @openapitools/openapi-generator-cli generate \
  -g python-flask \
  -o server \
  -i https://nlpsandbox.github.io/nlpsandbox-schemas/person-name-annotator/latest/openapi.json
where the option -i refers to the OpenAPI specification of the NLP Sandbox person name annotator API.
The URL is composed of different elements:
- person-name-annotator - The type of NLP Sandbox tool to generate. The list of all the NLP Sandbox tool types available is defined in the NLP Sandbox schemas.
- latest - The latest stable version of the NLP Sandbox schemas. This token can be replaced by a specific release version x.y.z of the NLP Sandbox schemas.
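The way the specification URL is assembled from these elements can be sketched as follows. The function name `openapi_spec_url` is hypothetical; the base URL and the path layout are taken from the command above.

```python
import re

SCHEMAS_BASE_URL = "https://nlpsandbox.github.io/nlpsandbox-schemas"


def openapi_spec_url(tool_type: str, version: str = "latest") -> str:
    """Build the OpenAPI specification URL for a tool type and schemas version."""
    # Accept either the moving token "latest" or a specific x.y.z release.
    if version != "latest" and not re.fullmatch(r"\d+\.\d+\.\d+", version):
        raise ValueError(f"Expected 'latest' or x.y.z, got {version!r}")
    return f"{SCHEMAS_BASE_URL}/{tool_type}/{version}/openapi.json"


latest_url = openapi_spec_url("person-name-annotator")
pinned_url = openapi_spec_url("person-name-annotator", "1.2.0")
```

Pinning a specific release instead of latest keeps a generated stub reproducible, at the cost of having to bump the version by hand when the schemas change.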
The NLP Sandbox schemas are updated after receiving contributions from the community. For example, the Patient schema may in the future include additional information that NLP Sandbox tools can leverage to generate more accurate predictions.
After an update of the NLP Sandbox schemas, NLPSandbox.io will only evaluate tools that implement the latest version of the schemas. It is therefore important to keep your tools up-to-date and re-submit them so that they continue to appear in the leaderboards and to be used by the community.
This GitHub repository includes a workflow that checks daily if a new release of the NLP Sandbox schemas is available, in which case a PR will be created. Follow the steps listed below to update your tool.
Check out the branch created by the workflow.
git checkout <branch_name>
Re-run the same openapi-generator command you used to generate the tool stub. If you started from an existing tool implementation like the one included in this GitHub repository, run the following command to update your tool to the latest version of the NLP Sandbox schemas (this command would be defined in package.json).
npm run generate:server:latest
Review the updates made to this tool in the NLP Sandbox schemas CHANGELOG.
Review and merge the changes. If you are using VS Code, this step can be performed relatively easily using the section named "Source Control". This section lists the files that have been modified by the generator. When you click on a file, VS Code shows the current and updated versions of the file side by side. Changes can be accepted or rejected at the level of an entire file or for a selection of lines.
Submit your updated tool to NLPSandbox.io.
If you started from an existing tool implementation like the one included in this GitHub repository, run the following commands to lint and test your tool.
npm run lint
npm run test
For Python-Flask tools:
- The linter configuration is defined in server/setup.cfg.
- The configuration of the unit and integration tests lives in server/tox.ini.
The NLP Sandbox promotes the development of tools that are re-usable, reproducible, portable and cloud-ready. The table below describes how preventing a tool from connecting to a remote server contributes to some of these tool properties.
Property | Description |
---|---|
Reproducibility | The output of a tool may not be reproducible if the tool depends on external resources that, for example, may no longer be available in the future. |
Security | A tool may attempt to upload sensitive information to a remote server. |
The Docker Compose configuration included with this GitHub repository (docker-compose.yml) prevents the tool container from establishing remote connections. This is achieved through the use of an internal Docker network and an Nginx container placed in front of the tool container. One benefit is that you can test your tool locally and ensure that it works fine while it does not have access to the internet. Note that when a tool is evaluated on NLPSandbox.io, additional measures are put in place to prevent it from connecting to remote servers.
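One way to check this isolation from inside the tool container is to attempt an outbound TCP connection and see whether it succeeds. The helper below is a minimal sketch using only the standard library; the name `can_connect` and the default timeout are assumptions for this example.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, unreachable networks
        # and timeouts, all of which subclass OSError.
        return False
```

Run from within a container attached only to an internal Docker network, a call such as `can_connect("example.org", 443)` would be expected to return False if the isolation works as described; the host and port here are placeholders.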
This repository uses semantic versioning to track the releases of this tool. This repository uses "non-moving" GitHub tags, that is, a tag will always point to the same git commit once it has been created.
The artifact published by the CI/CD workflow of this GitHub repository is a Docker image pushed to the Synapse Docker Registry. The table below lists the image tags pushed to the registry.
Tag name | Moving | Description |
---|---|---|
latest | Yes | Latest stable release. |
edge | Yes | Latest commit made to the default branch. |
edge-<sha> | No | Same as above with the reference to the git commit. |
<major>.<minor>.<patch> | No | Stable release. |
You should avoid using a moving tag like latest when deploying containers in production, because this makes it hard to track which version of the image is running and hard to roll back.
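The tag conventions in the table can be encoded in a small check, for example to reject moving tags in a deployment script. This is a hypothetical helper, not part of this repository, and it assumes that `<sha>` is a hexadecimal git commit hash.

```python
import re

# Tags that are re-pointed to a new image over time (see the table above).
MOVING_TAGS = {"latest", "edge"}


def is_moving_tag(tag: str) -> bool:
    """Return True for moving tags, False for tags pinned to one image."""
    if tag in MOVING_TAGS:
        return True
    # edge-<sha>: assumed to be "edge-" followed by a hexadecimal commit hash.
    if re.fullmatch(r"edge-[0-9a-f]+", tag):
        return False
    # <major>.<minor>.<patch>: a stable semantic version.
    if re.fullmatch(r"\d+\.\d+\.\d+", tag):
        return False
    raise ValueError(f"Tag does not follow the documented conventions: {tag!r}")


safe_for_production = not is_moving_tag("1.2.0")
```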
Visit nlpsandbox.io for instructions on how to submit your NLP Sandbox tool and evaluate its performance.
Thinking about contributing to this project? Get started by reading our contribution guide.