Example implementation of the NLP Sandbox Person Name Annotator API


NLP Sandbox Person Name Annotator Example

Introduction

NLPSandbox.io is an open platform for benchmarking modular natural language processing (NLP) tools on both public and private datasets. Academics, students, and industry professionals are invited to browse the available tasks and participate by developing and submitting an NLP Sandbox tool.

This repository provides an example implementation of the NLP Sandbox Person Name Annotator API written in Python-Flask. An NLP Sandbox person name annotator takes a clinical note (text) as input and outputs a list of predicted person name annotations found in the clinical note. In this example, person names are identified using a dictionary.
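The dictionary approach described above can be sketched as follows. This is a minimal illustration, not the code shipped in this repository: the name list, the function name `annotate_person_names`, the annotation fields (`start`, `length`, `text`, `confidence`) and the confidence value are all assumptions made for the example.

```python
import re
from typing import Dict, List, Sequence

# Hypothetical dictionary of names; a real tool would load a much larger list.
KNOWN_NAMES = ("Smith", "Johnson", "Maria")


def annotate_person_names(
    note_text: str, names: Sequence[str] = KNOWN_NAMES
) -> List[Dict]:
    """Return one annotation per whole-word, case-insensitive dictionary match."""
    annotations = []
    for name in names:
        # \b keeps "Smith" from matching inside a longer word such as "Smithson".
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(note_text):
            annotations.append({
                "start": match.start(),        # character offset in the note
                "length": len(match.group()),  # number of matched characters
                "text": match.group(),         # the text as it appears in the note
                "confidence": 95,              # arbitrary illustrative value
            })
    # Report annotations in the order they appear in the note.
    return sorted(annotations, key=lambda a: a["start"])
```

Calling `annotate_person_names("Dr. Smith saw Maria today.")` yields two annotations, one for "Smith" and one for "Maria". A dictionary lookup like this misses any name that is not in the list, which is why high performance should not be expected from it.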

This tool is provided to NLP developers who develop in Python as a starting point to package their own person name annotator as an NLP Sandbox tool (see section Development). That section also describes how to generate a tool "stub" using openapi-generator for 50+ programming languages and frameworks. This repository includes a GitHub CI/CD workflow that lints, tests, builds and pushes a Docker image of this tool to the Synapse Docker Registry. The image of this example tool can be submitted as-is on NLPSandbox.io to benchmark its performance, but don't expect high performance!

Contents

Specification

Requirements

Usage

Running with Docker

The command below starts this NLP Sandbox person name annotator locally.

docker compose up --build

You can stop the container run with Ctrl+C, followed by docker compose down.

Running with Python

Create a Conda environment.

conda create --name person-name-annotator python=3.9
conda activate person-name-annotator

Install and start this NLP Sandbox person name annotator.

cd server && pip install -r requirements.txt
python -m openapi_server

Accessing this NLP Sandbox tool User Interface

This NLP Sandbox tool provides a web interface that you can use to annotate clinical notes. This web client has been automatically generated by openapi-generator. To access the UI, open a new tab in your browser and navigate to one of the following addresses, depending on whether you are running the tool using Docker (production) or Python (development).

Development

This section describes how to develop your own NLP Sandbox person name annotator in Python-Flask and other programming languages and frameworks. This example tool is also available in Java in the GitHub repository nlpsandbox/person-name-annotator-example-java.

Development requirements

Creating a GitHub repository

Depending on the language and framework you want to develop with:

You can also use a different code repository hosting service like GitLab or Bitbucket.

Configuring the CI/CD workflow

This repository includes a GitHub CI/CD workflow that lints, tests, builds and pushes a Docker image of this tool to the Synapse Docker Registry. For now, only images that have been pushed to the Synapse Docker Registry can be submitted to NLPSandbox.io benchmarks.

After creating your GitHub repository, you need to configure the CI/CD workflow if you want to benefit from automatic lint checks, tests and Docker builds.

  1. Create two GitHub secrets
  2. In the CI/CD workflow, update the environment variable docker_repository with the value docker.synapse.org/<synapse_project_id>/<docker_image> where:
    • <synapse_project_id> is the Synapse ID of a project you have created on Synapse.org.
    • <docker_image> is the name of your image/tool.
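As a quick sanity check before editing the workflow, the value of docker_repository can be assembled and validated with a few lines of Python. This is only a sketch: the helper name `docker_repository`, the assumption that a Synapse ID looks like "syn" followed by digits, and the simplified image-name pattern are illustrative choices, not rules taken from this repository.

```python
import re

SYNAPSE_REGISTRY = "docker.synapse.org"


def docker_repository(synapse_project_id: str, docker_image: str) -> str:
    """Build the docker_repository value from its two components."""
    # Assumption: Synapse IDs have the form "syn" followed by digits.
    if not re.fullmatch(r"syn\d+", synapse_project_id):
        raise ValueError(f"unexpected Synapse ID: {synapse_project_id!r}")
    # Simplified assumption: lowercase alphanumerics separated by '.', '_' or '-'.
    if not re.fullmatch(r"[a-z0-9]+(?:[._-][a-z0-9]+)*", docker_image):
        raise ValueError(f"unexpected image name: {docker_image!r}")
    return f"{SYNAPSE_REGISTRY}/{synapse_project_id}/{docker_image}"
```

For example, `docker_repository("syn12345678", "person-name-annotator")` returns `docker.synapse.org/syn12345678/person-name-annotator`, where `syn12345678` is a made-up project ID.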

Enabling version updates

This repository includes a Dependabot configuration that instructs GitHub to let you know when an update is available for one of your dependencies (e.g. Python, Node, Docker). Dependabot will automatically open a PR when an update is available. If you have configured the CI/CD workflow that comes with this repository, the workflow will automatically run and notify you if the update breaks your code. You can then resolve the issue before merging the PR, which makes the update effective.

For more information on Dependabot, please visit the GitHub page Enabling and disabling version updates.

Generating a new NLP Sandbox tool using openapi-generator

The development of new NLP Sandbox tools is streamlined by using openapi-generator to generate tool "stubs" for more than 50 programming languages and frameworks. Here a person name annotator stub refers to an initial implementation that has been automatically generated by openapi-generator from the NLP Sandbox person name annotator API specification.

Run the command below to get the list of languages and frameworks supported by openapi-generator (under the section SERVER generators).

npx @openapitools/openapi-generator-cli list

Generate the person name annotator stub from an empty GitHub repository (here in Python-Flask):

mkdir server
npx @openapitools/openapi-generator-cli generate \
  -g python-flask \
  -o server \
  -i https://nlpsandbox.github.io/nlpsandbox-schemas/person-name-annotator/latest/openapi.json

where the option -i refers to the OpenAPI specification of the NLP Sandbox person name annotator API.

The URL is composed of different elements:

  • person-name-annotator - The type of NLP Sandbox tool to generate. The list of all the NLP Sandbox tool types available is defined in the NLP Sandbox schemas.
  • latest - The latest stable version of the NLP Sandbox schemas. This token can be replaced by a specific release version x.y.z of the NLP Sandbox schemas.
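The two URL elements listed above can be combined programmatically, for instance when scripting the generator for several tool types. The sketch below assumes the URL layout shown in the command above; the function name `openapi_spec_url` is hypothetical, and the check that a pinned version has the form x.y.z is an added safeguard rather than something the schemas site enforces.

```python
import re

BASE_URL = "https://nlpsandbox.github.io/nlpsandbox-schemas"


def openapi_spec_url(tool_type: str, version: str = "latest") -> str:
    """Compose the OpenAPI specification URL for a tool type and schemas version."""
    # Accept either the moving token "latest" or a pinned release "x.y.z".
    if version != "latest" and not re.fullmatch(r"\d+\.\d+\.\d+", version):
        raise ValueError(f"expected 'latest' or 'x.y.z', got {version!r}")
    return f"{BASE_URL}/{tool_type}/{version}/openapi.json"
```

`openapi_spec_url("person-name-annotator")` reproduces the URL passed to -i above. Pinning a release instead of `latest` makes the generated stub reproducible; whether a given x.y.z release exists must be checked against the NLP Sandbox schemas.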

Keeping your tool up-to-date

The NLP Sandbox schemas are updated after receiving contributions from the community. For example, the Patient schema may in the future include additional information that NLP Sandbox tools can leverage to generate more accurate predictions.

After an update of the NLP Sandbox schemas, NLPSandbox.io will only evaluate tools that implement the latest version of the schemas. It is therefore important to keep your tools up-to-date and re-submit them so that they continue to appear in the leaderboards and to be used by the community.

This GitHub repository includes a workflow that checks daily if a new release of the NLP Sandbox schemas is available, in which case a PR will be created. Follow the steps listed below to update your tool.

  1. Check out the branch created by the workflow.

    git checkout <branch_name>
  2. Re-run the same openapi-generator command you used to generate the tool stub. If you started from an existing tool implementation like the one included in this GitHub repository, run the following command to update your tool to the latest version of the NLP Sandbox schemas (this command would be defined in package.json).

    npm run generate:server:latest
  3. Review the updates made to this tool in the NLP Sandbox schemas CHANGELOG.

  4. Review and merge the changes. If you are using VS Code, this step can be performed relatively easily using the section named "Source Control". This section lists the files that have been modified by the generator. When clicking on a file, VS Code shows side-by-side the current and updated version of the file. Changes can be accepted or rejected at the level of an entire file or for a selection of lines.

  5. Submit your updated tool to NLPSandbox.io.

Testing

If you started from an existing tool implementation like the one included in this GitHub repository, run the following commands to lint and test your tool.

npm run lint
npm run test

For Python-Flask tools:

Preventing an NLP Sandbox tool from connecting to remote servers

The NLP Sandbox promotes the development of tools that are re-usable, reproducible, portable and cloud-ready. The table below describes how preventing a tool from connecting to remote servers contributes to some of these tool properties.

Property          Description
Reproducibility   The output of a tool may not be reproducible if the tool depends on external resources that, for example, may no longer be available in the future.
Security          A tool may attempt to upload sensitive information to a remote server.

The Docker Compose configuration included with this GitHub repository (docker-compose.yml) prevents the tool container from establishing remote connections. This is achieved through the use of an internal Docker network and an Nginx container placed in front of the tool container. One benefit is that you can test your tool locally and ensure that it works while it does not have access to the internet. Note that when being evaluated on NLPSandbox.io, additional measures are put in place to prevent tools from connecting to remote servers.
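One way to confirm locally that the tool container is isolated is to attempt an outbound TCP connection from inside it and check that it fails. The helper below is a minimal sketch using only the Python standard library; the function name `can_connect` and the default timeout are illustrative choices, and this check is not part of the repository.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection resolves the host and opens a TCP socket;
        # DNS failures, refusals and timeouts all raise OSError subclasses.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from inside the tool container, a call such as `can_connect("example.com", 443)` would be expected to return False if the internal Docker network is doing its job, and True when the same code runs on a host with internet access.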

Versioning

GitHub release tags

This repository uses semantic versioning to track the releases of this tool. This repository uses "non-moving" GitHub tags, that is, a tag will always point to the same git commit once it has been created.

Docker image tags

The artifact published by the CI/CD workflow of this GitHub repository is a Docker image pushed to the Synapse Docker Registry. This table lists the image tags pushed to the registry.

Tag name                   Moving   Description
latest                     Yes      Latest stable release.
edge                       Yes      Latest commit made to the default branch.
edge-<sha>                 No       Same as above with the reference to the git commit.
<major>.<minor>.<patch>    No       Stable release.

You should avoid using a moving tag like latest when deploying containers in production, because this makes it hard to track which version of the image is running and hard to roll back.
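A deployment script can enforce this advice by rejecting moving tags before a container is started. The sketch below encodes the tag table above; the function name `is_moving_tag` is hypothetical, and the assumption that `<sha>` is a lowercase hexadecimal string is an illustrative guess about the tag format.

```python
import re

MOVING_TAGS = {"latest", "edge"}
STABLE_RELEASE = re.compile(r"\d+\.\d+\.\d+")  # <major>.<minor>.<patch>
EDGE_COMMIT = re.compile(r"edge-[0-9a-f]+")    # edge-<sha>, assuming a hex sha


def is_moving_tag(tag: str) -> bool:
    """Return True for tags that may point to a different image over time."""
    if tag in MOVING_TAGS:
        return True
    if STABLE_RELEASE.fullmatch(tag) or EDGE_COMMIT.fullmatch(tag):
        return False
    # Tags outside the documented scheme are rejected rather than guessed at.
    raise ValueError(f"unrecognized tag: {tag!r}")
```

A production deployment could then refuse to proceed when `is_moving_tag(tag)` is True, so that the running image can always be traced to a specific release or commit.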

Benchmarking on NLPSandbox.io

Visit nlpsandbox.io for instructions on how to submit your NLP Sandbox tool and evaluate its performance.

Contributing

Thinking about contributing to this project? Get started by reading our contribution guide.

License

Apache License 2.0

