Semcor License is likely invalid #250

Open
@ekaf

Problem Statement

The semcor corpus is currently distributed in nltk_data under the Princeton WordNet License. However, a review of its provenance reveals that this license is likely invalid for distributing the underlying text, making semcor a non-free package.

Reasoning for semcor License Invalidity

  1. Derivative Work Status: semcor is not a new work; it is the Brown Corpus with added semantic annotations. Under copyright law, this makes it a derivative work of the original Brown Corpus.

  2. Restrictive Source License: The Brown Corpus is unequivocally licensed under restrictive terms by the Linguistic Data Consortium (LDC). The LDC is the sole official licensing authority.

  3. No Evidence of Sublicensing: There is no public evidence that Princeton University secured a sublicensing agreement from the LDC that would permit them to strip the LDC's restrictions and re-license the underlying Brown text under a permissive license.

  4. Princeton is Not an LDC Member: Investigation confirms Princeton University is not a member of the LDC consortium, eliminating the possibility of special institutional rights that could justify this re-licensing.

Conclusion: Therefore, the Princeton WordNet License attached to semcor is an overreach and is almost certainly invalid for its core content. Distributing semcor relies on academic leniency, not a sound legal basis. It must be classified as non-free.

Why This Does NOT Affect wordnet

It is crucial to understand that the invalidity of the semcor license does not "infect" the WordNet database itself. The legal arguments are distinct:

  • semcor is a Corpus: It contains the full, expressive text of the copyrighted Brown Corpus. Distributing it copies protected expression.
  • WordNet is a Database of Facts: WordNet used semcor to derive sense frequencies, which are statistical facts about language use. Copyright protects expression, not facts. The structure of WordNet is its own creative, copyrightable work, and the facts it contains are unprotectable.

The creation of WordNet's data from semcor is a textbook example of fair use (highly transformative, non-expressive purpose) and falls under the fact/expression dichotomy. The WordNet database remains on solid legal ground under its permissive Princeton WordNet License.

Proposed Action

  1. Officially reclassify the semcor package from "free" to "non-free" in the nltk_data index and documentation.
  2. Ensure semcor is included in the nltk-edu (restricted) pip package and excluded from the nltk-free (commercial-safe) package.
  3. Update the semcor documentation to clearly state:

    "Distributed under the Princeton WordNet License, but this license is likely invalid as it is a derivative work of the LDC-licensed Brown Corpus. For academic use only."

This action is necessary to maintain the legal integrity of the NLTK project and protect its users.
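
As a rough illustration of the split proposed above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Package` record, the `free` flag, and the `assign_bundles` helper are hypothetical and do not reflect the actual nltk_data index format or any existing packaging code. The flags encode this issue's proposal, not an official classification. Whether nltk-edu would also carry the free packages is not stated here, so the sketch lists only the restricted ones in that bundle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """Hypothetical record for one nltk_data package."""
    id: str
    license: str
    free: bool  # proposed classification, not an official one


# Illustrative entries only; the "free" values mirror this issue's proposal.
PACKAGES = [
    Package("wordnet", "Princeton WordNet License", free=True),
    Package("semcor", "Princeton WordNet License (disputed)", free=False),
]


def assign_bundles(packages):
    """Place free packages in nltk-free and restricted ones in nltk-edu."""
    bundles = {"nltk-free": [], "nltk-edu": []}
    for pkg in packages:
        target = "nltk-free" if pkg.free else "nltk-edu"
        bundles[target].append(pkg.id)
    return bundles


bundles = assign_bundles(PACKAGES)
print(bundles)
```

With a flag like this recorded per package, reclassifying semcor would be a one-line change to its entry, and the bundle contents would follow from it.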

Seeking Clarification

To resolve this ambiguity, we would welcome clarification from the WordNet team at Princeton University, particularly Dr. Christiane Fellbaum, regarding the rights obtained for creating and distributing semcor as a derivative work of the Brown Corpus. I am also contacting the WordNet team by email, with an invitation to take part in this discussion.
