⚠ Version 1 of The Plant List has been superseded.

You should refer instead to the current version of The Plant List.

About The Plant List

The Plant List is a working list of all known plant species. Version 1, released in December 2010, aims to be comprehensive for species of Vascular plant (flowering plants, conifers, ferns and their allies) and of Bryophytes (mosses and liverworts). It does not include algae or fungi. Version 1 contains 1,244,871 scientific plant names of which 298,900 are accepted species names. It includes no vernacular or common plant names.

Collaboration between the Royal Botanic Gardens, Kew and Missouri Botanical Garden enabled the creation of The Plant List by combining multiple checklist datasets held by these institutions and other collaborators.

The Plant List provides the Accepted Latin name for most species, with links to all Synonyms by which that species has been known. It also includes Unresolved names for which the contributing data sources did not contain sufficient evidence to decide whether they were Accepted or Synonyms.

A description of the content, creation and use of The Plant List follows.

Overview

The Plant List is a widely accessible working list of known plant species and has been developed and disseminated as a direct response to the Global Strategy for Plant Conservation (GSPC), adopted in 2002 by the 193 governments who are Parties to the Convention on Biological Diversity. The GSPC was designed as a framework for action to halt the loss of plant diversity. Target 1 of the Strategy called for the completion by 2010 of a widely accessible working list of all known plant species, as a step towards a complete world Flora. Released in December 2010, Version 1 of The Plant List aims to be comprehensive for species of Vascular plant (flowering plants, conifers, ferns and their allies) and of Bryophytes (mosses and liverworts). This is consistent with the initial focus of the GSPC.

The Plant List is not perfect and represents work in progress. Our aim was to produce a ‘best effort’ list by 2010 to demonstrate progress and stimulate further work.

The Plant List was produced as a collaborative venture coordinated by the Royal Botanic Gardens, Kew and the Missouri Botanical Garden and involving collaborators worldwide.

Data records from numerous existing global checklist databases (derived from primary taxonomic publications) were brought together and combined with regional and national checklist data and other records from Tropicos. These resources were then complemented by the inclusion of additional names found in IPNI (for Angiosperms, Gymnosperms and Fern & Fern Allies). The Plant List may omit some names and may include some duplicate names. Furthermore, those names derived from nomenclators may not include any indication of whether they are Accepted names or Synonyms. Our purpose has been to detect inconsistencies between overlapping data sources and resolve them.

The Plant List does not seek to duplicate the efforts of collaborators that have contributed data to the creation of The Plant List. This version will not be edited but feedback will be forwarded to our collaborators so that they can extend and improve their original data (see Enhancing The Plant List and Recreating The Plant List). Feedback will arise from our own analysis of the data (and its comparison with other resources) and from users of The Plant List (see How to Submit Feedback).

In the future we hope

to include improved and extended versions of the data sets included in this version of The Plant List;
to include other data sets which we were unable to include in Version 1; and
to refine the procedures that were used to create The Plant List: e.g. for locating duplicate name records, for resolving inconsistencies, and for detecting conflicting opinions expressed within alternative data sets and then selecting from among those opinions (see How The Plant List was Created).

We welcome comments on the content of The Plant List, and offers of contributions for inclusion in the next edition.

Target audience

The name of a plant is the key to communicating about it and to finding information about its uses, conservation status, relationships and place within ecosystems. The Plant List provides a tool for resolving or verifying the spelling of plant names and a means to find, from a global view, the botanically accepted name for a plant and all of its alternative synonyms. Since the ability to plan the sustainable use of plants (essential resources for food, medicines and ecosystem services) depends on effective retrieval of information about plants, there is a broad constituency of potential users of The Plant List.

Scope

The Plant List is a working list of known plant species, which aims to be comprehensive in coverage at species level for all names of mosses and liverworts and their allies (Bryophytes) and of Vascular plants, which include the flowering plants (Angiosperms), conifers, cycads and their allies (Gymnosperms) and the ferns and their allies including horsetails and club mosses (Pteridophytes).

For each name at species level we aim to provide the author of the name, the original place of publication and an assessment of whether the name is accepted or is a synonym for another name, drawing on data resources held by Kew, by Missouri Botanical Garden and by our collaborators. Wherever possible, for each name included, links are also provided to the original online database record, to its corresponding entry in IPNI and to further sources of information about that plant.

The names of some subspecies or varieties of plant are also included in The Plant List, primarily where they are synonyms or accepted names for species names and where they were available from the contributing data sets. The Plant List does not aspire to comprehensive coverage of infraspecific taxa (subspecies, varieties, forms etc.).

What does The Plant List not contain?

Version 1 of The Plant List does not contain:

scientific names for fossil plants, algae or fungi;
common (or vernacular) names for the plants included;
the geographic distribution or any other data about the plants included (though such data may be obtained from the source databases in many instances).

Description of The Plant List data set

Taxonomic coverage

The Plant List includes all known species of the following major plant groups:

Angiosperms
Gymnosperms
Pteridophytes
Bryophytes

Genera and species are presented in families which follow the source database(s), except in the case of Angiosperms where we have, wherever possible, allocated accepted genera to the families recognised by the Angiosperm Phylogeny Group.

Angiosperms

Angiosperms (Subclass Magnoliidae Novák ex Takht.). Subclass level classification follows Chase, M.W. & Reveal, J.L., 2009. A phylogenetic classification of the land plants to accompany APG III. Botanical Journal of the Linnean Society, 161, 122–127.

Genera and species of Angiosperms are presented in families following family circumscriptions in The Angiosperm Phylogeny Group, 2009. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG III. Botanical Journal of the Linnean Society, 161, 105–121.

Within the Angiosperms, data quality varies widely, reflecting the patchiness of the taxonomic and geographic coverage of the source databases. Coverage is believed to be most comprehensive and consistent for Monocotyledons, where The Plant List benefited from the existence of comprehensive checklists fully reviewed by experts (see WCSP and GrassBase). For Angiosperms other than Monocotyledons, expert-reviewed lists of similar quality provide comprehensive and consistent coverage for certain major families. Otherwise coverage is more patchy and likely to be less consistent, as the name records have been assembled from regional lists and/or other sources not fully reviewed by specialist systematists. Coverage is probably least reliable for areas for which regional lists were not available for incorporation, especially for South East Asia, and for genera with names beginning with the letters H–Z (as genera beginning with the letters A–G benefited from earlier compilation effort as part of development of the World Checklist of Selected Plant Families).

Gymnosperms

Gymnosperms: Conifers, Cycads, Ephedras, Gnetum, Ginkgo and Welwitschia (including Subclass Ginkgooidae Engl.; Subclass Cycadidae Pax; Subclass Pinidae Cronquist, Takht. & Zimmerm.; Subclass Gnetidae Pax, following Chase and Reveal, 2009)

Gymnosperm records derive primarily from WCSP and incorporate the 2001 World Checklist of Conifers by A. Farjon. Coverage is thought to be comprehensive.

Pteridophytes

Pteridophytes: ferns, horsetails and club mosses (including Subclass Equisetidae Warm.; Subclass Marattiidae Klinge; Subclass Ophioglossidae Klinge; Subclass Polypodiidae Cronquist, Takht. & Zimmerm.; Subclass Psilotidae Reveal; Subclass Lycopodiidae Beketov)

No peer-reviewed global list of any family of ferns or other Pteridophytes has been incorporated. The data presented are compiled from regional and nomenclatural sources not reviewed by experts and are therefore likely to be less comprehensive and consistent than those for Angiosperms and Gymnosperms.

Bryophytes

Bryophytes: mosses, liverworts and hornworts (including Subclass Anthocerotidae Engl.; Subclass Bryidae Engl.; Subclass Marchantiidae Engl.)

Nomenclatural coverage

The Plant List aims to provide all the scientific names for species in these plant groups. A breakdown of the numbers of plants and names included in each plant group is provided (see Statistics). Coverage of infraspecific taxa (subspecies, varieties, forms etc.) is not comprehensive; they are included primarily where they are synonyms or accepted names for species names.

Coverage and data quality are primarily influenced by the source data sets used to build The Plant List. We are aware of additional data sets which, had they been included, would have enriched and improved the final product. We hope to include such data sets in later releases.

The Status of name records

Each name record included within The Plant List is assigned one of the statuses listed below. The Status of each name is derived primarily from the data source from which that name record comes (see Derivation of Name Status). The decision, for example, that one name is the accepted name of a given plant is based upon a taxonomic opinion recorded within the cited data source. Such decisions were automated using a rules-based approach which, where necessary, selected from among alternative taxonomic opinions expressed within or between different data sources. For an explanation of how these decisions were reached see How The Plant List was Created.

Accepted Name

This is the name which should be used to refer to the species (or to a subspecies, variety or forma).

For each name with Status of ‘Accepted’ The Plant List aims to provide:

the name currently accepted as the one which should be used in preference to refer to this species (or subspecies, variety or forma);
the author(s) credited with publishing that name;
the place and date of original publication of the name where this was supplied;
a reference to the source database supplying this name record that recorded the opinion that it is an accepted name (with, where possible, a link to that record in the source database);
other names (synonyms) considered to refer to that species;
the IPNI identifier (linking the name record to the International Plant Names Index, a bibliographic resource which will provide full original publication details for this name);
an assessment of the Confidence that The Plant List attaches to the name being accepted (an indication of the confidence that the Status of the name is correct).

Synonym

A Synonym is an alternative name which has been used to refer to a species (or to a subspecies, variety or forma) but which The Plant List does not consider to be the currently Accepted name. The decision to assign a Status of Synonym to a name record is based upon a taxonomic opinion recorded in the cited data source (selected using an automated rules-based approach; see How The Plant List was Created).

Synonymy can be derived directly from the source data (in which case it is identical to the source data) or can be derived indirectly using the automated decision rules (e.g. if source 1 says that A is a synonym of B and source 2 says that B is a synonym of C, then The Plant List will show A to be a synonym of C).
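The indirect derivation in the A, B, C example can be sketched as following synonym links until a name is reached that is not itself a synonym. This is only a minimal illustration, assuming synonymy is held as a simple name-to-name mapping; the function name `resolve_accepted` and the handling of circular links are assumptions, not a description of the procedures actually used to build The Plant List.

```python
def resolve_accepted(name, synonym_of):
    """Follow synonym links from `name` to the name at the end of the chain.

    `synonym_of` maps a synonym to the name it is a synonym of
    (a hypothetical representation chosen for this sketch).
    Returns None if the links form a cycle, i.e. the opinions conflict.
    """
    seen = set()
    current = name
    while current in synonym_of:
        if current in seen:
            # Circular synonymy: no consistent accepted name can be derived.
            return None
        seen.add(current)
        current = synonym_of[current]
    return current


# Source 1 records A as a synonym of B; source 2 records B as a synonym of C.
combined = {"A": "B", "B": "C"}
print(resolve_accepted("A", combined))  # prints C
```

Returning None for a cycle is one possible choice; in the terms used on this page such a name would be a candidate for the ‘Unresolved’ status.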

For each name with Status of ‘Synonym’ The Plant List aims to provide:

the name;
the author(s) credited with publishing that name;
the place and date of original publication of the name where this was supplied;
a link to its Accepted name;
a reference to the source database supplying this name record and expressing the opinion that it is a Synonym (with, where possible, a link to that record in the source database);
the IPNI identifier (linking the name record to the International Plant Names Index, a bibliographic resource which will provide full original publication details for this name);
an assessment of the Confidence that The Plant List attaches to the Status of the name being Synonym.

Unresolved Name

Unresolved names are those to which it is not yet possible to assign a status of either ‘Accepted’ or ‘Synonym’. For an explanation of how names were assigned a status please refer to How The Plant List was Created. Unresolved names fall into two sub-classes:

Unassessed names

for which there is no evidence within any of the contributing data sources that the status of this name had been evaluated by the data owners. None had recorded that it was either ‘accepted’ or a ‘synonym’. None had recorded that they had attempted such an evaluation. Since, by definition, a name is accepted by the publishing author at the time of publication, it could be argued that all names are putatively Accepted until such time as they are demonstrated to be Synonyms.

Unplaced names

for which the data source supplying that record indicated positively that the data owners had sought to resolve its status and not been able to come to a conclusion so as to place it either in synonymy or as the accepted name of a new species. This is often the case if the name has insufficient description and no herbarium specimens are known.

Among Unresolved names, Unassessed names are much more numerous than Unplaced names.

It is also important to note that in a small number of cases the status ‘Unresolved’ was assigned to a name record during creation of The Plant List despite a taxonomic opinion having been recorded in the contributing data source. This occurs where following that opinion would have conflicted with opinions recorded in other data sources. To follow both would have resulted in inconsistencies within the working list of plants. In such cases:

the status of the record on this website is marked with an ‘*’ to indicate that it derives from the procedures used to build The Plant List and
the original status of the name (as recorded in the source database) is indicated on the details page for that name.

For each name with Status of ‘Unresolved’ The Plant List aims to provide:

the name;
the author(s) credited with publishing that name;
the place and date of original publication of the name where this was supplied;
a reference crediting the source database providing the name (with, where possible, a link to that record in the source database);
the IPNI identifier (linking the name record to the International Plant Names Index which will provide full original publication details for this name);
Unresolved names are generally flagged as ‘Low Confidence’ entries.

Misapplied Names

Some data sets which contributed to The Plant List record not only how plant names should be used but also where in the published literature a given name may previously have been used inappropriately (to refer erroneously to another species). Recording such misapplication of names helps users to avoid pitfalls when interpreting the literature. The decision that a record establishes the misuse of a name is derived from the cited data source (see How The Plant List was Created).

For each reported misapplication of a plant name we aim to provide:

the name;
the author(s) that published that name and wherever possible an indication of where or by whom this was misused (e.g. ‘sensu Smith’ may appear after the publishing author);
a link to the Accepted name of the species to which this name has been previously and erroneously applied;
a reference crediting the source database recording this misuse of the name (with a link to that record in the source database and hence the publication details of where this name was misapplied);
an assessment of the Confidence that The Plant List attaches to this name having been erroneously applied to the other species.

Annotation of names

Sources which contributed name records to The Plant List included, on relatively few occasions, additional information about individual names beyond their status as Accepted or Synonym. Where possible this information is retained within The Plant List and made visible to users as annotations attached to the relevant name record.

Invalid and Illegitimate Names

Some of the names in The Plant List were recorded by the contributing data sets to be either invalidly or illegitimately published according to the rules of the International Code of Botanical Nomenclature.

Spelling variants (or Orthographic variants)

Some data sources include names which are recorded as ‘Orthographic variants’ (or spelling variants) of another name. These misspelt names may not have been validly published, yet they are used in the literature and are therefore included in The Plant List to guide those who find them.

Confidence Levels

For each name record The Plant List offers an indication of the confidence that the Status of the name record is correct. Our confidence assessments are based primarily on the nature and taxonomic integrity of the source data.

High Confidence level

is applied to the Status of name records derived from taxonomic datasets which treat the whole of the taxonomic group in question on a global basis and have been peer reviewed (e.g. ILDIS, WCSP; see Collaborators).

Medium Confidence level

is applied to the Status of name records derived from:

Either national or regional databases via a rules-based automated process, reflecting the challenges inherent in resolving taxonomic differences between different name data sets for the same species for different geographic areas. Regional datasets used as sources for The Plant List are primarily those stored within Tropicos (see Collaborators for details).
Or taxonomic datasets which treat the whole of the taxonomic group in question on a global basis but which have not yet undergone peer review (e.g. GCC and WCSP (in review); see Collaborators).

Low Confidence level

is applied to the Status of:

name records derived from any of the contributing data sets which were recorded as unresolved in those data sets;
name records whose status has been inferred from (sometimes conflicting) information from more than one source database;
records derived from nomenclatural resources such as IPNI which do not contain opinions about the status of the name and which were assigned a status of Unresolved in The Plant List.
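The three confidence levels described above can be summarised as a small decision function. This is a simplified sketch, assuming each record carries a status and two flags describing its source; the function and parameter names are hypothetical and the real assignment may have considered more than is stated on this page.

```python
def confidence_level(status, global_coverage, peer_reviewed,
                     inferred_from_multiple_sources=False):
    """Return 'High', 'Medium' or 'Low' following the descriptions above.

    status: 'Accepted', 'Synonym' or 'Unresolved'
    global_coverage: the source treats the whole group on a global basis
    peer_reviewed: the source data set has been peer reviewed
    inferred_from_multiple_sources: the status was inferred by combining
        (possibly conflicting) information from more than one source
    """
    if status == "Unresolved" or inferred_from_multiple_sources:
        return "Low"
    if global_coverage and peer_reviewed:
        return "High"
    # National or regional sources, or global sources not yet peer reviewed.
    return "Medium"


print(confidence_level("Accepted", global_coverage=True, peer_reviewed=True))
print(confidence_level("Synonym", global_coverage=False, peer_reviewed=False))
```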

Contributing data sets

The data resources used to build Version 1 of The Plant List are listed here. We are grateful to the many collaborators listed below who made their data available.

We welcome offers of additional data sets for inclusion in future editions of The Plant List (see Contributions).

Global species resources

World Checklist of Selected Plant Families
This large database of global monographic treatments was supplied to The Plant List as two separate data sets which were treated slightly differently:
1. WCSP
  Peer reviewed treatments are available online for 151 Seed Plant families (view published families). WCSP gives information on the accepted scientific names and synonyms of selected plant families. It includes more than 320,000 names and allows the user to search for all the scientific names of a particular plant, or the areas of the world in which it grows (distribution). The data set draws on the collaboration over 16 years of 132 specialists from 25 countries who have contributed data or acted as reviewers.
2. WCSP (in review)
  In addition to the published family checklists, the World Checklist database contains data for many other families which have either been completed and await review by specialists or are still being compiled. The Plant List also incorporates these unpublished data, which include more than 290,000 additional names.
GrassBase – The Online World Grass Flora
The nomenclatural component of this database currently holds over 60,000 names and lists names for any given genus, geographical region or genus within a geographical region, and links to the GrassBase description for any species. The nomenclatural data from GrassBase is made available through the WCSP system.
The Global Compositae Checklist
is an integrated database of nomenclatural and taxonomic information for the second largest vascular plant family in the world. This checklist is published by the International Compositae Alliance and compiled from many contributed datasets. The database will be continually updated. Additional information such as references, distribution and infraspecific taxa is available on the website. All species are marked as ‘provisionally accepted names’ in the Beta version. The data set has not yet been fully peer-reviewed and may contain some errors. More than 100,000 records derived from The Global Compositae Checklist are included in The Plant List.
The International Legume Database and Information Service
is a long-term programme of co-operation among legume specialists worldwide to create a biodiversity database for the Leguminosae (Fabaceae) family. The database provides a taxonomic checklist plus basic factual data on distribution, common names, life-forms, uses, literature references to descriptions, illustrations and maps. More than 40,000 records derived from ILDIS are included in The Plant List.
The iPlants project
developed and tested the processes and procedures that would be required during production of an authoritative, global online list of plant names. The project was a collaboration between The Royal Botanic Gardens, Kew, the Missouri Botanical Garden and the New York Botanical Garden and was funded from April 2004 to May 2006 by the Gordon and Betty Moore Foundation. Checklists for the following families were made available for The Plant List: Bignoniaceae, Iridaceae, Lecythidaceae, Melanophyllaceae, Physenaceae, Sarcolaenaceae, Schlegeliaceae and Sphaerosepalaceae. More than 11,000 records derived from iPlants are included in The Plant List.
The International Organization for Plant Information
aims to provide a series of computerised databases summarizing taxonomic, biological, and other information on plants of the world. IOPI’s mission is to develop an efficient and effective means of providing basic plant information to users, and guide them toward sources of authoritative data. Their checklist currently holds over 200,000 names, from which The Plant List includes records for Juncaceae compiled by J. Kirschner (Institute of Botany, Pruhonice) (over 1,000 name records).
Missouri Botanical Garden
The Bryophyte information was primarily gathered from A Checklist of Mosses and ongoing projects dealing with mosses and liverworts to create World Checklists for these groups. Some liverwort names were not yet available from data sources but are expected to be added in future versions.

Floristic Datasets

Missouri Botanical Garden
Tropicos, the botanical information system at the Missouri Botanical Garden, contains information on over one million plant names and 3.9 million herbarium specimens. The system was developed through the actions of a wide variety of floristic, nomenclatural, and bibliographic projects both at the Garden and in collaboration with other institutions. All of this information is available on the Internet through the Garden’s web site.
Tropicos provides access to the accumulated data on vascular plants and bryophytes as authority files for the development of floras and checklists that provide a synthesis of local and regional vegetation. Included within each of these syntheses are indications of acceptance, synonymy and misapplication of names within a floristic region. This information was used to evaluate plant names from these regions for The Plant List.
The project data held by Tropicos and used in the development of The Plant List includes:
Information was also gleaned from recently published literature where acceptance or synonymy had been recorded in Tropicos.
More than 240,000 records derived from Tropicos were included in The Plant List.
Madagascan endemics
The iPlants project also provided a checklist for Madagascan endemics.

Plant nomenclatural resources

The International Plant Names Index
is a database of the names and associated basic bibliographical details of seed plants, ferns and fern allies. Its goal is to eliminate the need for repeated reference to primary sources for basic bibliographic information about plant names. The data are freely available and are gradually being standardised and checked. IPNI will be a dynamic resource, depending on direct contributions by all members of the botanical community. IPNI is the product of a collaboration between the Royal Botanic Gardens, Kew, the Harvard University Herbaria, and the Australian National Herbarium.
Uncompiled name data records derived from Kew’s checklist databases.
Uncompiled name data records from Missouri’s Tropicos database.

How The Plant List was Created

Development of The Plant List has been a collaborative venture coordinated at the Royal Botanic Gardens, Kew and Missouri Botanical Garden and relying on the generosity of many collaborators who manage significant taxonomic data resources. The purpose was to merge into a single consistent database the best of the nomenclatural information available in these diverse data resources through a defined and automated process. In summary, development of The Plant List involved merging many taxonomic data sources: taking the accepted name and synonymy relationships from those that were global checklist datasets, then augmenting these with additional names and synonymy relationships from regional and national floristic datasets following a set of decision rules. Species names not accounted for in any of the previously incorporated data sets were then added from nomenclatural resources, with the aim of making the list comprehensive for all plant names. Finally, a further set of rules was applied to the merged data set to resolve inconsistencies and conflicting or overlapping statuses, and to correct logical data errors.

The Sequence for Merging Data Sets

The starting point was the set of global peer reviewed family checklists published within the World Checklist of Selected Plant Families (WCSP). Families available through the WCSP from other sources including GrassBase, iPlants (Bignoniaceae, Iridaceae, Lecythidaceae, Melanophyllaceae, Physenaceae, Sarcolaenaceae, Schlegeliaceae and Sphaerosepalaceae) and IOPI (Juncaceae) were also included. To these were added additional global checklists from collaborating partners: The Global Compositae Checklist from the International Compositae Alliance and The International Legume Database and Information Service. Also incorporated were all of the compiled WCSP data records for families other than those which have been published (i.e. those in the process of being compiled or under peer review): WCSP (in review).

The second category of information sources was various national and regional checklists. Missouri Botanical Garden’s Tropicos system primarily provided data from nearly ten digital flora projects. Each of these national or regional floras or checklists was created at a different time by a different team of botanists and considers only plant specimens found within that area’s borders. Thus these floras/checklists contain different subsets of plants (and plant names) and record conflicting opinions as to which are the accepted names for particular plants or what are their synonyms. In building The Plant List, therefore, a significant task was to automate procedures to trawl each of these different data sets to locate new information that they might contain about names and synonymy, then to detect and resolve conflicting opinions among these data sets and to add this additional information to the merged data set. A set of decision rules was employed to differentiate between and select from among the diverse opinions expressed within these national and regional data sets.

Finally, there were many scientific plant names (recorded in IPNI or included in Tropicos or WCSP as uncompiled records) that had not been included in any of the data sets consulted up to that point. The combined set of global and regional data was therefore compared with the IPNI database to detect names missing from our merged data set so that they could be added to our final product. Names derived from IPNI (and other nomenclatural data sets consulted) were included as ‘Unresolved’, since these data sets did not record whether they were the Accepted name for a new plant (not yet in the merged data set) or Synonyms of plants already in the merged list.
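This last step amounts to a set difference followed by a default status. A minimal sketch, assuming the merged data set is a mapping from name to status and the nomenclator is a plain list of names (both hypothetical representations; `add_unresolved_names` is an illustrative name only):

```python
def add_unresolved_names(merged, nomenclator_names):
    """Return a copy of `merged` with any missing nomenclator names added.

    Names already present keep the status derived from the checklist data;
    names found only in the nomenclator are given the status 'Unresolved',
    because a nomenclator records no opinion on acceptance or synonymy.
    """
    result = dict(merged)
    for name in nomenclator_names:
        if name not in result:
            result[name] = "Unresolved"
    return result


merged = {"Name one": "Accepted", "Name two": "Synonym"}
final = add_unresolved_names(merged, ["Name two", "Name three"])
print(final["Name two"], final["Name three"])  # prints Synonym Unresolved
```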

A significant component of this and later phases of the creation of The Plant List involved the matching of names between different data sets to identify whether a name was unique to one data set or included in multiple data sets. A variety of algorithms were employed to perform name matching at different stages depending upon the requirements at that stage in the process.
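The matching algorithms themselves are not described here. Purely as an illustration of the idea, the sketch below compares two lists of names after a simple normalisation (case, punctuation and spacing) and reports which names are shared and which are unique to one list. The normalisation rules and the function names are assumptions made for this example; real name matching would also need to handle author citations and spelling variants.

```python
import re


def normalise(name):
    """Lower-case a name, drop punctuation and collapse runs of whitespace.

    Note that hyphens inside epithets are removed as well, which is
    harmless here because both data sets are normalised the same way.
    """
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(cleaned.split())


def compare_name_sets(names_a, names_b):
    """Classify normalised names as shared or unique to one data set."""
    keys_a = {normalise(n) for n in names_a}
    keys_b = {normalise(n) for n in names_b}
    return {
        "shared": keys_a & keys_b,
        "only_a": keys_a - keys_b,
        "only_b": keys_b - keys_a,
    }


result = compare_name_sets(
    ["Poa annua", "Quercus robur"],
    ["poa  annua.", "Pinus sylvestris"],
)
print(sorted(result["shared"]))  # prints ['poa annua']
```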

Derivation of Name Status

The procedures used to build The Plant List were designed to follow the taxonomic opinions recorded within the contributing data sets. Where necessary these procedures selected from among alternative and conflicting opinions recorded between data sets so as to achieve a coherent taxonomic consensus.

Consistent application of the decision rules allowed resolution of most instances of conflict between data sources, so that most species names can be clearly established as either an accepted name or a synonym with reference to the data source in which that status is recorded. It is important to note that the set of synonyms which point to a given accepted name in The Plant List may have originated from more than one data source, i.e. some synonyms for a given species may derive from a data set other than that from which the accepted name record derived.

Approximately 98% of all Status values within The Plant List derive directly from the data source which supplied that name record.

The Status of the remaining 2% of name records in The Plant List has been modified from that stored in the source data set as a result of the conflict resolution processes. Such changes were made only where necessary to avoid illogical conflicts detected within the data sets supplied or within the merged data set (i.e. were made to improve the consistency of The Plant List). Where such changes were made, these were primarily to downgrade name records recorded as having a status of ‘Accepted’ in the source database to having a status of ‘Unresolved’ in The Plant List.

Any name records whose status was modified during the creation of The Plant List are labelled (using an asterisk) and the original status in the data source is also indicated. The Confidence level of any record modified by these procedures was set to ‘Low Confidence’.
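The bookkeeping described in this section (new status, retained original status, asterisk label and lowered confidence) can be illustrated as follows. The record layout and function names are hypothetical and chosen only for this sketch.

```python
def downgrade_to_unresolved(record):
    """Return a copy of `record` downgraded to 'Unresolved'.

    The status recorded in the source is kept under 'original_status',
    the record is flagged as modified and its confidence is set to 'Low'.
    """
    return {
        **record,
        "status": "Unresolved",
        "original_status": record["status"],
        "modified": True,
        "confidence": "Low",
    }


def display_status(record):
    """Append an asterisk to the status of records modified during merging."""
    return record["status"] + ("*" if record.get("modified") else "")


source_record = {"name": "Example name", "status": "Accepted", "confidence": "Medium"}
changed = downgrade_to_unresolved(source_record)
print(display_status(changed), changed["original_status"])  # prints Unresolved* Accepted
```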

Decision Rules to arbitrate between Conflicts of Opinion

A set of decision rules was employed to differentiate between and select from among diverse opinions expressed within all of the data sources consulted. These rules were developed by the team at Kew and Missouri in an attempt to mimic the sort of decision-making rationale a botanist might use when he/she encounters conflict between taxonomic treatments in the literature but is not in a position to resolve the question by examining the original material. For example:

monographic treatments which consider the group in question in its entirety throughout its distribution are given priority over geographically defined treatments which can result in a single species being treated under different names in different parts of its range;
synonym relationships reported in more recent treatments are given priority over those published earlier;
publication dates are used to assist in detecting likely illegitimate names;
author details are used to detect likely orthographic variants (alternative spellings of the same name);
the decision rules are informed by the principles embedded in the International Code of Botanical Nomenclature.
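The first two rules listed above can be sketched as a simple ranking. This is a minimal illustration only: the Opinion record, its field names and the function preferred_opinion are hypothetical, and applying treatment scope before publication year is an assumption, since the relative precedence of the rules is not stated here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Opinion:
    """Hypothetical record of one data source's view of a name."""
    name: str
    status: str   # 'Accepted' or 'Synonym'
    source: str
    scope: str    # 'global' (monographic) or 'regional' (geographically defined)
    year: int     # year of the treatment


def preferred_opinion(opinions: Iterable[Opinion]) -> Opinion:
    """Pick one opinion: monographic treatments outrank regional ones,
    and among equals the more recent treatment wins.
    The order in which the two rules apply is an assumption."""
    return max(opinions, key=lambda o: (o.scope == "global", o.year))
```

For example, a global monograph from 1995 that treats a name as a synonym would be preferred here over a 2005 regional flora that accepts it.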

Data analysis of logical inconsistencies and data integrity issues

The data set created by merging records from the various data sources as described above was found initially to be inconsistent and logically incongruous for a variety of reasons.

Each of the taxonomic data sets incorporated into The Plant List is itself still being developed and improved upon by its owners and editors. None can therefore be considered complete or entirely up to date. Nor would their owners claim that these data sets were free of inconsistencies, gaps or data errors. Furthermore, these databases use terminology in different ways, which necessitated some level of standardisation. Some contained fossil plant names or names at taxonomic ranks that are not intended to be included in The Plant List and yet which, nevertheless, might link to names in the merged data set. Careful filtering of the record set was needed.

Inevitably, the process of bringing many different data sets together added a further layer of complexity. For example, it is not straightforward to automate reliable recognition of a particular Latin binomial across different data sets: the plant name authors may have been cited or abbreviated differently, subtle differences in spelling and punctuation occur between the data sources, and not all of them included the place of publication of a name to help resolve suspected matches. This added a degree of uncertainty even before other complexities, such as those surrounding homonyms and misapplied names, were dealt with. As a result, in certain circumstances the procedures created a merged data set in which a few names were used inconsistently, based upon records derived from different sources.
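The matching difficulty described above can be illustrated with a minimal sketch that builds a comparison key tolerant of differences in case, spacing, punctuation and diacritics. The function names normalise and match_key are hypothetical, and this is not the matching procedure used for The Plant List. It deliberately does not handle different abbreviations of the same author (for example ‘L.’ against ‘Linn.’), which would need a reference list of author abbreviations.

```python
import re
import unicodedata


def normalise(text: str) -> str:
    """Lower-case, strip diacritics and reduce punctuation to single spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", no_marks.lower()).strip()


def match_key(binomial: str, author: str) -> tuple:
    """Build a key under which trivially different citations compare equal.
    Spaces inside the author string are dropped so that 'Hook. f.' and
    'Hook.f.' produce the same key."""
    return (normalise(binomial), normalise(author).replace(" ", ""))
```

Two records whose keys are equal are only candidate matches; homonyms published by different authors keep distinct keys because the author string is part of the key.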

The goal of The Plant List project is to create a single internally coherent view rather than a set of alternative views. The final stage of development of The Plant List therefore involved rigorous logical analysis of the data set. Steps were taken, for example, to identify likely duplicates used in different senses, to detect where a number of Synonyms link one to another but lack any link to an Accepted name, where illegitimate names are assigned Accepted status or where a subspecies included in the dataset occurs within a species which itself does not occur.
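Two of the checks mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the analysis actually run: the Record layout, the field names points_to and parent_species, and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    """Hypothetical merged name record; field names are illustrative."""
    name: str
    status: str                           # 'Accepted', 'Synonym' or 'Unresolved'
    points_to: Optional[str] = None       # for a Synonym: the name it links to
    parent_species: Optional[str] = None  # for a subspecies: its species name


def synonyms_without_accepted(records: Dict[str, Record]) -> List[str]:
    """Synonyms whose chain of links never reaches an Accepted name
    (dead ends, missing targets or cycles among synonyms)."""
    dangling = []
    for rec in records.values():
        if rec.status != "Synonym":
            continue
        seen = set()
        current: Optional[Record] = rec
        # follow synonym-to-synonym links, stopping on a repeat (cycle)
        while (current is not None and current.status == "Synonym"
               and current.name not in seen):
            seen.add(current.name)
            current = records.get(current.points_to) if current.points_to else None
        if current is None or current.status != "Accepted":
            dangling.append(rec.name)
    return sorted(dangling)


def subspecies_without_species(records: Dict[str, Record]) -> List[str]:
    """Subspecies whose parent species is absent from the data set."""
    return sorted(
        r.name for r in records.values()
        if r.parent_species is not None and r.parent_species not in records
    )
```

Records flagged by checks of this kind would then be candidates for the resolution steps described in the next section.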

Resolution of logical inconsistencies and data integrity issues

For each kind of data inconsistency detected, solutions were derived based upon the concepts and principles outlined above and used in the previous stages. Additional decision rules were created and new automated steps introduced to perform the following actions on the merged data set:

Standardisation of terminology
Standardisation, Selection and Filtering of name records
Deduplication of names
Resolving referential integrity regarding linkages among synonyms.
Resolving referential integrity regarding taxonomic relationships.
Standardisation of the names of Families and Major Groups so as to create the taxonomic hierarchy necessary to support browsing of The Plant List.

Online Publication of The Plant List

Target 1 of the GSPC was to achieve a "widely accessible" working list of all known plant species. To accomplish that aspect of Target 1, this website was created to enable world-wide access to the working list. The final merged and resolved data set of all plant species is accessible through the search and browse features offered here.

Next Steps

As a result of the data analysis and conflict resolution steps described above, it is now intended to provide detailed feedback to each of the collaborators that contributed datasets, providing them with enriched data records, information on inconsistencies detected and comparisons with other relevant data sets. Details of the data processing entailed in creating The Plant List are to be published for broader discussion. Interest in the process and suggestions for refinements to the decision rules are welcome.

The project team

The Plant List owes its origins to a three-day workshop at Missouri Botanical Garden in May 2008. Bob Allkin, Eimear Nic Lughadha and Alan Paton (Kew) joined Bob Magill and Chuck Miller (MO) to plan how existing resources could best be combined to produce a best efforts working list to meet the 2010 deadline. The principles underlying our approach were agreed at that time, along with many of the decision rules and initial drafts of workflows for the data processing required. Translating that initial plan into action and refining the process to improve the product involved many more people over many months, with datasets, e-mails and occasionally people moving back and forward between Kew and St Louis.

Contributors working at Kew

Bob Allkin – Project Manager
Abigail Barker – Applications Development Manager
Matthew Blissett – Lead Developer Web Application
Charlotte Couch – Support Families and Genera Index
Paramjit Dhaliwal – IT Operations team
Jeff Eden – Graphic Designer
Rafaël Govaerts – Editor of the World Checklist of Selected Plant Families
Graham Hawkes – Developer responsible for The Plant List data and procedures
Chris Hopkins – Developer Web Application
Eimear Nic Lughadha – Senior Responsible Owner
Nicky Nicolson – Developer responsible for IPNI
Alan Paton – Assistant Keeper, Herbarium
John Stone – Graphic Designer
Julius Welby – Data administration
Ian Wright – IT Operations team leader

Contributors working at Missouri

Bob Magill – Senior Vice President of Science & Conservation
Chuck Miller – Vice President of Information Technology
Chris Freeland – Director of Center for Biodiversity Informatics
Jay Paige – Application and Database Developer
Heather Stimmel – Application and Database Developer
Craig Geil – Application and Database Developer

Enhancing The Plant List

In the future we envisage producing subsequent versions of The Plant List at regular intervals. Subsequent versions could:

merge improved and extended versions of the data sets used to create Version 1. As the custodians of the source datasets enhance their data sets through their own planned additions and corrections, their improvements will feed into subsequent versions of The Plant List. We are committed to supplying feedback to the owners of each data set as this arises from the use and creation of The Plant List.
include additional data sets which were unavailable in Version 1. If you are interested in making a future contribution, please contact contributors@theplantlist.org.
reflect enhancements in the procedures that were used to create The Plant List: e.g. for locating duplicate name records, for resolving inconsistencies and for detecting conflicting opinions expressed within alternative data sets and then for selecting from among those opinions (see How The Plant List was created).

Our immediate priorities following the December 2010 launch are to:

provide feedback to the collaborators who contributed datasets
complete documentation of the decision rules for publication
plan future versions of The Plant List

Likely priorities for future work include:

increasing the number of data contributors, especially for South East Asia, which is poorly represented in Version 1
adding links to country level geography
providing unique identifiers.

Relationships to other resources

Of the data resources that were used to create The Plant List, many of the previously published global monographic datasets are also available through the Catalogue of Life, which provides peer reviewed information for many plant families.

The Plant List goes beyond the scope of Catalogue of Life by also including global treatments which are in peer review or have not yet been published and by seeking to complete the list of all species names by filling the gaps using further digital resources including regional and national floras and nomenclatural databases. The Plant List thus aims to be comprehensive in coverage at species level for all names of mosses and liverworts, flowering plants, conifers, cycads and their allies and the ferns and their allies. It is nevertheless a work in progress. The Confidence in the Status assigned to each name record is indicated.
