Names on Nodes: Phylogenetic Query Script (Rough Draft)

T. Michael Keesey

P.O. Box 292304 Los Angeles, CA, USA 90027; keesey@gmail.com

Abstract

The MathML Definitions document shows how MathML may be used to model phylogenetic hypotheses and phylogenetic definitions. Since MathML is verbose, it may be preferable in some instances to have a more succinct scripting language with the same functionality. I have created a plain-text version of the mathematical markup specified by the MathML Definitions document.

Operators and Identifiers

Category Description Formula Notes
General equality
entity1 = entity2
It may be desirable to use == instead (as in C and many other computer languages).
inequality
entity1 != entity2
Also used for exclusive disjunction ("xor"). This operator is borrowed from C.
clause
(entity)
conditional statement
proposition ? entity1 : entity2
Evaluates to entity1 if proposition is true, or entity2 if proposition is false. This operator is borrowed from C.
Constants constant name
"name"
Internal quotes may be "escaped", e.g., "\"Iguanodon\" hoggi". Possibly single quotes (') should be allowed as well, or no quotes for names without whitespace.
declaration
"name" := entity.
integer
digits
Base 10. Non-integers and negative numbers are not required, so no method is provided for denoting them.
Set Theory extensional set
{entity1, entity2 …}
empty set
{}
union
set1 | set2 …
The character ∪ would be preferable, but it is not an ASCII character.
intersection
set1 & set2 …
The character ∩ would be preferable, but it is not an ASCII character.
difference
set1 - set2
Some mathematical texts use "\", so this may be preferable.
set membership
entity in set
The character ∈ would be preferable, but it is not an ASCII character.
subset
set1 <= set2
The character ⊆ would be preferable, but it is not an ASCII character.
proper subset
set1 < set2
The character ⊂ (or ⊊) would be preferable, but it is not an ASCII character.
superset
set1 >= set2
The character ⊇ would be preferable, but it is not an ASCII character.
proper superset
set1 > set2
The character ⊃ (or ⊋) would be preferable, but it is not an ASCII character.
Ordered Lists extensional list
[entity1, entity2 …]
list element selector
list_index
This notation is somewhat unusual. Other languages use brackets (list[index]), but using underscores allows a clearer distinction between element selection and extensional declaration (previous item), and relates better to common mathematical notation (which uses subscripts).
Boolean Logic true
true
false
false
negation
!proposition
This operator is borrowed from C. The character ¬ would be preferable, but it is not an ASCII character. (Possibly "not" should be used or allowed?)
conjunction
proposition1 && proposition2
This operator is borrowed from C. The character ∧ would be preferable, but it is not an ASCII character. (Possibly "and" should be used or allowed?)
disjunction (inclusive)
proposition1 || proposition2
This operator is borrowed from C. The character ∨ would be preferable, but it is not an ASCII character. (Possibly "or" should be used or allowed?)
Functions application
function(entity1, entity2 …)
composition
function1 * function2
This is an unorthodox usage of this character. The character ∘ would be preferable, but it is not an ASCII character.
Phylogeny phylogenetic graph
P
universal taxon
U
maximal members
max(set)
minimal members
min(set)
predecessor union
prc|(set)
predecessor intersection
prc&(set)
successor union
suc|(set)
successor intersection
suc&(set)
exclusive predecessors
set1 <- set2
set1 is the internal set; set2 is the external set.
synapomorphic predecessors
set1 @ set2
set1 is the apomorphic set; set2 is the representative set.
clade
clade(set)
If the minimal members of set form a cladogen (a clade ancestor), then this is equivalent to suc|(set). Otherwise, it is equivalent to (suc| * max * prc&)(set).
node-based clade
clade(set1 | set2 …)
or
(suc| * max * prc&)(set1 | set2 …)
branch-based clade
clade(set1 <- set2)
or
suc|(set1 <- set2)
set1 is the internal set; set2 is the external set.
apomorphy-based clade
clade(set1 @ set2)
or
suc|(set1 @ set2)
set1 is the apomorphic set; set2 is the representative set.
crown clade
crown(set1, set2)
set1 is the bounding set; set2 is the set of extant organisms.
total clade
total(set1, set2)
set1 is the internal set; set2 is the set of extant organisms.
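
The phylogeny operators in this table can be read as ordinary set computations over a directed acyclic graph. The Python sketch below is one possible reading, not a reference implementation. It assumes that predecessor and successor relations are inclusive (each unit counts as its own predecessor and successor), that max(set) keeps the members with no other member of the set among their successors, that min(set) keeps the members with no other member among their predecessors, and that clade(set) always takes the general form (suc| * max * prc&)(set). The function names and the ARCS table are hypothetical; the graph has the same shape as the Aves context in the Examples section.

```python
# Hypothetical arcs (parent, child), shaped like the Aves example context.
ARCS = {
    ("Aves*", "Vultur gryphus"),
    ("Aves*", "Palaeognathae*"),
    ("Palaeognathae*", "Struthio camelus"),
    ("Palaeognathae*", "Tetrao major"),
}


def _reach(unit, forward):
    """Units reachable from `unit` along arcs, including `unit` itself.

    Inclusiveness is an assumption of this sketch.
    """
    seen, stack = {unit}, [unit]
    while stack:
        current = stack.pop()
        for parent, child in ARCS:
            source, target = (parent, child) if forward else (child, parent)
            if source == current and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def prc_union(units):
    """prc|(set): units that precede at least one member."""
    return set().union(*(_reach(u, forward=False) for u in units))


def prc_intersection(units):
    """prc&(set): units that precede every member.

    Returns the empty set for empty input (a simplification of this sketch).
    """
    sets = [_reach(u, forward=False) for u in units]
    return set.intersection(*sets) if sets else set()


def suc_union(units):
    """suc|(set): units that succeed at least one member."""
    return set().union(*(_reach(u, forward=True) for u in units))


def maximal(units):
    """max(set): members with no other member among their successors."""
    units = set(units)
    return {u for u in units if (_reach(u, forward=True) & units) == {u}}


def minimal(units):
    """min(set): members with no other member among their predecessors."""
    units = set(units)
    return {u for u in units if (_reach(u, forward=False) & units) == {u}}


def clade(units):
    """(suc| * max * prc&)(set): prc& is applied first, then max, then suc|."""
    return suc_union(maximal(prc_intersection(units)))
```

Under these assumptions, clade applied to Struthio camelus and Tetrao major yields Palaeognathae* and its two successors, while adding Vultur gryphus to the specifiers yields every unit in the graph.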

Examples

Formula Prose or Diagram Notes
P :=
[
	{
		"Aves*", 
		"Palaeognathae*",
		"Struthio camelus",
		"Tetrao major",
		"Vultur gryphus"
	},
	{
		["Aves*", "Vultur gryphus"],
		["Aves*", "Palaeognathae*"],
		["Palaeognathae*", "Struthio camelus"],
		["Palaeognathae*", "Tetrao major"]
	}
].
  • Aves
    • Vultur gryphus
    • Palaeognathae
      • Struthio camelus
      • Tetrao major
This defines a simple phylogenetic context (a directed, acyclic graph where vertices are taxonomic units and arcs represent immediate descent).
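
The declaration of P as a two-element list (a set of vertices followed by a set of arcs) maps directly onto ordinary data structures. As an illustration only, the sketch below mirrors this example context in Python and checks the two properties stated in the note: every arc joins declared vertices, and the graph is acyclic. The names VERTICES, ARCS, and is_valid_context are hypothetical, and the cycle check relies on graphlib from the Python standard library (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter

# The example context, written as a vertex set and an arc set of
# (predecessor, successor) pairs.
VERTICES = {
    "Aves*",
    "Palaeognathae*",
    "Struthio camelus",
    "Tetrao major",
    "Vultur gryphus",
}
ARCS = {
    ("Aves*", "Vultur gryphus"),
    ("Aves*", "Palaeognathae*"),
    ("Palaeognathae*", "Struthio camelus"),
    ("Palaeognathae*", "Tetrao major"),
}


def is_valid_context(vertices, arcs):
    """Return True if every arc joins declared vertices and no cycle exists."""
    for parent, child in arcs:
        if parent not in vertices or child not in vertices:
            return False
    sorter = TopologicalSorter()
    for vertex in vertices:
        sorter.add(vertex)
    for parent, child in arcs:
        # The successor is registered as depending on its predecessor.
        sorter.add(child, parent)
    try:
        sorter.prepare()  # raises CycleError if the arcs contain a cycle
    except CycleError:
        return False
    return True
```

With these definitions, is_valid_context(VERTICES, ARCS) holds for the context shown, and it fails if an arc from "Tetrao major" back to "Aves*" is added.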
"Tinamus major" := "Tetrao major".
Tinamus major is Tetrao major. These are objective synonyms under the zoological code.
"Aves" := clade("Struthio camelus" | "Tetrao major"
                | "Vultur gryphus").
Aves is all successors of the maximal common predecessors of Struthio camelus, Tetrao major, and Vultur gryphus.
"Saurischia" := clade("Megalosaurus bucklandii"
                <- "Iguanodon bernissartensis").
Saurischia is all successors of the (common) predecessors of Megalosaurus bucklandii exclusive of all predecessors of Iguanodon bernissartensis.
"Avialae" := clade("wings used for powered flight"
                   @ "Vultur gryphus").
Avialae is all successors of the predecessors of Vultur gryphus to share wings used for powered flight synapomorphically with Vultur gryphus.
"Aves" = crown("Avialae", "extant")
       = crown("Saurischia", "extant")
Aves is equivalent to the avialan crown clade and the saurischian crown clade.
"Pan-Aves" := total("Aves", "extant").
Pan-Aves is the avian total clade.
"Avemetatarsalia" := clade("Aves" <- 
                     "Crocodylus niloticus").
Avemetatarsalia is all successors of the (common) predecessors of Aves exclusive of all predecessors of Crocodylus niloticus.
"Pan-Aves" = "Avemetatarsalia"
Pan-Aves is equivalent to Avemetatarsalia.
"Ichthyornithes" := clade("YPM-VP 1450" 
                 <- "Struthio camelus"
                    | "Tetrao major"
                    | "Vultur gryphus").
Ichthyornithes is all successors of the (common) predecessors of the organism represented by YPM-VP 1450 exclusive of all predecessors of Struthio camelus, Tetrao major, and/or Vultur gryphus. YPM-VP 1450 is the Ichthyornis dispar holotype specimen.
"Ichthyornis" := clade("Ichthyornithes"
                 & ("apomorphy 2"
                    | "apomorphy 5"
                    | "apomorphy 6"
                    | "apomorphy 7" 
                    | "apomorphy 8" 
                    @ "YPM-VP 1450")).
Ichthyornis is all successors of all ichthyornithean predecessors of the organism represented by YPM-VP 1450 to share apomorphies 2, 5, 6, 7, and 8 synapomorphically with the organism represented by YPM-VP 1450. The numbers refer to apomorphies described by Clarke (2004).
"Pan-Biota" :=
(clade * prc&)("Homo sapiens").
Pan-Biota is all successors of all (common) predecessors of Homo sapiens.
"Biota" := crown("Pan-Biota", "extant").
Biota is all successors of the maximal common predecessors of all extant members of Pan-Biota.
"S" := "Otaria byronia" | "Odobenus rosmarus"
       | "Phoca vitulina".
"Pinnipedia" :=
             (max * prc&)("S") <= ("flippers" @ "S")
             ? clade("S") : {}.
If the maximal common predecessors of the specifiers (Otaria byronia, Odobenus rosmarus, and Phoca vitulina) possessed flippers synapomorphic with those of the specifiers, then Pinnipedia is all successors of the maximal common predecessors of the specifiers. Otherwise, Pinnipedia is empty.
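
To make the prose readings in this table concrete, the following Python sketch evaluates a branch-based clade and a crown clade on a small, entirely hypothetical graph. The unit names ("root", "stem", "fossil", and so on) and the EXTANT set are invented for illustration and are not real taxa. The sketch assumes inclusive predecessors and successors, reads set1 <- set2 as prc&(set1) - prc|(set2), following the prose for Saurischia and Ichthyornithes, and reads crown(set1, set2) as clade(set1 & set2), following the prose for Biota. For branch-based clades it uses the suc|(set1 <- set2) form given in the operator table.

```python
# Hypothetical graph: each unit maps to the set of its immediate predecessors.
PARENTS = {
    "root": set(),
    "stem": {"root"},
    "outgroup": {"root"},
    "node": {"stem"},
    "fossil": {"stem"},
    "living1": {"node"},
    "living2": {"node"},
}
EXTANT = {"outgroup", "living1", "living2"}  # hypothetical set of extant units


def predecessors(unit):
    """All predecessors of `unit`, including itself (an assumption)."""
    seen, stack = {unit}, [unit]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def successors(unit):
    """All successors of `unit`, including itself (an assumption)."""
    return {other for other in PARENTS if unit in predecessors(other)}


def prc_union(units):
    return set().union(*(predecessors(u) for u in units))


def prc_intersection(units):
    sets = [predecessors(u) for u in units]
    return set.intersection(*sets) if sets else set()


def suc_union(units):
    return set().union(*(successors(u) for u in units))


def maximal(units):
    units = set(units)
    return {u for u in units if (successors(u) & units) == {u}}


def exclusive_predecessors(internal, external):
    """set1 <- set2, read as prc&(set1) - prc|(set2)."""
    return prc_intersection(internal) - prc_union(external)


def branch_clade(internal, external):
    """suc|(set1 <- set2)."""
    return suc_union(exclusive_predecessors(internal, external))


def node_clade(units):
    """(suc| * max * prc&)(set)."""
    return suc_union(maximal(prc_intersection(units)))


def crown(bounding, extant):
    """crown(set1, set2), read as the node-based clade of set1 & set2."""
    return node_clade(set(bounding) & set(extant))
```

On this graph, branch_clade({"living1"}, {"outgroup"}) contains stem, node, fossil, living1, and living2, and the crown of that set with respect to EXTANT contains only node, living1, and living2, which leaves out the extinct stem units.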