This function calculates the distances between trees whose tips belong to the same categories but are not necessarily identically labelled.

relatedTreeDist(trees, df, checkTrees = TRUE)

Arguments

trees

a list of trees or a multiPhylo object

df

a data frame specifying the category to which each individual (across all the trees) belongs. Each row gives a category (column 1) and an individual belonging to it (column 2).
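As an illustration of the layout described above, the following minimal sketch builds such a table in base R. The column names `category` and `individual` and the `<category>_<index>` label pattern are assumptions made for this example; the description above only fixes the column positions.

```r
# Hypothetical labels of the form <category>_<index>, four individuals per category.
categories  <- rep(letters[1:5], each = 4)
individuals <- paste0(categories, "_", rep(1:4, times = 5))

# Column 1 holds the category, column 2 the individual.
# The column names are illustrative only.
df <- data.frame(category = categories,
                 individual = individuals,
                 stringsAsFactors = FALSE)
head(df)
```

Each individual should appear in exactly one row, so that its category is unambiguous.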

checkTrees

a logical (default TRUE) specifying whether the trees should be checked. When TRUE, error messages help locate problematic trees, that is, any trees with repeated tip labels or with missing categories.
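To make the two problems named above concrete, here is a rough base R sketch of how they could be detected for a single tree. The helper `findTreeProblems` and the stand-in inputs are hypothetical and are not part of the package; how the function performs its checks internally is not documented here.

```r
# Hypothetical helper (not part of the package): lists the problems found in one tree.
findTreeProblems <- function(tree, df) {
  tips <- tree$tip.label
  problems <- character(0)

  # Problem 1: the same tip label appears more than once.
  dup <- unique(tips[duplicated(tips)])
  if (length(dup) > 0) {
    problems <- c(problems,
                  paste("repeated tip labels:", paste(dup, collapse = ", ")))
  }

  # Problem 2: some category has no tip in this tree.
  # Map each tip (column 2) to its category (column 1).
  tipCats <- df[match(tips, df[, 2]), 1]
  missingCats <- setdiff(unique(df[, 1]), tipCats)
  if (length(missingCats) > 0) {
    problems <- c(problems,
                  paste("missing categories:", paste(missingCats, collapse = ", ")))
  }
  problems
}

# Stand-in inputs: only the tip.label element of a tree matters for this check.
dfSmall <- cbind(c("a", "a", "b", "b"), c("a_1", "a_2", "b_1", "b_2"))
good <- list(tip.label = c("a_1", "b_1", "b_2"))
bad  <- list(tip.label = c("a_1", "a_1", "a_2"))

findTreeProblems(good, dfSmall)
findTreeProblems(bad, dfSmall)
```

Running a check of this kind over every tree in turn is one way to find out which tree in a list is at fault.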

Examples

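library(ape)        # provides rtree()
library(treespace)  # provides simulateIndTree() and relatedTreeDist()

# We will simulate some trees as an example, each "based" on the same tree.
# rtree() draws a random tree, so the distances printed below will vary between runs.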
baseTree <- rtree(5)
baseTree$tip.label <- letters[5:1]
plot(baseTree)


tree1 <- simulateIndTree(baseTree, itips=3, permuteTips=FALSE)
tree2 <- simulateIndTree(baseTree, itips=4, permuteTips=FALSE)
tree3 <- simulateIndTree(baseTree, itips=4, permuteTips=TRUE, tipPercent=20)
tree4 <- simulateIndTree(baseTree, itips=4, permuteTips=TRUE, tipPercent=60)
tree5 <- simulateIndTree(baseTree, itips=4, permuteTips=TRUE, tipPercent=100)
# combine:
trees <- list(tree1,tree2,tree3,tree4,tree5)

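# Column 1: category; column 2: individual. Note that cbind() returns a character matrix here.
df <- cbind(sort(rep(letters[1:5], 4)), sort(paste0(letters[1:5], "_", rep(1:4, 5))))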
head(df)
#>      [,1] [,2] 
#> [1,] "a"  "a_1"
#> [2,] "a"  "a_2"
#> [3,] "a"  "a_3"
#> [4,] "a"  "a_4"
#> [5,] "b"  "b_1"
#> [6,] "b"  "b_2"

# Find distances:
relatedTreeDist(trees,df)
#>          1        2        3        4
#> 2 0.000000                           
#> 3 1.119780 1.119780                  
#> 4 1.756684 1.756684 1.737680         
#> 5 2.501562 2.501562 1.991192 1.530931

# Note that trees 1 and 2 have different numbers of tips, but the relationships between those tips
# are identical at the category level, hence the related tree distance is 0.
# The distances from trees 1 and 2 increase with the percentage of tips permuted (trees 3 to 5).