
Introduction

In this vignette, we describe each evaluation metric implemented in poem. For each metric, we give the minimum level at which it can be calculated, its full name, and how it is calculated. For more details, please refer to our manuscript.

Partition-based metrics

Partition-based metrics. The notation used is common throughout the table: consider comparing the predicted partition $P$ to the ground-truth partition $G$; $a$ is the number of pairs that are in the same group in both $P$ and $G$; $b$ is the number of pairs that are in the same class in $G$ but in different clusters in $P$; $c$ is the number of pairs that are in different classes in $G$ but in the same cluster in $P$; $d$ is the number of pairs that are in different groups in both $P$ and $G$; $n$ is the total number of objects; $\mathrm{E}$ is the expectation operator; $H(\cdot)$ is the Shannon entropy; $\beta$ is the ratio of weight attributed to homogeneity vs completeness; the expected values of RI, WH, and WC are calculated assuming a generalized hypergeometric model.
| Min_level | Metric | Calculation |
|---|---|---|
| dataset | Rand Index (RI) | $\frac{a+d}{n(n-1)/2}$; the ratio of the sum of true positive and true negative pairs to the total number of object pairs. |
| class/cluster | Wallace Homogeneity (WH) | $\frac{a}{a+c}$; the ratio of the true positive pairs to the total number of object pairs that are in the same cluster in $P$. |
| class/cluster | Wallace Completeness (WC) | $\frac{a}{a+b}$; the ratio of the true positive pairs to the total number of object pairs that are in the same class in $G$. |
| dataset | Adjusted Rand Index (ARI) | $\frac{\text{RI}-\mathrm{E}(\text{RI})}{1-\mathrm{E}(\text{RI})} = \frac{2(ad-bc)}{(a+b)(b+d)+(a+c)(c+d)}$; adjusts RI for the similarity of pairings expected by chance, using the permutation model for clusterings. ARI is the harmonic mean of AWH and AWC. |
| dataset | Normalized Class Size Rand Index (NCR) | A normalized version of RI, in which each concordance quantity is divided by the maximum possible concordance value for its respective class. |
| dataset | Mutual Information (MI) | $H(G) - H(G \mid P)$; the difference between the Shannon entropy of $G$ and the conditional entropy of $G$ given $P$. |
| class/cluster | Adjusted Wallace Homogeneity (AWH), Adjusted Wallace Completeness (AWC), and Adjusted Mutual Information (AMI) | Chance-adjusted versions of WH, WC and MI, respectively. For a metric $\text{M}$, the chance-adjusted version is $\frac{\text{M}-\mathrm{E}(\text{M})}{1-\mathrm{E}(\text{M})}$. |
| dataset | (Entropy-based) Homogeneity (EH) | $1-\frac{H(G \mid P)}{H(G)}$ if $H(G,P)\neq 0$, $1$ otherwise; the ratio of MI to the individual entropy of $G$. |
| dataset | (Entropy-based) Completeness (EC) | $1-\frac{H(P \mid G)}{H(P)}$ if $H(P,G)\neq 0$, $1$ otherwise; the ratio of MI to the individual entropy of $P$. |
| class/cluster | V Measure (VM) | $\frac{(1+\beta)\times\text{EH}\times\text{EC}}{\beta\times\text{EH}+\text{EC}}$; the $\beta$-weighted harmonic mean of EH and EC. With $\beta = 1$, it is identical to normalized mutual information (NMI) when the arithmetic mean is used for averaging in the NMI calculation. |
| class/cluster | (weighted average) F Measure (wFM) | The weighted F1-score, where the weights are based on the class sizes. |
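
As an illustration of the pair-counting definitions above, the following is a minimal sketch in plain Python, not poem's R interface. The function names `pair_counts` and `pair_metrics` are hypothetical, and the expectations used for AWH and AWC (the fraction of pairs that share a class or a cluster, respectively) are assumed to follow from the permutation model. Degenerate inputs, such as a single cluster or only singleton clusters, would cause a division by zero and are not handled.

```python
from itertools import combinations


def pair_counts(truth, pred):
    """Return (a, b, c, d) for ground-truth labels `truth` and predicted labels `pred`."""
    a = b = c = d = 0
    for i, j in combinations(range(len(truth)), 2):
        same_class = truth[i] == truth[j]
        same_cluster = pred[i] == pred[j]
        if same_class and same_cluster:
            a += 1  # together in both G and P
        elif same_class:
            b += 1  # together in G, split in P
        elif same_cluster:
            c += 1  # split in G, together in P
        else:
            d += 1  # split in both G and P
    return a, b, c, d


def pair_metrics(truth, pred):
    """Dataset-level RI, WH, WC, AWH, AWC and ARI from the pair counts."""
    a, b, c, d = pair_counts(truth, pred)
    n_pairs = a + b + c + d  # equals n(n-1)/2
    ri = (a + d) / n_pairs
    wh = a / (a + c)
    wc = a / (a + b)
    # Assumed chance expectations under the permutation model:
    e_wh = (a + b) / n_pairs  # fraction of pairs sharing a class in G
    e_wc = (a + c) / n_pairs  # fraction of pairs sharing a cluster in P
    awh = (wh - e_wh) / (1 - e_wh)
    awc = (wc - e_wc) / (1 - e_wc)
    ari = 2 * (a * d - b * c) / ((a + b) * (b + d) + (a + c) * (c + d))
    return {"RI": ri, "WH": wh, "WC": wc, "AWH": awh, "AWC": awc, "ARI": ari}


truth = ["A", "A", "A", "B", "B", "B"]
pred = [1, 1, 2, 2, 2, 2]
print(pair_counts(truth, pred))  # (4, 2, 3, 6)
metrics = pair_metrics(truth, pred)
print(metrics)
```

In this toy example, the harmonic mean of AWH and AWC reproduces the closed-form ARI, which is a quick way to check the two formulas against each other.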

Embedding-based metrics

Embedding-based metrics.
| Min_level | Metric | Calculation |
|---|---|---|
| dataset | Silhouette score | $\frac{n-m}{\max(m, n)}$, where $n$ is the mean distance between a sample and the nearest class that the sample is not a part of, and $m$ is the mean intra-class distance. |
| dataset | Composed Density between and within Clusters (CDbw) | The CDbw index consists of three main components: cohesion, compactness, and separation between clusters. It uses multiple representative points selected from each cluster to calculate intra-cluster density and between-cluster distances, reflecting the geometry of the clusters and capturing changes in intra-cluster density. |
| dataset | Density Based Clustering Validation index (DBCV) | A density-based index that computes the least dense region inside a cluster and the most dense region between the clusters, to measure the within- and between-cluster density connectedness of clusters. |
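
The silhouette formula above can be sketched as follows in plain Python (again, not poem's R interface). The function name `silhouette_score`, the use of Euclidean distance, averaging the per-sample values into one dataset-level score, and scoring a sample that is alone in its class as 0 are all assumptions made for this illustration. At least two classes are required, and duplicate points with zero distances are not handled.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean over samples of (n - m) / max(m, n), with m and n as defined above."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # m: mean distance to the other members of the sample's own class
        own = [dist(p, q) for j, (q, l) in enumerate(zip(points, labels))
               if l == lab and j != i]
        if not own:
            scores.append(0.0)  # assumed convention for a class of size one
            continue
        m = mean(own)
        # n: mean distance to the nearest class the sample does not belong to
        n = min(
            mean(dist(p, q) for q, l in zip(points, labels) if l == other)
            for other in set(labels) - {lab}
        )
        scores.append((n - m) / max(m, n))
    return mean(scores)


points = [(0, 0), (0, 1), (5, 0), (5, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # classes match the two groups
bad = silhouette_score(points, [0, 1, 0, 1])   # classes cut across the groups
print(good, bad)
```

A labelling that follows the two well-separated groups gives a score close to 1, while one that cuts across them gives a negative score.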

Graph-based metrics

Graph-based metrics.
| Min_level | Metric | Calculation |
|---|---|---|
| dataset | Modularity | For a given graph partition, it quantifies the number of edges within communities relative to what would be expected by random chance. $Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \gamma \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)$, where $m$ is the number of edges, $A$ is the adjacency matrix of the graph, $k_i$ is the (weighted) degree of $i$, $\gamma$ is the resolution parameter, and $\delta(c_i, c_j)$ is $1$ if $i$ and $j$ are in the same community, else $0$. |
| element | Local Inverse Simpson’s Index (LISI) | For a given node in a weighted kNN graph, the expected number of nodes that need to be sampled before two nodes are drawn from the same class within its neighborhood. |
| element | Neighborhood Purity (NP) | For each node in a graph, the proportion of its neighborhood that is of the same class as the node. |
| element | Proportion of Weakly Connected (PWC) | For a given community in a graph, the proportion of nodes that have more connections to the outside of the community than to the inside of the community. |
| element | Cohesion | The minimum number of nodes that must be removed to split a graph. |
| class/cluster | Adhesion | The minimum number of edges that must be removed to split a graph. |
| class/cluster | Adjusted Mean Shortest Path (AMSP) | A measure of the disconnectedness and spread of the subgraph connecting the elements of a given class. If the class subgraph is disconnected, the mean shortest paths $m_i$ of its connected components are summed: $\frac{\sum_{i} (1+m_i)}{\sqrt{N}}$, where $m_i$ is the mean shortest path of the $i$-th connected component and $N$ is the number of nodes of the given class. Note that the normalization for size is only approximate, and only applicable to kNN graphs. |
| class/cluster | Neighborhood Class Enrichment (NCE) | The log2 fold-enrichment (i.e. over-representation) of the node’s class among its nearest neighbors, relative to what is expected given the class’s relative abundance. |
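
To make the modularity formula and the neighborhood purity definition above concrete, here is a minimal sketch in plain Python for an unweighted, undirected graph given as an edge list. The function names `modularity` and `neighborhood_purity` and the toy graph are hypothetical and do not reflect poem's R interface; weighted kNN graphs would need edge weights in place of the 0/1 adjacency used here.

```python
def modularity(edges, community, gamma=1.0):
    """Q = 1/(2m) * sum_ij (A_ij - gamma * k_i * k_j / (2m)) * delta(c_i, c_j)."""
    m = len(edges)
    adjacent = {(u, v) for u, v in edges} | {(v, u) for u, v in edges}
    degree = {node: 0 for node in community}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    q = 0.0
    for i in community:
        for j in community:
            if community[i] == community[j]:  # delta(c_i, c_j) = 1
                a_ij = 1.0 if (i, j) in adjacent else 0.0
                q += a_ij - gamma * degree[i] * degree[j] / (2 * m)
    return q / (2 * m)


def neighborhood_purity(edges, labels):
    """For each node, the fraction of its neighbors that share its class."""
    neighbors = {node: set() for node in labels}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return {
        node: sum(labels[v] == labels[node] for v in nb) / len(nb)
        for node, nb in neighbors.items()
        if nb  # isolated nodes have no neighborhood and are skipped
    }


# Two triangles joined by a single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
groups = {0: "x", 1: "x", 2: "x", 3: "y", 4: "y", 5: "y"}
print(modularity(edges, groups))
print(neighborhood_purity(edges, groups))
```

For this graph, each community holds 3 of the 7 edges and half of the total degree, so with $\gamma = 1$ the modularity is $2 \times (3/7 - 1/4) = 5/14$. The two bridge nodes have a purity of $2/3$ and all other nodes a purity of $1$.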

Metrics for spatial clusterings

Metrics for spatial clusterings.
| Min_level | Metric | Calculation |
|---|---|---|
| class/cluster | Percentage of Abnormal Spots (PAS) | PAS measures the percentage of abnormal spots, defined as spots whose spatial domain label differs from that of more than half of their nearest neighbors. |
| class/cluster | Spatial Chaos Score (CHAOS) | CHAOS is the mean length of the graph edges in the 1-nearest neighbor (1NN) graph for each domain, averaged across domains. |
| element | Entropy-based Local indicator of Spatial Association (ELSA) | For a site $i$, $E_i = E_{ai} \times E_{ci}$, where $E_{ai}$ summarizes the dissimilarity between site $i$ and the neighbouring sites, and $E_{ci}$ quantifies the diversity of the categories within the neighbourhood of site $i$. |
| dataset | Spatial RI, ARI, WH, WC, AWH, and AWC | Spatial versions of the pair-sorting indices, based on fuzzy versions of the metrics. Specifically, we use the Normalized Degree of Concordance (NDC, see Hüllermeier et al., 2012) and the Adjusted Concordance Index (ACI, see D’Errico et al., 2021) as fuzzy versions of RI and ARI respectively, and developed fuzzy versions of the other metrics using the same logic. In the spatial context, we first make a fuzzy version of the true labels based on the spatial neighborhood, and then track the maximum pair concordance between the predicted labels and either the hard or fuzzy ground truth. |
| element | Spot-wise Pair Concordance (SPC) | The proportion, for each spot, of the pairs it forms with all other spots that are concordant (i.e. either grouped together in both the clustering and the ground truth, or separated in both). This value is the same for all spots that share the same combination of cluster and class, and is especially useful for visualization. A variant can be computed that ignores negative pairs (i.e. pairs whose members are in different groups in both the clustering and the ground truth). When negative pairs are included, the average SPC equals the Rand Index. |
| element | Spatial SPC | Like the non-spatial Spot-wise Pair Concordance, except that the clustering is evaluated against both a ‘hard’ and a ‘fuzzy’ version of the ground truth, as in the computation of the spatial versions of the pair-sorting indices. |
| dataset | Spatial Set Matching Accuracy | An accuracy that downweights misclassifications based on the spatial neighborhood. Instead of counting as zero in the accuracy computation, a misclassified node counts as the proportion of its spatial neighborhood that belongs to the node’s predicted class. |
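
Two of the definitions above, SPC (with negative pairs included) and PAS, can be sketched as follows in plain Python. The function names, the Euclidean k-nearest-neighbor search and the default `k=2` are assumptions made for this illustration, not poem's R interface or its defaults. The PAS sketch returns a single overall fraction; a per-domain value would restrict the count to the spots of one domain.

```python
from math import dist


def spot_pair_concordance(truth, pred):
    """Per spot, the fraction of its pairs that agree between truth and pred."""
    n = len(truth)
    spc = []
    for i in range(n):
        concordant = sum(
            # concordant: together in both, or separated in both
            (truth[i] == truth[j]) == (pred[i] == pred[j])
            for j in range(n) if j != i
        )
        spc.append(concordant / (n - 1))
    return spc


def percentage_abnormal_spots(coords, labels, k=2):
    """Fraction of spots whose label differs from more than half of their k nearest neighbors."""
    abnormal = 0
    for i, p in enumerate(coords):
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: dist(p, coords[j]),
        )
        differing = sum(labels[j] != labels[i] for j in others[:k])
        if differing > k / 2:
            abnormal += 1
    return abnormal / len(coords)


truth = ["A", "A", "A", "B", "B", "B"]
pred = [1, 1, 2, 2, 2, 2]
spc = spot_pair_concordance(truth, pred)
print(spc)                  # per-spot concordance
print(sum(spc) / len(spc))  # mean SPC, equal to the Rand Index (10/15 here)

coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
domains = ["a", "a", "b", "a", "b", "b"]
print(percentage_abnormal_spots(coords, domains))  # spots 2 and 3 are abnormal
```

In the first example, the third spot is the only one grouped inconsistently with every other spot, so its SPC is 0 while all other spots score 0.8.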

Session info

## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Europe/Zurich
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] BiocStyle_2.32.1
## 
## loaded via a namespace (and not attached):
##  [1] vctrs_0.6.5         svglite_2.1.3       cli_3.6.3          
##  [4] knitr_1.48          rlang_1.1.4         xfun_0.46          
##  [7] stringi_1.8.4       textshaping_0.3.6   jsonlite_1.8.8     
## [10] glue_1.8.0          colorspace_2.1-1    htmltools_0.5.8.1  
## [13] ragg_1.3.2          sass_0.4.9          scales_1.3.0       
## [16] rmarkdown_2.27      munsell_0.5.1       evaluate_0.24.0    
## [19] jquerylib_0.1.4     kableExtra_1.4.0    fastmap_1.2.0      
## [22] yaml_2.3.10         lifecycle_1.0.4     bookdown_0.40      
## [25] stringr_1.5.1       BiocManager_1.30.23 compiler_4.4.2     
## [28] fs_1.6.4            htmlwidgets_1.6.4   rstudioapi_0.16.0  
## [31] systemfonts_1.1.0   digest_0.6.36       viridisLite_0.4.2  
## [34] R6_2.5.1            magrittr_2.0.3      bslib_0.8.0        
## [37] tools_4.4.2         xml2_1.3.6          pkgdown_2.1.1      
## [40] cachem_1.1.0        desc_1.4.3