Compute external metrics for fuzzy clusterings
Source: R/getFuzzyPartitionMetrics.R
Computes a selection of external fuzzy clustering evaluation metrics.
Usage
getFuzzyPartitionMetrics(
hardTrue = NULL,
fuzzyTrue = NULL,
hardPred = NULL,
fuzzyPred = NULL,
metrics = c("fuzzyWH", "fuzzyAWH", "fuzzyWC", "fuzzyAWC"),
level = "class",
nperms = NULL,
verbose = TRUE,
returnElementPairAccuracy = FALSE,
BPPARAM = BiocParallel::SerialParam(),
useNegatives = TRUE,
usePairs = NULL,
...
)
Arguments
- hardTrue
An atomic vector coercible to a factor or integer vector containing the true hard labels.
- fuzzyTrue
An object coercible to a numeric matrix with the membership probabilities of elements (rows) in clusters (columns).
- hardPred
An atomic vector coercible to a factor or integer vector containing the predicted hard labels.
- fuzzyPred
An object coercible to a numeric matrix with the membership probabilities of elements (rows) in clusters (columns).
- metrics
The metrics to compute. See details.
- level
The level at which to calculate the metrics. Options are
"element", "class" and "dataset".
- nperms
The number of permutations (for correction for chance). If NULL (default), a first set of 10 permutations will be run to estimate whether the variation across permutations is above 0.0025, in which case more (max 1000) permutations will be run.
- verbose
Logical; whether to print info and warnings, including the standard error of the mean across permutations (giving an idea of the precision of the adjusted metrics).
- BPPARAM
BiocParallel parameters for multithreading. Defaults to BiocParallel::SerialParam(), i.e. no parallelization.
- useNegatives
Logical; whether to include negative pairs in the concordance score (tends to result in a larger overall concordance and lower dynamic range of the score). Default TRUE.
- usePairs
Logical; whether to compute over pairs instead of elements. Recommended and TRUE by default.
- lowMemory
Logical; whether to use a low memory mode. This is only useful when
hardTrue and fuzzyPred are used. If TRUE, the function computes the metrics in a low memory mode, which is slower but uses less memory. If FALSE, the function computes the metrics in a high memory mode, which is faster but uses more memory. By default it is set automatically based on the size of the input data. See fuzzyHardMetrics.
- ...
Optional arguments for
fuzzyPartitionMetrics: tnorm. Only useful when fuzzyTrue and fuzzyPred are used.
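The nperms and verbose entries refer to the variation of scores across permutations. As a rough illustration only, and assuming that this variation is summarised as the standard error of the mean (the criterion used inside the package may differ), the decision to run more permutations could look like the sketch below. The helper name needsMorePermutations and the simulated scores are hypothetical and not part of the package.

```r
# Hypothetical helper, not part of the package: decide whether a first batch
# of permuted scores is too variable, using the standard error of the mean.
# The 0.0025 threshold is taken from the nperms description; treating it as
# a threshold on the standard error of the mean is an assumption.
needsMorePermutations <- function(permScores, threshold = 0.0025) {
  sem <- stats::sd(permScores) / sqrt(length(permScores))
  sem > threshold
}

set.seed(1)
# Simulated stand-ins for scores obtained from 10 label permutations.
noisy  <- rnorm(10, mean = 0.5, sd = 0.05)
stable <- rnorm(10, mean = 0.5, sd = 0.001)

needsMorePermutations(noisy)
needsMorePermutations(stable)
```

In practice, setting nperms explicitly skips any such automatic decision and fixes the number of permutations.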
Details
The allowed values for metrics depend on the value of level:
If level = "element", the allowed metrics are: "fuzzySPC".
If level = "class", the allowed metrics are: "fuzzyWH", "fuzzyAWH", "fuzzyWC", "fuzzyAWC".
If level = "dataset", the allowed metrics are: "fuzzyRI", "fuzzyARI", "fuzzyWH", "fuzzyAWH", "fuzzyWC", "fuzzyAWC".
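The level-to-metrics mapping above can be written down as a small lookup, which may be handy for validating a call before running it. This is a minimal sketch: the metric names are transcribed from this section, while allowedMetrics and checkMetrics are hypothetical names that do not exist in the package.

```r
# Allowed metrics per level, transcribed from the Details section.
allowedMetrics <- list(
  element = c("fuzzySPC"),
  class   = c("fuzzyWH", "fuzzyAWH", "fuzzyWC", "fuzzyAWC"),
  dataset = c("fuzzyRI", "fuzzyARI", "fuzzyWH", "fuzzyAWH",
              "fuzzyWC", "fuzzyAWC")
)

# Hypothetical helper, not part of the package: stop early when a requested
# metric is not available at the requested level.
checkMetrics <- function(metrics, level = c("class", "element", "dataset")) {
  level <- match.arg(level)
  bad <- setdiff(metrics, allowedMetrics[[level]])
  if (length(bad) > 0) {
    stop("Metrics not available at level '", level, "': ",
         paste(bad, collapse = ", "))
  }
  invisible(TRUE)
}

# A valid combination passes silently.
checkMetrics(c("fuzzyWH", "fuzzyAWC"), level = "class")

# "fuzzyRI" is only listed for level = "dataset", so this raises an error,
# which is caught here and turned into FALSE.
ok <- tryCatch(checkMetrics("fuzzyRI", level = "class"),
               error = function(e) FALSE)
```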
Examples
# generate fuzzy partitions:
m1 <- matrix(c(0.95, 0.025, 0.025,
               0.98, 0.01, 0.01,
               0.96, 0.02, 0.02,
               0.95, 0.04, 0.01,
               0.95, 0.01, 0.04,
               0.99, 0.005, 0.005,
               0.025, 0.95, 0.025,
               0.97, 0.02, 0.01,
               0.025, 0.025, 0.95),
             ncol = 3, byrow = TRUE)
m2 <- matrix(c(0.95, 0.025, 0.025,
               0.98, 0.01, 0.01,
               0.96, 0.02, 0.02,
               0.025, 0.95, 0.025,
               0.02, 0.96, 0.02,
               0.01, 0.98, 0.01,
               0.05, 0.05, 0.95,
               0.02, 0.02, 0.96,
               0.01, 0.01, 0.98),
             ncol = 3, byrow = TRUE)
colnames(m1) <- colnames(m2) <- LETTERS[seq_len(3)]
getFuzzyPartitionMetrics(fuzzyTrue = m1, fuzzyPred = m2, level = "class")
#> Comparing between a fuzzy truth and a fuzzy prediction...
#> Running 100 extra permutations.
#> Standard error of the mean NDC across permutations:0.00238
#> fuzzyWC fuzzyAWC class fuzzyWH fuzzyAWH cluster
#> 1 0.3445840 0.04491864 1 NA NA NA
#> 2 0.7242508 -0.03533695 2 NA NA NA
#> 3 0.7520319 0.06834937 3 NA NA NA
#> 4 NA NA NA 0.9359492 0.8310175 1
#> 5 NA NA NA 0.9214151 0.8267223 2
#> 6 NA NA NA 0.1588990 -1.2635621 3
# generate a fuzzy truth:
fuzzyTrue <- matrix(c(
  0.95, 0.025, 0.025,
  0.98, 0.01, 0.01,
  0.96, 0.02, 0.02,
  0.95, 0.04, 0.01,
  0.95, 0.01, 0.04,
  0.99, 0.005, 0.005,
  0.025, 0.95, 0.025,
  0.97, 0.02, 0.01,
  0.025, 0.025, 0.95),
  ncol = 3, byrow = TRUE)
# a hard truth:
hardTrue <- apply(fuzzyTrue, 1, FUN = which.max)
# some predicted labels:
hardPred <- c(1,1,1,1,1,1,2,2,2)
getFuzzyPartitionMetrics(hardPred = hardPred, hardTrue = hardTrue,
                         fuzzyTrue = fuzzyTrue, nperms = 3, level = "class")
#> Comparing between a fuzzy truth and a hard prediction...
#> Standard error of the mean NDC across permutations:0.00158
#> fuzzyWC fuzzyAWC class fuzzyWH fuzzyAWH cluster
#> 1 0.7195238 0.4967246 1 NA NA NA
#> 2 1.0000000 NaN 2 NA NA NA
#> 3 1.0000000 NaN 3 NA NA NA
#> 4 NA NA NA 1.00000000 1 1
#> 5 NA NA NA 0.06166667 -Inf 2
getFuzzyPartitionMetrics(hardTrue = hardPred, hardPred = hardTrue,
                         fuzzyPred = fuzzyTrue, nperms = 3, level = "class")
#> Comparing between a hard truth and a fuzzy prediction...
#> Standard error of the mean NDC across permutations:0.0353
#> You might want to increase the number of permutations to increase the robustness of the adjusted metrics.
#> fuzzyWC fuzzyAWC class fuzzyWH fuzzyAWH cluster
#> 1 1.00000000 1.000000 1 NA NA NA
#> 2 0.06166667 -3.540323 2 NA NA NA
#> 3 NA NA NA 0.7195238 0.46737 1
#> 4 NA NA NA 1.0000000 NaN 2
#> 5 NA NA NA 1.0000000 NaN 3
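Since fuzzyTrue and fuzzyPred are described as membership probabilities, it can be useful to inspect a matrix before passing it in. The sketch below assumes that each row is meant to sum to 1, which this page does not state as a requirement; rowsNotSummingToOne is a hypothetical helper, not a package function. It also shows one way to derive hard labels from a membership matrix with base R.

```r
# Hypothetical helper, not part of the package: report the rows of a
# membership matrix whose entries do not sum to 1 (within a tolerance).
rowsNotSummingToOne <- function(m, tol = 1e-8) {
  m <- as.matrix(m)
  which(abs(rowSums(m) - 1) > tol)
}

# Small illustrative matrix; the second row sums to 1.05.
m <- matrix(c(0.95, 0.025, 0.025,
              0.05, 0.05,  0.95,
              0.01, 0.01,  0.98),
            ncol = 3, byrow = TRUE)
rowsNotSummingToOne(m)

# Hard labels taken as the column with the highest membership in each row.
hard <- max.col(m, ties.method = "first")
```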