drcme.bin.run_refine_unstable_coclusters
Script to merge cells from unstable clusters into the most similar stable ones.
The script identifies unstable clusters as those whose Jaccard coefficients fall below a
specified threshold (unstable_threshold). It then determines how similar each unstable cluster
is to the stable clusters. If it is too dissimilar, it is kept as its own cluster. Otherwise,
the unstable cluster is dissolved and its cells are assigned to their best-matching
stable clusters.
It determines whether to dissolve a cluster by checking whether each cell in the
unstable cluster has a good match to a stable cluster (i.e., its co-clustering rate exceeds
coclust_threshold). If enough of the cells of the unstable cluster have good matches
(the fraction of matching cells exceeds pct_needed), the cluster is dissolved and its
cells are reassigned to stable clusters.
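The two-stage decision described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the script's actual code: the helper names `classify_clusters` and `should_dissolve` and the toy inputs are hypothetical, and the strict comparisons are inferred from the wording "falling below" and "exceeds".

```python
def classify_clusters(jaccards, unstable_threshold=0.5):
    """Split cluster labels into stable and unstable by Jaccard coefficient.

    jaccards is a hypothetical dict mapping cluster label -> Jaccard coefficient.
    """
    stable = [cl for cl, j in jaccards.items() if j >= unstable_threshold]
    unstable = [cl for cl, j in jaccards.items() if j < unstable_threshold]
    return stable, unstable


def should_dissolve(best_rates, coclust_threshold=0.4, pct_needed=0.33):
    """Decide whether an unstable cluster is dissolved.

    best_rates holds, for each cell in the unstable cluster, its highest
    co-clustering rate with any stable cluster.
    """
    n_match = sum(1 for rate in best_rates if rate > coclust_threshold)
    return n_match / len(best_rates) > pct_needed


# Toy data: cluster 2 has a low Jaccard coefficient, so it is unstable.
jaccards = {0: 0.9, 1: 0.8, 2: 0.3}
stable, unstable = classify_clusters(jaccards)

# Two of the three cells in cluster 2 match a stable cluster well enough.
dissolve = should_dissolve([0.7, 0.1, 0.5])
```

With the default thresholds, a cluster is dissolved once more than a third of its cells have a good match; a cluster whose cells mostly lack a good match is kept as its own cluster.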
- class drcme.bin.run_refine_unstable_coclusters.RefineParameters(extra=None, only=None, exclude=(), prefix='', strict=None, many=False, context=None, load_only=(), dump_only=(), partial=False)
Parameter schema for unstable cluster refinement
This schema is designed to be a schema_type for an ArgSchemaParser object
| key | description | default | field_type | json_type |
|---|---|---|---|---|
| input_json | file path of input json file | NA | InputFile | str |
| output_json | file path to output json file | NA | OutputFile | str |
| log_level | set the logging level of the module | ERROR | LogLevel | str |
| cocluster_matrix_file | File path for co-clustering matrix | NA | InputFile | str |
| jaccards_file | File path for Jaccard coefficients | NA | InputFile | str |
| cluster_labels_file | File path for cluster labels | NA | InputFile | str |
| refined_labels_file | Output file path for refined cluster labels (as integers) | NA | OutputFile | str |
| refined_text_labels_file | Output file path for refined cluster labels (as strings with me_prefix) | NA | OutputFile | str |
| refined_ordering_file | Output file path for refined cluster labels | NA | OutputFile | str |
| unstable_threshold | Threshold for Jaccard coefficients to determine stability | 0.5 | Float | float |
| coclust_threshold | Threshold for co-clustering rate to be considered a match to another cluster | 0.4 | Float | float |
| pct_needed | Minimum fraction of matching cells to dissolve a cluster | 0.33 | Float | float |
| me_prefix | prefix for refined cluster text labels | NA | String | str |
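As an illustration of how these parameters fit together, the sketch below builds an input dictionary and serializes it to JSON. All file paths and the me_prefix value are hypothetical placeholders; the file formats the script expects are not stated here, and the three thresholds are simply the defaults listed in the table.

```python
import json

# Hypothetical parameter values; replace the placeholder paths with real files.
params = {
    "cocluster_matrix_file": "path/to/cocluster_matrix_file",
    "jaccards_file": "path/to/jaccards_file",
    "cluster_labels_file": "path/to/cluster_labels_file",
    "refined_labels_file": "path/to/refined_labels_file",
    "refined_text_labels_file": "path/to/refined_text_labels_file",
    "refined_ordering_file": "path/to/refined_ordering_file",
    "unstable_threshold": 0.5,   # default from the schema
    "coclust_threshold": 0.4,    # default from the schema
    "pct_needed": 0.33,          # default from the schema
    "me_prefix": "ME_",          # hypothetical prefix for text labels
}

# Serialize to the JSON text that would be saved as the input file.
input_json_text = json.dumps(params, indent=2)
```

Saved to a file, such a dictionary could presumably be supplied through the input_json parameter listed in the table, which is how ArgSchemaParser-based scripts usually take their settings.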
- drcme.bin.run_refine_unstable_coclusters.stable_match_rates(clust_labels, shared, stable_clusters)
Calculate the co-clustering rates of cells within the stable clusters
- Parameters
clust_labels ((n, ) array) – Cluster labels for the n samples
shared ((n, n) array) – Co-clustering rates between all n samples
stable_clusters (list) – List of labels of the stable clusters
- Returns
List of the average within-cluster co-clustering rates for every cell found within stable clusters
- Return type
list
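One way such within-cluster rates might be computed is sketched below with NumPy. The function name `within_cluster_rates` is hypothetical, and two choices are assumptions rather than documented behavior: each cell's own diagonal entry is excluded from its average, and single-cell clusters are skipped.

```python
import numpy as np


def within_cluster_rates(clust_labels, shared, stable_clusters):
    """Average co-clustering rate of each stable-cluster cell with its cluster mates."""
    rates = []
    for cl in stable_clusters:
        idx = np.flatnonzero(clust_labels == cl)
        if idx.size < 2:
            continue  # assumption: a lone cell has no within-cluster partner
        block = shared[np.ix_(idx, idx)]
        # Row sums minus the diagonal give the total rate with the *other* members.
        sums = block.sum(axis=1) - np.diag(block)
        rates.extend((sums / (idx.size - 1)).tolist())
    return rates


clust_labels = np.array([0, 0, 0, 1, 1])
shared = np.array([
    [1.0, 0.8, 0.6, 0.1, 0.0],
    [0.8, 1.0, 0.4, 0.2, 0.1],
    [0.6, 0.4, 1.0, 0.3, 0.2],
    [0.1, 0.2, 0.3, 1.0, 0.9],
    [0.0, 0.1, 0.2, 0.9, 1.0],
])
rates = within_cluster_rates(clust_labels, shared, [0, 1])
```

The distribution of these values gives a sense of what co-clustering rate a well-placed cell typically has, which can help when choosing coclust_threshold.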
- drcme.bin.run_refine_unstable_coclusters.match_rates_for_unstable_clusters(unstable_clusters, stable_clusters, clust_labels, shared, threshold)
Calculate the fraction of cells in unstable clusters that match to a stable cluster.
The highest co-clustering rate with a stable cluster is calculated for each cell in an unstable cluster. If that rate exceeds threshold, that cell is categorized as matching another cluster. The fraction of matching cells is returned for each unstable cluster.
- Parameters
unstable_clusters (list) – List of labels of the unstable clusters
stable_clusters (list) – List of labels of the stable clusters
clust_labels ((n, ) array) – Cluster labels for the n samples
shared ((n, n) array) – Co-clustering rates between all n samples
threshold (float) – Minimum co-clustering rate to be considered a match with another cluster
- Returns
Dictionary of unstable clusters (keys) and their fractions of matching cells (values)
- Return type
dict
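The calculation described above might look like the following NumPy sketch. The name `fraction_matching` is hypothetical, and it assumes that a cell's co-clustering rate with a stable cluster is the mean of its rates with that cluster's members; the actual function may define this differently.

```python
import numpy as np


def fraction_matching(unstable_clusters, stable_clusters, clust_labels, shared, threshold):
    """Fraction of cells in each unstable cluster whose best stable match exceeds threshold."""
    result = {}
    for cl in unstable_clusters:
        cells = np.flatnonzero(clust_labels == cl)
        # One column per stable cluster: mean rate of each cell with that cluster.
        rates = np.column_stack([
            shared[np.ix_(cells, np.flatnonzero(clust_labels == st))].mean(axis=1)
            for st in stable_clusters
        ])
        best = rates.max(axis=1)
        result[cl] = float(np.mean(best > threshold))
    return result


clust_labels = np.array([0, 0, 1, 1, 2, 2])
shared = np.array([
    [1.0, 0.9, 0.1, 0.1, 0.6, 0.2],
    [0.9, 1.0, 0.1, 0.1, 0.8, 0.2],
    [0.1, 0.1, 1.0, 0.9, 0.1, 0.3],
    [0.1, 0.1, 0.9, 1.0, 0.1, 0.1],
    [0.6, 0.8, 0.1, 0.1, 1.0, 0.5],
    [0.2, 0.2, 0.3, 0.1, 0.5, 1.0],
])
# Cluster 2 is treated as unstable; one of its two cells matches cluster 0 well.
fractions = fraction_matching([2], [0, 1], clust_labels, shared, threshold=0.4)
```

The returned fraction for each unstable cluster is the quantity that would then be compared against pct_needed.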
- drcme.bin.run_refine_unstable_coclusters.new_labels_for_dissolved_cluster(cl, clust_labels, shared, stable_clusters)
Relabel the cells in a dissolved cluster with their new assignments
- Parameters
cl (int) – Cluster that will be dissolved
clust_labels ((n, ) array) – Cluster labels for the n samples
shared ((n, n) array) – Co-clustering rates between all n samples
stable_clusters (list) – List of labels of the stable clusters
- Returns
Array with updated cluster labels
- Return type
(n, ) array
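A possible relabeling step is sketched below. The name `relabel_dissolved` is hypothetical, and the sketch assumes each cell moves to the stable cluster with which it has the highest mean co-clustering rate; how the real function breaks ties or measures similarity is not stated in this page.

```python
import numpy as np


def relabel_dissolved(cl, clust_labels, shared, stable_clusters):
    """Return a copy of clust_labels with the cells of cluster cl reassigned."""
    new_labels = clust_labels.copy()
    cells = np.flatnonzero(clust_labels == cl)
    # One column per stable cluster: mean rate of each cell with that cluster.
    rates = np.column_stack([
        shared[np.ix_(cells, np.flatnonzero(clust_labels == st))].mean(axis=1)
        for st in stable_clusters
    ])
    best = np.argmax(rates, axis=1)
    new_labels[cells] = np.asarray(stable_clusters)[best]
    return new_labels


clust_labels = np.array([0, 0, 1, 1, 2, 2])
shared = np.array([
    [1.0, 0.9, 0.1, 0.1, 0.6, 0.1],
    [0.9, 1.0, 0.1, 0.1, 0.8, 0.1],
    [0.1, 0.1, 1.0, 0.9, 0.1, 0.7],
    [0.1, 0.1, 0.9, 1.0, 0.1, 0.5],
    [0.6, 0.8, 0.1, 0.1, 1.0, 0.5],
    [0.1, 0.1, 0.7, 0.5, 0.5, 1.0],
])
# Dissolve cluster 2: its first cell is closest to cluster 0, its second to cluster 1.
new_labels = relabel_dissolved(2, clust_labels, shared, [0, 1])
```

Working on a copy leaves the input labels untouched, so the original assignment stays available for comparison.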
Functions
- Main runner function for script.
- match_rates_for_unstable_clusters: Calculate the fraction of cells in unstable clusters that match to a stable cluster.
- new_labels_for_dissolved_cluster: Relabel the cells in a dissolved cluster with their new assignments
- stable_match_rates: Calculate the co-clustering rates of cells within the stable clusters
Classes
- RefineParameters: Parameter schema for unstable cluster refinement