drcme.bin.run_refine_unstable_coclusters

Script to merge cells from unstable clusters into the most similar stable ones.

The script identifies unstable clusters as those whose Jaccard coefficients fall below a specified threshold (unstable_threshold). For each unstable cluster, it then measures how similar the cluster is to the stable clusters. If the cluster is too dissimilar, it is kept as its own cluster. Otherwise, it is dissolved and its cells are assigned to their best-matching stable clusters.

To decide whether to dissolve a cluster, the script checks whether each cell in the unstable cluster has a good match to a stable cluster (i.e., the co-clustering rate exceeds coclust_threshold). If enough cells of the unstable cluster have good matches (the fraction of matching cells exceeds pct_needed), the cluster is dissolved and its cells are reassigned to stable clusters.
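The two threshold decisions described above can be sketched in a few lines. This is an illustration under stated assumptions, not the script's actual code: the function names split_by_stability and clusters_to_dissolve are hypothetical, and the Jaccard coefficients are assumed to be available as a dictionary keyed by cluster label.

```python
def split_by_stability(jaccards, unstable_threshold=0.5):
    """Partition cluster labels into (stable, unstable) lists.

    `jaccards` is assumed to map cluster label -> Jaccard coefficient.
    A cluster is unstable when its coefficient falls below the threshold.
    """
    stable = [cl for cl, j in jaccards.items() if j >= unstable_threshold]
    unstable = [cl for cl, j in jaccards.items() if j < unstable_threshold]
    return stable, unstable


def clusters_to_dissolve(match_fractions, pct_needed=0.33):
    """Return the unstable clusters whose fraction of matching cells
    exceeds pct_needed; the remaining ones are kept as their own clusters."""
    return [cl for cl, frac in match_fractions.items() if frac > pct_needed]


# Toy values: cluster 2 is unstable and most of its cells have a good match.
stable, unstable = split_by_stability({0: 0.9, 1: 0.8, 2: 0.3})
dissolved = clusters_to_dissolve({2: 0.75})
```

The default values mirror the documented defaults of unstable_threshold and pct_needed.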

class drcme.bin.run_refine_unstable_coclusters.RefineParameters(extra=None, only=None, exclude=(), prefix='', strict=None, many=False, context=None, load_only=(), dump_only=(), partial=False)

Parameter schema for unstable cluster refinement

This schema is designed to be used as the schema_type for an ArgSchemaParser object.

RefineParameters

========================  ============================================================================  =======  ==========  =========
key                       description                                                                   default  field_type  json_type
========================  ============================================================================  =======  ==========  =========
input_json                file path of input json file                                                  NA       InputFile   str
output_json               file path to output json file                                                 NA       OutputFile  str
log_level                 set the logging level of the module                                           ERROR    LogLevel    str
cocluster_matrix_file     File path for co-clustering matrix                                            NA       InputFile   str
jaccards_file             File path for Jaccard coefficients                                            NA       InputFile   str
cluster_labels_file       File path for cluster labels                                                  NA       InputFile   str
refined_labels_file       Output file path for refined cluster labels (as integers)                     NA       OutputFile  str
refined_text_labels_file  Output file path for refined cluster labels (as strings with me_prefix)       NA       OutputFile  str
refined_ordering_file     Output file path for refined cluster labels                                   NA       OutputFile  str
unstable_threshold        Threshold for Jaccard coefficients to determine stability                     0.5      Float       float
coclust_threshold         Threshold for co-clustering rate to be considered a match to another cluster  0.4      Float       float
pct_needed                Minimum fraction of matching cells to dissolve a cluster                      0.33     Float       float
me_prefix                 prefix for refined cluster text labels                                        NA       String      str
========================  ============================================================================  =======  ==========  =========
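As an illustration of how these parameters fit together, the following sketch builds an input JSON document using the keys from the table. Every file name and the me_prefix value are hypothetical placeholders, and the expected formats of the input files are not specified in this section; the three thresholds are set to their documented defaults.

```python
import json

# All file names below are hypothetical placeholders.
params = {
    "cocluster_matrix_file": "cocluster_matrix.txt",
    "jaccards_file": "jaccards.txt",
    "cluster_labels_file": "cluster_labels.csv",
    "refined_labels_file": "refined_labels.csv",
    "refined_text_labels_file": "refined_text_labels.csv",
    "refined_ordering_file": "refined_ordering.txt",
    "unstable_threshold": 0.5,  # documented default
    "coclust_threshold": 0.4,   # documented default
    "pct_needed": 0.33,         # documented default
    "me_prefix": "ME_",         # hypothetical prefix
}

# Serialize the parameters; the text could then be saved to a file.
input_json_text = json.dumps(params, indent=2)
print(input_json_text)
```

Saved to a file, a document like this would typically be handed to the script through the input_json parameter listed in the table.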

drcme.bin.run_refine_unstable_coclusters.stable_match_rates(clust_labels, shared, stable_clusters)

Calculate the co-clustering rates of cells within the stable clusters

Parameters
  • clust_labels ((n, ) array) – Cluster labels for the n samples

  • shared ((n, n) array) – Co-clustering rates between all n samples

  • stable_clusters (list) – List of labels of the stable clusters

Returns

List of the average within-cluster co-clustering rates for every cell found within stable clusters

Return type

list
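A minimal sketch of what such a computation could look like follows; it is not the library's implementation. It assumes that a cell's rate is the mean of its co-clustering rates with all members of its own cluster, with the self-pairing included, and the name stable_match_rates_sketch is hypothetical.

```python
import numpy as np


def stable_match_rates_sketch(clust_labels, shared, stable_clusters):
    """Average within-cluster co-clustering rate for each cell in a stable cluster."""
    clust_labels = np.asarray(clust_labels)
    shared = np.asarray(shared, dtype=float)
    rates = []
    for cl in stable_clusters:
        members = np.flatnonzero(clust_labels == cl)
        # Sub-matrix of co-clustering rates among the cluster's own members
        # (assumption: the diagonal, i.e. each cell with itself, is included).
        block = shared[np.ix_(members, members)]
        rates.extend(block.mean(axis=1).tolist())
    return rates


# Toy data: cells 0 and 1 form stable cluster 0, cell 2 is in cluster 1.
demo_labels = [0, 0, 1]
demo_shared = [[1.0, 0.8, 0.1],
               [0.8, 1.0, 0.2],
               [0.1, 0.2, 1.0]]
demo_rates = stable_match_rates_sketch(demo_labels, demo_shared, [0])
```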

drcme.bin.run_refine_unstable_coclusters.match_rates_for_unstable_clusters(unstable_clusters, stable_clusters, clust_labels, shared, threshold)

Calculate the fraction of cells in unstable clusters that match to a stable cluster.

The highest co-clustering rate with a stable cluster is calculated for each cell in an unstable cluster. If that rate exceeds threshold, that cell is categorized as matching another cluster. The fraction of matching cells is returned for each unstable cluster.

Parameters
  • unstable_clusters (list) – List of labels of the unstable clusters

  • stable_clusters (list) – List of labels of the stable clusters

  • clust_labels ((n, ) array) – Cluster labels for the n samples

  • shared ((n, n) array) – Co-clustering rates between all n samples

  • threshold (float) – Minimum co-clustering rate to be considered a match with another cluster

Returns

Dictionary of unstable clusters (keys) and their fractions of matching cells (values)

Return type

dict
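The calculation described for this function can be sketched as below. This is an illustration rather than the library's code: it assumes that a cell's rate with a stable cluster is the mean of its co-clustering rates with that cluster's members, that stable_clusters is non-empty, and that every listed cluster has at least one cell. The name match_rates_sketch is hypothetical.

```python
import numpy as np


def match_rates_sketch(unstable_clusters, stable_clusters, clust_labels, shared, threshold):
    """Fraction of cells in each unstable cluster that match some stable cluster."""
    clust_labels = np.asarray(clust_labels)
    shared = np.asarray(shared, dtype=float)
    fractions = {}
    for cl in unstable_clusters:
        members = np.flatnonzero(clust_labels == cl)
        # One column per stable cluster: each member's mean rate with that cluster.
        per_stable = np.column_stack([
            shared[np.ix_(members, np.flatnonzero(clust_labels == s))].mean(axis=1)
            for s in stable_clusters
        ])
        best = per_stable.max(axis=1)  # highest rate with any stable cluster
        fractions[cl] = float(np.mean(best > threshold))
    return fractions


# Toy data: clusters 0 and 1 are stable, cluster 2 (cells 4 and 5) is unstable.
demo_labels = [0, 0, 1, 1, 2, 2]
demo_shared = np.full((6, 6), 0.1)
demo_shared[4, 0:2] = 0.6  # cell 4 often co-clusters with cluster 0
demo_shared[0:2, 4] = 0.6
demo_fractions = match_rates_sketch([2], [0, 1], demo_labels, demo_shared, 0.4)
```

In the toy data only one of the two cells in cluster 2 clears the threshold, so half of the cluster counts as matching.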

drcme.bin.run_refine_unstable_coclusters.new_labels_for_dissolved_cluster(cl, clust_labels, shared, stable_clusters)

Relabel the cells in a dissolved cluster with their new assignments

Parameters
  • cl (int) – Cluster that will be dissolved

  • clust_labels ((n, ) array) – Cluster labels for the n samples

  • shared ((n, n) array) – Co-clustering rates between all n samples

  • stable_clusters (list) – List of labels of the stable clusters

Returns

Array with updated cluster labels

Return type

(n, ) array
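One plausible way to perform this relabeling is sketched below, under assumptions that are not confirmed by this section: the best-matching stable cluster for a cell is taken to be the one with the highest mean co-clustering rate, ties go to the cluster listed first, and stable_clusters is non-empty. The name new_labels_sketch is hypothetical.

```python
import numpy as np


def new_labels_sketch(cl, clust_labels, shared, stable_clusters):
    """Return a copy of the labels with the cells of cluster `cl` reassigned."""
    clust_labels = np.asarray(clust_labels)
    shared = np.asarray(shared, dtype=float)
    new_labels = clust_labels.copy()
    members = np.flatnonzero(clust_labels == cl)
    # One column per stable cluster: each member's mean rate with that cluster.
    per_stable = np.column_stack([
        shared[np.ix_(members, np.flatnonzero(clust_labels == s))].mean(axis=1)
        for s in stable_clusters
    ])
    # argmax picks the first column on ties.
    best = per_stable.argmax(axis=1)
    new_labels[members] = np.asarray(stable_clusters)[best]
    return new_labels


# Toy data: cluster 2 (cells 4 and 5) is dissolved into clusters 0 and 1.
demo_labels = [0, 0, 1, 1, 2, 2]
demo_shared = np.zeros((6, 6))
demo_shared[4, 0:2] = 0.6  # cell 4 co-clusters mostly with cluster 0
demo_shared[5, 2:4] = 0.7  # cell 5 co-clusters mostly with cluster 1
demo_new = new_labels_sketch(2, demo_labels, demo_shared, [0, 1])
```

The input labels are left unmodified because the sketch works on a copy.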

Functions

main(cocluster_matrix_file, jaccards_file, …)   Main runner function for script.
match_rates_for_unstable_clusters(…)            Calculate the fraction of cells in unstable clusters that match to a stable cluster.
new_labels_for_dissolved_cluster(cl, …)         Relabel the cells in a dissolved cluster with their new assignments
stable_match_rates(clust_labels, shared, …)     Calculate the co-clustering rates of cells within the stable clusters

Classes

RefineParameters([extra, only, exclude, …])     Parameter schema for unstable cluster refinement