drcme.bin.run_rf_prediction

Script to predict type labels for new data using a random forest classifier and training data.

The electrophysiology data are supplied in two files: reference_ephys_file contains the cells used to train the classifier, and prediction_ephys_file contains the cells whose labels will be predicted. The optional morph_file contains morphology data for both sets of cells.
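The train-then-predict workflow described above can be sketched as follows. This is a minimal illustration using scikit-learn, not the script's actual code: the feature values, cell IDs, column names, and the fixed random_state are made up for the example. The n_estimators and class_weight arguments are assumed to correspond to the n_trees and class_weight parameters of the schema.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Made-up reference data: rows are cells (indexed by ID), columns are features.
ref_features = pd.DataFrame(
    {"feat_1": [1.0, 1.1, 0.9, 5.0, 5.2, 4.9],
     "feat_2": [0.1, 0.2, 0.0, 3.0, 3.1, 2.9]},
    index=[1, 2, 3, 4, 5, 6],
)
# Made-up type labels for the reference cells.
ref_labels = pd.Series(["A", "A", "A", "B", "B", "B"], index=ref_features.index)

# Made-up data for the cells that need predicted labels.
new_features = pd.DataFrame(
    {"feat_1": [1.05, 5.1], "feat_2": [0.1, 3.0]},
    index=[7, 8],
)

# 500 trees and class_weight=None mirror the schema defaults;
# random_state is fixed only to make this sketch reproducible.
clf = RandomForestClassifier(n_estimators=500, class_weight=None, random_state=0)
clf.fit(ref_features.values, ref_labels.values)

# Keep the cell IDs attached to the predictions.
predicted = pd.Series(
    clf.predict(new_features.values),
    index=new_features.index,
    name="predicted_label",
)
print(predicted)
```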

class drcme.bin.run_rf_prediction.RfPredictionParameters(extra=None, only=None, exclude=(), prefix='', strict=None, many=False, context=None, load_only=(), dump_only=(), partial=False)[source]

Parameter schema for random-forest prediction

This schema is designed to be used as the schema_type of an ArgSchemaParser object.

RfPredictionParameters

=====================  =============================================================================  =======  ==========  =========
key                    description                                                                    default  field_type  json_type
=====================  =============================================================================  =======  ==========  =========
input_json             file path of input json file                                                   NA       InputFile   str
output_json            file path to output json file                                                  NA       OutputFile  str
log_level              set the logging level of the module                                            ERROR    LogLevel    str
reference_ephys_file   Path to electrophysiology data file for reference cells                        NA       InputFile   str
prediction_ephys_file  Path to electrophysiology data file for cells that will have predicted labels  NA       InputFile   str
reference_label_file   Path to type labels for reference cells                                        NA       InputFile   str
label_key              Column name of type label in ‘reference_label_file’                            NA       String      str
morph_file             Path to morphology data file for all cells                                     None     InputFile   str
output_file            Path to output file with predicted labels                                      NA       OutputFile  str
ref_id_file            Path to file with subset of IDs for reference cells                            None     InputFile   str
pred_id_file           Path to file with subset of IDs for predicted cells                            None     InputFile   str
n_trees                Number of trees for random forest classifier                                   500      Integer     int
class_weight           Class weight parameter for random forest classifier                            None     String      str
=====================  =============================================================================  =======  ==========  =========
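As an illustration of how these parameters fit together, the sketch below assembles an input JSON using the keys from the schema. All file names and the label column name are hypothetical placeholders. The optional keys (morph_file, ref_id_file, pred_id_file, class_weight) are omitted so that they keep their defaults.

```python
import json
import tempfile
from pathlib import Path

# Keys come from the schema; every value below is a hypothetical placeholder.
params = {
    "reference_ephys_file": "reference_ephys.csv",
    "prediction_ephys_file": "prediction_ephys.csv",
    "reference_label_file": "reference_labels.csv",
    "label_key": "cell_type",
    "output_file": "predicted_labels.csv",
    "n_trees": 500,  # same as the schema default, shown for clarity
}

# Write the parameters to a JSON file (a temporary directory is used here).
config_path = Path(tempfile.mkdtemp()) / "rf_prediction_input.json"
config_path.write_text(json.dumps(params, indent=2))

# Read the file back to confirm that it holds the same parameters.
loaded = json.loads(config_path.read_text())
print(json.dumps(loaded, indent=2))
```

Such a file would typically be passed to an ArgSchemaParser-based script through its input_json argument, for example `--input_json rf_prediction_input.json` on the command line. Note that InputFile fields are expected to point at files that exist, so the placeholder paths would need to be replaced with real ones.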

drcme.bin.run_rf_prediction.construct_datasets(ephys_ref, ephys_pred, morph_df)[source]

Build reference and test data sets

Parameters
  • ephys_ref (DataFrame) – DataFrame with reference electrophysiology data

  • ephys_pred (DataFrame) – DataFrame with electrophysiology data for label prediction

  • morph_df (DataFrame) – DataFrame with morphology data for all cells

Returns

  • ref_df (DataFrame) – Combined ephys/morph data set for reference cells

  • test_df (DataFrame) – Combined ephys/morph data set for cells that will have labels predicted
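The idea of building both data sets can be sketched with pandas as below. This is an illustration under stated assumptions, not the library's implementation: the helper name build_ref_and_test is hypothetical, cells are assumed to be identified by the DataFrame index, and passing through the ephys data unchanged when no morphology data is given is also an assumption.

```python
import pandas as pd

def build_ref_and_test(ephys_ref, ephys_pred, morph_df=None):
    """Hypothetical helper: combine each ephys set with morphology data."""
    if morph_df is None:
        # Assumption: without morphology data, use the ephys data as-is.
        return ephys_ref, ephys_pred
    # Inner joins on the index keep only cells that have both data types.
    ref_df = ephys_ref.join(morph_df, how="inner")
    test_df = ephys_pred.join(morph_df, how="inner")
    return ref_df, test_df

# Made-up data; the index holds cell IDs.
ephys_ref = pd.DataFrame({"ephys_feat": [0.1, 0.2, 0.3]}, index=[1, 2, 3])
ephys_pred = pd.DataFrame({"ephys_feat": [0.4, 0.5]}, index=[4, 5])
morph_df = pd.DataFrame({"morph_feat": [10.0, 20.0, 30.0]}, index=[2, 3, 4])

ref_df, test_df = build_ref_and_test(ephys_ref, ephys_pred, morph_df)
print(ref_df)
print(test_df)
```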

drcme.bin.run_rf_prediction.intersect_ephys_morph(ephys_df, morph_df)[source]

Make combined DataFrame with shared cells from ephys_df and morph_df

Parameters
  • ephys_df (DataFrame) – DataFrame with electrophysiology data

  • morph_df (DataFrame) – DataFrame with morphology data

Returns

Combined ephys/morph data set

Return type

DataFrame
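A minimal sketch of combining the shared cells is shown below. It assumes that cell IDs are stored in the index of each DataFrame; the function name intersect_on_index and the example data are hypothetical, and the library's own implementation may differ.

```python
import pandas as pd

def intersect_on_index(ephys_df, morph_df):
    """Hypothetical helper: keep cells present in both DataFrames."""
    # IDs that appear in both indexes.
    shared_ids = ephys_df.index.intersection(morph_df.index)
    # Place the ephys and morph columns side by side for the shared cells.
    return pd.concat(
        [ephys_df.loc[shared_ids], morph_df.loc[shared_ids]], axis=1
    )

# Made-up data; only cell 102 appears in both DataFrames.
ephys_df = pd.DataFrame({"ephys_pc1": [0.1, 0.2, 0.3]}, index=[101, 102, 103])
morph_df = pd.DataFrame({"morph_pc1": [1.0, 2.0]}, index=[102, 104])

combined = intersect_on_index(ephys_df, morph_df)
print(combined)
```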

Functions

  • construct_datasets(ephys_ref, ephys_pred, …) – Build reference and test data sets

  • intersect_ephys_morph(ephys_df, morph_df) – Make combined DataFrame with shared cells from ephys_df and morph_df

  • main(reference_ephys_file, …) – Main runner function for script.

Classes

  • RfPredictionParameters([extra, only, …]) – Parameter schema for random-forest prediction