Running the CellTypist training function celltypist.train on a subset of genes #107

Open
dkapadia612 opened this issue Feb 15, 2024 · 1 comment

Comments

@dkapadia612

I would like to train a CellTypist model to identify certain cell types using a specific gene set. I tried passing a list of genes via the 'genes' argument, but the model was still trained on all features. Besides keeping only the selected genes in adata.var, is there another way to make this work? Also, does training on fewer than 50 genes affect the prediction accuracy of the trained model, and is there a gene count below which you would not recommend training? I would appreciate any help you can provide!

@ChuanXu1
Collaborator

@dkapadia612, you can train the model using any number of genes. There is no definitive relationship between model accuracy and the number of genes (for example, a dataset with clearly distinct cell types may rely on only a handful of genes). To train the model on a subset of genes, you can use model = celltypist.train(adata[:, a_subset_genes], check_expression = False, ...)
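
For anyone applying this, here is a minimal sketch of the slicing approach with a guard for genes that are missing from the dataset. The helper names split_genes and train_on_subset are made up for illustration and are not part of CellTypist. The sketch assumes adata is an AnnData object whose var_names hold gene symbols and that labels names a column in adata.obs. The comment about why check_expression is turned off reflects my reading that the check expects per-cell totals from the full normalized matrix; treat it as an assumption and confirm against the CellTypist documentation.

```python
def split_genes(var_names, wanted):
    """Split `wanted` into genes found in `var_names` and genes that are not."""
    available = set(var_names)
    present = [g for g in wanted if g in available]
    missing = [g for g in wanted if g not in available]
    return present, missing


def train_on_subset(adata, wanted, labels):
    """Hypothetical wrapper: train CellTypist only on the genes listed in `wanted`."""
    import celltypist  # imported lazily so split_genes works without it installed

    present, missing = split_genes(adata.var_names, wanted)
    if missing:
        print(f"Skipping {len(missing)} genes not in adata.var_names: {missing}")
    if not present:
        raise ValueError("None of the requested genes are in adata.var_names")

    # Slice the columns first, then train on the sliced object.
    # Assumption: slicing changes per-cell totals, so the expression check is
    # disabled, as suggested in the reply above.
    return celltypist.train(adata[:, present], labels=labels, check_expression=False)
```

A call might look like model = train_on_subset(adata, ["CD3E", "MS4A1"], labels="cell_type"), where "cell_type" stands in for whatever annotation column your adata.obs actually has.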
