Skip to content

some doubt about DCA's result on Usoskin dataset  #2

@yuxiaokang-source

Description

@yuxiaokang-source

I try to reproduce the DCA's result on Usoskin dataset compared to Autoclass, and I found my result is very different from the result(figure4 in the paper of AutoClass, I download Usoskin dataset in https://github.com/BMILAB/scLINE/blob/master/data/Usoskin.Rdata

and run following code in jupyter

# make sure that rpy2 is installed: https://rpy2.bitbucket.io/
%load_ext rpy2.ipython

import warnings
from rpy2.rinterface import RRuntimeWarning

# ignore R warning messages
warnings.filterwarnings("ignore", category=RRuntimeWarning)
%%R -o counts  -o cellinfo -o geneinfo

load("./Usoskin.Rdata")

exp_mat=Usoskin$rawdata
print(dim(exp_mat))
label = data.frame(Usoskin$label)
rownames(label)=colnames(exp_mat)

print(table(label))

counts = t(exp_mat)
cellinfo = label
geneinfo = rownames(exp_mat)

image

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scanpy as sc
from dca.api import dca

import scanpy as sc 
adata = sc.AnnData(counts, obs=cellinfo)
adata.var_names = np.array(geneinfo)
adata.obs["Group"] = adata.obs["Usoskin.label"].astype(str).astype("category")
adata.obs["Batch"] = "1"

#adata.X = adata.X.astype(int)
adata.X = adata.X.astype(int).astype(float)
sc.pp.filter_genes(adata, min_counts=1)

adata.raw =adata
sc.pp.filter_genes(adata, min_counts=1)
sc.pp.normalize_per_cell(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata,n_top_genes=1000)
hvg = list(adata.var_names[adata.var.highly_variable])
print(adata)

adata = adata.raw.to_adata()
adata = adata[:,hvg].copy()
print(adata)


dca(adata, threads=1)

print(adata)
sc.pp.normalize_per_cell(adata)
sc.pp.log1p(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.tl.tsne(adata)

sc.pl.tsne(adata, color='Group', size=20, title='Denoised counts')
sc.pl.umap(adata, color="Group", size =20,title="Denoised counts")

my result is below
image

import matplotlib.pyplot as plt
from sklearn.metrics import silhouette_score

sil_dca = silhouette_score(adata.obsm["X_tsne"],adata.obs["Group"])
print(sil_dca)

my result is blow
image
could you give some suggestions about what causes this different result( DCA's silhoutte_score is only 0.1 in paper)?Or am I doing something wrong about DCA??

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions