some doubt about DCA's result on Usoskin dataset  

I try to reproduce the `DCA`'s result on Usoskin dataset compared to `Autoclass`, and I found my result is very different from the result(figure4 in the paper of `AutoClass`, I download Usoskin dataset  in `https://github.com/BMILAB/scLINE/blob/master/data/Usoskin.Rdata`

and run following code in jupyter
```
# make sure that rpy2 is installed: https://rpy2.bitbucket.io/
%load_ext rpy2.ipython

import warnings
from rpy2.rinterface import RRuntimeWarning

# ignore R warning messages
warnings.filterwarnings("ignore", category=RRuntimeWarning)
```

```
%%R -o counts  -o cellinfo -o geneinfo

load("./Usoskin.Rdata")

exp_mat=Usoskin$rawdata
print(dim(exp_mat))
label = data.frame(Usoskin$label)
rownames(label)=colnames(exp_mat)

print(table(label))

counts = t(exp_mat)
cellinfo = label
geneinfo = rownames(exp_mat)
```
![image](https://github.com/datapplab/AutoClass/assets/59059267/09b795c8-535b-4007-92e1-c17869e04ee8)


```
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scanpy as sc
from dca.api import dca

import scanpy as sc 
adata = sc.AnnData(counts, obs=cellinfo)
adata.var_names = np.array(geneinfo)
adata.obs["Group"] = adata.obs["Usoskin.label"].astype(str).astype("category")
adata.obs["Batch"] = "1"

#adata.X = adata.X.astype(int)
adata.X = adata.X.astype(int).astype(float)
sc.pp.filter_genes(adata, min_counts=1)

adata.raw =adata
sc.pp.filter_genes(adata, min_counts=1)
sc.pp.normalize_per_cell(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata,n_top_genes=1000)
hvg = list(adata.var_names[adata.var.highly_variable])
print(adata)

adata = adata.raw.to_adata()
adata = adata[:,hvg].copy()
print(adata)


dca(adata, threads=1)

print(adata)
sc.pp.normalize_per_cell(adata)
sc.pp.log1p(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.tl.tsne(adata)

sc.pl.tsne(adata, color='Group', size=20, title='Denoised counts')
sc.pl.umap(adata, color="Group", size =20,title="Denoised counts")
```
my result is below
![image](https://github.com/datapplab/AutoClass/assets/59059267/29dd56ac-18a4-41e0-8774-cb8c6121da4a)


```
import matplotlib.pyplot as plt
from sklearn.metrics import silhouette_score

sil_dca = silhouette_score(adata.obsm["X_tsne"],adata.obs["Group"])
print(sil_dca)
```
my result is blow
![image](https://github.com/datapplab/AutoClass/assets/59059267/3a64f852-60d0-4b5e-943d-f8e4c9429b7e)
could you give some suggestions about what causes this  different result( DCA's silhoutte_score is only 0.1 in paper)?Or am I doing something wrong about DCA?? 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

some doubt about DCA's result on Usoskin dataset #2

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

some doubt about DCA's result on Usoskin dataset #2

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions