Skip to content

Add BVBRC statistics R Markdown document#13

Open
AbhirupaGhosh wants to merge 2 commits into
mainfrom
BVBRC_stats
Open

Add BVBRC statistics R Markdown document#13
AbhirupaGhosh wants to merge 2 commits into
mainfrom
BVBRC_stats

Conversation

@AbhirupaGhosh
Copy link
Copy Markdown
Contributor

Description

What kind of change(s) are included?

  • Feature (adds or updates new capabilities)
  • Bug fix (fixes an issue).
  • Enhancement (adds functionality).
  • Breaking change (these changes would cause existing functionality to not work as expected).

Checklist

Please ensure that all boxes are checked before indicating that this pull request is ready for review.

  • I have read and followed the CONTRIBUTING.md guidelines.
  • I have searched for existing content to ensure this is not a duplicate.
  • I have performed a self-review of these additions (including spelling, grammar, and related).
  • I have added comments to my code to help provide understanding.
  • I have added a test which covers the code changes found within this PR.
  • I have deleted all non-relevant text in this pull request template.
  • Reviewer assignment: Tag a relevant team member to review and approve the changes.

Copy link
Copy Markdown
Member

@jananiravi jananiravi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs the following to serve as a vignette

  • to reside in vignettes/
  • call specific tidyverse packages (bioc requirement)
  • check path for bvbrc_data.tsv & bvbrc vs. bvbrc_clean
  • use of named r chunks
  • text description between chunks
  • missing YAML
  • documentation
    cc: @AbhirupaGhosh @eboyer221

@jananiravi jananiravi requested a review from eboyer221 April 13, 2026 20:43
Copy link
Copy Markdown
Member

@jananiravi jananiravi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs annotation (text around code chunks)

Copy link
Copy Markdown
Contributor

@eboyer221 eboyer221 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested code changes to format the vignette shown in review.
Additional comment : The Rmd should live at vignettes/BVBRC_stats.Rmd rather than doc/ (R packages auto-generated doc/ from vignettes/ at build time).

Comment thread doc/BVBRC_stats.Rmd
```{r setup, include=FALSE}
library(tidyverse)

bvbrc <- read.table("bvbrc_data.tsv", sep = "\t", header = TRUE, fill = TRUE,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this file , 'bvbrc_data.tsv', coming from? does the user export it from BV-BRC? Right now I don't think the Rmd will work for anyone without that file.

Comment thread doc/BVBRC_stats.Rmd
Comment on lines +1 to +6
---
title: "bvbrc_stats"
author: "Abhirupa"
date: "`r Sys.Date()`"
output: html_document
---
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
---
title: "bvbrc_stats"
author: "Abhirupa"
date: "`r Sys.Date()`"
output: html_document
---
---
title: "BV-BRC Metadata Statistics"
author: "Abhirupa Ghosh"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{BV-BRC Metadata Statistics}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

Bioconductor vignette-style header with the vignette: block so R recognizes it as a package vignette.

Comment thread doc/BVBRC_stats.Rmd
---

```{r setup, include=FALSE}
library(tidyverse)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
library(tidyverse)
library(dplyr)
library(stringr)
library(purrr)
library(tibble)

Bioconductor does not want library(tidyverse) import if we can just import the parts that are actually used.

Comment thread doc/BVBRC_stats.Rmd
Comment on lines +88 to +96
col_stats <- tibble(column = colnames(bvbrc)) %>%
mutate(
n_total = nrow(bvbrc),
n_missing = map_int(column, ~ sum(is.na(bvbrc[[.x]]))),
n_present = n_total - n_missing,
pct_missing = 100 * n_missing / n_total,
n_distinct_non_missing = map_int(column, ~ n_distinct(bvbrc[[.x]], na.rm = TRUE))
) %>%
arrange(desc(pct_missing), column)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
col_stats <- tibble(column = colnames(bvbrc)) %>%
mutate(
n_total = nrow(bvbrc),
n_missing = map_int(column, ~ sum(is.na(bvbrc[[.x]]))),
n_present = n_total - n_missing,
pct_missing = 100 * n_missing / n_total,
n_distinct_non_missing = map_int(column, ~ n_distinct(bvbrc[[.x]], na.rm = TRUE))
) %>%
arrange(desc(pct_missing), column)
col_stats <- tibble(column = colnames(bvbrc_clean)) %>%
mutate(
n_total = nrow(bvbrc_clean),
n_missing = map_int(column, ~ sum(is.na(bvbrc_clean[[.x]]))),
n_present = n_total - n_missing,
pct_missing = 100 * n_missing / n_total,
n_distinct_non_missing = map_int(column, ~ n_distinct(bvbrc_clean[[.x]], na.rm = TRUE))
) %>%
arrange(desc(pct_missing), column)

currently, col_stats is using bvbrc, but the stats below use bvbrc_clean so that the missing values are accounted for. Suggesting changing from bvbrc -> bvbrc_clean here so that it is consistent if that is appropriate.

Comment thread doc/BVBRC_stats.Rmd
arrange(desc(pct_missing), column)
```

```{r}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
```{r}
``````{r col_stats_output}

Comment thread doc/BVBRC_stats.Rmd
# --- Outputs -----------------------------------------------------------------
cat("\n--- Per-column stats (whole table) ---\n"); print(col_stats, n = Inf)
```
```{r}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
```{r}
``````{r host_block_output}

Comment thread doc/BVBRC_stats.Rmd
```{r}
cat("\n--- Host block stats ---\n"); print(host_block_stats)
```
```{r}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
```{r}
``````{r geo_block_output}

Comment thread doc/BVBRC_stats.Rmd
```{r}
cat("\n--- Geographic block stats ---\n"); print(geo_block_stats)
```
```{r}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
```{r}
``````{r diag_output}

Comment thread doc/BVBRC_stats.Rmd
)
```

```{r functions}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each of these code chunks needs a paragraph saying what it computes and why. Need to add a paragraph before this functions chunk, before stats (line 84), and before each of the four output chunks (lines 134-145).

Copy link
Copy Markdown
Contributor

@eboyer221 eboyer221 May 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes should resolve #13 (review)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants