I have a dataframe like this:
dat <- data.frame(ID=c("12345", "54321", "12345", "54321", "99999", "12345"), cat1 = c(1,0,0,0,0,0), cat2= c(1,1,1,1,1,1), cat3= c(1,0,1,0,1,0), cat4=c(0,0,0,0,0,0))And I would like to spit out a table like this:
cat1 1cat2 3cat3 2cat4 0That counts how many unique ID's there are when the value of each column is 1. This is my code for doing this for a single column:
dat |> filter(cat1==1) |> summarise(n=n_distinct(ID))And this works fine. But I want to be able to do this across all needed columns at once. I'm attempting to write a function and then use summarize(across()) like this:
my_function <- function(data, column) { data |> filter({{column}}==1) |> summarise(n=n_distinct(ID))}And then apply this function to all needed columns like this:
dat |> summarise(across(c(cat1, cat2, cat3, cat4), ~my_function(.x)))I keep getting this error though:
Error insummarise():ℹ In argument:across(c(cat1, cat2, cat3, cat4), ~my_function(.x)).Caused by error inacross():! Can't compute columncat1.Caused by error inUseMethod():! no applicable method for 'filter' applied to an object of class "c('double', 'numeric')"
What am I doing wrong?
Thank you!
- 6You already have several solutions thatdo what you want. The answer to your actual question (What am I doing wrong?) is that the function passed to
acrossshould take a single argument, which represents the column being manipulated. Your function has two parameters, the first of which is assumed to be the dataframe in which the column exists. Hence the error. More fundamentally, consider making your data frame tidy. It isn't at the moment because information you need is contained in the columns' names, not in their values.Limey– Limey2025-02-25 15:06:47 +00:00CommentedFeb 25 at 15:06 - Thank you - this was the explanation I was looking for even if some of the other solutions work.lemnbalm– lemnbalm2025-02-25 16:12:55 +00:00CommentedFeb 25 at 16:12
5 Answers5
Expanding on my comment and providing a tidy solution...
You already have several solutions that do what you want. The answer to your actual question (What am I doing wrong?) is that the function passed to across should take a single argument, which represents the column being manipulated. Your function has two parameters, the first of which is assumed to be the dataframe in which the column exists. Hence the error.
This means that your current approach won't work in this case. Your summary function needs access to both the column name and its values. That's why your data frame isn't tidy. There are other ways to get what you want usingsummarise andacross, but they won't be pretty because the code would be fighting against the fundamental assumption of the tidyverse - that it is working with tidy data.
You can learn more about tidy datahere.
Here is a tidy approach to your problem.
First, tidy your data:
long_dat <- dat %>% pivot_longer( starts_with("cat"), values_to = "value", names_to = "column" ) long_dat# A tibble: 24 × 3 ID column value <chr> <chr> <dbl> 1 12345 cat1 1 2 12345 cat2 1 3 12345 cat3 1 4 12345 cat4 0 5 54321 cat1 0 6 54321 cat2 1 7 54321 cat3 0 8 54321 cat4 0 9 12345 cat1 010 12345 cat2 1Now, summarise the observations that meet your criterion:
long_dat %>% filter(value == 1) %>% group_by(column) %>% summarise(n = length(unique(ID)))# A tibble: 3 × 2 column n <chr> <int>1 cat1 12 cat2 33 cat3 2To obtain zero counts for categories that have no rows that meet your criterion, a little more work is required.
long_dat %>% filter(value == 1) %>% group_by(column) %>% summarise(n = length(unique(ID))) %>% right_join(long_dat %>% distinct(column), by = "column") %>% replace_na(list(n = 0))# A tibble: 4 × 2 column n <chr> <int>1 cat1 12 cat2 33 cat3 24 cat4 0Comments
You could do this quickly in base R usingsapply:
sapply(dat[,-1], \(x) length(unique(dat$ID[x == 1])))# cat1 cat2 cat3 cat4 # 1 3 2 0This outputs a names vector of integers. If you wanted in in a table structure, wrap the whole thing inas.table()
Comments
In base R you could do:
a <- col(dat[-1])[!duplicated(dat$ID) & unlist(dat[-1]) == 1]table(factor(names(dat[-1]))[a])cat1 cat2 cat3 cat4 1 3 2 0Comments
Do you mean this?
> dat %>%+ reframe(across(everything(), sum), .by = ID) ID cat1 cat2 cat3 cat41 12345 1 3 2 02 54321 0 2 0 03 99999 0 1 1 0or probably more likely to be the solution as @Onyambu suggested
> dat %>%+ reframe(across(everything(), sum), .by = ID) %>%+ reframe(across(-ID, ~ sum(.x > 0))) cat1 cat2 cat3 cat41 1 3 2 02 Comments
%>% summarise(across(-ID, ~sum(.x>0))) is sufficientThis is whereaggregate is made for
> aggregate(.~ID, dat, sum) ID cat1 cat2 cat3 cat41 12345 1 3 2 02 54321 0 2 0 03 99999 0 1 1 0or probably more likely to be the solution as @Onyambu suggested
> colSums(aggregate(.~ID, dat, sum)[-1] > 0)cat1 cat2 cat3 cat4 1 3 2 0No need for any dependencies.
Explore related questions
See similar questions with these tags.

