4

I have a dataframe like this:

dat <- data.frame(ID=c("12345", "54321", "12345", "54321", "99999", "12345"),                  cat1 = c(1,0,0,0,0,0),                  cat2= c(1,1,1,1,1,1),                  cat3= c(1,0,1,0,1,0),                  cat4=c(0,0,0,0,0,0))

And I would like to spit out a table like this:

cat1 1cat2 3cat3 2cat4 0

That counts how many unique ID's there are when the value of each column is 1. This is my code for doing this for a single column:

dat |>  filter(cat1==1) |>  summarise(n=n_distinct(ID))

And this works fine. But I want to be able to do this across all needed columns at once. I'm attempting to write a function and then use summarize(across()) like this:

my_function <- function(data, column) {  data |>    filter({{column}}==1) |>    summarise(n=n_distinct(ID))}

And then apply this function to all needed columns like this:

dat |>  summarise(across(c(cat1, cat2, cat3, cat4), ~my_function(.x)))

I keep getting this error though:

Error insummarise():ℹ In argument:across(c(cat1, cat2, cat3, cat4), ~my_function(.x)).Caused by error inacross():! Can't compute columncat1.Caused by error inUseMethod():! no applicable method for 'filter' applied to an object of class "c('double', 'numeric')"

What am I doing wrong?

Thank you!

askedFeb 25 at 14:47
lemnbalm's user avatar
2
  • 6
    You already have several solutions thatdo what you want. The answer to your actual question (What am I doing wrong?) is that the function passed toacross should take a single argument, which represents the column being manipulated. Your function has two parameters, the first of which is assumed to be the dataframe in which the column exists. Hence the error. More fundamentally, consider making your data frame tidy. It isn't at the moment because information you need is contained in the columns' names, not in their values.CommentedFeb 25 at 15:06
  • Thank you - this was the explanation I was looking for even if some of the other solutions work.CommentedFeb 25 at 16:12

5 Answers5

4

Expanding on my comment and providing a tidy solution...

You already have several solutions that do what you want. The answer to your actual question (What am I doing wrong?) is that the function passed to across should take a single argument, which represents the column being manipulated. Your function has two parameters, the first of which is assumed to be the dataframe in which the column exists. Hence the error.

This means that your current approach won't work in this case. Your summary function needs access to both the column name and its values. That's why your data frame isn't tidy. There are other ways to get what you want usingsummarise andacross, but they won't be pretty because the code would be fighting against the fundamental assumption of the tidyverse - that it is working with tidy data.

You can learn more about tidy datahere.

Here is a tidy approach to your problem.

First, tidy your data:

long_dat <- dat %>%   pivot_longer(    starts_with("cat"),    values_to = "value",    names_to = "column"  ) long_dat# A tibble: 24 × 3   ID    column value   <chr> <chr>  <dbl> 1 12345 cat1       1 2 12345 cat2       1 3 12345 cat3       1 4 12345 cat4       0 5 54321 cat1       0 6 54321 cat2       1 7 54321 cat3       0 8 54321 cat4       0 9 12345 cat1       010 12345 cat2       1

Now, summarise the observations that meet your criterion:

long_dat %>%   filter(value == 1) %>%   group_by(column) %>%   summarise(n = length(unique(ID)))# A tibble: 3 × 2  column     n  <chr>  <int>1 cat1       12 cat2       33 cat3       2

To obtain zero counts for categories that have no rows that meet your criterion, a little more work is required.

long_dat %>%   filter(value == 1) %>%   group_by(column) %>%   summarise(n = length(unique(ID))) %>%   right_join(long_dat %>% distinct(column), by = "column") %>%   replace_na(list(n = 0))# A tibble: 4 × 2  column     n  <chr>  <int>1 cat1       12 cat2       33 cat3       24 cat4       0
answeredFeb 25 at 18:45
Limey's user avatar
Sign up to request clarification or add additional context in comments.

Comments

2

You could do this quickly in base R usingsapply:

sapply(dat[,-1], \(x) length(unique(dat$ID[x == 1])))# cat1 cat2 cat3 cat4 #    1    3    2    0

This outputs a names vector of integers. If you wanted in in a table structure, wrap the whole thing inas.table()

answeredFeb 25 at 14:59
jpsmith's user avatar

Comments

2

In base R you could do:

a <- col(dat[-1])[!duplicated(dat$ID) & unlist(dat[-1]) == 1]table(factor(names(dat[-1]))[a])cat1 cat2 cat3 cat4    1    3    2    0
answeredFeb 25 at 17:35
Onyambu's user avatar

Comments

1

Do you mean this?

> dat %>%+     reframe(across(everything(), sum), .by = ID)     ID cat1 cat2 cat3 cat41 12345    1    3    2    02 54321    0    2    0    03 99999    0    1    1    0

or probably more likely to be the solution as @Onyambu suggested

> dat %>%+     reframe(across(everything(), sum), .by = ID) %>%+     reframe(across(-ID, ~ sum(.x > 0)))  cat1 cat2 cat3 cat41    1    3    2    0
answeredFeb 25 at 14:55
ThomasIsCoding's user avatar

2 Comments

OP is interested in number of unique ID's. Perhaps adding%>% summarise(across(-ID, ~sum(.x>0))) is sufficient
@Onyambu aha, I think you are right. Thanks!
1

This is whereaggregate is made for

> aggregate(.~ID, dat, sum)     ID cat1 cat2 cat3 cat41 12345    1    3    2    02 54321    0    2    0    03 99999    0    1    1    0

or probably more likely to be the solution as @Onyambu suggested

> colSums(aggregate(.~ID, dat, sum)[-1] > 0)cat1 cat2 cat3 cat4    1    3    2    0

No need for any dependencies.

answeredFeb 25 at 15:04
Friede's user avatar

2 Comments

The code is incoplete. Note that OP is interested in number of unique IDs. Thus you should havecolSums( aggregate(.~ID, dat, sum)[-1] > 0)
@Onyambu, thank you.

Your Answer

Sign up orlog in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

By clicking “Post Your Answer”, you agree to ourterms of service and acknowledge you have read ourprivacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.