data.table column creation with `:=` and double indexing

Question 1

I am processing a data.table instance and want to create an extra column using `:=`. This worked fine until I did some double indexing.

For the following instance of a data.table:

example_data = structure(list(
    V1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
                   .Label = c("AUDUSD", "EURUSD", "GBPUSD", "NZDUSD", "USDCAD", "USDJPY"),
                   class = "factor"),
    V2 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
                   .Label = c("2007", "2008", "2009", "2010", "2011", "2012", "2013", "2014"),
                   class = "factor"),
    V3 = c("1.RData", "10.RData", "11.RData", "12.RData", "2.RData",
           "3.RData", "4.RData", "5.RData", "6.RData", "7.RData")),
    .Names = c("V1", "V2", "V3"),
    class = c("data.table", "data.frame"),
    row.names = c(NA, -10L))

which gives this data:

example_data
        V1   V2       V3
 1: AUDUSD 2007  1.RData
 2: AUDUSD 2007 10.RData
 3: AUDUSD 2007 11.RData
 4: AUDUSD 2007 12.RData
 5: AUDUSD 2007  2.RData
 6: AUDUSD 2007  3.RData
 7: AUDUSD 2007  4.RData
 8: AUDUSD 2007  5.RData
 9: AUDUSD 2007  6.RData
10: AUDUSD 2007  7.RData

I am looking to split the "V3" column on a "." and get the preceding number as a character in a new column in the same table.

Doing this is straightforward in normal R:

example_data$MONTH = apply(example_data,1, function(x) { strsplit(as.character(x[["V3"]]),"\\.")[[1]][1]})

I thought that doing this in data.table would be even more straightforward:

example_data[,MONTH:=strsplit(as.character(V3),"\\.")[[1]][1]]

However the double indexing is not being interpreted as I intended, because it is changing all values to the outcome of the first row. Removing the indexing does perform the correct operation (just not extracting and placing the data in the right place):

example_data[,strsplit(as.character(V3),"\\.")]

I also attempted to internalize the indexing by applying a function, but got the same wrong result:

myfunc <- function(x) { strsplit(as.character(x),"\\.")[[1]][1] }
example_data[,MONTH:=myfunc(V3)]

I can always use the standard R solution, but if anyone knows of a data.table based solution, that would be appreciated. I am not interested in other standard R or (d)plyr based alternatives (they are great - just not what I am asking).

Question 2

You need the same apply (or better, sapply) to get the first element of each list. But we should provide a function to do this conveniently. Could you please file an issue here?
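To illustrate the point about taking the first element of each list, here is a minimal base R sketch. The vector `v3` is a stand-in for the V3 column (an assumption for illustration), and `vapply` is used as a stricter relative of the `sapply` mentioned above.

```r
# Base R only; v3 stands in for the V3 column from the question.
v3 <- c("1.RData", "10.RData", "11.RData")

# strsplit() returns a list with one character vector per input string.
parts <- strsplit(v3, ".", fixed = TRUE)

# [[1]][1] looks only at the first list element, so it yields a single value;
# `:=` would then recycle that one value down the whole column.
first_only <- parts[[1]][1]

# Taking element 1 of every list entry gives one value per row instead.
month <- vapply(parts, "[", character(1), 1L)

print(first_only)  # "1"
print(month)       # "1" "10" "11"
```

Inside the table, the same idea would presumably read example_data[, MONTH := vapply(strsplit(V3, ".", fixed = TRUE), "[", character(1), 1L)].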

Question 3

I will. data.table is so much quicker than some other options that it is worth the syntax learning curve.

Question 4

You should use sub instead of strsplit:

example_data[ , MONTH := sub("\\..*", "", V3)]
        V1   V2       V3 MONTH
 1: AUDUSD 2007  1.RData     1
 2: AUDUSD 2007 10.RData    10
 3: AUDUSD 2007 11.RData    11
 4: AUDUSD 2007 12.RData    12
 5: AUDUSD 2007  2.RData     2
 6: AUDUSD 2007  3.RData     3
 7: AUDUSD 2007  4.RData     4
 8: AUDUSD 2007  5.RData     5
 9: AUDUSD 2007  6.RData     6
10: AUDUSD 2007  7.RData     7

However, it works with strsplit too:

example_data[ , MONTH := unlist(strsplit(V3, "\\..*"))]

Question 5

Sven, that's great! But note that strsplit() returns a list. You might have to unlist().
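A small base R sketch of why unlist() is safe with this particular pattern but not in general. The vector `v3` is a stand-in for the V3 column, assumed here for illustration.

```r
v3 <- c("1.RData", "10.RData")

# The pattern "\\..*" matches the dot and everything after it, so each string
# splits into exactly one piece and unlist() returns one value per input.
greedy <- unlist(strsplit(v3, "\\..*"))

# Splitting on the dot alone leaves two pieces per string, so unlist()
# returns twice as many values as there are inputs.
dot_only <- unlist(strsplit(v3, ".", fixed = TRUE))

print(greedy)            # "1" "10"
print(length(dot_only))  # 4
```

So the unlist() route relies on each split producing exactly one piece; with a plain "." split, the result would no longer line up one-to-one with the rows.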

Question 6

@Arun Agreed. I added unlist.

Question 7

+1. The cSplit approach from "splitstackshape" doesn't save us any work... :-(
cSplit(example_data, "V3", "\\..*", drop = FALSE, type.convert = FALSE, fixed = FALSE)

Question 8

Just pushed two functions, transpose() and tstrsplit(), in data.table v1.9.5.

With this we can do:

require(data.table)
setDT(example_data)[, col := tstrsplit(V3, ".", fixed=TRUE)[[1L]]]
#         V1   V2       V3 col
#  1: AUDUSD 2007  1.RData   1
#  2: AUDUSD 2007 10.RData  10
#  3: AUDUSD 2007 11.RData  11
#  4: AUDUSD 2007 12.RData  12
#  5: AUDUSD 2007  2.RData   2
#  6: AUDUSD 2007  3.RData   3
#  7: AUDUSD 2007  4.RData   4
#  8: AUDUSD 2007  5.RData   5
#  9: AUDUSD 2007  6.RData   6
# 10: AUDUSD 2007  7.RData   7

tstrsplit is a wrapper for transpose(strsplit(...)). transpose() can also be used on lists, data frames and data tables. Please check the documentation and examples for more.
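As a rough sketch of that wrapper relationship, assuming data.table (v1.9.5 or later) is installed and using a stand-in vector `v3` in place of the V3 column:

```r
library(data.table)

v3 <- c("1.RData", "10.RData", "11.RData")

# strsplit() gives one vector per string; transpose() regroups the pieces
# by position, so element 1 holds the part before the dot for every string.
by_position <- transpose(strsplit(v3, ".", fixed = TRUE))

# tstrsplit() performs the split and the transpose in one call.
one_call <- tstrsplit(v3, ".", fixed = TRUE)

print(by_position[[1L]])  # "1" "10" "11"
print(one_call[[1L]])     # "1" "10" "11"
```

Because the result is grouped by position, `[[1L]]` here picks the first piece of every string, unlike `[[1]]` on the raw strsplit() output, which picks all pieces of the first string.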

Question 9

Now that is service for you - thanks - it is much appreciated.

Sven Hohenstein · Accepted Answer · 2014-11-15 19:57:34Z

