I am processing a data.table instance and looking to create an extra column using :=. This worked fine until I did some double indexing.

For the following instance of a data.table:

example_data = structure(list(
  V1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
                 .Label = c("AUDUSD", "EURUSD", "GBPUSD", "NZDUSD", "USDCAD", "USDJPY"),
                 class = "factor"),
  V2 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
                 .Label = c("2007", "2008", "2009", "2010", "2011", "2012", "2013", "2014"),
                 class = "factor"),
  V3 = c("1.RData", "10.RData", "11.RData", "12.RData", "2.RData",
         "3.RData", "4.RData", "5.RData", "6.RData", "7.RData")),
  .Names = c("V1", "V2", "V3"),
  class = c("data.table", "data.frame"),
  row.names = c(NA, -10L))

which gives this data:

example_data
        V1   V2       V3
 1: AUDUSD 2007  1.RData
 2: AUDUSD 2007 10.RData
 3: AUDUSD 2007 11.RData
 4: AUDUSD 2007 12.RData
 5: AUDUSD 2007  2.RData
 6: AUDUSD 2007  3.RData
 7: AUDUSD 2007  4.RData
 8: AUDUSD 2007  5.RData
 9: AUDUSD 2007  6.RData
10: AUDUSD 2007  7.RData

I am looking to split the "V3" column on a "." and get the preceding number as a character in a new column in the same table.

Doing this is straightforward in normal R:

example_data$MONTH = apply(example_data,1, function(x) { strsplit(as.character(x[["V3"]]),"\\.")[[1]][1]})

I thought that doing this in data.table would be even more straightforward:

example_data[,MONTH:=strsplit(as.character(V3),"\\.")[[1]][1]]

However, the double indexing is not interpreted as I intended: it sets every value to the result for the first row. Removing the indexing performs the split correctly, but without extracting the first piece or storing it in the new column:

example_data[,strsplit(as.character(V3),"\\.")]

I also attempted to internalize the indexing by applying a function but got the same wrong result:

myfunc <- function(x) { strsplit(as.character(x),"\\.")[[1]][1] }
example_data[,MONTH:=myfunc(V3)]

I can always use the standard R solution, but if anyone knows of a data.table-based solution, that would be appreciated. I am not interested in other standard R or (d)plyr based alternatives (they are great, just not what I am asking).

asked Nov 15, 2014 at 19:32 by crogg01
  • You need the same apply (or better, sapply) to get the first element of each list. But we should provide a function to do this conveniently. Could you please file an issue here? Commented Nov 15, 2014 at 19:46
  • I will. data.table is so much quicker than some other options that it is worth the syntax learning curve. Commented Nov 15, 2014 at 19:57
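
The suggestion in the first comment can be sketched as follows. This is only an illustration on a small stand-in table (`dt` and the column name `wrong` are hypothetical, not from the question). strsplit() returns a list with one element per row, so `[[1]][1]` looks only at the first row and := recycles that single value. The sketch uses vapply(), a type-checked relative of the sapply() mentioned in the comment, to take the first piece of every element.

```r
library(data.table)

# Hypothetical stand-in for example_data (only the V3 column matters here)
dt <- data.table(V3 = c("1.RData", "10.RData", "11.RData"))

# strsplit() returns a list with one character vector per row
pieces <- strsplit(dt$V3, ".", fixed = TRUE)

# [[1]][1] selects the first piece of the FIRST row only,
# so := recycles that one value across all rows
dt[, wrong := strsplit(V3, ".", fixed = TRUE)[[1]][1]]

# vapply() extracts the first piece of every list element instead
dt[, MONTH := vapply(strsplit(V3, ".", fixed = TRUE), `[`, character(1), 1L)]
```

Here `wrong` holds the same value in every row, while `MONTH` holds the per-row prefix.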

2 Answers

You should use sub instead of strsplit:

example_data[ , MONTH := sub("\\..*", "", V3)]
        V1   V2       V3 MONTH
 1: AUDUSD 2007  1.RData     1
 2: AUDUSD 2007 10.RData    10
 3: AUDUSD 2007 11.RData    11
 4: AUDUSD 2007 12.RData    12
 5: AUDUSD 2007  2.RData     2
 6: AUDUSD 2007  3.RData     3
 7: AUDUSD 2007  4.RData     4
 8: AUDUSD 2007  5.RData     5
 9: AUDUSD 2007  6.RData     6
10: AUDUSD 2007  7.RData     7

However, it works with strsplit too:

example_data[ , MONTH := unlist(strsplit(V3, "\\..*"))]
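
A small base-R sketch of why the pattern matters for the unlist(strsplit()) form, using a hypothetical vector `files` in place of the V3 column. With the pattern "\\..*" the match swallows everything from the first dot onward, so each string yields exactly one piece and the unlisted result lines up with the rows. With the plain "\\." pattern each string yields two pieces, and the unlisted vector no longer matches the row count. sub() always returns one value per input, which is presumably why it is the first recommendation.

```r
# Hypothetical stand-in for the V3 column
files <- c("1.RData", "10.RData", "2.RData")

# The match covers the dot and everything after it: one piece per string
one_piece <- unlist(strsplit(files, "\\..*"))

# The match covers only the dot: two pieces per string
two_pieces <- unlist(strsplit(files, "\\."))

# sub() is length-preserving by construction
via_sub <- sub("\\..*", "", files)

length(one_piece)   # same length as files
length(two_pieces)  # twice the length of files
```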

answered Nov 15, 2014 at 19:44 by Sven Hohenstein

3 Comments

Sven, that's great! But note that strsplit() returns a list. You might have to unlist().
@Arun Agreed. I added unlist.
+1. The cSplit approach from "splitstackshape" doesn't save us any work... :-(
cSplit(example_data, "V3", "\\..*", drop = FALSE, type.convert = FALSE, fixed = FALSE)

Just pushed two functions, transpose() and tstrsplit(), in data.table v1.9.5.

With this we can do:

require(data.table)
setDT(example_data)[, col := tstrsplit(V3, ".", fixed=TRUE)[[1L]]]
#         V1   V2       V3 col
#  1: AUDUSD 2007  1.RData   1
#  2: AUDUSD 2007 10.RData  10
#  3: AUDUSD 2007 11.RData  11
#  4: AUDUSD 2007 12.RData  12
#  5: AUDUSD 2007  2.RData   2
#  6: AUDUSD 2007  3.RData   3
#  7: AUDUSD 2007  4.RData   4
#  8: AUDUSD 2007  5.RData   5
#  9: AUDUSD 2007  6.RData   6
# 10: AUDUSD 2007  7.RData   7

tstrsplit is a wrapper for transpose(strsplit(...)). transpose() can also be used on lists, data frames and data tables. Please check the documentation and examples for more.
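
To make the wrapper relationship concrete, here is a minimal sketch on a hypothetical vector `v` (the names `by_row`, `by_col`, `MONTH` and `EXT` are illustrative only). strsplit() gives one list element per row; transpose() turns that into one list element per piece, which is the shape := needs when assigning columns.

```r
library(data.table)  # transpose() and tstrsplit() require v1.9.5 or later, per the answer

# Hypothetical stand-in for the V3 column
v <- c("1.RData", "10.RData", "2.RData")

by_row <- strsplit(v, ".", fixed = TRUE)  # list of 3: one vector per string
by_col <- transpose(by_row)               # list of 2: one vector per piece

# Because the result is a list of columns, several columns can be
# assigned in one step (column names here are illustrative)
dt <- data.table(V3 = v)
dt[, c("MONTH", "EXT") := tstrsplit(V3, ".", fixed = TRUE)]
```

Taking `[[1L]]` of the tstrsplit() result, as in the answer, keeps only the first of these columns.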

answered Jan 28, 2015 at 3:13 by Arun

1 Comment

Now that is service for you - thanks - it is much appreciated.
