| Title: | A Package for Processing Lexical Response Data |
| Version: | 0.1.0 |
| Description: | Lexical response data is a package that can be used for processing cued-recall, free-recall, and sentence responses from memory experiments. |
| Depends: | R (≥ 3.5.0) |
| Imports: | stats, utils, knitr |
| Suggests: | ggplot2, rmarkdown, reshape |
| License: | LGPL-3 |
| Encoding: | UTF-8 |
| LazyData: | true |
| RoxygenNote: | 7.1.2 |
| VignetteBuilder: | knitr |
| URL: | https://npm27.github.io/lrd/ |
| NeedsCompilation: | no |
| Packaged: | 2021-12-06 21:49:29 UTC; nickm |
| Author: | Nicholas Maxwell |
| Maintainer: | Nicholas Maxwell <nicholas.maxwell@usm.edu> |
| Repository: | CRAN |
| Date/Publication: | 2021-12-09 09:50:02 UTC |
Answer Key Example Data
Description
Dataset that includes the answer key for free recall data. Pair with the wide_data dataset for examples.
Usage
data(answer_key_free)
Format
A data frame of answers for a free recall test
Answer_Key: a list of free recall answers
Answer Key Example Data
Description
Dataset that includes the answer key for free recall data. Pair with the free_data dataset for examples.
Usage
data(answer_key_free2)
Format
A data frame of answers for a free recall test
Answer_Key: a list of free recall answers
Arrange Data for Free Recall Scoring
Description
This function takes wide format free recall data where all responses are stored in the same cell and converts it to long format.
Usage
arrange_data(data, responses, sep, id, repeated = NULL)
Arguments
data | a dataframe of the variables you would like to return. Other variables will be included in the returned output in long format if they represent a one to one match with the participant ID. If you have repeated data, please use the repeated argument or run this function several times for each trial. |
responses | a column name in the dataframe that contains the participant answers for each item in quotes (i.e., "column") |
sep | a character separating each response in quotes - example: ",". |
id | a column name containing participant ID numbers from the original dataframe |
repeated | (optional) a single column name or set of columns that indicate repeated measures columns you would like to keep with the data. You should include all columns that are not a one to one match with the subject ID (i.e., participants saw multiple trials). Please see our vignette for an example. |
Value
A dataframe of the participant answers including:
Sub.ID | The participant id number |
response | The participant response |
position | The position number of the response listed |
other | Any additional columns included |
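The wide-to-long conversion described for this function can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data frame `wide` and the helper `to_long()` are hypothetical names used only for illustration, and the sketch ignores the `repeated` argument.

```r
# Hypothetical toy data: one row per participant, all responses in one cell.
wide <- data.frame(
  Sub.ID = c(1, 2),
  Response = c("apple, pear, plum", "dog, cat"),
  stringsAsFactors = FALSE
)

# Illustrative helper (not part of lrd): split each cell on `sep` and
# return one row per response with its within-participant position.
to_long <- function(data, responses, sep, id) {
  pieces <- strsplit(data[[responses]], split = sep, fixed = TRUE)
  n <- lengths(pieces)
  data.frame(
    Sub.ID = rep(data[[id]], times = n),
    response = trimws(unlist(pieces)),
    position = sequence(n),
    stringsAsFactors = FALSE
  )
}

long <- to_long(wide, responses = "Response", sep = ",", id = "Sub.ID")
print(long)
```

Each participant ID is repeated once per response, which is why only columns with a one to one match to the ID can be carried along safely.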
Examples
# This dataset includes a subject number, set of answers, and
# experiment condition.
data(wide_data)
DF_long <- arrange_data(data = wide_data,
  responses = "Response",
  sep = ",",
  id = "Sub.ID")
head(DF_long)
Conditional Response Probability
Description
This function calculates the conditional response probability of each lag position. Participants' lag between subsequent named items is tallied and then divided by the possible combination of subsequent lags given their response pattern.
Usage
crp(data, position, answer, id, key, scored)
Arguments
data | a dataframe of the scored free recall that you would like to calculate - use prop_correct_free() for best formatting. |
position | a column name in the dataframe that contains the answered position of each response in quotes (i.e., "column") |
answer | a column name of the answer given for that position in the original dataframe. |
id | a column name of the participant id in the original dataframe. |
key | a vector containing the scoring key or data column name. This column does not have to be included in the original dataframe. We assume your answer key is in the tested position order. You should not include duplicates in your answer key. |
scored | a column in the original dataframe indicating if the participant got the answer correct (1) or incorrect (0). |
Details
This output can then be used to create CRP visualizations, and an example can be found in our manuscript/vignettes.
Important: The code is written assuming the data provided are for a single recall list. If repeated measures are used (i.e., there are multiple lists completed by each participant or multiple list versions), you should use this function several times, once on each list/answer key.
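The tally-then-divide logic described for this function can be sketched in base R for a single participant and a single list. This is an illustration under stated assumptions, not the package's code: the helper `crp_sketch()` and its arguments are hypothetical, the input is assumed to contain only correct recalls with no repeated items, and no averaging across participants is done.

```r
# Illustrative helper (not part of lrd): conditional response probability
# for one participant. `recalled` holds the serial (tested) positions of
# correct items in the order they were recalled; `list_length` is the
# number of studied items.
crp_sketch <- function(recalled, list_length) {
  lags <- setdiff(-(list_length - 1):(list_length - 1), 0)
  actual <- setNames(numeric(length(lags)), lags)
  possible <- setNames(numeric(length(lags)), lags)
  for (i in seq_len(length(recalled) - 1)) {
    current <- recalled[i]
    # Only items not yet recalled are possible next responses.
    remaining <- setdiff(seq_len(list_length), recalled[seq_len(i)])
    open_lags <- as.character(remaining - current)
    possible[open_lags] <- possible[open_lags] + 1
    made <- as.character(recalled[i + 1] - current)
    actual[made] <- actual[made] + 1
  }
  data.frame(
    lag = lags,
    actual = actual,
    possible = possible,
    CRP = ifelse(possible > 0, actual / possible, NA_real_),
    row.names = NULL
  )
}

# A participant studied 5 items and recalled positions 2, 3, then 5.
crp_example <- crp_sketch(recalled = c(2, 3, 5), list_length = 5)
print(crp_example)
```

Lags that were never possible given the response pattern are left as NA rather than 0, so they do not pull down an average across participants.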
Value
DF_CRP | A dataframe of the proportion correct for each conditional lag position including any other between subjects variables present in the data. |
Examples
data(free_data)
data(answer_key_free2)
free_data <- subset(free_data, List_Type == "Cat_Recall_L1")
DF_long <- arrange_data(data = free_data, responses = "Response",
  sep = " ", id = "Username")
scored_output <- prop_correct_free(data = DF_long, responses = "response",
  key = answer_key_free2$Answer_Key, id = "Sub.ID",
  cutoff = 1, flag = TRUE, group.by = "Version")
crp_output <- crp(data = scored_output$DF_Scored, position = "position",
  answer = "Answer", id = "Sub.ID",
  key = answer_key_free2$Answer_Key, scored = "Scored")
head(crp_output)
Conditional Response Probability for Multiple Lists
Description
This function calculates the conditional response probability of each lag position. Participants' lag between subsequent named items is tallied and then divided by the possible combination of subsequent lags given their response pattern. This function was designed to handle multiple or randomized lists across participants.
Usage
crp_multiple(data, position, answer, id, key, key.trial, id.trial, scored)
Arguments
data | a dataframe of the scored free recall that you would like to calculate - use prop_correct_free() for best formatting. |
position | a column name in the dataframe that contains the answered position of each response in quotes (i.e., "column") |
answer | a column name of the answer given for that position in the original dataframe. |
id | a column name of the participant id in the original dataframe. |
key | a vector containing the scoring key or data column name. This column does not have to be included in the original dataframe. We assume your answer key is in the tested position order. You should not include duplicates in your answer key. |
key.trial | a vector containing the trial numbers for each answer. Note: If you input long data (i.e., repeating trial-answer responses), we will take the unique combination of the responses. If a trial number is repeated, you will receive an error. Key and key.trial can also be a separate dataframe, depending on how your output data is formatted. |
id.trial | a column name containing the trial numbers for the participant data from the original dataframe. Note that the free response "key" trial and this trial number should match. The trial key will be repeated for each answer a participant gave. |
scored | a column in the original dataframe indicating if the participant got the answer correct (1) or incorrect (0). |
Details
This output can then be used to create CRP visualizations, and an example can be found in our manuscript/vignettes.
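Because key and key.trial must line up element by element, an answer key stored with one column per list has to be reshaped to long format first. The following base R sketch shows one way to do that with `stack()`; the object names `wide_key` and `long_key` and the toy words are hypothetical, and the package examples use reshape::melt() instead.

```r
# Hypothetical answer key in wide format: one column per list.
wide_key <- data.frame(
  List1 = c("apple", "pear", "plum"),
  List2 = c("dog", "cat", "bird"),
  stringsAsFactors = FALSE
)

# stack() returns a `values` column (the answers) and an `ind` factor
# holding the name of the column each answer came from.
long_key <- stack(wide_key)
long_key$trial <- gsub("List", "", as.character(long_key$ind))

# Vectors in the shape expected for the key and key.trial arguments.
key <- long_key$values
key.trial <- long_key$trial
print(data.frame(key, key.trial))
```

Stripping the "List" prefix leaves trial labels that can be matched against a list number column in the participant data, provided both sides use the same labels.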
Value
DF_CRP | A dataframe of the proportion correct for each conditional lag position including any other between subjects variables present in the data. |
Examples
data("multi_data")
data("multi_answers")
DF_long <- arrange_data(data = multi_data, responses = "Response",
  sep = " ", id = "Sub.ID", repeated = "List.Number")
library(reshape)
multi_answers$position <- 1:nrow(multi_answers)
answer_long <- melt(multi_answers, measured = colnames(multi_answers), id = "position")
colnames(answer_long) <- c("position", "List.ID", "Answer")
answer_long$List.ID <- gsub(pattern = "List", replacement = "", x = answer_long$List.ID)
DF_long$response <- tolower(DF_long$response)
answer_long$Answer <- tolower(answer_long$Answer)
answer_long$Answer <- gsub(" ", "", answer_long$Answer)
scored_output <- prop_correct_multiple(data = DF_long, responses = "response",
  key = answer_long$Answer, key.trial = answer_long$List.ID,
  id = "Sub.ID", id.trial = "List.Number", cutoff = 1, flag = TRUE)
head(scored_output$DF_Scored)
head(scored_output$DF_Participant)
crp_output <- crp_multiple(data = scored_output$DF_Scored, key = answer_long$Answer,
  position = "position", scored = "Scored", answer = "Answer", id = "Sub.ID",
  key.trial = answer_long$List.ID, id.trial = "List.Number")
head(crp_output)
Cued Recall Data
Description
Dataset that includes cued recall data in long format. Participants were given a cue, and they were required to remember the response listed in the dataset. This dataset is in long format, which is required for most functions.
Usage
data(cued_data)
Format
A data frame of answers for a cued recall test
id: the participant id
trial: the trial id
response: the response the participant gave to the cue
key: the answer for this trial id
condition: the between subjects group the participants were in
Cued Recall Data with Multiple Conditions
Description
Dataset that includes cued recall data in long format. Participants were given a cue, and they were required to remember the response listed in the dataset. This dataset is in long format, which is required for most functions.
Usage
data(cued_data_groupby)
Format
A data frame of answers for a cued recall test
Subject: the participant id
Target: the answer for this trial id
Response: the response the participant gave to the cue
Condition: the between subjects group the participants were in
Condition2: the second between subjects group the participants were in
Cued Recall Data from Manuscript
Description
Dataset that includes cued recall data in long format. Participants were given a cue, and they were required to remember the response listed in the dataset. This dataset is in long format, which is required for most functions.
Usage
data(cued_data)
Format
A data frame of answers for a cued recall test
Sub.ID: the participant id
Trial_num: the trial id
Cue: the cue shown to participants
Target: the answer for this trial id
Answer: the participant answer for this trial
Free Recall Data
Description
Dataset that includes free recall data. Participants were given a list of words to remember, and then asked to recall the words. This dataset is in wide format, which should be converted to long format with arrange_data().
Usage
data(free_data)
Format
A data frame of answers for a free recall test
Username: the participant id
List_Type: a repeated measures condition participants were in
Response: the response the participant gave to the cue
Version: the version of the list_type given
Batch: the batch of participants that were run together
Cohen's Kappa
Description
This function returns Cohen's Kappa k for two raters. Kappa indicates the inter-rater reliability for categorical items. High scores (closer to one) indicate agreement between raters, while low scores (closer to zero) indicate low agreement between raters. Negative values indicate that raters agreed less often than expected by chance.
Usage
kappa(rater1, rater2, confidence = 0.95)
Arguments
rater1 | Rater 1 scores or categorical listings |
rater2 | Rater 2 scores or categorical listings |
confidence | Confidence interval proportion for the kappa interval estimate. You must supply a value between 0 and 1. |
Details
Note: All missing values will be ignored. This function calculates kappa for 0 and 1 scoring. If you pass categorical variables, the function will return a percent match score between these values.
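The quantities this function reports can be worked through by hand in base R. The sketch below is an assumption-laden illustration rather than the package's code: `kappa_sketch()` is a hypothetical helper, agreement is returned as a proportion rather than a percent, the interval is assumed to be a normal-approximation interval built with `qnorm()`, and the standard error follows the formula given in the Value table.

```r
# Illustrative helper (not part of lrd) for two raters with 0/1 scores.
kappa_sketch <- function(rater1, rater2, confidence = 0.95) {
  keep <- !is.na(rater1) & !is.na(rater2)   # missing values are ignored
  r1 <- rater1[keep]
  r2 <- rater2[keep]
  n <- length(r1)
  agree <- mean(r1 == r2)
  # Chance agreement: both raters say 1, or both raters say 0.
  random <- mean(r1 == 1) * mean(r2 == 1) + mean(r1 == 0) * mean(r2 == 0)
  k <- (agree - random) / (1 - random)
  se <- sqrt((agree * (1 - agree)) / (n * (1 - random)^2))
  z <- qnorm(1 - (1 - confidence) / 2)      # assumed normal approximation
  list(p_agree = agree, kappa = k, se_kappa = se,
       kappa_LL = k - z * se, kappa_UL = k + z * se)
}

# Hypothetical scores from two raters for ten responses.
r1 <- c(1, 1, 0, 0, 1, 0, 1, 1, 0, 1)
r2 <- c(1, 0, 0, 0, 1, 0, 1, 1, 1, 1)
kappa_example <- kappa_sketch(r1, r2)
print(kappa_example)
```

Here the raters agree on 8 of 10 responses, chance agreement is 0.52, and kappa is therefore (0.80 - 0.52) / (1 - 0.52).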
Value
p_agree | Percent agreement between raters |
kappa | Cohen's kappa for yes/no matching |
se_kappa | Standard error for kappa wherein standard error is the square root of: (agree \* (1 - agree)) / (N \* (1 - random agreement)^2) |
kappa_LL | Lower limit for the confidence interval of kappa |
kappa_UL | Upper limit for the confidence interval of kappa |
Examples
# This dataset includes two raters who wrote the word listed by
# the participant and rated if the word was correct in the recall
# experiment.
data(rater_data)
# Consider normalizing the text if raters used different styles
# Calculate percent match for categorical answers
kappa(rater_data$rater1_word, rater_data$rater2_word)
kappa(rater_data$rater1_score, rater_data$rater2_score)
Answer Key Example Data for Multiple Lists
Description
Dataset that includes the answer key for free recall data. Pair with the multi_data dataset for examples.
Usage
data(multi_answers)
Format
A data frame of answers for a free recall test
List1: a list of free recall answers
List2: a second list of free recall answers
etc.
Free Recall Data in Wide Format with Multiple Lists
Description
Dataset that includes free recall data for multiple lists. Participants were given a list of words to remember, and then asked to recall the words. This dataset is in wide format, which should be converted to long format with arrange_data().
Usage
data(multi_data)
Format
A data frame of answers for a free recall test
Sub.ID: the participant id
List.Type: the type of list a person saw
Response: the response the participant gave to the cue
List.Number: the number of the list they completed
Probability of First Recall
Description
This function calculates the probability of first recall for each serial position. The total number of times an item was recalled first is divided by the total number of first recalls (i.e., the number of participants who wrote anything down!).
Usage
pfr(data, position, answer, id, key, scored, group.by = NULL)
Arguments
data | a dataframe of the scored free recall that you would like to calculate - use prop_correct_free() for best formatting. |
position | a column name in the dataframe that contains the answered position of each response in quotes (i.e., "column") |
answer | a column name of the answer given for that position in the original dataframe. |
id | a column name of the participant id in the original dataframe. |
key | a vector containing the scoring key or data column name. This column does not have to be included in the original dataframe. We assume your answer key is in the tested position order. You should not include duplicates in your answer key. |
scored | a column in the original dataframe indicating if the participant got the answer correct (1) or incorrect (0). |
group.by | an optional argument that can be used to group the output by condition columns. These columns should be in the original dataframe and concatenated with c() if there are multiple columns. |
Details
This output can then be used to create PFR visualizations, and an example can be found in our manuscript/vignettes.
Important: The code is written assuming the data provided are for a single recall list. If repeated measures are used (i.e., there are multiple lists completed by each participant or multiple list versions), you should use this function several times, once on each list/answer key.
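The division described for this function (first recalls per serial position over all first recalls) can be sketched in base R. The data frame `scored`, the vector `key`, and the output name `pfr_sketch` are hypothetical toy objects, and the sketch skips grouping and the scored column for brevity; it is not the package's implementation.

```r
# Hypothetical scored data in long format: one row per response.
scored <- data.frame(
  Sub.ID = c(1, 1, 1, 2, 2, 3, 3),
  position = c(1, 2, 3, 1, 2, 1, 2),   # output order of each response
  Answer = c("apple", "pear", "plum", "plum", "apple", "apple", "plum"),
  stringsAsFactors = FALSE
)
key <- c("apple", "pear", "plum")       # answer key in tested order

# Keep each participant's first response and look up its serial position.
first <- scored[scored$position == 1, ]
serial <- match(first$Answer, key)

# Count first recalls per serial position, then divide by all first recalls.
counts <- tabulate(serial, nbins = length(key))
pfr_sketch <- data.frame(
  serial_position = seq_along(key),
  pfr = counts / nrow(first)
)
print(pfr_sketch)
```

In this toy data two of the three participants started with the first studied item, so the probability of first recall for serial position 1 is 2/3.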
Value
DF_PFR | A dataframe of the probability of first response for each position including group by variables if indicated. |
Examples
data(free_data)
data(answer_key_free2)
free_data <- subset(free_data, List_Type == "Cat_Recall_L1")
DF_long <- arrange_data(data = free_data, responses = "Response",
  sep = " ", id = "Username")
scored_output <- prop_correct_free(data = DF_long, responses = "response",
  key = answer_key_free2$Answer_Key, id = "Sub.ID",
  cutoff = 1, flag = TRUE, group.by = "Version")
pfr_output <- pfr(data = scored_output$DF_Scored, position = "position",
  answer = "Answer", id = "Sub.ID",
  key = answer_key_free2$Answer_Key, scored = "Scored", group.by = "Version")
head(pfr_output)
Probability of First Recall for Multiple Lists
Description
This function calculates the probability of first recall for each serial position. The total number of times an item was recalled first is divided by the total number of first recalls (i.e., the number of participants who wrote anything down!).
Usage
pfr_multiple(data, position, answer, id, key, key.trial, id.trial, scored, group.by = NULL)
Arguments
data | a dataframe of the scored free recall that you would like to calculate - use prop_correct_free() for best formatting. |
position | a column name in the dataframe that contains the answered position of each response in quotes (i.e., "column") |
answer | a column name of the answer given for that position in the original dataframe. |
id | a column name of the participant id in the original dataframe. |
key | a vector containing the scoring key or data column name. This column does not have to be included in the original dataframe. We assume your answer key is in the tested position order. You should not include duplicates in your answer key. |
key.trial | a vector containing the trial numbers for each answer. Note: If you input long data (i.e., repeating trial-answer responses), we will take the unique combination of the responses. If a trial number is repeated, you will receive an error. Key and key.trial can also be a separate dataframe, depending on how your output data is formatted. |
id.trial | a column name containing the trial numbers for the participant data from the original dataframe. Note that the free response "key" trial and this trial number should match. The trial key will be repeated for each answer a participant gave. |
scored | a column in the original dataframe indicating if the participant got the answer correct (1) or incorrect (0). |
group.by | an optional argument that can be used to group the output by condition columns. These columns should be in the original dataframe and concatenated with c() if there are multiple columns. |
Details
This output can then be used to create PFR visualizations, and an example can be found in our manuscript/vignettes.
Value
DF_PFR | A dataframe of the probability of first response for each position including group by variables if indicated. |
Examples
data("multi_data")
data("multi_answers")
DF_long <- arrange_data(data = multi_data, responses = "Response",
  sep = " ", id = "Sub.ID", repeated = "List.Number")
library(reshape)
multi_answers$position <- 1:nrow(multi_answers)
answer_long <- melt(multi_answers, measured = colnames(multi_answers), id = "position")
colnames(answer_long) <- c("position", "List.ID", "Answer")
answer_long$List.ID <- gsub(pattern = "List", replacement = "", x = answer_long$List.ID)
DF_long$response <- tolower(DF_long$response)
answer_long$Answer <- tolower(answer_long$Answer)
answer_long$Answer <- gsub(" ", "", answer_long$Answer)
scored_output <- prop_correct_multiple(data = DF_long, responses = "response",
  key = answer_long$Answer, key.trial = answer_long$List.ID,
  id = "Sub.ID", id.trial = "List.Number", cutoff = 1, flag = TRUE)
head(scored_output$DF_Scored)
head(scored_output$DF_Participant)
head(scored_output$DF_Group)
pfr_output <- pfr_multiple(data = scored_output$DF_Scored, key = answer_long$Answer,
  position = "position", scored = "Scored", answer = "Answer", id = "Sub.ID",
  key.trial = answer_long$List.ID, id.trial = "List.Number")
head(pfr_output)
Proportion Correct Cued Recall
Description
This function computes the proportion of correct responses per participant. Proportions can either be separated by condition or collapsed across conditions. You will need to ensure each trial is marked with a unique id to correspond to the answer key.
Usage
prop_correct_cued(data, responses, key, key.trial, id, id.trial, cutoff = 0, flag = FALSE, group.by = NULL)
Arguments
data | a dataframe of the variables you would like to return. Other variables will be included in the scored output and in the participant output if they are a one to one match with the participant id. |
responses | a column name in the dataframe that contains the participant answers for each item in quotes (i.e., "column") |
key | a vector containing the scoring key or data column name. This column does not have to be included in the original dataframe. |
key.trial | a vector containing the trial numbers for each answer. Note: If you input long data (i.e., repeating trial-answer responses), we will take the unique combination of the responses. If a trial number is repeated, you will receive an error. Key and key.trial can also be a separate dataframe, depending on how your output data is formatted. |
id | a column name containing participant ID numbers from the original dataframe. |
id.trial | a column name containing the trial numbers for the participant data from the original dataframe. |
cutoff | a numeric value that determines the criteria for scoring (i.e., 0 = strictest, 5 = most lenient). The scoring criteria uses a Levenshtein distance measure to match participant responses to the answer key. |
flag | a logical argument indicating whether to flag participant scores that are outliers, using z-scores away from the mean score for the group. |
group.by | an optional argument that can be used to group the output by condition columns. These columns should be in the original dataframe and concatenated with c() if there are multiple columns. |
Details
Note: other columns included in the dataframe will be found in the final scored dataset. If these other columns are between subjects data, they will also be included in the participant dataset (i.e., there's a one to one match of participant ID and column information).
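The effect of the cutoff argument can be seen with a small base R sketch that uses `utils::adist()`, which computes a generalized Levenshtein distance. The toy vectors and the helper `score_with_cutoff()` are hypothetical, and treating a response as correct when its distance is less than or equal to the cutoff is an assumption about the scoring rule, not a statement of the package's internals.

```r
# Hypothetical responses and the answer key for the same trials.
responses <- c("aple", "bannana", "cherry", "grape")
key <- c("apple", "banana", "cherry", "peach")

# Edit distance between each response and the answer for its own trial.
distance <- mapply(
  function(r, k) utils::adist(r, k)[1, 1],
  responses, key,
  USE.NAMES = FALSE
)

# Assumed rule: correct (1) when the distance is within the cutoff.
score_with_cutoff <- function(distance, cutoff) as.numeric(distance <= cutoff)

strict <- score_with_cutoff(distance, cutoff = 0)
lenient <- score_with_cutoff(distance, cutoff = 1)
print(data.frame(responses, key, distance, strict, lenient))
```

With a cutoff of 0 only the exact match counts, while a cutoff of 1 also accepts the two single-letter misspellings. Inspecting the scored output at a few cutoffs is a practical way to pick a value for a given dataset.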
Value
DF_Scored | The dataframe of the original response, answer, scoring, and any other or grouping variables. This dataframe can be used to determine if the cutoff score and scoring matched your answer key as intended. Distance measures are not perfect! Issues and suggestions for improvement are welcome. |
DF_Participant | A dataframe of the proportion correct by participant, which also includes optional z-scoring, grouping, and other variables. |
DF_Group | A dataframe of the summary scores by any optional grouping variables, along with overall total proportion correct scoring. |
Examples
# This data contains cued recall test with responses and answers together.
# You can use a separate answer key, but this example will show you an
# embedded answer key. This example also shows how you can use different
# stimuli across participants (i.e., each person sees a randomly selected
# set of trials from a larger set).
data(cued_data)
scored_output <- prop_correct_cued(data = cued_data, responses = "response",
  key = "key", key.trial = "trial", id = "id", id.trial = "trial",
  cutoff = 1, flag = TRUE, group.by = "condition")
head(scored_output$DF_Scored)
head(scored_output$DF_Participant)
head(scored_output$DF_Group)
Proportion Correct Free Recall
Description
This function computes the proportion of correct responses per participant. Proportions can either be separated by condition or collapsed across conditions.
Usage
prop_correct_free(data, responses, key, id, cutoff = 0, flag = FALSE, group.by = NULL)
Arguments
data | a dataframe of the variables you would like to return. Other variables will be included in the scored output and in the participant output if they are a one to one match with the participant id. |
responses | a column name in the dataframe that contains the participant answers for each item in quotes (i.e., "column") |
key | a vector containing the scoring key or data column name. This column does not have to be included in the original dataframe. |
id | a column name containing participant ID numbers from the original dataframe |
cutoff | a numeric value that determines the criteria for scoring (i.e., 0 = strictest, 5 = most lenient). The scoring criteria uses a Levenshtein distance measure to match participant responses to the answer key. |
flag | a logical argument indicating whether to flag participant scores that are outliers, using z-scores away from the mean score for the group. |
group.by | an optional argument that can be used to group the output by condition columns. These columns should be in the original dataframe and concatenated with c() if there are multiple columns. |
Details
Note: other columns included in the dataframe will be found in the final scored dataset. If these other columns are between subjects data, they will also be included in the participant dataset (i.e., there's a one to one match of participant ID and column information).
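In free recall there is no trial id, so each response has to be compared against the whole answer key. The base R sketch below illustrates one plausible way to do that with `utils::adist()`. It rests on assumptions that are not confirmed by this documentation: a response is matched to its closest key item, it counts as correct when that distance is within the cutoff, and the proportion is taken over the length of the key. All object names are hypothetical.

```r
# Hypothetical answer key and one participant's responses.
key <- c("apple", "banana", "cherry", "peach")
responses <- c("aple", "cherry", "table")

# Distance matrix: rows are responses, columns are key items.
distances <- utils::adist(responses, key)
best <- apply(distances, 1, which.min)   # index of the closest key item
best_distance <- apply(distances, 1, min)

cutoff <- 1
scored <- data.frame(
  response = responses,
  answer = ifelse(best_distance <= cutoff, key[best], NA),
  scored = as.numeric(best_distance <= cutoff),
  stringsAsFactors = FALSE
)

# Assumed denominator: the number of items in the answer key.
proportion_correct <- sum(scored$scored) / length(key)
print(scored)
print(proportion_correct)
```

The misspelled "aple" is credited to "apple" at a cutoff of 1, while "table" is too far from every key item and is scored as incorrect.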
Value
DF_Scored | The dataframe of the original response, answer, scoring, and any other or grouping variables. This dataframe can be used to determine if the cutoff score and scoring matched your answer key as intended. Distance measures are not perfect! Issues and suggestions for improvement are welcome. |
DF_Participant | A dataframe of the proportion correct by participant, which also includes optional z-scoring, grouping, and other variables. |
DF_Group | A dataframe of the summary scores by any optional grouping variables, along with overall total proportion correct scoring. |
Examples
data(wide_data)
data(answer_key_free)
DF_long <- arrange_data(data = wide_data, responses = "Response",
  sep = ",", id = "Sub.ID")
scored_output <- prop_correct_free(data = DF_long, responses = "response",
  key = answer_key_free$Answer_Key, id = "Sub.ID",
  cutoff = 1, flag = TRUE, group.by = "Disease.Condition")
head(scored_output$DF_Scored)
head(scored_output$DF_Participant)
head(scored_output$DF_Group)
Proportion Correct Free Recall for Multiple Lists
Description
This function computes the proportion of correct responses per participant. Proportions can either be separated by condition or collapsed across conditions. This function extends prop_correct_free() to include multiple or randomized lists for participants.
Usage
prop_correct_multiple(data, responses, key, key.trial, id, id.trial, cutoff = 0, flag = FALSE, group.by = NULL)
Arguments
data | a dataframe of the variables you would like to return. Other variables will be included in the scored output and in the participant output if they are a one to one match with the participant id. |
responses | a column name in the dataframe that contains the participant answers for each item in quotes (i.e., "column") |
key | a vector containing the scoring key or data column name. This column does not have to be included in the original dataframe. |
key.trial | a vector containing the trial numbers for each answer. Note: If you input long data (i.e., repeating trial-answer responses), we will take the unique combination of the responses. If a trial number is repeated, you will receive an error. Key and key.trial can also be a separate dataframe, depending on how your output data is formatted. |
id | a column name containing participant ID numbers from the original dataframe. |
id.trial | a column name containing the trial numbers for the participant data from the original dataframe. Note that the free response "key" trial and this trial number should match. The trial key will be repeated for each answer a participant gave. |
cutoff | a numeric value that determines the criteria for scoring (i.e., 0 = strictest, 5 = most lenient). The scoring criteria uses a Levenshtein distance measure to match participant responses to the answer key. |
flag | a logical argument indicating whether to flag participant scores that are outliers, using z-scores away from the mean score for the group. |
group.by | an optional argument that can be used to group the output by condition columns. These columns should be in the original dataframe and concatenated with c() if there are multiple columns. |
Details
Note: other columns included in the dataframe will be found in the final scored dataset. If these other columns are between subjects data, they will also be included in the participant dataset (i.e., there's a one to one match of participant ID and column information).
Value
DF_Scored | The dataframe of the original response, answer, scoring, and any other or grouping variables. This dataframe can be used to determine if the cutoff score and scoring matched your answer key as intended. Distance measures are not perfect! Issues and suggestions for improvement are welcome. |
DF_Participant | A dataframe of the proportion correct by participant, which also includes optional z-scoring, grouping, and other variables. |
DF_Group | A dataframe of the summary scores by any optional grouping variables, along with overall total proportion correct scoring. |
Examples
data("multi_data")
data("multi_answers")
DF_long <- arrange_data(data = multi_data, responses = "Response",
  sep = " ", id = "Sub.ID", repeated = "List.Number")
library(reshape)
multi_answers$position <- 1:nrow(multi_answers)
answer_long <- melt(multi_answers, measured = colnames(multi_answers), id = "position")
colnames(answer_long) <- c("position", "List.ID", "Answer")
answer_long$List.ID <- gsub(pattern = "List", replacement = "", x = answer_long$List.ID)
DF_long$response <- tolower(DF_long$response)
answer_long$Answer <- tolower(answer_long$Answer)
answer_long$Answer <- gsub(" ", "", answer_long$Answer)
scored_output <- prop_correct_multiple(data = DF_long, responses = "response",
  key = answer_long$Answer, key.trial = answer_long$List.ID,
  id = "Sub.ID", id.trial = "List.Number", cutoff = 1, flag = TRUE)
head(scored_output$DF_Scored)
head(scored_output$DF_Participant)
Proportion Correct for Sentences
Description
This function computes the proportion of correct sentence responses per participant. Proportions can either be separated by condition or collapsed across conditions. You will need to ensure each trial is marked with a unique id to correspond to the answer key.
Usage
prop_correct_sentence(data, responses, key, key.trial, id, id.trial, cutoff = 0, flag = FALSE, group.by = NULL, token.split = " ")
Arguments
data | a dataframe of the variables you would like to return. Other variables will be included in the scored output and in the participant output if they are a one to one match with the participant id. |
responses | a column name in the dataframe that contains the participant answers for each item in quotes (i.e., "column") |
key | a vector containing the scoring key or data column name. This column does not have to be included in the original dataframe. |
key.trial | a vector containing the trial numbers for each answer. Note: If you input long data (i.e., repeating trial-answer responses), we will take the unique combination of the responses. If a trial number is repeated, you will receive an error. Key and key.trial can also be a separate dataframe, depending on how your output data is formatted. |
id | a column name containing participant ID numbers from the original dataframe |
id.trial | a column name containing the trial numbers for the participant data from the original dataframe |
cutoff | a numeric value that determines the criteria for scoring (i.e., 0 = strictest, 5 = most lenient). The scoring criteria uses a Levenshtein distance measure to match participant responses to the answer key. |
flag | a logical argument indicating whether to flag participant scores that are outliers, using z-scores away from the mean score for the group. |
group.by | an optional argument that can be used to group the output by condition columns. These columns should be in the original dataframe and concatenated with c() if there are multiple columns. |
token.split | an optional argument that can be used to delineate how to separate tokens. The default is a space after punctuation and additional spacing is removed. |
Details
Note: other columns included in the dataframe will be found in the final scored dataset. If these other columns are between subjects data, they will also be included in the participant dataset (i.e., there's a one to one match of participant ID and column information).
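Token-level scoring of a sentence can be sketched in base R as follows. This is a hedged illustration, not the package's algorithm: the helper `score_sentence()` is hypothetical, and it assumes that text is lowercased, punctuation is stripped before splitting on token.split, and the score is the share of key tokens that have some response token within the cutoff distance.

```r
# Illustrative helper (not part of lrd): proportion of key tokens that
# are matched by a response token within `cutoff` edits.
score_sentence <- function(response, key, cutoff = 0, token.split = " ") {
  clean <- function(x) {
    x <- tolower(gsub("[[:punct:]]", "", x))
    tokens <- strsplit(x, split = token.split, fixed = TRUE)[[1]]
    tokens[tokens != ""]                 # drop empty tokens from extra spacing
  }
  response_tokens <- clean(response)
  key_tokens <- clean(key)
  if (length(response_tokens) == 0) return(0)
  # Rows are key tokens, columns are response tokens.
  distances <- utils::adist(key_tokens, response_tokens)
  mean(apply(distances, 1, min) <= cutoff)
}

key_sentence <- "The quick brown fox jumps."
response_sentence <- "the quik  brown dog jumps"
strict_score <- score_sentence(response_sentence, key_sentence, cutoff = 0)
lenient_score <- score_sentence(response_sentence, key_sentence, cutoff = 1)
print(c(strict = strict_score, lenient = lenient_score))
```

Raising the cutoff from 0 to 1 credits the misspelling "quik" but still rejects "dog" as a match for "fox".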
Value
DF_Scored | The dataframe of the original response, answer, scoring, and any other or grouping variables. This dataframe can be used to determine if the cutoff score and scoring matched your answer key as intended. Distance measures are not perfect! Issues and suggestions for improvement are welcome. |
DF_Participant | A dataframe of the proportion correct by participant, which also includes optional z-scoring, grouping, and other variables. |
DF_Group | A dataframe of the summary scores by any optional grouping variables, along with overall total proportion correct scoring. |
Examples
# This data contains sentence recall test with responses and answers together.
# You can use a separate answer key, but this example will show you an
# embedded answer key. This example also shows how you can use different
# stimuli across participants (i.e., each person sees a randomly selected
# set of trials from a larger set).
data(sentence_data)
scored_output <- prop_correct_sentence(data = sentence_data, responses = "Response",
  key = "Sentence", key.trial = "Trial.ID", id = "Sub.ID", id.trial = "Trial.ID",
  cutoff = 1, flag = TRUE, group.by = "Condition", token.split = " ")
head(scored_output$DF_Scored)
head(scored_output$DF_Participant)
head(scored_output$DF_Group)
Rater Data
Description
Dataset that contains scoring and ratings for a recall test that was rated by two raters. Use with the kappa function as an example.
Usage
data(rater_data)
Format
A data frame of scored answers for inter-rater reliability
Sub.ID: the participant id
rater1_word: the word choice for the subject the rater selected
rater1_score: the score for the participant given by the rater
rater2_word: the word choice for the subject the rater selected
rater2_score: the score for the participant given by the rater
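For readers unfamiliar with inter-rater agreement, the statistic usually computed from two score columns like these is Cohen's kappa. The sketch below computes it by hand in base R on toy scores. The helper name cohen_kappa() is hypothetical, and this is not the package's kappa function, whose arguments and output may differ.

```r
# Hypothetical helper (not part of lrd): Cohen's kappa for two raters.
cohen_kappa <- function(rater1, rater2) {
  stopifnot(length(rater1) == length(rater2))
  categories <- sort(unique(c(rater1, rater2)))
  # Square agreement table with the same categories on both margins
  tab <- table(factor(rater1, levels = categories),
               factor(rater2, levels = categories))
  n <- sum(tab)
  p_observed <- sum(diag(tab)) / n                      # raw agreement
  p_expected <- sum(rowSums(tab) * colSums(tab)) / n^2  # chance agreement
  (p_observed - p_expected) / (1 - p_expected)
}

# Toy scores (1 = correct, 0 = incorrect) standing in for
# rater1_score and rater2_score
toy_rater1 <- c(1, 1, 0, 1, 0, 0, 1, 1)
toy_rater2 <- c(1, 0, 0, 1, 0, 1, 1, 1)
toy_kappa <- cohen_kappa(toy_rater1, toy_rater2)
toy_kappa  # 7/15, about 0.467, for these toy scores
```

Here the raters agree on 6 of 8 items (0.75), while chance agreement is 34/64, so kappa is noticeably lower than raw agreement.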
Sentence Recall Data
Description
Dataset that includes sentence recall data in long format. Participants were given a sentence to remember, and then asked to recall the words. This dataset is in long format, which is required for these functions.
Usage
data(sentence_data)
Format
A data frame of answers for a sentence recall test
Sub.ID: the participant id
Trial.ID: the id for the trial given to the participant
Sentence: the answer to the trial that the participant should have given
Response: the response the participant gave to that trial
Condition: the between subjects condition the participant was in
Serial Position Calculator
Description
This function calculates the proportion correct of each item in the serial position curve. Data should include the participant's answers in long format (use arrange_data() in this package for help), the answer key of the items in order, and a column that denotes the order a participant listed each item. The function will then calculate the items remembered within a window of 1 before or 1 after the tested position. The first and last positions must be answered in the correct place.
Usage
serial_position(data, position, answer, key, scored, group.by = NULL)
Arguments
data | a dataframe of the scored free recall that you would like to calculate - use prop_correct_free() for best formatting. |
position | a column name in the dataframe that contains the answered position of each response in quotes (i.e., "column") |
answer | a column name of the answer given for that position in the original dataframe. |
key | a vector containing the scoring key or data column name. This column does not have to be included in the original dataframe. We assume your answer key is in the tested position order. You should not include duplicates in your answer key. |
scored | a column in the original dataframe indicating if the participant got the answer correct (1) or incorrect (0). |
group.by | an optional argument that can be used to group the output by condition columns. These columns should be in the original dataframe and concatenated with c() if there are multiple columns |
Details
This output can then be used to create serial position curve visualizations, and an example can be found in our manuscript/vignettes.
Important: The code is written assuming group.by variables are between subjects for an individual recall list. If repeated measures are used (i.e., there are multiple lists completed by each participant or multiple list versions), you should use this function several times, once on each list/answer key.
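The windowed scoring rule from the Description can be sketched for a single participant in base R. This reflects one reading of that rule, not the package's code: the helper name score_serial_window() and the output column names are hypothetical, and the sketch assumes that the first and last key items count only when recalled in exactly their tested position.

```r
# Hypothetical helper (not part of lrd): serial position scoring for one
# participant with a +/- 1 window around the tested position.
score_serial_window <- function(key, answers, positions) {
  n <- length(key)
  # Tested position of each recalled item (NA if it is not on the key)
  tested <- match(answers, key)
  # Allowed distance between recalled and tested position:
  # 0 for the first and last key items, 1 for everything in between
  allowed <- ifelse(tested == 1 | tested == n, 0, 1)
  correct <- !is.na(tested) & abs(positions - tested) <= allowed
  hit <- integer(n)
  hit[tested[correct]] <- 1L
  data.frame(Tested.Position = seq_len(n),
             Answer = key,
             Serial.Correct = hit,
             stringsAsFactors = FALSE)
}

toy_key <- c("apple", "bread", "chair", "dog", "egg")
toy_answers <- c("apple", "chair", "bread", "egg", "dog")
toy_result <- score_serial_window(toy_key, toy_answers,
                                  positions = seq_along(toy_answers))
toy_result
```

In the toy data, "chair", "bread", and "dog" are each one position off and still count, while "egg" is the last key item recalled in fourth place and does not. Averaging Serial.Correct over participants at each tested position would give the proportions that a serial position curve plots.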
Value
DF_Serial | A dataframe of the proportion correct for each tested position by any optional grouping variables included. |
Examples
data(free_data)
data(answer_key_free2)
free_data <- subset(free_data, List_Type == "Cat_Recall_L1")
DF_long <- arrange_data(data = free_data,
  responses = "Response",
  sep = " ",
  id = "Username")
scored_output <- prop_correct_free(data = DF_long,
  responses = "response",
  key = answer_key_free2$Answer_Key,
  id = "Sub.ID",
  cutoff = 1,
  flag = TRUE,
  group.by = "Version")
serial_output <- serial_position(data = scored_output$DF_Scored,
  key = answer_key_free2$Answer_Key,
  position = "position",
  scored = "Scored",
  answer = "Answer",
  group.by = "Version")
head(serial_output)
Serial Position Calculator for Multiple Lists
Description
This function calculates the proportion correct of each item in the serial position curve. Data should include the participant's answers in long format (use arrange_data() in this package for help), the answer key of the items in order, and a column that denotes the order a participant listed each item. The function will then calculate the items remembered within a window of 1 before or 1 after the tested position. The first and last positions must be answered in the correct place. Specifically, this function is an extension of serial_position() for free recall when there are multiple lists or randomized lists.
Usage
serial_position_multiple(
  data,
  position,
  answer,
  key,
  key.trial,
  id.trial,
  scored,
  group.by = NULL
)
Arguments
data | a dataframe of the scored free recall that you would like to calculate - use prop_correct_multiple() for best formatting. |
position | a column name in the dataframe that contains the answered position of each response in quotes (i.e., "column") |
answer | a column name of the answer given for that position in the original dataframe. |
key | a vector containing the scoring key or data column name. This column does not have to be included in the original dataframe. We assume your answer key is in the tested position order. You should not include duplicates in your answer key. |
key.trial | a vector containing the trial numbers for each answer. Note: If you input long data (i.e., repeating trial-answer responses), we will take the unique combination of the responses. If a trial number is repeated, you will receive an error. Key and key.trial can also be a separate dataframe, depending on how your output data is formatted. |
id.trial | a column name containing the trial numbers for the participant data from the original dataframe. Note that the free response "key" trial and this trial number should match. The trial key will be repeated for each answer a participant gave. |
scored | a column in the original dataframe indicating if the participant got the answer correct (1) or incorrect (0). |
group.by | an optional argument that can be used to group the output by condition columns. These columns should be in the original dataframe and concatenated with c() if there are multiple columns |
Details
This output can then be used to create serial position curve visualizations, and an example can be found in our manuscript/vignettes.
Value
DF_Serial | A dataframe of the proportion correct for each tested position by any optional grouping variables included. |
Examples
data("multi_data")
data("multi_answers")
DF_long <- arrange_data(data = multi_data,
  responses = "Response",
  sep = " ",
  id = "Sub.ID",
  repeated = "List.Number")
library(reshape)
multi_answers$position <- 1:nrow(multi_answers)
answer_long <- melt(multi_answers,
  measured = colnames(multi_answers),
  id = "position")
colnames(answer_long) <- c("position", "List.ID", "Answer")
answer_long$List.ID <- gsub(pattern = "List", replacement = "", x = answer_long$List.ID)
DF_long$response <- tolower(DF_long$response)
answer_long$Answer <- tolower(answer_long$Answer)
answer_long$Answer <- gsub(" ", "", answer_long$Answer)
scored_output <- prop_correct_multiple(data = DF_long,
  responses = "response",
  key = answer_long$Answer,
  key.trial = answer_long$List.ID,
  id = "Sub.ID",
  id.trial = "List.Number",
  cutoff = 1,
  flag = TRUE)
head(scored_output$DF_Scored)
head(scored_output$DF_Participant)
serial_output <- serial_position_multiple(data = scored_output$DF_Scored,
  position = "position",
  answer = "Answer",
  key = answer_long$Answer,
  key.trial = answer_long$List.ID,
  scored = "Scored",
  id.trial = "List.Number")
head(serial_output)
Free Recall Data in Wide Format
Description
Dataset that includes free recall data. Participants were given a list of words to remember, and then asked to recall the words. This dataset is in wide format, which should be converted with arrange_data().
Usage
data(wide_data)
Format
A data frame of answers for a free recall test
Sub.ID: the participant id
Response: the response the participant gave to the cue
Disease.Condition: healthy or sick participant condition
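To make the wide-versus-long distinction concrete, the sketch below reshapes a toy data frame with the same three columns into one row per response, using base R only. It illustrates the idea rather than reproducing arrange_data(): the toy values, the comma separator, and the output column names response and position are assumptions.

```r
# Toy wide-format data: all of a participant's responses share one cell
toy_wide <- data.frame(
  Sub.ID = c(1, 2),
  Response = c("cat, dog, bird", "dog, fish"),
  Disease.Condition = c("healthy", "sick"),
  stringsAsFactors = FALSE
)

# Split each cell on the separator; one list element per participant
split_responses <- strsplit(toy_wide$Response, split = ",", fixed = TRUE)
n_responses <- lengths(split_responses)

# Repeat the participant-level columns once per response
toy_long <- data.frame(
  Sub.ID = rep(toy_wide$Sub.ID, times = n_responses),
  response = trimws(unlist(split_responses)),
  position = sequence(n_responses),  # order in which items were listed
  Disease.Condition = rep(toy_wide$Disease.Condition, times = n_responses),
  stringsAsFactors = FALSE
)
toy_long
```

Between subjects columns such as Disease.Condition carry over cleanly because they have a one to one match with the participant ID.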