Title: Explore and Process Synesthesia Consistency Test Data
Version: 1.0.0
Description: Explore synesthesia consistency test data, calculate consistency scores, and classify participant data as valid or invalid.
Depends: R (≥ 3.6.0)
Imports: methods (≥ 3.6), data.table (≥ 1.12), ggplot2 (≥ 3.3.0), dbscan (≥ 1.1)
Suggests: testthat (≥ 2.1.0), dplyr (≥ 1.0.0), knitr, rmarkdown, tidyr, plotly
License: MIT + file LICENSE
URL: https://datalowe.github.io/synr/, https://github.com/datalowe/synr
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.2.1
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2024-01-13 12:12:52 UTC; lowe
Author: Lowe Wilsson [aut, cre]
Maintainer: Lowe Wilsson <datalowe@posteo.de>
Repository: CRAN
Date/Publication: 2024-01-13 21:50:02 UTC

synr: Explore and Process Synesthesia Consistency Test Data

Description

synr helps you work with data resulting from grapheme-color consistency tests for synesthesia.

To learn more about synr, start with the vignettes: browseVignettes(package = "synr")

Author(s)

Maintainer: Lowe Wilsson <datalowe@posteo.de>

See Also

Useful links: https://datalowe.github.io/synr/, https://github.com/datalowe/synr


A Reference Class for representing consistency test graphemes

Description

A Reference Class for representing consistency test graphemes

Fields

symbol

A one-element character vector containing the symbol/set of symbols that describe(s) the grapheme, e.g. '7' or 'Monday'. Set at class new() call or using set_symbol method.

response_colors

A matrix where each row specifies the color coordinates of one participant response. Set using set_colors method.

response_times

A numeric vector of response times. Set using set_times method.

color_space

A one-element character vector which describes the color space that response colors are coded in. Set when using set_colors method.

Methods

get_abbreviated_symbol()

Return a short (3 character) representation of the grapheme's symbol.

get_consistency_score(na.rm = FALSE, method = "euclidean")

Calculate the consistency score based on the Grapheme instance's response colors. Throws an error if no responses have been registered yet. Always returns NA if all grapheme responses are NA. If na.rm=FALSE, returns NA if any grapheme response is NA. If na.rm=TRUE, returns the consistency score for non-NA responses. This function relies on the base/stats function dist() and so supports only distance calculation methods implemented by dist() (use help(dist) to learn more about it).
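As a rough illustration of how a dist()-based score behaves, the base R sketch below sums all pairwise distances between three made-up responses. Treating the score as this sum is an assumption (the description above only states that dist() is used), the coordinates are hypothetical, and no synr code is called.

```r
# Hypothetical color coordinates: one row per response to the same grapheme.
responses <- matrix(
  c(0, 0,  0,
    3, 4,  0,
    0, 0, 12),
  ncol = 3, byrow = TRUE
)

# Assumed scoring rule: sum of all pairwise distances between responses.
score_euclidean <- sum(dist(responses, method = "euclidean"))
score_manhattan <- sum(dist(responses, method = "manhattan"))

# Mimic na.rm = FALSE: any missing response yields NA instead of a score.
responses_na <- rbind(responses, c(NA, NA, NA))
score_with_na <- if (anyNA(responses_na)) NA_real_ else sum(dist(responses_na))

print(c(euclidean = score_euclidean, manhattan = score_manhattan))
```

Lower values mean the responses lie closer together in the chosen color space, and the choice of method changes the scale of the score, so scores computed with different methods are not directly comparable.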

get_mean_color(na.rm = FALSE)

Average all registered response colors and return the result (using the color space set at grapheme initialization) as a 3-element vector. Example: if color space is RGB, element 1 represents mean R value, element 2 mean G value, element 3 mean B value.

If na.rm=FALSE and any of the response colors is missing, return a 3-element NA vector. If na.rm=TRUE, return a 3-element NA vector if all response colors are missing, otherwise return mean of all available colors.
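The na.rm rules above can be sketched in base R as follows. The helper mean_color_sketch and the color values are hypothetical and only mirror the documented behavior; this is not synr's own code.

```r
# Hypothetical response colors (rows) in a 3-axis color space; response 2 is missing.
response_colors <- rbind(
  c(50, 10, 20),
  c(NA, NA, NA),
  c(70, 30, 40)
)

# Illustrative helper (not part of synr) following the rules described above.
mean_color_sketch <- function(color_mat, na.rm = FALSE) {
  missing_rows <- apply(color_mat, 1, anyNA)
  # NA vector if a response is missing and na.rm is FALSE, or if all are missing.
  if ((!na.rm && any(missing_rows)) || all(missing_rows)) {
    return(rep(NA_real_, 3))
  }
  colMeans(color_mat[!missing_rows, , drop = FALSE])
}

strict_mean <- mean_color_sketch(response_colors)                 # NA NA NA
lenient_mean <- mean_color_sketch(response_colors, na.rm = TRUE)  # 60 20 30
print(strict_mean)
print(lenient_mean)
```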

get_mean_response_time(na.rm = FALSE)

Get the mean of the grapheme's associated response times.

get_num_non_na_colors()

Get the number of response colors that are non-NA, returned as a one-element numeric vector.

get_plot_data_list()

Get a list of the grapheme's data, bundled up in a format ready for use in the Participant get_plot_data() method as a row of plot data.

has_only_non_na_colors()

Returns TRUE if the grapheme only has responses with valid colors, FALSE if there are responses with nonvalid colors or there are no responses at all.

set_colors(hex_codes, color_space_spec)

Set response colors, using passed RGB hex codes. Converts the hex codes to color coordinates in the specified color space. Supports the following color spaces: "XYZ", "sRGB", "Apple RGB", "Lab", and "Luv". For all NA values passed, a row of NA values will be included in the matrix (preserving order of responses). Returned/set response colors are in the format of a matrix where each row represents one response/color, and each column represents one color coordinate axis (there are always 3 axes used for the currently supported color spaces).
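To make the described matrix layout concrete, here is a minimal base R sketch that turns hex codes into an n-by-3 coordinate matrix with grDevices::convertColor(). The helper hex_to_coords is hypothetical, and whether synr performs the conversion this way is an assumption; the sketch only reproduces the documented shape and NA handling.

```r
# Illustrative helper (not synr's implementation): hex codes -> n-by-3 matrix.
hex_to_coords <- function(hex_codes, color_space = "Luv") {
  coords <- matrix(NA_real_, nrow = length(hex_codes), ncol = 3)
  ok <- !is.na(hex_codes)
  if (any(ok)) {
    # col2rgb() returns 0-255 values with one column per color;
    # convertColor() expects one row per color with values in 0-1.
    rgb01 <- t(grDevices::col2rgb(hex_codes[ok])) / 255
    coords[ok, ] <- grDevices::convertColor(rgb01, from = "sRGB", to = color_space)
  }
  coords
}

coords <- hex_to_coords(c("#FF0000", NA, "#101010"), "Luv")
print(dim(coords))   # one row per response, three color axes
print(coords[2, ])   # the missing response keeps its position as an NA row
```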

set_symbol(symbol_chars)

Set the grapheme's symbol attribute, using a passed one-element character vector.

set_times(times)

Add response times, using passed numeric vector.

Examples

a <- synr::Grapheme$new(symbol='a')
a$set_colors(c("#101010", NA), "Luv")
a$set_times(c(5, 10))
a$get_num_non_na_colors()

A Reference Class for representing consistency test participants

Description

A Reference Class for representing consistency test participants

Fields

id

A one-element character vector containing the participant's ID. Set at class new() call.

test_date

A one-element Date vector which specifies the date on which the participant did the consistency test.

graphemes

A list of Grapheme class instances.

Methods

add_grapheme(grapheme)

Add a passed grapheme to the participant's list of graphemes. The grapheme's entry in the list is named based on the grapheme's symbol. Note that if you try to add a grapheme with a symbol that's identical to one of the graphemes already in the participant's list of graphemes, the already existing same-symbol grapheme is overwritten.

add_graphemes(grapheme_list)

Go through a passed list of Grapheme instances and add each one using the add_grapheme() method.

check_valid_get_twcv(min_complete_graphemes = 5, dbscan_eps = 20, dbscan_min_pts = 4, max_var_tight_cluster = 150, max_prop_single_tight_cluster = 0.6, safe_num_clusters = 3, safe_twcv = 250, complete_graphemes_only = TRUE, symbol_filter = NULL)

Checks if this participant's data are valid based on passed arguments. This method aims to identify participants who had too few responses or varied their response colors too little, by marking them as invalid. Note that there are no absolutely correct values, as what is 'too little variation' is highly subjective. You might need to tweak parameters to be in line with your project's criteria, especially if you use another color space than CIELUV, since the default values are based on what seems to make sense in a CIELUV context. If you use the results in a research article, make sure to reference synr and specify what parameter values you passed to the function.

This method relies heavily on the DBSCAN algorithm and the package 'dbscan', and involves calculating a synr-specific 'Total Within-Cluster Variance' (TWCV) score. You can find more information, and what the parameters here mean, in the documentation for the function validate_get_twcv.

Parameters

  • min_complete_graphemes The minimum number of graphemes with complete (all non-NA color) responses that the participant data must have for them to not be categorized as invalid based on this criterion. Defaults to 5.

  • dbscan_eps Radius of 'epsilon neighborhood' when applying DBSCAN clustering. Defaults to 20.

  • dbscan_min_pts Minimum number of points required in the epsilon neighborhood for core points (including the core point itself). Defaults to 4.

  • max_var_tight_cluster Maximum variance for an identified DBSCAN cluster to be considered 'tight-knit'. Defaults to 150.

  • max_prop_single_tight_cluster Maximum proportion of points allowed to be within a single 'tight-knit' cluster (exceeding this leads to classification as invalid). Defaults to 0.6.

  • safe_num_clusters Minimum number of identified DBSCAN clusters (including 'noise' cluster only if it consists of at least 'dbscan_min_pts' points) that guarantees validity if points are 'non-tight-knit'. Defaults to 3.

  • safe_twcv Minimum total within-cluster variance (TWCV) score that guarantees validity if points are 'non-tight-knit'. Defaults to 250.

  • complete_graphemes_only A logical vector. If TRUE, only data from graphemes that have all non-NA color responses are used; if FALSE, even data from graphemes with some NA color responses are used. Defaults to TRUE.

  • symbol_filter A character vector (or NULL) that specifies which graphemes' data to use. Defaults to NULL, meaning data from all of the participant's graphemes will be used.

Returns

A list with components

  • valid TRUE if categorized as valid, otherwise FALSE.

  • reason_invalid One-element character vector describing why participant's data were deemed invalid, or empty string if valid is TRUE.

  • twcv One-element numeric (or NA if there are no/too few graphemes with complete responses) vector indicating participant's calculated TWCV.

  • num_clusters One-element numeric (or NA if there are no/too few graphemes with complete responses) vector indicating the number of identified clusters counting toward the tally compared with 'safe_num_clusters'.
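A minimal usage sketch, assuming synr is installed. The participant data are copied from the create_participant example elsewhere in this manual, and relaxing min_complete_graphemes to 3 is an arbitrary choice made only so that the clustering step has something to work on.

```r
library(synr)

# Participant with only three graphemes, three responses each.
p <- create_participant(
  participant_id = "1",
  grapheme_symbols = c("A", "D", "7"),
  n_trials_per_grapheme = 3,
  trial_symbols = c("A", "D", "7", "D", "A", "7", "7", "A", "D"),
  response_times = c(1.1, 0.4, 5, 0.3, 2.4, 7.3, 1, 10.2, 8.4),
  response_colors = c("98FF22", "138831", "791322",
                      "8952FE", "DC8481", "7D89B0",
                      "001100", "887755", "FF0033"),
  color_space_spec = "Luv"
)

# The defaults require 5 complete graphemes, so this participant
# should be flagged as invalid and get NA for twcv.
res_default <- p$check_valid_get_twcv()
print(res_default$valid)
print(res_default$reason_invalid)

# Relaxing the grapheme requirement lets the DBSCAN-based checks run
# on the nine available responses.
res_relaxed <- p$check_valid_get_twcv(min_complete_graphemes = 3)
str(res_relaxed)
```

Whatever parameter values end up being used, they should be reported alongside the results, as recommended above.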

get_all_colored_symbols(symbol_filter = NULL)

Returns a character vector of symbols corresponding to graphemes for which all responses have an associated non-NA color. If a character vector is passed to symbol_filter, only symbols in the passed vector are returned.

get_consistency_scores(method = "euclidean", symbol_filter = NULL, na.rm = FALSE)

Returns a list of grapheme symbols with associated consistency scores. If na.rm = TRUE, for each grapheme a consistency score calculation is forced (except if ALL response colors associated with the grapheme are NA). That probably isn't what you want, because it leads to things like a perfect consistency score if all except one response color are NA. Defaults to na.rm = FALSE.

If a character vector is passed to symbol_filter, only consistency scores for graphemes with symbols in the passed vector are returned.

Use the method argument to specify what kind of color space distances should be used when calculating consistency score (usually 'manhattan' or 'euclidean' - see documentation for the base R dist function for all options).

get_grapheme_mean_colors(symbol_filter = NULL, na.rm = FALSE)

Returns a list of grapheme symbols with associated mean colors, using the color space set at participant creation. Colors are represented by 3-element vectors.

Example: if color space is RGB, vector element 1 represents grapheme mean R value, element 2 mean G value, element 3 mean B value.

If na.rm = TRUE, for each grapheme a mean color is calculated even if one of its associated response colors is missing. Defaults to na.rm = FALSE.

If a character vector is passed to symbol_filter, only mean colors for graphemes with symbols in the passed vector are returned.

get_mean_consistency_score(symbol_filter = NULL, method = "euclidean", na.rm = FALSE)

Returns the mean consistency score with respect to Grapheme instances associated with the participant.

If na.rm = FALSE, calculates the mean consistency score if all of the participant's graphemes only have response colors that are non-NA, otherwise returns NA. If na.rm = TRUE, returns the mean consistency score for all of the participant's graphemes that only have non-NA response colors, while ignoring graphemes that have at least one NA response color value. Note that NA is returned in either case, if ALL of the participant's graphemes have at least one NA response color value.

If a character vector is passed to symbol_filter, only data from graphemes with symbols in the passed vector are used when calculating the mean score.

Use the method argument to specify what kind of color space distances should be used when calculating consistency score (usually 'manhattan' or 'euclidean' - see documentation for the base R dist function for all options).
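The following sketch, assuming synr is installed, shows how the per-grapheme and mean consistency score methods might be called together. The participant and its response colors are made up for illustration.

```r
library(synr)

# Hypothetical participant with two graphemes and three responses each.
p <- create_participant(
  participant_id = "1",
  grapheme_symbols = c("A", "7"),
  n_trials_per_grapheme = 3,
  trial_symbols = c("A", "7", "A", "7", "7", "A"),
  response_colors = c("98FF22", "791322", "DC8481",
                      "7D89B0", "001100", "887755"),
  color_space_spec = "Luv"
)

# One score per grapheme, using Euclidean distances in the Luv space.
scores <- p$get_consistency_scores(method = "euclidean")
str(scores)

# Restrict the calculation to a subset of symbols.
score_a_only <- p$get_consistency_scores(symbol_filter = c("A"))

# Mean score across graphemes, with two different distance methods.
mean_euclidean <- p$get_mean_consistency_score(method = "euclidean")
mean_manhattan <- p$get_mean_consistency_score(method = "manhattan")
print(c(euclidean = mean_euclidean, manhattan = mean_manhattan))
```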

get_mean_response_time(symbol_filter = NULL, na.rm = FALSE)

Returns the mean response time, with respect to all Grapheme instances associated with the participant. Weights response times based on number of valid responses that each grapheme has. If na.rm = TRUE, returns mean response time even if there are missing response times. If na.rm = FALSE, returns mean response time if there is at least one response time value for at least one of the participant's graphemes. If a character vector is passed to symbol_filter, only data from graphemes with symbols in the passed vector are used when calculating the mean response time.

get_nonna_color_resp_mat(symbol_filter = NULL)

Returns an n-by-3 matrix of all non-NA color responses' data, where each column represents a color axis and each row a response color. If a character vector is passed to symbol_filter, only data from responses associated with graphemes with corresponding symbols are included.

get_number_all_colored_graphemes(symbol_filter = NULL)

Returns the number of graphemes for which all responses have an associated non-NA color. If a character vector is passed to symbol_filter, only graphemes with symbols in the passed vector are counted.

get_participant_mean_color(symbol_filter = NULL, na.rm = FALSE)

Returns the average of all of the participant's registered response colors (based on the color space set at participant initialization) as a 3-element vector. Example: if color space is RGB, element 1 represents mean R value, element 2 mean G value, element 3 mean B value.

If a character vector is passed to symbol_filter, only data from graphemes with symbols in the passed vector are used when calculating the mean color.

If na.rm = FALSE, calculates the mean response color if all of the participant's graphemes only have response colors that are non-NA, otherwise returns NA. If na.rm = TRUE, returns the mean response color based on all non-NA response colors.

get_plot(cutoff_line = FALSE, mean_line = FALSE, grapheme_size = 2, grapheme_angle = 0, grapheme_spacing = 0.25, foreground_color = "black", background_color = "white", symbol_filter = NULL)

Returns a ggplot2 plot that describes this participant's grapheme color responses and per-grapheme consistency scores.

If cutoff_line = TRUE, the plot will include a blue line that indicates the value 135.30, which is the synesthesia cut-off score recommended by Rothen, Seth, Witzel & Ward (2013) for the L*u*v color space. If mean_line = TRUE, the plot will include a green line that indicates the participant's mean consistency score for graphemes with all-valid response colors (if the participant has any such graphemes). If a vector is passed to symbol_filter, this green line represents the mean score for ONLY the symbols included in the filter.

Pass a value to grapheme_size to adjust the size of graphemes shown at the bottom of the plot, e.g. increasing the size if there's a lot of empty space otherwise, or decreasing the size if the graphemes don't fit. The grapheme_angle argument allows rotating graphemes. grapheme_spacing is for adjusting how far grapheme symbols are spaced from each other.

If a character vector is passed to symbol_filter, only data for graphemes with symbols in the passed vector are used.

Graphemes are sorted left-to-right by 1. length and 2. unicode value (this means among other things that digits come before letters).

get_plot_data(symbol_filter = NULL)

Returns a data frame with the following columns:

1. grapheme (grapheme names - of type character)

2. consistency_score (of type numeric)

3... color_resp<x>, where x is a digit: hold response hex color codes (number of columns depends on number of response colors associated with each grapheme).

The data frame is intended to be used for plotting participant data, using the get_plot() method. The call will end with an error if not all of the participant's graphemes have the same number of color responses. This is intended.

If a character vector is passed to symbol_filter, only data for graphemes with symbols in the passed vector are used.
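A short sketch, assuming synr and ggplot2 are installed, of how the plot data and the plot itself might be obtained. The participant is hypothetical, and both graphemes deliberately have the same number of responses so that get_plot_data() does not raise the error described above.

```r
library(synr)

# Hypothetical participant: two graphemes, two responses each.
p <- create_participant(
  participant_id = "1",
  grapheme_symbols = c("A", "7"),
  n_trials_per_grapheme = 2,
  trial_symbols = c("A", "7", "7", "A"),
  response_colors = c("98FF22", "791322", "7D89B0", "DC8481"),
  color_space_spec = "Luv"
)

# One row per grapheme: symbol, consistency score and response hex codes.
plot_df <- p$get_plot_data()
print(plot_df)

# Build the plot with the cut-off and mean lines, using larger symbols.
plt <- p$get_plot(cutoff_line = TRUE, mean_line = TRUE, grapheme_size = 3)
print(plt)
```

Because get_plot() returns a regular ggplot2 object, it can be modified further with the usual ggplot2 functions before being printed or saved.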

get_symbols()

Returns a character vector with all symbols for graphemes associated with the participant.

has_graphemes()

Returns TRUE if there is at least one grapheme in the participant's graphemes list, otherwise returns FALSE.

save_plot(save_dir = NULL, file_format = "png", dpi = 300, cutoff_line = FALSE, mean_line = FALSE, grapheme_size = 2, grapheme_angle = 0, foreground_color = "black", background_color = "white", symbol_filter = NULL, ...)

Saves a ggplot2 plot that describes this participant's grapheme color responses and per-grapheme consistency scores, using the ggsave function.

If a character vector is passed to symbol_filter, only data for graphemes with symbols in the passed vector are used.

If save_dir is not specified, the plot is saved to the current working directory. Otherwise, the plot is saved to the specified directory. The file is saved using the specified file_format, e.g. JPG (see ggplot2::ggsave documentation for list of supported formats), and the resolution specified with the dpi argument.

If cutoff_line = TRUE, the plot will include a blue line that indicates the value 135.30, which is the synesthesia cut-off score recommended by Rothen, Seth, Witzel & Ward (2013) for the L*u*v color space. If mean_line = TRUE, the plot will include a green line that indicates the participant's mean consistency score for graphemes with all-valid response colors (if the participant has any such graphemes). If a vector is passed to symbol_filter, this green line represents the mean score for ONLY the symbols included in the filter.

Pass a value to grapheme_size to adjust the size of graphemes shown at the bottom of the plot, e.g. increasing the size if there's empty space otherwise, or decreasing the size if the graphemes don't fit. Similarly, you can use the grapheme_angle argument to rotate the graphemes, which might help them fit better.

Apart from these, all other arguments that ggsave accepts (e.g. 'scale') also work with this function, since all arguments are passed on to ggsave.

set_date(in_date)

Takes in a one-element character vector with a date in the format 'YYYY-MM-DD' and sets the participant's test_date to the specified date.


A Reference Class for representing a group of consistency test participants

Description

A Reference Class for representing a group of consistency test participants

Fields

participants

A list of Participant class instances.

Methods

add_participant(participant)

Add a passed participant to the participantgroup's list of participants. The participant's entry in the list is named based on the participant's id. Note that if you try to add a participant with an id that's identical to one of the participants already in the participantgroup's list of participants, the already existing same-id participant is overwritten.

add_participants(participant_list)

Go through a passed list of Participant instances and add each one using the add_participant() method.

check_valid_get_twcv_scores(min_complete_graphemes = 5, dbscan_eps = 20, dbscan_min_pts = 4, max_var_tight_cluster = 150, max_prop_single_tight_cluster = 0.6, safe_num_clusters = 3, safe_twcv = 250, complete_graphemes_only = TRUE, symbol_filter = NULL)

Checks if participants' data are valid based on passed arguments. This method aims to identify participants who had too few responses or varied their response colors too little, by marking them as invalid. Note that there are no absolutely correct values, as what is 'too little variation' is highly subjective. You might need to tweak parameters to be in line with your project's criteria, especially if you use another color space than CIELUV, since the default values are based on what seems to make sense in a CIELUV context. If you use the results in a research article, make sure to reference synr and specify what parameter values you passed to the function.

This method relies heavily on the DBSCAN algorithm and the package 'dbscan', and involves calculating a synr-specific 'Total Within-Cluster Variance' (TWCV) score. You can find more information, and what the parameters here mean, in the documentation for the function validate_get_twcv. Note that DBSCAN clustering and related calculations are performed on a per-participant basis, before they are summarized in the data frame returned by this method.

Parameters

  • min_complete_graphemes The minimum number of graphemes with complete (all non-NA color) responses that a participant's data must have for them to not be categorized as invalid based on this criterion. Defaults to 5.

  • dbscan_eps Radius of 'epsilon neighborhood' when applying (on a per-participant basis) DBSCAN clustering. Defaults to 20.

  • dbscan_min_pts Minimum number of points required in the epsilon neighborhood for core points (including the core point itself). Defaults to 4.

  • max_var_tight_cluster Maximum variance for an identified DBSCAN cluster to be considered 'tight-knit'. Defaults to 150.

  • max_prop_single_tight_cluster Maximum proportion of points allowed to be within a single 'tight-knit' cluster (if a participant's data exceed this limit, they are classified as invalid). Defaults to 0.6.

  • safe_num_clusters Minimum number of identified DBSCAN clusters (including 'noise' cluster only if it consists of at least 'dbscan_min_pts' points) that guarantees validity of a participant's data if points are 'non-tight-knit'. Defaults to 3.

  • safe_twcv Minimum total within-cluster variance (TWCV) score that guarantees a participant's data's validity if points are 'non-tight-knit'. Defaults to 250.

  • complete_graphemes_only A logical vector. If TRUE, only data from graphemes that have all non-NA color responses are used; if FALSE, even data from graphemes with some NA color responses are used. Defaults to TRUE.

  • symbol_filter A character vector (or NULL) that specifies which graphemes' data to use. Defaults to NULL, meaning data from all of the participants' graphemes will be used.

Returns

A data frame with columns

  • valid Holds TRUE for participants whose data were classified as valid, FALSE for participants whose data were classified as invalid.

  • reason_invalid Strings which describe for each participant why their data were deemed invalid. Participants whose data were classified as valid have empty strings here.

  • twcv Numeric column which holds participants' calculated TWCV scores (NA for participants who had no/too few graphemes with complete responses).

  • num_clusters Numeric column which holds, for each participant, the number of identified clusters counting toward the tally compared with 'safe_num_clusters' (NA for participants who had no/too few graphemes with complete responses).
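A minimal sketch of the group-level check, assuming synr is installed. It reuses the bundled example data frame synr_exampledf_long_small and the column names from the create_participantgroup example in this manual, and it keeps all validation parameters at their defaults.

```r
library(synr)

# Build a participant group from the bundled long-format example data.
pg <- create_participantgroup(
  raw_df = synr_exampledf_long_small,
  n_trials_per_grapheme = 2,
  id_col_name = "participant_id",
  symbol_col_name = "trial_symbol",
  color_col_name = "response_color",
  time_col_name = "response_time",
  color_space_spec = "Luv"
)

# One row per participant; participants with fewer complete graphemes than
# min_complete_graphemes (default 5) are expected to have valid = FALSE.
validation_df <- pg$check_valid_get_twcv_scores()
print(validation_df)

# Count how many participants fall into each category.
print(table(validation_df$valid))
```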

get_ids()

Returns a character vector with all ids for participants associated with the participantgroup.

get_mean_colors(symbol_filter = NULL, na.rm = FALSE)

Returns an n-by-3 data frame of mean colors for participants in the group, where the columns represent chosen color space axis 1, 2, and 3, respectively (e.g. 'R', 'G', 'B' if 'sRGB' was specified upon participantgroup creation).

If na.rm=FALSE, for each participant calculates the mean color if all of the participant's graphemes only have response colors that are non-NA, otherwise puts NA values in that participant's row of the data frame. If na.rm=TRUE, for each participant calculates the mean color for all of the participant's valid response colors, while ignoring NA response colors. Note that for participants whose graphemes ALL have at least one NA response color value, an NA is put in the row corresponding to that participant, regardless of what na.rm is set to.

If a character vector is passed to symbol_filter, only data from graphemes with symbols in the passed vector are used when calculating each participant's mean color.

get_mean_consistency_scores(method = "euclidean", symbol_filter = NULL, na.rm = FALSE)

Returns a vector of mean consistency scores for participants in the group. If na.rm=FALSE, for each participant calculates the mean consistency score if all of the participant's graphemes only have response colors that are non-NA, otherwise puts an NA value for that participant in returned vector. If na.rm=TRUE, for each participant calculates the mean consistency score for all of the participant's graphemes that only have non-NA response colors, while ignoring graphemes that have at least one NA response color value. Note that for participants whose graphemes ALL have at least one NA response color value, an NA is put in the returned vector for that participant, regardless of what na.rm is set to.

If a character vector is passed to symbol_filter, only data from graphemes with symbols in the passed vector are used when calculating each participant's mean score.

Use the method argument to specify what kind of color space distances should be used when calculating consistency scores (usually 'manhattan' or 'euclidean' - see documentation for the base R dist function for all options).

get_mean_response_times(symbol_filter = NULL, na.rm = FALSE)

Returns the mean response times, with respect to Grapheme instances associated with each participant. If na.rm=TRUE, for each participant returns mean response time even if there are missing response times. If na.rm=FALSE, returns mean response time if there is at least one response time value for at least one of the participant's graphemes. If a character vector is passed to symbol_filter, only data from graphemes with symbols in the passed vector are used when calculating each participant's mean response time.

get_numbers_all_colored_graphemes(symbol_filter = NULL)

Returns a vector with numbers representing how many graphemes with all-valid (non-NA) response colors each participant has. If a character vector is passed to symbol_filter, only data connected to graphemes with symbols in the passed vector are used.

has_participants()

Returns TRUE if there is at least one participant in the participantgroup's participants list, otherwise returns FALSE.

save_plots(save_dir = NULL, file_format = "png", dpi = 300, cutoff_line = FALSE, mean_line = FALSE, grapheme_size = 2, grapheme_angle = 0, foreground_color = "black", background_color = "white", symbol_filter = NULL, ...)

Goes through all participants and for each one produces and saves a ggplot2 plot that describes the participant's grapheme color responses and per-grapheme consistency scores, using the ggsave function.

If a character vector is passed to symbol_filter, only data for graphemes with symbols in the passed vector are used.

If save_dir is not specified, plots are saved to the current working directory. Otherwise, plots are saved to the specified directory. Each file is saved using the specified file_format, e.g. JPG (see ggplot2::ggsave documentation for list of supported formats), and the resolution specified with the dpi argument.

If cutoff_line=TRUE, each plot will include a blue line that indicates the value 135.30, which is the synesthesia cut-off score recommended by Rothen, Seth, Witzel & Ward (2013) for the L*u*v color space. If mean_line=TRUE, the plot will include a green line that indicates the participant's mean consistency score for graphemes with all-valid response colors (if the participant has any such graphemes). If a vector is passed to symbol_filter, this green line represents the mean score for ONLY the symbols included in the filter.

Pass a value to grapheme_size to adjust the size of graphemes shown at the bottom of the plot, e.g. increasing the size if there's empty space otherwise, or decreasing the size if the graphemes don't fit. Similarly, you can use the grapheme_angle argument to rotate the graphemes, which might help them fit better.

Apart from the ones above, all other arguments that ggsave accepts (e.g. 'scale') also work with this function, since all arguments are passed on to ggsave.


Calculate sum of squared 3D point distances from centroid

Description

Calculates the sum of squared distances in 3D space between points and their centroid.

\sum_{i=1}^{n} \left[ (x_i - x_m)^2 + (y_i - y_m)^2 + (z_i - z_m)^2 \right]

Where x, y and z each represent one axis, a_m represents the mean of all points' coordinates on an axis a, and n represents the total number of points.

Usage

centroid_3d_sq_dist(point_matrix)

Arguments

point_matrix

An n-by-3 numerical matrix where each row corresponds to a single point in 3D space.
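The formula can be checked by hand with a few lines of base R. The sketch below uses four made-up points and does not call synr; if centroid_3d_sq_dist implements the formula as documented, it should return the same value for this matrix.

```r
# Four hypothetical points in 3D space, one per row.
point_matrix <- matrix(
  c(0, 0, 0,
    2, 0, 0,
    0, 2, 0,
    2, 2, 0),
  ncol = 3, byrow = TRUE
)

# Centroid: the mean coordinate on each axis, here (1, 1, 0).
centroid <- colMeans(point_matrix)

# Subtract the centroid from every row, then sum all squared differences.
centered <- sweep(point_matrix, 2, centroid)
sum_sq_dist <- sum(centered^2)
print(sum_sq_dist)  # each point is sqrt(2) from the centroid, so 4 * 2 = 8
```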


Create a grapheme instance

Description

Takes in a symbol/grapheme and sets of response times/colors, then creates a Grapheme instance that holds the passed information and returns it.

Usage

create_grapheme(
  symbol,
  response_times = NULL,
  response_colors,
  color_space_spec = "Luv"
)

Arguments

symbol

A one-element character vector holding a symbol/grapheme.

response_times

(optional) A numeric vector. Times from presentation to response, in order.

response_colors

A character vector. Response colors, as hex color codes.

color_space_spec

A one-element character vector. What color space is to be used? The following color spaces are supported: "XYZ", "sRGB", "Apple RGB", "Lab", and "Luv".

Examples

create_grapheme(symbol="a", response_times=c(2.3, 6.7, 0.4),
                response_colors=c("84AE99", "9E3300", "000000"),
                color_space_spec="Luv")

Create a Participant instance.

Description

Takes in a participant id, a set of symbols for which graphemes should be created, and participant trial/response data. Returns a Participant instance with all the input data linked to it. For each grapheme, if there are data for fewer trials than the number specified by n_trials_per_grapheme, NA values are added to affected graphemes' associated vectors of response times/colors.

Usage

create_participant(
  participant_id,
  grapheme_symbols,
  n_trials_per_grapheme,
  trial_symbols,
  response_times = NULL,
  response_colors,
  color_space_spec = "Luv",
  test_date = NULL
)

Arguments

participant_id

A one-element character vector holding a participant id.

grapheme_symbols

A character vector of symbols/graphemes for which Grapheme instances should be created and linked to the Participant instance.

n_trials_per_grapheme

A one-element numeric vector holding the number of trials per grapheme.

trial_symbols

A character vector that holds one symbol/grapheme for each trial of the participant's consistency test run.

response_times

(optional) A numeric vector. Consistency test times from presentation to response, in order.

response_colors

A character vector. Consistency test response colors, as hex color codes.

color_space_spec

A one-element character vector. What color space is to be used? The following color spaces are supported: "XYZ", "sRGB", "Apple RGB", "Lab", and "Luv".

test_date

(optional) A one-element character vector in the format "YYYY-MM-DD" that indicates on what date the participant finished the consistency test.

Examples

participant_id <- "1"
target_symbols_vec <- c("A", "D", "7")
symbol_vec <- c("A", "D", "7",
                "D", "A", "7",
                "7", "A", "D")
times_vec <- c(1.1, 0.4, 5,
               0.3, 2.4, 7.3,
               1, 10.2, 8.4)
color_vec <- c("98FF22", "138831", "791322",
               "8952FE", "DC8481", "7D89B0",
               "001100", "887755", "FF0033")
p <- create_participant(participant_id=participant_id,
                        grapheme_symbols=target_symbols_vec,
                        n_trials_per_grapheme=3,
                        trial_symbols=symbol_vec,
                        response_times=times_vec,
                        response_colors=color_vec,
                        color_space_spec="Luv")

Create a ParticipantGroup instance using long-format data

Description

Takes in a data frame of raw 'long format' consistency test data and returns a ParticipantGroup instance, to which all the relevant data are linked. See the example data frame 'synr_exampledf_long_small' and its documentation ('help(synr_exampledf_long_small)') for more information on the format that this function expects data to be in.

Usage

create_participantgroup(
  raw_df,
  n_trials_per_grapheme = 3,
  id_col_name,
  symbol_col_name,
  color_col_name,
  time_col_name = NULL,
  color_space_spec = "Luv"
)

Arguments

raw_df

A data frame of 'long format' raw consistency test data.

n_trials_per_grapheme

A one-element numeric vector holding the numberof trials per grapheme that was used in the consistency test the data are from.

id_col_name

A one-element character vector that holds thename of the participant id column in raw_df.

symbol_col_name

A one-element character vector that holds thename of the grapheme/symbol column in raw_df.

color_col_name

A one-element character vector that holds thename of the response color (hex codes) column in raw_df.

time_col_name

(optional) A one-element character vector that holds thename of the response time (time from stimulus presentation to response) columnin raw_df.

color_space_spec

A one-element character vector specifying which color space to use for calculations with participant data. One of "XYZ", "sRGB", "Apple RGB", "Lab", and "Luv".

Examples

pg <- create_participantgroup(
  raw_df=synr_exampledf_long_small,
  n_trials_per_grapheme=2,
  id_col_name="participant_id",
  symbol_col_name="trial_symbol",
  color_col_name="response_color",
  time_col_name="response_time",
  color_space_spec="Luv"
)
cons_means <- pg$get_mean_consistency_scores()
print(cons_means)
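
As a rough illustration of what 'long format' means here, the sketch below builds a tiny data frame by hand using only base R. The values are made up for illustration, and the hex codes are written without a leading '#' only because that is how they appear in the create_participant example; the column names mirror those documented for synr_exampledf_long_small.

```r
# Hypothetical hand-made data: one participant, two graphemes, two trials each.
# Column names mirror those documented for synr_exampledf_long_small.
long_df <- data.frame(
  participant_id = rep("p1", 4),
  trial_symbol   = c("A", "7", "7", "A"),
  response_color = c("98FF22", "791322", "7D89B0", "DC8481"),
  response_time  = c(1.1, 5.0, 7.3, 2.4),
  stringsAsFactors = FALSE
)

# In long format each row is a single trial, so every grapheme should
# occur n_trials_per_grapheme times (here 2) for each participant.
trials_per_symbol <- table(long_df$participant_id, long_df$trial_symbol)
print(trials_per_symbol)
```

A data frame shaped like this could then be passed as raw_df, with the column names given through id_col_name, symbol_col_name, color_col_name and time_col_name.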

Create a ParticipantGroup instance

Description

Takes in a data frame of raw consistency test data and returns a ParticipantGroup instance, to which all the relevant data are linked. See the example data frame synr_exampledf_wide_small and its documentation (help(synr_exampledf_wide_small)) for information on the format that this function expects data to be in.

Participant id and (optional) test date column names are specified with the exact column names used in the data frame passed to the function. Symbol (i. e. grapheme), response color and (optional) response time columns are specified using regular expressions. You can read more about regular expressions in R's own documentation (help(regex)) if you want to, but basically what you want to do is this: say your columns with response colors are named "chosen_color_001", "chosen_color_002" and so on. You then simply set color_col_regex="chosen_color" when calling this function. The important thing is that you specify a part of the column names that is unique for the type of column you want to indicate. So if your symbol/grapheme columns are named "grapheme_1", "grapheme_2" ... and your participant id column is named "graphparticipant_1" ..., then symbol_col_regex="graph" wouldn't work, but symbol_col_regex="grapheme_" or even symbol_col_regex="graphe" would.
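
The effect of choosing a more or less specific regular expression can be checked with base R before calling the function. The sketch below uses grepl() on hypothetical column names taken from the description above; it only illustrates the matching idea and is not necessarily how synr selects columns internally.

```r
# Hypothetical column names, following the naming used in the text above.
col_names <- c("graphparticipant_1", "grapheme_1", "chosen_color_001",
               "grapheme_2", "chosen_color_002")

# "graph" is too broad: it also matches the participant id column.
matches_graph <- grepl("graph", col_names)
print(col_names[matches_graph])

# "grapheme_" matches only the symbol/grapheme columns.
matches_grapheme <- grepl("grapheme_", col_names)
print(col_names[matches_grapheme])

# The default color_col_regex, "colou*r", accepts both spellings.
matches_default_color <- grepl("colou*r",
                               c("response_color_1", "response_colour_1",
                                 "symbol_1"))
print(matches_default_color)
```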

Usage

create_participantgroup_widedata(
  raw_df,
  n_trials_per_grapheme = 3,
  participant_col_name,
  symbol_col_regex,
  color_col_regex = "colou*r",
  time_col_regex = NULL,
  testdate_col_name = NULL,
  color_space_spec = "Luv"
)

Arguments

raw_df

A data frame of raw consistency test data.

n_trials_per_grapheme

A one-element numeric vector holding the numberof trials per grapheme that was used in the consistency test the data are from.

participant_col_name

A one-element character vector that holds the column name used for the column in raw_df that holds participant ids (e. g. "participant_id" for synr::synr_exampledf_wide_small).

symbol_col_regex

A one-element character vector with a regular expression (see above) unique to columns in the passed data frame that hold trial graphemes/symbols.

color_col_regex

A one-element character vector with a regular expression (see above) unique to columns in the passed data frame that hold response color hex codes.

time_col_regex

(optional) A one-element character vector with a regular expression (see above) unique to columns in the passed data frame that hold response times (times from stimulus presentation to response).

testdate_col_name

(optional) A one-element character vector that holds the column name used for the column in raw_df that holds test dates (dates when participants finished the consistency test).

color_space_spec

A one-element character vector specifying which color space to use for analyses of the data. The following color spaces are supported: "XYZ", "sRGB", "Apple RGB", "Lab", and "Luv".

Examples

pg <- create_participantgroup_widedata(raw_df=synr_exampledf_wide_small,
                              n_trials_per_grapheme=2,
                              participant_col_name="participant_id",
                              symbol_col_regex="symbol",
                              color_col_regex="colou*r",
                              time_col_regex="response_time",
                              color_space_spec="Luv")
cons_means <- pg$get_mean_consistency_scores()
print(cons_means)

Filter graphemes of a single participant

Description

Takes in a list of Grapheme objects and a character vector. Returns a list of Grapheme objects, consisting of the participant's graphemes which had a symbol included in the character vector to filter by.

Usage

filter_graphemes(graphemes, symbol_vector = NULL)

Arguments

graphemes

A list of Grapheme objects.

symbol_vector

A character vector of symbols to filter the participant's graphemes by. Alternatively NULL (default), in which case no filtering will be done and the full grapheme list is returned.

Value

A list of Grapheme objects.


Calculate sample variance of 3D point distance from centroid

Description

Calculates sample variance of points' distances in 3D space from their centroid. This function is normally only used indirectly through 'validate_get_twcv'.

Usage

point_3d_variance(point_matrix)

Arguments

point_matrix

An n-by-3 numerical matrix where each row corresponds to a single point in 3D space.

Value

A one-element numeric vector holding the calculated variance.

Details

The variance here is taken to mean the sum of variances for each dimension/axis:

\frac{\sum_{i=1}^n \left[ (x_i-x_m)^2 + (y_i-y_m)^2 + (z_i-z_m)^2 \right]}{n-1}

Where x/y/z represent one axis each, a_m represents the mean of all points' coordinates on axis a, and n represents the total number of points.
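
The formula can be tried out with a few lines of base R. This is a minimal sketch written for illustration (the function name sketch_point_variance is hypothetical), not the package's own implementation of point_3d_variance.

```r
# Illustrative re-implementation of the formula above (hypothetical name).
sketch_point_variance <- function(point_matrix) {
  centroid <- colMeans(point_matrix)            # (x_m, y_m, z_m)
  centered <- sweep(point_matrix, 2, centroid)  # subtract centroid per column
  # Sum of squared deviations over all axes, divided by n - 1
  sum(centered^2) / (nrow(point_matrix) - 1)
}

pts <- rbind(c(0, 0, 0),
             c(2, 0, 0),
             c(1, 3, 0))

variance_from_formula <- sketch_point_variance(pts)
# Equivalent view: sum of the per-axis sample variances
variance_per_axis_sum <- sum(apply(pts, 2, var))

print(variance_from_formula)   # 4
print(variance_per_axis_sum)   # 4
```

For these three points the centroid is (1, 1, 0), the squared distances to it are 2, 2 and 4, and dividing their sum by n - 1 = 2 gives 4.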

See Also

centroid_3d_sq_dist


Raw consistency test data example, long format

Description

A data frame with an example of raw consistency test data that are compatible with the synr package's 'create_participantgroup' function. The color and 'symbol' data are from five actual participants who did a test that included all letters, digits and weekdays, with 3 trials per grapheme. The response times are randomly generated. Note that response times are optional. If you don't have them, you can still use synr - see 'help(create_participantgroup_widedata)'.

Usage

synr_exampledf_large

Format

A data frame with 516 rows and 4 columns:

participant_id

Participant ID

trial_symbol

Column of trial symbols/graphemes

response_color

Column of trial response colors

response_time

Column of trial response times


Raw consistency test data example, long format (small)

Description

A data frame with an example of raw consistency test data that are compatible with the synr package's 'create_participantgroup' function, with completely made up data for three participants from a hypothetical test that included three graphemes ("A", "D", 7) and two responses per grapheme. More graphemes and/or responses per grapheme can be handled by the package (though participant plots do not function correctly if there are more than three responses per grapheme). Note that response times are optional. If you don't have them, you can still use synr - see 'help(create_participantgroup_widedata)'.

Usage

synr_exampledf_long_small

Format

A data frame with 18 rows and 4 columns:

participant_id

Participant ID

trial_symbol

Column of trial symbols/graphemes

response_color

Column of trial response colors

response_time

Column of trial response times


Raw consistency test data example, wide format (small)

Description

A data frame with an example of raw consistency test data that are compatible with the synr package's 'create_participantgroup_widedata' function, with data for three participants from a test that included three graphemes ("A", "D", 7) and two responses per grapheme. More graphemes and/or responses per grapheme can be handled by the package (though participant plots do not function correctly if there are more than three responses per grapheme).

Usage

synr_exampledf_wide_small

Format

A data frame with 3 rows and 8 columns:

participant_id

Participant ID

symbol_1

Column with symbol/grapheme connected to first response

response_color_1

Column with color of first response

response_time_1

(optional) Column with time from presentation to response, for first response

symbol_2

Column with symbol/grapheme connected to second response

response_color_2

Column with color of second response

response_time_2

(optional) Column with time from presentation to response, for second response

symbol_3

Column with symbol/grapheme connected to third response

response_color_3

Column with color of third response

response_time_3

(optional) Column with time from presentation to response, for third response

symbol_4

Column with symbol/grapheme connected to fourth response

response_color_4

Column with color of fourth response

response_time_4

(optional) Column with time from presentation to response, for fourth response

symbol_5

Column with symbol/grapheme connected to fifth response

response_color_5

Column with color of fifth response

response_time_5

(optional) Column with time from presentation to response, for fifth response

symbol_6

Column with symbol/grapheme connected to sixth response

response_color_6

Column with color of sixth response

response_time_6

(optional) Column with time from presentation to response, for sixth response


Calculate Total Within Cluster Variance of 3D points

Description

Calculates Total Within Cluster Variance (TWCV) of 3D points. This function is normally only used indirectly through 'validate_get_twcv'.

Usage

total_within_cluster_variance(point_matrix, cluster_vector)

Arguments

point_matrix

An n-by-3 numerical matrix where each row corresponds to a single point in 3D space.

cluster_vector

A numerical vector of cluster assignments, of length n (i. e. one assignment per point).

Value

A one-element numeric vector holding calculated variance

TWCV

TWCV is a synr-specific term for a measure that aims to describe spread of points in 3D space while taking into account that points belong to distinct clusters. TWCV is calculated in a multi-step process:

  1. Each cluster's centroid is calculated.

  2. All points' squared distances to their corresponding centroids are calculated.

  3. The point-to-centroid squared distances are summed up.

  4. The sum of squared distances is divided by the total number of points, minus the number of clusters (to account for decreased degrees of freedom).
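
The four steps above can be sketched in base R as follows. The function name sketch_twcv is hypothetical and the code is written only to illustrate the described steps; it is not the package's own implementation of total_within_cluster_variance.

```r
# Illustrative re-implementation of the four steps (hypothetical name).
sketch_twcv <- function(point_matrix, cluster_vector) {
  clusters <- unique(cluster_vector)
  sq_dist_sum <- 0
  for (cl in clusters) {
    cl_points <- point_matrix[cluster_vector == cl, , drop = FALSE]
    centroid <- colMeans(cl_points)               # step 1: cluster centroid
    centered <- sweep(cl_points, 2, centroid)     # step 2: offsets from centroid
    sq_dist_sum <- sq_dist_sum + sum(centered^2)  # steps 2-3: square and sum
  }
  # step 4: divide by (number of points - number of clusters)
  sq_dist_sum / (nrow(point_matrix) - length(clusters))
}

pts <- rbind(c(0, 0, 0),
             c(2, 0, 0),
             c(10, 10, 10),
             c(10, 12, 10))
cluster_ids <- c(1, 1, 2, 2)

twcv_value <- sketch_twcv(pts, cluster_ids)
print(twcv_value)   # 2
```

Here each point lies at squared distance 1 from its cluster centroid, so the sum is 4, and dividing by 4 points minus 2 clusters gives 2.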

See Also

centroid_3d_sq_dist


Check if color data are valid and get TWCV

Description

Checks if passed color data are valid, i. e. are bountiful and varied enough according to passed validation criteria. This function is normally only used indirectly through 'Participant$check_valid_get_twcv()' or 'ParticipantGroup$get_valid_twcv()'.

Usage

validate_get_twcv(
  color_matrix,
  dbscan_eps = 20,
  dbscan_min_pts = 4,
  max_var_tight_cluster = 150,
  max_prop_single_tight_cluster = 0.6,
  safe_num_clusters = 3,
  safe_twcv = 250
)

Arguments

color_matrix

An n-by-3 numerical matrix where each row corresponds to a single point in 3D color space.

dbscan_eps

One-element numerical vector: radius of 'epsilon neighborhood' when applying DBSCAN clustering.

dbscan_min_pts

One-element numerical vector: minimum number of points required in the epsilon neighborhood for core points (including the core point itself).

max_var_tight_cluster

One-element numerical vector: maximum variance for a cluster to be considered 'tight-knit'.

max_prop_single_tight_cluster

One-element numerical vector: maximum proportion of points allowed to be within a 'tight-knit' cluster (if this threshold is exceeded, the data are categorized as invalid).

safe_num_clusters

One-element numerical vector: minimum number of clusters that guarantees validity if points are 'non-tight-knit'.

safe_twcv

One-element numerical vector: minimum total within-cluster variance (TWCV) score that guarantees validity if points are 'non-tight-knit'.

Value

A list with components

valid

One-element logical vector

reason_invalid

One-element character vector, empty if valid is TRUE

twcv

One-element numeric vector (or NA if it can't be calculated), indicating TWCV

num_clusters

One-element numeric vector (or NA if it can't be calculated), indicating the number of identified clusters counting toward the tally compared with 'safe_num_clusters'

Details

This function relies heavily on the DBSCAN algorithm and its implementation in the R package 'dbscan', for clustering color points. For further information regarding the 'dbscan_eps' and 'dbscan_min_pts' parameters as well as DBSCAN itself, please see the 'dbscan' documentation. Once clustering is done, passed validation criteria are applied:

Note that this means data can be classified as valid either by having at least 'safe_num_clusters' clusters, or by having points that compose a smaller number of clusters but are spaced relatively far apart within these clusters.

The DBSCAN 'noise' cluster only counts towards the 'cluster tally' (compared with 'safe_num_clusters') if it includes at least 'dbscan_min_pts' points. Points in the noise cluster are however always included in other calculations, e. g. total within-cluster variance (TWCV).
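
The noise-cluster rule can be made concrete with a small base R sketch. It assumes cluster assignments follow the convention of the 'dbscan' package, where noise points are labelled 0; the function name sketch_cluster_tally is hypothetical and the code is not taken from synr itself.

```r
# Illustrative tally of clusters (hypothetical name), assuming that noise
# points are labelled 0, as in the output of the 'dbscan' package.
sketch_cluster_tally <- function(cluster_vector, dbscan_min_pts = 4) {
  n_regular <- length(unique(cluster_vector[cluster_vector != 0]))
  n_noise_points <- sum(cluster_vector == 0)
  # The noise 'cluster' adds one to the tally only if it is large enough.
  n_regular + as.integer(n_noise_points >= dbscan_min_pts)
}

# Two regular clusters plus two noise points: noise is not counted.
tally_small_noise <- sketch_cluster_tally(c(1, 1, 1, 1, 2, 2, 2, 2, 0, 0))
# Two regular clusters plus four noise points: noise counts as a cluster.
tally_large_noise <- sketch_cluster_tally(c(1, 1, 1, 1, 2, 2, 2, 2, 0, 0, 0, 0))

print(tally_small_noise)   # 2
print(tally_large_noise)   # 3
```

With the default safe_num_clusters = 3, only the second tally would reach the threshold in this sketch.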

See Also

point_3d_variance for single-cluster variance, total_within_cluster_variance for TWCV.

