varargout = itemAlign(dataset1, varargin)

ITEMALIGN is a data organization utility that uses the item labels of one set of data to identify corresponding entries in a second set of data, then aligns the two datasets for later use or, optionally, computes basic statistics on the result.

USAGE 1
The following usage is preferred when item data are in matrix form (e.g., brain images) and a vector of labels is readily available.

  [aligned_item_matrices, aligned_labels, aligned_conds, info] = ...
      itemAlign(item_matrix1, item_matrix2, item_labels1, item_labels2, ...
      [item_conds1], [item_conds2], [opt_args])

input args
  item_matrix1:   Data matrix of items by variables (i.e., of size n1 x m,
                  where n1 is the number of items).
  item_matrix2:   Data matrix of items by variables (i.e., of size n2 x m,
                  where n2 is the number of items).
  item_labels1:   Vector of item numbers or cell vector of item strings of
                  length n1, with each element describing the corresponding
                  row in item_matrix1. Items are allowed to recur.
  item_labels2:   Vector of item numbers or cell vector of item strings of
                  length n2, with each element describing the corresponding
                  row in item_matrix2. Items are allowed to recur.
  [item_conds1]:  Optional vector of cond numbers or cell vector of cond
                  names of length n1, used to identify the condition to
                  which each item belongs.
  [item_conds2]:  Optional vector of cond numbers or cell vector of cond
                  names of length n2, used to identify the condition to
                  which each item belongs.

output args
  aligned_item_matrices:  A struct containing aligned_item_matrix1 and
                  aligned_item_matrix2, sorted such that each position of
                  aligned_item_matrix2 contains a value for the
                  corresponding item in the same position of
                  aligned_item_matrix1.
  aligned_labels: A vector containing the item label for each row of the
                  aligned item matrices.
  aligned_conds:  A struct array containing cond_vec1 and cond_vec2, with
                  each element of each vector containing a cond label for
                  the corresponding row of the aligned item matrices.
                  cond_vec1 contains cond labels obtained from set1, which
                  may not be equivalent to those of set2.

USAGE 2
The following usage is preferred where items are organized into cell arrays by condition (e.g., in SPM).

  [aligned_cond_arrays, aligned_labels, info] = itemAlign(cond_arrays1, ...
      cond_arrays2, item_labels1, item_labels2, [opt_args])

input args
  cond_arrays1:   A cell vector of length c1, with each cell corresponding
                  to a data matrix of items belonging to a particular
                  condition (i.e., each cell contains a data matrix of size
                  n1{c} x m, where n1{c} is the number of items in a given
                  condition).
  cond_arrays2:   A cell vector of length c2, with each cell corresponding
                  to a data matrix of items belonging to a particular
                  condition (i.e., each cell contains a data matrix of size
                  n2{c} x m, where n2{c} is the number of items in a given
                  condition).
  item_labels1:   A cell vector of length c1, with each cell containing the
                  labels for the corresponding cell in cond_arrays1. These
                  can be numeric arrays or cell arrays of strings, but each
                  vector must have length n1{c}. Items are allowed to recur.
  item_labels2:   A cell vector of length c2, with each cell containing the
                  labels for the corresponding cell in cond_arrays2. These
                  can be numeric arrays or cell arrays of strings, but each
                  vector must have length n2{c}. Items are allowed to recur.

output args
  aligned_cond_arrays:  A struct containing aligned_cond_array1 and
                  aligned_cond_array2, which remain in the format of
                  cond_arrays1 and cond_arrays2, but are sorted such that
                  each position of aligned_cond_array2 contains a value for
                  the corresponding item in the same position of
                  aligned_cond_array1. Both aligned_cond_arrays will be of
                  length c1 with each element having length of c2.
  aligned_labels: A vector in the same format as item_labels1 and
                  item_labels2, containing the item label for each row of
                  the aligned item matrices.

USAGE 3
The following usage is preferred where items are contained in a datastruct supplied by a SuperPsychToolbox function (e.g., easyKeys or easyType).

  [aligned_tables, condmap, info] = itemAlign(item_table1, ...
      item_table2, [opt_args])

input args
  item_table1:    A SuperPsychToolbox datastruct, a trials table from such
                  a struct, or any other table that contains trials data in
                  rows and a column "stim_id". Items are allowed to recur
                  and elements do not need to be unique (e.g., if an item
                  was presented multiple times). If a "cond" column is
                  present, it is used to group items within condition.
  item_table2:    A second struct/table with similar format. Items do not
                  have to occur in the same order as in item_table1 and do
                  not need to be unique.

output args
  aligned_tables: A struct containing aligned_item_table1 and
                  aligned_item_table2, which remain in the format of
                  item_table1 and item_table2, but are sorted such that
                  each position of aligned_item_table1 contains a value for
                  the corresponding item in the same position of
                  aligned_item_table2.
  condmap:        Information about the condition maps for item_table1 and
                  item_table2. When 'data_merge' is set to 'none', condmap
                  will be a struct containing a condmap for each aligned
                  set. Otherwise, condmap will be a cell array of condmaps
                  reflecting the condmaps of each set, with a master
                  condmap at the end. If itemAlign is operating with
                  usage 1 or usage 2 and the 'output_format' parameter is
                  set to 'table', condmap will be supplied as an empty
                  cell. When table inputs are supplied, itemAlign will
                  attempt to construct condmaps from the cond_str and cond
                  columns of item_table1 and item_table2.

Across all usages, info is an optional output struct that contains:
  - item_table: a table of all items encountered, with n rows, where n is
    the number of items in the superset of set1 and set2, and with the
    following columns:
      - 'stim_id' contains the item label for each item.
      - 'set1_count' and 'set2_count' contain counts of the number of
        times the item was encountered in sets 1 and 2, respectively.
      - 'set1_index', 'set2_index', and 'aligned_index' contain the row
        position of each item within the set1 and set2 inputs, as well as
        within the aligned output (includes cell position if applicable).
      - 'set1_cond' and 'set2_cond' contain the condition label for the
        item in set1 and set2, respectively.
  - set1_orphans: a vector of item labels from set1 that were not matched
    with any label from set2.
  - set2_orphans: a vector of item labels from set2 that were not matched
    with any label from set1.

Optional parameter-value pairs
  flatten_conds:  Boolean to indicate whether condition information should
                  be ignored for sorting purposes, such that items are
                  aligned across conditions rather than just within them.
                  When set to false, each condition is effectively treated
                  as a separate analysis. Default is FALSE.
  repeats_set1:   Items that occur multiple times must be handled during
                  matching, and there are several strategies for doing so.
                  The following tokens may be supplied:
                    'multiple' - each instance of an item is matched with
                        the corresponding item(s) from set2. This will
                        cause the value(s) from set2 to be used multiple
                        times. Only one of repeats_set1 and repeats_set2
                        can use this option at a time.
                    'mean' (default) - take the mean of repeated values,
                        such that each item number appears in the output
                        only once.
                    [any other string] - the function will attempt to
                        evaluate the string as a function handle on
                        repeated values (e.g., 'median', 'std', 'sum').
  repeats_set2:   As repeats_set1, but applied to repeated items in set2.
                  Token usage is as with repeats_set1.
  omit_missing:   Boolean to indicate whether items that don't have matches
                  across the two sets should be omitted rather than
                  supplying NaN placeholders. The contents of the 'info'
                  output are unaffected. Default is FALSE.
  data_merge:     Matched items across sets can optionally be
                  post-processed in various ways. If used, only the merged
                  output will be provided in aligned_[item_matrices]
                  [cond_arrays][tables] (non-numeric columns of tables
                  will be concatenated as _set1 or _set2).
                    'none' (default) - no merging operation performed.
                    'diff' - values in set2 are subtracted from
                        corresponding values in set1.
                    [any other string] - the function will attempt to
                        evaluate the string as a function handle on the
                        two sets of values (e.g., 'mean', 'sum').
  stats:          Boolean to indicate whether basic statistics should be
                  run on the output, generating a new field 'stats' in the
                  output variable "info" (if that output is captured).
                  Default is FALSE. The statistics include:
                    - mean: the mean of set1 and set2 items for each
                      variable / column m.
                    - diff: the mean difference of set1 and set2 items for
                      each variable / column m.
                    - boot: the bootstrapped probability that, for each
                      variable / column m, the samples from sets 1 and 2
                      were drawn from the same distribution.
                    - corr: the correlation between set1 and set2 values
                      for each variable / column m.
                    - bootcorr: the bootstrapped probability that, for each
                      variable / column m, the correlation between values
                      in sets 1 and 2 is significant.
  output_format:  The output can be converted to a format other than the
                  one in which the data were entered. For instance, data
                  supplied as the cond_arrays of usage 2 can be output in
                  the matrix or table formats of usages 1 and 3. To do so,
                  set this parameter to 'matrix', 'cell_array', or 'table'.
                  Default: original format.
  force_redundancy:  When repeated items are found, non-numeric data is
                  collected into a cell array in the merged row. If set to
                  false (default), the resulting cell array will be
                  checked, and if all values in the cell are identical, the
                  cell will be collapsed to the single value.

ADVANCED USE
  - To organize or summarize one set of data by item, run the function
    with the set2 input parameters set to empty.
  - data_merge can be used to reduce memory usage where large datasets are
    concerned.
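
EXAMPLE (illustrative sketch)
The snippet below is a minimal sketch of the alignment idea described above for usage 1, written with built-in MATLAB functions only. It is not the implementation of itemAlign, and the variable names aligned1, aligned2, matched, and merged_diff are hypothetical. It assumes numeric labels, mimics the default 'mean' handling of repeated items, fills unmatched items with NaN placeholders, and then shows what 'omit_missing' combined with a 'diff' merge would look like.

```matlab
% Illustrative sketch only: NOT itemAlign's actual implementation.
item_matrix1 = [1 2; 3 4; 5 6; 7 8];   % set 1: n1 = 4 items, m = 2 variables
item_labels1 = [10; 20; 10; 30];       % item 10 recurs in set 1
item_matrix2 = [2 2; 4 4; 6 6];        % set 2: n2 = 3 items, m = 2 variables
item_labels2 = [30; 10; 40];           % item 40 has no match in set 1

% Superset of item labels across both sets (sorted, unique).
all_labels = union(item_labels1, item_labels2);

n = numel(all_labels);
m = size(item_matrix1, 2);
aligned1 = nan(n, m);                  % NaN placeholders for missing items
aligned2 = nan(n, m);

for k = 1:n
    rows1 = item_labels1 == all_labels(k);
    rows2 = item_labels2 == all_labels(k);
    if any(rows1)
        % Repeated items are averaged (analogous to the 'mean' token).
        aligned1(k, :) = mean(item_matrix1(rows1, :), 1);
    end
    if any(rows2)
        aligned2(k, :) = mean(item_matrix2(rows2, :), 1);
    end
end

% Labels that appear in only one of the two sets.
set1_orphans = setdiff(item_labels1, item_labels2);
set2_orphans = setdiff(item_labels2, item_labels1);

% Keep only items present in both sets, then subtract set 2 from set 1
% (analogous to omit_missing = true with data_merge = 'diff').
matched = ismember(all_labels, item_labels1) & ...
          ismember(all_labels, item_labels2);
merged_diff = aligned1(matched, :) - aligned2(matched, :);
```

Each row k of aligned1 and aligned2 refers to the same item, all_labels(k), which is the property the aligned outputs of itemAlign are documented to have.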
Written by Jeff Mountjoy, April 9, 2015