******************************************************************
* This program randomly generates a list of two integers that have a predetermined correlation
* It then generates five additional variables that are allowed to vary freely
* This program is used to determine the attribute values in the implicit learning scale
* Comments in boxes describe how to alter the program to change the nature of the correlated attributes
******************************************************************

new file.
input program.

* Working variables

numeric #a01 to #a100 #b01 to #b100 #c01 to #c100 #d01 to #d100 #n.
numeric #t #i #gamma #curcorr #change #store #step #sumx #sumy #sumx2 #sumy2 #sumxy.
numeric #meana #meanb #meanc #sda #sdb #sdc #corr #neg.
vector A = #a01 to #a100.
vector B = #b01 to #b100.
vector C = #c01 to #c100.
vector D = #d01 to #d100.

******************************************************************
* #n is the number of observations (used in formulas below)
* However, you can't just change this number. You must do a general search and replace of
* whatever the number is here (100 by default) with whatever number of observations that
* you want the data set to have.
******************************************************************

compute #n = 100.

* Output variables

******************************************************************
* c1 and c2 are the correlated attributes
******************************************************************

numeric c1 c2.

******************************************************************
* Define the desired correlation of the final attributes
* #corr should always be a positive value
* If you want a negative correlation set #neg = 1.
* i.e., if you want a correlation of -.25, set #corr = .25 and #neg = 1
******************************************************************

compute #corr = .8.
compute #neg = 0.

* The first attribute variable will be a function of A
* The second attribute variable will be a function of C = A + gamma*B
* A acts as a source of systematic variability (SSR) while gamma*B acts as a source of random error (SSE)
* Program tries different values of gamma until the correlation between A and C is close to the desired value

* Fill the A and B vectors with values from the standard normal distribution

loop #t = 1 to #n.
+ compute A(#t) = rv.normal(0,1).
+ compute B(#t) = rv.normal(0,1).
end loop.

* Make sure A and B have a negative correlation

compute #sumx = 0.
compute #sumy = 0.
compute #sumx2 = 0.
compute #sumy2 = 0.
compute #sumxy = 0.
loop #t = 1 to #n.
+ compute #sumx = #sumx + A(#t).
+ compute #sumy = #sumy + B(#t).
+ compute #sumx2 = #sumx2 + A(#t)**2.
+ compute #sumy2 = #sumy2 + B(#t)**2.
+ compute #sumxy = #sumxy + A(#t)*B(#t).
end loop.
compute #curcorr = (#n*#sumxy - #sumx*#sumy)/(sqrt(#n*#sumx2 - #sumx**2)*sqrt(#n*#sumy2-#sumy**2)).
do if #curcorr > 0.
+ loop #t = 1 to #n.
+   compute B(#t) = -B(#t).
+ end loop.
end if.

* Standardize A and B

compute #meana = mean(#a01 to #a100).
compute #sda = sd(#a01 to #a100).
compute #meanb = mean(#b01 to #b100).
compute #sdb = sd(#b01 to #b100).
loop #t = 1 to #n.
+ compute A(#t) = (A(#t) - #meana)/#sda.
+ compute B(#t) = (B(#t) - #meanb)/#sdb.
end loop.

* Determine the value of #gamma that will give the desired correlation
* Step #gamma up by 1 until the correlation between A and C is less than the desired value

compute #gamma = 0.
compute #curcorr = 1.
compute #change = 1.

* #step variable determines whether program is in forward increasing phase (1)
* or the subsequent halving phase (-1)

compute #step = 1.
loop if abs(#corr-#curcorr) > .005.

* Determine the new value for gamma

do if (#step = 1).
+ do if (#curcorr > #corr).
+   compute #gamma=#gamma+1.
+ else.
+   compute #step = -1.
+   compute #change = #change/2.
+   compute #gamma= #gamma - #change.
+ end if.
else.
+ do if (#curcorr > #corr).
+   compute #change = #change/2.
+   compute #gamma = #gamma + #change.
+ else.
+   compute #change = #change/2.
+   compute #gamma = #gamma - #change.
+ end if.
end if.

* Compute values for C and D

loop #t = 1 to #n.
+ compute C(#t) = A(#t) + #gamma*B(#t).
+ compute D(#t) = A(#t).
end loop.

* Rescale C and D
* Standardize C (D is already standardized)

compute #meanc = mean(#c01 to #c100).
compute #sdc = sd(#c01 to #c100).
loop #t = 1 to #n.
+ compute C(#t) = (C(#t) - #meanc)/#sdc.
end loop.

* Create C and D as integers with the desired mean and sd

******************************************************************
* The statements inside the loop determine the mean and sd of the correlated attributes
* Changing these equations will change the distributions of your correlated variables
* These variables do not necessarily have to have the same mean and sd
******************************************************************

loop #t = 1 to #n.
+ compute C(#t) = rnd(C(#t)*2 + 10).
+ compute D(#t) = rnd(D(#t)*2 + 10).
end loop.

* Compute the correlation between C and D

compute #sumx = 0.
compute #sumy = 0.
compute #sumx2 = 0.
compute #sumy2 = 0.
compute #sumxy = 0.
loop #t = 1 to #n.
+ compute #sumx = #sumx + D(#t).
+ compute #sumy = #sumy + C(#t).
+ compute #sumx2 = #sumx2 + D(#t)**2.
+ compute #sumy2 = #sumy2 + C(#t)**2.
+ compute #sumxy = #sumxy + D(#t)*C(#t).
end loop.
compute #curcorr = (#n*#sumxy - #sumx*#sumy)/(sqrt(#n*#sumx2 - #sumx**2)*sqrt(#n*#sumy2-#sumy**2)).
end loop.

* Reverse code C if a negative correlation is wanted

do if (#neg = 1).
+ compute #meanc = mean(#c01 to #c100).
+ loop #t = 1 to #n.
+   compute C(#t) = #meanc - (C(#t) - #meanc).
+ end loop.
end if.

* Create the data set

loop #t = 1 to #n.
+ compute c1 = D(#t).
+ compute c2 = C(#t).
+ end case.
end loop.
end file.
end input program.
execute.

CORRELATIONS
  /VARIABLES=c1 c2
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE .
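
* Optional check, a minimal sketch assuming the default rescaling above (mean 10, sd 2):
* after rounding to integers, c1 and c2 should have means near 10 and
* standard deviations near 2. Adjust the expected values if the rescaling
* equations were changed.

DESCRIPTIVES VARIABLES=c1 c2
  /STATISTICS=MEAN STDDEV MIN MAX.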