---
title: "Vignettes"
author: "CheonGeunChoi"
date: "2023-03-01"
mainfont: NanumGothic
output:
  html_document:
    df_print: paged
  word_document: default
  pdf_document:
    latex_engine: xelatex
editor_options:
  chunk_output_type: inline
---

# 1. A Quick Overview (Anchoring Vignettes)

![Fig](nihms258919f1.jpg)

This section assumes that you already have anchors installed and want a quick introduction and overview. Information on installation, background, and examples of anchors is provided in detail in subsequent sections.

All examples and objects described in this document assume that you have loaded the package in an R session,

```{r}
# install.packages("anchors_3.0-8.tar.gz", repos = NULL, type = "source")
library(anchors)
```

A list of the functions and datasets with help pages can be found using,

```{r}
help(package="anchors")
```

For a list of demonstrations of functions, uses of data, and replications of published results,

```{r}
demo(package="anchors")
```

The function anchors() has two method= options:

* B: non-parametric rank method from Wand (2007a)
* C: non-parametric rank method from King et al. (2004) and King and Wand (2007)

There are two other key supporting functions that will be discussed in turn: anchors.order() and chopit().

For methods B and C, one can also request that all combinations of subsets of vignettes (retaining the same relative order as submitted in the formula) be analyzed, using the option anchors(..., combn=TRUE). The default is combn=FALSE because, for more than three vignettes, the process requires non-trivial computational time. Details can be found in the later section on vignette selection, and via help(anchors.combn).
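To get a feel for why combn=TRUE becomes expensive, the chunk below counts the order-preserving subsets of a set of vignettes using base R's combn(). It is only a sketch: the vector of names is illustrative, and it assumes that every non-empty subset is analyzed, which may not match exactly what anchors does internally.

```{r}
# Illustrative vignette names, in the order they would appear in the formula.
vignettes <- c("vign1", "vign2", "vign3", "vign4")

# For each subset size m, combn() returns the subsets with the original
# left-to-right order preserved; paste() turns each subset into formula text.
subsets <- unlist(
  lapply(seq_along(vignettes), function(m) {
    combn(vignettes, m, FUN = paste, collapse = " + ")
  })
)

length(subsets)  # 2^4 - 1 = 15 non-empty subsets (assumption: all are analyzed)
head(subsets)
```

Under this assumption the count is 2^J - 1 for J vignettes, so six vignettes would already give 63 separate analyses.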
Datasets with anchoring vignettes that are made available by the anchors package include

* chopitsim: Simulated data for testing the chopit function
* mexchn: China-Mexico political efficacy data
* poleff: Simulated political efficacy data
* poleffna: Simulated political efficacy data with NA (demo only, don't use)
* freedom: Individual freedom of speech data
* sleep: Sleep data for China
* selfcare: Self-care data for China
* table1: Reference from Table 1 of King and Wand (2007)
* table1src: Specific response values that have inequalities to create table1

Any of these can be loaded with data(), for example,

```{r}
data(freedom)
```

* Description: Selection of individual-level survey data for freedom of speech.
* Usage: data(freedom)
* Details: A table with 3,500 observations and 12 variables.
* sex: 1 for men and 0 for women
* age: Age of respondent in years
* educ: Levels of education, coded as a numeric variable with
    + 1: No formal education
    + 2: Less than primary school education
    + 3: Completed primary school
    + 4: Completed secondary school
    + 5: Completed high school
    + 6: Completed college
    + 7: Completed post-graduate degree
* country: Character strings consisting of "Oceana", "Eurasia", and "Eastasia", after Orwell's 1984.
* self: Self-assessment (see below).
* vign1-vign6: Responses to vignettes (see below).

Survey respondents were asked, in almost the same language, for a self-assessment and for an assessment of several hypothetical persons described by written vignettes. The self-assessment question (self, in the data set), "How free do you think [name/you] [is/are] to express [him-her/your]self without fear of government reprisal?", was first asked of the survey respondent with respect to him or herself, and then after each vignette. The possible response categories are:

* 1: Completely free
* 2: Very free
* 3: Moderately free
* 4: Slightly free
* 5: Not free at all

The vignettes, ordered from most free to least free, are:

* vign1: [Kay] does not like many of the government's policies. She frequently publishes her opinion in newspapers, criticizing decisions by officials and calling for change. She sees little reason these actions could lead to government reprisal.
* vign2: [Michael] disagrees with many of the government's policies. Though he knows criticism is frowned upon, he doesn't believe the government would punish someone for expressing critical views. He makes his opinion known on most issues without regard to who is listening.
* vign3: [Bob] has political views at odds with the government. He has heard of people occasionally being arrested for speaking out against the government, and government leaders sometimes make political speeches condemning those who criticize. He sometimes writes letters to newspapers about politics, but he is careful not to use his real name.
* vign4: [Connie] does not like the government's stance on many issues. She has a friend who was arrested for being too openly critical of governmental leaders, and so she avoids voicing her opinions in public places.
* vign5: [Vito] disagrees with many of the government's policies, and is very careful about whom he says this to, reserving his real opinions for family and close friends only. He knows several men who have been taken away by government officials for saying negative things in public.
* vign6: [Sonny] lives in fear of being harassed for his political views. Everyone he knows who has spoken out against the government has been arrested or taken away. He never says a word about anything the government does, not even when he is at home alone with his family.
Demonstration files are available, both to provide examples of the use of functions and as an aid to those who would simply like to re-compute published results that have used versions of the anchors package,

* anchors.plot: Demo of plotting with anchors
* chopit: Demo of chopit: summary, plot
* anchors.freedom: Wand et al (2007) rank analysis of freedom
* anchors.freedom3: Wand et al (2007) Figure 2 histogram with 3 vignettes
* anchors.freedom6: Wand et al (2007) Figure 1 histogram with 6 vignettes
* anchors.vign2: King and Wand (2007) Table 1 anchors()
* anchors.mexchn: King and Wand (2007) Figure 1 histogram
* entropy.mexchn: King and Wand (2007) Figure 2 entropy()
* entropy.sleep: King and Wand (2007) Figure 3 entropy()
* entropy.self: King and Wand (2007) Figure 4 entropy()
* anchors.mexchn2: Replication of King et al (2004) Figure 2
* chopit.mexchn: King et al (2004) Table 2 (non-linear taus)

Any of these can be invoked with demo(), for example,

```{r}
demo(anchors.freedom)
```

# 2. Introduction to Anchoring Vignettes

## 2.1. Concept

* The anchoring vignette method was proposed by King et al. (2004), among others, to correct for respondents' response bias. On Likert-type items, respondents may interpret (or understand) the items and the scale differently depending on their individual characteristics (김은나, 2015; Chevalier & Fielding, 2011), and anchoring vignettes are one technique for correcting the differences in responses that arise from such differences in interpretation.
* The method uses respondents' reactions to typical low-, medium-, and high-level cases of the measured variable (e.g., learning motivation) to correct response bias on the measurement items related to that variable. For example, [Figure 1] shows low-, medium-, and high-level vignette items constructed to correct response bias when students' learning motivation is measured with five items (von Davier et al., 2018).

![Fig4](fig4.png)

[Figure 1] Example vignette items for 'learning motivation'

The principle behind anchoring vignettes is as follows. [Figure 2] shows how two respondents answered the measurement item and the three cases in the example of [Figure 1]. On the measurement item, respondent A and respondent B both checked the same score, 3, but on the vignette items respondent B gave higher scores than A even for the same hypothetical cases. That is, for the same measured variable, respondent A gave relatively more generous scores than respondent B. Respondent A gave the measurement item a 3, the same score as the medium-level case, whereas respondent B gave it a 3, the same score as the high-level case. Taking the vignette responses into account, respondent B's score of 3 on the measurement item can therefore be interpreted as reflecting a higher perceived level than respondent A's score of 3.
Accordingly, when adjusted scores are computed with the nonparametric method (explained in detail below), A's adjusted score is 2 and B's adjusted score is 6. In short, anchoring vignettes adjust respondents' scores so that the answers of two respondents who interpret the same measurement item differently can be compared on an equal footing.

![Fig5](fig5.png)

[Figure 2] Example comparison of respondent scores using anchoring vignettes

For anchoring vignettes to be used appropriately, the following assumptions must hold (King et al., 2004). The first is 'response consistency': respondents must apply the same evaluation standard to the measurement item and to the vignette items, and answer consistently. The second is 'vignette equivalence': all respondents must interpret (or understand) the cases presented in the vignette items in the same way, as cases relating to the particular measured variable. For example, in the figure above, respondents must share the interpretation that the vignette items are cases about 'learning motivation'.

Consider a survey question, along with response categories, that is asked as a self-assessment,

* How free do you think you are to express yourself without fear of government reprisal? (1) Completely Free, (2) Very Free, (3) Moderately Free, (4) Slightly Free, (5) Not Free at All

One key difficulty in analyzing the results from such a survey question is the possibility that individuals apply different standards when selecting a response category.

Researchers have tried to ameliorate the problems of interpersonal and cross-cultural incomparability in survey research with careful question wording, translation (and back translation), focus groups, cognitive debriefing, and other techniques, most of which are designed to improve the survey question. In contrast, anchoring vignettes is a technique that seeks to bring additional data to bear on the problem.

For example, vignettes corresponding to the above political freedom question attempt to describe hypothetical individuals who have different levels of freedom from government reprisal. The following six vignettes are intended to correspond to distinct levels of political freedom, in order of decreasing freedom,

* vign1: [Kay] does not like many of the government's policies. She frequently publishes her opinion in newspapers, criticizing decisions by officials and calling for change. She sees little reason these actions could lead to government reprisal.
* vign2: [Michael] disagrees with many of the government's policies. Though he knows criticism is frowned upon, he doesn't believe the government would punish someone for expressing critical views. He makes his opinion known on most issues without regard to who is listening.
* vign3: [Bob] has political views at odds with the government. He has heard of people occasionally being arrested for speaking out against the government, and government leaders sometimes make political speeches condemning those who criticize. He sometimes writes letters to newspapers about politics, but he is careful not to use his real name.
* vign4: [Connie] does not like the government's stance on many issues. She has a friend who was arrested for being too openly critical of governmental leaders, and so she avoids voicing her opinions in public places.
* vign5: [Vito] disagrees with many of the government's policies, and is very careful about whom he says this to, reserving his real opinions for family and close friends only. He knows several men who have been taken away by government officials for saying negative things in public.
* vign6: [Sonny] lives in fear of being harassed for his political views. Everyone he knows who has spoken out against the government has been arrested or taken away. He never says a word about anything the government does, not even when he is at home alone with his family.

After each of these vignettes, a corresponding evaluation question is asked with the same response categories as for the self-assessment.

* How free do you think [name] is to express [him/her]self without fear of government reprisal?
  (1) Completely Free, (2) Very Free, (3) Moderately Free, (4) Slightly Free, (5) Not Free at All
* Note: Where there are missing values for responses to the self-assessment or the vignettes, it is important that these be coded as '0' (zero), instead of NA or some other missing value, if you wish to retain the other (non-missing) responses of an individual in the parametric model to be described shortly (see chopit). For all non-parametric analyses that rely on anchors or anchors.order, cases with missing responses (either NA or zero) must be listwise deleted. We provide a handy function, replace.value, that facilitates changing the coding of missing values for subsets of variables.

# 3. Indexing Notation

Our notation is a generalization of King et al. designed to accommodate our enhancements to the various models. We index survey questions, response categories, and respondents as follows:

* We index survey questions by the pair $(s, j)$, where question set $s$ $(s = 1, ..., S)$ corresponds to the self-assessment question number and refers to the set of questions that includes the self-assessment question (indicated by $j = 0$) and, optionally, one or more vignette questions (indicated by $j = 1, ..., J_s$).
* We index response categories by $k$ $(k = 1, ..., K_s)$ separately for each survey question, since each can have different response categories. Each set of questions (self-assessment and vignettes) must have the same number of choice categories (coded as increasing sequential integers starting with 1). Missing values (whether structural, because the question was not asked, or due to nonresponse) should be coded as $k = 0$.
* We index respondents by $i$ or $l$. Respondent $i$ $(i = 1, ..., n)$ is asked all of the self-assessment questions. Respondent $l$ $(l = 1, ..., N)$ is asked all of the vignette questions.
  (Respondents are indexed for self-assessment and vignette questions separately since each could be asked of independent samples; if they are asked of the same individuals, then $i = l$ and $n = N$.)

If your survey design asks each set of vignette questions in separate samples (and separate from the self-assessment question), then index each set of vignettes according to unique values of $l$ and use the missing value code ($k = 0$) for vignettes that are not asked of a subgroup; in other words, stack the data in block diagonal format.

Thus, every mathematical symbol in the model could be indexed by $s$, $j$, $k$, and either $i$ or $l$. In practice, we drop indexes that are constant.

# 4. A Nonparametric Approach (Computing Adjusted Scores)

## 4.1. Definition

Define $C_{is}$ as the self-assessment relative to the corresponding set of vignettes. Let $y_i$ be the self-assessment response and $z_{i1}, ..., z_{iJ}$ be the $J$ vignette responses for the $i$th respondent. For respondents with consistently ordered rankings on all vignettes ($z_{i,j-1} < z_{ij}$ for $j = 2, ..., J$), we create the DIF-corrected self-assessment $C_i$

$$
C_{i} = \begin{cases}
1 & \text{if } y_i < z_{i1} \\
2 & \text{if } y_i = z_{i1} \\
3 & \text{if } z_{i1} < y_i < z_{i2} \\
\vdots & \vdots \\
2J + 1 & \text{if } y_i > z_{iJ}
\end{cases}
$$

Respondents who give tied or inconsistently ordered vignette responses may have interval values of $C$, if the tie or inconsistency results in multiple conditions in the definition above appearing to be true. A more general definition of $C$ is the range from the minimum to the maximum value among all the conditions that hold true. Values of $C$ that are intervals, rather than scalars, represent the set of inequalities between which the analyst cannot distinguish without further assumptions.

![Fig3](fig3.png)

## 4.2. Example Code: anchors().
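Before turning to the package call, the definition in Section 4.1 can be written out by hand. The chunk below is a minimal base-R sketch, not the package's implementation: c_interval() is a hypothetical helper, and the names Cs and Ce for the lower and upper ends of the interval are borrowed from the columns used later in this document. It assumes complete responses (no 0 or NA codes) and vignettes supplied in their intended order.

```{r}
# Hypothetical helper (not part of anchors): lowest and highest value of C
# consistent with one respondent's answers.
# y: self-assessment; z: vignette responses in formula order.
c_interval <- function(y, z) {
  J <- length(z)
  hits <- integer(0)
  if (y < z[1]) hits <- c(hits, 1L)                # below the first vignette
  for (j in seq_len(J)) {
    if (y == z[j]) hits <- c(hits, 2L * j)         # tied with vignette j
    if (j < J && z[j] < y && y < z[j + 1]) {
      hits <- c(hits, 2L * j + 1L)                 # strictly between j and j + 1
    }
  }
  if (y > z[J]) hits <- c(hits, 2L * J + 1L)       # above the last vignette
  c(Cs = min(hits), Ce = max(hits))
}

c_interval(3, c(1, 2, 4, 5))  # consistent ordering: scalar C = 5
c_interval(2, c(2, 2, 4))     # tied vignettes: interval 2 to 4
c_interval(3, c(4, 2))        # reversed vignettes: interval 1 to 5
```

The last two calls illustrate how ties and order violations widen C from a single value to an interval.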
The example below reuses the library and the freedom dataset loaded earlier; anchors() then calculates C for each individual. In the non-parametric estimation, only one self-assessment question and its corresponding set of vignettes are analyzed at a time.

```{r}
summary(freedom)
```

```{r}
a1 <- anchors(self ~ vign2+vign3+vign4+vign5+vign6, freedom, method="C")
summary(a1)
```

The names of the vignettes must be passed to the function in the same order as the direction of the responses. In the example, vign2 lies toward the same (highest) end as response category 1, while vign6 lies toward the same (lowest) end as response category 5. (We drop vign1 here for reasons of space when printing the summary, since the different combinations of intervals of C can be numerous.)

* If anchors produces many ties you should check that you passed the vignettes in the correct order, but we also offer a function that investigates the ordering of vignettes in detail.

## 4.3. Example Code: anchors.order().

The function anchors.order(), and the associated methods summary.anchors.order and barplot.anchors.order, investigate the relationship between vignette responses without reference to the self-assessment question.

```{r}
vo1<-anchors.order(~vign2+vign3+vign4+vign5+vign6, freedom)
summary(vo1,top=10,digits=3)
```

In the first column of the summary, the numbers are the indexes of the vignettes in the order in which they were written (left to right) in the formula passed to anchors.order(). Because vign1 is omitted from the formula here, index 1 refers to vign2, index 2 to vign3, and so on; the index values need not match the numbers in the vignette labels. Vignettes that have the same response value are placed within {} brackets. The most common set of responses is to give one value to the vignette with index 1 and another, greater value to {2,3,4,5}, and the next most common ranking is giving all vignettes the same value (Frequency = 277).
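The bracket notation described above can be reproduced by hand for a few made-up respondents. The chunk below is a base-R sketch under stated assumptions: order_pattern() is a hypothetical helper rather than a function from anchors, the response matrix Z is invented for illustration, and the exact string format printed by summary() may differ.

```{r}
# Hypothetical helper (not part of anchors): write one respondent's vignette
# responses as an ordering pattern, with tied vignettes grouped in {}.
order_pattern <- function(z) {
  groups <- split(seq_along(z), z)   # vignette indexes by response value, ascending
  parts <- vapply(groups, function(idx) {
    if (length(idx) > 1) {
      paste0("{", paste(idx, collapse = ","), "}")
    } else {
      as.character(idx)
    }
  }, character(1))
  paste(parts, collapse = ",")
}

# Invented responses: one row per respondent, one column per vignette.
Z <- rbind(
  c(1, 3, 3, 3, 3),
  c(2, 2, 2, 2, 2),
  c(1, 2, 3, 2, 3),
  c(1, 3, 3, 3, 3)
)

patterns <- apply(Z, 1, order_pattern)
sort(table(patterns), decreasing = TRUE)  # frequency of each ordering pattern
```

Tabulating the patterns in this way mirrors the Frequency column of the summary: identical rankings collapse into one row with a count.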
The two columns Ndistinct and Nviolation are included to facilitate alternative orderings of the summary of vignette rankings, as well as to provide a quick source of information. For example, the fourth row, 1,{2,4},{3,5}, has Ndistinct = 3 distinct response levels. Although this is easily calculated by counting the number of distinct sets in the first column, the Ndistinct column provides a summary of how many different response values are observed for each constellation of vignette orderings. In this example there are five response categories and five vignettes, so Ndistinct can be at most 5. In the fourth row we also have Nviolation = 1, because in these cases the 4th vignette has a value less than the 3rd vignette. The column Nviolation counts the number of times any of the vignette responses are strictly contrary to the natural ordering, as given by the user's formula (ordered left to right).

In this list of vignette response rankings, the careful observer might note that ties and order violations occur for one pair of vignettes, vign3 and vign4. The summary() function provides two additional summary statistics that make troublesome patterns easier to identify than staring at this list.

Immediately above the listing of orderings is a matrix. In the upper triangle is $p_{ij} - p_{ji}$, such that a negative number indicates a disjunction between the order of the listed vignettes and their responses. Continuing the comparison of vign3 and vign4, we have the negative value $p_{34} - p_{43} = -0.156$, which quickly signals an inconsistency with the expert ordering. The proportion of ties between each pair of vignettes is shown in the lower triangle. The proportion of ties in the comparison of vign3 and vign4 is 0.339.

There is an issue in the political freedom data with the ordering between vign3 and vign4.
Reasonable people might disagree (and apparently the respondents do) about which scenario indicates less freedom: Bob writes letters to newspapers about politics using a pseudonym, while Michael makes his opinion known on most issues without regard to who is listening. For some respondents, the mere existence of a media outlet, such as a paper to which one could write a letter discussing political subjects, may be a more important indicator of freedom than the ability to talk publicly about politics. The substance of the Sonny and Vito vignettes seems to be correctly ordered, but perhaps the reversal is due to whether or not the vignette ends with the statement about men being taken away for speaking out against the government. Further indicating that vign4 describes a more repressive scenario, it is more often tied with the most extreme vign6 than with any other vignette.

Above this matrix is a matrix of pairwise comparisons between vignettes that shows how often reversals in responses occur. Each cell gives the proportion of cases in which the vignette listed at the beginning of row $i$ has a response less than the vignette listed in column $j$. Let $p_{ij}$ be the value in cell $(i, j)$; then $1 - p_{ij} - p_{ji}$ is the proportion of cases where vignette $i$ and vignette $j$ have the same value. For example, the proportion of cases where vign3

R> chopit(fo, freedom, options = anchors.options(single.vign.var = TRUE))

* 2. Another option is a parameterization defined by the extreme vignettes. Let $\theta_1 = 0$ and $\theta_J = 1$. This lets estimates of $\mu$ be interpreted on the scale of the vignettes, with 0 being the level of the lowest vignette and 1 the level of the highest. Note that $\mu$ can still be higher than 1 or lower than 0.
* To identify the model by setting $\theta_1 = 0$ and $\theta_J = 1$, use the option

R> chopit(fo, freedom, options = anchors.options(normalize = "hilo"))

* Caution: The order of the vignettes does matter for this normalization.
  If you constrain the $\theta$ parameters to have an order different from what would be estimated without constraints, odd results such as extremely large standard errors and implausibly large parameter estimates can occur. Hint: if in doubt, use the normalize = "self" model first to establish the order of the vignettes.

## 5.6. Additional options

There are a variety of options, among which the following are the most often used,

* Instead of the default optimizer optim(), use genoud() (Mebane and Sekhon 2009a,b): e.g.,

```{r}
# Assumes fo and cout were created by an earlier chopit() fit
cout2 <- chopit(fo, freedom,
                options = anchors.options(optimizer = "genoud",
                                          start = cout$parm,
                                          print.level = 1))
```

As there is as yet no proof of the global concavity of the chopit likelihood, a prudent researcher should investigate whether a chopit model fitted using optim() is potentially at a local maximum rather than the global maximum of the likelihood. genoud() does not rely on global concavity of the likelihood, and is an efficient approach to finding the global maximum.

* The option use.gr toggles whether or not to use the analytical gradients that have been derived for the model with a linear parameterization of cutpoints. If use.gr = TRUE then analytical gradients are used. Numerical gradients (use.gr = FALSE), which are currently required if the $\tau$ are specified as non-linear functions, make estimation significantly more time consuming.
* See help("chopit") and demo("chopit") for additional examples of options.

### C_MinEnt

```{r}
# Assumes freedom2 is a data frame that already contains the Cs and Ce columns
freedom2$C_minent <- (freedom2$Ce + freedom2$Cs)/2
table(freedom2$self, freedom2$C_minent)
```

# 6. Manual

## 6.1. Insert DIF-corrected variable into original data frame

```{r}
data(freedom)
ra <- anchors(self ~ vign1 + vign3 + vign6, data = freedom, method="B")
freedom3 <- insert(freedom, ra)
# Inspect the returned data frame, which holds the inserted variables
names(freedom3)
```

# 7. Reference

* https://blog.naver.com/smileaddict/222569930575