
Testing sampling weights on lsasim

wleoncio edited this page Oct 22, 2019 · 2 revisions

Introduction

About this document

Thank you for your help in testing lsasim, an R package for simulating Large Scale Assessment (LSA) data.

This document is intended to aid the development of the next stable version of lsasim (possibly numbered 2.1.0). The current stable version of lsasim (2.0.0) is available on CRAN and on GitHub.

I invite you to read the next subsections of this introduction (available on the other tabs up there) even if you’re familiar with lsasim. The final subsection (How to contribute to testing) is especially useful in showing you how to give feedback on your tests.

This document should be a standalone guide to working with the cluster_gen function of lsasim for testing purposes. However, it is still a work in progress and feedback on missing or incorrect information is welcome. In addition, the help file of cluster_gen may be useful in understanding how the function works. You can access the function documentation by running help("cluster_gen", "lsasim") or ?lsasim::cluster_gen in the R terminal.

Quick history of lsasim

The table below contains information about the three stable releases of lsasim. The innovations under testing in this document will be part of the next stable release.

Version	CRAN release date	Innovations
1.0.0	2017-02-23	Simulates cognitive and background test data
1.0.1	2017-05-10	Bug fixes
2.0.0	2019-09-12	Expanded functionality of background questionnaires

How to contribute to testing

We appreciate any help in the development of lsasim. To make the best of everyone’s time, though, it is desirable that the tester has:

Access to R version 3.6.0 or newer
Permission to install R packages in their working computer
Knowledge of sampling weights, especially:
1. How to calculate sampling weights
2. How those weights are usually calculated in LSAs
3. How such data is usually displayed to analysts of LSA datasets

How to give feedback

In order to keep things organized (and make sure your contribution gets officially recorded), bugs should ideally be reported to https://github.com/tmatta/lsasim/issues/. This requires you to have a (free) GitHub account. If you have found several examples of the same issue, please report them as one issue.
As an alternative to using our GitHub issues tracker, you can send an e-mail to the package maintainer.

Future features to be tested

Replicate weights
Within and between group correlation

Testing sampling weights

Installing lsasim (development version)

The development version of lsasim can be downloaded from GitHub by issuing the following commands on your R console.

First, install the remotes package. You can skip this step if remotes is already installed on your machine. If you don’t know if remotes is installed on your machine, try running library(remotes) and see if there are any errors.
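As an alternative to watching for errors from library(remotes), the check can be done explicitly. The sketch below uses base R's requireNamespace, which returns TRUE or FALSE without attaching the package; the variable name has_remotes is only illustrative.

```r
# Check for 'remotes' without attaching it; requireNamespace() returns
# TRUE or FALSE instead of raising an error.
has_remotes <- requireNamespace("remotes", quietly = TRUE)

if (!has_remotes) {
  message("remotes is not installed; run install.packages(\"remotes\")")
}
```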

install.packages("remotes")

If the installation goes well, you should see this at the bottom of the output:

## * DONE (remotes)
##
## The downloaded source packages are in

Next, we use the install_github function to install the development version of lsasim locally. There are actually two versions to choose from:

The recommended version, 2.0.0.9103 (older, but more stable and with results comparable to this document)
The bleeding edge version (newer, but less stable and with results that will differ from this document even with equal seeds)

To install the recommended version, please run the following on your R terminal:

remotes::install_github("tmatta/lsasim", ref="v2.0.0.9103")

The bleeding edge version (> 2.0.0.9103) is available by simply changing the ref argument:

remotes::install_github("tmatta/lsasim", ref="develop")

Note: Installing the version from the develop branch will result in more features but results that are different from the ones shown in this document. If you would like to reproduce the results shown here, you must install version 2.0.0.9103.

After issuing install_github, R will tell the user it is checking, preparing, executing and testing the installation of lsasim. The most important output is the final message, which should read “DONE (lsasim)”. It could also read something like “Skipping install of ‘lsasim’ from a github remote, the SHA1 (…) has not changed since last install”, which means that you already have the latest version. In that case, you can force the installation by including force=TRUE as an argument to install_github. This can be useful when a new version is available but R fails to recognize the difference between that version and the one installed on your computer.

Finally, we load the installed lsasim package into our current R session and check the build version. Your output of packageVersion should match the output below (boxes containing lines beginning with double hashes (##) are the expected output).

library(lsasim)
packageVersion("lsasim")
## [1] '2.0.0.9103'
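The version check can also be done programmatically. The sketch below uses base R's package_version to compare version numbers; is_recommended is a hypothetical helper written for this illustration, not a function from lsasim.

```r
# Development builds carry a fourth component (here 9103), so they sort
# after the CRAN release they are based on.
recommended  <- package_version("2.0.0.9103")
cran_release <- package_version("2.0.0")
recommended > cran_release

# Hypothetical helper (not part of lsasim): TRUE if a given version
# matches the build used in this document.
is_recommended <- function(installed) installed == recommended
is_recommended(package_version("2.0.0.9103"))
```

In a session where lsasim is installed, is_recommended(packageVersion("lsasim")) applies the same check to the real package.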

Once lsasim is installed and loaded, you are ready to test it. Click the next tab to continue.

Generating clustered test data

This test concerns the generation of sampling weights for background questionnaire data generated in a hierarchical structure. Each hierarchical level is composed of clusters, which can be sampled from a population using either Simple Random Sampling (SRS) or with Probabilities Proportional to Size (PPS).
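To make the difference between SRS and PPS concrete, here is a minimal base R sketch using the textbook definitions. The school sizes and the sample size are made up for illustration, and the code is not taken from lsasim, whose internal calculations may differ.

```r
# Hypothetical population: 4 schools with different enrolments
school_size <- c(100, 200, 300, 400)
n_sampled   <- 2  # number of schools to draw

# SRS: every school has the same selection probability
p_srs <- rep(n_sampled / length(school_size), length(school_size))

# PPS: selection probability grows with school size
p_pps <- n_sampled * school_size / sum(school_size)

# The school-level sampling weight is the inverse of that probability
data.frame(school_size, p_srs, w_srs = 1 / p_srs, p_pps, w_pps = 1 / p_pps)
```

Under SRS all schools get the same weight, whereas under PPS larger schools are more likely to be drawn and therefore receive smaller weights.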

Basic background questionnaire data generation is handled by the function questionnaire_gen, present in lsasim since its first release. Cluster background data generation works through a function called cluster_gen, which calls questionnaire_gen on each cluster level.

Two-level structures

We will start with a simple example, where 2 schools and 10 students in each school are selected. This structure is represented by the following vector:

n1 <- c(2, 10)

The structure can be checked with the function draw_cluster_structure, which creates a visual representation of the hierarchical tree in the R console:

draw_cluster_structure(n1)  # pay no mind to the "NULL" printed at the end
## school1 (10 students)
## school2 (10 students)
## NULL

It may not look like much now, but when more complex scenarios start showing up, this visual representation can really help one understand what is going on!

In order to generate clustered responses for n1, we call the cluster_gen function, which is the star of this test. The first argument of cluster_gen is called n and corresponds to the number of sampled observations on each level. Two ways of calling cluster_gen with n = n1 are cluster_gen(n = n1) and cluster_gen(n1), where omitting n = just tells R to assume that the order of the arguments you are passing is the same one the function expects. To see the argument order that cluster_gen expects, see the “Usage” section of the ?cluster_gen help page.
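The equivalence between named and positional arguments is a general feature of R. It can be seen with a toy function; count_respondents below is a made-up stand-in whose first argument is also called n, and it is not part of lsasim.

```r
# Toy function (not part of lsasim) whose first argument is also called n
count_respondents <- function(n, verbose = TRUE) {
  total <- prod(n)
  if (verbose) message("Total respondents: ", total)
  total
}

by_name     <- count_respondents(n = c(2, 10), verbose = FALSE)
by_position <- count_respondents(c(2, 10), verbose = FALSE)

# Both calls bind c(2, 10) to n, so the results are the same
identical(by_name, by_position)
```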

The set.seed function we call right before cluster_gen is there to make sure that your data will match the output below. If that command is dropped, the test results will change each time cluster_gen is called.
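The effect of fixing the seed can be seen with base R alone. This sketch uses rnorm instead of cluster_gen so that it runs without lsasim; the variable names are illustrative.

```r
set.seed(1234)
first <- rnorm(3)

set.seed(1234)
second <- rnorm(3)   # same seed, same draws

third <- rnorm(3)    # no reset: the generator has moved on

identical(first, second)  # the two seeded draws match
identical(first, third)   # the unseeded draw does not
```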

set.seed(1234)
cluster_gen(n1)
## ── Hierarchical structure ──────────────────────────────────────────────────────────────────────────
## Generating questionnaires for schools
## Total respondents: 20 (10 + 10)
## school1 (10 students)
## school2 (10 students)
## ── Information on sampling weights ─────────────────────────────────────────────────────────────────
## - Calculating PPS weights at the school level
##   final.student.weight should add up to the number of students in the population (20)
## $school
## $school[[1]]
##    subject           q1 q2 q3 q4 q5 q6 school.weight within.school.weight final.student.weight
## 1        1  0.005006950  2  2  1  2  2             1                    1                    1
## 2        2 -0.037630263  1  1  1  2  1             1                    1                    1
## 3        3  0.723976061  2  2  1  2  1             1                    1                    1
## 4        4 -0.496738863  2  2  1  2  1             1                    1                    1
## 5        5  0.011395161  2  1  1  2  1             1                    1                    1
## 6        6  0.009859946  1  2  2  1  2             1                    1                    1
## 7        7  0.678271423  1  1  1  2  2             1                    1                    1
## 8        8  1.029563029  2  2  1  2  1             1                    1                    1
## 9        9 -1.729528504  1  2  2  2  1             1                    1                    1
## 10      10 -2.204348095  1  1  2  2  2             1                    1                    1
##             uniqueID
## 1   student1_school1
## 2   student2_school1
## 3   student3_school1
## 4   student4_school1
## 5   student5_school1
## 6   student6_school1
## 7   student7_school1
## 8   student8_school1
## 9   student9_school1
## 10 student10_school1
##
## $school[[2]]
##    subject           q1 q2 q3 q4 q5 q6 school.weight within.school.weight final.student.weight
## 1        1 -0.242559707  1  1  1  1  2             1                    1                    1
## 2        2  2.187119161  1  2  2  2  2             1                    1                    1
## 3        3 -0.581727450  1  1  2  1  1             1                    1                    1
## 4        4  0.700080227  2  1  2  1  1             1                    1                    1
## 5        5  1.492176579  1  2  1  1  1             1                    1                    1
## 6        6  0.526553441  1  1  2  1  2             1                    1                    1
## 7        7  1.037772101  2  2  2  2  2             1                    1                    1
## 8        8 -1.860716351  1  1  2  1  1             1                    1                    1
## 9        9 -0.426574240  2  1  2  1  1             1                    1                    1
## 10      10 -0.001137045  1  1  2  1  1             1                    1                    1
##             uniqueID
## 1   student1_school2
## 2   student2_school2
## 3   student3_school2
## 4   student4_school2
## 5   student5_school2
## 6   student6_school2
## 7   student7_school2
## 8   student8_school2
## 9   student9_school2
## 10 student10_school2

Notice how cluster_gen prints the cluster structure as well as other important information before showing the background data itself. This can be disabled by inserting verbose = FALSE into the cluster_gen call.

By default, cluster_gen will determine the number of continuous (X) and categorical (W) background questions. In this case, X = {X₁} (represented in the output by q1) and W = {W₁, …, W₅} (represented in the output by q2 through q6). This can be customized, and for the sake of simplicity, we will have only one categorical background variable and no continuous variables. This time, the output will also be assigned to data, which is finally printed for us to see what it looks like.

set.seed(2345)
data <- cluster_gen(n1, n_X = 0, n_W = list(1))
## ── Hierarchical structure ──────────────────────────────────────────────────────────────────────────
## Generating questionnaires for schools
## Total respondents: 20 (10 + 10)
## school1 (10 students)
## school2 (10 students)
## ── Information on sampling weights ─────────────────────────────────────────────────────────────────
## - Calculating PPS weights at the school level
##   final.student.weight should add up to the number of students in the population (20)
data
## $school
## $school[[1]]
##    subject q1 school.weight within.school.weight final.student.weight          uniqueID
## 1        1  3             1                    1                    1  student1_school1
## 2        2  3             1                    1                    1  student2_school1
## 3        3  1             1                    1                    1  student3_school1
## 4        4  1             1                    1                    1  student4_school1
## 5        5  3             1                    1                    1  student5_school1
## 6        6  3             1                    1                    1  student6_school1
## 7        7  3             1                    1                    1  student7_school1
## 8        8  4             1                    1                    1  student8_school1
## 9        9  4             1                    1                    1  student9_school1
## 10      10  2             1                    1                    1 student10_school1
##
## $school[[2]]
##    subject q1 school.weight within.school.weight final.student.weight          uniqueID
## 1        1  1             1                    1                    1  student1_school2
## 2        2  4             1                    1                    1  student2_school2
## 3        3  1             1                    1                    1  student3_school2
## 4        4  4             1                    1                    1  student4_school2
## 5        5  4             1                    1                    1  student5_school2
## 6        6  2             1                    1                    1  student6_school2
## 7        7  4             1                    1                    1  student7_school2
## 8        8  2             1                    1                    1  student8_school2
## 9        9  3             1                    1                    1  student9_school2
## 10      10  4             1                    1                    1 student10_school2

Notice how n_W is defined as a list where each element (only one in this case) corresponds to the number of variables at a particular level. This is done so that n_W can support more complex calls such as n_W = list(list(2, 2), 5), which corresponds to telling cluster_gen that the first level will have two binary categorical variables and the second level will have 5 categorical variables (the number of categories being randomly determined).

Three-level structures

Let us now consider a second hierarchical structure, composed of a cluster of 2 schools which are divided into 3 classes each; each class contains 5 students:

n2 <- c(2, 3, 5)
set.seed(2345)
cluster_gen(n2)
## ── Hierarchical structure ──────────────────────────────────────────────────────────────────────────
## Generating questionnaires for schools, classes
## Total respondents: 36 (3 + 3 + 5 + 5 + 5 + 5 + 5 + 5)
## school1
## ├─school1_class1 (5 students)
## ├─school1_class2 (5 students)
## └─school1_class3 (5 students)
## school2
## ├─school2_class1 (5 students)
## ├─school2_class2 (5 students)
## └─school2_class3 (5 students)
## ── Information on sampling weights ─────────────────────────────────────────────────────────────────
## - Calculating PPS weights at the school level
##   final.teacher.weight should add up to the number of teachers in the population (6)
## - Calculating SRS weights at the class level
##   class.weight should add up to the number of classes in the population (6, counting once per class)
## $school
## $school[[1]]
##   subject          q1 q2 q3 school.weight within.school.weight final.teacher.weight       uniqueID
## 1       1 -0.16566248  1  2             1                    1                    1 class1_school1
## 2       2 -0.88234450  1  1             1                    1                    1 class2_school1
## 3       3 -0.01332182  2  2             1                    1                    1 class3_school1
##
## $school[[2]]
##   subject          q1 q2 q3 school.weight within.school.weight final.teacher.weight       uniqueID
## 1       1  0.07879383  1  1             1                    1                    1 class1_school2
## 2       2 -0.88209970  2  2             1                    1                    1 class2_school2
## 3       3  0.89263571  2  1             1                    1                    1 class3_school2
##
##
## $class
## $class[[1]]
##   subject         q1 q2 q3 q4 q5 q6 class.weight within.class.weight final.student.weight
## 1       1  2.1527725  2  1  2  1  2            1                   1                    1
## 2       2  0.5173488  1  2  2  1  2            1                   1                    1
## 3       3 -1.2601526  2  1  2  2  2            1                   1                    1
## 4       4  0.4095549  1  1  1  2  1            1                   1                    1
## 5       5 -0.3379999  2  1  2  1  1            1                   1                    1
##                  uniqueID
## 1 student1_class1_school1
## 2 student2_class1_school1
## 3 student3_class1_school1
## 4 student4_class1_school1
## 5 student5_class1_school1
##
## $class[[2]]
##   subject          q1 q2 q3 q4 q5 q6 class.weight within.class.weight final.student.weight
## 1       1 -2.15015256  1  1  2  2  1            1                   1                    1
## 2       2  1.63216373  2  1  1  1  2            1                   1                    1
## 3       3  0.47573673  2  2  1  2  2            1                   1                    1
## 4       4 -1.10436289  1  2  2  2  2            1                   1                    1
## 5       5 -0.05614962  2  1  1  2  1            1                   1                    1
##                  uniqueID
## 1 student1_class2_school1
## 2 student2_class2_school1
## 3 student3_class2_school1
## 4 student4_class2_school1
## 5 student5_class2_school1
##
## $class[[3]]
##   subject         q1 q2 q3 q4 q5 q6 class.weight within.class.weight final.student.weight
## 1       1  1.0676829  1  1  1  1  1            1                   1                    1
## 2       2 -1.0448467  2  1  1  2  2            1                   1                    1
## 3       3  0.7418229  1  1  2  1  1            1                   1                    1
## 4       4 -0.2396375  2  2  1  1  1            1                   1                    1
## 5       5  0.5653863  1  2  2  1  1            1                   1                    1
##                  uniqueID
## 1 student1_class3_school1
## 2 student2_class3_school1
## 3 student3_class3_school1
## 4 student4_class3_school1
## 5 student5_class3_school1
##
## $class[[4]]
##   subject          q1 q2 q3 q4 q5 q6 class.weight within.class.weight final.student.weight
## 1       1 -0.31211831  2  2  1  1  2            1                   1                    1
## 2       2 -1.06488440  2  2  1  2  1            1                   1                    1
## 3       3  0.06095831  2  1  1  1  2            1                   1                    1
## 4       4  0.74802298  1  2  1  1  2            1                   1                    1
## 5       5  2.74479129  1  1  1  1  2            1                   1                    1
##                  uniqueID
## 1 student1_class1_school2
## 2 student2_class1_school2
## 3 student3_class1_school2
## 4 student4_class1_school2
## 5 student5_class1_school2
##
## $class[[5]]
##   subject         q1 q2 q3 q4 q5 q6 class.weight within.class.weight final.student.weight
## 1       1  0.6141850  2  1  1  1  1            1                   1                    1
## 2       2  1.8841624  1  1  2  1  2            1                   1                    1
## 3       3 -0.2516623  2  2  2  2  1            1                   1                    1
## 4       4  0.7501333  2  1  2  2  2            1                   1                    1
## 5       5  0.4777128  2  1  1  1  2            1                   1                    1
##                  uniqueID
## 1 student1_class2_school2
## 2 student2_class2_school2
## 3 student3_class2_school2
## 4 student4_class2_school2
## 5 student5_class2_school2
##
## $class[[6]]
##   subject         q1 q2 q3 q4 q5 q6 class.weight within.class.weight final.student.weight
## 1       1 -0.4050786  2  1  1  2  1            1                   1                    1
## 2       2  0.4307551  2  2  1  2  1            1                   1                    1
## 3       3 -0.3358192  2  1  2  1  2            1                   1                    1
## 4       4 -0.4681827  1  2  1  2  2            1                   1                    1
## 5       5  0.5989933  1  1  2  2  2            1                   1                    1
##                  uniqueID
## 1 student1_class3_school2
## 2 student2_class3_school2
## 3 student3_class3_school2
## 4 student4_class3_school2
## 5 student5_class3_school2

Notice how the output above contains 2 school questionnaires with 3 answers each (from the teachers who answered for the classes) as well as 2 × 3 = 6 class questionnaires, each of which was answered by the 5 students in that class. The teacher questionnaires all have the same structure, with one X and 2 W variables, and so do the student questionnaires, with one X and 5 Ws. By default, the means of the continuous variables are the same (0), and the proportions of the categorical variables are randomly determined.
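The counts in this three-level example follow directly from the vector c(2, 3, 5). A small base R check is shown below; the element names and the variable units_per_level are added here for readability only.

```r
n2 <- c(school = 2, class = 3, student = 5)

# Cumulative products give the number of units at each level:
# 2 schools, 2 * 3 = 6 classes, 2 * 3 * 5 = 30 students
units_per_level <- cumprod(n2)
units_per_level

# Respondents are the class-level (teacher) and student-level units
sum(units_per_level[-1])
```

The last value, 6 + 30 = 36, agrees with the "Total respondents: 36" message printed by cluster_gen above.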

n1 and n2 are unnamed vectors, so cluster_gen determined the names of the clusters itself using a pre-built sequence. Nonetheless, the user is free to use whatever labels they want. This can be done either by passing names to the n argument or by passing character vectors to the cluster_labels and resp_labels arguments. See the examples below:

cluster_gen(n = c(a = 2, b = 3))
## ── Hierarchical structure ──────────────────────────────────────────────────────────────────────────
## Generating questionnaires for as, bs
## Total respondents: 6 (3 + 3)
## a1 (3 bs)
## a2 (3 bs)
## ── Information on sampling weights ─────────────────────────────────────────────────────────────────
## - Calculating SRS weights at the a level
##   a.weight should add up to the number of as in the population (2, counting once per a)
## $a
## $a[[1]]
##   subject         q1         q2 q3 q4 q5 q6 q7 q8 q9 a.weight within.a.weight final.b.weight
## 1       1  3.9427498 -3.5449571  1  1  1  1  2  1  2        1               1              1
## 2       2 -0.5029523  0.3083131  1  2  1  1  2  2  2        1               1              1
## 3       3 -0.6693996 -0.6230736  1  2  1  1  1  1  1        1               1              1
##   uniqueID
## 1    b1_a1
## 2    b2_a1
## 3    b3_a1
##
## $a[[2]]
##   subject        q1        q2 q3 q4 q5 q6 q7 q8 q9 a.weight within.a.weight final.b.weight uniqueID
## 1       1 0.2319406 1.4823528  2  2  2  1  2  1  2        1               1              1    b1_a2
## 2       2 1.9505050 0.2887602  1  1  1  2  2  1  2        1               1              1    b2_a2
## 3       3 0.4564811 0.5387870  2  1  2  2  2  1  2        1               1              1    b3_a2

cluster_gen(n = c(2, 3), cluster_labels = c("group"), resp_labels = c("person"))
## ── Hierarchical structure ──────────────────────────────────────────────────────────────────────────
## Generating questionnaires for groups
## Total respondents: 6 (3 + 3)
## group1 (3 persons)
## group2 (3 persons)
## ── Information on sampling weights ─────────────────────────────────────────────────────────────────
## - Calculating SRS weights at the group level
##   group.weight should add up to the number of groups in the population (2, counting once per group)
## $group
## $group[[1]]
##   subject         q1 q2 q3 q4 group.weight within.group.weight final.person.weight       uniqueID
## 1       1 -0.3544601  1  1  2            1                   1                   1 person1_group1
## 2       2  0.5769583  1  1  1            1                   1                   1 person2_group1
## 3       3 -0.6274591  1  1  2            1                   1                   1 person3_group1
##
## $group[[2]]
##   subject         q1 q2 q3 q4 group.weight within.group.weight final.person.weight       uniqueID
## 1       1  0.5236884  1  2  2            1                   1                   1 person1_group2
## 2       2 -1.0585904  2  2  1            1                   1                   1 person2_group2
## 3       3 -0.0260521  1  2  2            1                   1                   1 person3_group2

Your data should vary from the output above (due to the lack of a fixed seed), but the labels and the hierarchical structure should be the same.

Asymmetric structures

As we said before, n corresponds to the number of sampled observations on each level. When n is a vector, each unit of a level has the same number of subunits, in what one could call a symmetric hierarchical structure. Asymmetric structures can also be defined by passing a list, using the following syntax (the list below is named for convenience, but it may also be unnamed):

n3 <- list(sch = 3, cls = c(2, 1, 2), stu = c(5, 4, 2, 3, 2))

The list above corresponds to 3 schools containing 2, 1 and 2 classes, respectively. These 5 classes contain 5, 4, 2, 3 and 2 students, respectively.
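The bookkeeping behind such a list can be checked with a few lines of base R. This is only a sketch of the consistency rules implied by the description above; school_of_class and students_per_school are names introduced for this illustration.

```r
n3 <- list(sch = 3, cls = c(2, 1, 2), stu = c(5, 4, 2, 3, 2))

# Each level needs one entry per unit of the level above it
length(n3$cls) == n3$sch        # one class count per school
length(n3$stu) == sum(n3$cls)   # one student count per class

# Students per school: label each class with its school, then add up
school_of_class     <- rep(seq_len(n3$sch), times = n3$cls)
students_per_school <- tapply(n3$stu, school_of_class, sum)
students_per_school
```

Here the three schools end up with 9, 2 and 5 students, for 16 students in total.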

As you can imagine, this sort of structure can quickly become hard to visualize. This is when the draw_cluster_structure function can be helpful:

draw_cluster_structure(n3)
## sch1
## ├─sch1_cls1 (5 stus)
## └─sch1_cls2 (4 stus)
## sch2
## └─sch2_cls1 (2 stus)
## sch3
## ├─sch3_cls1 (3 stus)
## └─sch3_cls2 (2 stus)
## NULL

As an exercise, try calling cluster_gen(n3) and see if the number of responses corresponds to your expectations.

n can also be passed as a range of values, from which the function randomly draws the actual numbers. For example, if we set

n4 <- list(school = 4, class = ranges(5, 10), student = ranges(20, 50))

Then, once we call cluster_gen on n4, we are telling R that each of the 4 schools has between 5 and 10 classes, and each class has between 20 and 50 students. Let us use draw_cluster_structure to see what the generated structure looks like:

set.seed(6789)
draw_cluster_structure(n4)
## school1
## ├─school1_class1 (46 students)
## ├─school1_class2 (31 students)
## ├─school1_class3 (38 students)
## ├─school1_class4 (37 students)
## ├─school1_class5 (34 students)
## ├─school1_class6 (34 students)
## ├─school1_class7 (26 students)
## ├─school1_class8 (48 students)
## └─school1_class9 (45 students)
## school2
## ├─school2_class1 (40 students)
## ├─school2_class2 (42 students)
## ├─school2_class3 (30 students)
## ├─school2_class4 (24 students)
## ├─school2_class5 (22 students)
## └─school2_class6 (48 students)
## school3
## ├─school3_class1 (32 students)
## ├─school3_class2 (35 students)
## ├─school3_class3 (41 students)
## ├─school3_class4 (45 students)
## ├─school3_class5 (35 students)
## ├─school3_class6 (21 students)
## ├─school3_class7 (29 students)
## └─school3_class8 (26 students)
## school4
## ├─school4_class1 (42 students)
## ├─school4_class2 (22 students)
## ├─school4_class3 (48 students)
## ├─school4_class4 (37 students)
## ├─school4_class5 (27 students)
## ├─school4_class6 (41 students)
## ├─school4_class7 (42 students)
## ├─school4_class8 (40 students)
## ├─school4_class9 (37 students)
## └─school4_class10 (47 students)
## NULL

Customizing the population size

So far, we have only worked with the sampled elements, which are passed as the first argument of cluster_gen. By default, cluster_gen assumes N = n, meaning that n actually corresponds to a census (where all the elements of the population are selected). In practice, though, this is rarely the case, and cluster_gen can receive other values to indicate the population structure under the N argument. See the examples below:

n5 <- c(3, 4)
N5 <- 2

This is the most basic way to determine a different population size: by passing a single number to N. In that case, N will be interpreted as a multiplier of n. In other words, the syntax above basically says that the sample is composed of 3 schools and 4 students in each school, whereas the population is twice as large at all levels. This is all made explicit when cluster_gen is called (see the hierarchical structures printed below):

data5 <- cluster_gen(n = n5, N = N5)
## ── Hierarchical structure ──────────────────────────────────────────────────────────────────────────
## Population structure
## school1 (8 students)
## school2 (8 students)
## school3 (8 students)
## school4 (8 students)
## school5 (8 students)
## school6 (8 students)
## Sampled structure
## Generating questionnaires for schools
## Total respondents: 12 (4 + 4 + 4)
## school1 (4 students)
## school2 (4 students)
## school3 (4 students)
## ── Information on sampling weights ─────────────────────────────────────────────────────────────────
## - Calculating PPS weights at the school level
##   final.student.weight should add up to the number of students in the population (48)
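The population figures in the messages above can be reproduced by hand. The sketch below only redoes the arithmetic implied by "N is a multiplier of n"; the element names and the variable population are added for readability.

```r
n5 <- c(school = 3, student = 4)
N5 <- 2

# A scalar N scales every level of n
population <- N5 * n5
population         # 6 schools with 8 students each

prod(population)   # students in the population
```

The product, 6 × 8 = 48, is the number of students the final weights are expected to add up to.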

In the example above, the questionnaire answers are stored in data5, which is why they do not appear in the R terminal. The user messages are still printed, as they are not stored in data5.
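Once the answers are stored, the object can be handled like any other R list. The sketch below stacks the per-school data frames into a single one; it uses a small hand-made object (mock) with the same shape as the printed outputs in this document, rather than a real cluster_gen result, so that it runs without lsasim.

```r
# Mock object shaped like the cluster_gen output shown in this document:
# a list with one data frame per sampled school.
mock <- list(school = list(
  data.frame(subject = 1:2, q1 = c(3, 1),
             uniqueID = c("student1_school1", "student2_school1")),
  data.frame(subject = 1:2, q1 = c(1, 4),
             uniqueID = c("student1_school2", "student2_school2"))
))

# Stack the per-school data frames into a single data frame
stacked <- do.call(rbind, mock$school)
nrow(stacked)
```

Assuming the real output keeps that shape, the same do.call(rbind, ...) pattern would apply to data5$school.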

The population structure can also be explicitly defined:

n6 <- c(3, 4)
N6 <- c(4, 5)
data6 <- cluster_gen(n = n6, N = N6)
## ── Hierarchical structure ──────────────────────────────────────────────────────────────────────────
## Population structure
## school1 (5 students)
## school2 (5 students)
## school3 (5 students)
## school4 (5 students)
## Sampled structure
## Generating questionnaires for schools
## Total respondents: 12 (4 + 4 + 4)
## school1 (4 students)
## school2 (4 students)
## school3 (4 students)
## ── Information on sampling weights ─────────────────────────────────────────────────────────────────
## - Calculating PPS weights at the school level
##   final.student.weight should add up to the number of students in the population (20)
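The weights for this example can be sketched by hand. The code assumes each weight is the population count divided by the sample count at that level, which is the pattern visible in the weights printed later in this document for n = c(3, 4) and N = c(2, 3). With equally sized schools, PPS and SRS give every school the same selection probability, so that simplification seems reasonable here. It is not lsasim's own code.

```r
n6 <- c(school = 3, student = 4)
N6 <- c(school = 4, student = 5)

school_weight        <- N6[["school"]] / n6[["school"]]     # 4/3
within_school_weight <- N6[["student"]] / n6[["student"]]   # 5/4
final_student_weight <- school_weight * within_school_weight

# Summed over the 12 sampled students, the final weights recover the
# population size reported in the message above (20)
sum(rep(final_student_weight, prod(n6)))
```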

Just like n, N can also be defined as a list:

n7 <- list(3, c(4, 2, 3))
N7 <- list(10, c(10, 11, 12, 13, 14, 15, 16, 17, 18, 19))
data7 <- cluster_gen(n = n7, N = N7)
## ── Hierarchical structure ──────────────────────────────────────────────────────────────────────────
## Population structure
## school1 (10 students)
## school2 (11 students)
## school3 (12 students)
## school4 (13 students)
## school5 (14 students)
## school6 (15 students)
## school7 (16 students)
## school8 (17 students)
## school9 (18 students)
## school10 (19 students)
## Sampled structure
## Generating questionnaires for schools
## Total respondents: 9 (4 + 2 + 3)
## school1 (4 students)
## school2 (2 students)
## school3 (3 students)
## ── Information on sampling weights ─────────────────────────────────────────────────────────────────
## - Calculating PPS weights at the school level
##   final.student.weight should add up to the number of students in the population (145)
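With schools of unequal size, textbook two-stage PPS weighting still makes the final student weights add up to the population total. The sketch below illustrates this for the structure above. It assumes, purely for illustration, that the three sampled schools are the first three population schools, and it follows the standard formulas rather than lsasim's internal code, so individual weights may not match the package's output.

```r
pop_size   <- 10:19        # students in each of the 10 population schools
n_schools  <- 3            # schools sampled
sampled    <- 1:3          # assumption: the first three schools were drawn
n_students <- c(4, 2, 3)   # students sampled within those schools

# Stage 1 (PPS): selection probability proportional to school size
p_school <- n_schools * pop_size[sampled] / sum(pop_size)
# Stage 2 (SRS within each sampled school)
p_within <- n_students / pop_size[sampled]

final_weight <- 1 / (p_school * p_within)   # one value per sampled school
sum(final_weight * n_students)              # total over all 9 students
```

The school sizes cancel out in the product of the two probabilities, so the total is 145 whichever schools happen to be drawn.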

Mixing ranges for n and explicit lists for N is also possible.

set.seed(345)
n8 <- list(3, ranges(5, 10))
N8 <- list(10, c(10, 11, 12, 13, 14, 15, 16, 17, 18, 19))
data8 <- cluster_gen(n = n8, N = N8)
## ── Hierarchical structure ──────────────────────────────────────────────────────────────────────────
## Population structure
## school1 (10 students)
## school2 (11 students)
## school3 (12 students)
## school4 (13 students)
## school5 (14 students)
## school6 (15 students)
## school7 (16 students)
## school8 (17 students)
## school9 (18 students)
## school10 (19 students)
## Sampled structure
## Generating questionnaires for schools
## Total respondents: 25 (9 + 7 + 9)
## school1 (9 students)
## school2 (7 students)
## school3 (9 students)
## ── Information on sampling weights ─────────────────────────────────────────────────────────────────
## - Calculating PPS weights at the school level
##   final.student.weight should add up to the number of students in the population (145)

n9 <- list(3, ranges(5, 10))
N9 <- list(10, ranges(50, 100))
data9 <- cluster_gen(n = n9, N = N9)
## ── Hierarchical structure ──────────────────────────────────────────────────────────────────────────
## Population structure
## school1 (79 students)
## school2 (65 students)
## school3 (59 students)
## school4 (59 students)
## school5 (64 students)
## school6 (98 students)
## school7 (82 students)
## school8 (63 students)
## school9 (97 students)
## school10 (56 students)
## Sampled structure
## Generating questionnaires for schools
## Total respondents: 23 (5 + 8 + 10)
## school1 (5 students)
## school2 (8 students)
## school3 (10 students)
## ── Information on sampling weights ─────────────────────────────────────────────────────────────────
## - Calculating PPS weights at the school level
##   final.student.weight should add up to the number of students in the population (722)

There are other structure combinations for n and N which give output that could be confusing for a user. One example is when the population is smaller than the sample. This example is illustrated below. If you find other misbehaving or otherwise noteworthy examples, please report them.

set.seed(345)
n10 <- c(3, 4)
N10 <- c(2, 3)
cluster_gen(n = n10, N = N10)  # notice the missing weights
## ── Hierarchical structure ──────────────────────────────────────────────────────────────────────────
## Population structure
## school1 (3 students)
## school2 (3 students)
## Sampled structure
## Generating questionnaires for schools
## Total respondents: 12 (4 + 4 + 4)
## school1 (4 students)
## school2 (4 students)
## school3 (4 students)
## ── Information on sampling weights ─────────────────────────────────────────────────────────────────
## - Calculating PPS weights at the school level
##   final.student.weight should add up to the number of students in the population (6)
## $school
## $school[[1]]
##   subject         q1 q2 q3 q4 q5 school.weight within.school.weight final.student.weight
## 1       1  1.7863951  2  2  1  2     0.6666667                 0.75                  0.5
## 2       2  0.1956436  1  2  1  2     0.6666667                 0.75                  0.5
## 3       3  0.7482214  2  2  1  2     0.6666667                 0.75                  0.5
## 4       4 -0.2938856  1  2  1  1     0.6666667                 0.75                  0.5
##           uniqueID
## 1 student1_school1
## 2 student2_school1
## 3 student3_school1
## 4 student4_school1
##
## $school[[2]]
##   subject         q1 q2 q3 q4 q5 school.weight within.school.weight final.student.weight
## 1       1 -0.4178153  2  2  1  2     0.6666667                 0.75                  0.5
## 2       2 -1.2977486  2  1  1  2     0.6666667                 0.75                  0.5
## 3       3 -1.0928678  2  1  1  2     0.6666667                 0.75                  0.5
## 4       4 -0.5008169  2  2  1  2     0.6666667                 0.75                  0.5
##           uniqueID
## 1 student1_school2
## 2 student2_school2
## 3 student3_school2
## 4 student4_school2
##
## $school[[3]]
##   subject         q1 q2 q3 q4 q5 school.weight within.school.weight final.student.weight
## 1       1 -1.8998894  2  2  2  1            NA                   NA                   NA
## 2       2  0.2750867  1  2  2  1            NA                   NA                   NA
## 3       3  1.8452385  1  1  1  2            NA                   NA                   NA
## 4       4 -0.2048277  1  2  1  2            NA                   NA                   NA
##           uniqueID
## 1 student1_school3
## 2 student2_school3
## 3 student3_school3
## 4 student4_school3
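
Before running a case like this one, it can help to flag the levels at which the requested sample is larger than the population. The snippet below is only a sketch in base R: `oversampled_levels` is a hypothetical helper of my own, not a function of lsasim, and it assumes n and N are plain numeric vectors of equal length.

```r
# Hypothetical helper (not part of lsasim): return the positions of the
# levels at which the requested sample size exceeds the population size.
oversampled_levels <- function(n, N) {
  stopifnot(is.numeric(n), is.numeric(N), length(n) == length(N))
  which(n > N)
}

n10 <- c(3, 4)
N10 <- c(2, 3)

# Both levels are flagged: 3 schools out of 2, 4 students out of 3
flagged <- oversampled_levels(n10, N10)
flagged

# Ratios N / n; a value below 1 means the population is smaller than the sample
ratios <- N10 / n10
ratios
```

The two ratios, 2/3 and 3/4, coincide with the school.weight and within.school.weight values printed above. That is an observation from this one output, not a statement about how cluster_gen computes the weights.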

Other useful function arguments

Understanding the commands above is all you need to start checking the sampling weights. However, you might be interested in some other things that cluster_gen can already do.

For instance, the user might want to keep this cluster structure but generate questionnaires only at the student level. This can be done by running:

set.seed(2345)
df <- cluster_gen(n2, separate_questionnaires = FALSE)
## ── Hierarchical structure ──────────────────────────────────────────────────────────────────────────
## Generating questionnaires for students
## Total respondents: 30 (5 + 5 + 5 + 5 + 5 + 5)
## school1
## ├─school1_class1 (5 students)
## ├─school1_class2 (5 students)
## └─school1_class3 (5 students)
## school2
## ├─school2_class1 (5 students)
## ├─school2_class2 (5 students)
## └─school2_class3 (5 students)
## ── Information on sampling weights ─────────────────────────────────────────────────────────────────
## - Calculating SRS weights at the class level
##   class.weight should add up to the number of classes in the population (6, counting once per class)

Print df and notice how the generated data differs from the previous one even though both calls share the same seed. This is because the earlier call generates student questionnaires only after the teacher questionnaires, so the seed is effectively different when it comes to generating student questionnaires.
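
The effect of drawing something else first can be seen with plain random numbers. This is a sketch in base R that does not reproduce what cluster_gen does internally; `teacher_draws` and the use of `runif` are stand-ins I chose for illustration.

```r
# Case 1: student answers are the first thing drawn after setting the seed
set.seed(2345)
students_only <- runif(5)

# Case 2: same seed, but three "teacher" values are drawn first
set.seed(2345)
teacher_draws <- runif(3)
students_after_teachers <- runif(5)

# The two student vectors differ even though the seed was the same
identical(students_only, students_after_teachers)

# The second vector is simply the same random stream, shifted by three draws
identical(students_only[4:5], students_after_teachers[1:2])
```

In other words, the seed fixes the whole stream of random numbers, and what a given questionnaire receives depends on how much of that stream was consumed before it.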

Back to the case of separate questionnaires, the user may want to collapse the questionnaires per level, so that all the questionnaires on the same level are put together; alternatively, all the questionnaires can be collapsed into one data frame, with answers from higher levels being repeated at the lowest level. Perhaps this can be better understood in the example below. The relevant argument here is collapse; n_X = 0, n_W = 1 and calc_weights = FALSE were set to make the output shorter, thus making it easier to understand the effect of different collapse options.

set.seed(1); cluster_gen(n2, n_X = 0, n_W = 1, calc_weights = FALSE, collapse = "none")  # default behavior
## ── Hierarchical structure ──────────────────────────────────────────────────────────────────────────
## Generating questionnaires for schools, classes
## Total respondents: 36 (3 + 3 + 5 + 5 + 5 + 5 + 5 + 5)
## school1
## ├─school1_class1 (5 students)
## ├─school1_class2 (5 students)
## └─school1_class3 (5 students)
## school2
## ├─school2_class1 (5 students)
## ├─school2_class2 (5 students)
## └─school2_class3 (5 students)
## $school
## $school[[1]]
##   subject q1       uniqueID
## 1       1  2 class1_school1
## 2       2  2 class2_school1
## 3       3  2 class3_school1
##
## $school[[2]]
##   subject q1       uniqueID
## 1       1  1 class1_school2
## 2       2  2 class2_school2
## 3       3  3 class3_school2
##
##
## $class
## $class[[1]]
##   subject q1                uniqueID
## 1       1  2 student1_class1_school1
## 2       2  2 student2_class1_school1
## 3       3  2 student3_class1_school1
## 4       4  2 student4_class1_school1
## 5       5  1 student5_class1_school1
##
## $class[[2]]
##   subject q1                uniqueID
## 1       1  2 student1_class2_school1
## 2       2  2 student2_class2_school1
## 3       3  2 student3_class2_school1
## 4       4  2 student4_class2_school1
## 5       5  1 student5_class2_school1
##
## $class[[3]]
##   subject q1                uniqueID
## 1       1  2 student1_class3_school1
## 2       2  1 student2_class3_school1
## 3       3  1 student3_class3_school1
## 4       4  1 student4_class3_school1
## 5       5  1 student5_class3_school1
##
## $class[[4]]
##   subject q1                uniqueID
## 1       1  2 student1_class1_school2
## 2       2  3 student2_class1_school2
## 3       3  3 student3_class1_school2
## 4       4  1 student4_class1_school2
## 5       5  3 student5_class1_school2
##
## $class[[5]]
##   subject q1                uniqueID
## 1       1  1 student1_class2_school2
## 2       2  1 student2_class2_school2
## 3       3  1 student3_class2_school2
## 4       4  1 student4_class2_school2
## 5       5  4 student5_class2_school2
##
## $class[[6]]
##   subject q1                uniqueID
## 1       1  2 student1_class3_school2
## 2       2  2 student2_class3_school2
## 3       3  1 student3_class3_school2
## 4       4  2 student4_class3_school2
## 5       5  1 student5_class3_school2

set.seed(1); cluster_gen(n2, n_X = 0, n_W = 1, calc_weights = FALSE, collapse = "partial")
## ── Hierarchical structure ──────────────────────────────────────────────────────────────────────────
## Generating questionnaires for schools, classes
## Total respondents: 36 (3 + 3 + 5 + 5 + 5 + 5 + 5 + 5)
## school1
## ├─school1_class1 (5 students)
## ├─school1_class2 (5 students)
## └─school1_class3 (5 students)
## school2
## ├─school2_class1 (5 students)
## ├─school2_class2 (5 students)
## └─school2_class3 (5 students)
## $school
##   subject q1       uniqueID
## 1       1  2 class1_school1
## 2       2  2 class2_school1
## 3       3  2 class3_school1
## 4       4  1 class1_school2
## 5       5  2 class2_school2
## 6       6  3 class3_school2
##
## $class
##    subject q1                uniqueID
## 1        1  2 student1_class1_school1
## 2        2  2 student2_class1_school1
## 3        3  2 student3_class1_school1
## 4        4  2 student4_class1_school1
## 5        5  1 student5_class1_school1
## 6        6  2 student1_class2_school1
## 7        7  2 student2_class2_school1
## 8        8  2 student3_class2_school1
## 9        9  2 student4_class2_school1
## 10      10  1 student5_class2_school1
## 11      11  2 student1_class3_school1
## 12      12  1 student2_class3_school1
## 13      13  1 student3_class3_school1
## 14      14  1 student4_class3_school1
## 15      15  1 student5_class3_school1
## 16      16  2 student1_class1_school2
## 17      17  3 student2_class1_school2
## 18      18  3 student3_class1_school2
## 19      19  1 student4_class1_school2
## 20      20  3 student5_class1_school2
## 21      21  1 student1_class2_school2
## 22      22  1 student2_class2_school2
## 23      23  1 student3_class2_school2
## 24      24  1 student4_class2_school2
## 25      25  4 student5_class2_school2
## 26      26  2 student1_class3_school2
## 27      27  2 student2_class3_school2
## 28      28  1 student3_class3_school2
## 29      29  2 student4_class3_school2
## 30      30  1 student5_class3_school2

set.seed(1); cluster_gen(n2, n_X = 0, n_W = 1, calc_weights = FALSE, collapse = "full")
## ── Hierarchical structure ──────────────────────────────────────────────────────────────────────────
## Generating questionnaires for schools, classes
## Total respondents: 36 (3 + 3 + 5 + 5 + 5 + 5 + 5 + 5)
## school1
## ├─school1_class1 (5 students)
## ├─school1_class2 (5 students)
## └─school1_class3 (5 students)
## school2
## ├─school2_class1 (5 students)
## ├─school2_class2 (5 students)
## └─school2_class3 (5 students)
##    subject q1.student        uniqueID.student q1.teacher
## 1        1          2 student1_class1_school1          2
## 2        2          2 student2_class1_school1          2
## 3        3          2 student3_class1_school1          2
## 4        4          2 student4_class1_school1          2
## 5        5          1 student5_class1_school1          2
## 6        6          2 student1_class1_school2          1
## 7        7          3 student2_class1_school2          1
## 8        8          3 student3_class1_school2          1
## 9        9          1 student4_class1_school2          1
## 10      10          3 student5_class1_school2          1
## 11      11          2 student1_class2_school1          2
## 12      12          2 student2_class2_school1          2
## 13      13          2 student3_class2_school1          2
## 14      14          2 student4_class2_school1          2
## 15      15          1 student5_class2_school1          2
## 16      16          1 student1_class2_school2          2
## 17      17          1 student2_class2_school2          2
## 18      18          1 student3_class2_school2          2
## 19      19          1 student4_class2_school2          2
## 20      20          4 student5_class2_school2          2
## 21      21          2 student1_class3_school1          2
## 22      22          1 student2_class3_school1          2
## 23      23          1 student3_class3_school1          2
## 24      24          1 student4_class3_school1          2
## 25      25          1 student5_class3_school1          2
## 26      26          2 student1_class3_school2          3
## 27      27          2 student2_class3_school2          3
## 28      28          1 student3_class3_school2          3
## 29      29          2 student4_class3_school2          3
## 30      30          1 student5_class3_school2          3
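
If the difference between the three options is still not obvious, the idea can be mimicked on a toy list of data frames. This is a sketch in base R with made-up values; it only imitates the shape of the outputs above and is not how cluster_gen is implemented. The objects `classes` and `teachers` and the derived `class` column are my own assumptions.

```r
# Toy stand-in for collapse = "none": one data frame per class (made-up values)
classes <- list(
  data.frame(subject = 1:2, q1 = c(2, 1),
             uniqueID = c("student1_class1", "student2_class1")),
  data.frame(subject = 1:2, q1 = c(1, 3),
             uniqueID = c("student1_class2", "student2_class2"))
)

# "partial"-like: stack all data frames of the same level and renumber subjects
stacked <- do.call(rbind, classes)
stacked$subject <- seq_len(nrow(stacked))

# "full"-like: repeat the higher-level answer on every lower-level row
teachers <- data.frame(class = c("class1", "class2"), q1.teacher = c(2, 3))
stacked$class <- sub("^student[0-9]+_", "", stacked$uniqueID)
full <- merge(stacked, teachers, by = "class")
full
```

Each teacher answer appears once per student of that class in `full`, which is the same repetition visible in the q1.teacher column of the collapse = "full" output.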

Checking sampling weights

This is the final section of this document. At this point you are assumed to be familiar with how cluster_gen works, but before proceeding there is one last argument you should know, called sampling_method.

Changing the sampling method

Consider the example below. The n* numbering is reset for convenience, and print_pop_structure was set to FALSE to suppress the otherwise lengthy output of the population structure (the tree would need one line for each of the 5 × 9 × 6 = 270 classes alone). You can check draw_cluster_structure(N1) for yourself if you’re interested:

n1 <- c(2, 3, 2, 10)
N1 <- c(5, 9, 6, 50)
data1 <- cluster_gen(n = n1, N = N1, print_pop_structure = FALSE)
## ── Hierarchical structure ──────────────────────────────────────────────────────────────────────────
## Generating questionnaires for states, schools, classes
## Total respondents: 138 (3 + 3 + 2 + 2 + 2 + 2 + 2 + 2 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10)
## state1
## ├─state1_school1
## │ ├─state1_school1_class1 (10 students)
## │ └─state1_school1_class2 (10 students)
## ├─state1_school2
## │ ├─state1_school2_class1 (10 students)
## │ └─state1_school2_class2 (10 students)
## └─state1_school3
##   ├─state1_school3_class1 (10 students)
##   └─state1_school3_class2 (10 students)
## state2
## ├─state2_school1
## │ ├─state2_school1_class1 (10 students)
## │ └─state2_school1_class2 (10 students)
## ├─state2_school2
## │ ├─state2_school2_class1 (10 students)
## │ └─state2_school2_class2 (10 students)
## └─state2_school3
##   ├─state2_school3_class1 (10 students)
##   └─state2_school3_class2 (10 students)
## ── Information on sampling weights ─────────────────────────────────────────────────────────────────
## - Calculating SRS weights at the state level
##   state.weight should add up to the number of states in the population (5, counting once per state)
## - Calculating PPS weights at the school level
##   final.teacher.weight should add up to the number of teachers in the population (270)
## - Calculating SRS weights at the class level
##   class.weight should add up to the number of classes in the population (270, counting once per class)
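
The totals quoted in parentheses above can be reproduced by hand from N1. The sketch below assumes, as the 5 and the 270 in the messages suggest, that the number of units at each level of a symmetric structure is the cumulative product of N1; `level_totals` and its labels are names I made up for illustration.

```r
N1 <- c(5, 9, 6, 50)

# Number of units at each level of the population: every state has 9 schools,
# every school has 6 classes, every class has 50 students
level_totals <- cumprod(N1)
names(level_totals) <- c("states", "schools", "classes", "students")
level_totals
```

Here `level_totals` holds 5, 45, 270 and 13500, so any total reported by cluster_gen for this N1 can be compared against one of these four numbers.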

Since N != n, the output of cluster_gen included some information on sampling weights. This is crucial for understanding how the weights were calculated and for checking whether they were calculated correctly. The default behavior of cluster_gen is to use PPS (Probabilities Proportional to Size) whenever it detects “school” as a label and SRS (Simple Random Sampling) otherwise. This can, however, be changed. See the following examples (also notice how print_pop_structure was abbreviated; this is OK as long as it’s still clear to R what you are referring to):

data1 <- cluster_gen(n = n1, N = N1, print_pop = FALSE, sampling_method = "mixed")  # default
## ── Hierarchical structure ──────────────────────────────────────────────────────────────────────────
## Generating questionnaires for states, schools, classes
## Total respondents: 138 (3 + 3 + 2 + 2 + 2 + 2 + 2 + 2 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10)
## state1
## ├─state1_school1
## │ ├─state1_school1_class1 (10 students)
## │ └─state1_school1_class2 (10 students)
## ├─state1_school2
## │ ├─state1_school2_class1 (10 students)
## │ └─state1_school2_class2 (10 students)
## └─state1_school3
##   ├─state1_school3_class1 (10 students)
##   └─state1_school3_class2 (10 students)
## state2
## ├─state2_school1
## │ ├─state2_school1_class1 (10 students)
## │ └─state2_school1_class2 (10 students)
## ├─state2_school2
## │ ├─state2_school2_class1 (10 students)
## │ └─state2_school2_class2 (10 students)
## └─state2_school3
##   ├─state2_school3_class1 (10 students)
##   └─state2_school3_class2 (10 students)
## ── Information on sampling weights ─────────────────────────────────────────────────────────────────
## - Calculating SRS weights at the state level
##   state.weight should add up to the number of states in the population (5, counting once per state)
## - Calculating PPS weights at the school level
##   final.teacher.weight should add up to the number of teachers in the population (270)
## - Calculating SRS weights at the class level
##   class.weight should add up to the number of classes in the population (270, counting once per class)

data1 <- cluster_gen(n = n1, N = N1, print_pop = FALSE, sampling_method = "SRS")  # always SRS
## ── Hierarchical structure ──────────────────────────────────────────────────────────────────────────
## Generating questionnaires for states, schools, classes
## Total respondents: 138 (3 + 3 + 2 + 2 + 2 + 2 + 2 + 2 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10)
## state1
## ├─state1_school1
## │ ├─state1_school1_class1 (10 students)
## │ └─state1_school1_class2 (10 students)
## ├─state1_school2
## │ ├─state1_school2_class1 (10 students)
## │ └─state1_school2_class2 (10 students)
## └─state1_school3
##   ├─state1_school3_class1 (10 students)
##   └─state1_school3_class2 (10 students)
## state2
## ├─state2_school1
## │ ├─state2_school1_class1 (10 students)
## │ └─state2_school1_class2 (10 students)
## ├─state2_school2
## │ ├─state2_school2_class1 (10 students)
## │ └─state2_school2_class2 (10 students)
## └─state2_school3
##   ├─state2_school3_class1 (10 students)
##   └─state2_school3_class2 (10 students)
## ── Information on sampling weights ─────────────────────────────────────────────────────────────────
## - Calculating SRS weights at the state level
##   state.weight should add up to the number of states in the population (5, counting once per state)
## - Calculating SRS weights at the school level
##   school.weight should add up to the number of schools in the population (45, counting once per school)
## - Calculating SRS weights at the class level
##   class.weight should add up to the number of classes in the population (270, counting once per class)

data1 <- cluster_gen(n = n1, N = N1, print_pop = FALSE, sampling_method = "PPS")  # always PPS
## ── Hierarchical structure ──────────────────────────────────────────────────────────────────────────
## Generating questionnaires for states, schools, classes
## Total respondents: 138 (3 + 3 + 2 + 2 + 2 + 2 + 2 + 2 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10)
## state1
## ├─state1_school1
## │ ├─state1_school1_class1 (10 students)
## │ └─state1_school1_class2 (10 students)
## ├─state1_school2
## │ ├─state1_school2_class1 (10 students)
## │ └─state1_school2_class2 (10 students)
## └─state1_school3
##   ├─state1_school3_class1 (10 students)
##   └─state1_school3_class2 (10 students)
## state2
## ├─state2_school1
## │ ├─state2_school1_class1 (10 students)
## │ └─state2_school1_class2 (10 students)
## ├─state2_school2
## │ ├─state2_school2_class1 (10 students)
## │ └─state2_school2_class2 (10 students)
## └─state2_school3
##   ├─state2_school3_class1 (10 students)
##   └─state2_school3_class2 (10 students)
## ── Information on sampling weights ─────────────────────────────────────────────────────────────────
## - Calculating PPS weights at the state level
##   final.principal.weight should add up to the number of principals in the population (45)
## - Calculating PPS weights at the school level
##   final.teacher.weight should add up to the number of teachers in the population (270)
## - Calculating PPS weights at the class level
##   final.student.weight should add up to the number of students in the population (13500)

data1 <- cluster_gen(n = n1, N = N1, print_pop = FALSE, sampling_method = c("PPS", "PPS", "SRS"))  # customized
## ── Hierarchical structure ──────────────────────────────────────────────────────────────────────────
## Generating questionnaires for states, schools, classes
## Total respondents: 138 (3 + 3 + 2 + 2 + 2 + 2 + 2 + 2 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10 + 10)
## state1
## ├─state1_school1
## │ ├─state1_school1_class1 (10 students)
## │ └─state1_school1_class2 (10 students)
## ├─state1_school2
## │ ├─state1_school2_class1 (10 students)
## │ └─state1_school2_class2 (10 students)
## └─state1_school3
##   ├─state1_school3_class1 (10 students)
##   └─state1_school3_class2 (10 students)
## state2
## ├─state2_school1
## │ ├─state2_school1_class1 (10 students)
## │ └─state2_school1_class2 (10 students)
## ├─state2_school2
## │ ├─state2_school2_class1 (10 students)
## │ └─state2_school2_class2 (10 students)
## └─state2_school3
##   ├─state2_school3_class1 (10 students)
##   └─state2_school3_class2 (10 students)
## ── Information on sampling weights ─────────────────────────────────────────────────────────────────
## - Calculating PPS weights at the state level
##   final.principal.weight should add up to the number of principals in the population (45)
## - Calculating PPS weights at the school level
##   final.teacher.weight should add up to the number of teachers in the population (270)
## - Calculating SRS weights at the class level
##   class.weight should add up to the number of classes in the population (270, counting once per class)
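
The print_pop abbreviation used in the calls above relies on R's partial matching of argument names. The sketch below shows the mechanism on two stand-in functions of my own (`show_structure` and `two_similar_args` are hypothetical, only the argument name print_pop_structure mirrors cluster_gen); whether a given abbreviation works for cluster_gen itself depends on its full argument list.

```r
# Stand-in function with one argument whose name starts with "print_pop"
show_structure <- function(n, print_pop_structure = TRUE) {
  print_pop_structure
}

full_name   <- show_structure(1, print_pop_structure = FALSE)
abbreviated <- show_structure(1, print_pop = FALSE)

# Both calls set the same argument
identical(full_name, abbreviated)

# With two arguments sharing the prefix, the abbreviation is ambiguous
two_similar_args <- function(print_pop_structure = TRUE, print_pop_size = TRUE) {
  c(print_pop_structure, print_pop_size)
}
ambiguous <- try(two_similar_args(print_pop = FALSE), silent = TRUE)
inherits(ambiguous, "try-error")
```

Spelling out the full argument name avoids the ambiguity and is the safer choice in scripts you intend to share.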

Calculating sampling weights

As a tester, your main task is to check the calculation of all sampling weights. The weights were calculated based on the PISA Data Analysis Manual, which contains one chapter explaining how such weights are calculated, but of course there are other valid references on the subject you may check.

When validating the output of cluster_gen, please check:

If the “information on sampling weights” is correct (especially the totals in parentheses).
If the labels of the *.weight columns are correct.
If the values of the *.weight columns are correct.

If an error is found in the weight columns, its origin is likely either in the *.weight or the within.*.weight column. The final.*.weight is calculated as the product of those two, so errors found there are nothing but a propagation of the others.

Thank you for also reporting any other errors found. Please read the “How to give feedback” section for more information about how to report errors.
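
The two numeric checks (product and total) can be scripted. The sketch below uses a made-up data frame rather than real cluster_gen output: it assumes 3 of 10 schools and 5 of 20 students per school were sampled with SRS at both levels. `check_weights` is a hypothetical helper of my own, and the column names follow the ones shown earlier in this document.

```r
# Toy data (made-up, not lsasim output): 3 sampled schools x 5 sampled students
w <- data.frame(
  school.weight        = rep(10 / 3, 15),  # 10 schools in population, 3 sampled
  within.school.weight = rep(20 / 5, 15),  # 20 students per school, 5 sampled
  final.student.weight = rep(40 / 3, 15)
)

# Hypothetical helper: TRUE if the final weight is the product of its two
# components and the final weights add up to the expected population total
check_weights <- function(df, expected_total) {
  product_ok <- isTRUE(all.equal(df$final.student.weight,
                                 df$school.weight * df$within.school.weight))
  total_ok <- isTRUE(all.equal(sum(df$final.student.weight), expected_total))
  c(product = product_ok, total = total_ok)
}

result <- check_weights(w, expected_total = 10 * 20)
result
```

With real output you would first bind the per-school data frames together (or use a collapse option) and replace `expected_total` by the total printed in the “information on sampling weights” block.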

Testing given examples

Here are some examples of pairs of n and N which can be used to get you started. Use each (n*, N*) pair as input to cluster_gen.

n1 <- 1:4
N1 <- 5
n2 <- c(5, 1, 3)
N2 <- list(6, c(2, 4, 2, 1, 6, 7), rep(10, sum(c(2, 4, 2, 1, 6, 7))))
n3 <- list(3, c(4, 2, 4), c(8, 2, 1, 3, 4, 6, 9, 10, 2, 10))
N3 <- c(3, 4, 10)

Coming up with new examples

The true power of having multiple collaborators is that multiple brains can come up with more examples than one alone. Use your imagination, try to break the package and find as many examples as you can which don’t work as they should.
