Importing data from SPSS files

Importing from an SPSS "system" file

To import data from an SPSS "system" file, the binary format in which SPSS data are now usually saved and often distributed, one first needs to make the file that contains the data known to R, as in the following example:

library(memisc)
ZA5702 <- spss.system.file("Data/ZA5702_v2-0-0.sav")
ZA5702

SPSS system file 'Data/ZA5702_v2-0-0.sav'
    with 979 variables and 3911 observations

Once the "system file" is declared using the function spss.system.file(), metadata become available, such as the number of cases and variables (as just seen) and the names and labels of the variables (as seen below):

description(ZA5702)

study    'Studiennummer'
version  'GESIS Archiv Version'
year     'Erhebungsjahr'
field    'Erhebungszeitraum'
glescomp 'GLES-Komponente'
survey   'Erhebung/Welle'
lfdn     'Laufende Nummer (Kumulation)'
vlfdn    'Laufende Nummer (Vorwahl)'
nlfdn    'Laufende Nummer (Nachwahl)'
datum    'Datum der Befragung (Monat/Tag/Jahr)'

(Only an extract of the full output is shown here, since the data set contains as many as 979 variables.)

An "importer" object, such as ZA5702 in this example, also makes it possible to obtain a full codebook with

codebook(ZA5702)

but we refrain from showing such a codebook here, as it would produce far too much output.

As inspection of the data in the file shows, most variable names have a standardised, yet non-mnemonic structure. Variables referring to questions asked in the pre-election wave of the GLES 2013 study have names starting with "v", those referring to questions asked in the post-election wave have names starting with "n", while those referring to questions asked in both waves have names starting with "nv". For a specific analysis, such variable names are not very useful, so we want to rename them. We could do this after loading the data, but it is more convenient to do the data import and the renaming in one step, as in the example below:

gles2013work <- subset(ZA5702,
                       select=c(
                         wave                  = survey,
                         intent.turnout        = v10,
                         turnout               = n10,
                         voteint.candidate     = v11aa,
                         voteint.list          = v11ba,
                         postal.vote.candidate = v12aa,
                         postal.vote.list      = v12ba,
                         vote.candidate        = n11aa,
                         vote.list             = n11ba,
                         bula                  = bl
                       ))

The variable names to the left of the equals sign are the variable names as they will appear in the data set after import, while the variable names to the right of the equals sign are the variable names as they exist in the data file.

As a demonstration of what information can be extracted from the data file, we create a codebook for one of the items in the data set:

codebook(gles2013work$turnout)

================================================================================

   gles2013work$turnout 'Wahlbeteiligung'

--------------------------------------------------------------------------------

   Storage mode: double
   Measurement: interval
   Missing values: -Inf - -1

   Values and labels                      N Valid Total

   -99 M 'keine Angabe'                   3         0.1
   -97 M 'trifft nicht zu'               20         0.5
   -94 M 'nicht in Auswahlgesamtheit'  2003        51.2
     1   'ja, habe gewaehlt'           1596  84.7  40.8
     2   'nein, habe nicht gewaehlt'    289  15.3   7.4

        Min: 1.000
        Max: 2.000
       Mean: 1.153
   Std.Dev.: 0.360
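The "Valid" and "Total" columns of this codebook differ only in their denominator: valid percentages are relative to the non-missing cases, total percentages are relative to all cases. The following base-R sketch is not part of memisc; the vector counts is simply retyped from the output above, and all object names are chosen for illustration. It should reproduce the reported percentages as well as the mean and standard deviation:

```r
# Frequencies retyped from the codebook output above (code -> count).
counts <- c("-99" = 3, "-97" = 20, "-94" = 2003, "1" = 1596, "2" = 289)
codes  <- as.numeric(names(counts))

# Codes in the range -Inf to -1 are declared missing in the file.
is_missing <- codes < 0

# Total percentages use all cases as the denominator ...
total_pct <- round(100 * counts / sum(counts), 1)
# ... valid percentages use only the non-missing cases.
valid_pct <- round(100 * counts[!is_missing] / sum(counts[!is_missing]), 1)

# Mean and standard deviation are computed from the valid cases only.
valid_values <- rep(codes[!is_missing], times = counts[!is_missing])
valid_mean <- mean(valid_values)
valid_sd   <- sd(valid_values)

print(total_pct)
print(valid_pct)
print(round(c(mean = valid_mean, sd = valid_sd), 3))
```

The summary statistics are based on the 1885 valid cases, not on all 3911 observations.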

Import from an SPSS "portable" file

Data from SPSS "portable" files are imported in essentially the same way as data from SPSS "system" files: The first step again is to make the data set known to R:

ZA3861 <- spss.portable.file("Data/ZA3861.por", iconv=FALSE)
ZA3861

SPSS portable file 'Data/ZA3861.por'
    with 331 variables and 3263 observations

Since this file contains German umlauts (in contrast to the previous example), we need to convert the character encoding of the value labels etc. from "Latin-1" (the original encoding of the data) into the native encoding of the system (unless the computer natively uses "Latin-1" encoding and not, as most Mac and most Linux systems do, a variant of UTF-8).

ZA3861<-Iconv(ZA3861,from="latin1")
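What such a conversion does to a single label can be illustrated with base R's iconv(), the lower-case base function rather than memisc's Iconv(). This is only a sketch of the underlying idea; the byte sequence is a made-up example ("März") and is not taken from the data file:

```r
# "März" encoded in Latin-1: the umlaut is the single byte 0xE4.
label_latin1 <- rawToChar(as.raw(c(0x4d, 0xe4, 0x72, 0x7a)))

# Convert the string from Latin-1 to UTF-8.
label_utf8 <- iconv(label_latin1, from = "latin1", to = "UTF-8")

# In UTF-8 the umlaut occupies two bytes (0xC3 0xA4) instead of one.
print(charToRaw(label_latin1))
print(charToRaw(label_utf8))
```

Without the conversion, a system that expects UTF-8 would meet the lone byte 0xE4, which is not valid UTF-8, and the label would typically be displayed incorrectly.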

Importer objects created from "portable" files can be examined in the same way as importer objects created from "system" files. For example, we get a description of the variables in the data set (the variable labels) and a codebook.

description(ZA3861)

vvpnid   'Fallnummer'
vsplitwo 'West-Ost-Kennung'
vvornach 'Vor-/Nachwahl'
vland    'Bundesland'
v10      'Wirtschaftl. Lage allgemein'
v20      'Wirtschaftl. Lage retrospektiv'
v30      'Wirtschaftl. Lage prospektiv'
v31      'Wichtigkeit Erst/Zweitstimme BTW (nicht 94)'
v40      'Demokratiezufriedenheit'
v50      'Staerke Politikinteresse'

To actually import the data and make them accessible for analysis, we can (as above) use as.data.set() or subset(), as in this example:

work2002 <- subset(ZA3861,
                   select=c(
                     respid            = VVPNID,
                     split.wo          = VSPLITWO,
                     split.vor.nach    = VVORNACH,
                     Bundesland        = VLAND,
                     Erststimme        = V69,
                     Zweitstimme       = V70,
                     Geschlecht        = VSEX,
                     GebMonat          = VMONAT,
                     GebJahr           = VJAHR,
                     Konfession        = VRELIG,
                     Kirchgang         = VKIRCHG,
                     Erwerbst          = VBERUFTG,
                     FrErwerbst        = VFRBERTG,
                     Beruf             = VBERUF,
                     Famstand          = VFAMSTDN,
                     Partner           = VPARTNER,
                     BildungP          = VPBILDGA,
                     BerufstP          = VPBERUFT,
                     FrBerufstP        = VPFBERTG,
                     BerufP            = VPBERUF,
                     ReprGewicht       = VGVWNW
                   ))
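If all variables of the file are needed, as.data.set() can be applied to the importer object instead of subset(). The following is a minimal sketch, assuming that memisc is installed and that the file from the example above exists at the same path. The guard around the import and the object name work2002.full are additions made for illustration only:

```r
# Hypothetical guard: run the import only if memisc and the file are available.
path <- "Data/ZA3861.por"
work2002.full <- NULL

if (requireNamespace("memisc", quietly = TRUE) && file.exists(path)) {
  library(memisc)
  # Declare the file and convert its labels from Latin-1, as shown earlier.
  ZA3861 <- spss.portable.file(path, iconv = FALSE)
  ZA3861 <- Iconv(ZA3861, from = "latin1")
  # Load all variables of the file into memory as a "data.set" object.
  work2002.full <- as.data.set(ZA3861)
}
```

Because this variant loads every variable in the file, subset() is usually the more economical choice when only a handful of variables are needed.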

Import from a fixed-width file accompanied by SPSS syntax

Data from more recent study components of the American National Election Study come in fixed-width format, accompanied by SPSS syntax files that define columns, variable labels, value labels, and missing values. memisc also provides an importer function for such data. Naturally, this requires a little more information: in addition to the actual data file, we also need a file with SPSS syntax specifying the data columns. Optionally, syntax files that define variable labels, value labels, and missing values can also be specified.

anes2008TS <- spss.fixed.file("Data/anes2008/anes2008TS_dat.txt",
                              columns.file="Data/anes2008/anes2008TS_col.sps",
                              varlab.file="Data/anes2008/anes2008TS_lab.sps",
                              codes.file="Data/anes2008/anes2008TS_cod.sps",
                              missval.file="Data/anes2008/anes2008TS_md.sps")
anes2008TS

SPSS fixed column file 'issues/anes2008/anes2008TS_dat.txt'
    with 1954 variables and 2322 observations
    with variable labels from file 'issues/anes2008/anes2008TS_lab.sps'
    with value labels from file 'issues/anes2008/anes2008TS_cod.sps'
    with missing value definitions from file 'issues/anes2008/anes2008TS_md.sps'

Further information about the data can now be obtained from the returned importer object in the same way as from importer objects that describe SPSS "system" or SPSS "portable" files. That is, we can use names(), description(), and codebook(). To get the data into the memory of R, we can use (as above) the functions as.data.set() and subset().

Importing data from a Stata file

Data from Stata files (up to Stata version 12) can be imported in the same way as data from SPSS files. The main differences are the function used for the import and the fact that user-defined missing values do not exist in Stata. The following example illustrates this:

library(memisc)
ZA5702.dta <- Stata.file("Data/ZA5702_v2-0-0.dta")
ZA5702.dta

Stata file 'Data/ZA5702_v2-0-0.dta'
    with 874 variables and 3911 observations

gles2013work.dta <- subset(ZA5702.dta,
                           select=c(
                             wave                  = survey,
                             intent.turnout        = v10,
                             turnout               = n10,
                             voteint.candidate     = v11aa,
                             voteint.list          = v11ba,
                             postal.vote.candidate = v12aa,
                             postal.vote.list      = v12ba,
                             vote.candidate        = n11aa,
                             vote.list             = n11ba,
                             bula                  = bl
                           ))
codebook(gles2013work.dta$turnout)

================================================================================

   gles2013work.dta$turnout 'Wahlbeteiligung'

--------------------------------------------------------------------------------

   Storage mode: integer
   Measurement: nominal
   Missing values: 100 - 127

   Values and labels                               N Percent

   -99   'keine Angabe'                            3     0.1
   -98   'weiss nicht'                             0     0.0
   -97   'trifft nicht zu'                        20     0.5
   -96   'Split'                                   0     0.0
   -95   'nicht teilgenommen'                      0     0.0
   -94   'nicht in Auswahlgesamtheit'           2003    51.2
   -93   'Interview abgebrochen'                   0     0.0
   -92   'Fehler in Daten'                         0     0.0
   -86   'nicht wahlberechtigt'                    0     0.0
   -85   'nicht waehlen'                           0     0.0
   -84   'keine Erst-/Zweitstimme abgegeben'       0     0.0
   -83   'ungueltig waehlen'                       0     0.0
   -82   'keine andere Partei waehlen'             0     0.0
   -81   'noch nicht entschieden'                  0     0.0
   -72   'nicht einzuschaetzen'                    0     0.0
   -71   'nicht bekannt'                           0     0.0
     1   'ja, habe gewaehlt'                    1596    40.8
     2   'nein, habe nicht gewaehlt'             289     7.4
