3 Importing and exporting data
One of the first steps in data analysis is usually importing datasets. These can
come in several different formats, depending on their origin. The most
common text-based format is a comma-separated values (CSV) file, with
tab-delimited text files being another option. It is also very common to keep
data in an Excel spreadsheet, or in an SPSS .sav
file. Fortunately, R has
several packages and functions that allow us to easily import data from comma-separated value
(CSV) or other text files, SPSS files and Excel files.
There are a number of functions in base R (e.g., read.csv()
) but we will use the corresponding alternative functions provided by packages belonging to the tidyverse
-universe (though not all of these are explicitly part of the tidyverse
package, some have to be loaded separately).
These functions all start with read_*
. There are also the corresponding write_*
functions that allow us to export datasets to CSV, SPSS or Excel files.
In this chapter, we will import the following datasets, all of which can be downloaded from ILIAS-page of the course:
File name | Type | Import function | Package |
---|---|---|---|
zufriedenheit.csv | Text file with commas as columns delimiters | read_csv() |
readr |
zufriedenheit-semicolon.csv | Text file with a semi-colon as the delimiter | read_csv2() , read_delim() |
readr |
zufriedenheit.sav | SPSS file | read_sav() |
haven |
zufriedenheit.xls | Excel file | read_excel() , read_xls() or read_xlsx() |
readxl |
These are four different files containing the same dataset. The data consist of a satisfaction rating, recorded at two time points for four subjects.
In your RStudio project directory, create a new directory named ‘data’ if it doesn’t already exist, and put the files you downloaded in there.
In RStudio, there are two ways of importing files:
Using functions:
read_csv()
,read_csv2()
(for ‘;’ delimiters),read_sav()
andread_excel()
.Via the GUI: You can access this functionality either through ‘File > Import Dataset’ or in the Environment pane.
The second option is easier to use, and has the advantage that R will generate all code for importing datasets for us, which can be copied for subsequent use.
The advantage of using the first approach is that you can keep a record of every step in your data analysis pipeline, which is important for reproducibility.
We will now import these four datasets using both options.
3.1 Comma-separated value (CSV) files
The functions we need for importing CSV files are available in the readr
package. This needs to be loaded:
library(readr)
However, readr
is loaded automatically when we load the tidyverse
package:
library(tidyverse)
We will first import the data files using the GUI. In the Environment pane , click on
Import Dataset > From Text (readr) or use the menu ‘File > Import Dataset > From Text (readr)’. You will see a dialogue containing a Code Preview.
On the left (bottom) you will find all options for importing the data. These
options are all arguments of the read_csv()
function (or the more general
read_delim()
function):
In the File Browser you can select the zufriedenheit.csv
file. After doing so you will see Data Preview.
Alternatively, you can also navigate to the ‘data’ directory in the Files pane, and then click on the dataset you want to import. You will then be asked if you want to view the file, or import it. Selecting ‘import’ will take you straight to the data preview.
According to the preview, the variables Vpn
and messzeitpunkt
will be
imported as character vectors
. This means that if we want to treat them as
categorical variables, which they are, they will need to be converted to
factors.
Under Import Options a file name is generated automatically. This is simply
the file name, minus the .csv
suffix.
Try to discover what happens when you play around with the options, e.g. “First Row as Names”.
Now click on Import. A new data frame (tibble) with the name zufriedenheit
will appear in the Environment pane, and the generated R code is printed to
the console.
You can copy these lines of code, and paste them into an R script. Next time you want to import this dataset, you can simply run the code, without having to use the GUI.
library(readr)
<- read_csv("data/zufriedenheit.csv") zufriedenheit
## Rows: 8 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Vpn, messzeitpunkt
## dbl (1): rating
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Now is a good time to convert the grouping variables Vpn
and messzeitpunkt
to factors:
$Vpn <- as.factor(zufriedenheit$Vpn)
zufriedenheit$messzeitpunkt <- as.factor(zufriedenheit$messzeitpunkt) zufriedenheit
If you print the tibble zufriedenheit
, R tells you that Vpn
and
messzeitpunkt
are factors (abbreviated to rating
is a numeric variable (Vpn
has 4 levels and messzeitpunkt
has 2 levels.
zufriedenheit
## # A tibble: 8 × 3
## Vpn messzeitpunkt rating
## <fct> <fct> <dbl>
## 1 VP_1 t1 64.3
## 2 VP_2 t1 65.1
## 3 VP_3 t1 54.2
## 4 VP_4 t1 54.5
## 5 VP_1 t2 67.2
## 6 VP_2 t2 75.6
## 7 VP_3 t2 84.6
## 8 VP_4 t2 73.9
Now let’s do the same with the semi-colon delimited file
zufriedenheit-semicolon.csv
. This is something that you might frequently work
with if your computer uses German language settings (since the comma is reserved for the decimal separator in German mathematical notation).
This file uses semicolons instead of commas as a column delimiter. Therefore, you need to change the delimiter setting. Now you should see the following code in the Code Preview:
library(readr)
<- read_delim("data/zufriedenheit-semicolon.csv",
zufriedenheit_semicolon ";", escape_double = FALSE, trim_ws = TRUE)
Another possibility to do this is using the function read_csv2()
:
<- read_csv2("data/zufriedenheit-semicolon.csv") zufriedenheit
## ℹ Using "','" as decimal and "'.'" as grouping mark. Use `read_delim()` for more control.
## Rows: 8 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ";"
## chr (2): Vpn, messzeitpunkt
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
3.1.1 Exporting a CSV file
If you want to save a data frame as a CSV
file, you can use the function
write_csv()
(or write_delim()
if you want to use semicolons). You have to
specify the path where you want to save the CSV file. Fortunately, since we work in an Rstudio project, we can always use a relative path:
write_csv(x = zufriedenheit, path = "data/zufriedenheit.csv")
3.2 SPSS datasets
We can import the same dataset, but this time from the SPSS file
(zufriedenheit.sav
). The function we use for this is called read_sav()
, which
is provided by the haven
package (also includes functions for importing Stata and
SAS files). We have installed this package already with the tidyverse
but it needs to be loaded separately:
library(haven)
As before, we can use the GUI to import files. In the Environment pane, click
on Import Dataset > From SPSS, and choose the file. To distinguish this from
the previously import CSV file, let’s change the name to zufriedenheit_spss
.
In the Code Preview, the important line is:
<- read_sav("data/zufriedenheit.sav") zufriedenheit_spss
Note that the read_sav()
function does have more arguments. These allow you
skip the first n lines, define values to be converted to NA, select columns and
to repair non-unique column names.
In contrast to importing CSV files, however, the GUI offers no further options when importing from SPSS except for the name of the data frame.
Now you can click on Import. In the Environment pane, the newly imported dataset will appear. Variables imported from SPSS can have additional
attributes, the most important of which is the labels
attribute. Both the Vpn
and the messzeitpunkt
columns have this:
$Vpn zufriedenheit_spss
## <labelled<double>[8]>
## [1] 1 2 3 4 1 2 3 4
##
## Labels:
## value label
## 1 VP_1
## 2 VP_2
## 3 VP_3
## 4 VP_4
$messzeitpunkt zufriedenheit_spss
## <labelled<double>[8]>
## [1] 1 1 1 1 2 2 2 2
##
## Labels:
## value label
## 1 t1
## 2 t2
Both of these variables are declared as labelled double
. This a special class
provided by the haven
package to associate arbitrary labels with numeric or
character vectors. These labels contain the SPSS value labels. Using this, we
can look up what type of coding scheme was used for categorical variables. If for example
the values 0
and 1
were used for a sex
grouping variable, we can determine which sex
has was coded as 0
and which was coded as 1
.
If we want to convert these variables to factors, we can use the function
as_factor()
from the haven
package. This allows us to use either the SPSS
values or value labels as levels of the factor in R.
This is achieved through the argument levels
, which can take the values "default"
, "labels"
, "values"
or "both"
.
"default"
seems to be the most sensible option - this means that if labels are
available they are used as factor levels, otherwise the values themselves are used. The other
options are "both"
(values and value labels are combined), "label"
(labels
only) and "values"
(values only).
Arguments of the haven::as_factor()
function
levels
How to create the levels of the generated factor:
"default": uses labels where available, otherwise the values.
Labels are sorted by value (rather than alphabetically).
"both": like "default", but pastes together the level and value
"label": use only the labels; unlabelled values become NA
"values": use only the values
ordered
If TRUE create an ordered (ordinal) factor, if FALSE (the default) create
a regular (nominal) factor.
The argument ordered
allows us to create an ordered factor, if the ordering of the factor levels is important.
$Vpn <- as_factor(zufriedenheit_spss$Vpn,
zufriedenheit_spsslevels = "default")
$messzeitpunkt <- as_factor(zufriedenheit_spss$messzeitpunkt,
zufriedenheit_spsslevels = "default")
Once we have converted the categorical variables to factors, they are no longer labelled.The labels are only intended to be an intermediate data structure, so that we can convert into regular R factors.
3.2.1 Importing a large dataset
Let’s import a bigger dataset (in the sense that it has many variables): beispieldaten.sav
.
The glimpse()
function (from pack) is useful to get an overview and to see which variables are labelled and which are not:
<- read_sav("data/beispieldaten.sav")
beispieldaten glimpse(beispieldaten)
## Rows: 286
## Columns: 98
## $ ID <dbl> 1, 2, 10, 11, 12, 14, 15, 16, 17, 18, 19, 20, 21…
## $ westost <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ geschlecht <dbl+lbl> 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1,…
## $ alter <dbl> 13, 14, 14, 14, 14, 14, 14, 15, 15, 14, 15, 15, …
## $ swk1 <dbl> 2, 4, 5, 3, 5, 3, 4, 5, 6, 5, 6, 5, 5, 5, 6, 6, …
## $ swk2 <dbl> 5, 6, 4, 6, 6, 5, 3, 4, 6, 3, 6, 3, 1, 5, 7, 7, …
## $ swk3 <dbl> 6, 4, 5, 6, 6, 5, 5, 7, 5, 4, 6, 7, 6, 3, 7, 7, …
## $ swk4 <dbl> 4, 4, 6, 5, 5, 4, 4, 7, 5, 5, 6, 4, 5, 5, 6, 4, …
## $ swk5 <dbl> 7, 6, 5, 7, 7, 5, 6, 7, 7, 5, 7, 3, 3, 6, 7, 7, …
## $ swk6 <dbl> 5, 5, 7, 7, 6, 7, 7, 7, 6, 5, 7, 4, 5, 6, 6, 6, …
## $ swk7 <dbl> 5, 5, 7, 6, 5, 5, 1, NA, 5, 5, 5, 3, 7, 3, 7, 4,…
## $ swk8 <dbl> 6, 6, 6, 6, 5, 6, 6, 7, 6, 4, 6, 4, 1, 5, 6, 6, …
## $ swk9 <dbl> 6, 5, 4, 7, 7, 6, 5, 7, 6, 5, 5, 5, 6, 7, 7, 5, …
## $ swk10 <dbl> 4, 4, 5, 7, 6, 5, 7, 6, 6, 5, 7, 6, 4, 5, 7, 6, …
## $ swk11 <dbl> 4, 4, 3, 3, 4, 3, 3, 4, 6, 5, 4, 4, 4, 3, 7, 7, …
## $ swk12 <dbl> 5, 4, 3, 4, 4, 5, 6, 3, 6, 6, 5, 3, 1, 2, 6, 6, …
## $ swk13 <dbl> 6, 5, 4, 4, 4, 5, 3, 5, 6, 3, 7, 1, 1, 6, 7, 7, …
## $ swk14 <dbl> 5, 5, 3, 4, 4, 5, 2, 4, 6, 4, 6, 5, 3, 2, 6, 5, …
## $ swk15 <dbl> 5, 5, 5, 6, 5, 4, 4, 5, 6, 3, 6, 4, 2, 4, 7, 7, …
## $ swk16 <dbl> 4, 7, 7, 6, 5, 7, 6, 5, 7, 6, 7, 7, 7, 6, 5, 6, …
## $ swk17 <dbl> 7, 3, 5, 7, 6, 2, 4, 5, 6, 3, 7, 5, 6, 3, 4, 4, …
## $ swk18 <dbl> 6, 4, 7, 6, 5, 4, 6, 6, 6, 4, 5, 4, 5, 3, 6, 7, …
## $ swk19 <dbl> 5, 4, 4, 4, 5, 3, 6, 6, 6, 5, 4, 6, 5, 4, 7, 7, …
## $ swk20 <dbl> 5, 5, 6, 6, 5, 6, 5, 4, 6, 5, 6, 4, 1, 7, 7, 7, …
## $ swk21 <dbl> 5, 3, 4, 4, 5, 3, 1, 7, 5, 3, 1, 3, 3, 2, 5, 4, …
## $ swk22 <dbl> 7, 4, 7, 7, 6, 5, 7, 6, 6, 5, 6, 3, 7, 4, 7, 6, …
## $ swk23 <dbl> 3, 5, 7, 5, 4, 3, 3, 3, 6, 4, 5, 2, 2, 4, 5, 5, …
## $ swk24 <dbl> 5, 4, 7, 6, 4, 3, 6, 1, 6, 3, 6, 4, NA, 5, 7, 6,…
## $ swk25 <dbl> 5, 5, 3, 6, 5, 5, 5, 4, 6, 5, 6, 4, 1, 3, 7, 7, …
## $ swk26 <dbl> 6, 5, 7, 4, 7, 4, 6, 6, 7, 6, 7, 5, 6, 4, 5, 5, …
## $ swk27 <dbl> 6, 4, 3, 4, 6, 5, 5, 6, 6, 4, 3, 4, 6, 2, 7, 5, …
## $ swk28 <dbl> 6, 6, 4, 7, 6, 6, 7, 7, 6, 6, 7, 5, 5, 5, 6, 7, …
## $ swk29 <dbl> 3, 5, 4, 5, 4, 4, 3, 5, 6, 4, 7, 4, 2, 4, 7, 7, …
## $ swk30 <dbl> 6, 4, 7, 5, NA, 5, 6, 6, 6, 5, 6, 4, 6, 5, 7, 7,…
## $ swk31 <dbl> 4, 5, 3, 5, NA, 5, 4, 3, 6, 6, 4, 4, 6, 5, 7, 4,…
## $ swk32 <dbl> 7, 6, 5, 7, NA, 6, 5, 5, 6, 6, 7, 5, 1, 6, 7, 6,…
## $ swk33 <dbl> 7, 4, 6, 7, NA, 6, 2, 7, 6, 5, 5, 5, 6, 4, 7, 7,…
## $ swk34 <dbl> 6, 5, 7, 6, NA, 5, 4, 5, 5, 4, 5, 4, 4, 4, 5, 5,…
## $ swk35 <dbl> 7, 6, 7, 7, NA, 5, 7, 6, 6, 5, 7, 5, 2, 7, 7, 7,…
## $ swk36 <dbl> 4, 6, 6, 6, NA, 5, 6, 7, 6, 4, 3, 4, 5, 3, 7, 7,…
## $ unteltern1 <dbl> 7, 6, 7, 7, 6, 6, 6, 5, 6, 4, 4, 6, 5, 5, 7, 7, …
## $ unteltern2 <dbl> 6, 6, 7, 7, 7, 7, 5, 5, 6, 5, 6, 5, 3, 6, 7, 7, …
## $ unteltern3 <dbl> 6, 3, 6, 7, 7, 6, 7, 6, 7, 6, 7, 4, 6, 4, 7, 6, …
## $ unteltern4 <dbl> 7, 6, 7, 7, 7, 7, 7, 6, 7, 7, 7, 5, 4, 4, 7, 7, …
## $ unteltern5 <dbl> 7, 6, 7, 7, 7, 7, 4, 5, 6, 7, 7, 6, 3, 4, 7, 7, …
## $ unteltern6 <dbl> 7, 4, 7, 7, 6, 6, 6, 4, 6, 6, 6, 5, 4, 5, 7, 7, …
## $ untfreunde1 <dbl> 6, 5, 6, 7, 7, 5, 5, NA, 7, 6, 7, 5, 6, 6, 7, 7,…
## $ untfreunde2 <dbl> 7, 6, 5, 7, 7, 7, 6, NA, 7, 5, 7, 6, 5, 7, 7, 7,…
## $ untfreunde3 <dbl> 6, 5, 4, 7, 7, 6, 4, NA, 7, 5, 7, 6, 6, 7, 7, 6,…
## $ untfreunde4 <dbl> 7, 5, 2, 7, 7, 7, 5, NA, 7, 4, 5, 5, 6, 6, 7, 5,…
## $ untfreunde5 <dbl> 7, 4, 2, 7, 7, 6, 4, NA, 7, 5, 5, 5, 6, 6, 7, 4,…
## $ untfreunde6 <dbl> 7, 4, 5, 7, 7, 7, 6, NA, 7, 5, 5, 6, 6, 6, 7, 5,…
## $ leben1 <dbl> 5, 5, 7, 1, 5, 4, 6, 7, 5, 5, 5, 5, 5, 5, 6, 5, …
## $ leben2 <dbl> 4, 4, 7, 6, 6, 5, 5, 5, 6, 5, 6, 3, 3, 6, 7, 7, …
## $ leben3 <dbl> 6, 5, 2, 6, 6, 6, 4, 6, 5, 4, 4, 4, 4, 5, 4, 4, …
## $ leben4 <dbl> 2, 4, 1, 5, 5, 5, 2, 6, 5, 4, 3, 5, 3, 4, 4, 4, …
## $ leben5 <dbl> 6, 5, 5, 6, 6, 6, 5, 5, 5, 5, 6, 4, 1, 5, 6, 6, …
## $ leben6 <dbl> 7, 5, 6, 6, 6, 5, 7, 7, 6, 6, 6, 5, 1, 6, 7, 1, …
## $ leben7 <dbl> 7, 5, 5, 7, 7, 6, 5, NA, 7, 5, 7, 6, 5, 7, 7, 7,…
## $ leben8 <dbl> 7, 5, 7, 7, 6, 6, 6, 6, 6, 5, 7, 5, 5, 4, 7, 6, …
## $ leben9 <dbl> 7, 5, 6, 7, 5, 5, 7, 6, 6, 4, 6, 5, 6, 5, 7, 7, …
## $ leben10 <dbl> 7, 4, 6, 7, 6, 5, 4, 5, 6, 5, 7, 5, 4, 6, 7, 6, …
## $ leben11 <dbl> 7, 4, 5, 6, 6, 6, 6, 6, 6, 6, 7, 5, 1, 6, 7, 7, …
## $ stress1 <dbl> 3, 4, 3, 4, 1, 1, 7, 1, 6, 1, 6, 4, 1, 2, 2, 2, …
## $ stress2 <dbl> 1, 4, 1, 1, 1, 1, 1, 1, 3, 1, 7, 2, 3, 1, 1, 1, …
## $ stress3 <dbl> 3, 2, 7, 4, 2, 2, 5, 3, 2, 1, 7, 7, 4, 1, 2, 2, …
## $ stress4 <dbl> 1, 2, 1, 6, 1, 1, 7, 3, 6, 1, 7, 4, 3, 1, 2, 1, …
## $ stress5 <dbl> 7, 5, 5, 3, 1, 3, 4, 3, 5, 5, 5, 6, 2, 7, 7, 7, …
## $ stress6 <dbl> 3, 6, 3, 3, 2, 1, 2, 3, 1, 5, 5, 1, 7, 4, 3, 1, …
## $ stress7 <dbl> 1, 2, 1, 1, 1, 1, 3, 2, 1, 3, 7, 2, 6, 1, 1, 1, …
## $ stress8 <dbl> 1, 3, 7, 2, 2, 1, 2, 3, 2, 3, 7, 3, 1, 1, 1, 1, …
## $ stress9 <dbl> 5, 2, 5, 1, 3, 1, 1, 3, 2, 3, 7, 4, 1, 1, 1, 7, …
## $ stress10 <dbl> 1, 4, 1, 2, 3, 1, 6, 5, 2, 2, 7, 3, 2, 2, 1, 3, …
## $ stress11 <dbl> 1, 5, 3, 1, 3, 1, 2, 5, 1, 1, 7, 2, 7, 4, 1, 1, …
## $ stress12 <dbl> 1, 5, 5, 2, 3, 1, 1, 3, 2, 3, 6, 1, 7, 4, 3, 3, …
## $ Deutschnote <dbl> 4, 4, 5, 4, 4, 4, 4, 5, 4, 5, 4, 4, 4, 4, 4, 4, …
## $ Mathenote <dbl> 3, 4, 5, 2, 4, 4, 4, 5, 4, 5, 4, 3, 4, 4, 4, 4, …
## $ Fremdsprachenote <dbl> 5, 3, 4, 3, 4, 3, 4, 5, 3, 4, 3, 3, 4, 3, 5, 4, …
## $ Gesamtnote <dbl> NA, 4, 5, 4, 5, 4, 4, 5, 4, 4, 4, 3, 4, 4, 4, 4,…
## $ bildung_vater <dbl+lbl> 4, 3, 2, 2, 4, 2, 2, 4, 2, 3, 1, 1, 2, 1, 3,…
## $ bildung_mutter <dbl+lbl> 4, 2, 1, 4, 2, 2, 2, 4, 2, 2, 2, 1, 1, 1, 2,…
## $ bildung_vater_binaer <dbl+lbl> 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1,…
## $ bildung_mutter_binaer <dbl+lbl> 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,…
## $ swk_neueslernen <dbl> 4.571429, 5.000000, 5.000000, 4.571429, 5.333333…
## $ swk_lernregulation <dbl> 5.500000, 4.000000, 4.875000, 5.375000, 5.285714…
## $ swk_motivation <dbl> 4.800000, 4.600000, 6.000000, 5.400000, 4.666667…
## $ swk_durchsetzung <dbl> 5.333333, 5.000000, 4.333333, 5.333333, 4.666667…
## $ swk_sozialkomp <dbl> 5.142857, 5.571429, 5.428571, 6.000000, 4.800000…
## $ swk_beziehung <dbl> 6.166667, 4.666667, 5.500000, 6.833333, 6.200000…
## $ unt_eltern <dbl> 6.666667, 5.166667, 6.833333, 7.000000, 6.666667…
## $ unt_freunde <dbl> 6.666667, 4.833333, 4.000000, 7.000000, 7.000000…
## $ leben_selbst <dbl> 5.5, 4.5, 6.5, 6.0, 6.0, 5.0, 6.0, 6.0, 6.0, 5.5…
## $ leben_familie <dbl> 7.000000, 4.666667, 6.333333, 7.000000, 5.666667…
## $ leben_schule <dbl> 4.333333, 4.666667, 3.333333, 4.000000, 5.333333…
## $ leben_freunde <dbl> 6.5, 5.0, 5.0, 6.5, 6.5, 6.0, 5.0, 5.0, 6.0, 5.0…
## $ leben_gesamt <dbl> 5.909091, 4.636364, 5.181818, 5.818182, 5.818182…
## $ stress_somatisch <dbl> 3.000000, 3.833333, 3.333333, 3.500000, 1.333333…
## $ stress_psychisch <dbl> 1.666667, 3.500000, 3.666667, 1.500000, 2.500000…
Now we can convert the labelled variables into factors:
$westost <- as_factor(beispieldaten$westost,
beispieldatenlevels = "default")
$geschlecht <- as_factor(beispieldaten$geschlecht,
beispieldatenlevels = "default")
$bildung_vater <- as_factor(beispieldaten$bildung_vater,
beispieldatenlevels = "default")
$bildung_mutter <- as_factor(beispieldaten$bildung_mutter,
beispieldatenlevels = "default")
$bildung_vater_binaer <- as_factor(beispieldaten$bildung_vater_binaer,
beispieldatenlevels = "default")
$bildung_mutter_binaer <- as_factor(beispieldaten$bildung_mutter_binaer,
beispieldatenlevels = "default")
3.2.2 Exporting an SPSS file
We can also save a data frame in the SPSS file format using the write_save()
function:
write_sav(data = zufriedenheit_long, path = "data/zufriedenheit.sav")
3.3 Excel files
We can also import Excel spreadsheets. Click on Import Dataset > From Excel,
and then choose the file you want to import. We will call the data frame
zufriedenheit_xls
. Underneath the Name
text box there is a drop-down menu
entitled Sheet
. This allows you to specify which worksheet you want to import.
You should get the following R code in the Code Preview:
library(readxl)
<- read_excel("data/zufriedenheit.xlsx",
zufriedenheit_xls sheet = "zufriedenheit")
The function we are using is called read_excel()
and is available in the readxl
package (installed with tidyverse
but has to be loaded separately).
Categorical variables should again be converted to factors:
$Vpn <- as.factor(zufriedenheit_xls$Vpn)
zufriedenheit_xls$messzeitpunkt <- as.factor(zufriedenheit_xls$messzeitpunkt) zufriedenheit_xls
3.4 RData files
The last option is an RData
file. This is a binary
file format and has the
advantage that all attributes of an R data frame are included (e.g., factor definitions). This is in contrast to exporting to a text file, such as CSV
, where all metadata will be lost. A further advantage is that we can combine multiple R objects (mostly data frames) in a single RData
file and that the file is compressed in order to save space. The only disadvantage is that this file type is specific to R and cannot be read by other programs (such as jamovi).
You can save objects in your workspace as .RData
(or .Rda
) files with the function save()
:
save(zufriedenheit, file = "data/zufriedenheit.Rda")
You can also save several objects in one go:
save(zufriedenheit, zufriedenheit_spss, zufriedenheit_xls,
file = "data/zufriedenheit_all.Rda")
The file can be loaded using the function load()
:
load(file = "data/zufriedenheit_all.Rda")