3 Importing and exporting data

One of the first steps in data analysis is usually importing datasets. These can come in several different formats, depending on their origin. The most common text-based format is a comma-separated values (CSV) file, with tab-delimited text files being another option. It is also very common to keep data in an Excel spreadsheet, or in an SPSS .sav file. Fortunately, R has several packages and functions that allow us to easily import data from comma-separated value (CSV) or other text files, SPSS files and Excel files. There are a number of functions in base R (e.g., read.csv()) but we will use the corresponding alternative functions provided by packages belonging to the tidyverse-universe (though not all of these are explicitly part of the tidyverse package, some have to be loaded separately). These functions all start with read_*. There are also the corresponding write_* functions that allow us to export datasets to CSV, SPSS or Excel files.

In this chapter, we will import the following datasets, all of which can be downloaded from ILIAS-page of the course:

File name Type Import function Package
zufriedenheit.csv Text file with commas as columns delimiters read_csv() readr
zufriedenheit-semicolon.csv Text file with a semi-colon as the delimiter read_csv2(), read_delim() readr
zufriedenheit.sav SPSS file read_sav() haven
zufriedenheit.xls Excel file read_excel(), read_xls() or read_xlsx() readxl

These are four different files containing the same dataset. The data consist of a satisfaction rating, recorded at two time points for four subjects.

In your RStudio project directory, create a new directory named ‘data’ if it doesn’t already exist, and put the files you downloaded in there.

In RStudio, there are two ways of importing files:

  1. Using functions: read_csv(), read_csv2() (for ‘;’ delimiters), read_sav() and read_excel().

  2. Via the GUI: You can access this functionality either through ‘File > Import Dataset’ or in the Environment pane.

The second option is easier to use, and has the advantage that R will generate all code for importing datasets for us, which can be copied for subsequent use.

The advantage of using the first approach is that you can keep a record of every step in your data analysis pipeline, which is important for reproducibility.

We will now import these four datasets using both options.

3.1 Comma-separated value (CSV) files

The functions we need for importing CSV files are available in the readr package. This needs to be loaded:

library(readr)

However, readr is loaded automatically when we load the tidyverse package:

library(tidyverse)

We will first import the data files using the GUI. In the Environment pane , click on

Import Dataset > From Text (readr) or use the menu ‘File > Import Dataset > From Text (readr)’. You will see a dialogue containing a Code Preview.

On the left (bottom) you will find all options for importing the data. These options are all arguments of the read_csv() function (or the more general read_delim() function):

In the File Browser you can select the zufriedenheit.csv file. After doing so you will see Data Preview.

Alternatively, you can also navigate to the ‘data’ directory in the Files pane, and then click on the dataset you want to import. You will then be asked if you want to view the file, or import it. Selecting ‘import’ will take you straight to the data preview.

According to the preview, the variables Vpn and messzeitpunkt will be imported as character vectors. This means that if we want to treat them as categorical variables, which they are, they will need to be converted to factors.

Under Import Options a file name is generated automatically. This is simply the file name, minus the .csv suffix.

Try to discover what happens when you play around with the options, e.g. “First Row as Names”.

Now click on Import. A new data frame (tibble) with the name zufriedenheit will appear in the Environment pane, and the generated R code is printed to the console.

You can copy these lines of code, and paste them into an R script. Next time you want to import this dataset, you can simply run the code, without having to use the GUI.

library(readr)
zufriedenheit <- read_csv("data/zufriedenheit.csv")
## Rows: 8 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Vpn, messzeitpunkt
## dbl (1): rating
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Now is a good time to convert the grouping variables Vpn and messzeitpunkt to factors:

zufriedenheit$Vpn <- as.factor(zufriedenheit$Vpn)
zufriedenheit$messzeitpunkt <- as.factor(zufriedenheit$messzeitpunkt)

If you print the tibble zufriedenheit, R tells you that Vpn and messzeitpunkt are factors (abbreviated to in the console), and that rating is a numeric variable (). The Environment also provides the information that Vpn has 4 levels and messzeitpunkt has 2 levels.

zufriedenheit
## # A tibble: 8 × 3
##   Vpn   messzeitpunkt rating
##   <fct> <fct>          <dbl>
## 1 VP_1  t1              64.3
## 2 VP_2  t1              65.1
## 3 VP_3  t1              54.2
## 4 VP_4  t1              54.5
## 5 VP_1  t2              67.2
## 6 VP_2  t2              75.6
## 7 VP_3  t2              84.6
## 8 VP_4  t2              73.9

Now let’s do the same with the semi-colon delimited file zufriedenheit-semicolon.csv. This is something that you might frequently work with if your computer uses German language settings (since the comma is reserved for the decimal separator in German mathematical notation).

This file uses semicolons instead of commas as a column delimiter. Therefore, you need to change the delimiter setting. Now you should see the following code in the Code Preview:

library(readr)
zufriedenheit_semicolon <- read_delim("data/zufriedenheit-semicolon.csv",
    ";", escape_double = FALSE, trim_ws = TRUE)

Another possibility to do this is using the function read_csv2():

zufriedenheit  <- read_csv2("data/zufriedenheit-semicolon.csv") 
## ℹ Using "','" as decimal and "'.'" as grouping mark. Use `read_delim()` for more control.
## Rows: 8 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ";"
## chr (2): Vpn, messzeitpunkt
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

3.1.1 Exporting a CSV file

If you want to save a data frame as a CSV file, you can use the function write_csv() (or write_delim() if you want to use semicolons). You have to specify the path where you want to save the CSV file. Fortunately, since we work in an Rstudio project, we can always use a relative path:

write_csv(x = zufriedenheit, path = "data/zufriedenheit.csv")

3.2 SPSS datasets

We can import the same dataset, but this time from the SPSS file (zufriedenheit.sav). The function we use for this is called read_sav(), which is provided by the haven package (also includes functions for importing Stata and SAS files). We have installed this package already with the tidyverse but it needs to be loaded separately:

library(haven)

As before, we can use the GUI to import files. In the Environment pane, click on Import Dataset > From SPSS, and choose the file. To distinguish this from the previously import CSV file, let’s change the name to zufriedenheit_spss.

In the Code Preview, the important line is:

zufriedenheit_spss <- read_sav("data/zufriedenheit.sav") 

Note that the read_sav() function does have more arguments. These allow you skip the first n lines, define values to be converted to NA, select columns and to repair non-unique column names.

In contrast to importing CSV files, however, the GUI offers no further options when importing from SPSS except for the name of the data frame.

Now you can click on Import. In the Environment pane, the newly imported dataset will appear. Variables imported from SPSS can have additional attributes, the most important of which is the labels attribute. Both the Vpn and the messzeitpunkt columns have this:

zufriedenheit_spss$Vpn
## <labelled<double>[8]>
## [1] 1 2 3 4 1 2 3 4
## 
## Labels:
##  value label
##      1  VP_1
##      2  VP_2
##      3  VP_3
##      4  VP_4
zufriedenheit_spss$messzeitpunkt
## <labelled<double>[8]>
## [1] 1 1 1 1 2 2 2 2
## 
## Labels:
##  value label
##      1    t1
##      2    t2

Both of these variables are declared as labelled double. This a special class provided by the haven package to associate arbitrary labels with numeric or character vectors. These labels contain the SPSS value labels. Using this, we can look up what type of coding scheme was used for categorical variables. If for example the values 0 and 1 were used for a sex grouping variable, we can determine which sex has was coded as 0 and which was coded as 1.

If we want to convert these variables to factors, we can use the function as_factor() from the haven package. This allows us to use either the SPSS values or value labels as levels of the factor in R. This is achieved through the argument levels, which can take the values "default", "labels", "values" or "both".

"default" seems to be the most sensible option - this means that if labels are available they are used as factor levels, otherwise the values themselves are used. The other options are "both" (values and value labels are combined), "label" (labels only) and "values" (values only).

Arguments of the haven::as_factor() function

levels
  How to create the levels of the generated factor:
      "default": uses labels where available, otherwise the values. 
                 Labels are sorted by value (rather than alphabetically).
      "both": like "default", but pastes together the level and value
      "label": use only the labels; unlabelled values become NA
      "values": use only the values

ordered
    If TRUE create an ordered (ordinal) factor, if FALSE (the default) create
    a regular (nominal) factor.

The argument ordered allows us to create an ordered factor, if the ordering of the factor levels is important.

zufriedenheit_spss$Vpn <- as_factor(zufriedenheit_spss$Vpn,
                                    levels = "default")

zufriedenheit_spss$messzeitpunkt <- as_factor(zufriedenheit_spss$messzeitpunkt,
                                              levels = "default")

Once we have converted the categorical variables to factors, they are no longer labelled.The labels are only intended to be an intermediate data structure, so that we can convert into regular R factors.

3.2.1 Importing a large dataset

Let’s import a bigger dataset (in the sense that it has many variables): beispieldaten.sav.

The glimpse() function (from pack) is useful to get an overview and to see which variables are labelled and which are not:

beispieldaten <- read_sav("data/beispieldaten.sav")
glimpse(beispieldaten)
## Rows: 286
## Columns: 98
## $ ID                    <dbl> 1, 2, 10, 11, 12, 14, 15, 16, 17, 18, 19, 20, 21…
## $ westost               <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ geschlecht            <dbl+lbl> 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1,…
## $ alter                 <dbl> 13, 14, 14, 14, 14, 14, 14, 15, 15, 14, 15, 15, …
## $ swk1                  <dbl> 2, 4, 5, 3, 5, 3, 4, 5, 6, 5, 6, 5, 5, 5, 6, 6, …
## $ swk2                  <dbl> 5, 6, 4, 6, 6, 5, 3, 4, 6, 3, 6, 3, 1, 5, 7, 7, …
## $ swk3                  <dbl> 6, 4, 5, 6, 6, 5, 5, 7, 5, 4, 6, 7, 6, 3, 7, 7, …
## $ swk4                  <dbl> 4, 4, 6, 5, 5, 4, 4, 7, 5, 5, 6, 4, 5, 5, 6, 4, …
## $ swk5                  <dbl> 7, 6, 5, 7, 7, 5, 6, 7, 7, 5, 7, 3, 3, 6, 7, 7, …
## $ swk6                  <dbl> 5, 5, 7, 7, 6, 7, 7, 7, 6, 5, 7, 4, 5, 6, 6, 6, …
## $ swk7                  <dbl> 5, 5, 7, 6, 5, 5, 1, NA, 5, 5, 5, 3, 7, 3, 7, 4,…
## $ swk8                  <dbl> 6, 6, 6, 6, 5, 6, 6, 7, 6, 4, 6, 4, 1, 5, 6, 6, …
## $ swk9                  <dbl> 6, 5, 4, 7, 7, 6, 5, 7, 6, 5, 5, 5, 6, 7, 7, 5, …
## $ swk10                 <dbl> 4, 4, 5, 7, 6, 5, 7, 6, 6, 5, 7, 6, 4, 5, 7, 6, …
## $ swk11                 <dbl> 4, 4, 3, 3, 4, 3, 3, 4, 6, 5, 4, 4, 4, 3, 7, 7, …
## $ swk12                 <dbl> 5, 4, 3, 4, 4, 5, 6, 3, 6, 6, 5, 3, 1, 2, 6, 6, …
## $ swk13                 <dbl> 6, 5, 4, 4, 4, 5, 3, 5, 6, 3, 7, 1, 1, 6, 7, 7, …
## $ swk14                 <dbl> 5, 5, 3, 4, 4, 5, 2, 4, 6, 4, 6, 5, 3, 2, 6, 5, …
## $ swk15                 <dbl> 5, 5, 5, 6, 5, 4, 4, 5, 6, 3, 6, 4, 2, 4, 7, 7, …
## $ swk16                 <dbl> 4, 7, 7, 6, 5, 7, 6, 5, 7, 6, 7, 7, 7, 6, 5, 6, …
## $ swk17                 <dbl> 7, 3, 5, 7, 6, 2, 4, 5, 6, 3, 7, 5, 6, 3, 4, 4, …
## $ swk18                 <dbl> 6, 4, 7, 6, 5, 4, 6, 6, 6, 4, 5, 4, 5, 3, 6, 7, …
## $ swk19                 <dbl> 5, 4, 4, 4, 5, 3, 6, 6, 6, 5, 4, 6, 5, 4, 7, 7, …
## $ swk20                 <dbl> 5, 5, 6, 6, 5, 6, 5, 4, 6, 5, 6, 4, 1, 7, 7, 7, …
## $ swk21                 <dbl> 5, 3, 4, 4, 5, 3, 1, 7, 5, 3, 1, 3, 3, 2, 5, 4, …
## $ swk22                 <dbl> 7, 4, 7, 7, 6, 5, 7, 6, 6, 5, 6, 3, 7, 4, 7, 6, …
## $ swk23                 <dbl> 3, 5, 7, 5, 4, 3, 3, 3, 6, 4, 5, 2, 2, 4, 5, 5, …
## $ swk24                 <dbl> 5, 4, 7, 6, 4, 3, 6, 1, 6, 3, 6, 4, NA, 5, 7, 6,…
## $ swk25                 <dbl> 5, 5, 3, 6, 5, 5, 5, 4, 6, 5, 6, 4, 1, 3, 7, 7, …
## $ swk26                 <dbl> 6, 5, 7, 4, 7, 4, 6, 6, 7, 6, 7, 5, 6, 4, 5, 5, …
## $ swk27                 <dbl> 6, 4, 3, 4, 6, 5, 5, 6, 6, 4, 3, 4, 6, 2, 7, 5, …
## $ swk28                 <dbl> 6, 6, 4, 7, 6, 6, 7, 7, 6, 6, 7, 5, 5, 5, 6, 7, …
## $ swk29                 <dbl> 3, 5, 4, 5, 4, 4, 3, 5, 6, 4, 7, 4, 2, 4, 7, 7, …
## $ swk30                 <dbl> 6, 4, 7, 5, NA, 5, 6, 6, 6, 5, 6, 4, 6, 5, 7, 7,…
## $ swk31                 <dbl> 4, 5, 3, 5, NA, 5, 4, 3, 6, 6, 4, 4, 6, 5, 7, 4,…
## $ swk32                 <dbl> 7, 6, 5, 7, NA, 6, 5, 5, 6, 6, 7, 5, 1, 6, 7, 6,…
## $ swk33                 <dbl> 7, 4, 6, 7, NA, 6, 2, 7, 6, 5, 5, 5, 6, 4, 7, 7,…
## $ swk34                 <dbl> 6, 5, 7, 6, NA, 5, 4, 5, 5, 4, 5, 4, 4, 4, 5, 5,…
## $ swk35                 <dbl> 7, 6, 7, 7, NA, 5, 7, 6, 6, 5, 7, 5, 2, 7, 7, 7,…
## $ swk36                 <dbl> 4, 6, 6, 6, NA, 5, 6, 7, 6, 4, 3, 4, 5, 3, 7, 7,…
## $ unteltern1            <dbl> 7, 6, 7, 7, 6, 6, 6, 5, 6, 4, 4, 6, 5, 5, 7, 7, …
## $ unteltern2            <dbl> 6, 6, 7, 7, 7, 7, 5, 5, 6, 5, 6, 5, 3, 6, 7, 7, …
## $ unteltern3            <dbl> 6, 3, 6, 7, 7, 6, 7, 6, 7, 6, 7, 4, 6, 4, 7, 6, …
## $ unteltern4            <dbl> 7, 6, 7, 7, 7, 7, 7, 6, 7, 7, 7, 5, 4, 4, 7, 7, …
## $ unteltern5            <dbl> 7, 6, 7, 7, 7, 7, 4, 5, 6, 7, 7, 6, 3, 4, 7, 7, …
## $ unteltern6            <dbl> 7, 4, 7, 7, 6, 6, 6, 4, 6, 6, 6, 5, 4, 5, 7, 7, …
## $ untfreunde1           <dbl> 6, 5, 6, 7, 7, 5, 5, NA, 7, 6, 7, 5, 6, 6, 7, 7,…
## $ untfreunde2           <dbl> 7, 6, 5, 7, 7, 7, 6, NA, 7, 5, 7, 6, 5, 7, 7, 7,…
## $ untfreunde3           <dbl> 6, 5, 4, 7, 7, 6, 4, NA, 7, 5, 7, 6, 6, 7, 7, 6,…
## $ untfreunde4           <dbl> 7, 5, 2, 7, 7, 7, 5, NA, 7, 4, 5, 5, 6, 6, 7, 5,…
## $ untfreunde5           <dbl> 7, 4, 2, 7, 7, 6, 4, NA, 7, 5, 5, 5, 6, 6, 7, 4,…
## $ untfreunde6           <dbl> 7, 4, 5, 7, 7, 7, 6, NA, 7, 5, 5, 6, 6, 6, 7, 5,…
## $ leben1                <dbl> 5, 5, 7, 1, 5, 4, 6, 7, 5, 5, 5, 5, 5, 5, 6, 5, …
## $ leben2                <dbl> 4, 4, 7, 6, 6, 5, 5, 5, 6, 5, 6, 3, 3, 6, 7, 7, …
## $ leben3                <dbl> 6, 5, 2, 6, 6, 6, 4, 6, 5, 4, 4, 4, 4, 5, 4, 4, …
## $ leben4                <dbl> 2, 4, 1, 5, 5, 5, 2, 6, 5, 4, 3, 5, 3, 4, 4, 4, …
## $ leben5                <dbl> 6, 5, 5, 6, 6, 6, 5, 5, 5, 5, 6, 4, 1, 5, 6, 6, …
## $ leben6                <dbl> 7, 5, 6, 6, 6, 5, 7, 7, 6, 6, 6, 5, 1, 6, 7, 1, …
## $ leben7                <dbl> 7, 5, 5, 7, 7, 6, 5, NA, 7, 5, 7, 6, 5, 7, 7, 7,…
## $ leben8                <dbl> 7, 5, 7, 7, 6, 6, 6, 6, 6, 5, 7, 5, 5, 4, 7, 6, …
## $ leben9                <dbl> 7, 5, 6, 7, 5, 5, 7, 6, 6, 4, 6, 5, 6, 5, 7, 7, …
## $ leben10               <dbl> 7, 4, 6, 7, 6, 5, 4, 5, 6, 5, 7, 5, 4, 6, 7, 6, …
## $ leben11               <dbl> 7, 4, 5, 6, 6, 6, 6, 6, 6, 6, 7, 5, 1, 6, 7, 7, …
## $ stress1               <dbl> 3, 4, 3, 4, 1, 1, 7, 1, 6, 1, 6, 4, 1, 2, 2, 2, …
## $ stress2               <dbl> 1, 4, 1, 1, 1, 1, 1, 1, 3, 1, 7, 2, 3, 1, 1, 1, …
## $ stress3               <dbl> 3, 2, 7, 4, 2, 2, 5, 3, 2, 1, 7, 7, 4, 1, 2, 2, …
## $ stress4               <dbl> 1, 2, 1, 6, 1, 1, 7, 3, 6, 1, 7, 4, 3, 1, 2, 1, …
## $ stress5               <dbl> 7, 5, 5, 3, 1, 3, 4, 3, 5, 5, 5, 6, 2, 7, 7, 7, …
## $ stress6               <dbl> 3, 6, 3, 3, 2, 1, 2, 3, 1, 5, 5, 1, 7, 4, 3, 1, …
## $ stress7               <dbl> 1, 2, 1, 1, 1, 1, 3, 2, 1, 3, 7, 2, 6, 1, 1, 1, …
## $ stress8               <dbl> 1, 3, 7, 2, 2, 1, 2, 3, 2, 3, 7, 3, 1, 1, 1, 1, …
## $ stress9               <dbl> 5, 2, 5, 1, 3, 1, 1, 3, 2, 3, 7, 4, 1, 1, 1, 7, …
## $ stress10              <dbl> 1, 4, 1, 2, 3, 1, 6, 5, 2, 2, 7, 3, 2, 2, 1, 3, …
## $ stress11              <dbl> 1, 5, 3, 1, 3, 1, 2, 5, 1, 1, 7, 2, 7, 4, 1, 1, …
## $ stress12              <dbl> 1, 5, 5, 2, 3, 1, 1, 3, 2, 3, 6, 1, 7, 4, 3, 3, …
## $ Deutschnote           <dbl> 4, 4, 5, 4, 4, 4, 4, 5, 4, 5, 4, 4, 4, 4, 4, 4, …
## $ Mathenote             <dbl> 3, 4, 5, 2, 4, 4, 4, 5, 4, 5, 4, 3, 4, 4, 4, 4, …
## $ Fremdsprachenote      <dbl> 5, 3, 4, 3, 4, 3, 4, 5, 3, 4, 3, 3, 4, 3, 5, 4, …
## $ Gesamtnote            <dbl> NA, 4, 5, 4, 5, 4, 4, 5, 4, 4, 4, 3, 4, 4, 4, 4,…
## $ bildung_vater         <dbl+lbl> 4, 3, 2, 2, 4, 2, 2, 4, 2, 3, 1, 1, 2, 1, 3,…
## $ bildung_mutter        <dbl+lbl> 4, 2, 1, 4, 2, 2, 2, 4, 2, 2, 2, 1, 1, 1, 2,…
## $ bildung_vater_binaer  <dbl+lbl> 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1,…
## $ bildung_mutter_binaer <dbl+lbl> 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,…
## $ swk_neueslernen       <dbl> 4.571429, 5.000000, 5.000000, 4.571429, 5.333333…
## $ swk_lernregulation    <dbl> 5.500000, 4.000000, 4.875000, 5.375000, 5.285714…
## $ swk_motivation        <dbl> 4.800000, 4.600000, 6.000000, 5.400000, 4.666667…
## $ swk_durchsetzung      <dbl> 5.333333, 5.000000, 4.333333, 5.333333, 4.666667…
## $ swk_sozialkomp        <dbl> 5.142857, 5.571429, 5.428571, 6.000000, 4.800000…
## $ swk_beziehung         <dbl> 6.166667, 4.666667, 5.500000, 6.833333, 6.200000…
## $ unt_eltern            <dbl> 6.666667, 5.166667, 6.833333, 7.000000, 6.666667…
## $ unt_freunde           <dbl> 6.666667, 4.833333, 4.000000, 7.000000, 7.000000…
## $ leben_selbst          <dbl> 5.5, 4.5, 6.5, 6.0, 6.0, 5.0, 6.0, 6.0, 6.0, 5.5…
## $ leben_familie         <dbl> 7.000000, 4.666667, 6.333333, 7.000000, 5.666667…
## $ leben_schule          <dbl> 4.333333, 4.666667, 3.333333, 4.000000, 5.333333…
## $ leben_freunde         <dbl> 6.5, 5.0, 5.0, 6.5, 6.5, 6.0, 5.0, 5.0, 6.0, 5.0…
## $ leben_gesamt          <dbl> 5.909091, 4.636364, 5.181818, 5.818182, 5.818182…
## $ stress_somatisch      <dbl> 3.000000, 3.833333, 3.333333, 3.500000, 1.333333…
## $ stress_psychisch      <dbl> 1.666667, 3.500000, 3.666667, 1.500000, 2.500000…

Now we can convert the labelled variables into factors:

beispieldaten$westost <- as_factor(beispieldaten$westost,
                                              levels = "default")
beispieldaten$geschlecht <- as_factor(beispieldaten$geschlecht,
                                              levels = "default")
beispieldaten$bildung_vater <- as_factor(beispieldaten$bildung_vater,
                                              levels = "default")
beispieldaten$bildung_mutter <- as_factor(beispieldaten$bildung_mutter,
                                              levels = "default")
beispieldaten$bildung_vater_binaer <- as_factor(beispieldaten$bildung_vater_binaer,
                                              levels = "default")
beispieldaten$bildung_mutter_binaer <- as_factor(beispieldaten$bildung_mutter_binaer,
                                              levels = "default")

3.2.2 Exporting an SPSS file

We can also save a data frame in the SPSS file format using the write_save() function:

write_sav(data = zufriedenheit_long, path = "data/zufriedenheit.sav")

3.3 Excel files

We can also import Excel spreadsheets. Click on Import Dataset > From Excel, and then choose the file you want to import. We will call the data frame zufriedenheit_xls. Underneath the Name text box there is a drop-down menu entitled Sheet. This allows you to specify which worksheet you want to import. You should get the following R code in the Code Preview:

library(readxl)
zufriedenheit_xls <- read_excel("data/zufriedenheit.xlsx",
    sheet = "zufriedenheit")

The function we are using is called read_excel() and is available in the readxl package (installed with tidyverse but has to be loaded separately).

Categorical variables should again be converted to factors:

zufriedenheit_xls$Vpn <- as.factor(zufriedenheit_xls$Vpn)
zufriedenheit_xls$messzeitpunkt <- as.factor(zufriedenheit_xls$messzeitpunkt)

3.4 RData files

The last option is an RData file. This is a binary file format and has the advantage that all attributes of an R data frame are included (e.g., factor definitions). This is in contrast to exporting to a text file, such as CSV, where all metadata will be lost. A further advantage is that we can combine multiple R objects (mostly data frames) in a single RData file and that the file is compressed in order to save space. The only disadvantage is that this file type is specific to R and cannot be read by other programs (such as jamovi).

You can save objects in your workspace as .RData (or .Rda) files with the function save():

save(zufriedenheit, file = "data/zufriedenheit.Rda")

You can also save several objects in one go:

save(zufriedenheit, zufriedenheit_spss, zufriedenheit_xls,
     file = "data/zufriedenheit_all.Rda")

The file can be loaded using the function load():

load(file = "data/zufriedenheit_all.Rda")