R Tutorials | R Exercises – 31-40 – Data Frame Manipulations

R Exercises – 31-40 – Data Frame Manipulations

R Tutorials
11 Jul, 2016
Exercise Database
dataframe, dataset, diamonds, exercise, ggplot2, iris, mtcars, r base

1. Working with the ‘mtcars’ dataset

a. Get a histogram of the ‘mpg’ values of ‘mtcars’. Which bin contains the most observations?

b. Are there more automatic (0) or manual (1) transmission-type cars in the dataset? Hint: ‘mtcars’ has 32 observations.

c. Get a scatter plot of ‘hp’ vs ‘weight’.

Reveal solution

a.
hist(mtcars$mpg)
Bin 15-20

sum(mtcars$am == 1)

sum(mtcars$am == 0)

There are more automatic (0) cars.

c.

plot(mtcars$wt, mtcars$hp, col = "red", pch = 3,

bty = "l", xlab = "Car Weight in 1K lbs", ylab = "Horsepower" )

2. Working with the ‘iris’ dataset

a. Get all rows of Species ‘versicolor’ in a new data frame. Call this data frame: ‘iris.vers’

#expected result

   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
51          7.0         3.2          4.7         1.4 versicolor
52          6.4         3.2          4.5         1.5 versicolor
53          6.9         3.1          4.9         1.5 versicolor
54          5.5         2.3          4.0         1.3 versicolor
55          6.5         2.8          4.6         1.5 versicolor
56          5.7         2.8          4.5         1.3 versicolor
57          6.3         3.3          4.7         1.6 versicolor
58          4.9         2.4          3.3         1.0 versicolor
59          6.6         2.9          4.6         1.3 versicolor
60          5.2         2.7          3.9         1.4 versicolor

b. Get a vector called ‘sepal.dif’ with the difference between ‘Sepal.Length’ and ‘Sepal.Width’ of ‘versicolor’ plants.

#expected result

[1]  3.8 3.2 3.8 3.2 3.7 2.9 3.0 2.5 3.7 2.5 3.0 2.9 3.8 3.2 2.7 3.6 2.6 3.1 4.0
[20] 3.1 2.7 3.3 3.8 3.3 3.5 3.6 4.0 3.7 3.1 3.1 3.1 3.1 3.1 3.3 2.4 2.6 3.6 4.0
[39] 2.6 3.0 2.9 3.1 3.2 2.7 2.9 2.7 2.8 3.3 2.6 2.9

c. Update (add) ‘iris.vers’ with the new column ‘sepal.dif’.

#expected result

   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species sepal.dif
51          7.0         3.2          4.7         1.4 versicolor       3.8
52          6.4         3.2          4.5         1.5 versicolor       3.2
53          6.9         3.1          4.9         1.5 versicolor       3.8
54          5.5         2.3          4.0         1.3 versicolor       3.2
55          6.5         2.8          4.6         1.5 versicolor       3.7
56          5.7         2.8          4.5         1.3 versicolor       2.9

a.
iris.vers = subset(iris, Species == "versicolor"); iris.vers

b.
sepal.dif = iris.vers$Sepal.Length - iris.vers$Sepal.Width; sepal.dif

c.
iris.vers = data.frame(iris.vers, sepal.dif); head(iris.vers)

3. Classes of Variables (‘mtcars’)

a. Check the class of each variable in ‘mtcars’.

#expected result

      mpg       cyl      disp        hp      drat        wt      qsec        vs 
"numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" 

       am      gear      carb 
"numeric" "numeric" "numeric"

b. Change ‘am’, ‘cyl’ and ‘vs’ to integer and store the new dataset as ‘newmtc’.

#expected result

       mpg as.integer.cyl.     disp         hp      drat 
 "numeric"       "integer" "numeric" "numeric" "numeric" 

        wt      qsec as.integer.vs. as.integer.am.      gear 
 "numeric" "numeric"      "integer"      "integer" "numeric" 

      carb 
 "numeric"

c. Round the ‘newmtc’ data frame to one digit.

#expected result

    mpg as.integer.cyl.  disp  hp drat  wt qsec as.integer.vs. as.integer.am. gear carb
1  21.0               6 160.0 110  3.9 2.6 16.5              0              1    4    4
2  21.0               6 160.0 110  3.9 2.9 17.0              0              1    4    4
3  22.8               4 108.0  93  3.8 2.3 18.6              1              1    4    1
4  21.4               6 258.0 110  3.1 3.2 19.4              1              0    3    1
5  18.7               8 360.0 175  3.1 3.4 17.0              0              0    3    2
6  18.1               6 225.0 105  2.8 3.5 20.2              1              0    3    1
7  14.3               8 360.0 245  3.2 3.6 15.8              0              0    3    4
8  24.4               4 146.7  62  3.7 3.2 20.0              1              0    4    2
9  22.8               4 140.8  95  3.9 3.1 22.9              1              0    4    2
10 19.2               6 167.6 123  3.9 3.4 18.3              1              0    4    4

Reveal solution

head(mtcars)

sapply(mtcars, class)

attach(mtcars)

newmtc = data.frame(mpg, as.integer(cyl), disp, hp, drat, wt, qsec,

as.integer(vs), as.integer(am), gear, carb)

sapply(newmtc, class)

c.
round(newmtc, 1)

4. Advanced filtering of ‘iris’

a. Use ‘dplyr’ to filter for all data of Species ‘virginica’ with a ‘Sepal.Width’ of greater than 3.5.

#expected result

  Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
1          7.2         3.6          6.1         2.5 virginica
2          7.7         3.8          6.7         2.2 virginica
3          7.9         3.8          6.4         2.0 virginica

b. How would you use R Base to get a data frame of all data of Species ‘virginica’ with a ‘Sepal.Width’ of greater than 3.5, but without the last column Species in the data frame?

#expected result

    Sepal.Length Sepal.Width Petal.Length Petal.Width
110          7.2         3.6          6.1         2.5
118          7.7         3.8          6.7         2.2
132          7.9         3.8          6.4         2.0

c. Get the row IDs of the rows matching the two filtering criteria provided above.

#expected result

[1] "110" "118" "132"

Reveal solution

library(dplyr)

filter(iris, Sepal.Width > 3.5, Species =="virginica")

b.
iris[iris$Species == "virginica" & iris$Sepal.Width > 3.5, 1:4]

c.
row.names(iris[iris$Species == "virginica" & iris$Sepal.Width > 3.5, 1:4])

5. Manipulating data frames at column level (dataset = ‘iris’)

a. Repeat each value of ‘Sepal.Length’ two times and repeat the whole sequence two times as well.

#expected result

[1]   5.1 5.1 4.9 4.9 4.7 4.7 4.6 4.6 5.0 5.0 5.4 5.4 4.6 4.6 5.0 5.0 4.4 4.4 4.9 4.9 5.4 5.4 4.8
[24]  4.8 4.8 4.8 4.3 4.3 5.8 5.8 5.7 5.7 5.4 5.4 5.1 5.1 5.7 5.7 5.1 5.1 5.4 5.4 5.1 5.1 4.6 4.6
[47]  5.1 5.1 4.8 4.8 5.0 5.0 5.0 5.0 5.2 5.2 5.2 5.2 4.7 4.7 4.8 4.8 5.4 5.4 5.2 5.2 5.5 5.5 4.9
[70]  4.9 5.0 5.0 5.5 5.5 4.9 4.9 4.4 4.4 5.1 5.1 5.0 5.0 4.5 4.5 4.4 4.4 5.0 5.0 5.1 5.1 4.8 4.8
[93]  5.1 5.1 4.6 4.6 5.3 5.3 5.0 5.0 7.0 7.0 6.4 6.4 6.9 6.9 5.5 5.5 6.5 6.5 5.7 5.7 6.3 6.3 4.9
[116] 4.9 6.6 6.6 5.2 5.2 5.0 5.0 5.9 5.9 6.0 6.0 6.1 6.1 5.6 5.6 6.7 6.7 5.6 5.6 5.8 5.8 6.2 6.2
[139] 5.6 5.6 5.9 5.9 6.1 6.1 6.3 6.3 6.1 6.1 6.4 6.4 6.6 6.6 6.8 6.8 6.7 6.7 6.0 6.0 5.7 5.7 5.5
[162] 5.5 5.5 5.5 5.8 5.8 6.0 6.0 5.4 5.4 6.0 6.0 6.7 6.7 6.3 6.3 5.6 5.6 5.5 5.5 5.5 5.5 6.1 6.1
[185] 5.8 5.8 5.0 5.0 5.6 5.6 5.7 5.7 5.7 5.7 6.2 6.2 5.1 5.1 5.7 5.7 6.3 6.3 5.8 5.8 7.1 7.1 6.3
[208] 6.3 6.5 6.5 7.6 7.6 4.9 4.9 7.3 7.3 6.7 6.7 7.2 7.2 6.5 6.5 6.4 6.4 6.8 6.8 5.7 5.7 5.8 5.8
[231] 6.4 6.4 6.5 6.5 7.7 7.7 7.7 7.7 6.0 6.0 6.9 6.9 5.6 5.6 7.7 7.7 6.3 6.3 6.7 6.7 7.2 7.2 6.2
[254] 6.2 6.1 6.1 6.4 6.4 7.2 7.2 7.4 7.4 7.9 7.9 6.4 6.4 6.3 6.3 6.1 6.1 7.7 7.7 6.3 6.3 6.4 6.4
[277] 6.0 6.0 6.9 6.9 6.7 6.7 6.9 6.9 5.8 5.8 6.8 6.8 6.7 6.7 6.7 6.7 6.3 6.3 6.5 6.5 6.2 6.2 5.9
[300] 5.9 5.1 5.1 4.9 4.9 4.7 4.7 4.6 4.6 5.0 5.0 5.4 5.4 4.6 4.6 5.0 5.0 4.4 4.4 4.9 4.9 5.4 5.4
[323] 4.8 4.8 4.8 4.8 4.3 4.3 5.8 5.8 5.7 5.7 5.4 5.4 5.1 5.1 5.7 5.7 5.1 5.1 5.4 5.4 5.1 5.1 4.6
[346] 4.6 5.1 5.1 4.8 4.8 5.0 5.0 5.0 5.0 5.2 5.2 5.2 5.2 4.7 4.7 4.8 4.8 5.4 5.4 5.2 5.2 5.5 5.5
[369] 4.9 4.9 5.0 5.0 5.5 5.5 4.9 4.9 4.4 4.4 5.1 5.1 5.0 5.0 4.5 4.5 4.4 4.4 5.0 5.0 5.1 5.1 4.8
[392] 4.8 5.1 5.1 4.6 4.6 5.3 5.3 5.0 5.0 7.0 7.0 6.4 6.4 6.9 6.9 5.5 5.5 6.5 6.5 5.7 5.7 6.3 6.3
[415] 4.9 4.9 6.6 6.6 5.2 5.2 5.0 5.0 5.9 5.9 6.0 6.0 6.1 6.1 5.6 5.6 6.7 6.7 5.6 5.6 5.8 5.8 6.2
[438] 6.2 5.6 5.6 5.9 5.9 6.1 6.1 6.3 6.3 6.1 6.1 6.4 6.4 6.6 6.6 6.8 6.8 6.7 6.7 6.0 6.0 5.7 5.7
[461] 5.5 5.5 5.5 5.5 5.8 5.8 6.0 6.0 5.4 5.4 6.0 6.0 6.7 6.7 6.3 6.3 5.6 5.6 5.5 5.5 5.5 5.5 6.1
[484] 6.1 5.8 5.8 5.0 5.0 5.6 5.6 5.7 5.7 5.7 5.7 6.2 6.2 5.1 5.1 5.7 5.7 6.3 6.3 5.8 5.8 7.1 7.1
[507] 6.3 6.3 6.5 6.5 7.6 7.6 4.9 4.9 7.3 7.3 6.7 6.7 7.2 7.2 6.5 6.5 6.4 6.4 6.8 6.8 5.7 5.7 5.8
[530] 5.8 6.4 6.4 6.5 6.5 7.7 7.7 7.7 7.7 6.0 6.0 6.9 6.9 5.6 5.6 7.7 7.7 6.3 6.3 6.7 6.7 7.2 7.2
[553] 6.2 6.2 6.1 6.1 6.4 6.4 7.2 7.2 7.4 7.4 7.9 7.9 6.4 6.4 6.3 6.3 6.1 6.1 7.7 7.7 6.3 6.3 6.4
[576] 6.4 6.0 6.0 6.9 6.9 6.7 6.7 6.9 6.9 5.8 5.8 6.8 6.8 6.7 6.7 6.7 6.7 6.3 6.3 6.5 6.5 6.2 6.2
[599] 5.9 5.9

b. Get a new object which contains only the odd values of ‘Sepal.Length’.

#expected result

[1]  5.1 4.7 5.0 4.6 4.4 5.4 4.8 5.8 5.4 5.7 5.4 4.6 4.8 5.0 5.2 4.8 5.2 4.9 5.5 4.4 5.0 4.4 5.1
[24] 5.1 5.3 7.0 6.9 6.5 6.3 6.6 5.0 6.0 5.6 5.6 6.2 5.9 6.3 6.4 6.8 6.0 5.5 5.8 5.4 6.7 5.6 5.5
[47] 5.8 5.6 5.7 5.1 6.3 7.1 6.5 4.9 6.7 6.5 6.8 5.8 6.5 7.7 6.9 7.7 6.7 6.2 6.4 7.4 6.4 6.1 6.3
[70] 6.0 6.7 5.8 6.7 6.3 6.2

c. Get a new object which repeats each value from the new vector of exercise b.

#expected result

[1]   5.1 5.1 4.7 4.7 5.0 5.0 4.6 4.6 4.4 4.4 5.4 5.4 4.8 4.8 5.8 5.8 5.4 5.4 5.7 5.7 5.4 5.4 4.6
[24]  4.6 4.8 4.8 5.0 5.0 5.2 5.2 4.8 4.8 5.2 5.2 4.9 4.9 5.5 5.5 4.4 4.4 5.0 5.0 4.4 4.4 5.1 5.1
[47]  5.1 5.1 5.3 5.3 7.0 7.0 6.9 6.9 6.5 6.5 6.3 6.3 6.6 6.6 5.0 5.0 6.0 6.0 5.6 5.6 5.6 5.6 6.2
[70]  6.2 5.9 5.9 6.3 6.3 6.4 6.4 6.8 6.8 6.0 6.0 5.5 5.5 5.8 5.8 5.4 5.4 6.7 6.7 5.6 5.6 5.5 5.5
[93]  5.8 5.8 5.6 5.6 5.7 5.7 5.1 5.1 6.3 6.3 7.1 7.1 6.5 6.5 4.9 4.9 6.7 6.7 6.5 6.5 6.8 6.8 5.8
[116] 5.8 6.5 6.5 7.7 7.7 6.9 6.9 7.7 7.7 6.7 6.7 6.2 6.2 6.4 6.4 7.4 7.4 6.4 6.4 6.1 6.1 6.3 6.3
[139] 6.0 6.0 6.7 6.7 5.8 5.8 6.7 6.7 6.3 6.3 6.2 6.2

d. Replace the ‘Sepal.Length’ column of ‘iris’ with the new ‘Sepal.Length’ from exercise c. Check if the replacement worked.

#expected result

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           5.1         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.7         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.0         3.9          1.7         0.4  setosa

Reveal solution

a.
rep(iris$Sepal.Length, each = 2, times = 2)

b.
sep.lengthodd = iris[c(T,F),1]

c.
newsep.length = rep(sep.lengthodd, each = 2)

d.
iris$Sepal.Length = newsep.length; head(iris)

6. Big Data Example with ‘diamonds’ (package: ‘ggplot2’)

a. Get familiar with the dataset ‘diamonds’ from ‘ggplot2’.

b. Attach the dataset.

c. Get a subset from the diamonds dataset with all the rows that have a clarity of ‘SI2’ and a depth of at least 70. Call the subset ‘diam.sd’ and display it in the same line of code.

#expected result

# A tibble: 6 x 10
   carat   cut color clarity depth table price     x     y     z
   <dbl> <ord> <ord>   <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1   1.50  Fair     I     SI2  70.1    58  4328  6.96  6.85  4.84
2   2.00  Fair     F     SI2  70.2    57 15351  7.63  7.59  5.34
3   0.70  Fair     D     SI2  71.6    55  1696  5.47  5.28  3.85
4   0.70  Fair     E     SI2  70.6    56  1828  5.45  5.35  3.81
5   1.00  Fair     G     SI2  70.2    58  2326  6.00  5.73  4.13
6   0.96  Fair     G     SI2  72.2    56  2438  6.01  5.81  4.28

d. Which index positions have a clarity of ‘SI2’ and a depth of at least 70? (hint: ‘row.names’)

#expected result

[1] "1" "2" "3" "4" "5" "6"

e. Store the index positions as an integer object.

#expected result

[1] 1 2 3 4 5 6

Reveal solution

library(ggplot2)

? diamonds

summary(diamonds)

b.
attach(diamonds)

c.
diam.sd = subset(diamonds, clarity == "SI2" & depth >= 70); diam.sd

d.
row.names(diam.sd)

e.
index.pos = as.integer(row.names(diam.sd)); index.pos

7. Getting counts of filtered datasets (‘ggplot2’ : ‘diamonds’)

a. How many observations of diamonds have a cut of ‘ideal’ and have less than 0.21 carat?

b. How many observations of diamonds have a combined ‘x’ + ‘y’ + ‘z’ dimension greater than 40?

c. How many observations of diamonds have either a price above 10.000 USD or a depth of at least 70?

Reveal solution

a.
sum(cut == "Ideal" & carat < 0.21)
'diamonds' is attached to environment

b.
sum ((x + y + z) > 40)

c.
sum(price > 10000 | depth >= 70)

8. Filtering based on row and column ID (‘ggplot2’: ‘diamonds’)

a. Get a data frame with observations ’67’ and ‘982’ of variables color and y.

#expected result

# A tibble: 2 x 2
  color     y
  <ord> <dbl>
1     I  4.42
2     F  5.76

b. Get a data frame with the full info on observations ‘453’, ‘792’ and ‘10489’.

#expected result

# A tibble: 3 x 10
   carat       cut color clarity depth table price     x     y     z
   <dbl>     <ord> <ord>   <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1   0.71     Ideal     I    VVS2  60.2    56  2817  5.84  5.89  3.53
2   0.70 Very Good     D     SI1  62.5    55  2862  5.67  5.72  3.56
3   1.01     Ideal     F     SI2  60.0    57  4796  6.55  6.51  3.92

c. Get the first 10 rows of the dataset ‘diamonds’ with the variables ‘x’, ‘y’, ‘z’.

#expected result

# A tibble: 10 x 3
      x     y     z
  <dbl> <dbl> <dbl>
1  3.95  3.98  2.43
2  3.89  3.84  2.31
3  4.05  4.07  2.31
4  4.20  4.23  2.63
5  4.34  4.35  2.75
6  3.94  3.96  2.48
7  3.95  3.98  2.47
8  4.07  4.11  2.53
9  3.87  3.78  2.49
10 4.00  4.05  2.39

d. Get the first six values of ‘diamonds’ of the variable ‘y’ as a simple vector.

#expected result

# A tibble: 6 x 1
      y
  <dbl>
1  3.98
2  3.84
3  4.07
4  4.23
5  4.35
6  3.96

Reveal solution

a.
diamonds[c(67,982), c(3,9)]

b.
diamonds[c(453, 792, 10489), ]

c.
head(diamonds[ , c(8,9,10)],10)

d.
head(diamonds[,"y"])
different output format: 'vector'

9. Ordering data (‘ggplot2’: ‘diamonds’)

a. Create the object ‘newdiam’ which is a subset of the first 1000 rows of ‘diamonds’.

#expected result

# A tibble: 1,000 x 10
  carat       cut color clarity depth table price     x     y     z
  <dbl>     <ord> <ord>   <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1  0.23     Ideal     E     SI2  61.5    55   326  3.95  3.98  2.43
2  0.21   Premium     E     SI1  59.8    61   326  3.89  3.84  2.31
3  0.23      Good     E     VS1  56.9    65   327  4.05  4.07  2.31
4  0.29   Premium     I     VS2  62.4    58   334  4.20  4.23  2.63
5  0.31      Good     J     SI2  63.3    58   335  4.34  4.35  2.75
6  0.24 Very Good     J    VVS2  62.8    57   336  3.94  3.96  2.48
7  0.24 Very Good     I    VVS1  62.3    57   336  3.95  3.98  2.47

b. Order ‘newdiam’ according to price, starting with the lowest. Hint: ‘dplyr’, ‘arrange’ is a useful function for that.

#expected result

# A tibble: 6 x 10
  carat       cut color clarity depth table price     x     y     z
  <dbl>     <ord> <ord>   <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1  0.23     Ideal     E     SI2  61.5    55   326  3.95  3.98  2.43
2  0.21   Premium     E     SI1  59.8    61   326  3.89  3.84  2.31
3  0.23      Good     E     VS1  56.9    65   327  4.05  4.07  2.31
4  0.29   Premium     I     VS2  62.4    58   334  4.20  4.23  2.63
5  0.31      Good     J     SI2  63.3    58   335  4.34  4.35  2.75
6  0.24 Very Good     J    VVS2  62.8    57   336  3.94  3.96  2.48

c. Order ‘newdiam’ according to price, starting with the highest.

#expected result

# A tibble: 6 x 10
  carat       cut color clarity depth table price     x     y     z
  <dbl>     <ord> <ord>   <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1  1.12   Premium     J     SI2  60.6    59  2898  6.68  6.61  4.03
2  0.60 Very Good     D    VVS2  60.6    57  2897  5.48  5.51  3.33
3  0.76   Premium     E     SI1  61.1    58  2897  5.91  5.85  3.59
4  0.54     Ideal     D    VVS2  61.4    52  2897  5.30  5.34  3.26
5  0.72     Ideal     E     SI1  62.5    55  2897  5.69  5.74  3.57
6  0.72      Good     F     VS1  59.4    61  2897  5.82  5.89  3.48

d. Order ‘newdiam’ like in exercise c, but ties should be arranged with increasing depth.

#expected result

# A tibble: 6 x 10
  carat       cut color clarity depth table price     x     y     z
  <dbl>     <ord> <ord>   <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1  1.12   Premium     J     SI2  60.6    59  2898  6.68  6.61  4.03
2  0.72      Good     F     VS1  59.4    61  2897  5.82  5.89  3.48
3  0.60 Very Good     D    VVS2  60.6    57  2897  5.48  5.51  3.33
4  0.76   Premium     E     SI1  61.1    58  2897  5.91  5.85  3.59
5  0.54     Ideal     D    VVS2  61.4    52  2897  5.30  5.34  3.26
6  0.74   Premium     D     VS2  61.8    58  2897  5.81  5.77  3.58

Reveal solution

a.
newdiam = diamonds[1:1000,]

b.

library(dplyr)

head(arrange(newdiam, price))

c.
head(arrange(newdiam, desc(price)))

d.
head(arrange(newdiam, desc(price), depth))

10. Sampling big data frames (‘ggplot2’: ‘diamonds’)

a. Use ‘dplyr’, ‘sample_n’ to get the object ‘diam750’ which contains 750 randomly sampled observations of ‘diamonds’. I use seed nr. 56 for reproduction.

#expected result

# A tibble: 750 x 10
  carat       cut color clarity depth table price     x     y     z
  <dbl>     <ord> <ord>   <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1  1.50     Ideal     I     VS1  61.3    57  9900  7.32  7.35  4.50
2  0.33     Ideal     E    VVS1  60.2    57  1052  4.50  4.47  2.70
3  1.02   Premium     F     VS2  62.4    59  6652  6.40  6.45  4.01
4  0.41   Premium     F     VS1  62.6    58  1076  4.75  4.74  2.97
5  1.08   Premium     F     VS2  62.1    59  7637  6.59  6.54  4.08
6  1.01   Premium     I     SI2  62.6    55  3818  6.43  6.38  4.01
7  1.05 Very Good     G     SI1  59.7    58  5019  6.60  6.66  3.96
8  1.00     Ideal     F     SI2  61.2    57  4452  6.48  6.39  3.94
9  0.55   Premium     H     SI2  59.8    59  1011  5.33  5.30  3.18
10 2.51      Fair     H     SI2  64.7    57 18308  8.44  8.50  5.48
# ... with 740 more rows

b. Get a summary of the new data frame.

#expected result

     carat               cut    color     clarity      depth           table 
 Min.   :0.2300      Fair : 25  D: 73    SI1  :171  Min.   :56.70  Min.   :53.00 
 1st Qu.:0.4000      Good : 70  E:144    VS2  :154  1st Qu.:61.00  1st Qu.:56.00 
 Median :0.7050  Very Good:182  F:128    SI2  :141  Median :61.80  Median :57.00 
 Mean   :0.8192   Premium :187  G:169    VS1  :120  Mean   :61.77  Mean   :57.51 
 3rd Qu.:1.0500     Ideal :286  H:127    VVS2 : 72  3rd Qu.:62.60  3rd Qu.:59.00 
 Max.   :4.1300                 I: 65    VVS1 : 56  Max.   :69.70  Max.   :67.00 
                                J: 44  (Other): 36 
 
       price             x               y              z 
 Min.   :  361.0  Min.   : 3.910  Min.   :3.960  Min.   :0.000 
 1st Qu.:  982.2  1st Qu.: 4.700  1st Qu.:4.710  1st Qu.:2.913 
 Median : 2385.5  Median : 5.700  Median :5.705  Median :3.535 
 Mean   : 4066.9  Mean   : 5.765  Mean   :5.766  Mean   :3.556 
 3rd Qu.: 5400.8  3rd Qu.: 6.548  3rd Qu.:6.555  3rd Qu.:4.050 
 Max.   :18804.0  Max.   :10.000  Max.   :9.850  Max.   :6.430

c. Plot a scatter plot of price vs depth. Use R Base plot, and the function ‘with’ (less code).

Reveal solution

library(dplyr)

set.seed(56)

diam750 = sample_n(diamonds, 750)

b.
summary(diam750)

c.
with(diam750, plot(price, depth))

R Tutorials Training