R has a vast ecosystem of packages that add new functions. Any installed package can be loaded with the library() function.
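As a small illustration, the sketch below loads the tools package and then calls one of its functions. The choice of tools is only an assumption made so the example runs without installing anything (it ships with R); the same pattern applies to packages such as tidyverse or stat20data once they are installed.

```r
# Load a package that is already installed (tools ships with R).
library(tools)

# Functions from the package can now be called by name.
# file_ext() returns the extension of a file name.
ext <- file_ext("penguins.csv")
print(ext)
```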
Most data you will not be creating by hand. You will either be:
Loading it in from a separate file.
Loading it from within an R package (most of ours are in stat20data).
To load data from a package, load the package with library(), then open the data frame with View(<df name>).
# A tibble: 333 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen 36.7 19.3 193 3450
5 Adelie Torgersen 39.3 20.6 190 3650
6 Adelie Torgersen 38.9 17.8 181 3625
7 Adelie Torgersen 39.2 19.6 195 4675
8 Adelie Torgersen 41.1 17.6 182 3200
9 Adelie Torgersen 38.6 21.2 191 3800
10 Adelie Torgersen 34.6 21.1 198 4400
# ℹ 323 more rows
# ℹ 2 more variables: sex <fct>, year <int>
tidyverse
The tidyverse package (through dplyr, which it loads) provides several functions used to manipulate data frames:
select(): subset columns
arrange(): sort rows
mutate(): create a new column from existing column(s)
# A tibble: 333 × 2
species island
<fct> <fct>
1 Adelie Torgersen
2 Adelie Torgersen
3 Adelie Torgersen
4 Adelie Torgersen
5 Adelie Torgersen
6 Adelie Torgersen
7 Adelie Torgersen
8 Adelie Torgersen
9 Adelie Torgersen
10 Adelie Torgersen
# ℹ 323 more rows
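A minimal sketch of how select() might be used to get a two-column result like the one above. The data frame penguins_mini is hypothetical: it is typed in by hand from the first three rows of the penguin output shown earlier so that the example is self-contained.

```r
library(dplyr)

# Hypothetical mini data frame built from the first three rows shown earlier.
penguins_mini <- data.frame(
  species = c("Adelie", "Adelie", "Adelie"),
  island = c("Torgersen", "Torgersen", "Torgersen"),
  bill_length_mm = c(39.1, 39.5, 40.3),
  bill_depth_mm = c(18.7, 17.4, 18.0)
)

# Keep only the species and island columns; all rows are retained.
species_island <- select(penguins_mini, species, island)
print(species_island)
```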
# A tibble: 333 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Dream 32.1 15.5 188 3050
2 Adelie Dream 33.1 16.1 178 2900
3 Adelie Torgersen 33.5 19 190 3600
4 Adelie Dream 34 17.1 185 3400
5 Adelie Torgersen 34.4 18.4 184 3325
6 Adelie Biscoe 34.5 18.1 187 2900
7 Adelie Torgersen 34.6 21.1 198 4400
8 Adelie Torgersen 34.6 17.2 189 3200
9 Adelie Biscoe 35 17.9 190 3450
10 Adelie Biscoe 35 17.9 192 3725
# ℹ 323 more rows
# ℹ 2 more variables: sex <fct>, year <int>
You can sort in descending order by wrapping the variable name in desc().
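A short sketch of sorting in both directions with arrange() and desc(). The data frame penguins_mini is hypothetical, typed in from the first four rows of the penguin output shown earlier.

```r
library(dplyr)

# Hypothetical mini data frame built from the first four rows shown earlier.
penguins_mini <- data.frame(
  island = c("Torgersen", "Torgersen", "Torgersen", "Torgersen"),
  bill_length_mm = c(39.1, 39.5, 40.3, 36.7)
)

# Ascending order is the default.
shortest_first <- arrange(penguins_mini, bill_length_mm)

# Wrapping the variable in desc() reverses the order.
longest_first <- arrange(penguins_mini, desc(bill_length_mm))

print(shortest_first)
print(longest_first)
```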
# A tibble: 333 × 9
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen 36.7 19.3 193 3450
5 Adelie Torgersen 39.3 20.6 190 3650
6 Adelie Torgersen 38.9 17.8 181 3625
7 Adelie Torgersen 39.2 19.6 195 4675
8 Adelie Torgersen 41.1 17.6 182 3200
9 Adelie Torgersen 38.6 21.2 191 3800
10 Adelie Torgersen 34.6 21.1 198 4400
# ℹ 323 more rows
# ℹ 3 more variables: sex <fct>, year <int>, bill_index <dbl>
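A sketch of how mutate() might add a column like bill_index. The definition used here, bill length plus bill depth, is an assumption: it is consistent with the values shown (for example, 39.1 + 18.7 = 57.8), but the original definition is not stated. The data frame penguins_mini is hypothetical, typed in from the first three rows of the output.

```r
library(dplyr)

# Hypothetical mini data frame built from the first three rows shown above.
penguins_mini <- data.frame(
  species = c("Adelie", "Adelie", "Adelie"),
  bill_length_mm = c(39.1, 39.5, 40.3),
  bill_depth_mm = c(18.7, 17.4, 18.0)
)

# Assumed definition: bill_index is the sum of bill length and bill depth.
with_index <- mutate(penguins_mini, bill_index = bill_length_mm + bill_depth_mm)
print(with_index)
```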
Remember that you can nest functions.
# A tibble: 333 × 1
bill_index
<dbl>
1 57.8
2 56.9
3 58.3
4 56
5 59.9
6 56.7
7 58.8
8 58.7
9 59.8
10 55.7
# ℹ 323 more rows
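One way nesting might produce a single-column result like the one above: the inner call to mutate() runs first, and its output becomes the first argument of select(). As before, penguins_mini is a hypothetical hand-typed data frame, and defining bill_index as length plus depth is an assumption that matches the displayed values.

```r
library(dplyr)

# Hypothetical mini data frame built from the first three rows shown earlier.
penguins_mini <- data.frame(
  bill_length_mm = c(39.1, 39.5, 40.3),
  bill_depth_mm = c(18.7, 17.4, 18.0)
)

# The inner mutate() adds bill_index; the outer select() keeps only that column.
index_only <- select(
  mutate(penguins_mini, bill_index = bill_length_mm + bill_depth_mm),
  bill_index
)
print(index_only)
```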
There is a data set built into R called mtcars that has information on cars that appeared in Motor Trend magazine. It’s already loaded and can be accessed as mtcars.
Create a slimmer data frame that only contains the columns hp and wt and save it to mtcars_slim.
Create a new column called power_to_weight that is the ratio of hp to wt. Save the three-column data frame back over mtcars_slim.
Sort the data frame in descending order by the power-to-weight ratio.
Hint: look up help files!
Break
The table below displays data from a survey on a class of students.
What proportion of the class was in the marching band?
What proportion of those in the marching band were juniors?
What proportion were sophomores who were not in the marching band?
What were the dimensions of the raw data from which this table was constructed?
How would you characterize the association between these two variables?
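The three kinds of proportion asked about above (marginal, conditional, and joint) can be sketched in a few lines of R. The counts below are hypothetical, made up purely for illustration, and are not the survey data from the table.

```r
# Hypothetical counts (made up for illustration, not the class survey).
counts <- matrix(
  c(10, 30,
     5, 55),
  nrow = 2, byrow = TRUE,
  dimnames = list(year = c("sophomore", "junior"), band = c("yes", "no"))
)
total <- sum(counts)

# Marginal proportion: share of the whole class that is in the band.
prop_band <- sum(counts[, "yes"]) / total

# Conditional proportion: among band members only, the share who are juniors.
prop_junior_given_band <- counts["junior", "yes"] / sum(counts[, "yes"])

# Joint proportion: share of the whole class that is both sophomore and not in band.
prop_soph_not_band <- counts["sophomore", "no"] / total

print(c(prop_band, prop_junior_given_band, prop_soph_not_band))
```

The key difference is the denominator: marginal and joint proportions divide by the whole class, while a conditional proportion divides by the size of the group being conditioned on.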
Political affiliation and college degree status of 500 survey participants.
Which group is the largest?
What does this plot show?
A template for a line plot:
Where:
DATAFRAME is the name of your data frame
XVARIABLE is the name of the variable of that data frame that you want on the x-axis
YVARIABLE is the name of the variable of that data frame that you want on the y-axis
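A sketch of what a line plot built from these three placeholders might look like, assuming the ggplot2 package is used. The data frame temps and its columns day and high_temp are hypothetical, invented only to fill in the placeholders.

```r
library(ggplot2)

# Hypothetical data frame used only to fill in the placeholders.
temps <- data.frame(
  day = 1:5,
  high_temp = c(61, 64, 63, 67, 70)
)

# DATAFRAME = temps, XVARIABLE = day, YVARIABLE = high_temp
p <- ggplot(temps, aes(x = day, y = high_temp)) +
  geom_line()
print(p)
```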