I don’t know how to handle this Programming question and need guidance.
Lab 2
Your Name Here Date Here
This lab will explore some aspects of the tidyverse library in R. Below are three data sets that have already been loaded using the read_csv() function from readr. NOTE: if you get an error loading the data, you need to either: 1) make sure your lab_r.Rmd file is in the same folder as the data files, or 2) set the file paths correctly inside the read_csv() functions.
For this lab, each question has pieces of code that need to be filled in. You will need to read the comments following the code to determine what you need to do. Because the code below is incomplete, the lab will not compile (knit) until it is complete. I suggest you work in a separate RStudio script and copy your code into the lab once it is complete.
library(tidyverse)
## -- Attaching packages --------------------------------------
## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.3     v dplyr   1.0.2
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0
## -- Conflicts -----------------------------------------------
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
Question 1
Let’s assume you need to take data from the GPS Training Data and make a plot comparing each player’s average distance, maximum velocity, and total load over the duration of the training session. Decomposing this into smaller steps, we will need to:
1. Group information by players,
2. Summarize the pieces of information by player,
3. Reorganize the data so it can be plotted, and
4. Plot the data.
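The first three steps can be tried on a small made-up table before touching the real data. The sketch below is only an illustration: the tibble `toy` and its columns `Player Name`, `Distance`, `Velocity`, and `Load` are invented here and may not match the column names in the training file.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the training data; column names are assumptions
toy <- tibble(
  `Player Name` = c("A", "A", "B", "B"),
  Distance = c(10, 20, 30, 50),
  Velocity = c(3, 5, 4, 6),
  Load = c(100, 200, 150, 250)
)

# Steps 1 and 2: group by player, then reduce each group to a single row
toy_summary <- toy %>%
  group_by(`Player Name`) %>%
  summarise(
    Dist_mean = mean(Distance),
    Total_load = sum(Load),
    Max_Vel = max(Velocity)
  ) %>%
  rename(Player_Name = `Player Name`)

# Step 3: wide to long, keeping the player column as the identifier
toy_long <- toy_summary %>%
  pivot_longer(-Player_Name, names_to = "Measure", values_to = "Val")

print(toy_long)
```

After the pivot, each player has one row per measure, which is the shape ggplot needs in order to map `Measure` to a fill color.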
Below is a skeleton of the code that follows these four steps. You need to fill in the code with the correct arguments. Be sure to read the comments: not everything has to be filled in, just pieces.
match <- read_csv("GPS_Match_Data_deidentified.csv", col_types = cols())
train <- read_csv("GPS_Training_Data_deidentified.csv", col_types = cols())
wellness <- read_csv("Wellness_Data_deidentified.csv", col_types = cols())
# this creates a new data frame, player_df, that has average distance, maximum velocity, and total load
player_df = train %>% # select the data set you wish to pull information from
  group_by() %>% # FILL IN, this should choose your grouping variable
  summarise(Dist_mean = mean(), Total_load = sum(), Max_Vel = max()) %>% # FILL IN, this should summariz[...]
  rename(Player_Name = ) # relabel the Player Name column to a more usable name. This is NOT required, b[...]
# this reorganizes the data frame, player_df, from wide format to long format. We do this because ggplot[...]
player_df = player_df %>% # select the data set you wish to manipulate
  pivot_longer(-Player_Name, names_to = "Measure", values_to = "Val") # pivots the df from wide to long
  # you will need to change 'Player_[...]
# This plots the data
ggplot(player_df, aes(x = , y = log(), fill = Measure, width = 0.5)) + # FILL IN, this will be the x and [...]
  geom_bar(position = 'dodge', stat = 'identity') +
  xlab('') + # FILL IN, this will label your x axis
  ylab('') + # FILL IN, this will label your y axis
  ggtitle('') + # FILL IN, this will title your plot
  scale_fill_manual(name = 'Measure', labels = c('Mean Distance', 'Max Velocity', 'Total Load'), values [...]
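For the plotting step, a dodged bar chart on invented long-format data might look like the sketch below. The data frame, axis labels, title, and fill colors are all assumptions made for illustration (the colors in the lab's own `scale_fill_manual()` call are cut off in the source), so treat it as a pattern, not the expected answer.

```r
library(ggplot2)

# Hypothetical long-format data; the values are invented for illustration
toy_long <- data.frame(
  Player_Name = rep(c("A", "B"), each = 3),
  Measure = rep(c("Dist_mean", "Max_Vel", "Total_load"), times = 2),
  Val = c(15, 5, 300, 40, 6, 400)
)

# One bar per player and measure, placed side by side; taking the log keeps
# measures on very different scales visible in the same panel
p <- ggplot(toy_long, aes(x = Player_Name, y = log(Val), fill = Measure)) +
  geom_bar(position = "dodge", stat = "identity", width = 0.5) +
  xlab("Player") +
  ylab("log(value)") +
  ggtitle("Toy summary by player") +
  scale_fill_manual(
    name = "Measure",
    # labels follow the alphabetical order of the Measure values
    labels = c("Mean Distance", "Max Velocity", "Total Load"),
    values = c("steelblue", "darkorange", "forestgreen")  # assumed colors
  )

print(p)
```

The labels line up with the bars only because `Dist_mean`, `Max_Vel`, and `Total_load` sort alphabetically in that order; with different measure names the label order would need checking.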
Answer
Question 2
For this question, we want to visualize fatigue and stress, from the wellness data, by day when grouping players by position. As a general outline, we will:
1. Ensure the Timestamp column is properly coded as a date-time variable,
2. Group by date and position,
3. Summarize the fatigue and stress by position and time,
4. Reshape the data from wide to long format (for plotting), and
5. Plot the data, making sure to label everything correctly.
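Steps 1 to 3 can be sketched on a few invented rows, using the lubridate package for the date parsing. The tibble `toy_well`, its values, and the month/day/year timestamp format are assumptions; the real wellness file may be laid out differently.

```r
library(dplyr)
library(stringr)
library(lubridate)

# Hypothetical wellness-style rows; the timestamp format is an assumption
toy_well <- tibble(
  Timestamp = c("1/15/2020 8:00", "1/15/2020 9:30",
                "1/16/2020 8:10", "1/16/2020 8:45"),
  Position = c("Guard", "Guard", "Guard", "Forward"),
  Fatigue = c(3, 4, 2, 5),
  Stress = c(2, 3, 4, 4)
)

toy_daily <- toy_well %>%
  # keep the text before the first space (the date part), then parse it
  # as month/day/year so the column becomes a Date
  mutate(Timestamp = mdy(str_split(Timestamp, pattern = " ", simplify = TRUE)[, 1])) %>%
  # one group per date and position combination
  group_by(Timestamp, Position) %>%
  summarise(Fatigue = mean(Fatigue), Stress = mean(Stress), .groups = "drop")

print(toy_daily)
```

Dropping the time of day is what lets the two Guard records from the same date fall into one group and be averaged together.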
To handle date-time variables with the tidyverse, we will need the lubridate library. Below are two lines to install and load the lubridate library. All code to handle date-time variables has been supplied for this homework; you only need to run it and understand what it does. NOTE: be sure to run the install.packages() line only once: either comment it out or delete it after you run it.
# install.packages('lubridate')
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
##     date, intersect, setdiff, union
Below is a skeleton of the code that follows these five steps. You need to fill in the code with the correct arguments. Be sure to read the comments: not everything has to be filled in, just pieces.
# create a new df called well with the correct summarized variables
well = wellness %>% # select wellness data
  mutate(Timestamp = mdy(str_split(Timestamp, pattern = " ", simplify = TRUE)[,1])) %>% # just run this, th[...]
  group_by(,) %>% # FILL IN, this should choose the two variables to group by
  summarise(Fatigue = mean(), Stress = mean()) # FILL IN, this should calculate the summary variables
# run this to see what well looks like right now
# well
# change from wide to long
well_df = well %>%
  pivot_longer(-c(Position, Timestamp), names_to = "Measure", values_to = "Value")
# run this to see what well_df looks like right now
# well_df
# plot the data
ggplot(well_df, aes(x = , y = )) + # FILL IN, this should input the correct x and y
  geom_line(aes(color = Position, linetype = Measure), size = 1) + # this makes it a line plot, colors t[...]
  scale_x_date(date_breaks = 'day', date_labels = "%b %d") + # this makes the x axis show up as Month Day
  ylim(c(2, 5)) + # this sets the limits on the y axis
  xlab('') + # label x if needed
  ylab('') + # label y if needed
  ggtitle('') + # title if needed
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5, size = 10), # this is just formatting, make[...]
        axis.text.y = element_text(size = 10))
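To see how the reshape and the line plot fit together, here is a rough sketch on invented daily summaries. The data frame `toy_daily`, its values, and the axis labels and title are assumptions for illustration only.

```r
library(ggplot2)
library(tidyr)

# Hypothetical daily summaries by position; all values are invented
toy_daily <- data.frame(
  Timestamp = as.Date(rep(c("2020-01-15", "2020-01-16", "2020-01-17"), times = 2)),
  Position = rep(c("Guard", "Forward"), each = 3),
  Fatigue = c(3.5, 3.0, 4.0, 4.5, 4.0, 3.5),
  Stress = c(2.5, 3.5, 3.0, 4.0, 3.0, 2.5)
)

# Long format: one row per date, position, and measure
toy_daily_long <- pivot_longer(
  toy_daily,
  -c(Position, Timestamp),
  names_to = "Measure",
  values_to = "Value"
)

# Color separates positions, line type separates fatigue from stress
p <- ggplot(toy_daily_long, aes(x = Timestamp, y = Value)) +
  geom_line(aes(color = Position, linetype = Measure)) +
  scale_x_date(date_breaks = "day", date_labels = "%b %d") +
  ylim(c(2, 5)) +
  xlab("Date") +
  ylab("Mean score") +
  ggtitle("Toy fatigue and stress by position")

print(p)
```

Note that `scale_x_date()` only works when the x column is a Date, which is why the timestamp conversion has to happen before plotting.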
Answer
Question 3
For the last question, we want to create violin plots of the log of the velocity band total distances by position from the GPS_Match_Data_deidentified.csv data set. We also want to include the 25, 50, and 75 percent quantiles as horizontal bars within each group. The general outline is:
1. Choose the correct data set,
2. Select the columns Position Name and Velocity Band 2 Total Distance through Velocity Band 8 Total Distance,
3. Manipulate the data from wide format to long, making sure NOT to pivot the Position Name column (i.e., do -`Position Name` like we did in Q1 and Q2), and
4. Plot the data, making sure to plot on the log scale.
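Steps 2 and 3 rely on two details: column names containing spaces must be wrapped in backticks, and `first:last` inside `select()` picks every column between two names. A small sketch, assuming a made-up tibble `toy_match` whose column names are invented and are not the ones in the match file:

```r
library(dplyr)
library(tidyr)

# Hypothetical match-style data; names with spaces need backticks in R
toy_match <- tibble(
  `Position Name` = c("Guard", "Forward"),
  `Other Column` = c(1, 2),
  `Band 1 Distance` = c(100, 120),
  `Band 2 Distance` = c(50, 60),
  `Band 3 Distance` = c(10, 15)
)

toy_bands <- toy_match %>%
  # first:last selects every column between the two names, inclusive
  select(`Position Name`, `Band 1 Distance`:`Band 3 Distance`) %>%
  # the minus sign pivots everything except the position column
  pivot_longer(-`Position Name`, names_to = "Band", values_to = "Distance")

print(toy_bands)
```

The range selection depends on the column order in the file, so it is worth checking `names()` on the real data before relying on it.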
Below is a skeleton of the code that follows these four steps. You need to fill in the code with the correct arguments. Be sure to read the comments: not everything has to be filled in, just pieces.
# create the data set for plotting
position_velocity = %>% # FILL IN, select the correct data
  select() %>% # FILL IN, select the correct columns
  pivot_longer(, names_to = , values_to = ) # FILL IN, select the correct column to pivot on, as well as [...]
ggplot(position_velocity, aes(x = , y = )) + # FILL IN, select the x and y values
  geom_violin(draw_quantiles = c(0.25, 0.5, 0.75)) + # this adds horizontal bars as quantiles
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5, size = 10)) # rotates the x axis text
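A violin plot on the log scale can be tried on simulated values first. In the sketch below the data frame `toy_bands` and its columns `Position` and `Distance` are invented, and the distances are random draws, not match data.

```r
library(ggplot2)

set.seed(1)  # make the simulated values reproducible

# Hypothetical long-format data: 50 positive distances per position
toy_bands <- data.frame(
  Position = rep(c("Guard", "Forward"), each = 50),
  Distance = c(rlnorm(50, meanlog = 4), rlnorm(50, meanlog = 5))
)

# Taking log() inside aes() plots the distances on the log scale;
# draw_quantiles adds bars at the 25th, 50th, and 75th percentiles
p <- ggplot(toy_bands, aes(x = Position, y = log(Distance))) +
  geom_violin(draw_quantiles = c(0.25, 0.5, 0.75)) +
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5, size = 10))

print(p)
```

One thing to watch for in real data: a distance of zero gives `log(0) = -Inf`, and ggplot should drop such non-finite values with a warning instead of plotting them.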
Answer