Data Wrangling
Due: September 06 by 11:59pm
Weight: This assignment is worth 3% of your final grade.
Purpose: The purpose of this assignment is to get more familiar with R and RStudio and to develop some basic strategies for working with data in R.
Assessment: This assignment is graded using a check system:
- ✔+ (110%): Responses show phenomenal thought and engagement with the course content. I will not assign these often.
- ✔ (100%): Responses are thoughtful, well-written, and show engagement with the course content. This is the expected level of performance.
- ✔− (50%): Responses are hastily composed, too short, and/or only cursorily engage with the course content. This grade signals that you need to improve next time. I will hopefully not assign these often.
Notice that this is essentially a pass/fail system. I’m not grading your writing ability, and I’m not counting the number of words you write; I’m looking for thoughtful engagement. One or two sentences is not enough. Write at least a paragraph and show me that you did the assigned readings.
1. Software
If you haven’t yet, go to the Course Software page and install all the software we’ll need for this course. You’ll need these tools for this assignment.
2. Getting Organized
Download and edit this template as you work through this assignment. For now, it is mostly a blank file that you can use to jot down examples and play with code.
3. Readings
Open up a notebook (physical, digital…whatever you take notes in best), and take notes while you go through these readings:
- Getting Familiar with the Course: Follow Snoop’s advice and read the entire Course Syllabus (actually read the whole thing). Then review the schedule and make sure to note important upcoming deadlines.
- Basics [Optional]: Read through Lessons 1 “Getting Started” and 2 “Data Types & Vectors” in the R4A Primer to get more familiar with the basics.
- Data Frames & Data Wrangling: Read through Lessons 3 “Data Frames” and 4 “Data Wrangling” in the R4A Primer to get more familiar with working with data sets in R.
4. Exercises
RStudio offers many excellent primers to help you get up and running quickly in R. Working through these exercises will help prepare you for class next week:
5. Reflect
Reflect on what you’ve learned while going through these readings and exercises. Is there anything that jumped out at you? Anything you found particularly interesting or confusing? Write at least a paragraph in your hw1.R file.
6. Submit
To submit this assignment, create a zip file of all the files in your R project folder for this assignment. Name the zip file hw1-netID.zip, replacing netID with your netID (e.g., hw1-jph.zip). Then copy that zip file into the “submissions” folder in your Box folder created for this class.
Extra Practice
Not required, but probably helpful, especially if you’re new to R.
Inspect data from other packages
Write R code to install the dslabs package from CRAN, then write code to load the library. Write some code to preview and inspect the movielens data frame that gets loaded when you load the library, using some of the techniques we saw in class. For each of the following questions, write code to find your answer and leave a detailed response in a comment:
- What is this dataset about?
- How many observations are in the data frame?
- What is the original source of the data?
- What type of data is each variable?
- What are the years of the earliest and most recent observations in the data set?
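The inspection step above can be sketched with a few base R functions. So that the sketch runs without extra packages, it uses a small hypothetical data frame, toy_ratings, filled with made-up values rather than the real movielens data. After installing and loading dslabs, you would call the same functions on movielens instead.

```r
# Hypothetical stand-in for a ratings table; all values are made up.
# With dslabs you would instead run:
#   install.packages("dslabs")
#   library(dslabs)
# and use `movielens` wherever `toy_ratings` appears below.
# For a packaged dataset, ?movielens opens its help page, which usually
# describes what the data are and where they came from.
toy_ratings <- data.frame(
  title  = c("Movie A", "Movie B", "Movie C", "Movie D"),
  year   = c(1995L, 2001L, 1988L, 2014L),
  rating = c(4.0, 5.0, 3.5, 5.0),
  stringsAsFactors = FALSE
)

head(toy_ratings)            # preview the first rows
str(toy_ratings)             # column names, types, and sample values
nrow(toy_ratings)            # number of observations (rows)
sapply(toy_ratings, class)   # data type of each variable
range(toy_ratings$year)      # earliest and most recent year
```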
Answer questions about the data
For each of the following questions, write code to find your answer and leave a detailed response in a comment:
- What are the minimum, mean, and maximum ratings in the data set?
- How many observations received the maximum rating?
- What percentage of total observations received the maximum rating?
- What is the title of the observation with the longest title (in terms of the number of letters in the title)?
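The questions above mostly reduce to a handful of base R functions applied to a single column. The sketch below is illustrative only: it again uses a hypothetical toy_ratings data frame with made-up values, not movielens, so it shows the techniques without giving away the answers.

```r
# Hypothetical toy data (made-up values) standing in for movielens.
toy_ratings <- data.frame(
  title  = c("Movie A", "A Much Longer Movie Title", "Movie C", "Movie D"),
  rating = c(4.0, 5.0, 3.5, 5.0),
  stringsAsFactors = FALSE
)

# Minimum, mean, and maximum rating
min_rating  <- min(toy_ratings$rating)
mean_rating <- mean(toy_ratings$rating)
max_rating  <- max(toy_ratings$rating)

# Number and percentage of observations that received the maximum rating
n_max   <- sum(toy_ratings$rating == max_rating)
pct_max <- 100 * n_max / nrow(toy_ratings)

# Title with the most characters. nchar() counts every character,
# including spaces and punctuation; to count letters only, you could use
# nchar(gsub("[^A-Za-z]", "", toy_ratings$title)) instead.
longest_title <- toy_ratings$title[which.max(nchar(toy_ratings$title))]
```

If a column contains missing values, min(), mean(), and max() return NA unless you pass na.rm = TRUE. Also note that which.max() returns only the first position when several titles tie for the longest.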
Installing packages from Github: the BRRR library
The vast majority of the time, you will install external packages using the install.packages() function. This installs packages from the Comprehensive R Archive Network (CRAN), where most packages are published. But you can also install packages that are under development or haven’t been published to CRAN yet. Most of the time, these packages are hosted on GitHub, an online platform for sharing code (it’s also where all of the files that make up this website are stored).
To install a package from GitHub, you first need to install the remotes library. Then you can use the remotes::install_github() function to install packages directly from GitHub. To try this out, install remotes with install.packages("remotes"), then install the BRRR package with remotes::install_github("brooke-watson/BRRR").
Note: Packages on GitHub are often still in development and may require other packages to work. So if you get an installation error about some other package dependency, restart your R session and try installing again.
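One common pattern is to install a package only when it is not already available, so a script does not reinstall remotes every time it runs. The helper below, ensure_installed(), is a hypothetical name used for illustration, not part of remotes or any other package; it relies only on the base functions requireNamespace() and install.packages().

```r
# Hypothetical helper (not part of any package): install a CRAN package
# only when it cannot already be loaded, then report whether it is available.
ensure_installed <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call finds it and installs nothing.
stats_ok <- ensure_installed("stats")
```

For this section, you might run ensure_installed("remotes") followed by remotes::install_github("brooke-watson/BRRR"). The GitHub step still needs an internet connection and may ask you about updating dependencies.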
Not sure what this package does? Well, one of the other nice things about packages listed on GitHub is that the authors tend to write detailed descriptions; check out the GitHub page for the BRRR package. Then try using the BRRR::skrrrahh() function with different numeric arguments (turn your volume up). In the #welcome channel on Slack, post your favorite argument to skrrrahh() (mine is 24).