Day 3. Data Analysis and Graphics with R.
R is a powerful programming language and environment for statistical computing, data analysis, and graphics. It is typically used to explore and understand data in an open-ended, highly interactive, iterative way. Learning R will give you the freedom to experiment and problem-solve during data analysis, which is exactly what we need as bioinformaticians and data scientists.
Before getting our hands dirty working with real data in R, we need to learn the basics of the R language. Even if you’ve poked around in R and seen these concepts before, I still recommend that you follow along and complete the free online interactive learning tutorial. It offers a gentle introduction to R syntax and to some of the major R data structures (vectors, matrices, data frames, and lists) that we will cover in more detail in class.
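As a quick preview of the four data structures named above, here is a minimal sketch in base R. The variable names, gene labels, and values are made up for illustration and are not taken from the course datasets.

```r
# Vector: an ordered collection of values that all share one type
expr <- c(2.5, 3.1, 0.8)

# Matrix: a two-dimensional collection of values of one type
m <- matrix(1:6, nrow = 2, ncol = 3)

# Data frame: a table whose columns may have different types
# (the gene labels here are hypothetical placeholders)
df <- data.frame(gene = c("geneA", "geneB", "geneC"),
                 log2rpkm = expr)

# List: a container that can hold objects of any type, including the ones above
lst <- list(values = expr, counts = m, table = df)

# str() prints a compact summary of an object's structure
str(lst)
```

Running `str()` on each object in turn is a handy way to check what kind of structure you are working with.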
Schedule:
| Session | Time | Topics |
|---|---|---|
| I | 9:00-10:15 AM | Introduction to R |
| | 10:15-10:30 AM | Coffee Break |
| II | 10:30 AM-12:00 PM | R Control Structures and Functions |
| | 12:00-1:00 PM | Lunch |
| III | 1:00-2:15 PM | Data Exploration and Visualization in R |
| | 2:15-2:30 PM | Coffee Break |
| IV | 2:30-4:00 PM | Working with R packages from CRAN & Bioconductor |
Instructors:
Armand Bankhead (AB)
Topics:
I) Introduction to R [1.25 hr] slides
- What is R and Why Use it?
- Ways to Use R
- R as a Statistical Programming Language
- Writing and Running R Scripts
- Data Types
- Data Structures
- Vector and Matrix Operations
--- Coffee Break [15 mins] ---
II) R Control Structures and Functions [1.5 hr] slides
- Working Directory
- Reading and Writing Data in R
- Factors
- Using Indexes
- Merging Data Frames
- Functions
- Program Control Structures
--- Lunch Break [1 hr] ---
III) Data Exploration and Visualization in R [1.25 hr] slides
- Summarizing Data in R
- Creating Plots in R
--- Coffee Break [15 mins] ---
IV) Working with R packages from CRAN & Bioconductor [1.5 hr] slides
- CRAN (Comprehensive R Archive Network)
- Bioconductor, a bioinformatics package repository
- Package Installation
- Package Documentation
- Package Source Code
- Tidyverse
- Example: biomaRt Bioconductor Package
--- End/Wrap-Up ---
Datasets
Pedersen Log2RPKM Gene Expression Data: file1 file2
Reference material
RStudio cheatsheet: A well-designed reference card for RStudio features
ggplot2 cheatsheet: A pragmatic reference for creating ggplot2 visualizations
R for Data Science: A brand new O’Reilly book, available free online, that will teach you how to do data science with R
Class notes on R language basics
Class notes on useful R functions for working with strings