79 lines
		
	
	
		
			1.8 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
		
		
			
		
	
	
			79 lines
		
	
	
		
			1.8 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
| 
								 | 
							
								---
							 | 
						||
| 
								 | 
							
								title: Subsetting Data in R
							 | 
						||
| 
								 | 
							
								---
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								## What is subsetting?
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Subsetting is the selection or extraction of specific parts of larger data. We can subset on various kinds of data objects: vectors, lists, and data frames. 
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								## Subsetting operators
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								There are three subsetting operators: `[`, `[[` and `$`. 
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								`[[` is similar to `[`, except it can only return a single value and it allows you to pull elements out of a list. 
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								`$` is a useful shorthand for `[[` combined with character subsetting.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								You need `[[` when working with lists. This is because when `[` is applied to a list, it always returns a list; it never gives you the contents of the list.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								The following are the examples of subsetting of various `R` objects:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								**1. Vectors**
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								```r
							 | 
						||
| 
								 | 
							
								x <- c(2.1, 4.2, 3.3, 5.4)
							 | 
						||
| 
								 | 
							
								x[c(3, 1)]      # Subsetting using positive integers: return elements at the specified positions.
							 | 
						||
| 
								 | 
							
								## [1] 3.3 2.1
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								x[-c(3, 1)]     # Subsetting using positive integers: return elements at the specified positions.
							 | 
						||
| 
								 | 
							
								## [1] 4.2 5.4
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								x[c(TRUE, TRUE, FALSE, FALSE)]  # # Subsetting using logical vectors.
							 | 
						||
| 
								 | 
							
								## [1] 2.1 4.2
							 | 
						||
| 
								 | 
							
								```
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								**2. Lists**
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								```r
							 | 
						||
| 
								 | 
							
								a <- matrix(1:9, nrow = 3)
							 | 
						||
| 
								 | 
							
								colnames(a) <- c("A", "B", "C")
							 | 
						||
| 
								 | 
							
								a[1:2, ]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								##      A B C
							 | 
						||
| 
								 | 
							
								## [1,] 1 4 7
							 | 
						||
| 
								 | 
							
								## [2,] 2 5 8
							 | 
						||
| 
								 | 
							
								```
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								**3. Data Frames**
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								```r
							 | 
						||
| 
								 | 
							
								df <- data.frame(x = 1:3, y = 3:1, z = letters[1:3])
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								df[df$x == 2, ]
							 | 
						||
| 
								 | 
							
								##   x y z
							 | 
						||
| 
								 | 
							
								## 2 2 2 b
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								df[c(1, 3), ]
							 | 
						||
| 
								 | 
							
								##   x y z
							 | 
						||
| 
								 | 
							
								## 1 1 3 a
							 | 
						||
| 
								 | 
							
								## 3 3 1 c
							 | 
						||
| 
								 | 
							
								```
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								To get content of a list use `[[` operator like:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								```r
							 | 
						||
| 
								 | 
							
								a <- list(a = 1, b = 2)
							 | 
						||
| 
								 | 
							
								a[[1]]
							 | 
						||
| 
								 | 
							
								## [1] 1
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								a[["a"]]
							 | 
						||
| 
								 | 
							
								## [1] 1
							 | 
						||
| 
								 | 
							
								```
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								## Resources
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								 * [Quick-R](https://www.statmethods.net/management/subset.html)
							 | 
						||
| 
								 | 
							
								 * [R Documentation](https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/subset)
							 | 
						||
| 
								 | 
							
								 * [R Bloggers](https://www.r-bloggers.com/5-ways-to-subset-a-data-frame-in-r/)
							 | 
						||
| 
								 | 
							
								 * [Advanced R](http://adv-r.had.co.nz/Subsetting.html)
							 |