subsetting by multiple columns in pandas (#32603)

Add subsetting by multiple columns in pandas
van-tienhoang
2018-12-17 23:11:24 -05:00
committed by Christopher McCormack
parent 2192155f39
commit 5cf95b96e0


@ -119,6 +119,15 @@ Another option for subsetting a dataframe is using the loc and iloc methods. The
ages = df.loc[:, "age"]  # all rows, the "age" column
```
Instead of passing a single column name inside the brackets, we can pass a list of column names. The return value is a DataFrame containing only those columns.
```python
person_info = df[["name","age","address"]]
```
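Selecting a list of columns generally returns a new DataFrame rather than a view of `df`, but some pandas versions emit a `SettingWithCopyWarning` when you later modify the result. To make it explicit that `person_info` is independent of the original, use the `copy` method: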
```python
person_info = df[["name","age","address"]].copy()
```
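The following is a minimal sketch of the selection and copy described above. The column values and the extra `phone` column are made up for illustration and are not part of the original guide.

```python
import pandas as pd

# Hypothetical data, invented for this example only
df = pd.DataFrame({
    "name": ["Ann", "Bob"],
    "age": [31, 45],
    "address": ["1 Main St", "2 Oak Ave"],
    "phone": ["555-0100", "555-0101"],
})

# Passing a list of column names returns a DataFrame with just those columns
person_info = df[["name", "age", "address"]].copy()

# Modifying the explicit copy does not change the original df
person_info.loc[0, "age"] = 99

print(list(person_info.columns))
print(df.loc[0, "age"])
```

Because of the explicit `copy`, the edit to `person_info` should leave `df` unchanged.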
### Basic Statistics
Descriptive statistics can be computed for each column of a pandas DataFrame.
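As a rough illustration, the sketch below computes a few per-column statistics. The `age` and `height` columns and their values are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical numeric data, invented for this example only
df = pd.DataFrame({
    "age": [20, 30, 40],
    "height": [150.0, 160.0, 170.0],
})

# describe() summarizes each numeric column (count, mean, std, min, quartiles, max)
summary = df.describe()

# Individual statistics are also available per column
mean_age = df["age"].mean()

print(summary)
print(mean_age)
```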
@ -167,7 +176,6 @@ right = pd.DataFrame({'C': ['C0', 'C2', 'C3'],
index=['K0', 'K2', 'K3'])
```
```python
left.join(right)
```
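To show what an index-based `join` does, here is a small self-contained sketch. The frames `left_demo` and `right_demo` are hypothetical stand-ins with made-up values; they are not the `left` and `right` frames defined in the guide, whose full definitions are not shown here.

```python
import pandas as pd

# Hypothetical frames, invented for this example only
left_demo = pd.DataFrame({"A": ["A0", "A1", "A2"]},
                         index=["K0", "K1", "K2"])
right_demo = pd.DataFrame({"C": ["C0", "C2", "C3"]},
                          index=["K0", "K2", "K3"])

# join() aligns on the index and defaults to a left join:
# every row of left_demo is kept, unmatched rows get NaN
joined = left_demo.join(right_demo)

# how="inner" keeps only index labels present in both frames
inner = left_demo.join(right_demo, how="inner")

print(joined)
print(inner)
```

In the default left join, the `K1` row has no match in `right_demo`, so its `C` value is missing.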
@ -214,9 +222,6 @@ It will return a Boolean value telling you whether it's a missing value.
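```df.dropna()```
This returns a new DataFrame with every row that contains a missing value dropped. `dropna` is a method of the DataFrame (here `df`), not of the `pd` module.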
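A minimal sketch of detecting and dropping missing values follows. The data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age, invented for this example only
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy"],
    "age": [31, np.nan, 45],
})

# isnull() returns a Boolean for each value: True where it is missing
missing = df["age"].isnull()

# dropna() returns a new DataFrame without the rows that contain missing values;
# df itself is left unchanged
cleaned = df.dropna()

print(missing)
print(cleaned)
```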
#### More Information:
1. [pandas](http://pandas.pydata.org/)
2. [read_csv](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html?highlight=read_csv#pandas.read_csv)