subsetting by multiple columns in pandas (#32603)

Add subsetting by multiple columns in pandas
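
For reviewers, here is a minimal sketch of the behavior the new guide text describes. The sample data and the extra `phone` column are made up for illustration and are not part of the guide; only the `name`, `age`, and `address` column names come from the change itself.

```python
import pandas as pd

# Hypothetical sample data; only the name/age/address column names come from the guide.
df = pd.DataFrame({
    "name": ["Ann", "Bob"],
    "age": [31, 45],
    "address": ["1 Main St", "2 Oak Ave"],
    "phone": ["555-0100", "555-0101"],
})

# Passing a list of column names returns a DataFrame with just those columns.
# .copy() makes the result explicitly independent of df.
person_info = df[["name", "age", "address"]].copy()

# Modify the copy, then check that the original is unchanged.
person_info.loc[0, "age"] = 99

is_dataframe = isinstance(person_info, pd.DataFrame)
selected_columns = list(person_info.columns)
original_age = int(df.loc[0, "age"])
copied_age = int(person_info.loc[0, "age"])
```

After running this, `df` should still hold the original age in its first row, while `person_info` holds the modified value.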
2018-12-17 23:11:24 -05:00
parent 2192155f39
commit 5cf95b96e0
1 changed files with 10 additions and 5 deletions
--- a/guide/english/data-science-tools/pandas/index.md
+++ b/guide/english/data-science-tools/pandas/index.md
@ -119,6 +119,15 @@ Another option for subsetting a dataframe is using the loc and iloc methods. The
 ages = df.loc["age"]
 ```

+Instead of passing a single column name inside the brackets, we can pass a list of column names. The return value is a DataFrame.
+```python
+person_info = df[["name","age","address"]]
+```
+Selecting a list of columns returns a new DataFrame, but pandas may raise a `SettingWithCopyWarning` if you modify it later. To make it explicit that `person_info` is independent of the original `df`, use the `copy` method:
+```python
+person_info = df[["name","age","address"]].copy()
+```
+
 ### Basic Statistics
 Descriptive statistics can be performed on each column of a pandas dataframe. 

@ -163,11 +172,10 @@ left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                      index=['K0', 'K1', 'K2']) 

 right = pd.DataFrame({'C': ['C0', 'C2', 'C3'],
-                    'D': ['D0', 'D2', 'D3']},
+                      'D': ['D0', 'D2', 'D3']},
                      index=['K0', 'K2', 'K3'])
 ```

-
 ```python
 left.join(right)
 ```
@ -214,9 +222,6 @@ It wil return a Boolean value telling you whether it’s a missing value.
 ```pd.dropna()```
 This will drop all rows that have any missing values.

-
-
-
 #### More Information:
 1. [pandas](http://pandas.pydata.org/)
 2. [read_csv](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html?highlight=read_csv#pandas.read_csv)