From 0deac73642a1ed09a84d76da296250d19d621932 Mon Sep 17 00:00:00 2001
From: Alessia Vanni
Date: Mon, 29 Nov 2021 06:24:18 +0100
Subject: [PATCH] chore(curriculum): add instructions 08 (#44160)

Co-authored-by: Krzysztof <60067306+gikf@users.noreply.github.com>
---
 .../demographic-data-analyzer.md              | 53 +++++++++++++--
 ...-variance-standard-deviation-calculator.md | 55 ++++++++++++++--
 .../medical-data-visualizer.md                | 65 +++++++++++++++++--
 .../page-view-time-series-visualizer.md       | 35 ++++++++--
 .../sea-level-predictor.md                    | 39 +++++++++--
 5 files changed, 227 insertions(+), 20 deletions(-)

diff --git a/curriculum/challenges/english/08-data-analysis-with-python/data-analysis-with-python-projects/demographic-data-analyzer.md b/curriculum/challenges/english/08-data-analysis-with-python/data-analysis-with-python-projects/demographic-data-analyzer.md
index 135049889c..9b67fad630 100644
--- a/curriculum/challenges/english/08-data-analysis-with-python/data-analysis-with-python-projects/demographic-data-analyzer.md
+++ b/curriculum/challenges/english/08-data-analysis-with-python/data-analysis-with-python-projects/demographic-data-analyzer.md
@@ -8,13 +8,58 @@ dashedName: demographic-data-analyzer
 
 # --description--
 
-In this challenge you must analyze demographic data using Pandas. You are given a dataset of demographic data that was extracted from the 1994 Census database.
+You will be [working on this project with our Replit starter code](https://replit.com/github/freeCodeCamp/boilerplate-demographic-data-analyzer).
 
-You can access [the full project description and starter code on Replit](https://replit.com/github/freeCodeCamp/boilerplate-demographic-data-analyzer).
+We are still developing the interactive instructional part of the Python curriculum. For now, here are some videos on the freeCodeCamp.org YouTube channel that will teach you everything you need to know to complete this project:
 
-After going to that link, fork the project. Once you complete the project based on the instructions in 'README.md', submit your project link below.
+- [Python for Everybody Video Course](https://www.freecodecamp.org/news/python-for-everybody/) (14 hours)
+- [Learn Python Video Course](https://www.freecodecamp.org/news/learn-python-video-course/) (10 hours)
 
-We are still developing the interactive instructional part of the data analysis with Python curriculum. For now, you will have to use other resources to learn how to pass this challenge.
+# --instructions--
+
+In this challenge you must analyze demographic data using Pandas. You are given a dataset of demographic data that was extracted from the 1994 Census database. Here is a sample of what the data looks like:
+
+```markdown
+| | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | salary |
+|---:|------:|:-----------------|---------:|:------------|----------------:|:-------------------|:------------------|:---------------|:-------|:-------|---------------:|---------------:|-----------------:|:-----------------|:---------|
+| 0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
+| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
+| 2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
+| 3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
+| 4 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |
+```
+
+You must use Pandas to answer the following questions:
+
+- How many people of each race are represented in this dataset? This should be a Pandas series with race names as the index labels. (`race` column)
+- What is the average age of men?
+- What is the percentage of people who have a Bachelor's degree?
+- What percentage of people with advanced education (`Bachelors`, `Masters`, or `Doctorate`) make more than 50K?
+- What percentage of people without advanced education make more than 50K?
+- What is the minimum number of hours a person works per week?
+- What percentage of the people who work the minimum number of hours per week have a salary of more than 50K?
+- What country has the highest percentage of people that earn >50K and what is that percentage?
+- Identify the most popular occupation for those who earn >50K in India.
+
+Use the starter code in the file `demographic_data_analyzer`. Update the code so all variables set to "None" are set to the appropriate calculation or code. Round all decimals to the nearest tenth.
+
+Unit tests are written for you under `test_module.py`.
+
+## Development
+
+For development, you can use `main.py` to test your functions. Click the "run" button and `main.py` will run.
+
+## Testing
+
+We imported the tests from `test_module.py` to `main.py` for your convenience. The tests will run automatically whenever you hit the "run" button.
+
+## Submitting
+
+Copy your project's URL and submit it to freeCodeCamp.
+
+## Dataset Source
+
+Dua, D. and Graff, C. (2019). [UCI Machine Learning Repository](http://archive.ics.uci.edu/ml). Irvine, CA: University of California, School of Information and Computer Science.
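Review note: the first few questions in the demographic instructions could be approached roughly as sketched below. This is only an illustration on a made-up five-row frame (the `salary` values are altered from the sample so that the percentages are non-trivial), and the variable names are hypothetical rather than taken from the starter code.

```python
import pandas as pd

# Hypothetical five-row sample; the real project reads the full census CSV.
df = pd.DataFrame({
    "race": ["White", "White", "White", "Black", "Black"],
    "sex": ["Male", "Male", "Male", "Male", "Female"],
    "age": [39, 50, 38, 53, 28],
    "education": ["Bachelors", "Bachelors", "HS-grad", "11th", "Bachelors"],
    "salary": ["<=50K", ">50K", "<=50K", "<=50K", ">50K"],  # made-up values
})

# Count of each race as a Series indexed by race name.
race_count = df["race"].value_counts()

# Average age of men, rounded to the nearest tenth.
average_age_men = round(df.loc[df["sex"] == "Male", "age"].mean(), 1)

# Share of people with a Bachelor's degree: the mean of a boolean mask.
percentage_bachelors = round((df["education"] == "Bachelors").mean() * 100, 1)

# Share of people with advanced education who earn more than 50K.
advanced = df["education"].isin(["Bachelors", "Masters", "Doctorate"])
higher_education_rich = round((df.loc[advanced, "salary"] == ">50K").mean() * 100, 1)
```

Taking the mean of a boolean mask gives the fraction of `True` rows, which keeps each percentage to a single expression.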
 
 # --hints--
 
diff --git a/curriculum/challenges/english/08-data-analysis-with-python/data-analysis-with-python-projects/mean-variance-standard-deviation-calculator.md b/curriculum/challenges/english/08-data-analysis-with-python/data-analysis-with-python-projects/mean-variance-standard-deviation-calculator.md
index c1e596625d..447e9ece0b 100644
--- a/curriculum/challenges/english/08-data-analysis-with-python/data-analysis-with-python-projects/mean-variance-standard-deviation-calculator.md
+++ b/curriculum/challenges/english/08-data-analysis-with-python/data-analysis-with-python-projects/mean-variance-standard-deviation-calculator.md
@@ -8,13 +8,60 @@ dashedName: mean-variance-standard-deviation-calculator
 
 # --description--
 
-Create a function that uses Numpy to output the mean, variance, and standard deviation of the rows, columns, and elements in a 3 x 3 matrix.
+You will be [working on this project with our Replit starter code](https://replit.com/github/freeCodeCamp/boilerplate-mean-variance-standard-deviation-calculator).
 
-You can access [the full project description and starter code on Replit](https://replit.com/github/freeCodeCamp/boilerplate-mean-variance-standard-deviation-calculator).
+We are still developing the interactive instructional part of the Python curriculum. For now, here are some videos on the freeCodeCamp.org YouTube channel that will teach you everything you need to know to complete this project:
 
-After going to that link, fork the project. Once you complete the project based on the instructions in 'README.md', submit your project link below.
+- [Python for Everybody Video Course](https://www.freecodecamp.org/news/python-for-everybody/) (14 hours)
+- [Learn Python Video Course](https://www.freecodecamp.org/news/learn-python-video-course/) (10 hours)
 
-We are still developing the interactive instructional part of the data analysis with Python curriculum. For now, you will have to use other resources to learn how to pass this challenge.
+# --instructions--
+
+Create a function named `calculate()` in `mean_var_std.py` that uses Numpy to output the mean, variance, standard deviation, max, min, and sum of the rows, columns, and elements in a 3 x 3 matrix.
+
+The input of the function should be a list containing 9 digits. The function should convert the list into a 3 x 3 Numpy array, and then return a dictionary containing the mean, variance, standard deviation, max, min, and sum along both axes and for the flattened matrix.
+
+The returned dictionary should follow this format:
+
+```py
+{
+  'mean': [axis1, axis2, flattened],
+  'variance': [axis1, axis2, flattened],
+  'standard deviation': [axis1, axis2, flattened],
+  'max': [axis1, axis2, flattened],
+  'min': [axis1, axis2, flattened],
+  'sum': [axis1, axis2, flattened]
+}
+```
+
+If a list containing less than 9 elements is passed into the function, it should raise a `ValueError` exception with the message: "List must contain nine numbers." The values in the returned dictionary should be lists and not Numpy arrays.
+
+For example, `calculate([0,1,2,3,4,5,6,7,8])` should return:
+
+```py
+{
+  'mean': [[3.0, 4.0, 5.0], [1.0, 4.0, 7.0], 4.0],
+  'variance': [[6.0, 6.0, 6.0], [0.6666666666666666, 0.6666666666666666, 0.6666666666666666], 6.666666666666667],
+  'standard deviation': [[2.449489742783178, 2.449489742783178, 2.449489742783178], [0.816496580927726, 0.816496580927726, 0.816496580927726], 2.581988897471611],
+  'max': [[6, 7, 8], [2, 5, 8], 8],
+  'min': [[0, 1, 2], [0, 3, 6], 0],
+  'sum': [[9, 12, 15], [3, 12, 21], 36]
+}
+```
+
+The unit tests for this project are in `test_module.py`.
+
+## Development
+
+For development, you can use `main.py` to test your `calculate()` function. Click the "run" button and `main.py` will run.
+
+## Testing
+
+We imported the tests from `test_module.py` to `main.py` for your convenience. The tests will run automatically whenever you hit the "run" button.
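Review note: judging from the example output in the calculator instructions, `axis1` appears to correspond to NumPy's `axis=0` (one value per column) and `axis2` to `axis=1` (one value per row). The partial sketch below illustrates that reading for two of the six statistics only. The helper name `summarize` is hypothetical and is not the project's `calculate()`.

```python
import numpy as np


def summarize(values):
    """Hypothetical helper covering only 'mean' and 'max' of a 3 x 3 matrix."""
    if len(values) < 9:
        raise ValueError("List must contain nine numbers.")
    matrix = np.array(values).reshape(3, 3)
    # .tolist() converts NumPy arrays and scalars into plain Python values,
    # since the instructions ask for lists rather than NumPy arrays.
    return {
        "mean": [
            matrix.mean(axis=0).tolist(),  # down each column
            matrix.mean(axis=1).tolist(),  # across each row
            matrix.mean().tolist(),        # flattened matrix
        ],
        "max": [
            matrix.max(axis=0).tolist(),
            matrix.max(axis=1).tolist(),
            matrix.max().tolist(),
        ],
    }


result = summarize([0, 1, 2, 3, 4, 5, 6, 7, 8])
```

The remaining statistics would follow the same three-way pattern with `var`, `std`, `min`, and `sum`.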
+
+## Submitting
+
+Copy your project's URL and submit it to freeCodeCamp.
 
 # --hints--
 
diff --git a/curriculum/challenges/english/08-data-analysis-with-python/data-analysis-with-python-projects/medical-data-visualizer.md b/curriculum/challenges/english/08-data-analysis-with-python/data-analysis-with-python-projects/medical-data-visualizer.md
index d8ef3d60c9..540499ebf5 100644
--- a/curriculum/challenges/english/08-data-analysis-with-python/data-analysis-with-python-projects/medical-data-visualizer.md
+++ b/curriculum/challenges/english/08-data-analysis-with-python/data-analysis-with-python-projects/medical-data-visualizer.md
@@ -8,13 +8,70 @@ dashedName: medical-data-visualizer
 
 # --description--
 
-In this project, you will visualize and make calculations from medical examination data using matplotlib, seaborn, and pandas.
+You will be [working on this project with our Replit starter code](https://replit.com/github/freeCodeCamp/boilerplate-medical-data-visualizer).
 
-You can access [the full project description and starter code on Replit](https://replit.com/github/freeCodeCamp/boilerplate-medical-data-visualizer).
+We are still developing the interactive instructional part of the Python curriculum. For now, here are some videos on the freeCodeCamp.org YouTube channel that will teach you everything you need to know to complete this project:
 
-After going to that link, fork the project. Once you complete the project based on the instructions in 'README.md', submit your project link below.
+- [Python for Everybody Video Course](https://www.freecodecamp.org/news/python-for-everybody/) (14 hours)
+- [Learn Python Video Course](https://www.freecodecamp.org/news/learn-python-video-course/) (10 hours)
 
-We are still developing the interactive instructional part of the data analysis with Python curriculum. For now, you will have to use other resources to learn how to pass this challenge.
+# --instructions--
+
+In this project, you will visualize and make calculations from medical examination data using matplotlib, seaborn, and pandas. The dataset values were collected during medical examinations.
+
+## Data description
+
+The rows in the dataset represent patients and the columns represent information like body measurements, results from various blood tests, and lifestyle choices. You will use the dataset to explore the relationship between cardiac disease, body measurements, blood markers, and lifestyle choices.
+
+File name: medical_examination.csv
+
+| Feature | Variable Type | Variable | Value Type |
+|:-------:|:------------:|:-------------:|:----------:|
+| Age | Objective Feature | age | int (days) |
+| Height | Objective Feature | height | int (cm) |
+| Weight | Objective Feature | weight | float (kg) |
+| Gender | Objective Feature | gender | categorical code |
+| Systolic blood pressure | Examination Feature | ap_hi | int |
+| Diastolic blood pressure | Examination Feature | ap_lo | int |
+| Cholesterol | Examination Feature | cholesterol | 1: normal, 2: above normal, 3: well above normal |
+| Glucose | Examination Feature | gluc | 1: normal, 2: above normal, 3: well above normal |
+| Smoking | Subjective Feature | smoke | binary |
+| Alcohol intake | Subjective Feature | alco | binary |
+| Physical activity | Subjective Feature | active | binary |
+| Presence or absence of cardiovascular disease | Target Variable | cardio | binary |
+
+## Tasks
+
+Create a chart similar to `examples/Figure_1.png`, where we show the counts of good and bad outcomes for the `cholesterol`, `gluc`, `alco`, `active`, and `smoke` variables for patients with cardio=1 and cardio=0 in different panels.
+
+Use the data to complete the following tasks in `medical_data_visualizer.py`:
+
+- Add an `overweight` column to the data. To determine if a person is overweight, first calculate their BMI by dividing their weight in kilograms by the square of their height in meters. If that value is > 25 then the person is overweight. Use the value 0 for NOT overweight and the value 1 for overweight.
+- Normalize the data by making 0 always good and 1 always bad. If the value of `cholesterol` or `gluc` is 1, make the value 0. If the value is more than 1, make the value 1.
+- Convert the data into long format and create a chart that shows the value counts of the categorical features using seaborn's `catplot()`. The dataset should be split by 'Cardio' so there is one chart for each `cardio` value. The chart should look like `examples/Figure_1.png`.
+- Clean the data. Filter out the following patient segments that represent incorrect data:
+  - diastolic pressure is higher than systolic (Keep the correct data with `(df['ap_lo'] <= df['ap_hi'])`)
+  - height is less than the 2.5th percentile (Keep the correct data with `(df['height'] >= df['height'].quantile(0.025))`)
+  - height is more than the 97.5th percentile
+  - weight is less than the 2.5th percentile
+  - weight is more than the 97.5th percentile
+- Create a correlation matrix using the dataset. Plot the correlation matrix using seaborn's `heatmap()`. Mask the upper triangle. The chart should look like `examples/Figure_2.png`.
+
+Any time a variable is set to `None`, make sure to set it to the correct code.
+
+Unit tests are written for you under `test_module.py`.
+
+## Development
+
+For development, you can use `main.py` to test your functions. Click the "run" button and `main.py` will run.
+
+## Testing
+
+We imported the tests from `test_module.py` to `main.py` for your convenience. The tests will run automatically whenever you hit the "run" button.
+
+## Submitting
+
+Copy your project's URL and submit it to freeCodeCamp.
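Review note: the `overweight`, normalization, and blood-pressure filtering steps from the medical tasks could look roughly like the sketch below. It runs on a made-up three-row frame rather than `medical_examination.csv`, and it leaves out the percentile filters and all plotting.

```python
import pandas as pd

# Made-up rows for illustration only; column names follow the data description.
df = pd.DataFrame({
    "height": [170, 160, 180],        # cm
    "weight": [60.0, 80.0, 70.0],     # kg
    "cholesterol": [1, 2, 3],
    "gluc": [1, 1, 2],
    "ap_hi": [120, 130, 80],
    "ap_lo": [80, 90, 120],
})

# BMI = weight in kg divided by the square of height in meters.
bmi = df["weight"] / (df["height"] / 100) ** 2
df["overweight"] = (bmi > 25).astype(int)

# Normalize so that 0 is always good and 1 is always bad.
df[["cholesterol", "gluc"]] = (df[["cholesterol", "gluc"]] > 1).astype(int)

# Keep only rows where diastolic pressure does not exceed systolic pressure.
df_clean = df[df["ap_lo"] <= df["ap_hi"]]
```

Comparing against a threshold and casting the boolean result to `int` is one compact way to produce the 0/1 encoding; other approaches such as `np.where` would work equally well.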
 
 # --hints--
 
diff --git a/curriculum/challenges/english/08-data-analysis-with-python/data-analysis-with-python-projects/page-view-time-series-visualizer.md b/curriculum/challenges/english/08-data-analysis-with-python/data-analysis-with-python-projects/page-view-time-series-visualizer.md
index f0501c882b..2ca374ca72 100644
--- a/curriculum/challenges/english/08-data-analysis-with-python/data-analysis-with-python-projects/page-view-time-series-visualizer.md
+++ b/curriculum/challenges/english/08-data-analysis-with-python/data-analysis-with-python-projects/page-view-time-series-visualizer.md
@@ -8,13 +8,40 @@ dashedName: page-view-time-series-visualizer
 
 # --description--
 
-For this project you will visualize time series data using a line chart, bar chart, and box plots. You will use Pandas, matplotlib, and seaborn to visualize a dataset containing the number of page views each day on the freeCodeCamp.org forum from 2016-05-09 to 2019-12-03. The data visualizations will help you understand the patterns in visits and identify yearly and monthly growth.
+You will be [working on this project with our Replit starter code](https://replit.com/github/freeCodeCamp/boilerplate-page-view-time-series-visualizer).
 
-You can access [the full project description and starter code on Replit](https://replit.com/github/freeCodeCamp/boilerplate-page-view-time-series-visualizer).
+We are still developing the interactive instructional part of the Python curriculum. For now, here are some videos on the freeCodeCamp.org YouTube channel that will teach you everything you need to know to complete this project:
 
-After going to that link, fork the project. Once you complete the project based on the instructions in 'README.md', submit your project link below.
+- [Python for Everybody Video Course](https://www.freecodecamp.org/news/python-for-everybody/) (14 hours)
+- [Learn Python Video Course](https://www.freecodecamp.org/news/learn-python-video-course/) (10 hours)
 
-We are still developing the interactive instructional part of the data analysis with Python curriculum. For now, you will have to use other resources to learn how to pass this challenge.
+# --instructions--
+
+For this project you will visualize time series data using a line chart, bar chart, and box plots. You will use Pandas, Matplotlib, and Seaborn to visualize a dataset containing the number of page views each day on the freeCodeCamp.org forum from 2016-05-09 to 2019-12-03. The data visualizations will help you understand the patterns in visits and identify yearly and monthly growth.
+
+Use the data to complete the following tasks:
+
+- Use Pandas to import the data from "fcc-forum-pageviews.csv". Set the index to the "date" column.
+- Clean the data by filtering out days when the page views were in the top 2.5% of the dataset or bottom 2.5% of the dataset.
+- Create a `draw_line_plot` function that uses Matplotlib to draw a line chart similar to "examples/Figure_1.png". The title should be "Daily freeCodeCamp Forum Page Views 5/2016-12/2019". The label on the x axis should be "Date" and the label on the y axis should be "Page Views".
+- Create a `draw_bar_plot` function that draws a bar chart similar to "examples/Figure_2.png". It should show average daily page views for each month grouped by year. The legend should show month labels and have a title of "Months". On the chart, the label on the x axis should be "Years" and the label on the y axis should be "Average Page Views".
+- Create a `draw_box_plot` function that uses Seaborn to draw two adjacent box plots similar to "examples/Figure_3.png". These box plots should show how the values are distributed within a given year or month and how it compares over time. The title of the first chart should be "Year-wise Box Plot (Trend)" and the title of the second chart should be "Month-wise Box Plot (Seasonality)". Make sure the month labels on bottom start at "Jan" and the x and y axis are labeled correctly. The boilerplate includes commands to prepare the data.
+
+For each chart, make sure to use a copy of the data frame. Unit tests are written for you under `test_module.py`.
+
+The boilerplate also includes commands to save and return the image.
+
+## Development
+
+For development, you can use `main.py` to test your functions. Click the "run" button and `main.py` will run.
+
+## Testing
+
+We imported the tests from `test_module.py` to `main.py` for your convenience. The tests will run automatically whenever you hit the "run" button.
+
+## Submitting
+
+Copy your project's URL and submit it to freeCodeCamp.
 
 # --hints--
 
diff --git a/curriculum/challenges/english/08-data-analysis-with-python/data-analysis-with-python-projects/sea-level-predictor.md b/curriculum/challenges/english/08-data-analysis-with-python/data-analysis-with-python-projects/sea-level-predictor.md
index a708eedae1..04742507b6 100644
--- a/curriculum/challenges/english/08-data-analysis-with-python/data-analysis-with-python-projects/sea-level-predictor.md
+++ b/curriculum/challenges/english/08-data-analysis-with-python/data-analysis-with-python-projects/sea-level-predictor.md
@@ -8,13 +8,44 @@ dashedName: sea-level-predictor
 
 # --description--
 
-In this project, you will analyze a dataset of the global average sea level change since 1880. You will use the data to predict the sea level change through year 2050.
+You will be [working on this project with our Replit starter code](https://replit.com/github/freeCodeCamp/boilerplate-sea-level-predictor).
 
-You can access [the full project description and starter code on Replit](https://replit.com/github/freeCodeCamp/boilerplate-sea-level-predictor).
+We are still developing the interactive instructional part of the Python curriculum. For now, here are some videos on the freeCodeCamp.org YouTube channel that will teach you everything you need to know to complete this project:
 
-After going to that link, fork the project. Once you complete the project based on the instructions in 'README.md', submit your project link below.
+- [Python for Everybody Video Course](https://www.freecodecamp.org/news/python-for-everybody/) (14 hours)
+- [Learn Python Video Course](https://www.freecodecamp.org/news/learn-python-video-course/) (10 hours)
+
+# --instructions--
+
+You will analyze a dataset of the global average sea level change since 1880. You will use the data to predict the sea level change through year 2050.
+
+Use the data to complete the following tasks:
+
+- Use Pandas to import the data from `epa-sea-level.csv`.
+- Use matplotlib to create a scatter plot using the "Year" column as the x-axis and the "CSIRO Adjusted Sea Level" column as the y-axis.
+- Use the `linregress` function from `scipy.stats` to get the slope and y-intercept of the line of best fit. Plot the line of best fit over the top of the scatter plot. Make the line go through the year 2050 to predict the sea level rise in 2050.
+- Plot a new line of best fit just using the data from year 2000 through the most recent year in the dataset. Make the line also go through the year 2050 to predict the sea level rise in 2050 if the rate of rise continues as it has since the year 2000.
+- The x label should be "Year", the y label should be "Sea Level (inches)", and the title should be "Rise in Sea Level".
+
+Unit tests are written for you under `test_module.py`.
+
+The boilerplate also includes commands to save and return the image.
+
+## Development
+
+For development, you can use `main.py` to test your functions. Click the "run" button and `main.py` will run.
+
+## Testing
+
+We imported the tests from `test_module.py` to `main.py` for your convenience. The tests will run automatically whenever you hit the "run" button.
+
+## Submitting
+
+Copy your project's URL and submit it to freeCodeCamp.
+
+## Data Source
+[Global Average Absolute Sea Level Change](https://datahub.io/core/sea-level-rise), 1880-2014 from the US Environmental Protection Agency using data from CSIRO, 2015; NOAA, 2015.
 
-We are still developing the interactive instructional part of the data analysis with Python curriculum. For now, you will have to use other resources to learn how to pass this challenge.
 
 # --hints--
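
Review note: the sea-level instructions ask for a line of best fit from `scipy.stats.linregress` extended through 2050. A minimal sketch of that extrapolation step is shown below. The five data points are invented and perfectly linear so the result is easy to check, and the plotting calls are omitted.

```python
import numpy as np
from scipy.stats import linregress

# Invented, perfectly linear data standing in for the "Year" and
# "CSIRO Adjusted Sea Level" columns of epa-sea-level.csv.
years = np.array([2000, 2001, 2002, 2003, 2004])
levels = 0.1 * (years - 2000) + 7.0

# linregress returns a result object exposing slope and intercept.
fit = linregress(years, levels)

# Extend the x values through 2050 (np.arange excludes the stop value)
# and evaluate the fitted line there.
years_extended = np.arange(2000, 2051)
predicted = fit.slope * years_extended + fit.intercept
```

For the second line in the task, the same steps would apply after first selecting only the rows with a year of 2000 or later.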