
Assignment Question

Your practical project is the culmination of everything you learned about coding in this class. For this practical, you will select one of the datasets and write an analysis of it. Follow the rubric. You are *NOT* writing about your code; instead, write about what you learned about the data, and find supporting information.

Imagine you are presenting this information to an employer who has to make a business decision based on it. The assignment is scored like the other essays, on a 0-4 standards scale, but the score is doubled so that the points balance correctly with the final exam. Possible scores are 8, 6, 4, 2, and 0 (the 4-0 scale multiplied by 2).

In the writing portion, do not describe how your code works. Instead, gather what you learned about the data from the coding assignment, along with any additional Python data manipulation, work in Excel, or further external research, and write an analysis describing the narrative around your chosen dataset, the possible conclusions you have found, and your own as well as others’ biases (don’t be afraid to use the word “I” in self-reflective sections!).

Keep in mind that this writing assignment does not have the usual submission grace period or resubmissions. Because of this, it is even more strongly recommended that you read through the essay rubric before you begin writing, to get a clear idea of what is expected of your work. If you are nervous about submitting either draft, bring your essay to your Lab or Office Hours to get feedback from one of our TAs, who are familiar with the grading process and what is expected of the assignment.

Your written report should be a maximum of six pages. There is no minimum, but you should be able to fully express the narrative in the space allowed. Note that six pages is a common limit for conference proceedings, and it does not include a separate title page or bibliography. We expect most reports to be well under this limit, perhaps a couple of pages at most. Your report will be turned in via Canvas, and you will find a rubric for the report in the assignment listing. Your TAs and instructor will grade your reports based on the rubric.

What to include?

- Detail the narrative of the primary dataset you analyzed.
- How does this narrative fit with other information you have found online about similar datasets (i.e. other references)?
- A data visualization. Ideally, you use a Python library like matplotlib to create a visual of your data. However, you can get full credit using Google Sheets or Microsoft Excel to build your visualization. (A graph is a visualization.)
- Are there trends you notice? This can be a comparison over time, or something that stands out to you in your data visualization.
- It should have an intro, body, and conclusion at a minimum.

Common Questions

Can you include graphs?
For full credit you need at least one data visualization. Graphs count for this. You DO NOT need to write the code to generate the graph. You can use Google Sheets or Excel if you don’t get the code running.

Do you need references?
Yes. Every dataset is referenced on the dataset page, and you should find outside sources to confirm any info you find.

Why this practical project?
Many of you will continue on to other majors without much demand for coding. However, nearly every major requires analyzing data in some form, and having coding experience means you can write scripts and applications to help analyze that data.
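For the data visualization requirement, a bar chart is one simple option. The following is a minimal sketch, not the required approach: it assumes matplotlib is installed, the function names and the output file name `survival_rates.png` are made up for illustration, and the three rates are the ones listed in the expected output of the code instructions.

```python
def survival_chart_data(total_rate, male_rate, female_rate):
    """Return (labels, percentages) ready to plot as a bar chart."""
    labels = ["All passengers", "Male", "Female"]
    percentages = [round(rate * 100, 1)
                   for rate in (total_rate, male_rate, female_rate)]
    return labels, percentages


def plot_survival_chart(labels, percentages, out_path="survival_rates.png"):
    """Draw a bar chart and save it as an image (out_path is a hypothetical name)."""
    # Imported here so the data helper above works even without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(labels, percentages)
    ax.set_ylabel("Survival rate (%)")
    ax.set_title("Titanic survival rates")
    ax.set_ylim(0, 100)
    fig.savefig(out_path)
    plt.close(fig)


# Rates taken from the expected output in the code instructions.
labels, percentages = survival_chart_data(0.38, 0.19, 0.74)

# Uncomment to write the image file:
# plot_survival_chart(labels, percentages)
```

The same three numbers can be typed into Google Sheets or Excel to build an equivalent chart if the plotting code does not run.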

my code:

import csv

filename = "titanic.csv"

# Step 0: Identify columns.
# In this step you will be making several variables to keep track of the
# indexes of the csv file for the assignment.
# To do this, open the file separately and find which index matches which step.
# These indexes will be used in future functions in later steps.
with open(filename, 'r', newline='') as file:
    reader = csv.reader(file)
    header = next(reader)  # Read the header row
    name_index = header.index('Name')
    surv_index = header.index('Survived')
    sex_index = header.index('Sex')
    fare_index = header.index('Fare')


# Step 1: csv_reader(file)
# reads a file using csv.reader and returns a list of lists
# with each item being a row, and rows being the values
# in the csv row. Look back at the CSV lab on reading csv files into a list.
# Because this file has a header, you will need to skip it.
def csv_reader(file):
    with open(file, 'r', newline='') as csv_file:
        reader = csv.reader(csv_file)
        next(reader)  # Skip the header row
        return [row for row in reader]


# Step 2: longest_passenger_name(passenger_list)
# Parameter: list
# returns the longest name in the list.
def longest_passenger_name(passenger_list):
    longest_row = max(passenger_list, key=lambda row: len(row[name_index]))
    return longest_row[name_index]


# Step 3: total_survival_percentage(passenger_list)
# Parameter: list
# returns the total percentage of people who survived in the list.
# NOTE: survival in the sheet is denoted as a 1 while death is denoted as a 0.
def total_survival_percentage(passenger_list):
    total_passengers = len(passenger_list)
    if total_passengers == 0:
        return 0  # Avoid dividing by zero on an empty list
    total_survived = sum(int(row[surv_index]) for row in passenger_list)
    # Returned unrounded (the instructions expect a long decimal here);
    # main() formats it to two decimal places.
    return total_survived / total_passengers


# Step 4: survival_rate_gender(passenger_list)
# Parameter: list
# returns: a tuple containing the survival rate of each gender
# in the form of male_rate, female_rate.
def survival_rate_gender(passenger_list):
    male_total = female_total = 0
    male_survived = female_survived = 0
    for row in passenger_list:
        # int() makes the check work for '1' read from the CSV
        # and for 1 in hand-written test lists.
        survived = int(row[surv_index]) == 1
        sex = row[sex_index].lower()
        if sex == 'male':
            male_total += 1
            male_survived += survived
        elif sex == 'female':
            female_total += 1
            female_survived += survived
    male_survival_rate = male_survived / male_total if male_total > 0 else 0
    female_survival_rate = female_survived / female_total if female_total > 0 else 0
    return round(male_survival_rate, 2), round(female_survival_rate, 2)


# Step 5: average_ticket_fare(passenger_list)
# Parameter: list
# returns the average ticket fare of the given list.
def average_ticket_fare(passenger_list):
    # Rows with an empty fare field are skipped
    fares = [float(row[fare_index]) for row in passenger_list if row[fare_index]]
    average_fare = sum(fares) / len(fares) if fares else 0
    return round(average_fare, 2)


# Step 6: main
# This is the function that will call all of the functions you have written
# in the previous steps.
def main():
    passenger_list = csv_reader(filename)
    print("Longest Name:", longest_passenger_name(passenger_list))
    # {:.2f} matches the expected output (0.38); {:.2%} would print 38.00%.
    total_survival_percent = total_survival_percentage(passenger_list)
    print("Total Survival Percentage: {:.2f}".format(total_survival_percent))
    male_survival_rate, female_survival_rate = survival_rate_gender(passenger_list)
    print("Male Survival Percentage: {:.2f}".format(male_survival_rate))
    print("Female Survival Percentage: {:.2f}".format(female_survival_rate))
    average_fare = average_ticket_fare(passenger_list)
    print("Average Ticket Cost: {:.2f}".format(average_fare))


if __name__ == '__main__':
    main()
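The header lookup and header skipping used in the code above can be tried without the real file. This is a minimal sketch under stated assumptions: the CSV text is held in memory with `io.StringIO`, and the two data rows and the shortened column list are invented for illustration (they are not rows from titanic.csv).

```python
import csv
import io

# Hypothetical sample data; NOT rows from the real titanic.csv.
sample_csv = io.StringIO(
    "PassengerId,Survived,Name,Sex,Fare\n"
    '1,0,"Doe, Mr. John",male,7.25\n'
    '2,1,"Roe, Mrs. Jane",female,71.28\n'
)

reader = csv.reader(sample_csv)
header = next(reader)             # consume the header row
name_index = header.index("Name")
surv_index = header.index("Survived")
rows = [row for row in reader]    # only data rows remain

print(header)
print(len(rows), rows[0][name_index], rows[1][surv_index])
```

Note that csv.reader returns every field as a string, so the Survived value here is the text "1", not the number 1. Comparisons against that column need either the string '1' or a conversion with int().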

code instructions:

Practical Project > Titanic

The Titanic sank in 1912, but the general public doesn’t know much about its passengers. This dataset contains the details of passengers of the “unsinkable Titanic”.

Introduction

In this practical you will be extracting data from a CSV file about Titanic passengers and trying to gather information about them as a whole. Make sure to open the CSV file and look at it to understand how the file works. For quick reference, the file is laid out as follows.

PassengerId (ID number of the given passenger)
Survived (Did the passenger survive? 1 if yes, 0 if no)
Pclass (What class of ticket did the passenger buy; values range from 1-3)
Name (What is the name of the passenger)
Sex (What is the sex of the passenger)
Age (How old was the passenger at the time of the disaster)
SibSp (How many siblings and spouses did the passenger have aboard the ship)
Parch (How many parents and children did the passenger have aboard the ship)
Ticket (What ticket did the passenger have, ticket number)
Fare (How much did the ticket cost)
Cabin (What cabin was the passenger in)
Embarked (Port of Embarkation: C = Cherbourg; Q = Queenstown; S = Southampton)

The columns that you will be using in your program are Survived, Name, Sex, and Fare.

Variables (Step 0)

You will create four variables as file-wide variables (often called global). Each variable is the index value of the column in the titanic.csv file.

name_index = ??
surv_index = ??
sex_index = ??
fare_index = ??

Note: Remember that you will be dealing with a list in future methods. Be sure to brush up on how to access certain values of a list.

Step 1: csv_reader(file)

Reads a file using csv.reader and returns a list of lists with each item being a row, and rows being the values in the csv row. Look back at the CSV lab on reading csv files into a list. The function will be mostly the same with one exception.
Since the file has a header row, you will need to either skip the first row or remove it after you are done.

NOTE: Recall that next(reader) can be used to skip a row.

You should test this now. Maybe print out the length of the list returned from the method. For example, a test could be

print("TESTING", len(csv_reader(file))) # where file is set above to either titanic.csv or the tests file

Step 2: longest_passenger_name(passenger_list)

This function will take in the list created from csv_reader and will parse through each list to find the names of all the passengers. It will then find the longest name and return that name at the end of the method. Make sure to test this method! Here is an example test (notice, we are just creating our own list):

test_list = [[1,0,3,"Longest Name"],[2,0,2,"Short"]]
print("TESTING", longest_passenger_name(test_list))
print("TESTING", longest_passenger_name(csv_reader(filename)))

Step 3: total_survival_percentage(passenger_list)

This function will take in the list created from csv_reader and will parse through the list to find what percentage of passengers survived the sinking of the Titanic.

NOTE: In the survived column, those who survived will have a 1, while those who died will have a 0. The total number of survivors should be divided by the total number of people to find the percentage.

test_list = [[1,0],[2,1],[3,1],[4,1]]
print("TESTING", total_survival_percentage(test_list))
print("TESTING", total_survival_percentage(csv_reader(filename)))

Your answer from the file should be a long decimal value, and that is okay; we will format it in a later step!

Step 4: survival_rate_gender(passenger_list)

This function will do something very similar to step 3, but instead of keeping an overall survival percentage, it keeps a separate survival percentage for male and female. This means you will need to count the number of survivors and the total number for each gender separately.
At the end you will return a tuple in the form of (male_survival_rate, female_survival_rate). Remember, in order to return a tuple you use the form return (item, item).

test_list = [[1,1,3,"alice","female"],[2,0,2,"John","male"],[3,0,1,"Jane","female"]]
print("TESTING", survival_rate_gender(test_list))
print("TESTING", survival_rate_gender(csv_reader(filename)))

Step 5: average_ticket_fare(passenger_list)

This function will take in a list created from the csv reader and will parse through it to find the average ticket price, as denoted by the fare column of the file.

Step 6: main()

This is the function that you will write to call all the functions that you have already written. You will need to print out each function's return value, in order, matching the formatting below.

NOTE: Tuples can be accessed similarly to a list. tuple[0] accesses the first element, with tuple[1] being the second, and so on.

All decimal numbers should be formatted to two decimal places.

Longest Name: Penasco y Castellana, Mrs. Victor de Satode (Maria Josefa Perez de Soto y Vallejo)
Total Survival Percentage: 0.38
Male Survival Percentage: 0.19
Female Survival Percentage: 0.74
Average Ticket Cost: 32.20
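The expected output above prints rates such as 0.19 rather than 19.00%, so the format specifier matters. Here is a short sketch of tuple unpacking and two-decimal formatting, using only values from the expected output. The variable names are illustrative, and the `%` variant is shown only for contrast.

```python
# (male_rate, female_rate), values taken from the expected output above
rates = (0.19, 0.74)
male_rate, female_rate = rates   # unpacking; rates[0] and rates[1] also work

# .2f keeps the value as a decimal with two places
as_decimal = "Male Survival Percentage: {:.2f}".format(male_rate)

# .2% multiplies by 100 and appends a percent sign
as_percent = "Male Survival Percentage: {:.2%}".format(male_rate)

# .2f also pads with a trailing zero when needed
fare_line = "Average Ticket Cost: {:.2f}".format(32.2)

print(as_decimal)   # Male Survival Percentage: 0.19
print(as_percent)   # Male Survival Percentage: 19.00%
print(fare_line)    # Average Ticket Cost: 32.20
```

Only the `.2f` form reproduces the lines listed in the expected output.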