DATA EXPLORATION AND ANALYSIS
Assessment - Vis critique and creation
This assessment has five parts. In the first part, you will critique an existing visualization. In the second part, you will compare and contrast a variety of visualizations of the same dataset. In the third part, you will create your own visualization of tabular data using Python and the seaborn library. In the fourth part, you will create your own visualization of geospatial data using Python and the folium library. In the fifth and final part, you will create your own interactive visualization of network data using Python and the bokeh library.
Part 1 - Visualization critique
For this part, you will be critiquing Nick Routley's visualization of the world's population at 8 billion people, which appears above as the first image in this document and can also be accessed online.
Your task is to describe and evaluate this visualization. Some questions you may wish to address include:
• What do you think was the intended purpose of this vis? Does it succeed in that purpose?
• Who do you think was the intended audience for this vis?
• What information is represented in this vis? What are the items, attributes, etc? How are these represented as marks and channels?
• What are some tasks, comparisons, or evaluations that are enabled by this vis?
• Would you suggest any changes/improvements?
For this part, you will submit a single document containing your critique to the assessment link.
Part 2 - Visualization compare and contrast
For this part, you will be critiquing, comparing, and contrasting a variety of visualizations representing the same underlying data. The vis examples that you'll be addressing can be found at Storytelling with Data.
To start, you will critique the following visualization (also available at the above link):
At the above link, they present a variety of alternative visualizations for this data, several of which can be found here, here, here, and here. Of these five visualizations (including the original), which do you think is/are most effective, and why? How do different design decisions result in more or less effective visualizations? Are different visualizations more appropriate to perform different tasks or comparisons?
For this part, you will submit a single document containing your critique and comparison to the assessment link.
Part 3 - Tabular data visualization with Python and seaborn
Twitter/X user Matthew Moyle maintains a database of every AFL (Australian Football League) match since 1897, which can be downloaded (as a .zip file which extracts to a .csv) here. In this part, you will be creating visualizations based on this dataset using Python and seaborn.
There are quite a lot of items and attributes in this database. For our purposes, let's restrict our consideration to just a few of these attributes, namely:
• match_home_team - Home team name
• match_home_team_score - Score for the home team in the match
• match_away_team - Away team name
• match_away_team_score - Score for the away team in the match
• match_date - Date the match was played
• match_round - Round in which the match took place (mostly numbers, but includes a variety of strings indicating various finals rounds: 'Semi Final', 'Preliminary Final', 'Grand Final', etc.). Use it as categorical, or create a derived attribute that converts the strings to numbers and use that as ordered
• match_winner - Name of the team that won the match, or 'Draw' if it ended in a tie
• match_margin - Margin of victory (always >= 0)
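The derived-attribute option mentioned for match_round could be sketched as follows. This is only a minimal sketch under assumptions: the labels and ordering in FINALS_ORDER are illustrative, not a confirmed list of the strings in the dataset, and round_to_number is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical ordering of finals labels; check afl['match_round'].unique()
# for the strings that actually occur in the dataset.
FINALS_ORDER = ["Semi Final", "Preliminary Final", "Grand Final"]


def round_to_number(round_label, last_regular_round):
    """Map a match_round value to a number; finals sort after regular rounds."""
    label = str(round_label).strip()
    if label.isdigit():
        return int(label)
    if label in FINALS_ORDER:
        return last_regular_round + 1 + FINALS_ORDER.index(label)
    return None  # unknown label: leave missing rather than guess


# Toy data standing in for the real match_round column
rounds = pd.Series(["1", "2", "23", "Semi Final", "Grand Final"])
numeric = pd.to_numeric(rounds, errors="coerce")  # finals labels become NaN
last_regular = int(numeric.max())
round_number = rounds.map(lambda r: round_to_number(r, last_regular))
print(round_number.tolist())
```

On the real data the number of regular rounds varies between seasons, so last_regular would probably need to be computed per season rather than once.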
Your job is to create a visualization, or visualizations, that enables interesting comparisons in this dataset. Please feel free to explore as you see fit, but if you don't have any thoughts, consider creating a visualization that explores the following question:
If the away team scores some number of points, how likely are they to have won the match? Has this changed over time?
Depending on the task/question you choose, you may want to consider deriving one or more new attributes. For the question above, for example, you'll need to derive a variable indicating whether the away team won, drew, or lost the match. (You may find the numpy.sign() function useful for this purpose.)
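One possible way to derive that variable with numpy.sign() is sketched below. The toy rows and the column names away_sign and away_result are assumptions made for illustration; only the two score columns come from the dataset description.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the real dataframe (values are made up)
matches = pd.DataFrame({
    "match_home_team_score": [90, 75, 80],
    "match_away_team_score": [70, 75, 101],
})

# sign(away - home) is +1 for an away win, 0 for a draw, -1 for an away loss
score_diff = matches["match_away_team_score"] - matches["match_home_team_score"]
matches["away_sign"] = np.sign(score_diff)

# Optional readable labels for use as a categorical attribute
matches["away_result"] = matches["away_sign"].map({1: "Win", 0: "Draw", -1: "Loss"})
print(matches[["away_sign", "away_result"]])
```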
Some code that might be useful:
# Relevant imports
import shutil
import seaborn as sns
import pandas as pd
import numpy as np
# Download and extract the .csv file
!wget
shutil.unpack_archive('fryziggafl.zip')
# Load the .csv file into a pandas dataframe named afl
afl = pd.read_csv('fryziggafl.csv')
# Filter to just use matches post 2000, because there is A LOT of data
# You can adjust this to use even less of the data while you are
# prototyping
afl_post_2000 = afl[afl['match_date'] > '2000-01-01']
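For the suggested question about away scores and winning, one way to prepare the data is to bin away scores and compute the share of away wins in each bin and era. The sketch below uses made-up toy rows; the 20-point bin width, the decade grouping, and the names score_bin, decade, away_won, and away_win_rate are all arbitrary choices for illustration. Draws are counted as not won here.

```python
import pandas as pd

# Toy rows standing in for afl_post_2000 (values are made up)
matches = pd.DataFrame({
    "match_date": ["2001-04-01", "2001-05-01", "2015-04-01", "2015-05-01"],
    "match_home_team_score": [100, 60, 90, 70],
    "match_away_team_score": [85, 95, 88, 92],
})

# Parse dates so the year can be extracted reliably
matches["match_date"] = pd.to_datetime(matches["match_date"])
matches["decade"] = (matches["match_date"].dt.year // 10) * 10

# Bin away scores into 20-point bands (bin width is an arbitrary choice)
matches["score_bin"] = (matches["match_away_team_score"] // 20) * 20

# True where the away team scored more than the home team
matches["away_won"] = (
    matches["match_away_team_score"] > matches["match_home_team_score"]
)

# The mean of a boolean column is the share of True values, i.e. the win rate
win_rate = (
    matches.groupby(["decade", "score_bin"])["away_won"]
    .mean()
    .reset_index(name="away_win_rate")
)
print(win_rate)
```

A summary table like this could then be passed to seaborn, for example sns.lineplot(data=win_rate, x="score_bin", y="away_win_rate", hue="decade"), to compare eras on a single plot.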
Submit your visualization(s), along with a short report indicating what tasks/questions/comparisons they are designed to enable and how, to the assessment link on the LMS by the deadline.
Part 4 - Geospatial data visualization with Python and folium
There is a huge variety of geospatial data available, often for free, online. In this part, you will be creating a visualization based on parts of the Australian 2021 census dataset. The above map visualizes the population of each Australian state or territory as a choropleth map.
(NOTE: Dealing with free-range geospatial data is significantly more complicated than tabular data! I've provided the code to get you started on the next page; if you're working in Colab, you can copy that into a code cell or cells to load the datasets. It may take several minutes or more, so please be patient!)
Your job is to create a visualization, or visualizations, that enables interesting comparisons in this dataset. Please feel free to explore as you see fit, but if you don't have any thoughts, consider creating a visualization that explores the following question:
Plot a choropleth map of Australia where the color of each state/territory represents the male-to-female population ratio. (The number of males in each state is in column Tot_P_M, and the number of females is in column Tot_P_F.)
# Do the relevant imports of pandas and folium
import pandas as pd
import folium
# Import requests, zipfile, io to unpack our zipped data
import requests, zipfile, io
australian_states = requests.get
datapack_2021_zipped = requests.get
datapack_2021 = zipfile.ZipFile(io.BytesIO(datapack_2021_zipped.content))
datapack_2021.extractall()
census_data = pd.read_csv('./2021 Census GCP States and Territories for AUS/2021Census_G01_AUST_STE.csv')
# Update the state codes in the pandas dataframe to match the id
# attribute of the geospatial data in the JSON file
census_data.STE_CODE_2021 = census_data.STE_CODE_2021 - 1
# drop() returns a new dataframe, so assign the result back
census_data = census_data.drop(8)
# This code produces the vis shown at the top of this question
m_australia = folium.Map(location=(-23.07, 132.08), zoom_start=5)
folium.Choropleth(
    geo_data=australian_states,
    data=census_data,
    columns=["STE_CODE_2021", "Tot_P_P"],
    key_on='feature.id',
    fill_color='RdBu'
).add_to(m_australia)
m_australia
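For the male-to-female ratio question above, the ratio has to be derived before it can be mapped. The sketch below uses made-up counts rather than census figures, and census_toy and m_to_f_ratio are hypothetical names; only Tot_P_M, Tot_P_F, and STE_CODE_2021 come from the dataset.

```python
import pandas as pd

# Toy rows standing in for census_data (counts are made up, not census figures)
census_toy = pd.DataFrame({
    "STE_CODE_2021": [0, 1, 2],
    "Tot_P_M": [1000, 900, 500],
    "Tot_P_F": [1000, 1000, 400],
})

# Derived attribute: number of males per female in each state/territory
census_toy["m_to_f_ratio"] = census_toy["Tot_P_M"] / census_toy["Tot_P_F"]
print(census_toy[["STE_CODE_2021", "m_to_f_ratio"]])
```

With the real dataframe, the same derived column could presumably be supplied to the choropleth by changing its columns argument to ["STE_CODE_2021", "m_to_f_ratio"].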
Submit your visualization(s), along with a short report indicating what tasks/questions/comparisons they are designed to enable and how, to the assessment link on the LMS by the deadline.
Part 5 - Network data visualization with Python, networkx and bokeh
In this part, you will be creating a visualization based on the character interactions in Victor Hugo's novel Les Misérables (later a famous musical and film adaptation featuring Australian (formerly?) national treasure Hugh Jackman, not to be confused with the American film National Treasure starring Nicolas Cage). The above image is a simple visualization of this dataset.
Your job is to create a visualization, or visualizations, that enables interesting comparisons in this dataset. Please feel free to explore as you see fit, but if you don't have any thoughts, consider creating a visualization that (a) uses a circular layout and (b) takes advantage of the weights included in the dataset.
Code to get you started follows on the next page:
import pandas as pd
import networkx as nx
import shutil
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
from bokeh.models import Range1d, Circle
from bokeh.plotting import from_networkx
output_notebook()
shutil.unpack_archive('lesmis.zip')
# Read the dataset (replace 'lesmis.mtx' with the correct filename)
data = pd.read_csv('lesmis.mtx', sep=' ', skiprows=2, names=['n1', 'n2', 'weight'])
# Create a graph, adding one weighted edge per row of the dataset
G = nx.Graph()
for index, row in data.iterrows():
    G.add_edge(row['n1'], row['n2'], weight=row['weight'])
layout = nx.spring_layout(G)
# Create a Bokeh plot
title = "Les Misérables Character Interactions"
plot = figure(title=title, x_range=Range1d(-1.1, 1.1), y_range=Range1d(-1.1, 1.1))
# Create a Bokeh graph from NetworkX graph
network_graph = from_networkx(G, layout)
# Set node size and color
network_graph.node_renderer.glyph = Circle(size=10, fill_color='skyblue')
# Add network graph to the plot
plot.renderers.append(network_graph)
# Show the plot
show(plot)
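For the circular-layout and edge-weight suggestion above, the underlying arithmetic can be sketched in plain Python. The toy edges and the helper names circular_positions and scaled_widths are hypothetical, and the width range is an arbitrary choice. This is an illustration of the idea, not the required approach.

```python
import math

# Toy weighted edges standing in for the Les Misérables data (values are made up)
edges = [(1, 2, 1.0), (2, 3, 8.0), (3, 4, 3.0), (4, 1, 5.0)]

# Collect the nodes in a stable order
nodes = sorted({n for n1, n2, _ in edges for n in (n1, n2)})


def circular_positions(node_list, radius=1.0):
    """Place nodes at equal angles around a circle centred on the origin."""
    step = 2 * math.pi / len(node_list)
    return {
        node: (radius * math.cos(i * step), radius * math.sin(i * step))
        for i, node in enumerate(node_list)
    }


def scaled_widths(weights, min_width=0.5, max_width=6.0):
    """Linearly rescale edge weights to a line-width range."""
    lo, hi = min(weights), max(weights)
    span = (hi - lo) or 1.0  # avoid dividing by zero if all weights are equal
    return [min_width + (w - lo) / span * (max_width - min_width) for w in weights]


layout = circular_positions(nodes)
widths = scaled_widths([w for _, _, w in edges])
print(layout)
print(widths)
```

The layout dictionary maps each node to an (x, y) pair, the same shape of result that nx.spring_layout(G) returns in the starter code; networkx also offers nx.circular_layout(G), which could be swapped in directly. The scaled widths could then be attached to the edges and used to set line thickness, so that heavier interactions appear as thicker lines.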
Submit your visualization(s), along with a short report indicating what tasks/questions/comparisons they are designed to enable and how, to the assessment link on the LMS by the deadline.
Note: You need to complete Parts 3, 4, and 5 of this assessment.