Data and Analytics Assignment
Due: Sunday by 11:59pm
Points: 80
Submitting: a text entry box or a website url

Introduction
You will again use Google Docs to create a short paper. If you need a refresher, please view the introduction to Google Docs tutorial.
Last week, we talked about databases and big data. In those discussions, we also talked about data warehouses and data mining techniques. It’s a bit difficult to conceptualize the purpose of all this data and the kinds of questions we can, or might want to, answer. Thus, in this assignment, you will use a couple of tools that I think are fun, which create interactive graphs of all kinds of population, demographic, and other data over a long period of time. The first is provided by an organization called Gapminder. A second group, Our World in Data, is partially funded by Oxford University. They also publish a number of world datasets and visualizations. In addition, you will acquire data from the United States Energy Information Administration.
You will then use this data to make fact-based decisions and form opinions. You will also try to determine the root causes of trends and what the data is telling you. That is the whole purpose of big data.
You will need to use the following links to Gapminder, Our World Data, and the US Energy Information Administration to complete this assignment:
The Gapminder visualization tool, which contains a default chart of life expectancy versus income over time.
The Gapminder data tool has a list containing all of the source data used by the tools. This data is available in Excel and other formats, so you can download these files yourself and work with them.
The United States Energy Information Administration homepage.
The United States Energy Information Administration is where you can find raw data and graphs of US energy production and consumption.
Our World in Data publishes datasets similar to Gapminder.org.
Note: As you work through this assignment, feel free to use other data sources. There are many of them.
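If you download a raw data file from the Gapminder data page instead of reading values off the interactive chart, a few lines of Python can pull out the numbers for one country. This is a minimal sketch, assuming you have saved the file as CSV and that it is laid out "wide": the first column holds the country name and each remaining column header is a year. Check the layout of the file you actually download; the file name in the usage comment is hypothetical.

```python
import csv
import io


def load_wide_indicator(csv_text):
    """Parse a wide indicator table into {country: {year: value}}.

    ASSUMPTION: the first column is the country name and every other
    column header is a year. Empty cells (missing data) are skipped.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    years = [int(h) for h in header[1:]]
    table = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines in the file
        country, cells = row[0], row[1:]
        series = {}
        for year, cell in zip(years, cells):
            if cell.strip():
                series[year] = float(cell)
        table[country] = series
    return table


# Usage (hypothetical file name; use the name of the file you downloaded):
# with open("hydro_power_generation_total.csv", encoding="utf-8") as f:
#     data = load_wide_indicator(f.read())
# print(data["China"][2007])
```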
To give you an idea of what I want you to do and what I am looking for, follow along with this short exercise.
Visit the Gapminder tool to view the default dataset. This dataset appears in the following figure. Your screen may vary slightly as new data is added and the site is updated.
The Gapminder tool has the following user interface elements:
A bubble graph occupies the main screen and shows the year being displayed. You can change the graph type to view the data on a map or to create line charts that visualize trends.
The track bar and play button allow you to visualize the data over time. A control at the right side of the track bar sets the speed of the visualization.
Along the left margin is a drop-down with which you can change the data displayed on the Y axis. Life expectancy is shown in the figure above.
Below the graph is a drop-down with which you can change the data displayed on the X axis. Income is shown in the figure above.
Now, play the simulation. As you do, pay close attention to the following:
Look closely at what happened to life expectancy in 1917 and 1918. It dropped significantly worldwide. Do you know why? If you research it, you will find that the drop can be attributed to the influenza epidemic of 1918. This epidemic killed more people than World War I.
Look at what happened to income between about 1927 and 1930, especially in the United States. This is the Great Depression.
Look again at what happened worldwide between about 1939 and 1945. Life expectancy dropped significantly, especially for countries like Russia and Poland. You are seeing the effects of World War II. You can also use this visualization to identify the countries that suffered the greatest loss of life because of the war.
Using the drop-down on the Y axis, change the dataset. When you click the drop-down, a drill-down menu appears. Select Energy, Total, Energy Production. Play the simulation again.
China produced very little energy before 1970. Suddenly, energy production spiked significantly. Why did this happen? Did they strike oil? Did they build nuclear power plants?
Now change the selection to show only hydroelectric power. You should see that nearly all of the growth in energy production was caused by hydroelectric power.
So once again, you have discovered the root cause. The following table explains the data. As you can see, China has built many very large dams.
Now look at another visualization tool. Visit the Our World in Data Population page. Scroll down and play the population simulation. Here, the data is rendered on a map. The regions change color to indicate the population. I find this visualization to be less granular and flexible than the Gapminder tool.
Assignment Details and Tasks
Answer each of the first four questions in a paragraph or two. I expect your final paper to be one to two pages long.
Value: 10 points
Summarize where the world gets its energy (regions of the world) and the source of that energy—oil, nuclear, hydroelectric, wind, etc. How have those sources changed over time?
To answer this question, you should use the visualization tools supplied by Gapminder or Our World in Data. Look at energy production and consumption worldwide. You will need to tell me the time periods of the data you reviewed and which tools and visualizations you used to create your answer. What raw data did you look at to create your answer?
Value: 10 points
In what regions (countries) is per capita energy consumption the greatest? Have the trends changed between the 1960s and the present day?
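If the data you find gives total energy consumption rather than consumption per person, you can compute per capita values yourself by dividing each country's total by its population for the same year. The sketch below assumes both inputs are Python dictionaries keyed by country name (for example, built from two downloaded data files); the function names are illustrative and are not part of any Gapminder or Our World in Data tool.

```python
def per_capita(total_energy, population):
    """Return {country: energy per person} for countries present in both inputs.

    total_energy and population are {country: value} dicts for the SAME year.
    The result keeps the energy unit of total_energy (e.g. TOE -> TOE per person).
    Countries with a missing or zero population are skipped.
    """
    return {
        country: total_energy[country] / population[country]
        for country in total_energy
        if country in population and population[country] > 0
    }


def top_consumers(per_person, n=5):
    """Return the n (country, value) pairs with the highest per capita use."""
    return sorted(per_person.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Running the same two functions on a 1960s year and on a recent year gives you two rankings you can compare directly.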
Value: 10 points
This EIA Annual Energy Review page contains significant data about United States energy sources and usage patterns. Using the spreadsheets and graphics from this site, explain the trends in total energy production and in the different resources (gas, oil, coal) used to produce that energy.
Then explain where that energy goes (how it is consumed). Quite a bit of it is used to generate electricity.
Value: 10 points
To what extent do renewable energy sources contribute to total energy production? Have the trends changed between the 1960s and the present day? Are renewable energy resources sufficient to satisfy demand in the United States?
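The renewable share for each year is renewable production divided by total production (or by total consumption, if you want to compare renewables against demand), expressed as a percentage. This minimal sketch assumes both series are dictionaries keyed by year and measured in the same unit, such as the quadrillion Btu used in many EIA tables; convert first if they are not. The function name is illustrative.

```python
def renewable_share(renewable, total):
    """Return {year: percent of total that is renewable}, in year order.

    renewable and total are {year: value} dicts in the SAME unit.
    Years missing from either series (or with a zero total) are skipped.
    """
    return {
        year: 100.0 * renewable[year] / total[year]
        for year in sorted(renewable)
        if year in total and total[year] != 0
    }
```

Passing total consumption instead of total production as the second argument shows how much of demand renewables could cover in each year.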
Value: 20 points
Electric cars are in vogue. In fact, many of you have talked about electric cars as a disruptive technology. I have heard the statement many times that “electric cars will eliminate carbon dioxide emissions.” For this question, you should evaluate the validity of this statement using the data that you acquire from these different data sources.
Years of completion
The assignment is worth 60 points. To receive full credit, you should do the following:
Make sure that you cite the exact source of your data to support the statements that you make. For example, you might say that hydroelectric power in China increased by 21% between 2007 and 2008 according to the Gapminder raw electric production data file (Hydroelectric electricity production, total). Show me the numbers and make sure to tell me the unit of measure. For example: “According to the Gapminder raw data, hydroelectric production was 41725193.5 TOE in 2007 and 50317024.9 TOE in 2008.”
Feel free to create your own charts in Google Sheets to explain your answers.
Note that different sources measure energy in different units, such as BTUs or TOE (tonnes of oil equivalent). Make sure that you identify those units of measure and convert them to a common unit as you prepare your answers.
Share your Google document with me and post the URL as your assignment submission. My Google account is [email protected]. Thus, your final assignment deliverable will contain one link. Feel free to embed Google Sheet(s) as you see fit to express your answers. You might also create your own charts.
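The unit conversion and the percent-change figure in the citation example above can be checked with a short script. This sketch uses the standard definitions 1 TOE = 41.868 GJ (the IEA convention) and 1 Btu = 1055.05585262 J; these factors are not from the assignment, and some sources define TOE slightly differently, so confirm the definition your data source documents.

```python
# ASSUMED conversion factors (standard definitions, not from the assignment):
JOULES_PER_TOE = 41.868e9        # 1 tonne of oil equivalent = 41.868 GJ (IEA)
JOULES_PER_BTU = 1055.05585262   # 1 Btu (International Table) in joules
BTU_PER_QUAD = 1e15              # EIA reports many totals in quadrillion Btu


def toe_to_btu(toe):
    """Convert tonnes of oil equivalent to Btu."""
    return toe * JOULES_PER_TOE / JOULES_PER_BTU


def btu_to_toe(btu):
    """Convert Btu to tonnes of oil equivalent."""
    return btu * JOULES_PER_BTU / JOULES_PER_TOE


def percent_change(old, new):
    """Percent change from old to new (positive means growth)."""
    return (new - old) / old * 100.0


# Figures from the citation example (Gapminder hydroelectric production, TOE).
hydro_2007 = 41725193.5
hydro_2008 = 50317024.9
print(f"Change 2007 to 2008: {percent_change(hydro_2007, hydro_2008):.1f}%")
print(f"2008 production: {toe_to_btu(hydro_2008) / BTU_PER_QUAD:.2f} quadrillion Btu")
```

Converting everything to one unit before computing percentages keeps comparisons between Gapminder, Our World in Data, and EIA figures consistent.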