In this assignment, we will study test scores from the Programme for International Student Assessment (PISA) scheme, which tests 15-year-old students across all states in Australia.
You can find the data sets and a code book for the assignment in the Data folder. Broadly speaking, PISA measures scholastic ability across three categories: science, reading and math.
The goal of this analysis is to understand whether PISA test scores differ across dimensions such as income, school type, extra-curricular activities and gender.
Please ensure that the report knits properly into HTML and that all the R code and R output are visible in the final knitted report. Save the rendered HTML document as a PDF file (for example, by printing it to PDF from your web browser) and upload that PDF file to Moodle for submission.
This is an individual assignment, and you must use R code to answer all the questions. Before you submit, make sure that messages and warnings are turned off (see lines 15-17 of this Rmd file) and that echo = FALSE is set for the R code chunk where you load your libraries.
Questions
1. Read in the pisa data set and show the last 5 rows and last three variables in the data frame (1pt).
2. Calculate the 75th percentile (0.75 quantile) of the `math`, `science` and `read` scores for each state. Create a table that displays the results. Which state has the highest value for `read`?
3. For female students born in or after 2000, which type of school had the highest average performance in `read`?
4. For male and female students born in or after 2000, across the different types of schools, whose scores were more variable in reading? In math? In science? Place all results in a single table to receive full marks.
5. For the states of VIC, NSW and QLD, use `geom_histogram` to plot the distribution of marks in `read` by gender in a faceted plot, with fill shading to distinguish school type. In which combination of state and school type do female students perform worst? Are the results similar for `science`?
6. Repeat the above exercise using `geom_density`. Which set of results, those from `geom_density` or those from `geom_histogram`, allows a more accurate comparison across groups? Why?
7. First, create a data frame called _pisa_filtered_ that excludes observations with missing values in any variable. Then calculate a new variable called _tot_score_, the sum of the math, science and reading scores, and add it to _pisa_filtered_; in addition, calculate a new variable _tot_time_ as the sum of math_time, read_time and science_time and add it to _pisa_filtered_ (1pt). Using `geom_point()`, describe the relationship between _tot_time_ (x-axis) and _tot_score_ (y-axis).
8. The Australian PISA test is administered in English. It is believed that, on average, students who speak a language other than English at their primary residence may be disadvantaged by having to take the test in English. The variable `language` records the language most often spoken in the student's home, with language code 313 referring to English. On average, do students who do not speak English at home perform worse than native English speakers in terms of total score? Does your answer remain the same when we look only at students in the lowest 25% (bottom quartile) of total scores?
9. Using `facet_wrap()`, plot the total scores using densities for both males and females. What do the results tell us about the usefulness of looking at average scores?
10. Previous analysis has so far yielded several categorical variables that may influence test scores, such as _gender_, _music_instr_, and the type of school the student attends, _schtype_. However, there are many other numerical variables that we may have missed. As a first step in understanding these variables, we can cluster the data to see whether any patterns emerge beyond those already seen. Use k-means clustering on the following variables with $k=5$ clusters: _anxtest_, _motivat_, _tot_time_ and _tot_score_. Report the number of elements in each cluster.
11. To understand the clusters, produce a table that displays the median value of each variable in each cluster, and arrange the rows from largest to smallest by _tot_score_. Describe the relationship between the clusters and the variables _tot_score_, _anxtest_ and _motivat_; i.e., what do you notice about the similarity between these variables and the assigned cluster?
12. Plot `tot_score` by cluster, colouring the plot by cluster. Is there a meaningful difference between the scores across the different clusters? What do you think this finding says about the importance of motivational factors in overall test scores?
13. Do the findings described in question 11 remain true if we instead analyse the relationship between _read_ and cluster assignment?
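Questions 2 to 4 all reduce to grouped summaries. The following is a minimal base-R sketch of that pattern, not a model answer: it runs on a small invented data frame (`pisa_toy`) rather than the real PISA data, and the column `birth_year`, the factor levels for `gender` and `schtype`, and all values are hypothetical stand-ins, so check the code book for the actual variable names and codes. Standard deviation is used as one possible measure of "more variable".

```r
# Hypothetical toy data standing in for the PISA data frame. Column names
# math, science, read, state, gender and schtype follow the brief; birth_year,
# the factor levels and all values are invented for illustration only.
set.seed(1)
pisa_toy <- data.frame(
  state      = rep(c("VIC", "NSW", "QLD"), each = 20),
  gender     = rep(c("female", "male"), times = 30),
  schtype    = rep(c("Gov", "Cat", "Indep"), length.out = 60),
  birth_year = rep(c(1999, 2000, 2001, 2000), length.out = 60),
  math       = round(rnorm(60, 490, 90)),
  science    = round(rnorm(60, 500, 95)),
  read       = round(rnorm(60, 495, 100))
)

# Q2-style table: 75th percentile of each score within each state
q75 <- aggregate(cbind(math, science, read) ~ state, data = pisa_toy,
                 FUN = quantile, probs = 0.75)
print(q75)
print(q75$state[which.max(q75$read)])  # state with the highest 75th percentile in read

# Keep students born in or after 2000 (hypothetical birth_year column)
young <- subset(pisa_toy, birth_year >= 2000)

# Q3-style summary: mean read score by school type for those female students
print(aggregate(read ~ schtype, data = subset(young, gender == "female"), FUN = mean))

# Q4-style single table: standard deviation of each score by gender and school type
spread <- aggregate(cbind(read, math, science) ~ gender + schtype,
                    data = young, FUN = sd)
print(spread)
```

The formula interface of `aggregate()` passes extra arguments such as `probs` through to `FUN`, which keeps each table to one call; a dplyr `group_by()`/`summarise()` pipeline would be an equally valid choice.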
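Questions 7 and 8 involve dropping incomplete rows, building the two totals and comparing groups. Below is a hedged base-R sketch on invented data: the score and time column names and the English code 313 come from the brief, while the other language code (100), the toy values and the injected missing values are assumptions for illustration. It reads "lowest 25th quantile" as the bottom quartile of _tot_score_ across all students, which is one possible interpretation.

```r
# Hypothetical toy data: column names follow the brief, values are invented.
# Language code 313 = English (from the brief); 100 is a made-up "other" code.
set.seed(2)
n <- 200
pisa_toy <- data.frame(
  math         = rnorm(n, 490, 90),
  science      = rnorm(n, 500, 95),
  read         = rnorm(n, 495, 100),
  math_time    = rpois(n, 4),
  read_time    = rpois(n, 4),
  science_time = rpois(n, 4),
  language     = sample(c(313, 100), n, replace = TRUE, prob = c(0.8, 0.2))
)
pisa_toy$read[sample(n, 10)] <- NA  # inject some missing values

# Q7: keep only rows with no missing values, then add the two totals
pisa_filtered <- pisa_toy[complete.cases(pisa_toy), ]
pisa_filtered$tot_score <- with(pisa_filtered, math + science + read)
pisa_filtered$tot_time  <- with(pisa_filtered, math_time + read_time + science_time)

# Q8: mean total score for English-speaking homes versus all other languages
pisa_filtered$english <- pisa_filtered$language == 313
by_lang <- aggregate(tot_score ~ english, data = pisa_filtered, FUN = mean)
print(by_lang)

# Same comparison restricted to the bottom quartile of tot_score
cutoff <- quantile(pisa_filtered$tot_score, 0.25)
bottom <- subset(pisa_filtered, tot_score <= cutoff)
by_lang_bottom <- aggregate(tot_score ~ english, data = bottom, FUN = mean)
print(by_lang_bottom)
```

An alternative reading of question 8 is to compare each group's own 25th percentile, for example with `aggregate(tot_score ~ english, data = pisa_filtered, FUN = quantile, probs = 0.25)`.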
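Questions 10 and 11 use k-means with $k=5$ on four variables and then summarise each cluster by its medians. The sketch below uses `stats::kmeans` on an invented data frame with the four variable names from the brief. Standardising with `scale()` before clustering and using `nstart = 25` are assumptions of this sketch rather than requirements of the brief; without scaling, _tot_score_ (in the hundreds) would dominate the Euclidean distances.

```r
# Hypothetical toy data with the four clustering variables named in the brief;
# all values are invented for illustration.
set.seed(3)
n <- 300
vars <- data.frame(
  anxtest   = rnorm(n),
  motivat   = rnorm(n),
  tot_time  = rpois(n, 12),
  tot_score = rnorm(n, 1500, 250)
)

# k-means with k = 5 on standardised variables; nstart = 25 reruns the
# algorithm from 25 random starts and keeps the best solution
km <- kmeans(scale(vars), centers = 5, nstart = 25)

# Q10: number of observations in each cluster
print(km$size)

# Q11: median of each variable within each cluster, sorted by median tot_score
vars$cluster <- km$cluster
meds <- aggregate(. ~ cluster, data = vars, FUN = median)
meds_sorted <- meds[order(meds$tot_score, decreasing = TRUE), ]
print(meds_sorted)
```

Because k-means starts from random centres, cluster labels can change between runs; setting a seed, as above, makes the reported sizes reproducible.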