CS5801 Quantitative Data Analysis Assessment – UK

The coursework is based on undertaking an analysis of a real-world data set. It is a shared assessment block for CS5701 Quantitative Data Analysis and CS5702 Modern Data. (Note: We have made some small modifications to the data set, so please use our version from Brightspace; do not use any original source version.)

This assessment offers an opportunity to bring together your skills from CS5701 Quantitative Data Analysis and CS5702 Modern Data. Note that there is also a second shared assessment block, CS5802, which takes the form of a written examination.

The requirements for the assessment block are as follows:
1. You will be provided with:
(i) a data set and its metadata (note that students will work on a subset of the data, but in the same data format)
(ii) research questions to guide your analysis
(iii) a proforma Rmd file to use to complete your coursework.

2. Using the proforma, please address the following:

  1. Organise and clean the data
    1.1 subset the data into the specific data subset allocated
    1.2 data quality analysis
    1.3 data cleaning
  2. Exploratory data analysis
  3. Modelling
    3.1 explain your analysis plan
    3.2 build a model for property price
    3.3 critique your model using relevant diagnostics
    3.4 based on 3.2 and 3.3 suggest improvements to your model
  4. Extension work:
    4.1 Model the likelihood of a property being furnished (using the is_furnished variable provided)
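
As a rough illustration only, the steps listed above might be sketched in R as follows. The data here are synthetic, and the column names `price` and `area` are assumptions made for the example (only `is_furnished` is named in this brief). The choice of a linear model for price and a logistic regression for `is_furnished` is one plausible option under those assumptions, not a prescribed method.

```r
# Synthetic stand-in for the coursework data.
# `price` and `area` are hypothetical column names; only `is_furnished` is from the brief.
set.seed(1)
n <- 200
houses <- data.frame(
  area = round(runif(n, 40, 200)),
  is_furnished = rbinom(n, 1, 0.4)
)
houses$price <- 50000 + 1500 * houses$area + 10000 * houses$is_furnished +
  rnorm(n, sd = 20000)
houses$price[c(5, 17)] <- NA   # inject missing values for the quality check
houses$area[42] <- -1          # inject an implausible value

# 1.2 Data quality analysis: count missing and implausible values
missing_per_column <- colSums(is.na(houses))
implausible_area <- sum(houses$area <= 0, na.rm = TRUE)

# 1.3 Data cleaning: drop rows that fail the checks
# (in a report, each cleaning decision should be explained)
houses_clean <- houses[complete.cases(houses) & houses$area > 0, ]

# 2. Exploratory data analysis: numeric summaries of each column
summary(houses_clean)

# 3.2 Build a model for property price
price_model <- lm(price ~ area + is_furnished, data = houses_clean)
summary(price_model)

# 3.3 Diagnostics: a numeric check of residual normality;
# calling plot() on the fitted model would give graphical diagnostics
residual_check <- shapiro.test(residuals(price_model))

# 4.1 Extension: logistic regression for the likelihood of being furnished
furnished_model <- glm(is_furnished ~ price + area,
                       data = houses_clean, family = binomial)
summary(furnished_model)
```

Because the data are simulated, the fitted coefficients carry no meaning; the sketch is only meant to show how the numbered steps map onto code chunks.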

3. Generating your personal data sets

  1. Each student should use a subset of the overall dataset houses analysis.Rda, which can be downloaded from the Brightspace page for CS5801.
  2. The subset of the data that you will work on depends on your student id.
  3. At the top of the proforma-2022-v1.0.Rmd file, which can be downloaded from the Brightspace page for CS5801, there are instructions on how to use your student id to obtain the subset of data you need to work on.
  4. The code for subsetting is embedded in the RMarkdown template. You need to configure it for your student id.
  5. If you are uncertain, please check!
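
The general idea of deriving a reproducible personal subset from a student id can be sketched as below. This is a hypothetical illustration only: the stand-in data frame, the placeholder id, the subset size and the seeding approach are all assumptions. The code embedded in the proforma remains the method you must use.

```r
# Hypothetical illustration; use the code embedded in the proforma for the real subset.
student_id <- 1234567   # placeholder value, not a real student id

# Stand-in for the full data set (the real one is loaded from the .Rda file)
full_data <- data.frame(
  id = 1:1000,
  price = seq(100000, by = 500, length.out = 1000)
)

# Seeding the random number generator with the id makes the sample reproducible
set.seed(student_id)
subset_rows <- sample(nrow(full_data), size = 100)
my_data <- full_data[subset_rows, ]

# Re-running with the same id selects exactly the same rows
set.seed(student_id)
same_rows <- sample(nrow(full_data), size = 100)
identical(subset_rows, same_rows)
```

The point of tying the subset to an id in this way is that the same student always obtains the same rows, while different students generally obtain different ones.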

General Guidance:

  1. You are expected to use R and RMarkdown for your analysis.
  2. Use the template RMarkdown as a starting point to structure your report but remember to remove our scaffolding and guidance comments before you submit. It is available from the CS5801 Brightspace page.
  3. Update the YAML to include your name and other identifier information.
  4. Include all relevant R code chunks and provide explanation, comments and discussion as appropriate.
  5. Follow the principles of ‘literate programming’, so choose meaningful variable and function names and add comments.
  6. You can also submit supplementary files if you wish, but you must include a single report file that contains your entire report in .Rmd format. Make sure any supplementary files are cross-referenced from your main report.
  7. Where appropriate cite external sources and add a bibliography at the end of your main report.
  8. The report should be professionally presented, with good structure, an absence of spelling errors and other typos, and an appropriate writing style (i.e., simple, to the point, unemotive language).
  9. Make sure you respect the 3000-word limit. We discourage excessive padding: unnecessary words and waffle will militate against professional presentation.
  10. Sometimes even suitable models do not have good fit due to the nature of the data. In such circumstances you will not be penalised.
  11. Whilst we encourage collaboration and sharing of ideas, this is an individual report and so must be based on your own understanding, analysis and words. WISEflow automatically cross-compares all submissions.
  12. WISEflow also has a plagiarism detector for external sources. We encourage you to use such sources, including R packages, code, ideas for data analysis and other statistical sources, but you must acknowledge them. In other words, do not attempt to pass off the work of others as your own.
  13. If you have questions, please check the assessment FAQs or ask one of us. Don’t guess!
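
For the word limit in point 9, a rough prose word count of an .Rmd file can be obtained with a few lines of base R. The sketch below is an unofficial helper, not the counting method used for marking: the function name `count_prose_words` is made up for this example, and it assumes that the YAML header starts on the first line and that code chunks are delimited by lines starting with three backticks.

```r
# Rough prose word count for an R Markdown report (a sketch, not an official counter).
count_prose_words <- function(lines) {
  fence <- strrep("`", 3)                               # chunk delimiter
  is_fence <- startsWith(trimws(lines), fence)
  # A line is code if it is a fence or lies between an opening and closing fence
  in_chunk <- (cumsum(is_fence) %% 2 == 1) | is_fence
  # Treat everything between the first two "---" lines as the YAML header
  yaml_marks <- which(trimws(lines) == "---")
  in_yaml <- rep(FALSE, length(lines))
  if (length(yaml_marks) >= 2 && yaml_marks[1] == 1) {
    in_yaml[yaml_marks[1]:yaml_marks[2]] <- TRUE
  }
  prose <- lines[!in_chunk & !in_yaml]
  words <- unlist(strsplit(prose, "[[:space:]]+"))
  sum(nzchar(words))                                    # ignore empty tokens
}

# Small in-memory example instead of a real file
fence <- strrep("`", 3)
example_report <- c(
  "---",
  "title: Example",
  "---",
  "This sentence has five words.",
  paste0(fence, "{r}"),
  "x <- 1 + 1",
  fence,
  "Two more."
)
count_prose_words(example_report)
```

For a real report, the lines could be read with `readLines()` on your own file. Heading markers and inline code are counted as words here, so treat the result as an estimate.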

LO1: Design and implement methods and protocols for data preparation and exploration using advanced statistical techniques.
LO2: Apply these methods on real data to generate novel insight, critically evaluate its value and design a framework for data management and sharing.

Below we give the detailed marking scheme.
