May 26, 2023


This assignment aims to evaluate students’ proficiency in creating a database system and designing a model for comprehensive data management. The objectives for this module include:

CLO1: Creating database systems using appropriate design techniques and scripting languages (C6, PLO1, MQF1)

CLO2: Designing models for storage, cleansing, transforming, and integrating data to meet organizational standards (P7, PLO2, MQF2)


The following criteria will be utilized to assess your assignment/project submission:

  • Professional presentation standards
  • Suitability and feasibility of the proposed data management system
  • Adequacy of database creation and adherence to organizational requirements
  • Clear listing of assumptions


You are tasked with creating a comprehensive data extraction, transformation, and loading (ETL) project based on a real dataset. You have the freedom to choose any dataset as long as it is a freely shared valid dataset generated by a legitimate organization or company.

The ETL project will involve data extraction, transformation, and loading processes along with supporting activities, as outlined in the tasks below.

Extract, Transform, Load (ETL) Case Study

You are assigned to an ETL project that involves processing a dataset through ETL and analytics activities. The project begins with a web-scraping activity to extract public data available on a website, such as weather data or Covid-19 pandemic-related information. For web scraping, you may use an automation tool such as the Python library Beautiful Soup. The tasks are divided into the following sections.

Accomplishing these tasks successfully will complete the project. You may use the Python programming language for the tasks, with an Integrated Development Environment (IDE) such as Jupyter Notebook. At the end of the extraction, ensure the dataset has a sufficient number of attributes (approximately 10) covering a variety of data types such as character, integer, number/float, and date. The dataset may also contain missing values and outliers.
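As a starting point, the extraction step described above can be sketched with Beautiful Soup. This is a minimal illustration only: the HTML snippet, the table id `daily-stats`, and the column names are hypothetical stand-ins for a real weather or Covid-19 page.

```python
# Minimal web-scraping sketch using Beautiful Soup (bs4).
# The inline HTML stands in for a real source page; in practice the
# markup would come from an HTTP request to the chosen website.
from bs4 import BeautifulSoup

html = """
<table id="daily-stats">
  <tr><th>date</th><th>city</th><th>temp_c</th></tr>
  <tr><td>2023-05-01</td><td>London</td><td>14.2</td></tr>
  <tr><td>2023-05-02</td><td>London</td><td></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.find("table", id="daily-stats").find_all("tr")
header = [th.get_text(strip=True) for th in rows[0].find_all("th")]
records = [[td.get_text(strip=True) or None for td in r.find_all("td")]
           for r in rows[1:]]
print(header)   # ['date', 'city', 'temp_c']
print(records)  # the second row has a missing temp_c (None)
```

Note that an empty cell is deliberately mapped to `None`, so missing values surface immediately for the later profiling and transformation tasks.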

Task 1: Data Extraction and Understanding the Dataset

Select a source website for web scraping and justify your choice. After scraping, list all available column names and explain each one.
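Once the scraped rows are in a pandas DataFrame, the column inventory for this task can be produced directly. The dataset below is an assumed example, not part of the brief; a real submission would use the scraped data.

```python
import pandas as pd

# Illustrative dataset standing in for scraped weather data; the
# column names and values are assumptions for demonstration only.
df = pd.DataFrame({
    "date": ["2023-05-01", "2023-05-02"],
    "city": ["London", "Paris"],
    "temp_c": [14.2, None],
    "humidity": [71, 64],
})

# List every available column with its inferred data type -- the
# starting point for the per-column explanations this task asks for.
for name, dtype in df.dtypes.items():
    print(f"{name}: {dtype}")
```

Pairing each printed name with a one-line description of its meaning and source satisfies the "list and explain" requirement.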

Task 2: Identifying the Variables/Columns

Analyze the dataset and recommend how it could be used for decision-making. Explain the relevance of each variable and its suitability as an independent or dependent variable in the proposed model.

Task 3: Data Categorization and Data Profiling

Categorize columns into independent and dependent variables. Explain data profiling and its components. Evaluate the dataset with data profiling components.
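The profiling components mentioned above (completeness, uniqueness, and summary statistics) can be computed with a few pandas calls. The dataset here is again a hypothetical stand-in.

```python
import pandas as pd

# Hypothetical dataset for illustration: one missing value and one
# obvious outlier (98.0 degrees C) are planted deliberately.
df = pd.DataFrame({
    "temp_c":   [14.2, 15.1, None, 98.0],
    "city":     ["London", "London", "Paris", "Paris"],
    "humidity": [71, 64, 80, 75],
})

# Core profiling components per column: missing counts (completeness),
# distinct counts (uniqueness), and inferred types.
profile = pd.DataFrame({
    "missing": df.isna().sum(),
    "unique":  df.nunique(),
    "dtype":   df.dtypes.astype(str),
})
print(profile)
print(df.describe())  # min/max/mean expose the 98.0 outlier
```

Reading the profile against organizational standards (acceptable missing rates, valid value ranges) is what turns these numbers into the evaluation the task requires.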

Task 4: Data Transformation

Decide on data transformation techniques such as eliminating duplicate rows, handling missing values, converting categorical columns to numerical ones, detecting and handling outliers, and standardizing the data. Include justifications and screenshots.
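The five techniques listed above can be sketched as a short pandas pipeline. The data, the median-imputation choice, and the 1.5×IQR capping rule are illustrative assumptions; your own justifications should drive the actual choices.

```python
import pandas as pd

df = pd.DataFrame({
    "city":   ["London", "London", "Paris", "Paris", "Paris"],
    "temp_c": [14.2, 14.2, None, 16.0, 98.0],
})

# 1. Eliminate duplicate rows.
df = df.drop_duplicates()

# 2. Handle missing values (median imputation is one option).
df["temp_c"] = df["temp_c"].fillna(df["temp_c"].median())

# 3. Convert the categorical column to numerical dummy variables.
df = pd.get_dummies(df, columns=["city"], dtype=int)

# 4. Detect and cap outliers with the 1.5*IQR rule.
q1, q3 = df["temp_c"].quantile([0.25, 0.75])
iqr = q3 - q1
df["temp_c"] = df["temp_c"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# 5. Standardize to zero mean and unit variance.
df["temp_c"] = (df["temp_c"] - df["temp_c"].mean()) / df["temp_c"].std()
```

Screenshots of the DataFrame before and after each numbered step document the pipeline as the task requests.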

Task 5: Analysis (Factor Analysis / Classification)

Describe the data and its correlations, conduct factor analysis or classification, explain the relevance of the discovered factors, and visualize them. Drop unnecessary variables.
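A correlation matrix plus an eigendecomposition of it gives a principal-axis view of the shared factors, which is one simple route into the analysis above. The synthetic data below (two variables driven by one latent factor, one independent variable) is purely illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
# Synthetic stand-in: x1 and x2 share a latent factor; x3 is noise.
latent = rng.normal(size=n)
df = pd.DataFrame({
    "x1": latent + 0.1 * rng.normal(size=n),
    "x2": latent + 0.1 * rng.normal(size=n),
    "x3": rng.normal(size=n),
})

corr = df.corr()
print(corr.round(2))  # x1/x2 strongly correlated, x3 nearly independent

# Eigenvalues of the correlation matrix: large eigenvalues mark shared
# factors worth keeping; variables loading only on negligible factors
# are candidates to drop.
eigvals, eigvecs = np.linalg.eigh(corr.values)
print(sorted(eigvals, reverse=True))  # one dominant factor expected
```

A dedicated factor-analysis or classification library could replace this sketch; the point is that the correlation structure justifies which variables are dropped.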

Task 6: Data Loading

Load the transformed data into a local SQL database (e.g., MySQL, PostgreSQL) and provide a table specification using the specified template.
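The loading step can be sketched with `DataFrame.to_sql`. SQLite is used here only so the example is self-contained; for the MySQL or PostgreSQL target named in the task you would pass an SQLAlchemy engine instead of a `sqlite3` connection. The table name `weather_clean` is an assumption.

```python
import sqlite3
import pandas as pd

# Stand-in for the transformed dataset produced in Task 4.
df = pd.DataFrame({
    "date": ["2023-05-01", "2023-05-02"],
    "city": ["London", "Paris"],
    "temp_c": [14.2, 15.8],
})

# Load into a local database table (SQLite in-memory for illustration;
# swap in an SQLAlchemy engine for MySQL/PostgreSQL).
conn = sqlite3.connect(":memory:")
df.to_sql("weather_clean", conn, index=False, if_exists="replace")

# Verify the load and read back the table specification.
count = conn.execute("SELECT COUNT(*) FROM weather_clean").fetchone()[0]
spec = conn.execute("PRAGMA table_info(weather_clean)").fetchall()
print(count, spec)
```

The `PRAGMA table_info` output (column name, type, nullability) maps directly onto the table-specification template the task asks you to complete.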
