A. RATIONALE
This assignment aims to evaluate students’ proficiency in creating a database system and designing a model for comprehensive data management. The objectives for this module include:
CLO1: Creating database systems using appropriate design techniques and scripting languages (C6, PLO1, MQF1)
CLO2: Designing models for storage, cleansing, transforming, and integrating data to meet organizational standards (P7, PLO2, MQF2)
B. ASSIGNMENT/PROJECT EVALUATION
The following criteria will be utilized to assess your assignment/project submission:
C. REQUIREMENTS
You are tasked with creating a comprehensive data extraction, transformation, and loading (ETL) project based on a real dataset. You have the freedom to choose any dataset as long as it is a freely shared valid dataset generated by a legitimate organization or company.
The ETL project will involve data extraction, transformation, and loading processes along with supporting activities, as outlined in the tasks below.
Extract, Transform, Load (ETL) Case Study
You are assigned to an ETL project that involves processing a dataset in ETL and analytics activities. The project begins with web scraping activity to extract public data available on a website, such as weather data or Covid-19 pandemic-related information. For web scraping, you may utilize automation tools like Python-based Beautiful Soup. The tasks are divided into the following sections.
Accomplishing these tasks successfully will lead to the completion of the project. You may use the Python programming language for the tasks, utilizing an Integrated Development Environment (IDE) tool such as Jupyter Notebook. At the end of the extraction, ensure the dataset has more than sufficient attributes (approximately 10 attributes) including various data types like character, integer, number/float, date, etc. The dataset may also contain missing values and outliers.
Task 1: Data Extraction and Understanding the Dataset
Select a source website for web scraping and justify your choice. After web scraping, list down all available column names and provide explanations.
Task 2: Identifying the Variables/Columns
Analyze the dataset and recommend its use for decision-making purposes. Explain the relevance of each variable in the dataset and its suitability as an independent or dependent variable for the proposed model.
Task 3: Data Categorization and Data Profiling
Categorize columns into independent and dependent variables. Explain data profiling and its components. Evaluate the dataset with data profiling components.
Task 4: Data Transformation
Decide on data transformation techniques such as eliminating duplicate rows, handling missing values, converting categorical columns to numerical ones, detecting and handling outliers, and standardizing the data. Include justifications and screenshots.
Task 5: Analysis (Factor analysis / Classifications)
Describe the data and its correlations, conduct factor analysis or classifications, explain the relevance of discovered factors, and visualize them. Drop unnecessary variables.
Task 6: Data Loading
Load the transformed data into a local SQL database (e.g., MySQL, PostgreSQL) and provide a table specification using the specified template.
Business Economics Assignment 3: Case Study Analysis on Price Controls in Pharma and Monopoly Power in Airline Industry
Read MoreCSC408 MIS Case Study Assignment Report: Analysis of Issues and Solutions in Information Systems
Read MoreScientific Research Review Assignment 4: Advancements and Ethical Practices in Your Study Area Literature Synthesis
Read MoreOrganizational Development Assignment: Tech Solutions Inc. Case Study on Engagement, Retention, and Inclusive Culture
Read MoreHPGD3103 Instructional Technology Assignment: ASSURE Model-Based Lesson Design in Google Classroom
Read MoreAI Deepfake Cybersecurity Assignment: Evaluating Security Risks and Detection Techniques for Safe Digital Environments
Read MoreMGT4216E Strategic Innovation Management Assignment: Exploring Innovation Capabilities, Strategy Stages, and Leadership for Business Transformation
Read MoreEmployee Engagement & HR Strategy Assignment: Case Analysis of Tech Solutions Inc. on Turnover, D&I, Performance, and Work-Life Balance
Read MoreHigher Education Reform Assignment: Curriculum Development Strategies for a Future-Ready Malaysia
Read MoreCross-Cultural Management Assignment Report: Importance, Challenges & Strategies in Global Teams
Read More