Question
1. This question provides multiple CSV files (in the zipped folder), each with ‘large texts’ in one of its columns. Your job is to use open-source NLP (Natural Language Processing) libraries to perform the tasks below.
Task
i. Extract the ‘text’ in all the CSV files and store them into a single ‘.txt file’.
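One possible way to approach task i is sketched below using only the standard library. The column name `text`, the folder layout, and the function name `extract_texts` are assumptions for illustration; the brief does not state which column holds the large texts, so check the CSV headers first.

```python
import csv
from pathlib import Path

# Large text cells can exceed the csv module's default field size limit.
csv.field_size_limit(2**31 - 1)


def extract_texts(csv_dir, out_path, column="text"):
    """Write the `column` value of every row of every CSV in csv_dir to out_path.

    `column` is an assumed name; it is matched case-insensitively.
    Returns the number of non-empty text cells written.
    """
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for csv_file in sorted(Path(csv_dir).glob("*.csv")):
            with open(csv_file, newline="", encoding="utf-8") as f:
                reader = csv.DictReader(f)
                # Map lowercase header names back to the real header names.
                names = {name.lower(): name for name in (reader.fieldnames or [])}
                key = names.get(column.lower())
                if key is None:
                    continue  # this file has no such column, so skip it
                for row in reader:
                    text = (row.get(key) or "").strip()
                    if text:
                        out.write(text + "\n")
                        written += 1
    return written
```

For example, `extract_texts("unzipped_folder", "all_text.txt")` would collect every non-empty cell into one file, one cell per line (both paths are placeholders).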
ii. Research: Install the libraries spaCy and scispaCy (models ‘en_core_sci_sm’ / ‘en_ner_bc5cdr_md’), and Transformers (Hugging Face) together with any biomedical model (e.g. BioBERT) that can detect drugs, diseases, etc. in the text.
iii. Programming and Research
Using any built-in Python library, count the occurrences of the words in the text (.txt) and give the ‘Top 30’ most common words. Store the ‘Top 30’ common words and their counts in a CSV file.
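A minimal sketch of the word count, assuming `collections.Counter` as the built-in library and a simple definition of a "word" (lowercased runs of letters, optionally with an internal apostrophe). The function names and the output header `word,count` are illustrative choices, not part of the brief.

```python
import csv
import re
from collections import Counter

# Assumed word definition: letters, optionally followed by 's, 't, etc.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def top_words_in_file(txt_path, n=30):
    """Return the n most common words in txt_path as (word, count) pairs."""
    counts = Counter()
    with open(txt_path, encoding="utf-8") as f:
        for line in f:  # stream line by line so a large file need not fit in memory
            counts.update(WORD_RE.findall(line.lower()))
    return counts.most_common(n)


def save_counts(pairs, csv_path):
    """Write (word, count) pairs to a CSV file with a header row."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["word", "count"])
        writer.writerows(pairs)
```

A typical call would be `save_counts(top_words_in_file("all_text.txt"), "top30_words.csv")`, where both file names are placeholders. Removing stop words first is a design choice the brief leaves open; without it, the top of the list will likely be dominated by words such as "the" and "of".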
Using ‘AutoTokenizer’ from the ‘Transformers’ library, write a function to count the unique tokens in the text (.txt) and give the ‘Top 30’ tokens.
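The token count could be sketched as follows. `AutoTokenizer.from_pretrained` and the tokenizer's `tokenize` method are real Transformers calls; the helper names `load_tokenizer` and `top_tokens` are hypothetical, and the checkpoint name is left to the caller because the brief does not fix one. The import is deferred so the counting function works with any object that exposes `tokenize`.

```python
from collections import Counter


def load_tokenizer(model_name):
    """Load a Hugging Face tokenizer for the given checkpoint name."""
    from transformers import AutoTokenizer  # deferred: needs transformers installed
    return AutoTokenizer.from_pretrained(model_name)


def top_tokens(txt_path, tokenizer, n=30):
    """Return the n most common tokens in txt_path as (token, count) pairs.

    `tokenizer` is any object with a tokenize(str) -> list[str] method.
    """
    counts = Counter()
    with open(txt_path, encoding="utf-8") as f:
        for line in f:  # tokenize one line at a time to bound memory use
            counts.update(tokenizer.tokenize(line))
    return counts.most_common(n)
```

For example, `top_tokens("all_text.txt", load_tokenizer("bert-base-uncased"))`, with the file name as a placeholder. Subword tokenizers split rare words into pieces (WordPiece marks continuations with `##`), so this list will generally differ from a whole-word count.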
iv. Named-Entity Recognition (NER): Extract the ‘diseases’ and ‘drugs’ entities in the ‘.txt file’ separately using ‘en_core_sci_sm’/‘en_ner_bc5cdr_md’ and BioBERT, then compare the two models (for example: the total entities detected by each, the entities they differ on, and the most common words from each).
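The extraction and comparison might be structured as in the sketch below. All function names are hypothetical. `spacy.load` and the Transformers `pipeline("ner", ..., aggregation_strategy="simple")` call are real APIs, but the BioBERT checkpoint is left as a parameter because the brief does not name one, and its label names depend on how that checkpoint was fine-tuned. `en_ner_bc5cdr_md` labels entities as `DISEASE` and `CHEMICAL`, whereas `en_core_sci_sm` only marks generic entity spans, so it cannot separate drugs from diseases on its own.

```python
from collections import Counter


def scispacy_entities(text, model_name="en_ner_bc5cdr_md"):
    """Return (entity_text, label) pairs from a spaCy/scispaCy model."""
    import spacy  # deferred: needs spaCy and the model package installed
    nlp = spacy.load(model_name)
    return [(ent.text, ent.label_) for ent in nlp(text).ents]


def hf_entities(text, model_name):
    """Return (entity_text, label) pairs from a token-classification checkpoint."""
    from transformers import pipeline  # deferred: needs transformers installed
    ner = pipeline("ner", model=model_name, aggregation_strategy="simple")
    return [(e["word"], e["entity_group"]) for e in ner(text)]


def by_label(entities, label):
    """Keep only the entities carrying the given label."""
    return [(text, lab) for text, lab in entities if lab == label]


def compare_entities(ents_a, ents_b, n=10):
    """Summarise two entity lists: totals, overlap, and most common mentions."""
    texts_a = [text.lower() for text, _ in ents_a]
    texts_b = [text.lower() for text, _ in ents_b]
    set_a, set_b = set(texts_a), set(texts_b)
    return {
        "total_a": len(texts_a),
        "total_b": len(texts_b),
        "shared": sorted(set_a & set_b),
        "only_a": sorted(set_a - set_b),
        "only_b": sorted(set_b - set_a),
        "top_a": Counter(texts_a).most_common(n),
        "top_b": Counter(texts_b).most_common(n),
    }
```

Running `compare_entities` once on the disease entities and once on the drug entities from each model gives the totals and differences the task asks for. Note that the full text file is probably too long for a single call: spaCy enforces a maximum document length and BERT-style models a maximum sequence length, so the text would need to be processed in chunks.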
2. Here’s an adventurous story intertwined with Python programming questions that involve nested for loops, conditional statements, string manipulations, and more.