Question
1. This question provides multiple CSV files (in the zipped folder), each with ‘large texts’ in one of its columns. Your job is to use open-source NLP (Natural Language Processing) libraries to perform the tasks below.
Task
i. Extract the ‘text’ from all the CSV files and store it in a single ‘.txt’ file.
ii. Research: Install the libraries spaCy and scispaCy (with the models ‘en_core_sci_sm’/‘en_ner_bc5cdr_md’) and Transformers (Hugging Face), along with any biomedical model (such as BioBERT) that can detect drugs, diseases, etc. in the text.
iii. Programming and Research
Using any built-in Python library, count the occurrences of the words in the text (.txt) and give the ‘Top 30’ most common words. Store the ‘Top 30’ words and their counts in a CSV file.
Using the ‘AutoTokenizer’ class in the ‘Transformers’ library, write a function to count unique tokens in the text (.txt) and give the ‘Top 30’ tokens.
iv. Named-Entity Recognition (NER): Extract the ‘diseases’ and ‘drugs’ entities in the ‘.txt’ file separately using ‘en_core_sci_sm’/‘en_ner_bc5cdr_md’ and BioBERT. Then compare the two models (for example: the total entities detected by each, the entities detected by both, the most common words found by each, and how these differ).
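As a starting point, tasks i and iii above might be sketched as follows. This is a minimal outline under stated assumptions, not a model answer: the function names are hypothetical, the text column is assumed to be named ‘text’ in some letter case, and the files are assumed to be UTF-8 encoded. The token counter accepts any object with a tokenize() method, so it can be tried without downloading a model.

```python
import csv
import re
from collections import Counter
from pathlib import Path


def combine_csv_text(csv_dir, out_path):
    """Write the text column of every CSV in csv_dir to a single .txt file."""
    # 'Large texts' can exceed the csv module's default field size limit.
    csv.field_size_limit(2**31 - 1)
    with open(out_path, "w", encoding="utf-8") as out:
        for csv_file in sorted(Path(csv_dir).glob("*.csv")):
            with open(csv_file, newline="", encoding="utf-8") as handle:
                reader = csv.DictReader(handle)
                names = reader.fieldnames or []
                # Assumption: the column is called 'text' (any letter case).
                column = next(
                    (n for n in names if n.strip().lower() == "text"), None
                )
                if column is None:
                    continue  # skip files without a recognisable text column
                for row in reader:
                    value = (row.get(column) or "").strip()
                    if value:
                        out.write(value + "\n")


def top_words(txt_path, n=30):
    """Return the n most common lowercase words as (word, count) pairs."""
    counts = Counter()
    with open(txt_path, encoding="utf-8") as handle:
        for line in handle:  # read line by line to keep memory use low
            counts.update(re.findall(r"[a-z]+(?:'[a-z]+)?", line.lower()))
    return counts.most_common(n)


def save_counts(pairs, csv_path):
    """Store (word, count) pairs in a CSV file with a header row."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["word", "count"])
        writer.writerows(pairs)


def top_tokens(txt_path, tokenizer, n=30):
    """Count tokens using any object that provides tokenize(str) -> list."""
    counts = Counter()
    with open(txt_path, encoding="utf-8") as handle:
        for line in handle:
            counts.update(tokenizer.tokenize(line))
    return counts.most_common(n)
```

For the tokenizer task, the object passed to top_tokens could be created with AutoTokenizer.from_pretrained("bert-base-uncased") from the Transformers library; that model name is only a placeholder, and a biomedical checkpoint can be substituted. A typical run would call combine_csv_text first, then save_counts(top_words("all_text.txt"), "top30.csv"), where both file names are illustrative.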
2. Here’s an adventurous story intertwined with Python programming questions that involve nested for loops, conditional statements, string manipulations, and more.