IPython file – IT Computer Science Assignment Help

You must also submit a 2-page report that briefly describes how you answered each question, with the relevant observations asked for below. The data is in the file fake_news.tsv. Ensure your code runs from top to bottom without errors before submission. If you use more than one IPython file, it must be clear which file corresponds to which questions.

  1. Cross-validation on training data. Using the load_data function already present in the template file, you are now ready to process the fake news data from fake_news.tsv. In order to train a good classifier, finish the implementation of the cross_validate function to do a 10-fold cross-validation on the training data (leave the 20% test split alone for now). Use the given functions train_classifier and predict_labels to do the cross-validation. Make sure that your program stores the precision, recall, F1 score, and accuracy of your classifier in a variable cv_results, which should hold the average over all folds and be returned by this function.
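For question 1, the fold loop can be sketched roughly as follows. This is a minimal illustration, not the template's code: the signatures of train_classifier and predict_labels are not shown in this brief, so the sketch takes two hypothetical callables, train_fn and predict_fn, and assumes the data is a list of (features, label) pairs. It also scores a single positive label, whereas the template may expect weighted averages across classes.

```python
from statistics import mean


def binary_scores(gold, predicted, positive="FAKE"):
    """Precision, recall, F1 and accuracy, treating `positive` as the target label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    correct = sum(1 for g, p in pairs if g == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": correct / len(pairs)}


def cross_validate_sketch(dataset, folds, train_fn, predict_fn, positive="FAKE"):
    """Average the four scores over `folds` contiguous held-out slices.

    train_fn(training_pairs) -> model and predict_fn(feature_list, model) -> labels
    are hypothetical stand-ins for the template's own functions.
    """
    n = len(dataset)
    per_fold = []
    for i in range(folds):
        # Integer boundaries spread any remainder across the folds.
        start, end = i * n // folds, (i + 1) * n // folds
        held_out = dataset[start:end]
        training = dataset[:start] + dataset[end:]
        model = train_fn(training)
        predicted = predict_fn([features for features, _ in held_out], model)
        gold = [label for _, label in held_out]
        per_fold.append(binary_scores(gold, predicted, positive))
    # One averaged value per metric, as the brief asks for cv_results.
    return {metric: mean(scores[metric] for scores in per_fold)
            for metric in per_fold[0]}
```

The 20% test split never enters this loop; only the training portion is passed in as dataset.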
  2. Error analysis. Look at the performance of the classes using a confusion matrix (through the provided method confusion_matrix_heatmap) to see what the balance of false positives and false negatives is for the ‘false’ (fake) label. Carry out an error analysis on a simple train-test split of the training data (e.g. the first fold from your cross-validation function). For this you should print out (or better, print to file) all the false positives and false negatives for the FAKE label to try to understand why the classifier is not getting these correct, and write in your report some observations and examples of where it is getting confused.
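For question 2, one possible way to collect the misclassified statements is sketched below. The function names split_errors and format_error_report are hypothetical helpers, not part of the template, and the sketch assumes you already have parallel lists of statement texts, gold labels, and predicted labels for one held-out fold.

```python
def split_errors(texts, gold, predicted, target="FAKE"):
    """Return (false_positives, false_negatives) for the `target` label."""
    false_pos, false_neg = [], []
    for text, g, p in zip(texts, gold, predicted):
        if p == target and g != target:
            false_pos.append(text)   # predicted FAKE, but it was not
        elif g == target and p != target:
            false_neg.append(text)   # was FAKE, but the classifier missed it
    return false_pos, false_neg


def format_error_report(false_pos, false_neg, target="FAKE"):
    """Build a plain-text report that can be printed or written to a file."""
    lines = [f"False positives for {target} ({len(false_pos)}):"]
    lines += [f"  {text}" for text in false_pos]
    lines.append(f"False negatives for {target} ({len(false_neg)}):")
    lines += [f"  {text}" for text in false_neg]
    return "\n".join(lines) + "\n"
```

The returned string can be written out with an ordinary `open(path, "w", encoding="utf-8")` call, which makes it easier to read through the examples when writing up observations.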
  3. Optimising pre-processing and feature extraction. Now that you have the numbers for accuracy of your classifier and have done some initial error analysis, think of ways to improve this score. Some ideas as to how to do this:
    • Improve the preprocessing. Which tokens might you want to throw out or preserve?
    • What about punctuation? Do not forget normalisation, lemmatising, stop word removal – what aspects of this might be useful?
    • Think about the features: what could you use other than unigram tokens? It may be useful to look beyond single words to combinations of words or characters. Also the feature weighting scheme: what could you do other than using binary values?
    • You could add extra stylistic features like the number of words per sentence.
    • You could consider playing with the parameters of the SVM (cost parameter? per-class weighting?)
    • You could do some feature selection, limiting the numbers of features through different controls on e.g. the vocabulary.
    • You could use external resources, such as publicly available lists of famous individuals or swear words, to add further features.
    In your report, state which methods you tried and what effect each had on classifier performance, and evidence the exploration in your notebook.
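For question 3, a few of the ideas above (lower-casing, dropping punctuation, n-grams beyond unigrams, count weighting instead of binary values, and a words-per-sentence feature) might be combined as in the sketch below. The function name to_feature_vector_sketch, the regular expressions, and the feature key `__words_per_sentence__` are illustrative assumptions, not the template's own definitions.

```python
import re
from collections import Counter


def to_feature_vector_sketch(text, ngram_max=2):
    """Map a statement to a dict of n-gram counts plus one stylistic feature."""
    # Normalise: lower-case and keep only alphanumeric runs (punctuation dropped).
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    features = Counter()
    # Count weighting: each n-gram's value is its frequency, not just 0/1.
    # Note that n-grams here may span sentence boundaries.
    for n in range(1, ngram_max + 1):
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    # Stylistic feature: average number of words per sentence.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    features["__words_per_sentence__"] = len(tokens) / max(len(sentences), 1)
    return dict(features)
```

Whether any of these choices actually helps is an empirical question; rerunning the cross-validation after each change is what shows the effect.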
  4. Using other metadata in the file. Now look beyond the textual features of the statement. The file provided contains a number of other features for each statement, which up till now you should not have used (subject, speaker, speaker_job_title, state_info, party_affiliation, total_barely_true_counts, total_false_counts, total_half_true_counts, total_mostly_true_counts, total_pants_on_fire_counts). You have to modify the load_data function to get these into the system, in addition to the other functions you have already modified. Experiment with using the values from these columns as additional features to optimise your classifier’s performance, and record the improvements in a results table in your report. Make it clear in your notebook how you have explored them and which features caused which improvements.
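For question 4, the metadata columns could be turned into features along the lines of the sketch below. It assumes each row is already available as a dict keyed by the column names listed above; how the template's load_data actually reads the columns of fake_news.tsv is not shown in this brief, and metadata_features is a hypothetical helper name.

```python
CATEGORICAL_COLUMNS = ["subject", "speaker", "speaker_job_title",
                       "state_info", "party_affiliation"]
COUNT_COLUMNS = ["total_barely_true_counts", "total_false_counts",
                 "total_half_true_counts", "total_mostly_true_counts",
                 "total_pants_on_fire_counts"]


def metadata_features(row):
    """Turn one row's metadata into a feature dict to merge with the text features."""
    features = {}
    # Categorical columns become binary indicator features such as "speaker=...".
    for column in CATEGORICAL_COLUMNS:
        value = (row.get(column) or "").strip().lower()
        if value:
            features[f"{column}={value}"] = 1
    # Count columns become numeric features; missing or malformed cells fall back to 0.
    for column in COUNT_COLUMNS:
        try:
            features[column] = float(row.get(column) or 0)
        except ValueError:
            features[column] = 0.0
    return features
```

Adding one column group at a time and rerunning the cross-validation makes it straightforward to fill in a results table showing which features caused which improvements.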

 
