DISASTER ALERT SYSTEM

 

Introduction

    Since my Bachelor's years, I have seen many of my close friends and relatives die in road accidents. This is a serious problem that concerns me deeply, and I decided from then on to follow this topic and the solutions that the best minds are working out.
    This post tackles a broader problem: we train a model to detect disasters from tweets.
    The project proposal can be found here. This is the first door that opens onto a wide project with numerous possibilities.
    I used the existing tweets dataset from Kaggle and a Colab notebook to train my model.



Dataset

The dataset is shown below.
Fig 1: Disaster Twitter feed dataset


The dataset consists of the tweet's id, a keyword from the tweet text, the location of the tweet, and the tweet text itself. Only the training set has the target column, which holds the label of the tweet.

If the target is 0, the tweet does not contain any disaster-related information.
If the target is 1, the tweet contains disaster-related information.

The dataset has 7,613 rows with 3 feature columns.
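To make the schema concrete, here is a small sketch of loading and filtering the data. The rows below are made-up examples following the column names of the Kaggle dataset, not real tweets from it:

```python
import pandas as pd

# Toy rows mimicking the Kaggle dataset schema (id, keyword, location,
# text, target); the actual file would be read with pd.read_csv instead.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "keyword": ["fire", None, "flood"],
    "location": ["CA", None, "TX"],
    "text": [
        "Forest fire near the town",
        "What a lovely sunny day",
        "Flood warning issued for the river basin",
    ],
    "target": [1, 0, 1],  # 1 = disaster-related, 0 = not
})

# target == 1 selects the disaster-related tweets
disaster_tweets = df[df["target"] == 1]
print(len(disaster_tweets))  # 2 of the 3 sample rows
```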

The Approach

  1. We will create a classifier that will predict whether a given tweet has disaster information or not.
  2. We then test it against the test dataset.
  3. Then we'll try different methods to improve the accuracy.
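For step 2's evaluation, the data can be split into folds for validation (k-fold cross-validation is covered in the next section). A minimal sketch in plain Python, with fold logic that is my own simplification:

```python
# Split n samples into k roughly equal folds; each fold is held out once
# as the validation set while the rest serve as training data.
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k folds."""
    fold = n // k
    idx = list(range(n))
    for i in range(k):
        start = i * fold
        stop = n if i == k - 1 else start + fold
        val = idx[start:stop]
        train = idx[:start] + idx[stop:]
        yield train, val

folds = list(k_fold_indices(10, 5))
print(len(folds))  # 5 folds, each with a 2-sample validation slice
```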

The Work

I chose the Naive Bayes classifier for the classification task. I have referenced a blog post about my previous work here, which I extended for the current project. To improve the model, I used the following concepts, which I explained in detail in the previous blog posts:
  1. K-fold cross-validation
  2. Bayes Theorem
  3. Naive Bayes Theorem
  4. Smoothing
  5. Removing Stop words and specific positive & negative words
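As a rough illustration of how three of these pieces fit together (Naive Bayes, smoothing, and stop-word removal), here is a minimal from-scratch sketch. The stop-word list, function names, and toy tweets are my own, not the project's actual implementation:

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "an", "is", "of", "to", "and", "in"}  # tiny illustrative set

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def train_nb(tweets, labels, alpha=1.0):
    """Collect per-class word counts; alpha is the Laplace smoothing constant."""
    counts = {0: Counter(), 1: Counter()}
    priors = Counter(labels)
    for text, y in zip(tweets, labels):
        counts[y].update(tokenize(text))
    vocab = set(counts[0]) | set(counts[1])
    return counts, priors, vocab, alpha

def predict(text, counts, priors, vocab, alpha):
    """Pick the class with the higher smoothed log-posterior."""
    total = sum(priors.values())
    best, best_lp = None, -math.inf
    for y in (0, 1):
        lp = math.log(priors[y] / total)
        denom = sum(counts[y].values()) + alpha * len(vocab)
        for w in tokenize(text):
            lp += math.log((counts[y][w] + alpha) / denom)
        if lp > best_lp:
            best, best_lp = y, lp
    return best

# Toy usage (made-up tweets, not rows from the dataset)
tweets = ["forest fire burning now", "earthquake hits the city",
          "lovely sunny day", "great coffee this morning"]
labels = [1, 1, 0, 0]
model = train_nb(tweets, labels)
print(predict("fire in the city", *model))  # → 1
```

Smoothing matters because an unseen word would otherwise contribute a zero probability and wipe out the whole product; adding alpha keeps every factor positive.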
At first, after doing all of the above, I got an initial accuracy of 75%. To improve the accuracy I had to do some extra work.

The extra work here was to add the probabilities of the two other columns, i.e. "keyword" and "location", to the result.

So all I did was count each value's frequency per sentiment, divide by the total count for that sentiment to get its individual conditional probability, and apply the Naive Bayes theorem:

P(Word_1, Word_2, ..., Word_n, keyword, location | Sentiment) = P(Word_1 | Sentiment) · P(Word_2 | Sentiment) · ... · P(Word_n | Sentiment) · P(keyword | Sentiment) · P(location | Sentiment)
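A sketch of how the two extra factors can be computed and added to the log-probability sum. The frequency tables below are toy values and the helper names are mine; each column is treated as one extra conditionally independent feature with its own smoothed conditional probability:

```python
import math
from collections import Counter

def feature_log_prob(value, freq, total, n_values, alpha=1.0):
    """log P(value | sentiment) with add-alpha smoothing over n_values categories."""
    return math.log((freq[value] + alpha) / (total + alpha * n_values))

# Toy per-sentiment frequency tables for the "keyword" and "location" columns
kw_freq = {1: Counter({"fire": 3, "flood": 2}), 0: Counter({"fire": 1})}
loc_freq = {1: Counter({"CA": 2}), 0: Counter({"CA": 1, "NY": 2})}

def extra_log_prob(keyword, location, sentiment):
    """log P(keyword | sentiment) + log P(location | sentiment)."""
    kw, loc = kw_freq[sentiment], loc_freq[sentiment]
    kw_vals = set(kw_freq[0]) | set(kw_freq[1])
    loc_vals = set(loc_freq[0]) | set(loc_freq[1])
    return (feature_log_prob(keyword, kw, sum(kw.values()), len(kw_vals))
            + feature_log_prob(location, loc, sum(loc.values()), len(loc_vals)))

# These two log terms are simply added to the word-likelihood sum for each
# sentiment before comparing the two classes.
print(extra_log_prob("fire", "CA", 1) > extra_log_prob("fire", "CA", 0))  # → True
```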

Problems Faced

  • Picking the right most frequent words from each tweet was difficult.
  • Merging data frames was causing too many issues, which I have now learnt to handle.
  • The string-processing step for splitting the text took far more time than before: I had to wait almost 44 minutes for the results because the dataset was so large. We need to improve the performance with techniques that save time.
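One possible way to cut that waiting time is to avoid scanning the tweets word by word in nested Python loops: split every tweet once and feed the whole token stream to collections.Counter, which counts in C. This is only a sketch of the batching idea on toy data, not a measured fix for the 44-minute run:

```python
from collections import Counter
from itertools import chain

# Toy tweets; in the real project this would be the text column of the dataset.
tweets = ["forest fire near town", "fire crews on scene", "sunny day out"]

# One pass: flatten all tokens into a single stream and count them at once.
word_counts = Counter(chain.from_iterable(t.lower().split() for t in tweets))
print(word_counts["fire"])  # "fire" appears in two tweets → 2
```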

Future Work

This project only predicts whether a given tweet reports a disaster or not. From here, we can extend the application into a whole new dimension of uses, such as:
  • First-response notification when a threat occurs, which I mentioned in the project proposal.
  • Finding frequent accident/terror/disaster zones and providing suitable suggestions such as:
    • Dynamic speed limits.
    • Optimal and nearby locations for first-response teams.
    • Disaster countermeasures.

Results




The validation accuracy is 78.831%, which is a great improvement from where I started.

The final Predictions:

