This app classifies computer science papers from their abstracts and predicts each paper's topic using a machine learning model hosted on Google Cloud Run.
How The Web App Works:
When the user enters text and hits the "Submit" button, the Python script passes the text to a separate REST API that serves the model, which predicts the topic of the paper. To learn more about how I deployed this app, check out this blog post.
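As a rough illustration of that hand-off, here is a minimal sketch of how a script might send an abstract to a prediction API using only the standard library. The endpoint URL, the `text` request field, and the `topic` response field are all assumptions made for this example; the real API may use different names.

```python
import json
import urllib.request

# Hypothetical endpoint: the real Cloud Run URL is not given in this text.
API_URL = "https://example.invalid/predict"


def build_request(abstract: str, url: str = API_URL) -> urllib.request.Request:
    """Package the abstract as a JSON POST request for the model API."""
    # The "text" field name is an assumption for illustration.
    body = json.dumps({"text": abstract}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_topic(abstract: str, url: str = API_URL, timeout: float = 10.0) -> str:
    """Send the abstract to the API and return the predicted topic."""
    request = build_request(abstract, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # The "topic" response field is also an assumption.
    return payload["topic"]
```

Keeping the request construction separate from the network call makes the payload easy to inspect without contacting the service.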
How The Model Works:
The model for this app is a text classifier built with the Scikit-learn library. It was trained on an imbalanced dataset (corpus) that I created from summaries of papers published on arxiv.org. Each paper's topic was already labeled by its category, so I did not need to label the dataset myself. The data was stored in a MongoDB database and used to train a Support Vector Machine that uses class weighting to offset the imbalance in the number of papers per topic. You can read more about this part of the model development in my first blog post. To improve model performance, I used stop word removal and stemming through the Natural Language Toolkit (NLTK). Finally, I persisted the entire pipeline using Joblib. You can read more about this aspect of the model training in the second blog post.
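The training steps described above can be sketched roughly as follows. This is not the actual training code: the abstracts and labels are made-up toy data, TF-IDF features and `LinearSVC` are assumed choices, and Scikit-learn's built-in English stop word list stands in for the NLTK stop word removal and stemming. Only the overall shape (a weighted SVM inside a pipeline, persisted with Joblib) follows the description.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy, made-up abstracts with a deliberately imbalanced label split (4 vs 2).
abstracts = [
    "We train a neural network to classify images.",
    "A new gradient descent method for deep learning.",
    "Learning representations with a transformer model.",
    "Regularization improves generalization of classifiers.",
    "An attack on a block cipher using differential analysis.",
    "A secure key exchange protocol with encryption.",
]
labels = ["cs.LG", "cs.LG", "cs.LG", "cs.LG", "cs.CR", "cs.CR"]

pipeline = Pipeline([
    # Assumed feature step; drops common English stop words.
    ("tfidf", TfidfVectorizer(stop_words="english")),
    # class_weight="balanced" weights each class inversely to its frequency.
    ("svm", LinearSVC(class_weight="balanced")),
])
pipeline.fit(abstracts, labels)

# Persist the whole pipeline (vectorizer + classifier) as one artifact,
# then load it back the way a serving script would.
model_path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(pipeline, model_path)
restored = joblib.load(model_path)

sample = ["A protocol for encryption of private keys."]
prediction = restored.predict(sample)
```

Saving the full pipeline rather than just the classifier means the serving code can pass raw text to `predict` without repeating the preprocessing.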