Exploring the Superiority of LLMs: A Dive into a Challenging Multi-label Classification Task!

Somsubhra De

Amid the COVID-19 pandemic, social media became a battleground for facts and falsehoods. Millions of tweets carrying anti-vaccine sentiment swirled around, each with its own message, some informative, others misleading, posing a significant challenge to public health efforts. The discourse mixed many perspectives, with skepticism towards vaccines driven by factors such as political dynamics, religion, apprehension about side effects, country of manufacture, distrust of the pharmaceutical industry and more.

But what if we told you that ML can provide valuable insights that help policymakers and health organizations intervene, tap into real-time social media discussions, and inform evidence-based strategies that address public concerns and promote informed decision-making?

We worked on a dataset of around 9.9k tweets posted during 2020-21, where each tweet could be assigned one or more of 12 unique concern labels, depending on the stances expressed. Challenging, right? Our focus was on developing a robust multi-label classifier capable of assigning specific concern labels to tweets based on the apprehensions they articulate towards vaccines. Traditional ML methods like SVM, Naive Bayes and Random Forest, transformer models like BERT and DistilBERT, and classifier chains have become common baselines, tried by many in this domain.
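To make the multi-label setup concrete, here is a minimal sketch of one such baseline, a classifier chain over TF-IDF features, using scikit-learn. The tweets and the three-label set below are invented placeholders, not the paper's actual data or its 12 categories:

```python
# Sketch of a multi-label baseline with a classifier chain.
# Texts and labels are toy placeholders for illustration only.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

# Toy tweets and binary indicator rows over 3 illustrative concern labels
# ["side-effects", "political", "religious"] -- not the paper's label set.
texts = [
    "worried about long term side effects of the jab",
    "this vaccine push is pure politics",
    "my faith does not allow this shot and I fear reactions",
    "refusing because of the government agenda",
]
Y = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 0, 1],  # a tweet can carry more than one label
    [0, 1, 0],
])

X = TfidfVectorizer().fit_transform(texts)

# A classifier chain links one binary classifier per label so that each
# label's prediction can condition on the labels predicted before it.
chain = ClassifierChain(LogisticRegression(max_iter=1000),
                        order="random", random_state=0)
chain.fit(X, Y)

preds = chain.predict(X)
print(preds.shape)  # one binary vector of 3 label slots per tweet
```

Unlike one-vs-rest, the chain can exploit label correlations (e.g. religious objections co-occurring with side-effect fears), which is exactly what makes the multi-label setting harder than plain classification.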

What might sound interesting is that, besides these, we tried the state-of-the-art GPT-3.5, which was almost unexplored for this kind of intricate multi-label classification. Our exploration covered diverse prompt engineering methods, such as chain-of-thought (CoT), zero-shot and few-shot prompting. We found that a combination of these prompting styles proved most effective for our objectives. We then formulated a novel prompt template and observed improved model performance when the prompts asked the model to generate explanations or reasoning. It was impressive to see GPT-3.5 outperform the other ML methods on such a complex task, even without a paid OpenAI subscription and hence with only a small few-shot budget 🙂 A more extensive exploration of diverse prompting strategies could yield even better results in the future!
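The exact template is in the preprint; as a rough illustration of the idea, a combined few-shot + CoT prompt builder might look like the sketch below. The labels, exemplars, and wording here are hypothetical stand-ins, not the paper's actual template:

```python
# Hypothetical sketch of a few-shot + chain-of-thought prompt builder for a
# chat model such as GPT-3.5. Labels and exemplars are illustrative only.
LABELS = ["side-effects", "political", "religious", "efficacy"]  # placeholder subset

# Each exemplar pairs a tweet with a short reasoning step and its labels,
# so the model sees *why* a label applies before it answers.
FEW_SHOT = [
    ("Not taking it, who knows what it does to your body long term",
     "The tweet voices fear of unknown bodily harm.",
     ["side-effects"]),
    ("This rollout is just an election stunt",
     "The tweet frames vaccination as political maneuvering.",
     ["political"]),
]

def build_prompt(tweet: str) -> str:
    """Assemble a prompt that shows worked examples with reasoning,
    then asks the model to reason before listing the labels."""
    parts = [
        "You are classifying anti-vaccine tweets into one or more concern labels.",
        f"Allowed labels: {', '.join(LABELS)}.",
        "For each tweet, first explain your reasoning, then output the labels.",
        "",
    ]
    for text, reasoning, labels in FEW_SHOT:
        parts += [f"Tweet: {text}",
                  f"Reasoning: {reasoning}",
                  f"Labels: {', '.join(labels)}",
                  ""]
    # End mid-pattern so the model continues with its own reasoning first (CoT).
    parts += [f"Tweet: {tweet}", "Reasoning:"]
    return "\n".join(parts)

prompt = build_prompt("Big pharma only cares about profit, not safety")
print(prompt)
```

The resulting string would be sent as the user message of a chat-completions request; ending the prompt at "Reasoning:" nudges the model to produce its explanation before committing to labels, which is the effect we observed helping performance.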

Curious? Check out the preprint of our complete research paper (Somsubhra De and Shaurya Vats, "Decoding Concerns: Multi-label Classification of Vaccine Sentiments in Social Media", 2023) at https://arxiv.org/abs/2312.10626
