How is the Reported Sentiment Score Calculated?

The reported sentiment score on Buzzlytix is a single number that summarizes the overall tone of news coverage across all the articles and sources we analyze. Here's how we calculate it:

1. Article Collection & Preprocessing

  • We gather articles from a variety of news sources daily.
  • Each article is processed to extract its main content and title.

2. Sentiment Analysis

  • For each article, we run a 5-class sentiment model (very positive, positive, neutral, negative, very negative) on the article's main content.
  • If an article has no content, we fall back to a 2-class model applied to its title.
  • Each article's sentiment is represented as a probability distribution across the five classes.
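The routing in this step can be sketched as follows. This is a minimal illustration, not Buzzlytix's code: the Article type and the five_class and two_class_fallback callables are hypothetical stand-ins for the real models. We assume each classifier returns probabilities keyed by class name, and that the fallback simply puts its probability mass on the classes it can express; how the real 2-class output is mapped onto five classes is not described on this page.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# The five sentiment classes, from most negative to most positive.
CLASSES = ("very_negative", "negative", "neutral", "positive", "very_positive")

Distribution = Dict[str, float]
Classifier = Callable[[str], Distribution]


@dataclass
class Article:
    title: str
    content: Optional[str] = None  # None when no main content could be extracted


def article_sentiment(article: Article, five_class: Classifier,
                      two_class_fallback: Classifier) -> Distribution:
    """Return a probability distribution over CLASSES for one article."""
    if article.content and article.content.strip():
        raw = five_class(article.content)  # primary path: main content
    else:
        raw = two_class_fallback(article.title)  # fallback path: title only
    # Assumption: classifiers return class name -> probability; missing classes get 0.
    total = sum(raw.get(c, 0.0) for c in CLASSES)
    if total <= 0:
        raise ValueError("classifier returned no probability mass")
    # Renormalize so the five probabilities sum to 1.
    return {c: raw.get(c, 0.0) / total for c in CLASSES}
```

With stub classifiers, an article that has only a title takes the fallback path: if the fallback returns {"negative": 0.2, "positive": 0.8}, the result assigns 0.8 to positive, 0.2 to negative, and 0.0 to the other three classes.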

3. Keyword Extraction & Consolidation

  • We extract key entities (people, organizations, places, etc.) from each article's title using spaCy's named-entity recognition.
  • We then consolidate these keywords, grouping similar or related terms so that articles about the same entity are aggregated under a single keyword.
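A minimal sketch of this step, under stated assumptions. Entity extraction uses spaCy's named-entity output (doc.ents, with each entity's text and label_); the set of entity labels kept is our assumption. The consolidation rules (lowercasing, dropping a leading "the" and a trailing possessive, and merging a one-word name into a longer name ending in that word, such as "Biden" into "Joe Biden") are a hypothetical illustration of grouping related terms, not Buzzlytix's actual method.

```python
import re
from collections import Counter
from typing import Dict, List

# Assumption: which spaCy entity labels count as keywords (people, orgs, places, ...).
ENTITY_LABELS = {"PERSON", "ORG", "GPE", "LOC", "NORP", "EVENT", "PRODUCT"}


def extract_entities(title: str, nlp) -> List[str]:
    """Return the named entities in a title; `nlp` is a loaded spaCy pipeline."""
    doc = nlp(title)
    return [ent.text for ent in doc.ents if ent.label_ in ENTITY_LABELS]


def normalize(keyword: str) -> str:
    """Lowercase, collapse whitespace, drop a leading 'the' and a trailing "'s"."""
    text = re.sub(r"\s+", " ", keyword.strip().lower())
    text = re.sub(r"^the ", "", text)
    return re.sub(r"'s$", "", text)


def consolidate(keywords: List[str]) -> Dict[str, str]:
    """Map each raw keyword to a canonical keyword (hypothetical merging rules)."""
    counts = Counter(normalize(k) for k in keywords)
    canonical: Dict[str, str] = {}
    for norm in counts:
        if " " not in norm:
            # Longer names ending in this single word, e.g. "joe biden" for "biden".
            longer = [m for m in counts if m.endswith(" " + norm)]
            if longer:
                # Prefer the most frequent longer name; break ties deterministically by name.
                canonical[norm] = max(longer, key=lambda m: (counts[m], m))
                continue
        canonical[norm] = norm
    return {k: canonical[normalize(k)] for k in keywords}
```

In practice nlp would be loaded once, for example with spacy.load("en_core_web_lg") (spaCy's large English pipeline), and reused for every title. The suffix rule is cheap but can over-merge unrelated names that share a final word.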

4. Aggregating Sentiment by Keyword

  • For each keyword, we aggregate the sentiment distributions of all articles that mention it.
  • Rather than assigning each article to a single class, we sum its class probabilities as soft counts. Only keywords that appear in at least 5 articles are included in the final statistics.
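Aggregation with soft counts might look like the sketch below. Each article adds its whole probability distribution to every keyword it mentions, so an article with probability 0.5 of being negative adds 0.5 to that keyword's negative count. The input format (a list of keywords paired with a distribution for each article) and the rule that a keyword repeated within one article is counted once are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

CLASSES = ("very_negative", "negative", "neutral", "positive", "very_positive")
MIN_ARTICLES = 5  # keywords seen in fewer articles are dropped


def aggregate_by_keyword(
    articles: Iterable[Tuple[List[str], Dict[str, float]]],
) -> Dict[str, Dict[str, float]]:
    """Sum each keyword's per-article class probabilities (soft counts)."""
    soft_counts: Dict[str, Dict[str, float]] = defaultdict(lambda: dict.fromkeys(CLASSES, 0.0))
    n_articles: Dict[str, int] = defaultdict(int)
    for keywords, dist in articles:
        # Assumption: a keyword is counted at most once per article.
        for kw in set(keywords):
            n_articles[kw] += 1
            for c in CLASSES:
                soft_counts[kw][c] += dist.get(c, 0.0)
    # Keep only keywords backed by at least MIN_ARTICLES articles.
    return {kw: dict(soft_counts[kw]) for kw, n in n_articles.items() if n >= MIN_ARTICLES}
```

For example, five articles mentioning "fed", each 0.5 negative, 0.25 neutral and 0.25 positive, give "fed" soft counts of 2.5, 1.25 and 1.25, while a keyword seen in only two articles is filtered out.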

5. Calculating the Overall Sentiment Score

  • For each keyword, we assign a value to each class (very negative = -2, negative = -1, neutral = 0, positive = 1, very positive = 2) and average these values, weighted by the keyword's soft counts.
  • This score is normalized to the range [-1, 1] and then damped according to the keyword's share of neutral sentiment, which reduces the influence of uncertain coverage.
  • The final reported sentiment score is the average of these damped scores across all keywords that meet the 5-article threshold.
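The per-keyword score and the final average can be sketched as below. The page does not give the exact damping formula, so this sketch assumes one simple choice: multiplying the normalized score by one minus the keyword's neutral share, i.e. score = (sum(v * n) / sum(n)) / 2 * (1 - n_neutral / sum(n)), where v is a class value and n its soft count. It also assumes the final average gives every keyword equal weight. Treat the formula and the function names as hypothetical.

```python
from typing import Dict

# Class values from the scoring rule: very negative = -2 ... very positive = 2.
CLASS_VALUES = {"very_negative": -2, "negative": -1, "neutral": 0,
                "positive": 1, "very_positive": 2}


def keyword_score(soft_counts: Dict[str, float]) -> float:
    """Damped sentiment score in [-1, 1] for one keyword's soft counts."""
    total = sum(soft_counts.values())
    if total == 0:
        return 0.0
    # Soft-count-weighted mean of the class values, in [-2, 2].
    mean_value = sum(CLASS_VALUES[c] * n for c, n in soft_counts.items()) / total
    normalized = mean_value / 2  # rescale to [-1, 1]
    neutral_share = soft_counts.get("neutral", 0.0) / total
    # Assumed damping: shrink toward 0 in proportion to the neutral share.
    return normalized * (1 - neutral_share)


def overall_score(per_keyword: Dict[str, Dict[str, float]]) -> float:
    """Equal-weight mean of the damped keyword scores (0.0 if there are none)."""
    scores = [keyword_score(counts) for counts in per_keyword.values()]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, a keyword with soft counts of 2.5 negative, 1.25 neutral and 1.25 positive has a weighted mean of -0.25 and a normalized score of -0.125; damping by its 25% neutral share gives -0.09375. Under this assumed damping, a keyword whose coverage is entirely neutral scores exactly 0.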

Interpretation: A score near 1 means coverage is overwhelmingly positive, a score near -1 means it is overwhelmingly negative, and a score near 0 means it is neutral or mixed.

Technical Details

  • 5-class sentiment model: gpt-4.1-nano via the OpenAI API (prompt-based classification).
  • Keyword extraction: spaCy large English model.
  • Data is updated daily and statistics are recalculated for each new batch of articles.
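The page does not say how a prompt-based classifier yields the probability distribution described in step 2. One plausible arrangement, assumed here purely for illustration, is to ask the model to reply with a JSON object holding one probability per class and then validate and renormalize that reply. The reply format and the parse_distribution name are hypothetical, and no OpenAI API call is shown.

```python
import json
from typing import Dict

CLASSES = ("very_negative", "negative", "neutral", "positive", "very_positive")


def parse_distribution(reply: str) -> Dict[str, float]:
    """Parse a hypothetical JSON reply like {"positive": 0.7, ...} into a distribution."""
    data = json.loads(reply)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    probs = {}
    for c in CLASSES:
        value = float(data.get(c, 0.0))  # classes the model omits count as 0
        if value < 0:
            raise ValueError(f"negative probability for {c}")
        probs[c] = value
    total = sum(probs.values())
    if total == 0:
        raise ValueError("reply contains no probability mass")
    # Renormalize in case the reply's values do not sum to 1.
    return {c: p / total for c, p in probs.items()}
```

Validating the reply this way keeps a malformed model response from silently entering the keyword aggregates.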