How is the Reported Sentiment Score Calculated?

The reported sentiment score on Buzzlytix is a single number that summarizes the overall tone of news coverage across all analyzed articles and sources. It is calculated in five steps:

1. Article Collection & Preprocessing

We gather articles from a variety of news sources daily.
Each article is processed to extract its main content and title.

2. Sentiment Analysis

For each article, we use a 5-class sentiment model (very positive, positive, neutral, negative, very negative) to predict the sentiment of the article's content.
If an article's content is missing, a fallback 2-class model is applied to its title instead.
Each article's sentiment is represented as a probability distribution across the five classes.
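
As an illustration of this representation, here is a minimal Python sketch of a per-article distribution with the title fallback. The function names are hypothetical, and the way a 2-class title result is spread over the five classes (all mass placed on "positive" and "negative") is an assumption made for this example, not a description of the production mapping.

```python
# Illustrative sketch only: names and the 2-class mapping are assumptions.
CLASSES = ("very negative", "negative", "neutral", "positive", "very positive")


def normalize(dist):
    """Return a distribution over CLASSES that sums to 1."""
    total = sum(dist.get(c, 0.0) for c in CLASSES)
    if total <= 0:
        raise ValueError("distribution has no probability mass")
    return {c: dist.get(c, 0.0) / total for c in CLASSES}


def from_two_class(p_positive):
    """Map a title-only 2-class result onto the five classes.

    Assumption: the positive/negative probabilities go to the plain
    "positive" and "negative" classes; the other classes get zero.
    """
    if not 0.0 <= p_positive <= 1.0:
        raise ValueError("p_positive must be between 0 and 1")
    return normalize({"positive": p_positive, "negative": 1.0 - p_positive})


def article_distribution(content_dist=None, title_p_positive=None):
    """Prefer the 5-class content result; fall back to the title result."""
    if content_dist is not None:
        return normalize(content_dist)
    if title_p_positive is not None:
        return from_two_class(title_p_positive)
    raise ValueError("article has neither content nor title sentiment")
```

For example, article_distribution(title_p_positive=0.8) yields a distribution with about 0.8 on "positive" and 0.2 on "negative".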

3. Keyword Extraction & Consolidation

We extract key entities (people, organizations, places, etc.) from each article's title using spaCy's named entity recognition.
Similar or related keywords are then consolidated into a single term so that their articles are aggregated together.
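
The consolidation rules are not spelled out here, so the following Python sketch shows only one simple possibility: grouping entity strings that differ in case, punctuation, or a possessive ending. The functions canonical and consolidate are hypothetical names, and the normalization rule is an assumption for illustration.

```python
# Illustrative sketch only: the normalization rule is an assumption.
import re
from collections import defaultdict


def canonical(keyword):
    """Reduce a keyword to a comparison key.

    Casefolds, drops a possessive 's, replaces punctuation with spaces,
    and collapses repeated whitespace.
    """
    text = keyword.casefold()
    text = re.sub(r"['’]s\b", "", text)
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def consolidate(keywords):
    """Group raw keyword strings under their canonical key."""
    groups = defaultdict(list)
    for kw in keywords:
        key = canonical(kw)
        if key:  # skip strings that contain only punctuation
            groups[key].append(kw)
    return dict(groups)
```

With this rule, "NASA", "Nasa's", and "nasa" all fall under the key "nasa". Grouping genuinely related terms (aliases, abbreviations) would need additional logic that this sketch does not attempt.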

4. Aggregating Sentiment by Keyword

For each keyword, we aggregate the sentiment distributions from all articles mentioning it.
The class probabilities are summed as soft counts, and only keywords mentioned in at least 5 articles are included in the final statistics.
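
The aggregation step can be sketched in Python as follows. The input shape (a list of keyword-list and distribution pairs) and the function name aggregate_by_keyword are assumptions for this example; the threshold of 5 articles comes from the description above.

```python
# Illustrative sketch only: input shape and names are assumptions.
from collections import defaultdict

CLASSES = ("very negative", "negative", "neutral", "positive", "very positive")
MIN_ARTICLES = 5  # threshold stated in the text


def aggregate_by_keyword(articles, min_articles=MIN_ARTICLES):
    """Sum per-article class probabilities (soft counts) for each keyword.

    `articles` is an iterable of (keywords, distribution) pairs, where
    `distribution` maps class names to probabilities.
    """
    soft_counts = defaultdict(lambda: dict.fromkeys(CLASSES, 0.0))
    article_counts = defaultdict(int)
    for keywords, dist in articles:
        for kw in set(keywords):  # count each article once per keyword
            article_counts[kw] += 1
            for c in CLASSES:
                soft_counts[kw][c] += dist.get(c, 0.0)
    # Keep only keywords with enough supporting articles.
    return {
        kw: dict(soft_counts[kw])
        for kw, n in article_counts.items()
        if n >= min_articles
    }
```

A keyword that appears in only four articles is dropped entirely, so a handful of strongly worded articles cannot dominate the final statistics.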

5. Calculating the Overall Sentiment Score

For each keyword, we compute a sentiment score by assigning a value to each class: very negative = -2, negative = -1, neutral = 0, positive = 1, very positive = 2.
The score is then normalized to the range [-1, 1] and damped according to the proportion of neutral sentiment, which reduces the impact of uncertain coverage.
The final reported sentiment score is the average of these damped scores across all keywords.
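
The exact normalization and damping formulas are not given here, so the Python sketch below makes two explicit assumptions: the weighted mean of the class values is divided by 2 to reach [-1, 1], and the result is multiplied by (1 - neutral share) as the damping factor. The class values and the final unweighted average across keywords follow the description above; the function names are hypothetical.

```python
# Illustrative sketch only: the divide-by-2 normalization and the
# (1 - neutral share) damping factor are assumptions.
CLASS_VALUES = {
    "very negative": -2,
    "negative": -1,
    "neutral": 0,
    "positive": 1,
    "very positive": 2,
}


def keyword_score(soft_counts):
    """Damped sentiment score in [-1, 1] for one keyword's soft counts."""
    total = sum(soft_counts.get(c, 0.0) for c in CLASS_VALUES)
    if total <= 0:
        return 0.0
    # Weighted mean of class values, in [-2, 2].
    expected = sum(
        CLASS_VALUES[c] * soft_counts.get(c, 0.0) for c in CLASS_VALUES
    ) / total
    normalized = expected / 2.0  # assumed normalization to [-1, 1]
    neutral_share = soft_counts.get("neutral", 0.0) / total
    return normalized * (1.0 - neutral_share)  # assumed damping


def overall_score(per_keyword):
    """Unweighted average of the damped scores across all keywords."""
    scores = [keyword_score(counts) for counts in per_keyword.values()]
    return sum(scores) / len(scores) if scores else 0.0
```

Under these assumptions, a keyword with soft counts of 3 positive and 2 neutral has a weighted mean of 0.6, a normalized score of 0.3, and a damped score of 0.3 × 0.6 = 0.18.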

Interpretation: A score near 1 means coverage is overwhelmingly positive, a score near -1 means it is overwhelmingly negative, and a score near 0 means it is neutral or mixed.

Technical Details

Sentiment models: OpenAI API (gpt-4.1-nano, 5-class prompt-based classification).
Keyword extraction: spaCy large English model.
Data is updated daily and statistics are recalculated for each new batch of articles.