How is the Reported Sentiment Score Calculated?
The reported sentiment score on Buzzlytix is a single metric that summarizes the overall mood of news coverage across all analyzed articles and sources. Here's how we calculate it:
1. Article Collection & Preprocessing
- We gather articles from a variety of news sources daily.
- Each article is processed to extract its main content and title.
 
2. Sentiment Analysis
- For each article, we use a 5-class sentiment model (very positive, positive, neutral, negative, very negative) to predict the sentiment of the article's content.
- If the content is missing, a fallback 2-class model is used on the title.
- Each article's sentiment is represented as a probability distribution across the five classes.
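
As an illustration only, the routing between the two models and the shape of a per-article distribution could be sketched in Python as below. The names `Article`, `choose_model_input`, and `as_distribution` are hypothetical, and the sketch deliberately leaves out the model calls themselves and how a 2-class result is turned into a five-class distribution, since neither is described here.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple

FIVE_CLASSES = ("very negative", "negative", "neutral", "positive", "very positive")


@dataclass
class Article:
    title: str
    content: Optional[str] = None


def choose_model_input(article: Article) -> Tuple[str, str]:
    """Pick which model to use and which text to send to it.

    Content goes to the 5-class model; if content is missing or blank,
    the title goes to the 2-class fallback model.
    """
    if article.content and article.content.strip():
        return "five_class", article.content
    return "two_class_fallback", article.title


def as_distribution(raw_scores: Sequence[float]) -> Dict[str, float]:
    """Normalize five non-negative class scores so they sum to 1."""
    if len(raw_scores) != len(FIVE_CLASSES):
        raise ValueError("expected one score per class")
    if any(score < 0 for score in raw_scores):
        raise ValueError("scores must be non-negative")
    total = sum(raw_scores)
    if total <= 0:
        raise ValueError("at least one score must be positive")
    return {label: score / total for label, score in zip(FIVE_CLASSES, raw_scores)}
```

For example, `as_distribution([1, 1, 2, 0, 0])` assigns half of the probability mass to "neutral" and a quarter each to "very negative" and "negative".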
 
3. Keyword Extraction & Consolidation
- We extract key entities (people, organizations, places, etc.) from each article's title using spaCy's named-entity recognition.
- Keywords are consolidated so that similar or related terms are grouped together, which keeps the per-keyword aggregation accurate.
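
A rough sketch of this step in Python follows. `extract_entities` relies only on spaCy's standard `doc.ents`, `ent.text`, and `ent.label_` attributes; the set of entity labels kept, the `consolidate` helper, and its alias table are assumptions made for the example, because the actual consolidation rules are not spelled out here.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Assumed subset of spaCy entity labels: people, organizations, places.
ENTITY_LABELS = {"PERSON", "ORG", "GPE", "LOC"}


def extract_entities(title: str, nlp: Callable) -> List[str]:
    """Return the text of named entities found in a title.

    `nlp` is a loaded spaCy pipeline, passed in by the caller.
    """
    doc = nlp(title)
    return [ent.text for ent in doc.ents if ent.label_ in ENTITY_LABELS]


def consolidate(
    keywords: Iterable[str], aliases: Optional[Dict[str, str]] = None
) -> Dict[str, List[str]]:
    """Group keyword variants under one canonical key.

    Hypothetical rule: case-fold, collapse whitespace, then apply an
    optional alias table mapping variants to a canonical form.
    """
    aliases = aliases or {}
    groups: Dict[str, List[str]] = {}
    for keyword in keywords:
        key = " ".join(keyword.casefold().split())
        key = aliases.get(key, key)
        groups.setdefault(key, []).append(keyword)
    return groups
```

With spaCy installed, the pipeline would presumably be created once with `spacy.load("en_core_web_lg")` and passed as `nlp`. Calling `consolidate(["Apple", "apple ", "Apple Inc."], {"apple inc.": "apple"})` groups all three variants under the key `"apple"`.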
 
4. Aggregating Sentiment by Keyword
- For each keyword, we aggregate the sentiment distributions from all articles mentioning it.
- The per-class probabilities are summed as soft counts rather than counting one hard label per article. Only keywords mentioned in at least 5 articles are included in the final statistics.
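
The aggregation can be pictured with the short Python sketch below. It is an illustration under assumptions: the input shape (a list of keyword lists paired with five-element distributions), the class ordering, and the function name `aggregate_by_keyword` are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

# Assumed class order for every distribution in this sketch.
CLASSES = ("very negative", "negative", "neutral", "positive", "very positive")
MIN_ARTICLES = 5  # threshold stated above


def aggregate_by_keyword(
    articles: Iterable[Tuple[Sequence[str], Sequence[float]]]
) -> Dict[str, List[float]]:
    """Sum per-class probabilities (soft counts) for each keyword.

    Each item is (keywords mentioned by the article, its 5-class distribution).
    Keywords seen in fewer than MIN_ARTICLES articles are dropped.
    """
    soft_counts: Dict[str, List[float]] = defaultdict(lambda: [0.0] * len(CLASSES))
    article_counts: Dict[str, int] = defaultdict(int)
    for keywords, distribution in articles:
        for keyword in set(keywords):  # count each article once per keyword
            article_counts[keyword] += 1
            for i, probability in enumerate(distribution):
                soft_counts[keyword][i] += probability
    return {
        keyword: soft_counts[keyword]
        for keyword, count in article_counts.items()
        if count >= MIN_ARTICLES
    }
```

Summing probabilities instead of hard labels means an article the model is unsure about contributes partially to several classes rather than fully to one.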
 
5. Calculating the Overall Sentiment Score
- For each keyword, we compute a sentiment score by assigning a value to each class: very negative = -2, negative = -1, neutral = 0, positive = 1, very positive = 2.
- The score is normalized to the range [-1, 1] and damped by the proportion of neutral sentiment (to reduce the impact of uncertainty).
- The final reported sentiment score is the average of these damped scores across all keywords.
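
One plausible reading of this calculation is sketched below in Python. Two details are assumptions, since the exact formulas are not given here: normalization is done by dividing the expected class value by 2, and damping multiplies the result by (1 - neutral share). The function names are hypothetical as well.

```python
from typing import Dict, Sequence

# Class values in the order: very negative, negative, neutral, positive, very positive.
CLASS_VALUES = (-2, -1, 0, 1, 2)
NEUTRAL_INDEX = 2


def keyword_score(soft_counts: Sequence[float]) -> float:
    """Damped sentiment score in [-1, 1] for one keyword's soft counts."""
    total = sum(soft_counts)
    if total == 0:
        return 0.0
    shares = [count / total for count in soft_counts]
    expected = sum(value * share for value, share in zip(CLASS_VALUES, shares))
    normalized = expected / 2.0             # assumption: map [-2, 2] onto [-1, 1]
    damping = 1.0 - shares[NEUTRAL_INDEX]   # assumption: shrink by the neutral share
    return normalized * damping


def reported_score(per_keyword: Dict[str, Sequence[float]]) -> float:
    """Average the damped scores across all keywords."""
    scores = [keyword_score(counts) for counts in per_keyword.values()]
    return sum(scores) / len(scores) if scores else 0.0
```

Under these assumptions, a keyword whose coverage is half neutral and half very positive scores 0.5 × 0.5 = 0.25, while a keyword with only very positive coverage scores 1.0; together they average to 0.625.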
 
Interpretation: A score near 1 means news coverage is overwhelmingly positive, a score near -1 means it is overwhelmingly negative, and a score near 0 means neutral or mixed coverage.

Technical Details
- Sentiment models: OpenAI API (gpt-4.1-nano, 5-class prompt-based classification).
- Keyword extraction: spaCy large English model.
- Data is updated daily and statistics are recalculated for each new batch of articles.