The reported sentiment score is a single number that summarizes the overall mood of news coverage across all analyzed articles and sources. It is calculated in the following steps.
Article Collection & Preprocessing
We gather articles from a variety of news sources daily.
Each article is processed to extract its main content and title.
Sentiment Analysis
For each article, we use a 5-class sentiment model (very positive, positive, neutral, negative, very negative) to predict the sentiment of the article's content.
If the content is missing, a fallback 2-class model is used on the title.
Each article's sentiment is represented as a probability distribution across the five classes.
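As an illustration of what "a probability distribution across the five classes" can look like in practice, here is a minimal sketch. It assumes the model returns one raw score (logit) per class and that a softmax turns those scores into probabilities; the names CLASSES, softmax, and to_distribution are hypothetical and not taken from the actual pipeline.

```python
import math

# The five classes named in the text, in a fixed order.
CLASSES = ["very positive", "positive", "neutral", "negative", "very negative"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_distribution(logits):
    """Map one logit per class to a {class: probability} dictionary.

    Assumption: the classifier emits exactly one logit per class.
    """
    if len(logits) != len(CLASSES):
        raise ValueError("expected one logit per sentiment class")
    return dict(zip(CLASSES, softmax(logits)))


dist = to_distribution([2.0, 1.0, 0.0, -1.0, -2.0])
most_likely = max(dist, key=dist.get)
```

Keeping the full distribution, rather than only the most likely class, is what lets the later steps sum probabilities as soft counts.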
Keyword Extraction & Consolidation
We extract key entities (people, organizations, places, etc.) from each article's title using spaCy's named entity recognition.
Keywords are consolidated so that similar or related terms are grouped together and articles about the same entity are aggregated under a single keyword.
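The consolidation step could be sketched as follows. This is only one possible approach, assuming that entities have already been extracted and that grouping is done by case and whitespace normalization plus a hand-written alias table; the ALIASES table and the function names are hypothetical, and the real pipeline may group terms differently.

```python
from collections import defaultdict

# Hypothetical alias table mapping known variants to one canonical keyword.
ALIASES = {
    "u.s.": "united states",
    "usa": "united states",
}


def canonical(keyword):
    """Normalize case and whitespace, then apply the alias table."""
    key = " ".join(keyword.casefold().split())
    return ALIASES.get(key, key)


def consolidate(keywords):
    """Group raw keywords under their canonical form."""
    groups = defaultdict(list)
    for kw in keywords:
        groups[canonical(kw)].append(kw)
    return dict(groups)


groups = consolidate(["USA", "U.S.", "United  States", "NASA"])
```

Here the three spellings of the same country end up under one key, so their articles are counted together in the next step.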
Aggregating Sentiment by Keyword
For each keyword, we aggregate the sentiment distributions from all articles mentioning it.
The class probabilities are summed as soft counts, and only keywords mentioned in at least 5 articles are included in the final statistics.
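The aggregation described above can be sketched like this. The input shape (a list of keyword-list and distribution pairs) and the names MIN_ARTICLES and aggregate_by_keyword are assumptions made for illustration; only the summing of probabilities and the 5-article threshold come from the description.

```python
from collections import defaultdict

CLASSES = ("very positive", "positive", "neutral", "negative", "very negative")
MIN_ARTICLES = 5  # threshold stated in the text


def aggregate_by_keyword(articles):
    """Sum per-class probabilities for every keyword.

    `articles` is an iterable of (keywords, distribution) pairs, where
    `distribution` maps each class name to a probability.
    """
    soft_counts = defaultdict(lambda: dict.fromkeys(CLASSES, 0.0))
    article_counts = defaultdict(int)
    for keywords, dist in articles:
        for kw in set(keywords):  # count each article once per keyword
            article_counts[kw] += 1
            for cls in CLASSES:
                soft_counts[kw][cls] += dist.get(cls, 0.0)
    # Keep only keywords that appear in enough articles.
    return {
        kw: counts
        for kw, counts in soft_counts.items()
        if article_counts[kw] >= MIN_ARTICLES
    }


example = {"very positive": 0.1, "positive": 0.5, "neutral": 0.2,
           "negative": 0.1, "very negative": 0.1}
result = aggregate_by_keyword([(["nasa"], example)] * 5 + [(["fed"], example)] * 4)
```

In this example "nasa" appears in five articles and is kept, while "fed" appears in only four and is dropped.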
Calculating the Overall Sentiment Score
For each keyword, we compute a sentiment score by assigning a numeric value to each of the five classes.
The score is normalized to the range [−1, 1] and damped by the proportion of neutral sentiment (to reduce the impact of uncertainty).
The final reported sentiment score is the average of these damped scores across all keywords.
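The score calculation can be illustrated with the sketch below. The exact class values and damping formula are not given in the text, so this example assumes values of +2, +1, 0, -1, and -2, normalization by dividing by the largest absolute value, and a damping factor of (1 - neutral share). All three are assumptions, as are the names CLASS_VALUES, keyword_score, and overall_score.

```python
# Assumed class values; the actual values are not specified in the text.
CLASS_VALUES = {
    "very positive": 2.0,
    "positive": 1.0,
    "neutral": 0.0,
    "negative": -1.0,
    "very negative": -2.0,
}


def keyword_score(soft_counts):
    """Damped score in [-1, 1] for one keyword's summed soft counts."""
    total = sum(soft_counts.values())
    if total == 0:
        return 0.0
    # Weighted average of the class values.
    raw = sum(CLASS_VALUES[cls] * n for cls, n in soft_counts.items()) / total
    # Normalize to [-1, 1] (assumed: divide by the largest absolute value).
    normalized = raw / max(abs(v) for v in CLASS_VALUES.values())
    # Damp by the neutral share (assumed form: multiply by 1 - neutral share).
    neutral_share = soft_counts.get("neutral", 0.0) / total
    return normalized * (1.0 - neutral_share)


def overall_score(per_keyword):
    """Average the damped scores across all keywords."""
    scores = [keyword_score(counts) for counts in per_keyword.values()]
    return sum(scores) / len(scores) if scores else 0.0


mixed = {"very positive": 2.0, "positive": 1.0, "neutral": 2.0,
         "negative": 0.0, "very negative": 0.0}
score = keyword_score(mixed)  # 0.5 normalized, damped by 0.6
```

Under these assumptions a keyword with entirely neutral coverage scores 0, and a keyword whose coverage is split between strongly positive and neutral is pulled toward 0 rather than reported as strongly positive.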