How is the Sentiment Score Calculated?

The reported sentiment score is a single number that summarizes the overall mood of news coverage across all analyzed articles and sources. Here's how we calculate it.

Article Collection & Preprocessing

We gather articles from a variety of news sources daily.

Each article is processed to extract its main content and title.

Sentiment Analysis

For each article, we use a 5-class sentiment model (very positive, positive, neutral, negative, very negative) to predict the sentiment of the article's content.

If the content is missing, a fallback 2-class model is used on the title.

Each article's sentiment is represented as a probability distribution across the five classes.
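As a rough illustration of what "a probability distribution across the five classes" means, the sketch below converts raw classifier scores into probabilities with a softmax. The class order, the `to_distribution` helper, and the example scores are assumptions made for illustration; the actual model output format is not described here.

```python
import math

# Assumed class order, from most negative to most positive.
CLASSES = ["very negative", "negative", "neutral", "positive", "very positive"]


def softmax(logits):
    """Turn raw scores into non-negative probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_distribution(logits):
    """Map five raw scores to a {class name: probability} dictionary."""
    if len(logits) != len(CLASSES):
        raise ValueError("expected one score per sentiment class")
    return dict(zip(CLASSES, softmax(logits)))


# Made-up scores for a single article.
dist = to_distribution([-1.2, 0.3, 0.8, 2.1, 0.4])
top_class = max(dist, key=dist.get)
```

Keeping the full distribution, rather than only the top class, is what allows the later steps to sum probabilities as soft counts.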

Keyword Extraction & Consolidation

We extract key entities (people, organizations, places, etc.) from each article's title using spaCy's named entity recognition.

Keywords are consolidated to group similar or related terms together, ensuring accurate aggregation.
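The consolidation rules are not spelled out here, so the following is only one plausible sketch: keywords are normalized (case and whitespace) and then mapped through a small alias table so that variants of the same entity share one key. The `ALIASES` table and both helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical alias table; a real one would be much larger or learned.
ALIASES = {
    "u.s.": "united states",
    "usa": "united states",
}


def canonical(keyword):
    """Normalize case and whitespace, then apply the alias table."""
    key = " ".join(keyword.casefold().split())
    return ALIASES.get(key, key)


def consolidate(keywords):
    """Group the raw keyword strings under their canonical form."""
    groups = defaultdict(set)
    for kw in keywords:
        groups[canonical(kw)].add(kw)
    return dict(groups)


groups = consolidate(["USA", "U.S.", "United States", "NASA", "nasa "])
```

With this grouping, an article tagged "USA" and another tagged "United States" contribute to the same keyword during aggregation.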

Aggregating Sentiment by Keyword

For each keyword, we aggregate the sentiment distributions from all articles mentioning it.

Soft counts (probabilities) are summed, and only keywords mentioned in at least 5 articles are included in the final statistics.
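The aggregation step can be sketched as below. The input shape (a list of `(keywords, distribution)` pairs) and the function name are assumptions; the summing of probabilities and the 5-article threshold come from the description above.

```python
from collections import defaultdict

MIN_ARTICLES = 5   # threshold stated in the text
NUM_CLASSES = 5    # very negative ... very positive


def aggregate(articles):
    """Sum per-class probabilities for every keyword.

    `articles` is an iterable of (keywords, distribution) pairs, where
    `distribution` is a list of five probabilities for one article.
    """
    soft_counts = defaultdict(lambda: [0.0] * NUM_CLASSES)
    article_counts = defaultdict(int)
    for keywords, dist in articles:
        for kw in set(keywords):  # count an article once per keyword
            article_counts[kw] += 1
            for i, p in enumerate(dist):
                soft_counts[kw][i] += p
    # Keep only keywords that appear in enough articles.
    return {
        kw: soft_counts[kw]
        for kw, n in article_counts.items()
        if n >= MIN_ARTICLES
    }


# Made-up data: five articles mention "economy", only one mentions "election".
example_dist = [0.1, 0.2, 0.4, 0.2, 0.1]
articles = [(["economy"], example_dist) for _ in range(4)]
articles.append((["economy", "election"], example_dist))
totals = aggregate(articles)
```

Here "election" is dropped because it appears in a single article, while "economy" keeps the summed soft counts of all five.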

Calculating the Overall Sentiment Score

For each keyword, we compute a sentiment score by assigning values to each class:

very negative = −2
negative = −1
neutral = 0
positive = +1
very positive = +2

The score is normalized to the range [−1, 1] and damped by the proportion of neutral sentiment (to reduce the impact of uncertainty).

The final reported sentiment score is the average of these damped scores across all keywords.
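The scoring steps in this section can be sketched as follows. The exact formulas are not given, so two assumptions are made and marked in the code: normalization divides the weighted class average by 2 (the largest class value), and damping multiplies the result by one minus the neutral proportion. Function names and sample data are illustrative.

```python
CLASS_VALUES = [-2, -1, 0, 1, 2]  # very negative ... very positive
NEUTRAL_INDEX = 2


def keyword_score(soft_counts):
    """Damped sentiment score in [-1, 1] for one keyword."""
    total = sum(soft_counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in soft_counts]
    raw = sum(p * v for p, v in zip(probs, CLASS_VALUES))  # in [-2, 2]
    normalized = raw / 2  # assumption: divide by the maximum class value
    # Assumption: damping scales the score by (1 - neutral proportion).
    return normalized * (1 - probs[NEUTRAL_INDEX])


def overall_score(per_keyword):
    """Average the damped scores across all keywords."""
    scores = [keyword_score(counts) for counts in per_keyword.values()]
    return sum(scores) / len(scores) if scores else 0.0


# Made-up soft counts for two keywords.
per_keyword = {
    "economy": [0.0, 0.0, 0.0, 0.0, 5.0],   # entirely very positive
    "election": [0.0, 5.0, 5.0, 0.0, 0.0],  # half negative, half neutral
}
final_score = overall_score(per_keyword)
```

Under these assumptions "economy" scores 1.0, "election" scores −0.125 (−0.25 before damping, halved by the 50% neutral share), and the reported score is their average.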

Interpretation: A score near +1 means news coverage is overwhelmingly positive, a score near −1 means overwhelmingly negative, and a score near 0 indicates neutral or mixed coverage.

+1 Positive, 0 Neutral, −1 Negative

Technical Details

Sentiment models: HuggingFace Transformers and custom 5-class classifier.

Keyword extraction: spaCy large English model.

Data is updated daily and statistics are recalculated for each new batch of articles.