sentivity committed on
Commit 0f4504c · verified
1 Parent(s): 141078d

Upload articleTickerScore README.txt

Files changed (1)
  1. articleTickerScore README.txt +95 -0
articleTickerScore README.txt ADDED
@@ -0,0 +1,95 @@
+ ArticleTickerScore: News-Based Ticker Sentiment Tool
+
+ A tool that measures short-term sentiment around major stock tickers using recent financial news headlines. It combines live article fetching, text preprocessing, and a PyTorch-based neural model to return a normalized sentiment score.
+
+
+ Goals
+
+ - Allow users to input a stock ticker and retrieve the most recent news article associated with it (via Polygon.io).
+ - Clean and tokenize the article for interpretation.
+ - Run the processed text through a trained LSTM sentiment model.
+ - Normalize and display the result in an interpretable format.
+ - Cache results for efficiency and refresh scores every 30 minutes.
+
+
+ Requirements
+
+ - gradio
+ - torch
+ - requests
+ - transformers
+ - datetime (Python standard library)
+ - re (Python standard library)
+ - os (Python standard library)
+
+
+ Model Components
+
+ ScorePredictor class: A PyTorch-based LSTM classifier for sentiment scoring. It includes:
+ - An embedding layer (based on vocab size)
+ - A hidden LSTM layer for sequential understanding
+ - A linear + sigmoid output layer for binary-style scoring (normalized afterward)
+
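The three layers listed above could be arranged roughly as follows. This is a minimal sketch, not the trained model: the constructor arguments (vocab_size, embed_dim, hidden_dim) and their default sizes are assumptions, since the README does not state them.

```python
import torch
import torch.nn as nn


class ScorePredictor(nn.Module):
    """Sketch of an embedding -> LSTM -> linear -> sigmoid scorer.

    Layer sizes are illustrative assumptions, not the trained model's values.
    """

    def __init__(self, vocab_size: int, embed_dim: int = 128, hidden_dim: int = 256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        embedded = self.embedding(input_ids)      # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)      # hidden: (1, batch, hidden_dim)
        logits = self.fc(hidden[-1])              # (batch, 1)
        return torch.sigmoid(logits).squeeze(-1)  # (batch,), values in [0, 1]
```

The sigmoid keeps each raw score between 0 and 1; the custom normalization described under Sentiment Scoring would be applied to this output afterward.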
+ AutoVectorizer: A trained vectorizer model that transforms input strings into vectors, capturing the strings’ textual features. The vector form can be interpreted by the AutoClassifier.
+
+ AutoClassifier: A binary classification model that labels vectorized Reddit posts as either sociopolitical (1) or not (0). It is used to filter out any irrelevant posts from the data set.
+
+
+ Main Script
+
+ 1. Input Validation
+
+ - Converts the ticker to uppercase.
+ - Checks whether it is among the predefined tickers (AAPL, GOOG, AMZN, META, NVDA).
+ - If invalid, returns a friendly message and a default score.
+
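A minimal sketch of the validation step, under stated assumptions: the helper name validate_ticker, the default score of 0.5, the whitespace stripping, and the wording of the message are all hypothetical, since the README gives none of them.

```python
VALID_TICKERS = {"AAPL", "GOOG", "AMZN", "META", "NVDA"}
DEFAULT_SCORE = 0.5  # assumed neutral default; the README does not state the value


def validate_ticker(ticker: str):
    """Return (normalized_ticker, None) if supported, else (None, fallback_result)."""
    normalized = ticker.strip().upper()  # stripping whitespace is an added assumption
    if normalized in VALID_TICKERS:
        return normalized, None
    fallback = {
        "article": f"'{normalized}' is not supported. Try one of: "
        + ", ".join(sorted(VALID_TICKERS))
        + ".",
        "sentiment": DEFAULT_SCORE,
    }
    return None, fallback
```

Returning the fallback in the same shape as a normal result lets the caller render valid and invalid inputs through one code path.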
+ 2. Caching
+
+ Uses a global cache (sentiment_cache) to store:
+
+ - Last article
+ - Last sentiment score
+ - Timestamp
+
+ Uses is_cache_valid to decide whether cached data is still fresh; entries older than 30 minutes are treated as stale.
+
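The freshness check might look like the sketch below. Keying the cache by ticker and the optional now parameter (added only so the function can be tested deterministically) are assumptions; the README lists the signature as is_cache_valid(timestamp).

```python
from datetime import datetime, timedelta

CACHE_TTL = timedelta(minutes=30)

# Assumed layout: ticker -> {"article": str, "sentiment": float, "timestamp": datetime}
sentiment_cache = {}


def is_cache_valid(timestamp, now=None):
    """Return True if the cached entry is less than 30 minutes old."""
    if timestamp is None:
        return False  # nothing cached yet
    now = now or datetime.now()
    return now - timestamp < CACHE_TTL
```

For example, an entry stamped at 12:00 is still valid at 12:29 and stale from 12:30 onward.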
+ 3. Article Fetching
+
+ Uses Polygon.io’s /v2/reference/news API to fetch the most recent article for the ticker.
+
+ Combines the article’s title and description into a single string for model input.
+
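A sketch of the fetch step, with several assumptions: the query parameters (ticker, limit, order, sort, apiKey) and the response fields (results, title, description) reflect my understanding of the Polygon.io news endpoint and should be checked against its documentation, and the POLYGON_API_KEY environment variable and the build_article_text helper are hypothetical names.

```python
import os

NEWS_URL = "https://api.polygon.io/v2/reference/news"


def build_article_text(payload: dict) -> str:
    """Join the title and description of the first result into one string."""
    results = payload.get("results") or []
    if not results:
        return ""
    first = results[0]
    title = first.get("title") or ""
    description = first.get("description") or ""
    return f"{title} {description}".strip()


def fetch_articles(ticker: str, api_key=None, timeout: float = 10.0) -> str:
    """Fetch the most recent article for `ticker` and return title + description."""
    import requests  # imported lazily so build_article_text works without it

    params = {
        "ticker": ticker,
        "limit": 1,
        "order": "desc",
        "sort": "published_utc",
        "apiKey": api_key or os.environ["POLYGON_API_KEY"],  # assumed variable name
    }
    response = requests.get(NEWS_URL, params=params, timeout=timeout)
    response.raise_for_status()
    return build_article_text(response.json())
```

Separating the parsing from the HTTP call keeps the string-building logic testable without network access or an API key.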
+ 4. Preprocessing
+
+ Cleans the article text using regular expressions.
+
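The README does not list the actual patterns, so the following is only an illustration of what regex-based cleaning could look like. The helper name clean_text and every pattern in it are assumptions, and it covers cleaning only, not tokenization.

```python
import re


def clean_text(text: str) -> str:
    """Strip HTML tags, URLs, and stray symbols, then collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)                 # drop HTML tags
    text = re.sub(r"https?://\S+", " ", text)            # drop URLs
    text = re.sub(r"[^A-Za-z0-9\s.,!?$%'-]", " ", text)  # drop other symbols
    return re.sub(r"\s+", " ", text).strip()             # collapse whitespace
```

Characters such as $ and % are kept in this sketch because they often carry meaning in financial headlines.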
+ 5. Sentiment Scoring
+
+ Tokenizes the cleaned text using the same tokenizer the model was trained with (cardiffnlp/xlm-twitter-politics-sentiment).
+
+ Passes the tokens into the ScorePredictor model.
+
+ Applies a custom normalization that rescales raw scores from [0.3, 0.9] to [0.0, 1.0].
+
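The normalization from [0.3, 0.9] to [0.0, 1.0] can be read as a linear rescale. The sketch below assumes that reading and also clamps values outside the raw range, which the README does not confirm; normalize_score is a hypothetical name.

```python
RAW_MIN, RAW_MAX = 0.3, 0.9


def normalize_score(raw: float) -> float:
    """Linearly map [RAW_MIN, RAW_MAX] onto [0.0, 1.0], clamping outliers."""
    scaled = (raw - RAW_MIN) / (RAW_MAX - RAW_MIN)
    return min(1.0, max(0.0, scaled))  # clamping is an assumption
```

Under this mapping a raw model output of 0.6 lands at roughly the midpoint, and anything at or below 0.3 becomes 0.0.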
+ 6. Output
+
+ Returns a dictionary containing:
+ - "article" – full text of the news snippet
+ - "sentiment" – normalized score between 0.0 and 1.0
+
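Putting the six steps together, the overall flow might resemble the sketch below. It is not the project's code: fetch and score are injected callables (an assumption made so the example runs without network access or a trained model), and the default score of 0.5 and the message text are placeholders.

```python
from datetime import datetime, timedelta

VALID_TICKERS = {"AAPL", "GOOG", "AMZN", "META", "NVDA"}
CACHE_TTL = timedelta(minutes=30)
sentiment_cache = {}  # assumed layout: ticker -> article, sentiment, timestamp


def analyze_ticker(ticker, fetch, score, now=None):
    """Validate the ticker, consult the cache, then fetch and score if needed."""
    now = now or datetime.now()
    ticker = ticker.upper()
    if ticker not in VALID_TICKERS:
        # Placeholder message and default score; the real values are not documented.
        return {"article": f"Unsupported ticker: {ticker}", "sentiment": 0.5}

    cached = sentiment_cache.get(ticker)
    if cached and now - cached["timestamp"] < CACHE_TTL:
        return {"article": cached["article"], "sentiment": cached["sentiment"]}

    article = fetch(ticker)
    sentiment = score(article)
    sentiment_cache[ticker] = {
        "article": article,
        "sentiment": sentiment,
        "timestamp": now,
    }
    return {"article": article, "sentiment": sentiment}
```

A second call for the same ticker within 30 minutes returns the cached dictionary without invoking fetch or score again.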
+
+ Helper Functions
+
+ - fetch_articles(ticker): Pulls a single article for the ticker via the Polygon API.
+ - preprocess_text(text): Cleans and tokenizes the article text.
+ - predict_sentiment(text): Runs the cleaned text through the LSTM model and returns a normalized sentiment score.
+ - is_cache_valid(timestamp): Checks if cached data is less than 30 minutes old.
+ - analyze_ticker(ticker): Full logic for validating, caching, fetching, scoring, and returning sentiment results.
+ - display_sentiment(ticker): Converts sentiment results into HTML format for rendering.
+
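As an illustration of the HTML conversion step, here is a small sketch. It takes a result dictionary rather than a ticker, the name render_sentiment_html is hypothetical, and the label thresholds (0.4 and 0.6) and markup are assumptions rather than the project's actual output.

```python
import html


def render_sentiment_html(result: dict) -> str:
    """Render a result dictionary ("article", "sentiment") as an HTML snippet."""
    score = float(result["sentiment"])
    # Thresholds are illustrative assumptions, not taken from the project.
    if score >= 0.6:
        label = "Positive"
    elif score <= 0.4:
        label = "Negative"
    else:
        label = "Neutral"
    article = html.escape(result["article"])  # avoid injecting markup from news text
    return (
        f"<div><p><strong>Sentiment:</strong> {score:.2f} ({label})</p>"
        f"<p>{article}</p></div>"
    )
```

Escaping the article text matters here because headlines come from an external feed and may contain characters such as < or &.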
+
+ End Result
+
+ A web app that displays the predicted sentiment for five major tickers: AAPL, GOOG, AMZN, META, NVDA.
+