tickerArticleScorer

Sleeping

App Files Files Community

sentivity commited on May 19

Commit

0f4504c

verified ·

1 Parent(s): 141078d

Upload articleTickerScore README.txt

Browse files

Files changed (1) hide show

articleTickerScore README.txt +95 -0

articleTickerScore README.txt ADDED Viewed

	@@ -0,0 +1,95 @@

+ArticleTickerScore: News-Based Ticker Sentiment Tool
+A tool designed to measure short-term sentiment around major stock tickers using recent financial news headlines. It uses a combination of live article fetching, text preprocessing, and a PyTorch-based neural model to return a normalized sentiment score.
+Goals
+- Allow users to input a stock ticker and retrieve the most recent news article associated with it (via Polygon.io).
+- Clean and tokenize the article for interpretation.
+- Run the processed text through a trained LSTM sentiment model.
+- Normalize and display the result in an interpretable format.
+- Cache results for efficiency and refresh scores every 30 minutes.
+Requirements
+- gradio
+- torch
+- requests
+- transformers
+- datetime
+- re
+- os
+Model Components
+ScorePredictor class: A PyTorch-based LSTM classifier for sentiment scoring. It includes:
+- An embedding layer (based on vocab size)
+- A hidden LSTM layer for sequential understanding
+- A linear + sigmoid output layer for binary-style scoring (normalized afterward)
+AutoVectorizer: A trained vectorizer model that transforms input strings into vectors, capturing the strings’ textual features. The vector form can be interpreted by the AutoClassifier.
+AutoClassifier: A binary classification model that labels vectorized Reddit posts as either sociopolitical (1) or not (0). It is used to filter out any irrelevant posts from the data set.
+Main Script
+1. Input Validation
+- Converts the ticker to uppercase.
+- Checks if it’s among the predefined tickers (AAPL, GOOG, AMZN, META, NVDA).
+- If invalid, returns a friendly message and a default score.
+2. Caching
+Uses a global cache (sentiment_cache) to store:
+- Last article
+- Last sentiment score
+- Timestamp
+Uses is_cache_valid to determine if data is stale (older than 30 minutes).
+3. Article Fetching
+Uses Polygon.io’s /v2/reference/news API to fetch the most recent article for the ticker.
+Extracts the title + description into a single string for model input.
+4. Preprocessing
+Cleans the article text using regex:
+5. Sentiment Scoring
+Tokenizes the cleaned text using the same tokenizer the model was trained with (cardiffnlp/xlm-twitter-politics-sentiment).
+Passes the tokens into the ScorePredictor model.
+Applies a custom normalization from [0.3, 0.9] → [0.0, 1.0].
+6. Output
+Returns a dictionary containing:
+"article" – full text of the news snippet
+"sentiment" – normalized score between 0.0 and 1.0
+Helper Functions:
+fetch_articles(ticker): Pulls a single article for the ticker via Polygon API.
+preprocess_text(text):  Cleans and tokenizes the article text.
+predict_sentiment(text): Runs the cleaned text through the LSTM model and returns a normalized sentiment score.
+is_cache_valid(timestamp): Checks if cached data is less than 30 minutes old
+analyze_ticker(ticker): Full logic for validating, caching, fetching, scoring, and returning sentiment results.
+display_sentiment(ticker): Converts sentiment results into HTML format for rendering.
+End Result
+A web app that allows you to display the predicted sentiment of five major tickers: AAPL, GOOG, AMZN, NVDA, META.