Stock Price Prediction via Financial News Sentiment Analysis

Data Source:

WSJ News

Full-attribute Wall Street Journal news, including the publication timestamp, keyword list, headline, body content, and URL for each article. All data is in JSON format.

Reuters News

Same attributes and format as the WSJ news data.

Intra-Day Stock Price

By-minute stock prices scraped from the Google Stock API.
Uses tickers from the three major U.S. equity stock exchanges: Nasdaq, NYSE, and AMEX.
The scraped data contains only records with price changes: if a stock's price does not change during a given hour, there are no by-minute records for that hour.

meta/alias2Ticker.json

This file was built by querying CityFalcon (https://www.cityfalcon.com/) with stock tickers. It is still being built and cleaned to improve matching accuracy.
JSON schema: {"alias":"american airlines group","ticker":"AAL"}
Use stopword.txt to remove tickers and aliases that are also common English words.
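The stopword filtering step could look roughly like the sketch below. It assumes stopword.txt holds one word per line and that matching is case-insensitive; neither detail is stated in this README. The function names and the second sample record are hypothetical and only illustrate the idea.

```python
import json


def load_stopwords(lines):
    """Build a lowercase stopword set from an iterable of text lines."""
    return {line.strip().lower() for line in lines if line.strip()}


def filter_aliases(records, stopwords):
    """Drop records whose alias or ticker is also a common English word."""
    kept = []
    for rec in records:
        alias = rec["alias"].strip().lower()
        ticker = rec["ticker"].strip().lower()
        if alias in stopwords or ticker in stopwords:
            continue
        kept.append(rec)
    return kept


# Illustrative records in the alias2Ticker.json schema; the second one is
# a made-up example of an alias that collides with a common word.
records = [
    json.loads('{"alias":"american airlines group","ticker":"AAL"}'),
    json.loads('{"alias":"it","ticker":"IT"}'),
]

# In practice the lines would come from open("stopword.txt"); an inline
# list keeps the sketch self-contained.
stopwords = load_stopwords(["a", "it", "all"])
filtered = filter_aliases(records, stopwords)
```

Filtering on both fields is a deliberate choice here: a short ticker that reads as an ordinary word can produce false matches in news text just as easily as an ambiguous alias can.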

meta/tickerInfo.json

JSON schema: {"ticker": "ABC", "sector": "Some_Sector", "category": "Some_Cat", "group": "Some_Group"}
Covers 9 sectors, 31 categories, and 212 groups.

Data Pipeline

Actual Data Flow

Predicting Result

We found that a linear regression model gives reasonably good results. Comparing different reaction times shows that stock prices are sensitive to the financial press. A reaction time of 7 minutes gave the highest accuracy among those tested.

A sample heat map of the investment table is also available. The x-axis is the company, represented by its stock ticker, and the y-axis is the day of the month. Well-known companies such as Google and Apple appeared in the press very often, while others such as AMD showed up just once that month.

Reference:

Git: http://www.vogella.com/tutorials/Git/article.html
UDFs: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-udfs.html
Dataset: https://www.balabit.com/blog/spark-scala-dataset-tutorial/
DataFrames: https://docs.databricks.com/spark/latest/dataframes-datasets/introduction-to-dataframes-scala.html
Word2Vec Sentiments: https://github.com/linanqiu/word2vec-sentiments

To-Do:

Improvement:

Stocks without continuous by-minute price records need better simulation. Currently, each missing record is filled with the most recent available price. We need to fill in missing prices linearly between the two surrounding price records.
Inspect the alias2Ticker.json file:

To eliminate confusing aliases.
To remove aliases that refer to ticker symbols not traded in the U.S.
To add potentially matchable aliases.
To add ticker symbols that contain a dot (.).
To add alias-ticker pairs for missing tickers.

We have not tried deep learning approaches; however, past research has shown substantial improvement with deep learning.
Automate the data ETL process.
Render real-time result on Tableau.
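To make the first improvement item concrete, here is a minimal sketch that contrasts the current fill (carry the latest known price forward) with the proposed linear fill between two known records. It assumes prices are keyed by an integer minute offset. That representation and both function names are hypothetical, not taken from the project code.

```python
def forward_fill(prices, start, end):
    """Fill every minute in [start, end] with the latest known price.

    Minutes before the first known record are left as None.
    """
    filled = {}
    last = None
    for minute in range(start, end + 1):
        if minute in prices:
            last = prices[minute]
        filled[minute] = last
    return filled


def linear_fill(prices):
    """Interpolate linearly between each pair of consecutive known records."""
    known = sorted(prices)
    filled = {}
    for left, right in zip(known, known[1:]):
        p0, p1 = prices[left], prices[right]
        span = right - left
        for minute in range(left, right):
            filled[minute] = p0 + (p1 - p0) * (minute - left) / span
    if known:
        # The last known record has no right neighbor, so copy it as-is.
        filled[known[-1]] = prices[known[-1]]
    return filled


# Sparse records: the price is known at minute 0 and minute 4 only.
sparse = {0: 10.0, 4: 12.0}
stepped = forward_fill(sparse, 0, 4)
smooth = linear_fill(sparse)
```

With these inputs the forward fill holds 10.0 until minute 4, while the linear fill rises by 0.5 per minute. Note that linear interpolation uses the later record, so it is only suitable for building historical training data, not for filling gaps in a live feed.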

Experiment:

Currently, the target is computed as the average of the open, close, high, and low prices. We have not yet used volume in training. We should also experiment with variations of the target value.
Consult with financial engineering researchers on building models that take more factors into account.
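As an illustration of the target described above, the sketch below computes the average of the open, close, high, and low prices for one bar. It also shows one possible volume-aware variation. That variation is an assumption offered for experimentation, not something the project has implemented, and the field names are hypothetical.

```python
def ohlc_average(bar):
    """Current target: mean of the open, close, high, and low prices."""
    return (bar["open"] + bar["close"] + bar["high"] + bar["low"]) / 4.0


def volume_weighted_target(bars):
    """One possible variation: weight each bar's OHLC average by its volume.

    Returns None when the total volume is zero.
    """
    total_volume = sum(b["volume"] for b in bars)
    if total_volume == 0:
        return None
    weighted = sum(ohlc_average(b) * b["volume"] for b in bars)
    return weighted / total_volume


# Illustrative bars only; these are not real market data.
bars = [
    {"open": 10.0, "close": 12.0, "high": 13.0, "low": 9.0, "volume": 100},
    {"open": 20.0, "close": 20.0, "high": 20.0, "low": 20.0, "volume": 300},
]

single_target = ohlc_average(bars[0])
weighted_target = volume_weighted_target(bars)
```

The weighted version pulls the target toward prices at which more shares actually traded, which is one way to bring volume into the model without adding a separate feature.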
