Stock Forecasting Application
naserih/stockastic
Stock Data Science With Python
This is a starter project for trying out data science tools on stock data to find meaningful trends.
- You will need Git installed on your machine.
- You will need Python 3 with a working version of pip3 to start this project.
- You can check the versions of Python and pip using any of the following commands: `python --version`, `python3 --version`, `pip --version`, `pip3 --version`.
- Clone the current repository:

      git clone https://github.com/hn617/stockastic.git
      cd stockastic

- Make a new git branch:

      git checkout -b lesson1
      pip3 install -r requirements

- Make sure all the requirements are installed for Python 3.
- Learn about TSX stock tickers and daily stock prices (open, close), and volume.
In this lesson, we are going to build a Python script that reads TSX historical stock prices (2019-2020) and sorts the stock tickers by their average volume.
Read data from CSV file ...
- The `data` directory contains daily stock values for TSX stocks for 2019-2020. The file names are the stock tickers. Open a couple of the CSV files and check the data structure. We are going to create a ticker dictionary containing the file path and stock details:

      ticker_dic = {
          '<TICKER_0>': {
              'filepath': '<full_path_to_ticker_0_file>',
              'mean_volume': xx,
              'order_volume': xx,
          },
          '<TICKER_1>': {
              'filepath': '<full_path_to_ticker_1_file>',
              'mean_volume': xx,
              'order_volume': xx,
          },
      }

  Later we will add more data to the ticker dictionary.
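- Use Python to list all the CSV files (stock tickers) in `./data/TSX/20190222`:

      import os

      mypath = ""  # set this to the data directory listed above
      # Keep only the CSV files; each file name is a ticker
      onlyfiles = [f for f in os.listdir(mypath) if f.endswith(".csv")]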
Then create a dictionary with the ticker name as key and the full path to the CSV file as value. You can do something like this:

      ticker_dic = {}
      for filename in onlyfiles:
          # filename[:-4] strips the ".csv" extension to get the ticker name
          ticker_dic[filename[:-4]] = {'filepath': os.path.join(mypath, filename)}
- Write a function to read the CSV file for a given ticker as a pandas DataFrame:

      import pandas as pd
      import json

      df = pd.read_csv("full_path_to_csv_file", header=0, sep=",", thousands=',', index_col=None, parse_dates=['Date'])
      # Drop tickers whose file contains no volume rows
      if len(df['Volume']) == 0:
          del ticker_dic[ticker]
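The snippet above reads a single file inline. One way to wrap it in a function, as the step asks, is sketched below. The function name `read_ticker_csv` and the in-memory demo CSV are assumptions made for illustration; only the `Date` and `Volume` column names come from the snippet above, and the `thousands` option is left out because the demo values contain no separators.

```python
import io

import pandas as pd


def read_ticker_csv(filepath_or_buffer):
    """Read one ticker's daily prices into a DataFrame (hypothetical helper)."""
    return pd.read_csv(
        filepath_or_buffer,
        header=0,
        sep=",",
        index_col=None,
        parse_dates=["Date"],
    )


# Tiny made-up CSV standing in for a file from the data directory.
demo_csv = io.StringIO(
    "Date,Open,Close,Volume\n"
    "2019-02-21,10.0,10.5,1000\n"
    "2019-02-22,10.5,10.2,3000\n"
)
demo_df = read_ticker_csv(demo_csv)
print(demo_df.dtypes)
```

With real data, the same function would be called with `ticker_dic[ticker]['filepath']` instead of the in-memory buffer.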
- Write a function to return the mean of the stock `Volume` values for an input ticker. Hint: `df.mean(axis=0)`.

      def get_mean_volume(ticker):
          mean_volume = ...  # find the mean volume
          return mean_volume
- Modify the function to add the mean_volume into the ticker_dic.
ticker_dic[ticker]['mean_volume'] = mean_volume
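A minimal sketch of these two steps together is shown below. It assumes each ticker's DataFrame is already stored in the dictionary under a `'df'` key (as the later lessons do) and passes the dictionary in explicitly; the name `add_mean_volume` and the sample data are made up for illustration.

```python
import pandas as pd


def add_mean_volume(ticker, ticker_dic):
    """Compute the mean of the 'Volume' column and store it in ticker_dic."""
    df = ticker_dic[ticker]["df"]  # assumption: DataFrame stored under 'df'
    mean_volume = float(df["Volume"].mean())
    ticker_dic[ticker]["mean_volume"] = mean_volume
    return mean_volume


# Made-up data for one hypothetical ticker.
ticker_dic = {"AAA": {"df": pd.DataFrame({"Volume": [1000, 3000]})}}
add_mean_volume("AAA", ticker_dic)
print(ticker_dic["AAA"]["mean_volume"])  # 2000.0
```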
- Sort the tickers by their `mean_volume` and add the ticker order to the `ticker_dic`:

      sorted_by_volume = sorted(ticker_dic, key=lambda k: ticker_dic[k]['mean_volume'], reverse=True)
      # check to make sure it is working
      print(sorted_by_volume)
      for i in range(len(sorted_by_volume)):
          ticker = sorted_by_volume[i]
          order_volume = i
          ticker_dic[ticker]['order_volume'] = order_volume
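To see what the ranking produces, here is a small self-contained run on three hypothetical tickers with made-up mean volumes. It uses `enumerate` in place of the index loop, which is only a stylistic variation on the same idea.

```python
# Made-up mean volumes for three hypothetical tickers.
ticker_dic = {
    "AAA": {"mean_volume": 1500.0},
    "BBB": {"mean_volume": 9000.0},
    "CCC": {"mean_volume": 4200.0},
}

# Highest mean volume first.
sorted_by_volume = sorted(
    ticker_dic, key=lambda k: ticker_dic[k]["mean_volume"], reverse=True
)

# enumerate yields (rank, ticker) pairs; rank 0 is the highest volume.
for order_volume, ticker in enumerate(sorted_by_volume):
    ticker_dic[ticker]["order_volume"] = order_volume

print(sorted_by_volume)  # ['BBB', 'CCC', 'AAA']
```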
In this lesson we will add the stock `open` and `close` arrays to the `ticker_dic` and plot stock values for some of the high-volume tickers.
Visualize Data in matplotlib
Similar to the previous lesson, add `median_volume` and `order_median_volume` to the ticker dictionary. Then create a pandas DataFrame with each ticker's `order_median_volume`, `order_mean_volume`, `median_volume`, and `mean_volume`.
df = pd.DataFrame(tickers_dic.values())
- Plot the stock `mean_volume` and `median_volume` vs `order_volume`:
df.plot(x='order_median_volume', y='median_volume')
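The steps above can be sketched end to end on made-up numbers. The ticker names, the volumes, and the helper `add_order` are assumptions for illustration. The sketch also uses `pd.DataFrame.from_dict(..., orient="index")` rather than `pd.DataFrame(tickers_dic.values())`, so that the ticker names are kept as the row index; either form gives the same columns.

```python
import pandas as pd

# Made-up daily volumes per hypothetical ticker; real values come from the CSV files.
volumes = {
    "AAA": [100, 200, 1800],
    "BBB": [500, 600, 700],
    "CCC": [50, 60, 70],
}

tickers_dic = {}
for ticker, vols in volumes.items():
    series = pd.Series(vols)
    tickers_dic[ticker] = {
        "mean_volume": float(series.mean()),
        "median_volume": float(series.median()),
    }


def add_order(dic, value_key, order_key):
    """Rank tickers by value_key (largest first) and store the rank under order_key."""
    ranked = sorted(dic, key=lambda k: dic[k][value_key], reverse=True)
    for order, ticker in enumerate(ranked):
        dic[ticker][order_key] = order


add_order(tickers_dic, "mean_volume", "order_mean_volume")
add_order(tickers_dic, "median_volume", "order_median_volume")

# One row per ticker, sorted so the plot's x axis is increasing.
df = pd.DataFrame.from_dict(tickers_dic, orient="index")
df = df.sort_values("order_median_volume")
print(df)
```

The single large value for `AAA` pulls its mean above `BBB` while its median stays lower, which is why the two orderings differ. Calling `df.plot(x='order_median_volume', y='median_volume')` on this frame then draws the curve, provided matplotlib is installed.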
In this lesson, we will work with stock Open and Close values. We will investigate the correlation between Close and Open values of the stock.
Compare stock Open to its previous Close
- For a given ticker in `tickers_dic`, calculate the ratio between `Adj. Close` and `Open` for each row and store it as a new column `C/O`.
- Calculate the ratio between `Open` and the previous day's `Adj. Close` for each row and store it as a new column `O/C`.
- Plot `O/C` vs `C/O`.
      ticker = 'AC.TO'
      df = tickers_dic[ticker]['df']
      # Same-day ratio of adjusted close to open
      df['C/O'] = df['Adj. Close'] / df['Open']
      # Ratio of open to the previous row's adjusted close
      df['O/C'] = df['Open'] / df['Adj. Close'].shift(1)
      df.plot(x='C/O', y='O/C')
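The role of `shift(1)` is easier to see on a tiny example. The three rows below are made up, and the sketch assumes the rows are sorted by date in ascending order, so that the row above is the previous trading day.

```python
import pandas as pd

# Three made-up trading days, oldest first.
df = pd.DataFrame(
    {
        "Open": [10.0, 11.0, 12.0],
        "Adj. Close": [11.0, 12.0, 9.0],
    }
)

# Same-day ratio: today's adjusted close over today's open.
df["C/O"] = df["Adj. Close"] / df["Open"]

# shift(1) moves each close down one row, so every open is divided by the
# previous row's close. The first row has no previous close and becomes NaN.
df["O/C"] = df["Open"] / df["Adj. Close"].shift(1)

print(df)
```

Because the first `O/C` value is `NaN`, that row drops out of any plot or correlation unless it is handled explicitly.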
In this lesson we want to build a function that lets us compare the stock `Adj. Close` values of two tickers.
Comparing two stocks
We want to define a function that takes `tickers_dic`, `fixed_ticker`, `moving_ticker`, `interval`, `time_shift`, `today_date`, and `forecast_days`, and plots the fixed ticker's values against the moving ticker's values.

- `fixed_ticker`: the base ticker that we want to forecast.
- `moving_ticker`: the ticker that we want to compare to `fixed_ticker` and use for forecasting.
- `interval`: the time interval over which we want to compare the two stocks, for example 60 business days (about three months).
- `time_shift`: the shift between the `fixed_ticker` and the `moving_ticker`.
- `today_date`: the date at which forecasting starts. In reality this will be today's date, but for training we will change this date and test the forecasting.
- `forecast_days`: the number of days from `today_date` that we want to forecast.

Note: if `today_date` is the last available date for `fixed_ticker`, then `time_shift` should be greater than `forecast_days`. For example, if we want to forecast the next 10 business days (`forecast_days = 10`), then `time_shift` should be greater than 10.
- define a function with the requested parameters:
def compare_stocks(tickers_dic, fixed_ticker, moving_ticker, interval, time_shift, today_date, forecast_days)
- Read the DataFrames for both the fixed and moving tickers.
- Convert the `Date` column to `datetime.date` objects. Similarly, convert the `today_date` string to a `datetime.date` object.
- Normalize the `Adj. Close` values of both stocks using the z-score (standardization). Store the normalization parameters so that we can convert the normalized values back to the original values.
- Filter the `fixed_ticker` DataFrame to read `interval` rows starting from `today_date`.
- Filter the `moving_ticker` DataFrame to read `interval` + `time_shift` rows starting from `today_date`.
- Plot the normalized `Adj. Close` values for both the `fixed_ticker` and the shifted `moving_ticker`.
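Two of the steps above, the reversible z-score normalization and the row filtering, can be sketched as small helpers. Everything here is an assumption made for illustration: the helper names, the `%Y-%m-%d` date format, the use of the population standard deviation, and above all the reading of "starting from `today_date`" as "the most recent rows on or before `today_date`". This is a sketch under those assumptions, not the project's implementation.

```python
import datetime

import pandas as pd


def zscore(series):
    """Return the standardized series and the (mean, std) needed to undo it."""
    mean = series.mean()
    std = series.std(ddof=0)  # population std; ddof=1 would be an equally valid choice
    return (series - mean) / std, (mean, std)


def inverse_zscore(normalized, params):
    """Convert standardized values back to the original scale."""
    mean, std = params
    return normalized * std + mean


def last_rows_up_to(df, today_date, n_rows):
    """Assumed window: the n_rows most recent rows dated on or before today_date."""
    today = datetime.datetime.strptime(today_date, "%Y-%m-%d").date()
    dates = pd.to_datetime(df["Date"]).dt.date
    return df[dates <= today].tail(n_rows)


# Made-up prices for one hypothetical ticker, oldest first.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-06"]
        ),
        "Adj. Close": [10.0, 12.0, 14.0, 16.0],
    }
)

normalized, params = zscore(df["Adj. Close"])
restored = inverse_zscore(normalized, params)
window = last_rows_up_to(df, "2020-01-03", 2)
print(window)
```

For the fixed ticker the window size would be `interval`, and for the moving ticker `interval + time_shift`. Keeping `params` for each ticker is what allows a forecast made on the normalized scale to be mapped back to prices.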