The goal of this article is to provide an easy introduction to stock market analysis using Python. In this notebook we will use pandas_datareader module. We will walk through a simple Python script to retrieve, analyze, and visualize data from different markets.

Project Setup¶

Once we've got a blank Jupyter notebook open, the first thing we'll do is import the required dependencies.

In [22]:

import pandas as pd
from datetime import datetime, timedelta

We will use pandas_datareader to get data from the market.

In [23]:

import pandas_datareader as pdr

Making the same request repeatedly can use a lot of bandwidth, slow down your code and may result in your IP being banned.

pandas_datareader allows you to cache queries using requests_cache

In [24]:

import requests_cache
expire_after = timedelta(days=1)
session = requests_cache.CachedSession(cache_name='cache', backend='sqlite', expire_after=expire_after)

In [25]:

%reload_ext watermark
%watermark -v -m --iversions

requests_cache0.5.0
pandas_datareader0.7.0
IPython   5.0.0
pandas    0.23.4
CPython 3.6.7
IPython 5.0.0

compiler   : GCC 5.4.0 20160609
system     : Linux
release    : 4.15.0-47-generic
machine    : x86_64
processor  : x86_64
CPU cores  : 4
interpreter: 64bit

Pulling Data¶

We will try MOEX GAZP ticker. You can explore securities secid with this link http://iss.moex.com/iss/securities.xml?q=Gazprom

In [26]:

gazp = pdr.DataReader('GAZP', 'moex', start=datetime(2018, 1, 1), end=datetime(2019, 1, 1), session=session)

In [27]:

gazp['OPEN'].head(10)

Out[27]:

TRADEDATE
2018-01-03    131.46
2018-01-03    131.03
2018-01-04       NaN
2018-01-04    135.42
2018-01-04    132.50
2018-01-05       NaN
2018-01-05    136.18
2018-01-05    135.60
2018-01-09       NaN
2018-01-09    139.01
Name: OPEN, dtype: float64

We will clear rows with NaN with dropna as is and plot OPEN price

In [28]:

gazp['OPEN'].dropna().plot(figsize = (14, 6))

Out[28]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f4bc1e9e2e8>

We can use other data provider, lets try Yahoo finance and Chevron Corporation (CVX) ticker

In [29]:

cvx = pdr.DataReader('CVX',  'yahoo', start=datetime(2018, 1, 1), end=datetime(2019, 1, 1), session=session)

In [30]:

cvx.head()

Out[30]:

	High	Low	Open	Close	Volume	Adj Close
Date
2018-01-02	127.739998	125.540001	125.709999	127.580002	5626000.0	121.618179
2018-01-03	128.940002	126.900002	127.459999	128.509995	5805500.0	122.504707
2018-01-04	128.350006	127.220001	127.949997	128.110001	4598300.0	122.123413
2018-01-05	128.100006	127.099998	127.970001	127.900002	4189200.0	121.923218
2018-01-08	128.630005	127.629997	127.860001	128.529999	4826100.0	122.523773

Next, we'll generate a simple chart as a quick visual verification that the data looks correct.

In [31]:

cvx['Open'].plot(figsize = (14, 6))

Out[31]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f4bc1e4ef60>

Retrive index data¶

Now we wanna compare stock price with some index

We will download stock prices for European Energy Companies as listed at https://www.nasdaq.com/screening/companies-by-industry.aspx?industry=Energy&region=Europe

In [32]:

securities = ['BP', 'TOT', 'EQNR', 'SLB', 'E', 'PHG']

sec_data = {}
for security in securities:
    sec_df = pdr.DataReader(security,  
                            'yahoo', 
                            start=datetime(2018, 1, 1), 
                            end=datetime(2019, 1, 1), 
                            session=session)
    sec_data[security] = sec_df['Open']

In [33]:

sec_data.keys()

Out[33]:

dict_keys(['BP', 'TOT', 'EQNR', 'SLB', 'E', 'PHG'])

Now we have a dictionary of 6 dataframes, each containing the historical daily average exchange prices.

We can preview TOT Open price to make sure it looks ok.

In [34]:

sec_data['TOT'].plot(figsize = (14, 6))

Out[34]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f4bc1cd7128>

Correlations in 2018¶

Calculate the pearson correlation coefficients for Energy companies in 2018

In [35]:

sec_data_2018 = pd.DataFrame(sec_data)
sec_data_2018.head()

Out[35]:

	BP	TOT	EQNR	SLB	E	PHG
Date
2018-01-02	42.060001	55.450001	NaN	68.070000	33.189999	37.880001
2018-01-03	42.430000	55.970001	NaN	69.970001	33.369999	37.700001
2018-01-04	43.009998	57.279999	NaN	71.760002	34.380001	38.950001
2018-01-05	43.049999	57.810001	NaN	72.949997	34.709999	39.400002
2018-01-08	42.990002	57.730000	NaN	73.349998	34.660000	39.630001

In [36]:

sec_data_2018.pct_change().corr(method='pearson')

Out[36]:

	BP	TOT	EQNR	SLB	E	PHG
BP	1.000000	0.801044	0.832809	0.630966	0.739382	0.443860
TOT	0.801044	1.000000	0.775889	0.606954	0.790225	0.521720
EQNR	0.832809	0.775889	1.000000	0.660128	0.757535	0.466831
SLB	0.630966	0.606954	0.660128	1.000000	0.623800	0.348697
E	0.739382	0.790225	0.757535	0.623800	1.000000	0.490694
PHG	0.443860	0.521720	0.466831	0.348697	0.490694	1.000000

These correlation coefficients are all over the place. Coefficients close to 1 or -1 mean that the series' are strongly correlated or inversely correlated respectively, and coefficients close to zero mean that the values tend to fluctuate independently of each other.

Heatmap Visualization¶

In [37]:

corr = sec_data_2018.pct_change().corr(method='pearson')
corr.style.background_gradient(cmap='binary', low=0.2, high=0.9)

Out[37]:

	BP	TOT	EQNR	SLB	E	PHG
BP	1	0.801044	0.832809	0.630966	0.739382	0.44386
TOT	0.801044	1	0.775889	0.606954	0.790225	0.52172
EQNR	0.832809	0.775889	1	0.660128	0.757535	0.466831
SLB	0.630966	0.606954	0.660128	1	0.6238	0.348697
E	0.739382	0.790225	0.757535	0.6238	1	0.490694
PHG	0.44386	0.52172	0.466831	0.348697	0.490694	1

Here, the dark red values represent stronger correlations (note that each currency is, obviously, strongly correlated with itself), and values with light background represent strong inverse correlations.

In this notebook, we tested pandas_datareader library, got fresh data on stocks of European Energy Companies for 2018 year and visualized the heatmap correlation matrix.

Getting stock data with pandas-datareader

Project Setup¶

Pulling Data¶

Retrive index data¶

Correlations in 2018¶

Heatmap Visualization¶

Comments

Project Setup¶

Pulling Data¶

Retrive index data¶

Correlations in 2018¶

Heatmap Visualization¶

Related Posts:

Comments