Stock market portfolio optimisation.

The goal of this notebook is to provide an introduction to stock market portfolio optimisation using Python. We will use pandas_datareader module to get data and PyPortfolioOpt for optimisation. We will walk through a simple Python script to retrieve, analyze, and visualize data for predefined portfolio of stocks.

To read and use this notebook clone it from remote repository to your local computer and run in preferred environment.

Permanent link to repository: bitbucket.org/Nekrasovp/stock-market-portfolio-optimisation

You can view this notebook on Jupyter Notebook nbviewer website: nbviewer.jupyter.org/urls/bitbucket.org/Nekrasovp/stock-market-portfolio-optimisation

Table of content:

  • Project setup
  • Pulling data
  • Preparing data
  • Analyze portfolio
    • Calculate the 'Markowitz portfolio', minimising volatility for a given target return.
    • Calculate the Sharpe-maximising portfolio for a given volatility (i.e max return for a target risk).
  • Visualise portfolio
    • Widget that allows you to interactively change the allocation.
    • Plot with individual stock's annual return and annual risk, efficient frontier curve and best allocation cases.

Project setup

First thing we'll do is import the required dependencies.

In [41]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interactive, FloatSlider, IntText, HBox, Layout, VBox, interact, Dropdown
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

plt.style.use('fivethirtyeight')

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

We will use pandas_datareader to get data from the market. And pypfopt for optimisation goals.

In [42]:
from pypfopt.efficient_frontier import EfficientFrontier
from pypfopt import risk_models, expected_returns, discrete_allocation
In [43]:
import pandas_datareader as pdr

Making the same request repeatedly can use a lot of bandwidth, slow down your code and may result in your IP being banned.

pandas_datareader allows you to cache queries using requests_cache

In [44]:
import requests_cache
expire_after = timedelta(days=1)
session = requests_cache.CachedSession(cache_name='cache', backend='sqlite', expire_after=expire_after)
In [45]:
%reload_ext watermark
%watermark -v -m --iversions
matplotlib        3.1.1
pandas            0.24.2
requests_cache    0.5.0
pandas_datareader 0.7.0
numpy             1.16.4
CPython 3.7.3
IPython 7.6.1

compiler   : GCC 8.0.1 20180414 (experimental) [trunk revision 259383
system     : Linux
release    : 4.15.0-54-generic
machine    : x86_64
processor  : x86_64
CPU cores  : 4
interpreter: 64bit

Pulling data

Define tickers you would like to analyze

In [46]:
securities = ['PRSP', 
              'GOOGL',
              'IBM',
              'BTI',
              'BA',
              'BIDU',
              'CMA',
              'WDC',
              'SLB',
              'SKM',
              'F',
              'EIX',
              'FAST',
              'NOK',
              'DLR',
              'INTC',
              'CBPO',
              'NVDA',
              'SHI',
              'CAT', ]

sec_data = {}
for security in securities:
    sec_df = pdr.DataReader(security,  
                            'yahoo', 
                            start=datetime(2018, 7, 10), 
                            end=datetime(2019, 7, 10), 
                            session=session)
    sec_data[security] = sec_df

By looking at keys of resulted dict we can explore which securities we fetched successfully.

In [47]:
sec_data.keys()
Out[47]:
dict_keys(['PRSP', 'GOOGL', 'IBM', 'BTI', 'BA', 'BIDU', 'CMA', 'WDC', 'SLB', 'SKM', 'F', 'EIX', 'FAST', 'NOK', 'DLR', 'INTC', 'CBPO', 'NVDA', 'SHI', 'CAT'])
In [48]:
sec_data['GOOGL'].head()
Out[48]:
High Low Open Close Volume Adj Close
Date
2018-07-10 1173.500000 1162.560059 1169.989990 1167.140015 1066700 1167.140015
2018-07-11 1180.439941 1155.369995 1155.619995 1171.459961 1662600 1171.459961
2018-07-12 1201.989990 1173.099976 1174.859985 1201.260010 2207400 1201.260010
2018-07-13 1210.439941 1195.290039 1202.800049 1204.420044 1630600 1204.420044
2018-07-16 1208.709961 1193.400024 1203.810059 1196.510010 1339200 1196.510010

Let's look at how the price of each stock has evolved within give time frame. While price of each stock is different there is no meaning to look at prices themselves so we will plot daily returns (percent change compared to the day before). By plotting daily returns instead of actual prices, we can see the stocks' volatility.

In [49]:
plt.figure(figsize=(14, 7))
for security, df in sec_data.items():
    df['PctChange'] = df.Close.pct_change()
    plt.plot(df.index, df.PctChange, lw=3, alpha=0.8,label=security)
plt.legend(loc='upper left', fontsize=10)
plt.ylabel('daily returns')
Out[49]:
Text(0, 0.5, 'daily returns')

It seems that EIX has experienced a real "roller coaster" in the study period.

In [50]:
sec_data['EIX']['Close'].dropna().plot(figsize = (14, 7))
plt.ylabel('price in $')
Out[50]:
Text(0, 0.5, 'price in $')

Now we have a dictionary of dataframes, each containing the historical ohlcv prices data.

Preparing data

For analyzing we will use only Close prices from each ohlcv tuple.

In [51]:
 def build_pricing_data_from_ohlcv(ohlcv_data: dict):
        """
        Build new pricing dict from ohlcv_data 'Close' column

        Add base asset column with price equal to '1'.

        :param ohlcv_data: Pricing data
        :return: Formated pricing data dict
        """
        data_c = pd.DataFrame()
        for m, ohlcv in ohlcv_data.items():
            data_c[m] = ohlcv['Close']
        return data_c
In [52]:
prices_df = build_pricing_data_from_ohlcv(sec_data)
In [53]:
pd.set_option('display.max_columns', 10)
prices_df.head()
Out[53]:
PRSP GOOGL IBM BTI BA ... INTC CBPO NVDA SHI CAT
Date
2018-07-10 22.250000 1167.140015 144.710007 52.250000 347.160004 ... 52.160000 106.720001 253.250000 60.139999 141.250000
2018-07-11 22.559999 1171.459961 144.940002 51.380001 340.600006 ... 51.200001 105.809998 247.529999 59.200001 136.759995
2018-07-12 22.809999 1201.260010 146.449997 51.740002 346.029999 ... 52.349998 100.459999 251.229996 60.500000 139.419998
2018-07-13 22.870001 1204.420044 145.899994 51.799999 350.790009 ... 52.220001 100.889999 249.320007 61.150002 140.750000
2018-07-16 22.770000 1196.510010 145.460007 51.090000 356.100006 ... 52.009998 101.400002 248.199997 62.650002 138.080002

5 rows × 20 columns

Analyze portfolio

Methods of "portfolio optimization" as a solution to a particular optimization problem is a widely studied area of human knowledge.

Look more here:

As mentioned above we will use PyPortfolioOpt package because its provide clear and uncomplicated access to pandas and scipy optimisation methods used in our case.

Expected returns models:

  • mean historical return
  • exponentially weighted mean historical return

Risk models:

  • sample covariance
  • semicovariance
  • exponentially weighted covariance
  • mininum covariance determinant
  • shrunk covariance matrices:

    • manual shrinkage
    • Ledoit Wolf shrinkage
    • Oracle Approximating shrinkage
In [54]:
def generate_analysis_model(pricing_data, weight_bounds):
    """Generate efficient frontier model with simple and intuitive estimators

    :param pricing_data: Adjusted closing prices of the asset, each row is a date
               and each column is a ticker/id.
    :param weight_bounds: Minimum and maximum weight of an asset
    :return: Efficient Frontier model
    """
    # Annualised mean (daily) historical return from input (daily) asset prices
    mu = expected_returns.mean_historical_return(pricing_data, frequency=252)
    # Annualised sample covariance matrix of (daily) asset returns
    s = risk_models.sample_cov(pricing_data, frequency=252)
    return EfficientFrontier(mu, s, weight_bounds=weight_bounds, gamma=0)
In [55]:
def make_discrete_allocation(prices_df, cleaned_weights, total_portfolio_value):
    latest_prices = discrete_allocation.get_latest_prices(prices_df)
    da = discrete_allocation.DiscreteAllocation(weights=cleaned_weights,
                                                latest_prices=latest_prices,
                                                total_portfolio_value=total_portfolio_value
                                                )
    allocation, leftover = da.lp_portfolio(verbose=False)
    print(allocation)
    print("Funds remaining: ${:.2f}".format(leftover))
In [56]:
def draw_cleaned_weights_with_perfomance(cleaned_weights, p_perfomance):
    # Filter non-zero values
    non_zero_weights = dict(filter(lambda w: w[1] > 0.0, cleaned_weights.items()))
    # Plot horizontal bar chart with resulted weights
    plt.figure(figsize=(10, 7))
    _ = non_zero_weights
    ax = plt.barh(range(len(_.keys())), 
             [float(k)*100 for k in _.values()],
             align='center',
             height=0.5,
             color='#f3ba2f',
             edgecolor='black',
             tick_label=[k for k in _.keys()])
    plt.yticks(fontsize=14)
    plt.xlabel('Ticker allocation, %')
    plt.ylabel('Portfolio tickers')
    plt.title('Expected annual return: {:.2%} Annual volatility: {:.2%} Sharpe Ratio: {:.2f}'.format(
                p_perfomance[0],
                p_perfomance[1],
                p_perfomance[2]))
    for rect in ax.patches:
        # Get X and Y placement of label from rect.
        x_value = rect.get_width()
        y_value = rect.get_y() + rect.get_height() / 2
        # Number of points between bar and label. Change to your liking.
        space = 5
        # Vertical alignment for positive values
        ha = 'left'
        # If value of bar is negative: Place label left of bar
        if x_value < 0:
            # Invert space to place label to the left
            space *= -1
            # Horizontally align label at left
            ha = 'left'
        # Use X value as label and format number with one decimal place
        label = "{:.1f}".format(x_value)
        # Create annotation
        plt.annotate(
            label,                      # Use `label` as label
            (x_value, y_value),         # Place label at end of the bar
            xytext=(space, 0),          # Horizontally shift label by `space`
            textcoords="offset points", # Interpret `xytext` as offset in points
            va='center',                # Vertically center label
            ha=ha)                      # Horizontally align label
    # plt.savefig("image.png")
    plt.show()
In [57]:
EFmodel = generate_analysis_model(prices_df, (0.0, 1.0))

Calculate the 'Markowitz portfolio', minimising volatility for a given target return.

In [58]:
EFmodel.efficient_return(target_return=0.2) 
Out[58]:
{'PRSP': 0.07457978484549457,
 'GOOGL': 0.0,
 'IBM': 0.0,
 'BTI': 0.0,
 'BA': 0.0,
 'BIDU': 3.754050022231059e-18,
 'CMA': 2.5343225781848666e-18,
 'WDC': 7.632783294297951e-17,
 'SLB': 5.0550926292136644e-18,
 'SKM': 0.09059050741712786,
 'F': 0.0,
 'EIX': 0.10282214987821588,
 'FAST': 0.6087972153096218,
 'NOK': 2.0477868532819965e-17,
 'DLR': 0.12321034254954018,
 'INTC': 0.0,
 'CBPO': 0.0,
 'NVDA': 0.0,
 'SHI': 1.6506978076091805e-17,
 'CAT': 0.0}
In [59]:
cleaned_weights = EFmodel.clean_weights(cutoff=1e-4, rounding=4)
p_perfomance = EFmodel.portfolio_performance(verbose=False)
draw_cleaned_weights_with_perfomance(cleaned_weights, p_perfomance)
make_discrete_allocation(prices_df, cleaned_weights, 10000)
15 out of 20 tickers were removed
{'PRSP': 31, 'SKM': 37, 'EIX': 15, 'FAST': 195, 'DLR': 10}
Funds remaining: $7.08

You can do better than 1/N

Here is a quote from Portfolio optimisation: lessons learnt:

"A significant body of research suggests that, in light of the aforementioned failures of MVO, we should instead use 1/N portfolios(Optimal Versus Naive Diversification: How Inefficient is the 1/N Portfolio Strategy?) – it is noted that they often beat efficient frontier optimisation significantly in out-of-sample testing.

However, an interesting paper by Kritzman et al. (2010) finds that it is not the case that there is anything special about 1/N diversification: it is just that the expected returns are an incredibly poor estimator and any optimisation scheme which relies on them will likely go astray.

The easiest way to avoid this problem is to not provide the expected returns to the optimiser, and just optimise on the sample covariance matrix instead. Effectively we are saying that although previous returns won’t predict future returns, previous risks might predict future risks. This is intuitively a lot more reasonable – the sample covariance matrix really seems like it should contain a lot of information. Empirical results support this, showing that minimum variance portfolios outperform both standard MVO and 1/N diversification.

In my own work I have found that the standard minimum variance portfolio is a very good starting point, from which you can try a lot of new things:

  • Shrinkage estimators on the covariance matrix
  • Exponential weighting
  • Different historical windows
  • Additional cost terms in the objective function (e.g small-weights penalty)
  • ..."

Calculate the Sharpe-maximising portfolio for a given volatility (i.e max return for a target risk).

In [60]:
EFmodel.efficient_risk(target_risk=0.25)
Out[60]:
{'PRSP': 0.05055098383161007,
 'GOOGL': 4.934204086981531e-16,
 'IBM': 1.4046584533543294e-15,
 'BTI': 0.0,
 'BA': 0.0,
 'BIDU': 0.0,
 'CMA': 2.7773193900931803e-16,
 'WDC': 0.0,
 'SLB': 0.0,
 'SKM': 4.137525554375604e-15,
 'F': 0.0,
 'EIX': 0.11517889383646501,
 'FAST': 0.7954449526321907,
 'NOK': 5.570156423780059e-16,
 'DLR': 0.0388251696997469,
 'INTC': 7.154311323052942e-16,
 'CBPO': 2.3506926114837123e-15,
 'NVDA': 0.0,
 'SHI': 0.0,
 'CAT': 0.0}
In [61]:
cleaned_weights = EFmodel.clean_weights(cutoff=1e-4, rounding=4)
p_perfomance = EFmodel.portfolio_performance(verbose=False)
draw_cleaned_weights_with_perfomance(cleaned_weights, p_perfomance)
make_discrete_allocation(prices_df, cleaned_weights, 10000)
16 out of 20 tickers were removed
{'PRSP': 21, 'EIX': 17, 'FAST': 254, 'DLR': 3}
Funds remaining: $7.97

Visualise portfolio

One of the goal of this notebook is visualising Efficient portfolio. To achive this we will provide:

  • Portfolio securities list.
  • Maximum target return and minimum volatility cases information.
  • Widget where you can choose different portfolio allocations along the Efficient frontier curve.
  • Final allocations for presented portfolio value.

Next will plot:

  • Individual stock's annual return and annual risk.
  • Maximum target return and minimum volatility allocation.
  • Efficient frontier curve.
  • Target return portfolio perfomance
In [63]:
def generate_portfolio(prices_df, target_return = None, target_risk = None):
    """
    :return: Expected annual return(mu), Annual volatility(sigma), Sharpe Ratio
    """
    weight_bounds = (0.0, 1.0)
    EFmodel = generate_analysis_model(prices_df, weight_bounds)
    if target_return is not None:
        EFmodel.efficient_return(target_return=target_return)
        return EFmodel.portfolio_performance()
    if target_risk is not None: 
        EFmodel.efficient_risk(target_risk=target_risk)
        return EFmodel.portfolio_performance()
In [64]:
def draw_portfolio(prices_df, opt_obj, total_v, target_return, target_risk):
    weight_bounds = (0.0, 1.0)
    EFmodel = generate_analysis_model(prices_df, weight_bounds)
    if opt_obj == 'target_return': EFmodel.efficient_return(target_return=target_return)
    elif opt_obj == 'target_risk': EFmodel.efficient_risk(target_risk=target_risk)
    elif opt_obj == 'max_sharpe': EFmodel.max_sharpe()
    elif opt_obj == 'min_volatility': EFmodel.min_volatility()
    cleaned_weights = EFmodel.clean_weights(cutoff=1e-4, rounding=4)
    p_perfomance = EFmodel.portfolio_performance(verbose=False)
    draw_cleaned_weights_with_perfomance(cleaned_weights, p_perfomance)
    make_discrete_allocation(prices_df, cleaned_weights, total_v)
In [67]:
p = prices_df.copy()

# Calculate the 'Markowitz portfolio', minimising volatility for a given target_return
max_annual_return_portfolio = generate_portfolio(prices_df=p, target_return=1.0)

# Calculate the Sharpe-maximising portfolio for a given volatility(max return for a target_risk).
min_annual_volatility_portfolio = generate_portfolio(prices_df=p, target_risk=0.0)

print(f"Portfolio tickers:{securities}\n")
print("Minimising volatility for maximum target_return portfolio allocation:\n")
print(f"Annualised Return:{max_annual_return_portfolio[0]:.1%}")
print(f"Annualised Volatility:{max_annual_return_portfolio[1]:.1%}")
print(f"Sharpe Ratio:{max_annual_return_portfolio[1]:.2f}\n")
print("Sharpe-maximising portfolio for a minimum volatility(max return for a target_risk) allocation:\n")
print(f"Annualised Return:{min_annual_volatility_portfolio[0]:.1%}")
print(f"Annualised Volatility:{min_annual_volatility_portfolio[1]:.1%}")
print(f"Sharpe Ratio:{min_annual_volatility_portfolio[1]:.2f}\n")

def f(opt_obj, x, tr, tri):
    draw_portfolio(p, opt_obj, x, tr, tri)

opt_obj_widget = Dropdown(options=['target_return', 'target_risk', 'max_sharpe', 'min_volatility'],
                          value='target_return',
                          description='Otimisation objective:',
                          style={'description_width': 'initial'},
                          disabled=False, )
def update_opt_obj(*args):
    if opt_obj_widget.value == 'target_return':
        risk_slider_widget.disabled = True
        return_slider_widget.disabled = False
    elif opt_obj_widget.value == 'target_risk':
        risk_slider_widget.disabled = False
        return_slider_widget.disabled = True
    elif opt_obj_widget.value == 'max_sharpe':
        risk_slider_widget.disabled = True
        return_slider_widget.disabled = True 
    elif opt_obj_widget.value == 'min_volatility':
        risk_slider_widget.disabled = True
        return_slider_widget.disabled = True 
        
opt_obj_widget.observe(update_opt_obj, 'value')

total_v_widget = IntText(value=10000, description='Total value:', disabled=False)

risk_slider_widget = FloatSlider(value=0, 
                                 min=min_annual_volatility_portfolio[1],
                                 max=max_annual_return_portfolio[1], 
                                 step=0.001,
                                 description='Target risk:',
                                 disabled=True,
                                 continuous_update=False,
                                 layout=Layout(width='50%', height='40px'),
                                 readout=True,
                                 readout_format='.1%',)


return_slider_widget = FloatSlider(value=max_annual_return_portfolio[0]*0.9, 
                                   min=0.0, 
                                   max=max_annual_return_portfolio[0], 
                                   step=0.001,
                                   description='Target return:',
                                   disabled=False,
                                   continuous_update=False,
                                   orientation='horizontal',
                                   style={'description_width': 'initial'},
                                   layout=Layout(width='50%', height='40px'),
                                   readout=True,
                                   readout_format='.1%',)

interactive_plot = interactive(f,
                               opt_obj=opt_obj_widget,
                               x=total_v_widget,
                               tr=return_slider_widget, 
                               tri=risk_slider_widget,)
interactive_plot
Portfolio tickers:['PRSP', 'GOOGL', 'IBM', 'BTI', 'BA', 'BIDU', 'CMA', 'WDC', 'SLB', 'SKM', 'F', 'EIX', 'FAST', 'NOK', 'DLR', 'INTC', 'CBPO', 'NVDA', 'SHI', 'CAT']

Minimising volatility for maximum target_return portfolio allocation:

Annualised Return:26.9%
Annualised Volatility:26.6%
Sharpe Ratio:0.27

Sharpe-maximising portfolio for a minimum volatility(max return for a target_risk) allocation:

Annualised Return:-3.8%
Annualised Volatility:12.7%
Sharpe Ratio:0.13

Next we can plot each individual stocks on the plot with the corresponding values of each stock's annual return and annual risk. This way we can see and compare how diversification is lowering the risk by optimising the allocation. For example we will take recommended in previous state portfolio by filtering presented dataframe.

In [66]:
individual_assets = ['PRSP', 'GOOGL', 'IBM', 'BTI', 'BA', 'BIDU', 'CMA', 'WDC', 'SLB', 'SKM', 'F', 'EIX', 'FAST', 'NOK', 'DLR', 'INTC', 'CBPO', 'NVDA', 'SHI', 'CAT']
# individual_assets = ['PRSP', 'SKM', 'EIX', 'FAST', 'DLR']
t_return_opt = 0.16
i_stocks_assets_data = prices_df[individual_assets].copy()

max_annual_return_portfolio = generate_portfolio(prices_df=i_stocks_assets_data, target_return=0.5)
min_annual_volatility_portfolio = generate_portfolio(prices_df=i_stocks_assets_data, target_risk=0.0)
max_annual_volatility_portfolio = generate_portfolio(prices_df=i_stocks_assets_data, target_risk=1.0)
opt_portfolio = generate_portfolio(prices_df=i_stocks_assets_data, target_return=t_return_opt)

i_stocks_perfomance = {}
for a in individual_assets:
    EFmodel = generate_analysis_model(i_stocks_assets_data[a], (0.0, 1.0))
    EFmodel.efficient_return(target_return=1.0)
    i_stocks_perfomance[a] = EFmodel.portfolio_performance(verbose=False)

fig, ax = plt.subplots(figsize=(12, 7))

ax.scatter([p[1] for p in i_stocks_perfomance.values()],
           [p[0] for p in i_stocks_perfomance.values()],
           marker='o',s=200)

for a, p in i_stocks_perfomance.items():
    ax.annotate(a, (p[1],p[0]), xytext=(10,0), textcoords='offset points')

target = np.linspace(min_annual_volatility_portfolio[1], max_annual_return_portfolio[1], 30)
ef_curve_portfolios_perfomance = {}
for t in target:
    EFmodel = generate_analysis_model(i_stocks_assets_data, (0.0, 1.0))
    EFmodel.efficient_risk(target_risk=t)
    ef_curve_portfolios_perfomance[t] = EFmodel.portfolio_performance(verbose=False)
    
target_ret = np.linspace(max_annual_volatility_portfolio[0], max_annual_return_portfolio[0], 30)
for t in target_ret:
    EFmodel = generate_analysis_model(i_stocks_assets_data, (0.0, 1.0))
    EFmodel.efficient_return(target_return=t)
    ef_curve_portfolios_perfomance[t] = EFmodel.portfolio_performance(verbose=False)

    
    
ax.plot([p[1] for p in ef_curve_portfolios_perfomance.values()], 
        [p[0] for p in ef_curve_portfolios_perfomance.values()], 
        linestyle='-.', 
        color='black', 
        label='efficient frontier')
    
ax.scatter(opt_portfolio[1],
           opt_portfolio[0],
           marker='p',color='y',s=300, label='Minimum Volatility for target return')    
    
ax.scatter(max_annual_return_portfolio[1],
           max_annual_return_portfolio[0],
           marker='*',color='r',s=300, label='Minimum volatility for maximum return')

ax.scatter(min_annual_volatility_portfolio[1],
           min_annual_volatility_portfolio[0],
           marker='*',color='g',s=300, label='Sharpe-maximising portfolio for minimum volatility')

ax.set_title('Portfolio Optimization with Individual Stocks')
ax.set_xlabel('annualised volatility')
ax.set_ylabel('annualised returns')
ax.legend(labelspacing=0.8, loc='best')
# ax.savefig("image.png")
Out[66]:
<matplotlib.legend.Legend at 0x7f3adf8a71d0>

As you can see from the above plot, the stock with the least risk is SKM at around 0.20. But with portfolio optimisation, we can achieve even lower risk at 0.14, and still with a higher return than SKM. Best risk/return allocation are in FAST in our case and its matched with portfolio optimised for minimum volatility for maximum return.

In [ ]:
 

Comments

comments powered by Disqus