Python, with its powerful data analysis package pandas, has taken the financial markets analysis world by storm. It enables researchers to perform sophisticated analysis that once required dedicated and expensive packages.

(A version of this post was published on Towards Data Science on September 16, 2019)

With this article, we are going to wade in the water of financial market analysis using Python and pandas. We will plot a market price series and add a basic indicator. With next articles, we will learn step by step how to swim by building a complete toolbox that can be used to create systems to trade stocks or other financial assets.

Most of the concepts that we are going to implement belong to a domain known as Technical Analysis, a discipline that aims at evaluating investments and identify opportunities by analyzing data generated by trading activities, such as price and volume.

Introducing the Moving Average

The moving average is one of the simplest technical indicators, yet it can be used and combined in many different ways to provide the backbone of trading systems and frameworks for investment decision making.

Moving averages, as all technical indicators, are based on the price series of a financial instrument. In our example we will consider the daily price series of the SPDR S&P 500 ETF Trust (symbol: SPY), an Exchange Traded Fund (ETF) traded on the NYSE Arca exchange. This ETF mimics the performance of the S&P 500 stock market index. We want to use moving averages to make investment decisions - to decide when to buy or sell shares in SPY.

There are actually several different kinds of moving averages. The three most commonly used are:

  • Simple Moving Average
  • Linearly Weighted Moving Average
  • Exponentially Smoothed Moving Average

The examples in this article will focus on the Simple Moving Average (SMA). Its construction is quite simple: we first define a rolling window of length $n$ (a number of days in our example - let’s use $n=5$). We then start from day 5 and consider all the prices from day 1 to day 5 (included). We compute the arithmetic mean of those prices (by summing them up and dividing by 5): that is the value of our SMA for day 5. Then, we move on to day 6, we take out the price on day 1, include the price on day 6 and compute the next SMA. We keep repeating the process until we reach the last price in our series. This table should make the process clearer:

Date Price SMA 5 Prices Included
1 1 NaN -
2 7 NaN -
3 2 NaN -
4 4 NaN -
5 3 3.4 1, 7, 2, 4, 3
6 2 3.6 7, 2, 4, 3, 2
7 4 3.0 2, 4, 3, 2, 4
8 8 4.2 4, 3, 2, 4, 8
9 1 3.6 3, 2, 4, 8, 1
10 8 4.6 2, 4, 8, 1, 8

You have noticed that the first 4 values for our moving average are missing: until day 5 there are not enough days to fill a 5 day window.

A visual example

Let us put this concept to work and create some practical examples using Python, pandas and Matplotlib. I am assuming that you have at least some basic Python knowledge and know what pandas DataFrame and Series objects are. If this is not the case, you can find a gentle introduction here. I also suggest that you use a Jupyter notebook to follow along with the following code. Here you can find a handy Jupyter tutorial. However, you always execute the code in your favourite way, through a script or interactive prompt. We start by loading the required libraries and checking their versions:

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
import sys

print('Python version: ' + sys.version)
print('pandas version: ' + pd.__version__)
print('matplotlib version: ' + mpl.__version__)
Out [1]:
Python version: 3.7.4 (default, Aug 13 2019, 15:17:50) 
[Clang 4.0.1 (tags/RELEASE_401/final)]
pandas version: 0.25.1
matplotlib version: 3.1.1

If you are using Jupyter, it’s a good idea to display the chart within the notebook:

In [2]:
%matplotlib inline

If you are executing your in the prompt on through a script editor, you will need to add plt.show() to the code each time we are plotting a chart in order to make it visible.

Next, we load data into a DataFrame. I have obtained a CSV file of daily data for SPY from Yahoo! Finance. You can download my csv file here.

In [3]:
datafile = 'data/SPY.csv'
data = pd.read_csv(datafile, index_col = 'Date')
data
Out [3]:
Open High Low Close Adj Close Volume
Date
2014-08-20 198.119995 199.160004 198.080002 198.919998 180.141846 72763000
2014-08-21 199.089996 199.759995 198.929993 199.500000 180.667160 67791000
2014-08-22 199.339996 199.690002 198.740005 199.190002 180.386368 76107000
2014-08-25 200.139999 200.589996 199.149994 200.199997 181.301010 63855000
2014-08-26 200.330002 200.820007 200.279999 200.330002 181.418716 47298000
... ... ... ... ... ... ...
2019-08-13 287.739990 294.149994 287.359985 292.549988 292.549988 94299800
2019-08-14 288.070007 288.739990 283.760010 283.899994 283.899994 135622100
2019-08-15 284.880005 285.640015 282.390015 284.649994 284.649994 99556600
2019-08-16 286.480011 289.329987 284.709991 288.850006 288.850006 83018300
2019-08-19 292.190002 293.079987 291.440002 292.329987 292.329987 53571800

1258 rows × 6 columns

Our data frame has six columns in total, one for Open, High, Low, Close, Adjusted Close prices and Volume respectively. To keep things simple, in our examples we will use only the Adjusted Close prices. That is the price series that reflects dividends, stock splits and other corporate events that affect stock returns.

In [5]:
close = data['Adj Close']
close.index = pd.to_datetime(close.index) # This converts the index to pandas datetime
close
Out [5]:
Date
2014-08-20    180.141846
2014-08-21    180.667160
2014-08-22    180.386368
2014-08-25    181.301010
2014-08-26    181.418716
                 ...    
2019-08-13    292.549988
2019-08-14    283.899994
2019-08-15    284.649994
2019-08-16    288.850006
2019-08-19    292.329987
Name: Adj Close, Length: 1258, dtype: float64

We can easily plot the price series for visual inspection:

In [6]:
close.plot()
Out [6]:
<matplotlib.axes._subplots.AxesSubplot at 0x103f00710>

png

Pandas makes calculating a 50 day moving average easy. Using the rolling() method we set a 50 day window, on which we calculate the arithmetic average (mean) using the mean() method:

In [7]:
sma50 = close.rolling(window=50).mean()
sma50
Out [7]:
Date
2014-08-20           NaN
2014-08-21           NaN
2014-08-22           NaN
2014-08-25           NaN
2014-08-26           NaN
                 ...    
2019-08-13    293.540820
2019-08-14    293.635375
2019-08-15    293.696567
2019-08-16    293.805137
2019-08-19    293.926582
Name: Adj Close, Length: 1258, dtype: float64

As we expect, the first 49 values of the series are empty:

In [8]:
sma50.iloc[45:52]
Out [8]:
Date
2014-10-23           NaN
2014-10-24           NaN
2014-10-27           NaN
2014-10-28           NaN
2014-10-29    178.725250
2014-10-30    178.750461
2014-10-31    178.806655
Name: Adj Close, dtype: float64

We can now plot our first moving average on a chart. To make our chart look nicer, we can use a predefined style:

In [9]:
plt.style.use('fivethirtyeight')

You can experiment with different styles, check out what is available here.

We can now plot our chart:

In [10]:
plt.figure(figsize = (12,6))
plt.plot(close, label='SPY Adj Close', linewidth = 2)
plt.plot(sma50, label='50 day rolling SMA', linewidth = 1.5)
plt.xlabel('Date')
plt.ylabel('Adjusted closing price ($)')
plt.title('Price with a single Simple Moving Average')
plt.legend()
Out [10]:
<matplotlib.legend.Legend at 0x103f00410>

png

If you’re not using Jupyter, you need to add the following line to make the plot visible:

plt.show()

Remember to do the same for each of the next plots.

Use of one moving average

Used on its own, our simple moving average does a good job by smoothing the price movements and helping us to visually identify the trend: when the average line is climbing up, we have an uptrend. When the average points down, we are in a downtrend.

We can do more than that: it is possible to generate trading signals by using one single moving average. When the closing price moves above the moving average from below, we have a buy signal:

buy signal

Similarly, when the price crosses the moving average from above, a sell signal is generated.

sell signal

Selecting date ranges

Let’s compare two moving averages with different length, 20 and 50 day respectively:

In [11]:
sma20 = close.rolling(window=20).mean()

plt.figure(figsize = (12,6))
plt.plot(close, label='SPY Adj Close', linewidth = 2)
plt.plot(sma20, label='20 day rolling SMA', linewidth = 1.5)
plt.plot(sma50, label='50 day rolling SMA', linewidth = 1.5)
plt.xlabel('Date')
plt.ylabel('Adjusted closing price ($)')
plt.title('Price with Two Simple Moving Averages')
plt.legend()
Out [11]:
<matplotlib.legend.Legend at 0x114e03350>

png

Our chart is now getting a bit crowded. It would be nice to be able to zoom on a date range of our choice. We could use a plt.xlim() instruction (e.g. plt.xlim('2017-01-01','2018-12-31'): just try to add it to the code above). However, I want to explore a different route: building a new dataframe that includes price and moving averages:

In [12]:
priceSma_df = pd.DataFrame({
      'Adj Close' : close,
      'SMA 20' : sma20,
      'SMA 50' : sma50
     })

priceSma_df
Out [12]:
Adj Close SMA 20 SMA 50
Date
2014-08-20 180.141846 NaN NaN
2014-08-21 180.667160 NaN NaN
2014-08-22 180.386368 NaN NaN
2014-08-25 181.301010 NaN NaN
2014-08-26 181.418716 NaN NaN
... ... ... ...
2019-08-13 292.549988 295.381998 293.540820
2019-08-14 283.899994 294.689998 293.635375
2019-08-15 284.649994 293.980998 293.696567
2019-08-16 288.850006 293.564998 293.805137
2019-08-19 292.329987 293.286497 293.926582

1258 rows × 3 columns

Having all of our series in a single dataframe makes it easy to create a snap plot:

In [13]:
priceSma_df.plot()
Out [13]:
<matplotlib.axes._subplots.AxesSubplot at 0x10245ec10>

png

With this method, however, I cannot easily select the line width: I will keep using a plot function call for each line. On this new data frame, we can easily select a date range (e.g. dates in 2017 and 2018, included) and plot only data within that range:

In [17]:
plt.figure(figsize = (12,6))
plt.plot(priceSma_df['2017':'2018']['Adj Close'], label='SPY Adj Close', linewidth = 2)
plt.plot(priceSma_df['2017':'2018']['SMA 20'], label='20 days rolling SMA', linewidth = 1.5)
plt.plot(priceSma_df['2017':'2018']['SMA 50'], label='50 days rolling SMA', linewidth = 1.5)
plt.xlabel('Date')
plt.ylabel('Adjusted closing price ($)')
plt.title('Price with Two Simple Moving Averages - Selected Date Range')
plt.legend()
Out [17]:
<matplotlib.legend.Legend at 0x11521af90>

png

Slicing data by date range is one of the great features of pandas. All the following work:

  • priceSma_df['2017-04-01':'2017-06-15']: range defined by two specific dates

  • priceSma_df['2017-01] : prices in a given month

  • priceSma_df['2017] : prices in a given year

Use of two moving averages

By combining two moving averages, we can use a technique called double crossover method. With this setting, a buy signal is generated whenever the shorter moving average crosses the longer from below. Similarly, a sell signal is generated whenever the shorter moving average crosses the longer from above:

Two MAs

Compared with the technique that uses just one single moving average and the price, the double crossover method produces fewer whipsaws. On the other hand, it can generate signals with some delay.

Use of three moving averages

If we can use two moving averages together, why not three then?

In [18]:
sma200 = close.rolling(window=200).mean()
priceSma_df['SMA 200'] = sma200
priceSma_df
Out [18]:
Adj Close SMA 20 SMA 50 SMA 200
Date
2014-08-20 180.141846 NaN NaN NaN
2014-08-21 180.667160 NaN NaN NaN
2014-08-22 180.386368 NaN NaN NaN
2014-08-25 181.301010 NaN NaN NaN
2014-08-26 181.418716 NaN NaN NaN
... ... ... ... ...
2019-08-13 292.549988 295.381998 293.540820 277.238597
2019-08-14 283.899994 294.689998 293.635375 277.327894
2019-08-15 284.649994 293.980998 293.696567 277.444337
2019-08-16 288.850006 293.564998 293.805137 277.589019
2019-08-19 292.329987 293.286497 293.926582 277.731843

1258 rows × 4 columns

In [19]:
start = '2016'
end = '2019'

plt.figure(figsize = (12,6))
plt.plot(priceSma_df[start:end]['Adj Close'], label='SPY Adj Close', linewidth = 2)
plt.plot(priceSma_df[start:end]['SMA 20'], label='20 day rolling SMA', linewidth = 1.5)
plt.plot(priceSma_df[start:end]['SMA 50'], label='50 day rolling SMA', linewidth = 1.5)
plt.plot(priceSma_df[start:end]['SMA 200'], label='200 day rolling SMA', linewidth = 1.5)
plt.xlabel('Date')
plt.ylabel('Adjusted closing price ($)')
plt.title('Price with a single Simple Moving Average')
plt.legend()
Out [19]:
<matplotlib.legend.Legend at 0x11572bdd0>

png

See how the long term 200 SMA helps us to identify the trend in a very smooth manner.

This brings us to the triple crossover method. The set of moving averages we have used, with length 20, 50 and 200 day respectively, is actually widely used among analysts. We can choose to consider a buy signal when the 20 SMA crosses the 50 SMA from below, but only when both averages are above the 200 SMA. All the buy crosses that occur below the 200 SMA will be disregarded.

Three MAs

The examples in this article are just some of the many possibilities that can be generated by combining price and moving averages of different length. Also, price and moving averages can be combined with other available technical indicators or indicators that we can create ourselves. Python and pandas provide all the power and flexibility needed to research and build a profitable trading system.