How do the Lok Sabha Elections effect Sensex? w/ code

15 minute read

Created: September 16, 2016

There’s another version of this post available that doesn’t include the R code used in analysis and generation of figures.

Source

I came across a time series analysis of German Market in the light of federal elections and decided to try the same for BSE SENSEX(Bombay Stock Exchange Sensitive Index), which is a stock market index of 30 companies listed at Dalal Street, and Indian Lok Sabha Elections.

Getting Data

I downloaded the dataset from BSEIndia.com. I believe this should also be available on Quandl.

Loading Packages

A lot of functions and operators I’ll be using here aren’t part of base R. They require either one or more of these packages to work.

    packages <- c("ggplot2", "splines", "zoo", "changepoint", "dplyr", "scales", "gridExtra", "lubridate", "MASS")

    packages <- lapply(packages, FUN = function(x) {
      if (!require(x, character.only = TRUE)) {
        install.packages(x)
        library(x, character.only = TRUE)
      }
    })

I learnt this neat recipe from the Introduction to Data Analysis course by François Briatte and Ivaylo Petev

Loading Sensex

I had to open the data and add a column to it else it wasn’t read properly. I wrote Datum in the first line from a text editor. This shifts all the columns to the left.

    setwd("D:/Data Science/sensex")

    bse.index <- read.csv("SENSEX.csv", stringsAsFactors = FALSE)
    head(bse.index)

    ##           Datum Date Open High    Low Close
    ## 1  3-April-1979   NA   NA   NA 124.15    NA
    ## 2  4-April-1979   NA   NA   NA 122.85    NA
    ## 3  6-April-1979   NA   NA   NA 123.52    NA
    ## 4  7-April-1979   NA   NA   NA 124.18    NA
    ## 5  9-April-1979   NA   NA   NA 124.30    NA
    ## 6 11-April-1979   NA   NA   NA 125.01    NA

    bse.index$Date <- as.Date(bse.index$Datum, format = "%d-%B-%Y")
    bse.index$Close <- bse.index$Low

As the columns have been shifted to the left. bse.index$Low has the closing values of sensex. I had to fix that.

Lok Sabha Election Dates

The election date I chose is the date of counting of votes. When that was not available, I picked the dates when the next Lok Sabha started. There’s usually a gap of couple of days when the last Lok Sabha is dissolved and new Lok Sabha sworns in.

    elections <- as.Date(c("1980-01-18", 
                           "1984-12-31", 
                           "1989-12-02", 
                           "1991-06-20", 
                           "1996-03-10", 
                           "1998-03-10", 
                           "1999-10-06", 
                           "2004-05-13", 
                           "2009-05-16", 
                           "2014-05-16"), format = "%Y-%m-%d")

What’s next?

This isn’t very different from the source analysis I mentioned above. I initialize sensex as a zoo time-series. Paul Teetor’s R Cookbook mentions zoo and xts as essential time-series packages.

Here I also calculate something called rolling variance. Rolling variance can be also called moving variance. It is variance calculated on overlapping windows of time. Example, if window width is fixed at 10 then variance will be calculated on time values from 1-10 then 2-11 then 3-12 then 4-13.

rollapply() function comes in handy in this case, the first argument is data, second argument specifies the width of the window, third argument mentions the function that is to be applied and fourth argument tells it to remove any NA it comes across.

    sensex <- zoo(bse.index$Close, order.by = bse.index$Date)
    rolvar <- rollapply(sensex, width = 7, var, na.rm = T)

I’m not completely convinced by using rolling variance as a measure of volatility so I will also explore other quantities.

Plot of Sensex with Rolling Variance

    par(mar = c(4, 4, 0.2, 0.2), oma = c(0, 0, 0, 0), mfrow = c(2, 1))
    plot(index(sensex), as.numeric(sensex), type = "l", xlab = "Date", ylab = "BSE Sensex")
    abline(v = elections, col = "red")
    plot(index(rolvar), as.numeric(rolvar), type = "l", xlab = "Date", ylab = "rolling variance (7 days)")
    abline(v = elections, col = "red")

I see two things here,

Sensex has become volatile over time.
There is a slight hint that market is effected by the results, however it can only be seen in 2004, 2009 and 2014 elections.

Zoom In

Sensex volatality(in terms of rolling variance) 100 days before and after the election results.

    elections1 <- as.Date(c("1991-06-20", "1996-03-10", "1999-10-06", "2004-05-13", "2009-05-16", "2014-05-16"), format = "%Y-%m-%d")

    par(mar = c(4, 4, 0.2, 0.2), oma = c(0, 0, 0, 0), mfrow = c(2, 3))
    for (i in 1:6) {
        plot(index(rolvar), as.numeric(rolvar), type = "l", xlab = "Date", ylab = "rolling variance (7 days)", 
            xlim = c(elections1[i] - 100, elections1[i] + 100))
        abline(v = elections1, col = "red")
        text(elections1[i] - 55, 1000000, paste("Election in\n", format(elections1[i], 
            format = "%Y")), cex = 1.5)
    }

I see a sign that markets have started becoming nervous around the elections and that volatality has increased.

Need for Normalization

There’s still a problem here. Some large values in rolling variance are making it harder to see the complete picture. We need a measure that normalizes the data. Daily differences when measured in percentages might prove to be a good candidate here. They’re generally called returns.

Return on i day r_i is calculated as,

\[r_i = \frac{p_i - p_{i-1}}{p_{i-1}}\]

where p_i is the price on day i

    bse.index <- mutate(bse.index, 
           difference = c(NA, diff(Close)),
           lagged = c(NA, Close[-length(Close)]),
           rate = (Close/lagged) - 1)

I’ll ditch the zoo object and instead just use data frame bse.index because zoo objects don’t work with ggplot2 and I don’t see an advantage of using them right now anyway.

    ggplot(data = subset(bse.index, Date > "2000-01-01"), aes(x = Date)) +
      geom_linerange(aes(ymax = rate), ymin = 0) +
      aes(color = ifelse(rate > 0, "positive", "negative")) +
      scale_color_manual("", values = c("positive" = "#2ECC71", "negative" = "#C0392B")) +
      scale_y_continuous(labels = percent) +
      theme(legend.position = "none") +
      labs(x = NULL, y = "returns", title = "Daily Returns of Sensex since 2000")

This is a complete mess! I cannot infer anything from daily values. This calls for aggregation!

Aggregating Data

I’m in two minds about aggregating the data. This data is clearly volatile enough to express meaningful daily variations but this volatality is getting in my way of clearly seeing what’s happening. Or is it?

    bse.index$year <- year(bse.index$Date)
    bse.index$month <- month(bse.index$Date)

    bse.by_month <- bse.index %>%
      group_by(year, month) %>%
      summarise(mean = mean(Close)) %>%
      ungroup() %>%
      arrange(year)

    bse.by_month$Date <- ymd(paste(bse.by_month$year, bse.by_month$month, "01", sep = "-"))

If this also shows the variations then it would imply that the elections effect market remembers the elections for more than a day or two!

    ggplot(data = subset(bse.by_month, year > 2000), aes(x = Date, y = mean)) +
      geom_line() +
      geom_smooth(method = 'lm') +
      geom_vline(xintercept = as.numeric(elections), linetype = "longdash", color = "#444444", size = 0.1) +
      labs(x = NULL, y = "Monthly Averages", title = "Sexsex (since 2000)")

There’s a clear linear trend visible. This is an example of a non-stationary time-series. We can detrend it to make it stationary, in other words make it horizontal.

Detrending a time series isn’t complex to understand once we’re clear about what is to be done here. I want to remove the upward linear trend from this series without effecting the shape at all. Or I want to subtract the linear trend from the time series.

Make a linear model with variables on y and x axis.

    bse.by_month.2000 <- subset(bse.by_month, year > 2000)
    ct <- lm(bse.by_month.2000$mean ~ bse.by_month.2000$Date)

Linear model is essentially a line. For each point on x-axis, there’s a point on y-axis corresponding to the line. These set of points on y-axis are called fitted values. To detrend, I just need to subtract the data from fitted values for the whole x-axis.

y_i − (β₁x_i+β₂)

β₁ and β₂ are parameters from the linear model. This difference is called residuals in regression theory. There’s a handy function resid() that calculates it given the linear model.

Plot the residuals

    qplot(x = bse.by_month.2000$Date, y = resid(ct), geom = "line") +
      geom_vline(xintercept = as.numeric(elections), linetype = "longdash", color = "#444444", size = 0.1) +
      labs(x = NULL, y = "residuals", title = "Detrended Sensex (since 2000)") +
      geom_smooth(method = "lm")

Now this is somewhat smooth, I should plot the returns to see the effect more pronounced

    bse.by_month.2000 <- mutate(bse.by_month.2000,
           difference = c(NA, diff(mean)),
           lagged = c(NA, mean[-length(mean)]),
           rate = (mean/lagged) - 1)

    ggplot(data = bse.by_month.2000, aes(x = Date)) +
      geom_linerange(aes(ymax = rate), ymin = 0) +
      geom_vline(xintercept = as.numeric(elections), linetype = "longdash", color = "#444444", size = 0.1) +
      scale_x_date(date_breaks = "2 year", date_labels = "%Y") +
      aes(color = ifelse(rate > 0, "positive", "negative")) +
      scale_color_manual("", values = c("positive" = "green", "negative" = "red")) +
      theme_minimal() +
      theme(legend.position = "none") +
      scale_y_continuous(labels = percent) +
      labs(x = NULL, y = "return", title = "BSE Sensex Monthly returns since 2000")

Whoa a 20% crash!

    which(abs(bse.by_month.2000$rate) > .2)

    ## [1]  94 100

    bse.by_month.2000[seq(92, 102), ]

    ## # A tibble: 11 x 7
    ##     year month      mean       Date  difference    lagged        rate
    ##    <dbl> <dbl>     <dbl>     <date>       <dbl>     <dbl>       <dbl>
    ## 1   2008     8 14722.133 2008-08-01  1005.94909 13716.184  0.07334030
    ## 2   2008     9 13942.808 2008-09-01  -779.32538 14722.133 -0.05293563
    ## 3   2008    10 10549.647 2008-10-01 -3393.16012 13942.808 -0.24336276
    ## 4   2008    11  9453.957 2008-11-01 -1095.69083 10549.647 -0.10386042
    ## 5   2008    12  9513.584 2008-12-01    59.62762  9453.957  0.00630716
    ## 6   2009     1  9350.417 2009-01-01  -163.16729  9513.584 -0.01715098
    ## 7   2009     2  9188.033 2009-02-01  -162.38437  9350.417 -0.01736654
    ## 8   2009     3  8995.451 2009-03-01  -192.58163  9188.033 -0.02096005
    ## 9   2009     4 10911.198 2009-04-01  1915.74724  8995.451  0.21296845
    ## 10  2009     5 13046.145 2009-05-01  2134.94626 10911.198  0.19566561
    ## 11  2009     6 14782.468 2009-06-01  1736.32368 13046.145  0.13309094

I need to look in the daily charts for this,

I will also change the positive color to blue as this shade of green isn’t apparent in front of the white background.

    ggplot(data = subset(bse.index, year == 2008), aes(x = Date)) +
      geom_linerange(aes(ymax = rate), ymin = 0) +
      facet_wrap(~month, scales = "free_x", ncol = 3) +
      scale_y_continuous(labels = percent) +
      aes(color = ifelse(rate > 0, "positive", "negative")) +
      scale_color_manual("", values = c("positive" = "blue", "negative" = "red")) +
      scale_x_date(date_breaks = "1 week", date_labels = "%W") +
      theme(legend.position = "none") +
      labs(y = "return", x = "Week", title = "BSE Sensex in the year 2008")

I’m thinking mean change(in percent) for the month might be better able to represent the volatility through the month.

2008 has surely been a rough year. The 10% fall visible in the month of October is the largest crash ever from BSE Sensex. Let’s check how other years compare to this.

Look at the 5th month! The results came out on 16th May, 2004 and clearly the market was more nervous than anytime in the year! If I’m not wrong, I should see a similar trend in the year 2009.

Trivia: Stable governments imply date of elections do not change. Results for general elections have been coming out exactly around 3rd week of May every 5 years in 21st century.

I can set the y-axis scale free to adjust itself for the values in each facet but that wouldn’t emphasize the difference between May and other months.

“Investors rejoiced the coming of a stable government in power. Bulls got encouraged as investors were screaming to buy in apprehension that stability will come in the economic policies,” brokerage firm Unicorn Financial CEO G Nagpal said.

There’s a clear spike of massive 15%. Market looks like in indifference till the results are announced followed by a sigh of relief. Markets like political stabiity. 2009 elections meant 5 more years of nearly the same economic policy. [1]

Back to the election spike, Is this statistically significant? Is that even a right question to ask?

I’m exciting to see how does Modi’s election look like. If I remember correctly Congress never even looked like they had a chance. There was some nervousness about a messy coalition government involving the 3rd front but market would have loved as one party got the required majority to make the government.

Look at the 5th panel, this is interesting! I was completely wrong! There’s no sign of nervousness for the chance of a coalition government, which is odd.

Log-Returns

Financial analysts love log-returns. Paul Teetor mentions in R Cookbook that volatility is measured as the standard deviation of daily log-returns.

This makes log-returns relevant to the subject of this post. In my quest to find “How does Lok Sabha Election effect Sensex”, I’ve failed to find a direct correlation between the two. Using log-returns I can check if it makes the market volatile or, in a way, nervous. :P

Let’s plot a couple of graphs to see where do they take us. I’ll make use of the fun themes in ggthemes package.

    library(ggthemes)

    elections <- data.frame(Date = as.Date(c("1980-01-18", "1984-12-31", "1989-12-02", "1991-06-20", "1996-03-10", "1998-03-10", "1999-10-06", "2004-05-13", "2009-05-16", "2014-05-16"), format = "%Y-%m-%d"))
    elections$year <- year(elections$Date)

    bse.index <- mutate(bse.index,
                        log.returns = c(NA, diff(log(Close))))

    ggplot(data = subset(bse.index, year >= 1995), aes(x = Date, y = log.returns)) +
      geom_line(color = "#C0392B") +
      theme_economist() +
      labs(x = NULL, title = "BSE Sensex log-returns") +
      geom_vline(xintercept = as.numeric(elections$Date), linetype = "dotted", color = "#444444", size = 1, alpha = 0.75)

Also take a look at how volatility has changed over time. For this I need to calculate the rolling volatility in Sensex using rollapply(). Did I mention that autoplot() can be used with zoo objects to make them compatible with ggplot layers?

    ret <- zoo(bse.index$log.returns, bse.index$Date)
    rolvol <- rollapply(ret, width = 7, FUN = function(x) {sqrt(252) * sd(x, na.rm = TRUE) * 100}, fill = NA)

    autoplot(rolvol) +
      geom_vline(xintercept = as.numeric(elections$Date), linetype = "dotted", color = "red", size = 1, alpha = 0.75) +
      theme_economist_white()

I need a grid to see the volatility around elections!

    p <- list()

    for (i in 1:10) {
      p[[i]] <- qplot(x = index(rolvol), y = coredata(rolvol), geom = "line") +
      coord_cartesian(xlim = c(elections$Date[i] - 100, elections$Date[i] + 100)) +
      geom_vline(xintercept = as.numeric(elections$Date), linetype = "dotted", color = "#C0392B", size = 1) +
      labs(x = NULL, y = NULL, title = as.character(elections$year[i])) +
      scale_x_date(date_breaks = "2 month", date_labels = "%b")
    }

    do.call(grid.arrange,c(p, ncol = 2, top = "Volatility in BSE Sensex with General Elections"))

This looks like the conclusive graph I was looking for. Sensex was nervous when an economist became the prime minister. Hah!

So, do the Lok Sabha elections make Sensex nervous? Well, it only made them nervous in 2004 and 2009. There’s a kind of apparent indifference that is widespread here.

Addressing the real oddity in the room,

Why was market so indifferent to general elections before 2000? (and even 2014)

One simple explanation is that market doesn’t care about elections at all and 2004, 2009 were plain coincidences.

Another explanation might be that indifference is a negative reaction. I’m probably missing something substantial as 1991 was a crucial year for Indian Economy but the market charts don’t show that at all!

    bse.by_month <- mutate(bse.by_month,
                           lagged = c(NA, mean[-length(mean)]),
                           change = (mean/lagged) - 1)

Grid of charts,

    p <- list()

    for (i in 1:10) {
      p[[i]] <- ggplot(data = subset(bse.by_month, year == elections$year[i]), aes(x = Date, y = change)) +
        geom_bar(stat = "identity") +
        aes(fill = ifelse(change > 0, "positive", "negative")) +
        scale_fill_manual("", values = c("positive" = "#2ECC71", "negative" = "#C0392B")) +
        scale_y_continuous(labels = percent) +
        theme(legend.position = "none") +
        scale_x_date(date_breaks = "2 month", date_labels = "%b") +
        labs(x = NULL, y = NULL, title = as.character(elections$year[i])) +
        geom_vline(xintercept = as.numeric(elections$Date), linetype = "dotted", color = "#444444", size = 1, alpha = 0.75)
    }

    do.call(grid.arrange,c(p, ncol = 2, top = "BSE Sensex Monthly Returns"))

When I switch to daily variations,

    p <- list()

    for (i in 1:10) {
      p[[i]] <- ggplot(data = bse.index, aes(x = Date, y = rate)) +
        geom_line() +
    #    aes(color = ifelse(change > 0, "positive", "negative")) +
    #    scale_color_manual("", values = c("positive" = "#2ECC71", "negative" = "#C0392B")) +
        scale_y_continuous(labels = percent) +
        theme(legend.position = "none") +
        scale_x_date(date_breaks = "2 month", date_labels = "%b") +
        labs(x = NULL, y = NULL, title = as.character(elections$year[i])) +
        geom_vline(xintercept = as.numeric(elections$Date), linetype = "dotted", color = "#C0392B", size = 1) +
        theme_minimal() +
        coord_cartesian(xlim = c(elections$Date[i] - 100, elections$Date[i] + 100))
    }

    do.call(grid.arrange,c(p, ncol = 2, top = "Returns in BSE Sensex in election years"))

This isn’t all that helpful. There’s something important that’s still missing. We need insights!

Other Analysis Methods I Missed

Changepoints Analysis

Changepoint algorithms find abrupt variations in the series. [2]

    ggplot(data = subset(bse.index, year == 2004), aes(x = Date, y = Close)) +
      geom_point(color = "blue", alpha = .8) +
      geom_line(color = "gray80") +
      geom_smooth(method = "rlm", formula = y ~ ns(x, 12), alpha = .25, fill = "lightblue") +
      labs(x = NULL, y = NULL, title = "BSE Sensex with a cubic spline") +
      geom_vline(xintercept = as.numeric(elections$Date), linetype = "dotted", color = "#C0392B", size = 1)

I can also add changepoints to this mix,

    cpt2 <- cpt.mean(subset(bse.index, year == 2004)$Close, method = "BinSeg")

Add the segments, and a vertical blue line to show election

    plot(cpt2, main = "BSE Sensex in 2004 w/ changepoints")
    abline(v = 91, col = "blue")

That looks good! Election is a changepoint! You might have noticed the magic number 91 that comes out of nowhere. I didn’t make that up. I can’t overlay date as-is because they’re not the same data-type. However, I can show

    bse.2004 <- subset(bse.index, year == 2004)
    bse.2004 <- dplyr::select(bse.2004, Date, Close)
    bse.2004 <- na.omit(bse.2004)
    which(bse.2004$Date == elections$Date[8])
    ## [1] 91

Decomposition

I just learnt that base R supports a time series class that can be initialized with ts(). It has some cool functions, one being decompose. This should be an aside.

    sensex.ts <- ts(bse.by_month.2000$mean, frequency = 12, start = c(2000, 01))
    d <- decompose(sensex.ts)
    plot(d)

Intervention Analysis

There’s something called intervention analysis that was referred by a lot of people in this question at Research Gate. Someone here also mentions introducing dummy variables which I didn’t quite understood.

Feedback

This post was my attempt to learn more about analysis of data and time serieses. Please let me know your thoughts and if I missed something!

Footnotes

That makes me think how much difference is really there in the financial policies of UPA and NDA governments, is it possible to measure?
changepoint: An R Package for Changepoint Analysis, Rebecca Killick and Idris A. Eckley, 2013