Saturday, 20 August 2016

One remarkably consistent aspect of test cricket

It seems that James Vince and I have at least one thing in common: despite starting the season full of high hopes, neither of us has had a very prolific summer. I haven't blogged much of late, and indeed I haven't paid as much attention to England's test summer as I normally would, due to various other things that have occupied my time and brain space. This is a shame for me, because the series with Pakistan seems to have been a great one, judging by the bits of coverage I did catch.

In today's return to the statistical fray, I was interested to have a look into how the relative importance of different parts of the batting order has changed over time in test cricket. For instance, it is a well-worn claim that tail enders are better batsmen than they used to be- does this mean teams now rely on them more for runs compared to other parts of the team? England have relied heavily on their lower-middle order of late- is this part of a trend, or just how things are in one team right now?

To get a sense of this I divided the batting order into four parts: openers (positions 1-2), upper middle order (3-5), lower middle order (6-8) and tail (9-11), and looked at the percentage of runs off the bat each part of the order contributed in tests in each year since 1946.
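The aggregation itself is simple enough to sketch in a few lines of Python. This is an illustration of the method rather than the actual analysis- the per-innings records here are invented, and the real data would come from scorecards.

```python
# Sketch of the batting-order aggregation, assuming per-innings records of
# (batting position, runs off the bat). The sample data is invented.
from collections import defaultdict

def segment(position):
    """Map a batting position (1-11) to its part of the order."""
    if position <= 2:
        return "openers"
    if position <= 5:
        return "upper middle"
    if position <= 8:
        return "lower middle"
    return "tail"

def run_shares(innings):
    """Percentage of runs off the bat contributed by each segment."""
    totals = defaultdict(int)
    for position, runs in innings:
        totals[segment(position)] += runs
    grand = sum(totals.values())
    return {seg: 100 * r / grand for seg, r in totals.items()}

# One invented team innings, positions 1-11
sample = [(1, 40), (2, 12), (3, 80), (4, 55), (5, 30),
          (6, 45), (7, 20), (8, 10), (9, 5), (10, 2), (11, 1)]
print(run_shares(sample))
```

Run over every innings in a year rather than a single one, this gives the yearly percentages plotted below.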

I don't want to undermine my own blog too much, but the result was the most strikingly featureless dataset I have ever written about- as you can see in the graph below. The points show year by year data and the lines show 10 year averages.

Consistently, openers get about 26% of the runs, positions 3-5 get about 41%, numbers 6-8 get 25% and the tail about 8%. This has barely changed at all in the last 70 years.

The one small trend you can pick up is that the gap between openers and the lower middle order closes over time, from a position where openers were contributing 3-4% more than numbers 6-8 up until the present day, when the two contributions are basically equal (openers 25.3% vs lower middle 25.7% over the last 10 years). This change is consistent with the increased batting role of wicket keepers which we discussed in the last post. There is a big uptick in the lower middle order data just this year, which stands out as rather an outlier- this part of the batting order has made 32.7% of the runs in 2016, several percentage points above the long term average. This is in large part driven by England's reliance on that part of the line up- fully 42.6% of England's runs off the bat have come from numbers 6-8 this year. I expect the global figure (and probably England's too) will regress to the mean a bit before the year is out.

Positions 3-5 consistently provide the biggest slice of the run scoring pie. The difference between their contribution and the openers is a couple of percentage points larger than can be explained by the fact there's simply one less player in the openers category. This is consistent with the notion that teams tend to put their best batsmen somewhere between 3 and 5.

Batsmen 9-11, meanwhile, for all the talk of improving tail enders, have chipped in with about 8% of the team's runs extremely consistently all this while, and show no signs of changing.

Plus ça change, plus c'est la même chose (the more things change, the more they stay the same).


Thursday, 26 May 2016

Charting the evolving role of the wicketkeeper

Last week's test between England and Sri Lanka belonged to Jonny Bairstow. A century on his home ground, and a match-winning one at that- rescuing England from 83-5 and dragging them to a total out of the reach of Sri Lanka's callow batting line up. Behind the stumps in his role as wicketkeeper he took 9 catches, making it an all-round good three days at the office.

Bairstow is an example of what would seem to have become a pretty established pattern for the modern test match side: picking your wicketkeeper with a heavy emphasis on their willow-wielding ability, and a lesser focus on their glovemanship than might have been seen in previous generations. I don't think I'm going too far out on a limb to suggest that Bairstow is not the best pure wicketkeeper available to England, but out of the plausible keeping options he's the best of the batsmen, at least for the longer format.

This has made me wonder: how much has the wicketkeeper's role evolved over time? How much more are teams relying on their keepers to score runs? And has an increased emphasis on the batting prowess of keepers had a measurable cost in their performance behind the stumps?

The simplest thing to think would be that picking keepers based on their batting would come at a price in catches and stumpings. But can this be seen in the data?

I particularly enjoyed researching this post, not least because answering those questions will take not one, not two, not three but four graphs.

First of all, the run scoring. The graph below shows the run scoring output of designated wicketkeepers, as a percentage of total runs scored by batsmen in tests from 1946-2015. The red points are the year by year data and the blue line is the decade by decade average. The decade by decade averages give you a better sense of the long term trends.




This data shows a clear evolution towards a greater dependence on wicket keepers to provide runs. Wicket keepers provided only 6% of runs in the immediate post-war period, but they now provide nearly 10%. This is, of course, very much in line with conventional wisdom. One thing that struck me, however, is how steady this increase has been. I had expected to see a rather more dramatic increase in the 90s and early 2000s after Adam Gilchrist made the swashbuckling batsman-keeper cool, but the importance of the wicketkeeper's runs had been rising steadily for a while (with a bit of a dip in the 1980s).
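The decade-by-decade smoothing behind the blue line can be sketched like this- grouping yearly values by calendar decade and averaging. The yearly figures here are invented placeholders, not the real keeper run shares.

```python
# Minimal sketch of the decade-by-decade smoothing, assuming a dict of
# {year: keeper run share in %}. The sample values are invented.
from statistics import mean

def decade_averages(yearly):
    """Average the yearly values within each calendar decade."""
    decades = {}
    for year, value in yearly.items():
        decades.setdefault(year // 10 * 10, []).append(value)
    return {decade: mean(vals) for decade, vals in sorted(decades.items())}

yearly_share = {1946: 5.8, 1947: 6.1, 1948: 6.0, 1955: 6.5, 1956: 6.7}
print(decade_averages(yearly_share))  # one averaged value per decade
```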

But what of their behind the stump performance? If teams' enthusiasm for batsman-keepers is leading to a lower standard of keeping, one might expect that to be reflected in how wickets are taken. If keepers are worse than they used to be then perhaps modes of dismissal which depend on them- catches behind and stumpings- will decrease relative to other, non-keeper dependent, modes of dismissal.

The next graph shows the percentage of total wickets that were catches by the keeper in tests from 1946-2015. (Again, red points=year by year, blue line=decade by decade)



Far from decreasing, the reliance on wicketkeeper catches to provide wickets increases steadily post 1946- over the same period that keeper run scoring was on the rise- before hitting a plateau around the 1990s. Modern wicketkeepers provide about 19% of the total wickets through catches, and that figure hasn't shown any noticeable downward shift since keepers have been expected to provide more runs. It may well be that what this graph is telling us has most to do with the evolution of wicket keeping and bowling styles rather than keeping quality, but in any case it's true that modern teams rely on wicket keepers both for more runs and for more catches than teams 70 years ago. As the batting responsibility of keepers has increased, their responsibility as glovemen has not diminished at all.

Wicket keepers can also contribute to dismissals via stumpings. This is a much rarer mode of dismissal than caught behind but, some may argue, it's a truer test of wicket keeping skill. The graph below shows the percentage of wickets that were stumpings over the same period as the graphs above.



The contribution of stumpings to the total wickets decreases in the post war years- over the same period that the contribution of catches increased (perhaps reflective of a decrease in standing up to the stumps? I'm not sure). But it's held steady between 1.3% and 1.9% for the last 50 years. So, wicket keepers continue to hold up their end in whipping off the bails.

If we can't see any strong changes in wicket keeping contributions to wickets, what about other ways of measuring wicket keeping quality? Byes, for instance. The graph below shows the number of byes conceded per 1000 deliveries in test cricket from 1946-2015.

The rate of conceding byes has hardly changed in 70 years. Looking at the decade by decade trends you could argue that it was on a steady decrease up to the 90s before taking an uptick, but these changes are minuscule- corresponding to maybe 1 extra bye conceded per 1000 deliveries.
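The normalisation used here is just byes per 1000 deliveries, which makes seasons with different amounts of cricket comparable. A one-line sketch, with invented totals:

```python
# Byes conceded per 1000 deliveries; the totals below are invented
# for illustration, not real season figures.
def byes_per_1000(byes, deliveries):
    return 1000 * byes / deliveries

# e.g. a hypothetical year with 630 byes off 90,000 deliveries
print(round(byes_per_1000(630, 90_000), 1))  # 7.0
```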

So, while it's clear that more runs are indeed required of the modern keeper, the expectations behind the stumps have not shifted that much. Keepers contribute a consistent ~19% of wickets through catches with an additional ~1.5% through stumpings. They concede about 7 byes per 1000 balls and have barely budged from that for 70 years. Considering that the expectations on their batting have increased, while they have remained steady in other aspects of the game, keepers arguably have more on their plate than ever before.



Monday, 16 May 2016

Reverse Swept Radio

This week I had the pleasure of being interviewed by Andy Ryan on the excellent Reverse Swept Radio podcast. If you would like to hear me talk about cricket, stats and this blog, the link is here:

http://reversesweptradio.podbean.com/e/rsr-81-a-cricket-podcast/

Friday, 13 May 2016

How much more valuable are first division runs?

England announced their squad to play Sri Lanka this week, with Hampshire's James Vince getting the nod to take up the middle order slot unfortunately vacated by James Taylor. Nick Compton, meanwhile, keeps his place at number 3, at least for the time being. Essex's Tom Westley, who has had a productive start to the season and has been much talked up, was left out (I was hoping he would be picked, but not for any cricketing reason- I just wanted the opportunity to make some Princess Bride jokes).

As England squad selections draw near, with places up for grabs, attention often turns to the county championship averages. One of the few things everyone seems to agree on at this point is that runs made in the first division of the championship should be valued more highly, being made against higher quality attacks. This seems eminently reasonable, but raises a question: how much more valuable are they? Can we make the comparison quantitative?

I'm going to have a go.

What we want is to take a sample of batsmen who played in both divisions in successive seasons and ask, on average, how much did their run output drop/rise on switching divisions. Such a sample is provided to us by the championship's promotion and relegation system.

What I've done is go through the county averages for all the completed seasons since 2010, looking at the performance of players in teams that were relegated or promoted and then comparing their season's batting average before and after the change of divisions. (So, for example, I took the batsmen who played for Kent in division 1 in 2010 and compared each batsman's average to what they managed in division 2 in 2011).

I only included batsmen who played at least 10 matches in both seasons. The results are depicted in the graph below. The batting average in division 2 for each batsman in the sample is on the x-axis, with division 1 on the y-axis. Players in relegated teams are in red, promoted teams in blue. Points below the black line averaged higher in division 2 than division 1, and above vice versa. The green line is the best linear fit to the data.
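The pairing step- matching each player's season average before and after a change of divisions, and filtering on the 10-match minimum- can be sketched as below. The records and names are made up for illustration; the real inputs would be the county championship averages.

```python
# Sketch of pairing season averages across a promotion/relegation,
# assuming records keyed by (player, year). All data here is invented.
def paired_changes(records, moves, min_matches=10):
    """For each (player, year_before, year_after) move between divisions,
    return the change in batting average, keeping only players with at
    least `min_matches` matches in both seasons."""
    changes = []
    for player, y1, y2 in moves:
        before = records.get((player, y1))
        after = records.get((player, y2))
        if before and after and before["matches"] >= min_matches \
                and after["matches"] >= min_matches:
            changes.append(after["average"] - before["average"])
    return changes

records = {
    ("Batsman A", 2010): {"matches": 14, "average": 42.0},
    ("Batsman A", 2011): {"matches": 13, "average": 47.5},
    ("Batsman B", 2010): {"matches": 8,  "average": 35.0},  # dropped: < 10 matches
}
print(paired_changes(records, [("Batsman A", 2010, 2011),
                               ("Batsman B", 2010, 2011)]))  # [5.5]
```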

Of the 81 players in the sample, 52 averaged higher in division 1 and 29 averaged lower. So, the intuition that runs are harder to get in division 1 seems solid, as expected. But how big is the difference?

Well, on average the relegated players in the sample increased their averages by 4.98 runs on going from division 1 to division 2. The promoted players saw their averages drop by an average of 7.12 runs on going from division 2 to division 1. So based on those numbers the difference is moderate but noticeable- able to turn a "very good" set of numbers into merely "good" ones and "good" into merely "acceptable".

The linear fit which I attempted (which should be taken with absolute ladlefuls of salt) gives:

average in div 1 = 28.2 + 0.12 * (average in div 2)

so it would predict a player who averages 50 in division 2 to average only 34.2 in division 1. (As I say, don't take this equation too seriously, and possibly not seriously at all, not least since it predicts that players averaging less than 32 in div 2 should be expected to do better in div 1).
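Plugging the coefficients in directly shows both the example prediction and where the break-even point comes from- setting the fitted line equal to the input average and solving:

```python
# The fitted line from the post, reproduced as a quick numerical check.
# Coefficients are the rough values quoted above; don't over-trust them.
def predict_div1(avg_div2, intercept=28.2, slope=0.12):
    return intercept + slope * avg_div2

print(round(predict_div1(50), 1))   # 34.2

# Break-even: intercept + slope * x = x  =>  x = intercept / (1 - slope)
print(round(28.2 / (1 - 0.12), 1))  # ~32, below which the fit predicts
                                    # a better average in div 1
```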

There is a chance that the difference between divisions is exaggerated in this data by a selection bias. Specifically, looking at players who were promoted from div 2 or relegated from div 1 may bias the sample towards players who under-performed their "true" ability when in div 1 or over-performed in div 2. In this case the shift in batting averages may in part be a case of regression to the mean, on top of the real change in the difficulty of run-getting.

This caveat notwithstanding, the difference in divisions seems quite considerable, and division 1 runs are worthy of their additional praise.

Thursday, 5 May 2016

The candidates

Despite its title, this is not a surprise post about the extraordinary political wranglings currently in full swing in the land of baseball and chilli-dogs. No, this will be about the far weightier matter of whether certain batsmen are especially susceptible to being pinned LBW, and who those current players are.

In cricket commentary, it's common for players whose technique looks somehow prone to leave them trapped in front of their stumps to be described as "lbw candidates". This terminology seems to be applied specifically to that particular means of dismissal- batsmen are rarely described as "caught behind candidates".

The questions I want to investigate in today's post stem from this.

Firstly, is "lbw candidate" a worthwhile category- is there a substantial subgroup of modern test batsmen who are markedly more lbw prone than their peers?

Secondly, who are these prime candidates in the post-Shane Watson era? I've often heard Alastair Cook described as a "candidate". Does he deserve the title?

We'll also be touching on where in the world lbws are most prevalent.

To tackle this, I took a sample of 45 current test match players, representing all the test nations apart from Zimbabwe, who haven't had much opportunity to play recently. The sample was obtained by taking the most recent test for each nation and including all the batsmen in the top 7 who had played at least 15 tests and who weren't obvious night-watchmen. For each player I looked up the total number of LBW dismissals in their test career and divided it by the number of dismissals overall. This is what is on the x-axis of the graph below, with the batting average of each player on the y-axis. The colour/shape of each point indicates the country for which the batsman plays.

The black dashed line is the sample median (0.155) and the red dashed lines either side are the upper (0.187) and lower (0.125) quartiles. As you can see, the data is quite clustered horizontally, suggesting only a fairly small degree of variation in vulnerability to LBW amongst current test batsmen. There's also no significant correlation between LBWs per dismissal and batting average, suggesting that having a high proportion of dismissals be LBW doesn't indicate much either way for a batsman's run scoring ability.
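The median and quartile lines come straight from the sorted sample of LBW shares. A sketch with Python's statistics module- the sample here is a small invented one (constructed so its median happens to match the post's 0.155), not the real 45-player data:

```python
# Sketch of the clustering summary: median and quartiles of LBW share
# per dismissal. The sample values below are invented for illustration.
from statistics import median, quantiles

lbw_share = [0.10, 0.12, 0.13, 0.14, 0.15, 0.155, 0.16, 0.17, 0.19, 0.22, 0.39]
q1, q2, q3 = quantiles(lbw_share, n=4)  # lower quartile, median, upper quartile
print(median(lbw_share), q1, q3)
```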

There are, however, a few noticeable outliers, far removed from the central cluster, to whom we now come:


  • The Shane Watson memorial award for excellence in attracting LBW decisions (I like the idea of this award- we could call it the "iron pad" and award it annually) goes to South Africa's JP Duminy, who is way off to the right of the graph with 39% of his dismissals being LBW. (A lot of these were against spin bowlers).
  • There's a select trio of players to the left of the graph who hardly ever get pinned LBW. Namely Pakistan's Sarfraz Ahmed (0 lbws/28 dismissals), England's Ben Stokes (1/41) and Bangladesh's Tamim Iqbal (2/79). It may not be significant but these are all quite aggressive batsmen, so perhaps more than being good at avoiding LBWs, they're finding other, more exciting, ways to get out first.
  • There's a foursome of Pakistan players separated from the main cluster, at around 0.25 LBWs/dismissal. These are: Younis Khan, Misbah ul Haq, Asad Shafiq and Mohammed Hafeez. It's tempting to wonder whether this might be because they play a lot of tests in the UAE, where the low, slow pitches are thought to be favourable for LBWs. Indeed, in the graph below you can see that the UAE does have the highest rate of LBWs per dismissal of top 6 batsmen amongst test match hosts since 2010. However, this probably doesn't fully account for it- if we exclude tests in the UAE for these four players only Hafeez sees his percentage of LBWs drop significantly.


Overall, modern test batsmen don't vary too much in how frequently they're pinned leg before, with a small number of exceptions. For what it's worth, Alastair Cook falls close to the central cluster of data points in our first graph, albeit slightly on the high side, with a rate of 0.19 LBWs/dismissal. And with Pakistan's apparently quite LBW prone top order coming to England this summer, it could be quite a good season for the thump of ball on pad, and the slowly raised finger. Maybe.

Saturday, 16 April 2016

Throwing out the form book?

So it's been quite a while since I posted anything here, but with the thrill of a new English cricket season upon me, I'm strapping on my pads of data, taking up my bat of analysis and striding out to the wicket of the internet.

As I scratch around, hoping to hit a bit of early season form, I'm going to attempt some rudimentary analysis of exactly that concept- "form". The point of this blog is to try and hold up some of cricket's hoariest old cliches and nuggets of received wisdom to the light of some data. The idea of being "in form" is surely one of the foremost such cliches in cricket- perhaps in all of sport.

The essential claim is this: a player is more likely to perform well at times when they have performed well in the recent past. A player who has performed well recently is usually said to be "in form".

The explanations for this tend to hinge on a player's confidence being high when their recent performances have been good. Or people may speak about players "being in a good rhythm", or "in a good place".

But to what extent is "a run of good form" distinguishable from a run of good luck? You sometimes hear commentators say something along the lines of:

"when you're in good form, it's amazing how the little bits of luck start going your way as well- playing and missing rather than nicking it,  balls in the air going between fielders rather than to them..."

At which I might want to say to them: "Is it amazing? Is it though? Or is it just that you only assign players the property of "good form" when they happen to be on a good run of scores- which requires a certain amount of luck?".

I'm not going to attempt a full analysis of whether form is a "real" phenomenon- in the sense of being meaningfully predictive of future performance- in one blog post. Although I may come back to different aspects of the question later.

I do, however, have some data to show which impacts on this question and I think it's interesting.

To make the question narrower, and therefore more tractable, I asked: "are test match batsmen more likely to score a century when they have already scored a test century in the last month?"

To answer this, I looked at the careers of the 23 most prolific test match century scorers in history. I did this because I needed a sample of players who had scored enough centuries that one could meaningfully compare the games when they hadn't scored one recently, with games where they had. Obviously, this does introduce quite a big selection bias- it's possible that the results I obtain may only be applicable to those players at the very top of cricketing history's tree. So be aware of that when you decide what to think of the results.

The graph below shows the rate of century scoring per match in games within a month of having previously scored a century against the total number of centuries scored per match for each player. Points above the blue line represent players who had a higher rate of century scoring when they had recently scored a century, and those below the blue line a lower rate. The lone point way off to the top right of the graph is, of course, Sir Donald Bradman.

As a group these batsmen scored an overall total of 723 centuries in 2945 games- a rate of 0.246 centuries per match. In games within a month of having scored a test century my research puts them at a total of 182 centuries in 704 games- for a nearly identical (but slightly higher) rate of 0.258 centuries per match. On an individual level 11 of the players were more prolific when they'd recently hit a hundred and 12 were less so. For most players the difference was minor, as indicated by the fact that most points in the graph fall fairly close to the blue line.
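The headline comparison can be recomputed directly from the totals quoted above- the "in form" rate comes out a little over one percentage point higher than the overall rate:

```python
# Recomputing the century-scoring rates from the totals in the post.
overall_rate = 723 / 2945  # centuries per match, all games in the sample
recent_rate = 182 / 704    # within a month of a previous test century

print(round(overall_rate, 3))               # 0.246
print(round(recent_rate - overall_rate, 3)) # 0.013
```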

There isn't enough evidence here for me to boldly claim that form makes no difference to batsmen. But it does suggest that form doesn't matter as much as you might imagine, at least for this sample of batsmen who belong among history's greatest.

For those out of form I would say this: take heart- form is an ephemeral thing which can return as suddenly as it departs. And maybe it doesn't matter so much whether you have it or not.

Friday, 25 December 2015

Festive Tidings and the Exceptional AB de Villiers

Today, I bring you a festive look at the data! I mean, not that there's anything particularly Christmassy about the content of this blog post- but hey, it's Christmas, there are mince pies in the oven and I'm writing about the batting statistics of wicket keepers. To me, that's festive.

This time around, the piece of cricketing received wisdom coming under the microscope is the belief that when a batsman who's able to keep wicket has to do so, it impacts negatively on their run scoring ability. The most famous (and, not coincidentally, also the most extreme) example of this is Kumar Sangakkara who averaged an acceptable 40.48 when playing as a wicket keeper and a stellar 66.78 when playing as a specialist batsman. It seems reasonable to believe that the physical and mental strain of long periods of wicket keeping would make run scoring harder, but the same could be said of the pressure of the captaincy- and we saw in the last post that captaincy actually seems not to generally make so much difference to run scoring output.

I actually prepared the research for this post a while ago, but didn't write a post on it because- as you'll see below- there isn't so much to work with in this case, and I worried that there wasn't enough numerical meat to make a satisfying analysis. However, the issue came up on the superb Switch Hit podcast this week- in the context of AB de Villiers' stewardship of the keeper's gloves for South Africa- and I thought that since it's an interesting question, I might as well write it up. Decide for yourselves whether the data justifies the conclusions.

So, what we want to do is take some test match players who've played a decent number of tests both as wicket keeper, and as a specialist batsman and compare their batting averages in those two sets of games. The problem is that there are very few players who fit that description. Specifically, I could find only seven players who played both at least 10 tests as the designated wicket keeper and at least 10 not as the wicket keeper. That rather select club is listed in the table below

In the graph below, I've plotted the batting average when playing as keeper against the average when not playing as keeper for each player. Players falling below the blue line have worse averages when playing as wicket keeper and those above have better batting averages when granted the gloves.

Seven players isn't much to draw a conclusion from but nevertheless, the evidence in this case weighs in favour of the received wisdom- it does seem that having to keep wicket depresses a batsman's average. Of our seven players, 2 have better averages when playing as keeper and 5 do worse. That in itself could easily just be chance, but what's more notable is that the players who do worse as keeper tend to do rather a lot worse, suggesting that there is potentially quite a strong effect at play. The average difference between averages when keeping and not keeping in our sample was -10.19 runs- less extreme than Sanga's -26.3, but a pretty big difference all the same.
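The comparison itself is just a mean of per-player differences. A sketch, using invented numbers rather than the real seven players (though Player A's figures are modelled on Sangakkara's):

```python
# Sketch of the keeping-vs-not-keeping comparison, assuming per-player
# batting averages in each role. Sample values are illustrative only.
def keeping_effect(players):
    """Mean difference (average as keeper minus average as batsman)."""
    diffs = [p["as_keeper"] - p["as_batsman"] for p in players]
    return sum(diffs) / len(diffs)

sample = [
    {"name": "Player A", "as_keeper": 40.5, "as_batsman": 66.8},
    {"name": "Player B", "as_keeper": 38.0, "as_batsman": 35.0},
    {"name": "Player C", "as_keeper": 30.0, "as_batsman": 41.0},
]
print(round(keeping_effect(sample), 2))  # negative => keeping hurts the average
```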

Which makes AB de Villiers' bucking of the trend all the more special. He averages fully 8.83 runs higher when keeping. Of course, this won't necessarily last. It's quite possible - maybe even likely - that if he stays as South Africa's first choice gloveman for a couple more years his average as keeper will regress back in line with his average when not keeping - or even below. Or perhaps - as he has in many other ways - de Villiers will prove to be exceptional in the truest sense of the word.

I want to finish this post by thanking you all for reading and to particularly thank Chris of the excellent blog "Declaration Game" for kindly promoting my blogging over the last 6 months. I was honoured to be included in his "Select XI" blog posts of the year, which if you haven't seen it yet is well worth a look- providing a very broad cross section of some extremely interesting cricket writing.

Merry Christmas!