Saturday, 28 March 2015

Needless lines in the sand

I just read a blog post from Andrew Hill, a guy whose work I admire greatly, entitled In defense of burger cartography. Go have a read if you've not already done so, then come back...I've got a few things to say.

I've been quite vocal in my annoyance with many so-called new cartographers and their general fancy for creating needless dividing lines between their self-defined fresh approach and pretty much everything that ever went before. They tend to eschew previous thinking, research, technologies and people and view this as a huge positive. I'm afraid I read Andrew's blog in very much the same vein so here's a few quotes I've extracted that I'd respectfully like to contest:

"time to fall in love with maps again"
You'll likely find most of us never fell out of love with them. Is this perhaps just a rallying cry to further support the belief that all this new stuff is unequivocally super awesome? Cartography has always been great. I'm glad so many are finally finding it too but c'mon...join the club, don't try and start a new one.

"what I'm seeing has never been talked about before"
I'd strongly refute this observation. Most of what I see has been done before and has been talked about before. It's more likely that the people who offer this view simply aren't familiar with what's gone before or don't care to look beyond their own limited experience. Sure, technology is evolving and there's some new ways of making maps but largely we've seen it before in some shape or other. What is perhaps true is that many weren't around when it was done before but we have books and suchlike to reference. And why is it that people like me pipe up with blogs? It's precisely because we want to expose these very same people to stuff some of us already know. That's not being's about knowledge transfer and sharing. It's why I enjoyed being a lecturer for so long and why I now enjoy teaching about maps from within a different type of organisation. It's also why I wrote a daily blog last year to expose people to stuff they may not have seen before. The trouble is that it's a two-way process and does require people to want to learn and I'm afraid I find far too many who simply think they already know it all. None of us do. And the other reason why many perceive that so much hasn't been talked about before...because they don't inhabit the same places that many others inhabit so there's an immediate disconnect. I rarely see new map-makers at any of the established cartographic conferences where much of this stuff has been talked about for years (actually, NACIS in the US tends to buck this trend and to be absolutely fair to Andrew I did meet him at a NACIS conference a couple of years ago). But new map-makers tend to inhabit a world of meetups and hangouts and don't go to where cartographers have historically tended to hang out. They are unlikely to be seen at the International Cartographic Conference for instance and that has to be by choice.
Wouldn't it be so much better for us to all meet in the same sort of places where we can share knowledge, research, learn some rules and break some others? I know many of the established conferences and other meetings have been very keen to embrace new map-makers but take-up has been low. By the way...this year it's in Rio in August...maybe I'll see you there along with 1,000 or so other people who love maps?

"maps shouldn't remain difficult to create just because old school software hadn't been brought up to speed with modern technology"
Another fallacy. I recall a fantastic piece of software called Atlas Mapmaker I used (and taught with) for thematic mapping in the early 1990s. It was brilliant. I used it alongside Arc/Info and IDRISI. Each brought different things to the table. Sadly, Atlas Mapmaker is no more but other software has come and gone in its place and I think it's wrong to suggest there's a divide between old school software and other newer software. In the last 5 years more mapping software has come along too. So what do we class as 'old school'? I'm guessing this is a reference to software that's been around a fair while but look through the history books and you'll find it morphs, changes and updates regularly. It doesn't just keep pace, it sets standards. Yep, every now and again something new pops up and offers something a little different but I'd suggest that those who like to bash the so-called old-school software likely have never properly used it; they likely prefer to hate for hating's sake; and more than anything like the idea of being seen as new and exciting alongside something they proclaim as new and exciting. Let me put it like this...if you're SwissTopo there's software you use and software you don't use. Journalists might prefer a different approach. It's all fine and as I've said before I use many different pieces of software to make maps and don't feel the need to define and demarcate how I make maps by buttonology. After all, software is a tool that helps you get something done. Without knowledge of concepts, practice and a strong appreciation of the rich tapestry of cartography you're going to have to be pretty lucky to hit those buttons correctly enough to make a decent map. Maps have never been difficult to create. You just need time to figure out how to make them and what's going to help you make them. Knowing something about cartography gives you way more than the tool you use to make the map.
It empowers you to know what you're doing, how to design and, crucially, which rules you are able to bend and break more than others. That said, yes of course we should expect modern software to take advantage of modern technology and it should also push what's possible. If I still had to use PC Arc/Info 3.4D to make maps I'd suggest we've not progressed very far. Except it has progressed. So has every other piece of software too.

"goodbye old world"
Why? If we always threw away what went before we'd be in a scientific and artistic wasteland. Build on the past and embrace the future would be my preference.

"people often get self-referentialy dogmatic about some previously decided laws of cartography"
I'm guessing I'm included in this accusation. Saying that rainbow colours are poor choices on some map types or that it's important to normalize data on a choropleth is neither self-referential nor dogmatic. It's fact. It's in books. It's based on research. It's the sort of established practice that helps us make better decisions when we make maps. That's all. No-one ever called them a law. It's called best practice and the reason I bang on about it is because so many people don't get it. Sure, lots of stuff can be challenged and that's fine but let's not set out to decry the essence of well established, useful knowledge and understanding. There's a language to graphicacy that has evolved and helps guide us regardless of what our individual motivation for making a map is. And while we're on about rules and conventions here's where I get a little confused. Many new mapmakers make their maps using code. Now those who know me know my coding skills are not what they used to be and I'm impressed by people who want to code their maps and who do so. Code requires rules and conventions. If you don't follow the rules and conventions your code will likely not work or perform poorly. If everyone ignored the basic rules and conventions it'd be anarchy and you'd all be calling each other out on GitHub over rotten, unusable and unhelpful code. There's a practice; a syntax; a best practice to writing elegant and purposeful code. Of course, not all code is written equally just as not every map is made the same, but whether you want to call them laws or simply see things as best practice it helps to have guides. So I wonder why the sort of people who are happy to abide by general rules that govern coding in a particular way seem more than happy to tell us that cartographic code, rules and best practice are simply there to be broken. If that isn't double standards then what is?

"like many other fields, cartography is changing fast"
Sure is! But what exactly is this change? Conventional commentary suggests it's that technology allows us to do more, more easily. But is it easier to make maps than before? Let me go back to Atlas Mapmaker...I could make a map in minutes using that from a simple text file out of Excel. Sound familiar? And that was over 20 years ago. Sure, now we have the web as a place to make maps but I think what characterizes the change in cartography isn't the's the people. More people make maps now. It's not just cartographers. It's everybody. It's fantastic. Let's not change cartography because, frankly, I don't believe it needs changing. Let's evolve it...together. What seems to be changing faster than anything is the number of people who seem intent on redefining it for little or no reason.

"the dogma of cartography is certain to be overturned by new discoveries"
I hope not. I hope that it's challenged and it evolves as it has done for centuries but why is everyone so hell-bent on revolution? Yes, people can explore cartography but it's not because they are doing so "outside the bounds of comfort for traditional cartography". It's because they are either unaware of it or haven't bothered to engage with it before they make their map. Of course new ways of seeing, talking about and doing cartography will come about because people have a natural desire to experiment. That's not the preserve of people new to map-making either. If I made every map the same I'd be bored to tears but if I come up with a new way of seeing a theme or how I wrestle with symbology or whatever I see it as adding to the mix...not as a way to overthrow cartography altogether. There's probably not a week goes by when I don't see something a little bit new...but rarely something that makes me reconsider the very nature of cartography itself. So why do some wish to set out to seek a way of destroying cartography with the implicit belief that new has to be better?

"welcome on board the journey for the new world of cartography, your old world criticism has missed the boat"
Probably, because no-one likes criticism and I'm acutely aware that everyone hates a critic...particularly one with opinions. Worse, one with opinions based on some understanding of what they're talking about. I'll likely get heat for this blog too and be painted as just that grumpy guy who thinks all new cartography should be immediately burned at the stake. The truth couldn't be more different. I embrace new stuff but it doesn't mean everything is awesome right? It's getting harder and harder to even suggest that something doesn't work and that's a problem generally and not specific to cartography or map critique. Most people don't actually understand the role of critique anyway. They seem totally unable to separate a critique of their map from some perceived personal attack or unwarranted and unsupported criticism. If I wanted to have a pop at someone I'd probably do it Jeremy Clarkson style (I never have by the way...). If I have something to say about your map I'm talking about your map and not you. I may do it in a way that you don't care for but it's about the map, not the person and by the way, I enjoy people commenting on my maps because it makes them better. Making a playful map shouldn't absolve it from critique anyway and the intent is not to provide critique as a way to reinforce established values. It's to provide a critical eye on what the author claims or what the technique claims...or more likely what the author claims about the technique. Does it hold up for instance? Yes, such maps often capture attention (because they shout very loudly) but dig a little deeper and they often hold little more than fleeting fascination. If critique is perceived as in any way negative then it's all too often explained away as the result of the guy offering the critique being stuck in the past. Really?
There's a chilling arrogance by far too many who seem unashamedly unwilling to learn anything from people who have been there and done it (I can see eyes rolling at the mere mention of that) or to accept that critique is a valuable way of debating claims and techniques. But it's a sad state and it's spreading beyond the map and into map education. Even this week I was left totally bemused that someone (who I won't name) was openly proclaiming they had a eureka moment about something you'd likely learn in any high school geography class. And this person is paid by a mapping company to teach about geo and making maps. They self-proclaim they want to help people learn but frankly the evidence suggests they need to do some serious learning first. It's frightening. They are literally telling the world across social media that they know nothing about the very thing they profess to know everything about. And worse...there's an awful lot of people who gravitate to this type of person and mindset. Our whole basis for education about mapping and cartography is being challenged because the hacker mindset is now being extended beyond simply making maps to decrying any sort of formal education in geography, let alone any of the related mapping sciences. Is this how we want people to learn about mapping? I sincerely hope it's just a blip and sense will prevail because currently the world of new mapmakers is learning very little from other new mapmakers who know very little...and the more the new order creates this self-fulfilling agenda against a perceived 'old cartography' the less people will ever know or learn. Take a class. Take a MOOC. Attend workshops delivered by mapping societies and organisations. Search out any number of educational institutions who can point you in the right direction. DM or email me if you want any pointers from me.

"I think the Twitter maps are spot-on and achieve their objective better than you can imagine"
Twitter maps...nope, I've made them myself. I've researched and published on their design and so on and I've used an awful lot of imagination but lots of flashing lights don't make a map that tells me anything. It's not the mapping that's necessarily the problem, it's the data and it's so full of holes that even if you put 35 million tweets on a map it doesn't make up for the woeful bias, uncertainty and inability to make any sense of them. Technologically, putting them on a map is a feat of engineering, that is true, but as a useful dataset or something you can visually make any sense of they fall short. And what's the objective anyway? I only ever see vacuous, unsubstantiated claims about what this or that map reveals. They don't. They really don't. Clarifying their real objective would be a good start. Finding a better use for animated flashing maps would also be useful. Making flashy maps of millions of pieces of Twitter drivel simply gives me cognitive overload and short-term inattentional blindness and hinders change detection. Adding multiple colours for no apparent reason makes things even worse. Part of the art and science of cartography is taking lots of bits of information and wrangling them in a way that reveals something useful. Using a map as a canvas for a visual data dump doesn't work. It never has. Data art, yes...a map? Not yet. And should such maps be immune from critique on the grounds that they are allegedly new and challenge cartography? I'd suggest these are the very maps that require critique to establish their real value.

"this is all a bit too handwavy for your quantified mapping practices"
Pitching some of us as quantifiers of mapping practice and others as do-ers; and marking the former out as old-fashioned and the latter as embodying the future is quite absurd. In that taxonomy I'm firmly labelled as one of those old-school quantifiers who prefers dogma. But then only yesterday I finished up a map that fits squarely within the mantra of exploratory playfulness and which challenged my own ideas. It was made way outside what you might call standard cartographic principles but it was made with a very clear understanding of the extent to which I could bend the rules. I worked with them, not against them...and I waved my hands a hell of a lot. You can't have it both ways. Yes, some of us have a professional history in the discipline that seems to mark us out as archaic buffoons from yesteryear and yes, many new map-makers don't have that history. I don't have a problem with that because the sand box is big enough for all of why the hell is there such a pressing need to want to be different and so distinct from anything that ever went before? I'll continue waving my hands like I just don't care. Every now and again it'd be lovely to read that some new mapmaker made a new map and said you know what, I saw this technique on an old map, read about it and built off it.

"I'm ready to learn"
Yes. So am I. I learn every day. I'd like to see many new map-makers do a bit of learning too. Learn that they're perhaps not as new as they think they are and that so much of what they say makes them look, well, rather immature to say the least. Perhaps learn a little humility too. I don't care who you are but if you're in your early twenties you cannot know everything. I'm double that in years and I don't know half of what I'd like to know. There's scope to learn every day and if anything, life as an academic taught me one very important's not necessarily about knowing everything; it's about knowing what questions to ask, when to ask them and of whom to ask them.

"Get ready to fall in love with mapping all over again"
Nope. I'm already in love with mapping. Have been for more than 40 years. I've seen a lot of change and new stuff always interests and fascinates me. I've no need for people new to mapping to portray this point in the history of cartography as so fundamentally new that we all need to re-boot. If it serves to justify how you like to be seen, as different, fresh and avant-garde then go ahead. I've seen more people than I care to remember herald the new cartography and ultimately it just adds to the soup. Of course there's new stuff...there always will be in a fast-paced technologically underpinned area such as cartography. I get as excited as the next person with the ability to create incredible looking work and thinking up ways of harnessing new technology for making maps. There's some incredible stuff going on but there was last year too...and the year before that...and ten years before that...and in the early 1900s etc. Carry on being in love...don't discard your previous lover because you think you've found a shiny new one.

I've said it before and I'll say it again...lines in the sand are unhelpful. Some of us are professional cartographers, some of us are amateur map-makers. We're different but we don't have to be divided by rhetoric. Some of us have a formal cartographic education, others have backgrounds in biology, computer science and journalism. For many, the emergence of tools they've gravitated towards has allowed them to make maps they want to make. Some are good. Some are crap. That's life. Some people who purport to be cartographers also make pretty shitty maps. I have made some shitty maps too. Most of my favourite maps of all time were made by non-cartographers. And before you start mocking old-school technologies, know that there would likely never be any new technologies without their existence. Plenty of people use them perfectly well so let's not start fighting because you think I use a crayon and you use a Macbook Air.

I disagree with much of what Andrew said in his blog and I hope he appreciates my right of reply and takes it in the spirit of conversation. As for his maps...shit they're good and he's a talented map-maker. We need more of his sort. Some people like my work too...and it's conceivable the same people might like both our work. Don't shout that too loud though eh?

Update: I'd recommend this excellent blog by Taylor Shelton who picked up the threads in this and Andrew's blog. He makes some very reasonable and pertinent comments about the value and purpose of critique. He also suggests it's not really about getting people to love maps but, rather, that we should get people to take maps more seriously. I agree. I also feel that this idea of rejecting established cartography in favour of playful exploration is really just an excuse for not applying thought or intent to encode, reveal and communicate meaning. James Cheshire also offered some thoughts on his blog and I particularly like his last comment that "[cartography] doesn't need to take the short term view and compromise its standards to remain relevant".

Tuesday, 17 March 2015

Drug crazed mapping

I had told myself I wasn't going to bite when @Amazing_maps screamed once more for my attention. But the more I tried to ignore, the more it reeled me in so eventually I thought it worth a few comments.

Here's the so-called amazing map:

I've no idea who made it. It doesn't really matter. What I feel matters is the impact maps like this have on those that view it. This is more about the consumption of maps but, of course, their design and construction goes a long way to underpinning the message people take away.

Quick look and take away: Holy drug barons Bernardino is full of crack-heads! So are a few smaller areas I don't even know...but they're really small so they can't be as important eh? Right, must be time for Alaska State Troopers, turn on the TV...

That's how a lot of people will look at this map. Message delivered. Warped view of reality perpetuated. Job done. Wait for the next Amazing Map.

Here's the longer look and take aways I formulated...

Hmm. Something's not quite right with this map. Let's talk it through. It's a choropleth. We can assume from the title...well, the line that doubles as the legend title, what the subject matter is. It's about the labs, not the population so it's about production, not consumption. And the colour scheme goes from light to dark so we see where there are more meth labs and where there are fewer. I'll not repeat myself like a cracked record about it being totals (but it is) and not normalised (but it isn't) suffice to say it needs to have the data transformed into per capita or something equally sensible to allow us to compare like for like. Though critical for a choropleth, let's ignore that for the purposes of this because there's other 'take aways' in this map.

Look at San Bernardino County again...jeesz, it's heaving with meth labs.

This makes me a little more interested (perhaps concerned) as it's where I live. Notwithstanding it's totals, look at that large, expansive area filled with loads of meth labs. How many?...there's about...errr, well, let me look at the legend. Hmm. It's dark blue. Does that make it 300, 500, 1000 meth labs?

It's impossible to tell without doing some assessment of the actual RGB values. It's actually closest to the RGB value about 1/3 of the way along the legend colour ramp which would make it about 330ish...though there are no RGB values in the legend that match those found in San Bernardino County so it's impossible to be certain. And why am I having to do an RGB analysis of a legend anyway? It shouts out from the map yet is nearer the lower end of the legend. That doesn't seem right.
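Incidentally, that sort of reverse-engineering amounts to finding the nearest point along a linear colour ramp. A brute-force sketch — the ramp endpoints and the sampled county colour below are invented, not taken from the actual map:

```python
def nearest_ramp_position(sample, start, end, steps=100):
    """Brute-force the fraction t in [0, 1] along a linear RGB ramp
    whose interpolated colour is closest (in squared RGB distance)
    to the sampled colour."""
    best_t, best_d = 0.0, float("inf")
    for i in range(steps + 1):
        t = i / steps
        colour = [s + t * (e - s) for s, e in zip(start, end)]
        d = sum((c - x) ** 2 for c, x in zip(colour, sample))
        if d < best_d:
            best_t, best_d = t, d
    return best_t

# Invented light-to-dark blue endpoints and a sampled county colour:
light, dark = (222, 235, 247), (8, 48, 107)
t = nearest_ramp_position((150, 170, 200), light, dark)
print(f"sampled colour sits about {t:.0%} of the way along the ramp")
```

A legend that needs this kind of forensics has, of course, already failed.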

So San Bernardino leaps out because 1. it's the largest county in the US 2. It has a lot of meth labs (though possibly not per capita or in relation to counties with many more) and 3. It's dark blue and that means 'more' except there's virtually no differentiation between the blue used at 330 and that for 1000. All the variation in colour value is at the lower end.

The map uses an unclassified choropleth approach. That means every data value is given its own position along the chosen colour ramp. I'm not a huge fan of unclassified choropleths. Choropleths are generally used to show where places are similar and that relies on classifying your data into groups that display similar characteristics. All you can really see from an unclassed choropleth is the extremities...which areas tend to the maximum and which to the minimum. It's really difficult to assess where those in-between values might sit...and that's assuming the scale is linear and the colour scheme is applied linearly. Of course, you can stretch colour to be applied non-linearly but then it's an even more confusing picture that's arguably more difficult to interpret visually. If you don't classify data before mapping it then you're painting by numbers and it's a bypass to considering your data and teasing out the message through careful classification and symbolisation.

I'm going to add a caveat here - if the map is for interactive web display and the user can hover or click an area to retrieve the value directly, then unclassed choropleths are, arguably, less problematic because people can retrieve values across the map. I'd still contend, however, that if we know the map is classified into, say, 5 classes using natural breaks then every county symbolized in the same shade of blue is 'similar'. It's an important metric we can easily see in the map and it's a good default. Other classification schemes exist to suit alternative purposes. If we use, say, a quantile scheme of 5 classes then we know each class shows 20% of the data values in rank order - again, similarity between values, across the entire range of values, can be easily seen and it's simple to see which areas are in the top 20% of values. If you make two choropleths then using something like a quantile scheme allows you to compare the two maps on a comparable cognitive basis. Clicking to retrieve a value is an additional step in the map reading process. Trying to remember values from one hovered-over area to another is equally taxing because our short-term recall is not our best cognitive function (think of memorizing and recalling a pack of cards in's not easy!). I like maps to 'show and tell' rather than require further processing or actions by the user to reveal the message.
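A quantile scheme is simple enough to compute by hand. This sketch uses one common way of placing the break points (implementations vary at the boundaries), with invented county counts:

```python
def quantile_classes(values, n_classes=5):
    """Assign each value a class 0..n_classes-1 so that each class
    holds (roughly) an equal share of the data in rank order."""
    ranked = sorted(values)
    # Break points: the values at each 1/n fraction of the sorted list.
    breaks = [ranked[int(len(ranked) * k / n_classes)]
              for k in range(1, n_classes)]

    def classify(v):
        for i, b in enumerate(breaks):
            if v < b:
                return i
        return n_classes - 1

    return [classify(v) for v in values]

# Made-up county counts, including one extreme value:
counts = [2, 3, 5, 8, 9, 12, 15, 20, 40, 1000]
print(quantile_classes(counts))
```

Notice the outlier of 1000 simply shares the top class with 40 — the extreme value no longer dictates how every other county is symbolized, which is exactly the contrast with the unclassed approach.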

Onto the colours. Because there are just so many different shades of blue across the map we get a sense of some overall pattern but we can't really tell which are similar to which. How similar is San Bernardino County's colour compared to the other dark blues across the other side of the map? It's called simultaneous contrast and is a problem for our map reading. Our perception of a colour (or shades of a colour) varies as we look across the map due to the colours that surround it. Look at the following two grey squares and how they are affected by the surrounding shades:

The grey square differs in perception depending on whether it's surrounded by dark or light.  A darker surround makes us see it lighter than if it has a lighter surround. Now look at how different colours modify the grey square:

The grey squares, despite being the same, take on a perceived tinge of colour based on what's around it. And when the image gets even more complex we have even more difficulty processing what we see. In the following animation, which grey square, A or B, is darker?


Of course, the greys in A and B are the same. In the above diagrams all the grey squares are seen differently simply because of their surroundings. The map of meth labs has over 3,000 counties, each shade of blue being surrounded by its own different mix of blues.

These perceptual issues are also a problem in classed choropleths of course - but not nearly to the same degree because it's much easier to distinguish and differentiate 5 or 6 shades of blue across a map than it is to try and make sense of several hundred (thousands?) different shades of blue.

And what about labels? Yes we can probably all recognise it's the U.S. I know where my home is so I recognise San Bernardino County. I've no real way of describing where other patterns exist in language that makes sense. Giving people context is important. Interactive maps support this through basemap labels or, again, hover and click...but however you deliver the map, give people a way to reference the patterns they see.

So the take-aways for me...
  • It's totals. If you can't or won't change to a rate or ratio then use something other than choropleth like a dot density, proportional symbol, dasymetric or cartogram.
  • If you have to use unclassed choropleths then scale your data across the range of colour so that extremities don't dictate the way values map onto the colours. Make the legend more useful by providing labels at key positions and make your map interactive so people can retrieve values.
  • Go with a classed choropleth if you want people to 'see' more than just the extremities in your data and how different areas are similar to others across the full range of values. Learn which classification techniques are going to manage your data most appropriately for the message you want to share.
  • Be aware of the problems of simultaneous contrast.
  • Include some form of labelling to give people a way of referencing the geographical patterns they see.
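On the second bullet — scaling data so extremities don't dictate the mapping — a percentile stretch is one simple approach. A sketch using nearest-rank percentiles and invented values:

```python
def percentile_stretch(values, low=10, high=90):
    """Map values onto [0, 1] for an unclassed colour ramp, clamping
    beyond the low/high percentiles (nearest-rank) so a handful of
    extreme values don't flatten the contrast everywhere else."""
    ranked = sorted(values)
    last = len(ranked) - 1
    lo = ranked[int(last * low / 100)]
    hi = ranked[int(last * high / 100)]
    return [min(max((v - lo) / (hi - lo), 0.0), 1.0) for v in values]

# With one huge outlier, a plain min-max scale would push everything
# else into the palest shades; the stretch keeps mid-range contrast.
print(percentile_stretch([1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]))
```

The outlier saturates at 1.0 along with the 90th-percentile value, and the rest of the data spreads across the whole ramp instead of huddling at the pale end.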

Other real title, no source, no credits, no dates, no contact details. Nothing. Like I said, I don't know where the map came from but as is, it's a fail in every respect.

Finally, I tried to get the data to recreate this as a per capita but after a quick search I wasn't able to find it at county level. Instead I came across this abomination on the Drug Enforcement Administration web site:

I don't even know where to start with this one, and they've made one per year for the last few years. They were clearly on something or other. And if we assume the DEA reporting is accurate (and the most current) AND that the Amazing_Maps one is broadly of the same time period (OK, a lot of assumptions) then what's with San Bernardino having over 300 meth labs given California as a whole has only 79?

Clearly something's wrong somewhere. Amazing map? Possibly. It's just poorly designed and constructed and gives a totally misleading impression of a dataset that cannot be verified. It's another potentially mildly interesting dataset that's poorly mapped.

And by the way, San Bernardino County is the 5th most populous county in the US so per capita...we may even have a paucity of meth labs so a different map might support the assertion we need more to get our supply increased*. Additionally, while the overall area of the county is about 20,000 sq miles, the populated areas are predominantly crammed into the south west corner in an area roughly 450 sq miles...which makes a choropleth map of totals covering mostly desert even less useful (unless the meth labs are in the desert). And all those less important smaller areas...Seattle, St Louis, Tulsa and Grand Rapids. But because of the way the boundaries lie, choropleths are always going to cause difficulties in interpretation. That's the Modifiable Areal Unit Problem...and a whole different blog entry.

* this is a joke

Tuesday, 24 February 2015

Messy heat map

My last blog post on heat maps was an attempt to persuade map-makers that the term actually means something other than what you might think it means...and that doing cluster analysis of some form or other on your data more than likely requires a better understanding of data and technique than a so-called heat map generator provides.

My Twitter feed lit up today as Manchester City played Barcelona in the Champions League Round of 16. Lionel Messi, the Barcelona forward, had a fine game by all accounts (to be fair he pretty much always does) and Squawka were on the ball with their live analysis during the game.

Squawka provide a web-based view of sport that collates and presents data as it happens. They tweeted the following:

Needless to say it brought on a nervous carto-twitch. If you read the previous blog you'll know by now that whatever the above is, it's not a 'heat map'. It's a density map of some form of cluster analysis but it illustrates far more than just another example of an inappropriately named map.

Here are some of the issues I see with this map and how similar issues are seen in almost all of these sorts of maps. They may help to understand that it isn't, in fact, a map of Messi leaking all over the pitch.

What is the data that was used?

Messi presumably ran about the pitch yet the splodges look like they are based on point data. Is this where he had the ball? Where he received the ball? Where he passed the ball to another player? Where he was stationary for a period or simply where he stood watching as Suarez scored the goals? Etc etc. While logic suggests that a map of Messi's running should be linear we are immediately confused in trying to decipher what this data actually represents because it looks like points that have been analysed to create a representation of clusters (more points = larger or more intense splodge). Without knowing what data the map represents we cannot decide whether he was all over the final third or not. If the data is indeed points then is that an appropriate metric and can it justifiably be used to show what they purport the map to be showing?

What are the fuzzy splodges?

The typical symbology on these sorts of maps tends to run through some form of spectral colour scheme and this is no different. The hazy blue splodges are likely where less clustering occurred, but is this a fleeting movement or pass, or where he tripped over his laces? As Messi moves more or passes more (or whatever more), the intensity of the symbology increases. But what precisely does this represent? We have no legend to tell us what changes in colour mean and whether colour is mapped onto the clustering values linearly or logarithmically or...
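That linear-versus-logarithmic question matters more than it might seem. Here's a minimal Python sketch (my own illustration, nothing to do with whatever Squawka actually does) showing how the same data value can land in a completely different colour class depending on how the ramp is applied:

```python
import math

def to_colour_index(value, vmin, vmax, n_classes=5, scale="linear"):
    """Map a data value to a colour class index (0..n_classes-1).

    'scale' controls how values are spread across the classes:
    linear spacing versus logarithmic spacing. With a log scale,
    small differences among low values get their own classes while
    large values are compressed together.
    """
    if scale == "log":
        # shift so the minimum maps to zero before taking logs
        t = math.log(value - vmin + 1) / math.log(vmax - vmin + 1)
    else:
        t = (value - vmin) / (vmax - vmin)
    # clamp and convert the 0..1 position to a discrete class
    return min(int(t * n_classes), n_classes - 1)

# The same value lands in different classes under each scale.
print(to_colour_index(10, 0, 1000))               # → 0 (lightest class)
print(to_colour_index(10, 0, 1000, scale="log"))  # → 1 (a class brighter)
```

Without a legend, a map reader has no way of knowing which of these two very different encodings they're looking at.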

Indeed - if you look at the overlap of two hazy blue splodges near the bottom centre of the map you'll notice that a simple overlap at the edge of two hazy blue splodges results in a bright, intense change in symbol. But if these hazy blue splodges are built from point data (presumably at the centre of the hazy blue splodge) then the overlap is simply an artifact of overlapping symbology...not necessarily overlapping data. These artifact overlaps occur everywhere on the map so it's unclear what the relationship is between data and symbology and how that then translates to Messi's actual movement or involvement.

The statement of being all over the final third also doesn't exactly stack up either. The main splodges are in a zone towards the top of the pitch graphic...a little left of centre but certainly not all in the final third. We'll assume Barcelona are attacking the left half of the pitch graphic and that even though teams switch sides at half-time the graphic maintains teams in the same half for mapping purposes.

All in all it's a graphic that reveals very little except gross error and uncertainty and which is utterly impossible to interpret in a way that reveals anything sensible about Messi's contribution to the game. 

These sorts of back-of-an-envelope 'heat maps' are unhelpful for any visual or analytic task. Quick to produce, yes, but you can't make any sensible or quantifiable interpretation. Finally, we have no one else's maps to look at so we simply have to presume that every other player's heat map is in some way visually inferior to Messi's.

Messy data. Messy clustering. Messy symbology. Messy map. Messy communication and very messy ability to interpret, compare or understand. Poor old Lionel Messi who is, quite literally, an innocent bystander in all of this...which is, sort of, what the map shows.

Tuesday, 17 February 2015

When is a heat map not a heat map?

Actually, the answer to the question in the blog post title is most of the time...

The term 'heat map' has gatecrashed the cartographic lexicon. It has seemingly replaced other, more established, more accurate and perfectly good terms. It's used as a catch-all for any map that portrays the density of point-based pieces of information as a surface. Here, I try and explain why I find it unhelpful.

Heat maps have become a popular way to label a surface representation of data that occurs at discrete points. On the one hand, the search for a better way of showing point-based data which avoids death by push-pin is a sound cartographic approach. Imagine simply looking at a map of points and trying to make sense of the patterns. Chief Clarence 'Clancy' Wiggum would certainly struggle to make sense of the pattern of crime in Springfield just from coloured dots.

It's difficult to process patterns of dots (other than more here, less there) and even harder when you're looking at thousands of dots that overlap (death by pushpin) so...let's make a heat map!

In analytical terms there's a number of ways one might approach the problem. One way is to bin your data into regularly shaped containers like hexagons, effectively a spatial summary of the point data. Another way is to interpolate lines of equal value across the map to create a surface, which then helps us to see areas that display similarly high or low values. Of course it's important to remember that any interpolated surface is effectively inventing data values for the areas on the map for which you don't have data or sample points. It's therefore important to think about whether you want data invented for the areas between your sample points when you know damn well nothing exists there.
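The binning idea is simple enough to sketch. Here's a hypothetical pure-Python hexagonal binner, using the standard axial-coordinate conversion and cube rounding; the function names are my own illustration, not any particular GIS tool's API:

```python
import math
from collections import Counter

def hex_bin(points, size):
    """Assign each (x, y) point to a pointy-top hexagon of the given
    size (centre-to-corner radius) and count points per hexagon.
    Returns a Counter keyed by axial (q, r) hex coordinates."""
    counts = Counter()
    for x, y in points:
        # convert cartesian to fractional axial coordinates
        q = (math.sqrt(3) / 3 * x - y / 3) / size
        r = (2 / 3 * y) / size
        counts[_axial_round(q, r)] += 1
    return counts

def _axial_round(q, r):
    # round in cube coordinates (q + r + s == 0) and recompute the
    # component with the largest rounding error from the other two
    s = -q - r
    rq, rr = round(q), round(r)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(round(s) - s)
    if dq > dr and dq > ds:
        rq = -rr - round(s)
    elif dr > ds:
        rr = -rq - round(s)
    return (rq, rr)

# Two points near the origin share a bin; a distant one gets its own.
print(hex_bin([(0, 0), (0.1, 0.1), (5, 5)], 1.0))
```

Each hexagon count can then be shaded like a choropleth class, and no data is invented between the points: empty hexagons simply stay empty.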

For instance, make a map of temperature and you'll likely use sample points. It's perfectly reasonable to infer that temperature exists everywhere as a continuous surface so filling in the voids where you have no data is fine. If, on the other hand, you have accident data for road intersections and you interpolate a surface it makes much less sense. Intersections do not exist across space so filling in the voids with made up data values is not really appropriate.
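For the interpolation side, the simplest scheme is inverse distance weighting: an estimate at any location is the distance-weighted average of nearby samples. This is purely an illustrative sketch of the idea (not a recommendation of IDW over kriging or splines), using made-up station names and values:

```python
def idw(samples, x, y, power=2):
    """Inverse-distance-weighted estimate at (x, y) from samples
    given as (sx, sy, value) triples. Nearby samples dominate the
    estimate; the 'power' controls how quickly influence decays."""
    num = den = 0.0
    for sx, sy, value in samples:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        if d2 == 0:
            return value  # exactly on a sample point
        w = 1.0 / d2 ** (power / 2)
        num += w * value
        den += w
    return num / den

# Temperature readings at three hypothetical stations; estimating
# midway between the first two gives a value pulled between them.
stations = [(0, 0, 10.0), (10, 0, 20.0), (0, 10, 15.0)]
print(idw(stations, 5, 0))  # → 15.0
```

This is fine for temperature, which genuinely exists everywhere. Run the same function over accident counts at intersections and it will happily report a value for the middle of a field, which is exactly the problem.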

Let's assume Chief Clancy is making a surface based on some sensible logic. These interpolation methods collectively result in an isarithmic map. That is, the planimetric mapping of a real or interpolated three-dimensional surface. So Chief Clancy might see the pattern of crime in Springfield a little more like this (and before someone suggests the interpolation is a little off...yes, it's just for illustrative purposes. I'm a cartonerd but seriously...):

These sorts of isarithmic maps are used everywhere, from displaying temperature (isotherm) to atmospheric pressure (isobar), and from height (isohypse) to population distribution (isopleth). They're also commonly, and erroneously, referred to as heat maps.

Technically speaking, the map of crime activity points that Chief Clancy is looking at might be termed isometric data since it shows locations of discrete events that do not necessarily exhaust space. The fact that the dots are coloured actually suggests more than one incident. Rather than a simple interpolation of discrete points, he might instead do a kernel density analysis that uses the values at each point as a weighting factor to end up with a map like this:

His analysis needs to very carefully decide the shape and size of the kernel (search area) used to compare nearest neighbours. A small kernel will create a map that looks much like the original...just discrete points on the map but displayed as splodges. Choosing a kernel that is too large will create an over-smoothed, highly generalised map.
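To see what the kernel size actually does, here's a minimal Gaussian kernel density sketch in plain Python: a toy version of what any GIS package computes for you, under the assumption of a simple unweighted Gaussian kernel:

```python
import math

def kernel_density(points, grid, bandwidth):
    """Gaussian kernel density estimate evaluated at each grid cell.
    'points' are (x, y) events; 'grid' is a list of (gx, gy) centres.
    A small bandwidth reproduces the input points as isolated spikes;
    a large one smooths everything into one broad blob."""
    norm = 1.0 / (2 * math.pi * bandwidth ** 2 * max(len(points), 1))
    surface = []
    for gx, gy in grid:
        total = 0.0
        for px, py in points:
            d2 = (gx - px) ** 2 + (gy - py) ** 2
            total += math.exp(-d2 / (2 * bandwidth ** 2))
        surface.append(norm * total)
    return surface

# Two events and two evaluation points: on an event and midway between.
pts = [(0, 0), (10, 10)]
narrow = kernel_density(pts, [(0, 0), (5, 5)], bandwidth=0.5)
wide = kernel_density(pts, [(0, 0), (5, 5)], bandwidth=10.0)
```

With the narrow bandwidth the midway cell is essentially zero (just discrete splodges); with the wide one the two events smear into each other and the midway cell is as dense as the events themselves. Same data, wildly different map: that's the slider.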

A more advanced version of kernel density analysis might be used to calculate a K function, which constructs zones around events as a way of summing and weighting the values that end up on the interpolated surface. It goes beyond simply looking at nearest neighbours and can help map patterns across a wider area. These maps are also commonly, and erroneously, referred to as heat maps.
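A naive estimate of Ripley's K function is compact enough to show; this illustrative sketch deliberately omits the edge corrections any serious implementation needs:

```python
def ripley_k(points, distance, area):
    """Naive Ripley's K estimate: the average number of other events
    within 'distance' of each event, scaled by the study area and
    event density. Values well above pi * distance**2 suggest
    clustering at that scale; values below suggest dispersion.
    (Edge effects are ignored in this sketch.)"""
    n = len(points)
    pairs_within = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and (xi - xj) ** 2 + (yi - yj) ** 2 <= distance ** 2:
                pairs_within += 1
    return area * pairs_within / (n * n)

# Four tightly clustered events in a 10x10 study area: K at distance 1
# is far above the ~3.14 expected under complete spatial randomness.
cluster = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1)]
print(ripley_k(cluster, distance=1.0, area=100.0))  # → 75.0
```

The point of running numbers like these before symbolising anything is that "is there clustering at all, and at what scale?" is a question the analysis answers, not the colour ramp.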

Hopefully Clancy knows what he is doing and not simply using a slider in a haphazard manner to achieve a map he likes the look of. There's nothing fundamentally wrong with sliders but it's also useful to know what the slider is doing to make what you're seeing.

If Chief Clancy is feeling particularly brave he may even fire up his favourite geo-analytical powerhouse and calculate the Getis-Ord Gi* statistic for the variable in question. The resulting p-values are mapped to show where statistically significant high or low clusters occur spatially.

This is often termed a hot spot map (again, sometimes referred to as a heat map), which typically uses red to show 'hot' areas (a lot of something) and blue to show 'cold' areas (much less of the thing in question). It doesn't show hot places and cold places and, frankly, if you don't use the right data inputs and know what it's mapping it can distort your view of reality beyond comprehension. It's a map of statistically significant clusters of data based on a multitude of decisions taken in setting the parameters. It's a complex but perfectly good analytical technique, though now that we're beginning to see the introduction of terms that reflect 'heat' and, often, colours that connote temperature, it forms a basis for misinterpretation.
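For the curious, the z-score form of the Gi* statistic is compact enough to sketch in plain Python. This follows the published formula; it's an illustration only, and in practice you'd lean on a tool that builds the spatial weights matrix for you:

```python
import math

def getis_ord_gi_star(values, weights, i):
    """z-score form of the Getis-Ord Gi* statistic for region i.
    'weights[i][j]' is the spatial weight between regions i and j
    (including j == i, which is what makes it the 'star' variant).
    Large positive scores indicate a hot spot (a cluster of high
    values); large negative scores a cold spot."""
    n = len(values)
    xbar = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - xbar ** 2)
    w = weights[i]
    sw = sum(w)
    sw2 = sum(wij * wij for wij in w)
    num = sum(wij * xj for wij, xj in zip(w, values)) - xbar * sw
    den = s * math.sqrt((n * sw2 - sw ** 2) / (n - 1))
    return num / den

# A block of high values next to a block of low values: region 0 sits
# in the highs (positive z-score), region 3 in the lows (negative).
values = [10, 10, 10, 1, 1, 1]
w = [[1, 1, 1, 0, 0, 0]] * 3 + [[0, 0, 0, 1, 1, 1]] * 3
print(round(getis_ord_gi_star(values, w, 0), 2))  # → 2.24
print(round(getis_ord_gi_star(values, w, 3), 2))  # → -2.24
```

Notice how much of the result is baked into the weights matrix: change the definition of "neighbour" and the hot spots move. That's the "multitude of decisions" in action.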

These techniques form a valuable collection of related methods that create interpolated surfaces from discrete data points. They create isarithmic maps; predominantly isopleths because they map distributions of populations of some variable or another. Referring to them as a heat map is wrong because a heat map is something else entirely. Badging them as isarithmic maps is fine but it's important to recognise that they have very different data demands, functionality, complexity and potential and knowing these differences helps you understand and interpret the maps they create.

Just for the cartonerds, let's be clear: the isopleth is a form of isarithmic map and shows change in a quantifiable variable over space. Isopleths differ from choropleth maps in that the data is not grouped into pre-defined regions (e.g. countries, census areas). They also work particularly well for data that exists continuously and which doesn't necessarily change at an abrupt point or a pre-defined boundary. In this sense, any surface of population-based data that can be interpolated from point data is an isopleth. That's ISOPLETH, from Ancient Greek ἴσος (ísos, equal) + πλῆθος (plêthos, a great number)...or, more easily understood, a map that shows equal numbers. The equal numbers are demarcated by isolines of equal value which divide areas that display similar characteristics.
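And for completeness, the nuts and bolts of drawing those isolines comes down to one tiny calculation: linearly interpolating where a contour level crosses the edge between two grid nodes. This is the core step of contouring algorithms such as marching squares, shown here as a standalone sketch:

```python
def isoline_crossing(value_a, value_b, level):
    """Where an isoline of the given level crosses the edge between
    two grid nodes with values value_a and value_b, expressed as a
    fraction 0..1 along the edge from a to b, or None if the isoline
    doesn't cross this edge."""
    lo, hi = min(value_a, value_b), max(value_a, value_b)
    if not (lo <= level <= hi) or value_a == value_b:
        return None
    return (level - value_a) / (value_b - value_a)

# A 15-degree isotherm crosses three quarters of the way along the
# edge from a 12-degree node to a 16-degree node.
print(isoline_crossing(12.0, 16.0, 15.0))  # → 0.75
```

Chain those crossing points together cell by cell and you have your isolines; shade between them and you have an isarithmic map, no 'heat' required.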

So I've hopefully established that isarithmic maps are not heat maps but if we turn our attention to the use of descriptors and colours in such maps it helps to understand the misrepresentation a little more.

In all of the example maps Chief Clancy looked at earlier the colour schemes suited the analytical technique. The kernel density maps used single hue colour schemes which went from light to dark. The Getis-Ord Gi* map used a diverging scheme to show clusters of positive p-values (red) and clusters of negative p-values (blue)...though there's no particular reason why red to blue should be used other than to reinforce the use of terms such as 'hot' and 'cold'.

In all of these maps you can clearly see the areas that represent higher data values and the areas that represent lower data values. It's easy for us because darker shades are interpreted as 'more'. We describe patterns using terms such as 'more' or 'less' rather than 'hot' or 'cold' because that helps us understand the data using language that refers to quantities. With diverging colour schemes we naturally interpret the data as diverging away from a central value. Again, 'more' and 'less' are useful, as is 'statistically significant cluster' if we're referring to the result of the Getis-Ord Gi* statistic. Using heat-related terms adds unnecessary confusion which is further compounded by the colour schemes often used for heat mapping.

The idea of showing heat using a rainbow colour scheme is likely based on the sort of colours you get when you 'see' something through an infra-red camera like so:

This technique is specifically designed to measure heat and display hot and cold areas so the use of reds and blues makes sense as it matches the data and the phenomena being mapped. It matches our cognitive ability to process what the colour actually represents. Greens in the middle are still odd though because it doesn't really suggest a mid-heat value. Sometimes these colour schemes avoid green altogether.

While this colour scheme arguably works when we're talking of temperature, when the 'mapped' phenomenon is actually hotter or colder, transposing it onto any mapped data variable causes problems in our ability to cognitively process patterns, like so:

Here, Chief Clancy is looking at a beautiful rainbow. Blues for colder areas, reds for warmer and yellow and white for, well, white hot areas. The middle is glowing white hot with progressively 'cooler' colours radiating out in a beautifully smoothed gradient. Springfield is on fire and the interpolated values appear to show a lovely linear gradient reflected by the colours. Does the data Clancy is working with vary spatially according to a linear distribution away from source locations? Linear rarely happens in reality so such maps can present a distorted view of what's going on with the underlying data.

More likely, such a map presents the results of an over-generalised cluster algorithm combined with the application of a linear spectral colour ramp. Given the point of this blog post, it's not a heat map either...just a version of some form of density analysis that results in a surface doing away with isolines altogether, and it compounds the problems of interpretation by using an unhelpful colour scheme. It's averaged out the analysis and complicated the symbolisation, which doesn't help the user task at hand. There's even some transparency thrown in for good measure to see a little of the basemap, which also adds to the visual clutter. Of course, defaults can always be changed so let's not get too bogged down in colour choices, though using spectral schemes to show quantitative data is never a good idea.

Clancy needs help, but unfortunately the routine use of red and blue diverging colour schemes and spectral colour schemes on such maps has become so ingrained in popular use that he doesn't know it's not helping him understand the map particularly well. Why should more of something be 'hotter' and less of something be blue and 'colder'? We're talking about data here...not the weather, though given the dreadful use of colour schemes on weather maps it's unsurprising we've seen them re-purposed for the sort of maps people call heat maps.

Of course, if you move your heat map slider in a different way to define a different implementation of the cluster mapping method and choose different colours you may end up with this:

Or even this...

Different sized splodges with different colours that tickle your fancy. And all from the same data. What a cornucopia of colour that means, well...very little. And once you've made your data look like something and added some random transparency...you can even animate it. Doh!

The key to understanding the utility of this type of map is having a good understanding of your data in the first place and choosing an appropriate technique and colour scheme to go with your analysis. While making these types of map simple to produce is helpful to many, it's also really important to support map-makers in making better maps rather than allowing them to fall into the trap of making basic mapping errors.

So, back to the main point... if none of the above are heat maps...and the misnomer has been reinforced through the use of poorly defined colour ramps...what exactly is a heat map? Does such a thing actually exist?

Yes...but strictly speaking it's not really a map; it's a visual representation of a data matrix.

The heat map has been around in statistical analysis for a good while. It defines a graphical approach for coding a matrix of data values into a graphical representation using colours. It's designed to reveal the hierarchy of row and column structure: rows in the matrix are ordered so that similar rows are near each other, and you see cluster trees on the axes. A heat map looks like this:
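The defining trick of a real heat map is that row ordering. It can be sketched with a greedy stand-in for the hierarchical clustering real statistical tools use (a simplification, but it shows the idea):

```python
def order_rows(matrix):
    """Greedy row ordering for a statistical heat map: start from the
    first row and repeatedly append the most similar remaining row
    (by squared Euclidean distance), so similar rows end up adjacent.
    Real heat maps use hierarchical clustering; this is a toy version
    of the same idea."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    remaining = list(range(len(matrix)))
    order = [remaining.pop(0)]
    while remaining:
        last = matrix[order[-1]]
        nxt = min(remaining, key=lambda r: dist(matrix[r], last))
        remaining.remove(nxt)
        order.append(nxt)
    return order

# Rows 0 and 2 are alike, as are rows 1 and 3; the ordering pulls the
# similar rows together before the colours are even applied.
m = [[1, 1, 1], [9, 9, 9], [1, 2, 1], [8, 9, 8]]
print(order_rows(m))  # → [0, 2, 3, 1]
```

Note there's no geography anywhere in this: the 'map' is the matrix itself, which is exactly why borrowing the term for isopleth maps muddies the water.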

The closest spatial representation of data that might reasonably have similarities to a real heat map is a tree map...and that's a cartogram which has destroyed geography for the sake of creating an ordered matrix. At a stretch I understand why raster surfaces used to represent geography might be mistaken as heat maps because they are formed from a rectangular grid of pixels. But as I've set out here, I'd contest that the use of the mapping term isopleth already differentiates it from other map types and that we don't need to borrow a term from graph theory to simply replace one that already suits the technique. Proper heat maps go further than the mis-named cartographic versions anyway because the matrix is designed to illustrate correlations between variables through linked lines and other axis annotation.

Want to read more about real heat maps? Check out this rather good paper by Leland Wilkinson and Michael Friendly who explored the history of the term and of heat maps in 2008 (and from where the above illustration was sourced). It's prior art for heat maps in their true sense.

Tempted to make a heat map? I'd suggest doing a density analysis on your point data, experiment with different bandwidths (and understand what it is you're trying to map) and symbolize the resulting surface as an isopleth map using anything other than a rainbow or spectral colour scheme. If you're using a slider and default colours...think about what the slider is doing. Use it sensibly and try and understand how it's interpolating your data and, if you can...change the bloody colours from a rainbow palette! If you can't, then maybe try a different map type altogether.

More generally, I find that the introduction of replacement terminology where perfectly good terms already exist creates a further division between what we might call professional cartography and the wider world of map-making. I'd prefer to see those worlds come together under the same umbrella and for a better understanding to emerge through the use of a standard and accurate nomenclature. We don't need to dumb down mapping, or how we talk about it, to encourage better mapping.

PS - with respect and apologies to The Simpsons, whose image I have used and abused in this post.