Editorial: Big Data, Cities and Herodotus

About this issue

Issue number

Volume 42 – Number 3

200 pages

Summary

Big data is everywhere, largely generated by automated systems operating in real time that potentially tell us how cities are performing and changing. A product of the smart city, it is providing us with novel data sets that suggest ways in which we might plan better and design more sustainable environments.


There is a wonderful quote from The Histories by Herodotus where he says: ‘I will tell the story as I go along of small cities no less than of great. Most of those that were great once are small today; and those that in my own lifetime have grown to greatness, were small enough in the old days’.¹ You could say the same about big data. We have always had big data – data so voluminous that they tax our ability to represent them, digitally or otherwise – but as our technologies have improved, the data we had in the past now seem small compared with what we can handle today. Digital computers were invented during the war years, simultaneously in the USA, UK and Germany, and by 1945 a handful of big but cumbersome digital devices existed in scientific settings. These were used for the intensive numerical processing that we associate with weaponry and defence. Soon afterwards, the process of their miniaturization began with the invention of the transistor at Bell Labs. By the mid-1960s, the trend that came to be called Moore’s law was under way: exponential increases in speed and memory with correspondingly ever-decreasing costs of production. It has continued to the present day, when we hold in our hands computing devices with the power of yesterday’s supercomputers, and it appears to be continuing inexorably. Herodotus would have approved.

Data are following the same course, with devices embedded in the environment that control processes and generate massive amounts of data in real time. Until a decade ago, most data about cities were taken from one-off surveys conducted at cross-sections in time, often based on samples of the population. For physical issues, maps and photographs dominated our visual media. All this was changed by the move to digital information, and by the 1980s most traditional media were being replaced by computer-drawn maps, remotely sensed images from satellites and the like. One-off surveys, too, were being digitized, and in the last 20 years these have been augmented by digital entry and display, with geographic information systems and computer-aided design key examples of advances in such software. As miniaturization has continued, PCs have evolved into hand-held devices, first mobile phones and then smart phones and tablets, while small-scale sensors are now being embedded passively in the environment. All these devices generate data that are available in real time and often at the individual level. We now have passive sensors, which are physically embedded in people and places, and active mobile sensors, which are operated by ourselves, typically through smart phones. This is giving rise to continuing streams of data about the environment, which will be generated as long as the relevant sensors are in operation. This is ‘big data’ in that it is voluminous, often with no known limit in time; it is generated continuously in real time at considerable velocity; and it is of great variety in that it comes from many diverse sources.

There is a good deal of hyperbole surrounding the production of big data, with its enthusiasts proclaiming that such data provide entirely new insights into the way our cities work. The implication is that a new understanding will come from their analysis. Combined with the notion that sensors in our cities are giving us new ways of automating urban functions, the idea of the ‘smart’ city is now writ large in our thinking about the way computers and computation might enable us to produce more liveable and sustainable environments. There is a new optimism in the air about what big data and smart cities might do for urban planning and design, which is part and parcel of society’s newfound interest in cities and city living, particularly in large cities. One has to take much of this commentary with a ‘pinch of salt’, for although big data is providing us with a new focus on how cities function in time, particularly over very short periods measured in seconds, minutes, hours and days, it brings with it as many problems as it might solve. Big data is invariably unstructured. Unlike many traditional one-off surveys, it is not collected with urban analysis in mind, and it requires considerable ingenuity in computer processing and data mining. It is certainly shifting our focus on cities from the long and medium term to the very short term, and in this it is giving us opportunities to enrich our perspective on what makes a better city. But it is also confusing and problematic in that we have to work very hard to introduce structure into it and make it workable for our traditional purposes of understanding. It is exceptionally hard to link data sets generated for different purposes to one another, for invariably there are no common keys for doing so, and far from dispensing with the need for theory, we need ever better theories for making sense of all this variety.

This special issue collects together a set of papers about big data and the city, providing us with a kaleidoscope of possible applications which show the promise and pitfalls of big data. The first paper, by myself, extends this editorial and provides a wider context for big data. It defines its various types, from traditional sources, to real-time streamed data from passive sensors such as smart cards, to social media generated by our active use of smart phones and other computing devices such as PCs and tablets. It sets the context for the other papers in this issue, which take many of these aspects of big data forward as part of the smart cities movement. Almost every facet of the city is touched by the digital revolution and, in terms of scope, many different urban issues are covered here: simulation models, new data sources, new theories and methods in planning the city, and new ways in which we as citizens are both using and creating digital information and media. We will begin with the explosion of traditional data into big data, focusing on flows and then examining what these say about locations. We will explore how social media and mobile communications technologies can generate new insights about how we move and function in cities, and then we will explore different methods for visualizing this type of data. Visualization is key to making sense of big data, and the authors of these papers illustrate several tools for it. We will also introduce issues of privacy, confidentiality and veracity of data, and some of these papers try to answer the question of how good such data are and for what purposes they are being used.

We begin by examining data about how people move in cities, data which explode in volume as we attempt to understand them. Making sense of flows, from traditional flow data in one-off surveys to new data generated from social media and communications, requires new forms of visualization. In the first paper, some of these are noted, but in the second, by Claudel, Nagel, and Ratti, powerful new modes of visualization are introduced, in particular the Datacollider, which is a ‘public, powerful, intuitive, and scalable’ device for exploring how flows can be related to one another. Lenormand and Ramasco then show how this kind of new flow data can be extracted, visualized and related to functions in large cities, introducing us to the new world of mobility studies which involves networks and big data. Social media in the form of short text messages make their appearance very early in this special issue, and Lenormand and Ramasco illustrate some of the problems of extracting movement patterns from such data. The world of social media is plagued by difficulties when it comes to using such media as data to understand cities. The data are often impossible to geo-locate, and interactions between users are usually implicit and have to be inferred from the raw data.

New data sources, based on using smart cards to pay for transit, are extremely enticing, for they provide the possibility of extracting real-time movements which will supplement and complement traditional surveys of traffic. Reades, Zhong, Manley, Milton, and Batty explore how the Oyster Card data generated by automatic payment systems on public transport in London can be mined to show how different stations produce different flows at different times. It is possible to draw inferences about land use at different locations from this kind of data, but they show how problematic such analysis can become. This style of data exploration is in its infancy, and inferring patterns from such data requires much more powerful theory about transport in cities than we have at present. This paper also notes substantial work in this area on extracting flows and suggests how locational information can be linked to such movements. Alexiou, Singleton, and Longley also explore location, linking social data to the morphology of small areas in cities, and in this they develop classic methods of data mining, in particular self-organizing maps, which enable them to generate new forms of social area analysis. The link from location to morphology is also followed up by Crooks, Croitoru, Jenkins, Mahabir, Agouris and Stefanidis, who focus on what has come to be called crowdsourced or user-generated data, which are produced by the population at large. Combined with more traditional data sets, such information enables us to produce very rich patterns of location in cities that help us understand their form and function.

To integrate these diverse sources, Thakuriah, Sila-Nowicka, and Paule introduce an integrated multimedia city data platform which lets them bring many kinds of data together, beginning the quest for integration that is one of the long-standing goals of the smart cities movement. Many of these data are crowdsourced, and in the next paper Quercia shows how we can produce mental maps at scale using these new forms of data generation, arguing that digital media enable us to move well beyond the construction of traditional mental maps. We then change tack a little and introduce how big data can help us produce healthier urban environments. Miller and Tolle show how new sources of urban data allow for a ‘deeper understanding of the intricate relationships between individuals, environments and healthy places’, raising issues of privacy and confidentiality that also pervade many of the other contributions to this issue.

We then consider the quality of big data, posing the question ‘how good is it?’ McArdle and Kitchin address this problem directly and show how new and big data sets can be cleaned using combinations of methods that range from crowdsourcing to new developments in the statistics of data. Carrera then returns to the theme of big data from historic sources, making the point that the ‘wise city’, in the form of the ‘smart citizen’, already contains a lot of ‘old’ big data which exists in ‘slow’ real time. Using the example of one of the world’s most historic and iconic cities, Venice, he illustrates how one can make remarkable progress in examining its form and function using the enormous archives of past data that are locked away in such cities. We finish with illustrations of the public face of big data in cities through the concept of the dashboard. Gray, O’Brien, and Hugel present their work in London, which shows how real-time big data can be organized and classified, thus providing an interface through which we can evaluate the quality of life in our cities on a continuing basis.

All of these papers represent the state of an art that is changing dramatically at the present time. Our snapshot of what is happening is thus highly contingent on the particular times and places to which these articles relate. In the continuing and rapid automation of our cities, a fascinating story can be woven around these speculations about what our cities will be like in both the near and far futures.

Note

1. The quote was first drawn to my attention by Jane Jacobs (1969), who uses it as part of the frontispiece to her book The Economy of Cities (Random House, New York).

Alexandrine Press