To find an answer to this question, I turned to the latest book on the subject, appropriately entitled Big Data, with one of those absolutely headline-grabbing subtitles that are designed to boggle the mind (and presumably make the casual observer pick up the book and, hopefully, buy it): A Revolution That Will Transform How We Live, Work, and Think. OK, I thought, so what kind of a transformation are we talking about here?
First let me say that the authors come well credentialed. Viktor Mayer-Schönberger teaches at the Oxford Internet
Institute at Oxford University and, we are told, is the author of eight books and countless articles. He is a "widely recognized authority" on big data. His co-author, Kenneth Cukier, hails from the upper echelons of journalism: he's the data editor for The Economist and has written for other prominent publications as well, including Foreign
Affairs.
This was a good place to start, I thought, to learn about the story of big data and the kind of changes—oops, I
mean transformations—that it was inevitably going to produce in our world. The major transformation the authors predict is that computer systems will soon be replacing, or at the very least augmenting, human judgment in countless areas of our lives. The chief reason for this is the enormous amount of data that has recently become available. Digital technology now gives us easy and cheap access to large amounts of information, frequently collecting it passively, invisibly, and automatically.
The result is a major change in the general mindset. People are looking at data to find patterns and correlations
rather than setting up hypotheses to prove causality: "The ideal of identifying causal mechanisms is a self-congratulatory illusion; big data overturns this. Yet again we are at a historical impasse where 'god is dead.' That is to say, the certainties that we believed in are once again changing. But this time they are being replaced, ironically, by better evidence."
So there you have it. God is dead, yet again. Only this time the god is the god of the scientific method, of causality.
Out with the "why," in with the "what." The authors ask: if Google can identify an outbreak of the H1N1 flu and pinpoint areas with significantly large numbers of infections, is there any reason we should worry about why this is occurring in such places, when we already know the what: there's an outbreak of flu, and it is especially heavy in these locations?
We have, my friends, slid into the gentle valley of the "Good Enough." Correlation is good enough for now. It's
fast, it's cheap, it's here, let's use it. We'll get around to the why later, maybe, if it's not too complicated and expensive to find out. And here are some of the examples the authors offer as proof that correlation is good enough: "After all, Amazon can recommend the ideal book. Google can rank the most
relevant website, Facebook knows our likes, and LinkedIn divines whom we know."
Such exaggerated attribution of insight and intuition to computer algorithms is
so common these days that it’s seldom even called out.
That's the transformation, according to the authors, that we have to look forward to. And behind their predictions lies a sense that the movement toward reliance on the results of big data to understand our world is not just inevitable but that the data itself, the vast invisible presence in our modern lives, also contains within itself a power and energy of incalculable value and ever-improving predictive powers. They call it "big-data consciousness": "Seeing the world as information, as oceans of data that can be explored at ever greater breadth and depth, offers us a perspective on reality that we did not have before. It is a mental outlook that may penetrate all areas of life. Today we are a numerate society because we presume that the world is understandable with numbers and math. . . . Tomorrow, subsequent generations may have a 'big-data consciousness'—the presumption that there is a quantitative component to all that we do, and that data is indispensable for society to learn from."
And the heroes of this transformation? They are the people who can wield this data well—who can write the algorithms that will move us beyond our superstitions and preconceptions to new insights into the world in which we live. These are the new Galileos of our day because they will be confronting existing institutions and ways of
thinking. In a clever turn of what I like to call "The Grandiose Analogy," the authors compare the use of statistics by Billy Beane of Moneyball fame to Galileo's pioneering observations using a telescope to support Copernicus's theory that the Earth was not the center of the universe: "Beane was challenging the dogma of the dugout, just as Galileo's heliocentric views had affronted the authority of the Catholic Church." It's another attempt to elevate by association the comparatively banal practice of putting a winning baseball team together on a shoestring to the level of the world-shattering scientific observation that the Earth, and by extension mankind, is not at the center of God's universe after all.
If you can ignore the hyperboles in this book, however—and given the number of them this is no small challenge—you can come to see the reality of what big data actually is and what kinds of contributions its use might make to our lives. The scientific method isn't going away. The march of science to discover and explain its best hypotheses at any given time will continue. In fact, the patterns and correlations unearthed by big-data methods may form the basis for new hypotheses and bring us even closer to understanding the "why" of many things to come.
Nonetheless, within some contexts, big data can produce actionable information. In marketing, for example, Amazon can use the knowledge that people who read Civil War histories may also like a particular subset of mystery writers to boost sales through its customer recommendation algorithms. Google's ability to detect flu outbreaks
also produces actionable information. The NIH and other medical institutions can take action based on such findings to make vaccines plentiful in certain areas, produce more vaccines if feasible, prepare hospitals and medical offices for the spike in needs, and publish other public health guidelines.
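The correlation-only logic behind such recommendations can be illustrated with a toy item-to-item co-occurrence counter. This is a minimal sketch of the general idea, not Amazon's actual system; the baskets and book titles are invented for illustration:

```python
from collections import Counter, defaultdict

# Hypothetical purchase histories: each set is one customer's items.
baskets = [
    {"Civil War history", "mystery A", "mystery B"},
    {"Civil War history", "mystery A"},
    {"Civil War history", "cookbook"},
    {"mystery B", "cookbook"},
]

def co_occurrence(baskets):
    """Count how often each pair of items appears in the same basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        for item in basket:
            for other in basket:
                if other != item:
                    counts[item][other] += 1
    return counts

def recommend(item, baskets, k=2):
    """Suggest the k items most often bought alongside `item`."""
    counts = co_occurrence(baskets)
    return [other for other, _ in counts[item].most_common(k)]

# "mystery A" co-occurs most often with "Civil War history",
# so it ranks first; the second slot is a tie.
print(recommend("Civil War history", baskets))
```

Note that nothing here explains *why* Civil War readers buy those mysteries; the counts alone drive the suggestion, which is exactly the "good enough" correlation the authors celebrate.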
Still, there are some real problems with heralding the quantification of everything into digitally manipulable form as
the answer to myriad issues. The supposition fails to take into account any fundamental issues except the obvious ones involving privacy and surveillance. First of all, there are the insurmountable problems that complex algorithms
create. That very complexity produces higher and higher risks for errors in the writing and executing of the code. That same complexity makes it very difficult to judge whether the results reflect reality. The very fact that such algorithms may challenge our intuition makes it difficult to validate their results without having an understanding of the "why," or even a sense of the assumptions and content of the algorithms themselves.
Statistics can be powerful tools, but there was also a wonderful book called How to Lie with Statistics that came out nearly sixty years ago and is no doubt still relevant today. The authors of Big Data claim that knowledge and experience may not be so important in the big data world: "When you are stuffed silly with data, you can tap that instead, and to greater effect. Thus those who can analyze big data may see past superstitions and conventional thinking not because they're smart, but because they have the data." The authors also suggest that a special team of "algorithmists" could oversee all the algorithms to ensure that they do not invade the privacy of individuals or cross other boundaries. I'm afraid Mayer-Schönberger and Cukier really ought to talk to the SEC about Wall Street
and its algorithms to see how well that’s been working out!
Finally, the proponents of big data want to discount intuition, common sense, experience, knowledge, insight, and even serendipity and ingenuity, never mind wisdom. In their quest to elevate the digitalization of everything, they neglect those very qualities, qualities that cannot be digitized. As a remark often attributed to Einstein reminds us: "Not
everything that can be counted counts, and not everything that counts can be counted."