Whether you work in business, industry or academia, or are simply a consumer making better decisions through science, there is a good chance you've heard these two words used together: Big Data. Tonight, I write on this topic because there are literally gazillions of opinions out there about what Big Data is, why it's important and, apparently crucially, why the name either is or isn't appropriate. Most of these opinions use way too many quotation marks around the term (e.g., "Big Data"). I hereby remove them and toss my hat into the ring to offer a hopefully different opinion on the topic.
If he were alive today, I'm sure Shakespeare would agree that data's neither big nor small, but usage makes it so. (If you've ever encountered Shakespeare in a data science blog, please, please point me to it - I think I may be onto something here!)
Taken as single data points, your Fitbit numbers, MyFitnessPal exercise and diet information and all those blood laboratory test results you get from your yearly physicals don't amount to much. But it's the combined use of all this quantified-self information - giving you a comprehensive picture of yourself and, perhaps even more interestingly (importantly?), in aggregate with the same bits of data from a whole lot of other people out there with those same devices and trackers - that turns your data into Big Data. Putting all those puzzle pieces together is what allows us to learn from the big (and sometimes small) trends. And it is today's emerging technologies, which allow us to link the data informing those trends back to the individual, that turn those pieces into Big Data. Stand up and be proud - all those gadgets you're so fond of may actually help contribute something to society after all.
So, it's really the connecting that makes data big. Years ago, you placed a surrogate ID on a patient's pharmaceutical trial data, made it anonymous and aggregated it with everyone else's data to calculate averages, medians and standard deviations of the distributions of scores. Additionally, a good deal of research relied on self-report data - still does, in fact (I refer you here to Dr. House: "It's a basic truth of the human condition that everybody lies. The only variable is about what."). After assessing population trends, if a patient was behaving wildly differently from average, the nice doctor could call him or her into the office to discuss options. Sometimes these research studies took years to complete. (Full disclosure here: I used to work in the pharmaceutical industry and, while I've been out of the biz for nigh on a decade, I bet a good number of trials are still conducted similarly. Big Pharma moves slowly.)
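For the curious, here is a minimal sketch of that old-school workflow in Python: swap names for surrogate IDs, aggregate, compute the summary statistics, and flag anyone far from the average. The patient names, the scores, the `P0001`-style ID format and the two-standard-deviation cutoff are all made up for illustration; this is not how any particular trial was actually run.

```python
import statistics
from itertools import count


def pseudonymize(records):
    """Replace each patient name with a surrogate ID.

    Returns the anonymized records plus the key that maps names to IDs.
    In practice that key would be stored separately and locked down.
    """
    next_id = count(1)
    key = {}
    anonymized = []
    for name, score in records:
        if name not in key:
            key[name] = f"P{next(next_id):04d}"  # hypothetical ID format
        anonymized.append((key[name], score))
    return anonymized, key


def summarize(scores):
    """Mean, median and sample standard deviation of the pooled scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "stdev": statistics.stdev(scores),
    }


def flag_outliers(anonymized, threshold=2.0):
    """Return surrogate IDs whose score is more than `threshold`
    standard deviations from the mean (an arbitrary cutoff)."""
    stats = summarize([score for _, score in anonymized])
    return [
        pid
        for pid, score in anonymized
        if abs(score - stats["mean"]) > threshold * stats["stdev"]
    ]


if __name__ == "__main__":
    # Invented data: seven patients, one of them far from the rest.
    trial = [("Alice", 50), ("Bob", 52), ("Carol", 49), ("Dave", 51),
             ("Erin", 48), ("Frank", 50), ("Grace", 90)]
    anonymized, key = pseudonymize(trial)
    id_to_name = {pid: name for name, pid in key.items()}
    for pid in flag_outliers(anonymized):
        # Only whoever holds the key can link the ID back to a person.
        print(pid, "->", id_to_name[pid])
```

Notice that the analysis itself never sees a name. It is only the key, held by someone else, that connects a flagged ID back to a real person.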
Today, instead of relying on the self-reported "how were you feeling at 3am last night", we have technology to reliably and continuously monitor our vital signs, sleep patterns, blood glucose and a whole host of other medical data points. With these devices seamlessly synced to our smartphones, laptops and ultimately to computer databases anywhere around the world, we can run those same analyses on a time scale orders of magnitude faster than before and send the results back to the individual patients themselves - to alert them if they are experiencing dangerous health conditions.
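To make the "continuous" part a little more concrete, here is a toy sketch of the idea in Python: compare each new reading against a rolling baseline of recent readings and raise a flag when it strays too far. The function name `monitor`, the window of five readings, the three-standard-deviation threshold and the heart-rate numbers are all my own assumptions for illustration, and certainly not clinical guidance.

```python
from collections import deque
from statistics import mean, stdev


def monitor(readings, window=5, threshold=3.0):
    """Yield (index, value) for readings that deviate sharply
    from the baseline formed by the last `window` normal readings."""
    recent = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(recent) == window:
            baseline, spread = mean(recent), stdev(recent)
            if spread > 0 and abs(value - baseline) > threshold * spread:
                yield i, value
                continue  # keep the anomalous reading out of the baseline
        recent.append(value)


if __name__ == "__main__":
    # Invented resting heart-rate stream with one sudden spike.
    heart_rate = [62, 64, 63, 61, 65, 63, 120, 64, 62]
    for index, value in monitor(heart_rate):
        print(f"alert: reading {index} was {value} bpm")
```

Because `monitor` is a generator, it handles one reading at a time, which is roughly the shape you would want if the numbers were streaming in from a wearable rather than sitting in a list.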
Now, there are plenty of people questioning the ethics and humanity of the kind of always-on, continuously-plugged-in society (and world) that this implies. But the fact remains that the technology exists to do it. And it's not simply their size that makes these data big. It's the connectivity, the integration and the speed at which they fly back and forth around the world that does it. The most important question now becomes: how do we use these data to make the world a better place? Ahem, with great power comes great responsibility.
What if you could predict, an hour beforehand, when a loved one would have a stroke or a heart attack? What if you could warn a community exactly when that earthquake would strike and when the resulting tsunami would hit a town halfway around the world? What if your car - and everyone else's - were able to prevent you from getting into that fender bender?
On a smaller scale, what if you never had to go to the grocery store again? What if you never had to fill your car with gas again? What if, for the same price or less, these two mundane everyday activities that consume hours of your life each week suddenly vanished and your fridge was always stocked and you knew you had only to turn on your vehicle to see the fuel gauge register full?
We're not that far from living in that world. But it's how we use our technology and our data that will determine how soon we get there. I want to live in a Big World. Do you?