I like big data and I cannot lie. You other brothers can’t deny. When a gal walks in and presents a legitimate case by shoving fact-based statistics in my face, I get…
Wait. What am I talking about? Every time I turn around, a business or someone is talking about Big Data, and how it’s going to change the world. But then I read another article which tells me it won’t. Who do I believe, and what is Big Data anyway?
It turns out the answer is both simple and complex. The simple answer is that “big data” is a set of information too large for a single computer to handle. Examples might include the library of Chuck Norris jokes, things we thought caused cancer that are now good for you, but might cause cancer again someday, and Kanye West’s list of things he likes about himself.
People take “big data” and analyze it to come to certain conclusions. This data is characterized by four “V’s”. The first is volume, completely unrelated to the level at which your college-age neighbors play their music on Friday nights. It refers to how much data the world has on various subjects.
The speed at which data is streamed via the internet is called velocity. While network connections get faster every year, they have yet to top my ex-wife, who could recall conversations from nine years before within half a second of the same subject being mentioned during an argument. She is still being studied.
Variety refers to the various subjects we have gathered big data about. As an autodidact (look it up) and a self-proclaimed nerd, I like this a lot, and I’ll talk about it more in a moment. Suffice it to say there are no statistics on the number of autodidacts in the country.
The final “V” referring to Big Data is veracity. This refers to how much the data can be trusted. Much as citizens feel about the Clintons or Fox News, one in three business leaders don’t trust the data they are given, and 27% of respondents in one survey were uncertain how much of the data they received was inaccurate.
So what are we using this amazing capability for? Well, a recent study shared with us the most-used emoji in each state.
We’ve got data on everything, including this list of 261,930 past Jeopardy questions, useful if you want to be the next Ken Jennings or try to beat Watson.
You may want to use the Million Song Dataset, which includes a metric for danceability, to try to predict what songs a user will like and listen to, or to analyze the relationship of other songs to Jungle Boogie. You can even enter a contest on Kaggle.
Photo credit: Ethanhein
As an author, my favorite set is WordNet, not just your average dictionary. Oh, I am sure there are other useful data sets out there. I am just as sure they all rate high on the veracity scale.
Every now and then I just scan Google or wherever I happen to be, and see what the newest set of big data out there is. You never know when you might need a social graph of the Marvel Universe. Because I like big data.
I cannot lie.