What is Big Data?
• Massive sets of unstructured/semi-structured data from Web traffic, social media, sensors, etc.
• Petabytes, exabytes of data
• Volumes too great for a typical DBMS
• Information from multiple internal and external sources:
  - Transactions
  - Social media
  - Enterprise content
  - Sensors
  - Mobile devices
In the last minute there were:
• 204 million emails sent
• 61,000 hours of music listened to on Pandora
• 20 million photo views
• 100,000 tweets
• 6 million views and 277,000 Facebook logins
• 2+ million Google searches
• 3 million uploads on Flickr

What is Big Data?
• Companies leverage data to adapt products and services to:
  - Meet customer needs
  - Optimize operations
  - Optimize infrastructure
  - Find new sources of revenue
• Can reveal more patterns and anomalies
• IBM estimates that by 2015, 4.4 million jobs will be created globally to support big data
• 1.9 million of these jobs will be in the United States

Where does Big Data come from?
• Sources (from diagram): Enterprise “Dark Data”; Partner, Employee, Customer, Supplier; Public; Commercial; Social Media
• Data types (from diagram): Transactions, Monitoring, Sensor, Economic, Population, Sentiment, Email, Contracts, Network, Industry, Credit, Weather

TYPES OF DATA
• When gathering data, we collect it from individual cases on particular variables.
• A variable is a unit of data collection whose value can vary.
• Variables can be classified into types according to the level of mathematical scaling that can be carried out on the data.
• There are four types of data, or levels of measurement: nominal (categorical), ordinal, interval, and ratio.

Categorical (Nominal) data
• Nominal or categorical data consists of categories that cannot be rank ordered; each category is just different.
• The categories cannot be placed in any order, and no judgement can be made about the relative size of, or distance between, one category and another.
• Categories bear no quantitative relationship to one another.
Examples:
- customer’s location (America, Europe, Asia)
- employee classification (manager, supervisor, associate)
• What does this mean? No mathematical operations can be performed to compare one category with another.
• Therefore, nominal data reflect qualitative differences rather than quantitative ones.

Nominal data
• Systems for measuring nominal data must ensure that the categories are mutually exclusive and that the system of measurement is exhaustive.
• Mutually exclusive: each observation belongs to exactly one category.
• Exhaustive: the system should have enough categories to cover all the observations.
• Variables that have only two possible responses, e.g. Yes or No, are known as dichotomies.

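The exhaustiveness requirement can be checked mechanically. The following is a minimal sketch, not a standard tool: the category scheme reuses the customer-location example, while the observations and the helper names (`uncovered`, `is_exhaustive`, `is_dichotomy`) are made up for illustration. Storing a single label per observation is assumed, which makes the categories mutually exclusive by construction.

```python
from collections import Counter

# Hypothetical category scheme and observations, invented for illustration.
CATEGORIES = {"America", "Europe", "Asia"}
observations = ["Europe", "Asia", "America", "Africa", "Europe"]


def uncovered(values, categories):
    """Return the observed values that no category accounts for."""
    return sorted(set(values) - set(categories))


def is_exhaustive(values, categories):
    """A scheme is exhaustive when every observation fits some category."""
    return not uncovered(values, categories)


def is_dichotomy(values):
    """A variable is a dichotomy when it takes exactly two distinct values."""
    return len(set(values)) == 2


print("Not covered:", uncovered(observations, CATEGORIES))
print("Exhaustive:", is_exhaustive(observations, CATEGORIES))
# Counting is the one summary that makes sense for nominal data.
print("Frequencies:", Counter(observations))
print("Dichotomy:", is_dichotomy(["Yes", "No", "Yes"]))
```

Here the scheme fails the check because "Africa" has no category; adding it (or an "Other" category) would make the scheme exhaustive.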
Ordinal data
• Ordinal data consists of categories that can be rank ordered.
• As with nominal data, the distance between categories cannot be calculated, but the categories can be ranked above or below each other.
• No fixed units of measurement.
Examples:
- college football rankings
- survey responses (poor, average, good, very good, excellent)
• What does this mean? We can make statistical judgements based on rank and perform limited maths.

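As a rough illustration of the "limited maths" that ordinal data allows, the sketch below sorts survey responses by rank and picks the middle one. The response list and the helper names (`sort_by_rank`, `median_response`) are assumptions made for this example; the rating labels come from the survey example above.

```python
# Rating labels in rank order, lowest first.
LEVELS = ["poor", "average", "good", "very good", "excellent"]
RANK = {label: position for position, label in enumerate(LEVELS)}

# Hypothetical survey responses, invented for illustration.
responses = ["good", "poor", "excellent", "good", "average"]


def sort_by_rank(values):
    """Order responses from lowest to highest rank."""
    return sorted(values, key=RANK.__getitem__)


def median_response(values):
    """Return the middle response by rank (the lower middle for even counts)."""
    ordered = sort_by_rank(values)
    return ordered[(len(ordered) - 1) // 2]


print(sort_by_rank(responses))
print(median_response(responses))
# Comparisons are meaningful; arithmetic on the positions is not, because
# the gap between "poor" and "average" need not equal the gap between
# "very good" and "excellent".
print(RANK["good"] > RANK["average"])
```

The median and rank comparisons are safe here; averaging the rank positions would silently treat the data as interval data.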
Interval and ratio data
• Both interval and ratio data are examples of scale data.
• Scale data:
  - is in numeric format ($50, $100, $150)
  - can be measured on a continuous scale
  - has distances between values that can be observed and, as a result, measured
  - can be placed in rank order

Interval data
• Ordinal data, but with constant differences between observations
• Ratios are not meaningful
Examples:
- Time: moves along a continuous measure of seconds, minutes and so on, and has no zero point of time.
- Temperature: moves along a continuous measure of degrees and, on the Celsius or Fahrenheit scale, has no true zero.
- SAT scores

Ratio data
• Ratio data is measured on a continuous scale and has a natural zero point.
• Ratios are meaningful
Examples:
- monthly sales
- delivery times
- weight
- height
- age

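One way to see why ratios are meaningful for ratio data but not for interval data is to change the unit of measurement and check whether the ratio survives. The sketch below is illustrative only: the values 20 and 10 and the helper `ratio_in_both_units` are assumptions for this example.

```python
def celsius_to_fahrenheit(celsius):
    """Convert an interval-scale temperature; both units have an arbitrary zero."""
    return celsius * 9 / 5 + 32


def kilograms_to_pounds(kilograms):
    """Convert a ratio-scale weight; zero means 'no weight' in both units."""
    return kilograms * 2.20462  # approximate conversion factor


def ratio_in_both_units(a, b, convert):
    """Return a/b in the original unit and again after converting both values."""
    return a / b, convert(a) / convert(b)


# 20 degrees C looks like "twice" 10 degrees C, but not once expressed in F.
temp_ratios = ratio_in_both_units(20, 10, celsius_to_fahrenheit)
# 20 kg is twice 10 kg in any unit, because zero is a natural zero.
weight_ratios = ratio_in_both_units(20, 10, kilograms_to_pounds)

print("temperature ratios:", temp_ratios)
print("weight ratios:", weight_ratios)
```

The temperature ratio changes with the unit, so statements such as "twice as hot" carry no meaning, while the weight ratio stays at 2 in either unit.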
Data for Business Analytics (continued)
Classifying Data Elements in a Purchasing Database
• If there were a field (column) for Supplier Rating (Excellent, Good, Acceptable, Bad), that data would be classified as ordinal.

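The classification can be written down as a small lookup from each field to its level of measurement, which then determines the operations that make sense for it. This is a sketch under stated assumptions: only the Supplier Rating field comes from the slide; the other field names, the `Level` enum, and the operation names are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, ordered from least to most mathematical structure."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Lowest level at which each operation becomes meaningful.
MIN_LEVEL = {
    "count": Level.NOMINAL,
    "rank": Level.ORDINAL,
    "difference": Level.INTERVAL,
    "ratio": Level.RATIO,
}

# Hypothetical purchasing-database fields; only supplier_rating is from the slide.
FIELDS = {
    "supplier_name": Level.NOMINAL,
    "supplier_rating": Level.ORDINAL,
    "order_date": Level.INTERVAL,
    "item_cost": Level.RATIO,
}


def allowed_operations(field):
    """List the operations that are meaningful for the given field."""
    level = FIELDS[field]
    return [op for op, needed in MIN_LEVEL.items() if level >= needed]


for name in FIELDS:
    print(name, FIELDS[name].name, allowed_operations(name))
```

Each level inherits the operations of the levels below it, which is why an integer-ordered enum is a convenient representation.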
Big Data Characteristics
• VOLUME: growing quantity of data, e.g. social media, behavioral, video
• VELOCITY: quickening speed of data, e.g. smart meters, process monitoring
• VARIETY: increase in types of data, e.g. app data, unstructured data
Source: Gartner, Feb 2001

Volume
• Petabytes, exabytes of data
• Volumes too great for a typical DBMS

Volume - Bytes Defined
• eBay data warehouse (2010) = 10 PB
• eBay will increase this 2.5 times by 2011
• Teradata > 10 PB
• Megabyte: 2^20 bytes or, loosely, one million bytes
• Gigabyte: 2^30 bytes or, loosely, one billion bytes

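The gap between the exact and "loose" definitions, and the scale of the eBay figures, can be worked out in a few lines. This is a sketch only: it assumes a petabyte is defined in the same binary style as 2^50 bytes, which the slide does not state, and the helper `loose_error` is made up for this example.

```python
MEGABYTE = 2 ** 20   # exact definition from the slide
GIGABYTE = 2 ** 30   # exact definition from the slide
PETABYTE = 2 ** 50   # assumption: same binary convention extended to PB


def loose_error(exact, loose):
    """Fraction by which the exact size exceeds the loose decimal figure."""
    return (exact - loose) / loose


print(f"1 MB = {MEGABYTE:,} bytes, {loose_error(MEGABYTE, 10**6):.2%} above one million")
print(f"1 GB = {GIGABYTE:,} bytes, {loose_error(GIGABYTE, 10**9):.2%} above one billion")

# eBay figures from the slide: 10 PB in 2010, growing 2.5 times by 2011.
warehouse_2010 = 10 * PETABYTE
warehouse_2011 = int(warehouse_2010 * 2.5)
print(f"2011 warehouse = {warehouse_2011 // PETABYTE} PB")
print(f"2010 warehouse = {warehouse_2010 // GIGABYTE:,} GB")
```

The loose figures understate the exact ones, and the shortfall grows with each unit step, which matters when sizing storage at petabyte scale.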
Velocity
• Massive amount of streaming data

Variety
• Massive sets of unstructured/semi-structured data from Web traffic, social media, sensors, and so on

Big Data Opportunities
• Discovering hidden insights, e.g. anomalies, forensics, patterns, trends
• Making better informed decisions, e.g. strategies, recommendations
• Automating business processes, e.g. complex events, translation

Which is the biggest opportunity for Big Data in your organization?
Through 2017:
• 85% of Fortune 500 organizations will be unable to exploit big data for competitive advantage.
• Business analytics needs will drive 70% of investments in the expansion and modernization of information infrastructure.
Source: Getting Value from Big Data, Gartner Webinar, May 2012

Identifying Insurance Fraud
Opportunity
• Save and make money by reducing fraudulent auto insurance claims
Data & Analytics
• Predictive analytics against years of historical claims and coverage data
• Text mining adjuster reports for hidden clues, e.g. missing facts, inconsistencies, changed stories
Results
• Improved success rate in pursuing fraudulent claims from 50% to 88%
• Reduced fraudulent claim investigation time by 95%
• Marketing to individuals with low propensity for fraud

Quality Improvement
Opportunity
• Move from manual to automated inspection of burger bun production to ensure and improve quality
Data & Analytics
• Photo-analyze over 1,000 buns per minute for color, shape and seed distribution
• Continually adjust ovens and process automatically
Result
• Eliminate thousands of pounds of wasted product per year
• Speed production
• Save energy
• Reduce manual labor costs
Is the company using all of its “senses” to observe, measure and optimize business processes?

Improving Corporate Image
Opportunity
• Improve reputation, brand and buzz by tapping social media
Data & Analytics
• Continually scanning the Twitterverse for mentions of their business
• Integrating tweeters with their robust customer management system
Results
• Saw a tweet from a top customer lamenting that a late flight left him no time to dine at Morton’s
• A tuxedo-clad waiter was waiting for him when he landed, with a bag containing his favorite steak, prepared the way he normally likes it, with all the fixin’s
How can the company listen, analyze and respond in real time?

Big Data, Big Rewards
• Read the case study “Big Data, Big Rewards”