MC Press Online

Big Data Governance

Publisher: MC Press Online
Publication Date: 2012
Subject: Computer: Information Technology
Number of Pages: 369

Available as: Individual E-Version (PDF), ISBN: 9781583473771
Reg.: $59.95
You Save: $24.28 (40%)
Printed Edition: see MC Press Online
About this title
With the advent of new technologies, enterprises are starting to discover the opportunities that exist with big data. Big data includes content from social media, telephone GPS signals, utility smart meters, RFID tags, weather monitors, and other sources. Such data tends to be operational in nature and is characterized by the three V's: large volume, high velocity, and a variety of formats, including structured, unstructured, and semi-structured.

Although a growing number of books address the topic of big data, none deals with the challenge of governing big data. Yet big data governance is a crucial enabler to derive maximum value from a big data program. In Big Data Governance, Sunil Soares addresses this knowledge gap, examining the industry imperatives that are driving the convergence of these two major trends in Information Management and explaining not only the why but the how of governing big data.

In a non-technical style geared toward business audiences, Sunil explains the importance of establishing appropriate governance over big data initiatives and discusses how to manage and govern big data, highlighting the relevant processes, procedures, and policies. Information-packed sections provide (I) An overview of the big data governance framework, maturity assessments, business cases, and roadmaps; (II) A discussion of the applicability of information governance disciplines to big data; (III) Best practices for governance of the various big data types; (IV) Best practices and case studies for big data governance within healthcare, utilities, and communications service providers; and (V) A reference architecture for big data and a description of the offerings from IBM, Oracle, SAP, Microsoft, Informatica, SAS, and others.

Businesses have learned the importance of governing other types of data, such as master data and reference data. Now, they are waking up to the importance of governing big data. Big Data Governance gives you the tools you need to be successful in meeting this emerging imperative.

Upon completion of Big Data Governance, you will be able to:
  • Understand how big data can fit within an information governance program
  • Identify key stakeholders for big data
  • Quantify the business value of big data
  • Apply information governance concepts such as stewardship, data quality, metadata, and organization structures to big data
  • Appreciate the business benefits of big data by industry and job function
  • Understand the emerging regulations relating to privacy and data retention
  • Establish a step-by-step process to implement big data governance
About the author
Sunil Soares
Sunil Soares is the founder and managing partner of Information Asset, LLC, a consulting firm that specializes in data governance. Prior to this role, Sunil was director of information governance at IBM, where he worked with clients across six continents and multiple industries. Before joining IBM, Sunil consulted with major financial institutions at the Financial Services Strategy Consulting Practice of Booz Allen & Hamilton in New York. Sunil lives in New Jersey and holds an MBA in Finance and Marketing from the University of Chicago Booth School of Business.

The Chief Data Officer Handbook for Data Governance is Sunil's fifth book about data governance. His first book, The IBM Data Governance Unified Process, details the almost 100 steps to implement a data governance program. This book has been used by several organizations as the blueprint for their data governance programs and has been translated into Chinese. Sunil's second book, Selling Information Governance to the Business, reviews the best practices to approach information governance by industry and function. Sunil's third book, IBM InfoSphere: A Platform for Big Data Governance and Process Data Governance, focuses on IBM's InfoSphere product. Sunil's fourth book, Big Data Governance, addresses the specific issues associated with the governance of big data.


Foreword by Inderpal Bhandari
Foreword by Aaron Zornes

Chapter 1: An Introduction to Big Data Governance

Chapter 2: The Big Data Governance Framework
2.1 Big Data Types
2.2 Information Governance Disciplines
2.3 Industry and Functional Scenarios for Big Data Governance

Chapter 3: The Maturity Assessment
3.1 The IBM Information Governance Council Maturity Model
3.2 Sample Questions to Assess Maturity

Chapter 4: The Business Case
4.1 Improve On-Time Performance and Passenger Safety Through Big Data Governance
4.2 Quantify the Financial Impact of Big Data Governance on Customer Privacy
4.3 Reduce IT Costs by Governing the Lifecycle of Big Data
4.4 Estimate the Impact of Data Quality and Master Data on Big Data Initiatives

Chapter 5: The Roadmap
5.1 The Roadmap Case Studies

Chapter 6: Organizing for Big Data Governance
6.1 Map Key Processes and Establish a RACI Matrix to Identify Stakeholders in Big Data Governance
6.2 Determine the Appropriate Mix of New and Existing Roles for Information Governance
6.3 Appoint Big Data Stewards as Appropriate
6.4 Add Big Data Responsibilities to Traditional Information Governance Roles as Appropriate
6.5 Establish a Merged Information Governance Organization with Responsibilities That Include Big Data

Chapter 7: Metadata
7.1 Establish a Glossary That Represents the Business Definitions for Key Big Data Terms
7.2 Understand the Ongoing Support for Metadata Within Apache Hadoop
7.3 Tag Sensitive Big Data Within the Business Glossary
7.4 Import Technical Metadata from the Relevant Big Data Stores
7.5 Link the Relevant Data Sources to the Terms in the Business Glossary
7.6 Leverage Operational Metadata to Monitor the Movement of Big Data
7.7 Maintain Technical Metadata to Support Data Lineage and Impact Analysis
7.8 Gather Metadata from Unstructured Documents to Support Enterprise Search
7.9 Extend Existing Metadata Roles to Include Big Data

Chapter 8: Big Data Privacy
8.1 Identify Sensitive Big Data
8.2 Flag Sensitive Big Data Within the Metadata Repository
8.3 Address Privacy Laws and Regulations by Country, State, and Province
8.4 Manage Situations Where Personal Data Crosses International Boundaries
8.5 Monitor Access to Sensitive Big Data by Privileged Users

Chapter 9: Big Data Quality
9.1 Work with Business Stakeholders to Establish and Measure Confidence Intervals for the Quality of Big Data
9.2 Leverage Semi-Structured and Unstructured Data to Improve the Quality of Sparsely Populated Structured Data
9.3 Use Streaming Analytics to Address Data Quality Issues In-Memory Without Landing Interim Results to Disk
9.4 Appoint Data Stewards Accountable to the Information Governance Council for Improving the Metrics Over Time

Chapter 10: Business Process Integration
10.1 Identify the Key Processes That Will Be Impacted by Big Data Governance
10.2 Build a Process Map with Key Activities
10.3 Map Big Data Governance Policies to the Key Steps in the Process

Chapter 11: Master Data Integration
11.1 Improve the Quality of Master Data to Support Big Data Analytics
11.2 Leverage Big Data to Improve the Quality of Master Data
11.3 Improve the Quality and Consistency of Key Reference Data to Support the Big Data Governance Program
11.4 Consider Social Media Platform Policies to Determine the Level of Integration with Master Data Management
11.5 Extract Meaning from Unstructured Text to Enrich Master Data

Chapter 12: Managing the Lifecycle of Big Data
12.1 Expand the Retention Schedule to Include Big Data Based on Local Regulations and Business Needs
12.2 Document Legal Holds and Support eDiscovery Requests
12.3 Compress and Archive Big Data to Reduce IT Costs and Improve Application Performance
12.4 Manage the Lifecycle of Real-Time, Streaming Data
12.5 Retain Social Media Records to Comply with Regulations and Support eDiscovery Requests
12.6 Defensibly Dispose of Big Data No Longer Required Based on Regulations and Business Needs

Chapter 13: Web and Social Media
13.1 Consider Evolving Regulations and Customs When Establishing Policies Regarding the Acceptable Use of Social Media Data About Customers
13.2 Set Up Policies Regarding the Acceptable Use of Social Media Data About Employees and Job Candidates
13.3 Leverage Confidence Intervals to Assess the Quality of Social Media Data
13.4 Establish Policies Regarding the Acceptable Use of Cookies and Other Web Tracking Devices
13.5 Define Policies to Link Online and Offline Data in a Way That Does Not Violate Privacy Concerns and Regulations
13.6 Ensure the Consistency of Web Metrics

Chapter 14: Machine-to-Machine Data
14.1 Assess the Types of Geolocation Data Currently Available
14.2 Establish Policies Regarding the Acceptable Use of Geolocation Data Pertaining to Customers
14.3 Establish Policies Regarding the Acceptable Use of Geolocation Data Pertaining to Employees
14.4 Ensure the Privacy of RFID Data
14.5 Define Policies Relating to the Privacy of Other Types of M2M Data
14.6 Address the Metadata and Quality of M2M Data
14.7 Establish Policies Regarding the Retention Period for M2M Data
14.8 Improve the Quality of Master Data to Support M2M Initiatives
14.9 Secure the SCADA Infrastructure from Vulnerability to Cyber Attacks

Chapter 15: Big Transaction Data

Chapter 16: Biometrics
16.1 Assess the Privacy Implications Relating to the Acceptable Use of Biometric Data
16.2 Work with Legal Counsel to Determine the Impact of Evolving Regulations on the Use of Biometric Data for Customers and Employees

Chapter 17: Human-Generated Data
17.1 Establish Policies to Mask Sensitive Human-Generated Data
17.2 Use Unstructured Human-Generated Data to Improve the Quality of Structured Data
17.3 Manage the Lifecycle of Human-Generated Data to Reduce Costs and Comply with Regulations
17.4 Extract Insights from Unstructured Human-Generated Data to Enrich MDM

Chapter 18: Healthcare
18.1 Leverage Unstructured Data to Improve the Quality of Sparsely Populated Structured Data
18.2 Extract Additional Relevant Clinical Factors Not Available Within Structured Data
18.3 Define Consistent Definitions for Key Business Terms
18.4 Ensure Consistency in Patient Master Data Across Facilities
18.5 Adhere to Privacy Requirements for Protected Health Information in Accordance with HIPAA
18.6 Creatively Manage Reference Data to Yield Effective Clinical Insights

Chapter 19: Utilities
19.1 Duplicate Meter Readings
19.2 Referential Integrity of the Primary Key
19.3 Anomalous Meter Readings
19.4 Data Quality for Customer Addresses
19.5 Information Lifecycle Management
19.6 Database Monitoring
19.7 Technical Architecture

Chapter 20: Communications Service Providers
20.1 Big Data Types
20.2 Integrating Big Data with Master Data
20.3 Big Data Privacy
20.4 Big Data Quality
20.5 Big Data Lifecycle Management

Chapter 21: Big Data Reference Architecture
21.1 Big Data Sources
21.2 Open Source Foundational Components
21.3 Hadoop Distributions
21.4 Streaming Analytics
21.5 Databases
21.6 Big Data Integration
21.7 Text Analytics
21.8 Big Data Discovery
21.9 Big Data Quality
21.10 Metadata for Big Data
21.11 Information Policy Management
21.12 Master Data Management
21.13 Data Warehouses and Data Marts
21.14 Big Data Analytics and Reporting
21.15 Big Data Security and Policy
21.16 Big Data Lifecycle Management
21.17 The Cloud

Chapter 22: Big Data Platforms
22.1 IBM
22.2 Oracle
22.3 SAP
22.4 The Microsoft Big Data Platform
22.5 HP
22.6 Informatica
22.7 SAS
22.8 Teradata
22.9 EMC
22.10 Amazon
22.11 Google
22.12 Pentaho
22.13 Talend

Appendix A: List of Acronyms
Appendix B: Glossary
Appendix C: Reviewer Profiles
Appendix D: Contributor Profiles
Related titles
5 Keys to Business Analytics Program Success
Big Data Analytics
Data Governance Tools
DB2 11: The Database for Big Data & Analytics
IBM InfoSphere
Selling Information Governance to the Business
Chief Data Officer Handbook for Data Governance, The
IBM Data Governance Unified Process, The

If the download option is selected, Adobe Acrobat 5.0 or later is required to read our e-books*

*Windows PC, Mac OS9/OSX, and Linux