| About this title |
With the advent of new technologies, enterprises are starting to discover the opportunities that exist with big data. Big data includes content from sources such as social media, telephone GPS signals, utility smart meters, RFID tags, weather monitors, and other sources. Such data tends to be operational in nature and is characterized by the “three V's”: large volume, high velocity, and a variety of formats, including structured, unstructured, and semi-structured.
Although a growing number of books address the topic of big data, none deals with the challenge of governing big data. Yet big data governance is a crucial enabler to derive maximum value from a big data program. In Big Data Governance, Sunil Soares addresses this knowledge gap, examining the industry imperatives that are driving the convergence of these two major trends in Information Management and explaining not only the why but the how of governing big data.
In a non-technical style geared toward business audiences, Sunil explains the importance of establishing appropriate governance over big data initiatives and discusses how to manage and govern big data—highlighting the relevant processes, procedures, and policies. Information-packed sections provide (I) An overview of the big data governance framework, maturity assessments, business cases, and roadmaps; (II) A discussion of the applicability of information governance disciplines to big data; (III) Best practices for governance of the various big data types; (IV) Best practices and case studies for big data governance within healthcare, utilities, and communications service providers; and (V) A reference architecture for big data and a description of the offerings from IBM, Oracle, SAP, Microsoft, Informatica, SAS, and others.
Businesses have learned the importance of governing other types of data, such as master data and reference data. Now, they are waking up to the importance of governing big data. Big Data Governance gives you the tools you need to be successful in meeting this emerging imperative.
Upon completion of Big Data Governance, you will be able to:
- Understand how big data can fit within an information governance program
- Identify key stakeholders for big data
- Quantify the business value of big data
- Apply information governance concepts such as stewardship, data quality, metadata, and organization structures to big data
- Appreciate the business benefits of big data by industry and job function
- Understand the emerging regulations relating to privacy and data retention
- Establish a step-by-step process to implement big data governance
|
About author |
Sunil Soares is the founder and managing partner of Information Asset, LLC, a consulting firm that specializes in data governance. Prior to this role, Sunil was director of information governance at IBM, where he worked with clients across six continents and multiple industries. Before joining IBM, Sunil consulted with major financial institutions at the Financial Services Strategy Consulting Practice of Booz Allen & Hamilton in New York. Sunil lives in New Jersey and holds an MBA in Finance and Marketing from the University of Chicago Booth School of Business.
The Chief Data Officer Handbook for Data Governance is Sunil's fifth book about data governance. His first book, The IBM Data Governance Unified Process, details the nearly 100 steps needed to implement a data governance program. It has been used by several organizations as the blueprint for their data governance programs and has been translated into Chinese. Sunil's second book, Selling Information Governance to the Business, reviews best practices for approaching information governance by industry and function. Sunil's third book, IBM InfoSphere: A Platform for Big Data Governance and Process Data Governance, focuses on IBM's InfoSphere product. Sunil's fourth book, Big Data Governance, addresses the specific issues associated with the governance of big data.
|
Contents |
Foreword by Inderpal Bhandari
Foreword by Aaron Zornes
Preface
PART ONE: GETTING STARTED
Chapter 1: An Introduction to Big Data Governance
Chapter 2: The Big Data Governance Framework
2.1 Big Data Types
2.2 Information Governance Disciplines
2.3 Industry and Functional Scenarios for Big Data Governance
Summary
Chapter 3: The Maturity Assessment
3.1 The IBM Information Governance Council Maturity Model
3.2 Sample Questions to Assess Maturity
Summary
Chapter 4: The Business Case
4.1 Improve On-Time Performance and Passenger Safety Through Big Data Governance
4.2 Quantify the Financial Impact of Big Data Governance on Customer Privacy
4.3 Reduce IT Costs by Governing the Lifecycle of Big Data
4.4 Estimate the Impact of Data Quality and Master Data on Big Data Initiatives
Summary
Chapter 5: The Roadmap
5.1 The Roadmap Case Studies
Summary
PART TWO: BIG DATA GOVERNANCE DISCIPLINES
Chapter 6: Organizing for Big Data Governance
6.1 Map Key Processes and Establish a RACI Matrix to Identify Stakeholders in Big Data Governance
6.2 Determine the Appropriate Mix of New and Existing Roles for Information Governance
6.3 Appoint Big Data Stewards as Appropriate
6.4 Add Big Data Responsibilities to Traditional Information Governance Roles as Appropriate
6.5 Establish a Merged Information Governance Organization with Responsibilities That Include Big Data
Summary
Chapter 7: Metadata
7.1 Establish a Glossary That Represents the Business Definitions for Key Big Data Terms
7.2 Understand the Ongoing Support for Metadata Within Apache Hadoop
7.3 Tag Sensitive Big Data Within the Business Glossary
7.4 Import Technical Metadata from the Relevant Big Data Stores
7.5 Link the Relevant Data Sources to the Terms in the Business Glossary
7.6 Leverage Operational Metadata to Monitor the Movement of Big Data
7.7 Maintain Technical Metadata to Support Data Lineage and Impact Analysis
7.8 Gather Metadata from Unstructured Documents to Support Enterprise Search
7.9 Extend Existing Metadata Roles to Include Big Data
Summary
Chapter 8: Big Data Privacy
8.1 Identify Sensitive Big Data
8.2 Flag Sensitive Big Data Within the Metadata Repository
8.3 Address Privacy Laws and Regulations by Country, State, and Province
8.4 Manage Situations Where Personal Data Crosses International Boundaries
8.5 Monitor Access to Sensitive Big Data by Privileged Users
Summary
Chapter 9: Big Data Quality
9.1 Work with Business Stakeholders to Establish and Measure Confidence Intervals for the Quality of Big Data
9.2 Leverage Semi-Structured and Unstructured Data to Improve the Quality of Sparsely Populated Structured Data
9.3 Use Streaming Analytics to Address Data Quality Issues In-Memory Without Landing Interim Results to Disk
9.4 Appoint Data Stewards Accountable to the Information Governance Council for Improving the Metrics Over Time
Summary
Chapter 10: Business Process Integration
10.1 Identify the Key Processes That Will Be Impacted by Big Data Governance
10.2 Build a Process Map with Key Activities
10.3 Map Big Data Governance Policies to the Key Steps in the Process
Summary
Chapter 11: Master Data Integration
11.1 Improve the Quality of Master Data to Support Big Data Analytics
11.2 Leverage Big Data to Improve the Quality of Master Data
11.3 Improve the Quality and Consistency of Key Reference Data to Support the Big Data Governance Program
11.4 Consider Social Media Platform Policies to Determine the Level of Integration with Master Data Management
11.5 Extract Meaning from Unstructured Text to Enrich Master Data
Summary
Chapter 12: Managing the Lifecycle of Big Data
12.1 Expand the Retention Schedule to Include Big Data Based on Local Regulations and Business Needs
12.2 Document Legal Holds and Support eDiscovery Requests
12.3 Compress and Archive Big Data to Reduce IT Costs and Improve Application Performance
12.4 Manage the Lifecycle of Real-Time, Streaming Data
12.5 Retain Social Media Records to Comply with Regulations and Support eDiscovery Requests
12.6 Defensibly Dispose of Big Data No Longer Required Based on Regulations and Business Needs
Summary
PART THREE: THE GOVERNANCE OF BIG DATA TYPES
Chapter 13: Web and Social Media
13.1 Consider Evolving Regulations and Customs When Establishing Policies Regarding the Acceptable Use of Social Media Data About Customers
13.2 Set Up Policies Regarding the Acceptable Use of Social Media Data About Employees and Job Candidates
13.3 Leverage Confidence Intervals to Assess the Quality of Social Media Data
13.4 Establish Policies Regarding the Acceptable Use of Cookies and Other Web Tracking Devices
13.5 Define Policies to Link Online and Offline Data in a Way That Does Not Violate Privacy Concerns and Regulations
13.6 Ensure the Consistency of Web Metrics
Summary
Chapter 14: Machine-to-Machine Data
14.1 Assess the Types of Geolocation Data Currently Available
14.2 Establish Policies Regarding the Acceptable Use of Geolocation Data Pertaining to Customers
14.3 Establish Policies Regarding the Acceptable Use of Geolocation Data Pertaining to Employees
14.4 Ensure the Privacy of RFID Data
14.5 Define Policies Relating to the Privacy of Other Types of M2M Data
14.6 Address the Metadata and Quality of M2M Data
14.7 Establish Policies Regarding the Retention Period for M2M Data
14.8 Improve the Quality of Master Data to Support M2M Initiatives
14.9 Secure the SCADA Infrastructure from Vulnerability to Cyber Attacks
Summary
Chapter 15: Big Transaction Data
Summary
Chapter 16: Biometrics
16.1 Assess the Privacy Implications Relating to the Acceptable Use of Biometric Data
16.2 Work with Legal Counsel to Determine the Impact of Evolving Regulations on the Use of Biometric Data for Customers and Employees
Summary
Chapter 17: Human-Generated Data
17.1 Establish Policies to Mask Sensitive Human-Generated Data
17.2 Use Unstructured Human-Generated Data to Improve the Quality of Structured Data
17.3 Manage the Lifecycle of Human-Generated Data to Reduce Costs and Comply with Regulations
17.4 Extract Insights from Unstructured Human-Generated Data to Enrich MDM
Summary
PART FOUR: INDUSTRY PERSPECTIVES
Chapter 18: Healthcare
18.1 Leverage Unstructured Data to Improve the Quality of Sparsely Populated Structured Data
18.2 Extract Additional Relevant Clinical Factors Not Available Within Structured Data
18.3 Define Consistent Definitions for Key Business Terms
18.4 Ensure Consistency in Patient Master Data Across Facilities
18.5 Adhere to Privacy Requirements for Protected Health Information in Accordance with HIPAA
18.6 Creatively Manage Reference Data to Yield Effective Clinical Insights
Summary
Chapter 19: Utilities
19.1 Duplicate Meter Readings
19.2 Referential Integrity of the Primary Key
19.3 Anomalous Meter Readings
19.4 Data Quality for Customer Addresses
19.5 Information Lifecycle Management
19.6 Database Monitoring
19.7 Technical Architecture
Summary
Chapter 20: Communications Service Providers
20.1 Big Data Types
20.2 Integrating Big Data with Master Data
20.3 Big Data Privacy
20.4 Big Data Quality
20.5 Big Data Lifecycle Management
Summary
PART FIVE: BIG DATA TECHNOLOGY
Chapter 21: Big Data Reference Architecture
21.1 Big Data Sources
21.2 Open Source Foundational Components
21.3 Hadoop Distributions
21.4 Streaming Analytics
21.5 Databases
21.6 Big Data Integration
21.7 Text Analytics
21.8 Big Data Discovery
21.9 Big Data Quality
21.10 Metadata for Big Data
21.11 Information Policy Management
21.12 Master Data Management
21.13 Data Warehouses and Data Marts
21.14 Big Data Analytics and Reporting
21.15 Big Data Security and Policy
21.16 Big Data Lifecycle Management
21.17 The Cloud
Summary
Chapter 22: Big Data Platforms
22.1 IBM
22.2 Oracle
22.3 SAP
22.4 The Microsoft Big Data Platform
22.5 HP
22.6 Informatica
22.7 SAS
22.8 Teradata
22.9 EMC
22.10 Amazon
22.11 Google
22.12 Pentaho
22.13 Talend
Summary
Appendix A: List of Acronyms
Appendix B: Glossary
Appendix C: Reviewer Profiles
Appendix D: Contributor Profiles
Index |