About this title
Data governance is the formulation of policy to optimize, secure, and leverage information as an enterprise asset by aligning the objectives of multiple functions. Data governance programs have traditionally focused on people and process. Cost has historically been a key consideration because data governance programs have often started from scratch, with little to no funding. As a result, Microsoft Excel and SharePoint have been the tools of choice to document and share data governance artifacts. While the marginal cost of these tools is zero, they are often missing critical functionality. Meanwhile, vendors have matured their data governance offerings to the extent that organizations need to consider tools as a critical component of their data governance programs.
It is not always clear, however, what the term "data governance tools" really means. In this book, data governance expert Sunil Soares reviews a reference architecture for data governance software tools. He seeks to define the category called "data governance" and to lay out the evaluation criteria for software tools, the vendor landscape, and the alignment with big data.
The book contains five sections:
- Introduction (to Data Governance and EDM) introduces data governance and the Enterprise Data Management (EDM) reference architecture.
- Categories of Data Governance Tools discusses key data governance tasks that can be automated by tools for business glossaries, metadata management, data profiling, data quality management, master data management, reference data management, and information policy management.
- The Integration Between Enterprise Data Management and Data Governance Tools provides an overview of the integration points between EDM tools and data governance. EDM tools relate to data modeling, data integration, analytics and reporting, business process management, data security and privacy, and information lifecycle management.
- Big Data Governance Tools looks at how data governance tools interact with big data technologies, including Hadoop, NoSQL, stream computing, and text analytics.
- Evaluation Criteria and the Vendor Landscape discusses evaluation criteria for data governance tools and provides an overview of key vendor platforms, including ASG, Collibra, Global IDs, IBM, Informatica, Orchestra Networks, SAP, and Talend.
About author
Sunil Soares — Sunil Soares is the founder and managing partner of Information Asset, LLC, a consulting firm that specializes in data governance. Prior to this role, Sunil was director of information governance at IBM, where he worked with clients across six continents and multiple industries. Before joining IBM, Sunil consulted with major financial institutions at the Financial Services Strategy Consulting Practice of Booz Allen & Hamilton in New York. Sunil lives in New Jersey and holds an MBA in Finance and Marketing from the University of Chicago Booth School of Business.
The Chief Data Officer Handbook for Data Governance is Sunil's fifth book about data governance. His first book, The IBM Data Governance Unified Process, details the almost 100 steps to implement a data governance program. This book has been used by several organizations as the blueprint for their data governance programs and has been translated into Chinese. Sunil's second book, Selling Information Governance to the Business, reviews the best practices to approach information governance by industry and function. Sunil's third book, IBM InfoSphere: A Platform for Big Data Governance and Process Data Governance, focuses on IBM's InfoSphere product. Sunil's fourth book, Big Data Governance, addresses the specific issues associated with the governance of big data.
Contents
About the Author
Forewords
Preface
PART I--INTRODUCTION
Chapter 1: An Introduction to Data Governance
- Definition
- Case Study
- The Pillars of Data Governance
- Summary
Chapter 2: Enterprise Data Management Reference Architecture
- EDM Categories
- Big Data
- Data Governance Tools
- Summary
PART II--CATEGORIES OF DATA GOVERNANCE TOOLS
Chapter 3: The Business Glossary
- Bulk-Load Business Terms in Excel, CSV, or XML Format
- Create Categories of Business Terms
- Facilitate Social Collaboration
- Automatically Hyperlink Embedded Business Terms
- Add Custom Attributes to Business Terms and Other Data Artifacts
- Add Custom Relationships to Business Terms and Other Data Artifacts
- Add Custom Roles to Business Terms and Other Data Artifacts
- Link Business Terms and Column Names to the Associated Reference Data
- Link Business Terms to Technical Metadata
- Support the Creation of Custom Asset Types
- Flag Critical Data Elements
- Provide OOTB and Custom Workflows to Manage Business Terms and Other Data Artifacts
- Review the History of Changes to Business Terms and Other Data Artifacts
- Allow Business Users to Link to the Glossary Directly from Reporting Tools
- Search for Business Terms
- Integrate Business Terms with Associated Unstructured Data
- Summary
Chapter 4: Metadata Management
- Pull Logical Models from Data Modeling Tools
- Pull Physical Models from Data Modeling Tools
- Ingest Metadata from Relational Databases
- Pull in Metadata from Data Warehouse Appliances
- Integrate Metadata from Legacy Data Sources
- Pull Metadata from ETL Tools
- Pull Metadata from Reporting Tools
- Reflect Custom Code in the Metadata Tool
- Pull Metadata from Analytics Tools
- Link Business Terms with Column Names
- Pull Metadata from Data Quality Tools
- Pull Metadata from Big Data Sources
- Provide Detailed Views on Data Lineage
- Customize Data Lineage Reporting
- Manage Permissions in the Metadata Repository
- Support the Search for Assets in the Metadata Repository
- Summary
Chapter 5: Data Profiling
- Conduct Column Analysis
- Discover the Values Distribution of a Column
- Discover the Patterns Distribution of a Column
- Discover the Length Frequencies of a Column
- Discover Hidden Sensitive Data
- Discover Values with Similar Sounds in a Column
- Agree on the Data Quality Dimensions for the Data Governance Program
- Develop Business Rules Relating to the Data Quality Dimensions
- Profile Data Relating to the Completeness Dimension of Data Quality
- Profile Data Relating to the Conformity Dimension of Data Quality
- Profile Data Relating to the Consistency Dimension of Data Quality
- Profile Data Relating to the Synchronization Dimension of Data Quality
- Profile Data Relating to the Uniqueness Dimension of Data Quality
- Profile Data Relating to the Timeliness Dimension of Data Quality
- Profile Data Relating to the Accuracy Dimension of Data Quality
- Discover Data Overlaps Across Columns
- Discover Hidden Relationships Between Columns
- Discover Dependencies
- Discover Data Transformations
- Create Virtual Joins or Logical Data Objects That Can Be Profiled
- Summary
Chapter 6: Data Quality Management
- Transform Data into a Standardized Format
- Improve the Quality of Address Data
- Match and Merge Duplicate Records
- Create a Data Quality Scorecard
- View the Data Quality Scorecard
- Highlight the Financial Impact Associated with Poor Data Quality
- Conduct Time Series Analysis
- Manage Data Quality Exceptions
- Summary
Chapter 7: Master Data Management
- Define Business Terms Consumed by the MDM Hub
- Manage Entity Relationships
- Manage Master Data Enrichment Rules
- Manage Master Data Validation Rules
- Manage Record Matching Rules
- Manage Record Consolidation Rules
- View a List of Outstanding Data Stewardship Tasks
- Manage Duplicates
- View the Data Stewardship Dashboard
- Manage Hierarchies
- Improve the Quality of Master Data
- Integrate Social Media with MDM
- Manage Master Data Workflows
- Compare Snapshots of Master Data
- Provide a History of Changes to Master Data
- Offload MDM Tasks to Hadoop for Faster Processing
- Summary
Chapter 8: Reference Data Management
- Build an Inventory of Code Tables
- Agree on the Master List of Values for Each Code Table
- Build Simple Mappings Between Master Values and Related Code Tables
- Build Complex Mappings Between Code Values
- Manage Hierarchies of Code Values
- Build and Compare Snapshots of Reference Data
- Visualize Inter-Temporal Crosswalks Between Reference Data Snapshots
- Summary
Chapter 9: Information Policy Management
- Manage Information Policies, Standards, and Processes Within the Business Glossary
- Manage Business Rules
- Leverage Data Governance Tools to Monitor and Report on Compliance
- Manage Data Issues
- Summary
PART III--THE INTEGRATION BETWEEN ENTERPRISE DATA MANAGEMENT AND DATA GOVERNANCE TOOLS
Chapter 10: Data Modeling
- Integrate the Logical and Physical Data Models with the Metadata Repository
- Expose Ontologies in the Metadata Repository
- Prototype a Unified Schema Across Data Domains Using Data Discovery Tools
- Establish a Data Model to Support Master Data Management
- Summary
Chapter 11: Data Integration
- Deploy Data Quality Jobs in an Integrated Manner with Data Integration
- Move Data Between the MDM or Reference Data Hub and the Source Systems
- Leverage Reference Data for Use by the Data Integration Tool
- Integrate Data Integration Tools into the Metadata Repository
- Automate the Production of Data Integration Jobs by Leveraging the Metadata Repository
- Summary
Chapter 12: Analytics and Reporting
- Export Data Profiling Results to a Reporting Tool for Further Visual Analysis
- Export Data Artifacts to a Reporting Tool for the Visualization of Data Governance Metrics
- Integrate Analytics and Reporting Tools with the Business Glossary for Semantic Context
- Summary
Chapter 13: Business Process Management
- Data Governance Workflows Should Leverage BPM Capabilities
- Master Data Workflows Should Leverage BPM Capabilities
- Data Governance Tools Should Map to BPM Tools
- Summary
Chapter 14: Data Security and Privacy
- Determine Privacy Obligations
- Discover Sensitive Data Using Data Discovery Tools
- Flag Sensitive Data in the Metadata Repository
- Mask Sensitive Data in Production Environments
- Mask Sensitive Data in Non-Production Environments
- Monitor Database Access by Privileged Users
- Document Information Policies Implemented by Data Masking and Database Monitoring Tools
- Create a Complete Business Object Using Data Discovery Tools That Can Be Acted Upon by Data Masking Tools
- Summary
Chapter 15: Information Lifecycle Management
- Document Information Policies in the Business Glossary That Are Implemented by ILM Tools
- Discover Complete Business Objects That Can Be Acted on Efficiently by ILM Tools
- Summary
PART IV--BIG DATA GOVERNANCE TOOLS
Chapter 16: Hadoop and NoSQL
- Conduct an Inventory of Data in Hadoop
- Assign Ownership for Data in Hadoop
- Provision a Semantic Layer for Analytics in Hadoop
- View the Lineage of Data In and Out of Hadoop
- Manage Reference Data for Hadoop
- Profile Data Natively in Hadoop
- Discover Data Natively in Hadoop
- Execute Data Quality Rules Natively in Hadoop
- Integrate Hadoop with Master Data Management
- Port Data Governance Tools to Hadoop for Improved Performance
- Govern Data in NoSQL Databases
- Mask Sensitive Data in Hadoop
- Summary
Chapter 17: Stream Computing
- Use Data Profiling Tools to Understand a Sample Set of Input Data
- Govern Reference Data to Be Used by the Stream Computing Application
- Govern Business Terms to Be Used by the Stream Computing Application
- Summary
Chapter 18: Text Analytics
- Big Data Governance to Reduce the Readmission Rate for Patients with Congestive Heart Failure
- Leverage Unstructured Data to Improve the Quality of Sparsely Populated Structured Data
- Extract Additional Relevant Predictive Variables Not Available in Structured Data
- Define Consistent Definitions for Key Business Terms
- Ensure Consistency in Patient Master Data Across Facilities
- Adhere to Privacy Requirements
- Manage Reference Data
- Summary
PART V--EVALUATION CRITERIA AND THE VENDOR LANDSCAPE
Chapter 19: The Evaluation Criteria for Data Governance Platforms
- The Total Cost of Ownership
- Data Stewardship
- Approval Workflows
- The Hierarchy of Data Artifacts
- Data Governance Metrics
- The Cloud
- Summary
Chapter 20: ASG
- ASG-metaGlossary
- ASG-Rochade
- ASG-becubic
Chapter 21: Collibra
- Business Glossary
- Reference Data Management
- Data Stewardship
- Workflows
- Metadata
- Data Profiling
Chapter 22: Global IDs
- Data Profiling
- Data Quality
- Metadata
Chapter 23: IBM
- Metadata
- Information Integration
- Data Quality
- Master Data Management
- Data Lifecycle Management
- Privacy and Security
Chapter 24: Informatica
- Data Profiling and Data Quality
- Metadata and Business Glossary
- Master Data Management
- Information Lifecycle Management
- Security and Privacy
- Cloud
Chapter 25: Orchestra Networks
- Workflows
- Data Modeling
- Master Data Management
- Reference Data Management
- Business Glossary
Chapter 26: SAP
- An In-Memory Database
- Data Quality and Metadata Management
- Master Data Management
- Content Management
- Information Lifecycle Management
- Enterprise Modeling
- Data Integration
Chapter 27: Talend
- The Extended Ecosystem
- Big Data
- Data Integration
- Data Quality
- Master Data Management
- Enterprise Service Bus (ESB)
- Business Process Management (BPM)
Chapter 28: Notable Vendors
- Adaptive
- BackOffice Associates
- Data Advantage Group
- Diaku
- Embarcadero Technologies
- Global Data Excellence
- Harte-Hanks Trillium
- Oracle
- SAS
Appendix A: List of Acronyms
Appendix B: Glossary
Appendix C: Potential Data Governance Tasks to Be Automated with Tools
Index