SAP Data Intelligence – The Comprehensive Guide
Welcome to the first comprehensive guide on SAP Data Intelligence. SAP is constantly on an innovation journey to provide the best services to organizations. In line with that vision, SAP released SAP Data Hub in 2017, with the primary objective of data orchestration and management based on a Kubernetes deployment. With significantly enhanced features, and availability both as a managed service on SAP BTP and on-premises, SAP Data Hub evolved into what is now known as SAP Data Intelligence. In 2019, SAP positioned SAP Data Intelligence as a one-stop shop for integration, metadata management (data profiling, data lineage), connectivity, and orchestration. It also includes machine learning services that support both built-in and bring-your-own-model (BYOM) scenarios. Having one tool that supports multiple personas, such as the data engineer, data steward, analyst, data scientist, and administrator, is critical for organizations. It allows them to reduce the implementation, maintenance, and ongoing run costs of AI/machine learning scenarios. At the same time, it increases productivity and flexibility across cloud-based and on-premises deployment options.
With data being called “the new oil”, most organizations have been focusing on a data and analytics strategy to establish a data-driven platform. Such a platform should do more than consume SAP and non-SAP datasets (structured and unstructured). It should also process them in real time and derive insights from them. Business users can then make insight-driven decisions and gain a market advantage, rather than relying on a traditional multi-platform approach for loading and processing data. Several forces are at work here. Organizations have shifted their priorities, SAP Data Hub and then SAP Data Intelligence have introduced many changes over the past years, and innovation around them continues at a rapid pace. As a result, there is a need for more detailed information about, and experience with, SAP Data Intelligence from an architectural point of view, which is critical to achieving the desired strategy. Information is available on certain aspects of SAP Data Hub and SAP Data Intelligence. However, we believe that other sources cover only a subset of the overall picture and address only a specific audience. With this book, we provide a comprehensive overview of the different aspects of SAP Data Intelligence from industry, business, and technical perspectives.
We also share the experiences we’ve gathered in different client scenarios, the challenges we encountered and how to address them, best practices, and the business content defined for SAP Data Intelligence. In addition, we’ve identified a need to help customers understand the use cases in which SAP Data Intelligence stands out compared to other available ETL tools, and to give them what they need to put together a business case.
This overview should help both new SAP clients and experienced SAP customers, irrespective of their source systems or data types, to understand the available components. It also gives a full breakdown of the benefits you can realize by adopting SAP Data Intelligence. This extends down to the technical knowledge needed for installation, setup, configuration, implementation, and monitoring with the right automation levers.
1.2 Audience
This book is geared toward CxOs, data stakeholders, data scientists, data engineers, administrators, business data owners, enterprise architects, and project managers. These readers want a consolidated overview of the value of SAP Data Intelligence and of how it works with other products in support of digital transformation. Those products include SAP S/4HANA, SAP BW/4HANA, Enterprise HANA, SAP Analytics Cloud, SAP Data Warehouse Cloud (DWC), and non-SAP data lake solutions. We cover industry use cases along with the benefits realized with business content. Where business content is not feasible, we draw on use cases implemented for clients globally. Because of the comprehensive nature of the content, we believe the book is relevant to all SAP-interested readers across all industries, whether or not they work in the data and analytics space. It will help them understand the influence of SAP Data Intelligence on the ecosystem.
We have also included information on maintenance and administration activities, including security policies. This helps the different personas understand SAP Data Intelligence capabilities (e.g., connectivity, integration, metadata management, monitoring, orchestration, and AI/machine learning services). It also shows how the value scenarios improve day-to-day work, with a defined “day in the life” activity list. We also cover migration from SAP Data Hub to SAP Data Intelligence, as well as migration of data and models from SAP Leonardo Machine Learning Foundation to SAP Data Intelligence. This includes sizing and deployment options that are useful for enterprise architects and project managers.
We encourage readers to take a deep dive into the concepts we introduce. Consider them for adoption, skill building, implementation, and migration or upgrade, as your scenario requires.
1.3 Structure of the Book
This reference book on SAP Data Intelligence provides both an overview and the details of each topic listed. The structure guides you from a strategic level to a practical level and covers all key aspects in each chapter. You can also pick and choose chapters as needed and get all the required details by reading a chapter on its own. You only need to jump to other chapters where more details are covered earlier or later in the book. However, we recommend that you start from the beginning and work through the chapters in order. Some topics are introduced with context early on and covered in more detail in later chapters, so the chapters build on each other as the book progresses.
Part I: Getting Started
Chapter 1 provides an overview of SAP Data Intelligence in the context of the intelligent enterprise, leveraging SAP Business Technology Platform and a data fabric-based approach. Readers are introduced to the basics of democratizing enterprise data assets and the role SAP Data Intelligence plays in making it happen. We wrap up with a summary of the roles that data orchestration, scalable analytics, and machine learning play in developing a data-centric digital transformation strategy.
In Chapter 2, we provide an overview of SAP Data Hub and SAP Leonardo Machine Learning Foundation as the predecessors of SAP Data Intelligence. We explain how their features were merged and enhanced with new functionality to create a combined product. That product can connect, orchestrate, and enrich datasets and apply machine learning to them for better decision-making. This chapter also provides insight into the architecture, deployment options, licensing models, and sizing of SAP Data Intelligence. It introduces the concepts of Kubernetes and explains why Kubernetes was chosen as the platform for SAP Data Intelligence. We wrap up the chapter by introducing the SAP Data Intelligence Launchpad before heading into more detail in the following chapters.
Chapter 3 provides step-by-step details of the procedure to build and configure SAP Data Intelligence on an on-premises system. As a follow-up to Chapter 2, we also discuss the planning and sizing activities required for the platform components when you set up an SAP Data Intelligence lab on the SAP Cloud Appliance Library for cloud deployment. You will also be able to review the prerequisites for installing the instance in a real-world scenario involving a hybrid landscape of cloud and on-premises instances and systems.
Chapter 4 covers a day in the life of a data science project and the different personas involved in delivering insights and a data-driven platform to organizations. We then provide an overview of what users can access based on their business roles or personas. We also describe the process for adding new applications needed by the business to the SAP Data Intelligence Launchpad. We conclude the chapter by walking you through each application within the SAP Data Intelligence Launchpad, the activities each persona carries out, and how the personas interact with each other. We expand on each of these applications in subsequent chapters.
Part II: Data Management, Orchestration and Machine Learning
Chapter 5 begins the set of chapters that deep dive into the applications within SAP Data Intelligence, starting with the applications for data management, orchestration, and machine learning. We start with the Metadata Explorer, which supports data ingestion, preparation, and governance. It also supports data quality assessment and scorecards on the ingested data before that data is made available for end-user reporting. This helps data engineers and data stewards provide high-quality data that is consistent across your organization. We walk you through how to define business rules that assess data quality and produce the KPIs by which data quality is measured and monitored. We also cover data lineage information, which gives you an end-to-end view of data usage.
In Chapter 6, we go into detail about data orchestration. We describe the available connection types and how you can consume the connected sources within a data pipeline. There, you can extract, transform, and load data by applying business logic with the available operators. We also describe the Modeler, the environment used to build data pipelines (called graphs). The focus is on predefined, built-in operators; custom operators are covered in Chapter 7. We wrap up with how to schedule, track, and trace data processing pipelines within the SAP Data Intelligence Launchpad.
Chapter 7 extends Chapter 6. It explains how to create your own custom operators using different execution environments, and how to consume them in your data pipelines, when the predefined, built-in operators do not meet your requirements. We walk you through the different runtime engines, which let you combine multiple environments in a single pipeline. Finally, we wrap up with how to create new data types and address complex data requirements with the appropriate operators.
Chapter 8 explains the use of Docker in SAP Data Intelligence, which is one of the major features for implementing pipelines (graphs) based on flow-based programming. Chapter 2 introduced the concept of pods in Kubernetes. Building on that, we walk you through pods and the Kubernetes package manager as they relate to SAP Data Intelligence. We also show how the SAP Data Intelligence platform brings all these components together to enable Docker. We wrap up the chapter with a hands-on walkthrough of building a Dockerfile and using it within a data pipeline.
In Chapter 9, we look at the machine learning capabilities provided by the SAP Data Intelligence platform. We primarily focus on the ML Scenario Manager application in the SAP Data Intelligence Launchpad. This application collects and organizes different data science artifacts (e.g., models, notebooks, pipelines, and datasets). We also walk you through a simple use case. In it, you build your own machine learning training pipeline using Python and deploy it for inference (end-user consumption). The use case explains functions such as data preparation and feature selection for the model involved. We wrap up by looking at a related application called ML Data Manager, which registers datasets from ML Scenario Manager to make them available to other machine learning scenarios.
Chapter 10 looks at a primary component of a data science project, Jupyter Notebook (an open-source project), and how Jupyter notebooks can be used within SAP Data Intelligence. In this chapter, we walk you through the fundamental concepts of the JupyterLab environment. We also cover SAP HANA Cloud (offered on SAP Business Technology Platform, SAP BTP), chosen mainly for its machine learning capabilities, data persistence, and tight integration with SAP Data Intelligence. Jupyter Notebook acts as a pivot point that can easily access both SAP Data Intelligence and SAP HANA Cloud for different use case scenarios. We wrap up this chapter by looking at different machine learning operators and exploring the JupyterLab environment. To do this, we create a simple application that uses SAP HANA Predictive Analysis Library (PAL) and Automated Predictive Library (APL) algorithms.
After introducing Jupyter Notebook and the use of Python, we broaden your view in Chapter 11 with the SAP Data Intelligence Python SDK. This SDK lets users control orchestration capabilities programmatically. We walk you through how to bind models and data pipelines using the SDK. We also show how to access machine learning artifacts and their associated metrics with the Machine Learning Tracking SDK package, so you can review the performance of your machine learning models.
Part III: Integration
In Chapter 12, we explain integration scenarios between SAP Data Intelligence and various SAP ABAP-based systems from a connectivity point of view. The chapter covers ABAP connection creation, its prerequisites, and the authorizations the ABAP source system user needs for data consumption. It also covers data crawling and data transformations. We walk you through a step-by-step, hands-on demo that illustrates the different ABAP integration scenarios possible with SAP Data Intelligence.
Building on the fundamentals of SAP ABAP-based integration, Chapter 13 focuses on integration with non-SAP cloud data stores such as Amazon S3, Google Cloud Storage, IBM storage, Amazon Redshift, and Microsoft Azure SQL Data Warehouse. It also covers traditional on-premises relational databases such as Oracle and Microsoft SQL Server. We provide details on creating connections to some of these major data sources and the prerequisites for building them.
Chapter 14 walks you through SAP Vora and how its capabilities can be used to integrate big data scenarios with the SAP Data Intelligence platform using SAP Vora models, such as tables and views, along with SQL. We also explain how to use its powerful data processing engines to process large sets of complex data formats and to perform text analytics as required.
In Chapter 15, after extraction, transformation, and data ingestion, we walk you through the downstream process of consuming the ingested data for data modeling and reporting. We provide a quick overview of SAP Data Warehouse Cloud and its data warehousing and integration capabilities. We also give you insight into the capabilities available when integrating SAP Data Intelligence with SAP Data Warehouse Cloud.
Chapter 16 concludes the deep dive into modeling and reporting. It explains how SAP Analytics Cloud can report on data ingested by SAP Data Intelligence, either from another database where the data is stored or through direct integration with SAP Data Intelligence. We provide a brief overview of the features available in SAP Analytics Cloud and their applicability from a reporting point of view. We also walk you through the operators within SAP Data Intelligence that are useful for reading and decoding data, and for formatting datasets and pushing them to SAP Analytics Cloud.
In Chapter 17, we discuss the administrative applications in SAP Data Intelligence and the tasks each is designed for. We focus on the System Management, License Management, and Connection Management applications, which are user interface applications within the Launchpad. We also cover the features of the Monitoring application and its related dashboards. These dashboards show pipeline monitoring, other job monitoring, and system health status within SAP Data Intelligence.
In Chapter 18, we discuss how SAP Data Intelligence supports the data security and compliance requirements of different organizations. Datasets from different source systems are processed and ingested into target systems. The administrator and the user therefore each have a role in ensuring that the right checks are in place to avoid exposing sensitive or personal information within the landscape. We also cover the security approach to adopt when rolling out SAP Data Intelligence within your organization. This includes user authentication setup, the associated policies, and secure connectivity to on-premises systems.
Chapter 19 discusses maintenance of SAP Data Intelligence. It covers the different operation modes of the system, the backup procedure, and the steps for memory extension when needed. Our discussion also covers the command-line interface for different maintenance tasks and its associated parameters. With it, you can schedule and automate some of the housekeeping activities in an SAP Data Intelligence system.
By this point, we will have covered the different SAP Data Intelligence applications and how to work with the system and its artifacts. This spans the initial installation through the maintenance activities that keep the system performing optimally.
Chapter 20 describes the components and approach involved in application lifecycle management within SAP Data Intelligence for AI/machine learning data science projects. These projects use semi-automated or fully automated activities driven by input parameters. We walk you through the Git and Jenkins setup and the integration activities needed to build a CI/CD process that leverages SAP Data Intelligence. We then introduce the relevant components within the SAP space and their contribution to the process. Building on that, we walk you through the different maturity levels of MLOps adoption. Finally, we wrap up with the steps for migrating models and data from SAP Leonardo Machine Learning Foundation to SAP Data Intelligence. These steps cover both the initial migration and ongoing business-as-usual (BAU) usage.
Last but not least, Chapter 21 explains how digital transformation is disrupting business models globally. It also shows how organizations across different industries can adapt the business content made available by SAP and partner vendors instead of starting projects from scratch. We share some of the benefits realized to help you build your business case. We also show how SAP Data Intelligence can be a one-stop shop for your data-driven platform requirements around data management, orchestration, monitoring, and AI/machine learning applications that consume enterprise data and generate insights.
Finally, in Chapter 22, we wrap up with what’s in store for upcoming SAP Data Intelligence releases and some recent innovations that readers should be aware of.
1.4 Conclusion
Reading this book will give you a comprehensive overview of the SAP Data Intelligence platform. It will enable you to discuss SAP Data Intelligence with your clients and within your organization. You will be able to showcase its use cases, its potential benefits, and its architecture, including sizing, deployment, and maintenance activities. This book covers SAP Data Intelligence from A to Z, along with what to consider when building your skill set for ongoing and future projects. Let’s start the journey with an overview of the intelligent enterprise and the role of SAP Data Intelligence.
About the Author
Mr. Dharma Teja Atluri is an Associate Partner, Executive Architect, Inventor and artificial intelligence/machine learning evangelist at IBM. He has more than 18 years of experience working in advanced analytics with both SAP and non-SAP product lines. He has provided strategic direction to clients globally regarding the adoption of SAP and non-SAP advanced analytics products for artificial intelligence/machine learning operationalization, data management, information management, and analytics. He has also carried out multiple platform comparison initiatives for reporting, ETL, data warehousing, and data science products across IBM, Microsoft Azure, Google, Amazon Web Services, and SAP. He has led the SAP analytics (reporting and enterprise information management) portfolio for IBM India, and designed client architectures for analytics with SAP and IBM capabilities. Dharma is an IBM master certified data scientist, architect, and technical specialist, and also an IBM thought leader certified consultant. His most recent SAP Data Intelligence sprint was featured for global consumption by clients and nominated for SAP Innovation Awards.
He can be contacted at:
LinkedIn: https://www.linkedin.com/in/dharma/
Twitter: https://twitter.com/darmateja
Skype: darma.teja
Credly badges: https://www.credly.com/users/dharma-teja-atluri/badges
SAP PRESS book page: https://www.sap-press.com/sap-data-intelligence_5369/