Integrate SAP with “anyCloud”
Ten years ago, our customers who were (and still are) running SAP asked me about this new thing that was suddenly appearing everywhere: Big Data. Then, five years ago, this turned into Data Lakes and Machine Learning. Now it’s all about cloud integration and, of course, AI. Sometimes the fundamental approach to data has changed or evolved. Sometimes it’s just a change in names. For example, over the last decade, Data Lakes morphed into Data Meshes, which, when you think about it, are little more than interconnected, slightly more sophisticated data lakes.
Managing Director & CTO DEV OWN
Still a big question: how do you integrate your SAP data?
One topic that was hot ten years ago is still hot: how do you actually integrate SAP data with (back then) your Hadoop cluster or (now) your corporate multi-cloud? Each hyperscaler has an out-of-the-box answer, but if you are an SAP customer, you will not be 100% happy with any of them. That's why we built our solution, SNP Glue.
In this new 2023 blog series I will dive deep into the challenges of integrating SAP with the cloud technologies of your choice and how we tackle them. There are good reasons why we built our own technology as middleware, ranging from delta capture across various SAP data sources to performance and flexibility.
In this first article of the series I will cover the background; in the subsequent articles I will go into detail about individual hyperscalers and what SNP Glue offers to integrate with their technologies.
Why would you want to integrate SAP with your cloud data warehouse or data lake?
The obvious answer today would be along the lines of “so that you can unleash the power of modern Artificial Intelligence on that data”…well duh...
However, to be fair, while that may be true in some way, the reality is often much simpler!
At SNP, we see two kinds of customer cases:
- The first is a single integration scenario, like a dedicated application to help users with a single aspect of the supply chain. Or a dashboard. Or to provide data to external auditors in a highly selective way.
- The second is a massive SAP integration where pretty much all SAP application tables from each production system need to be made available to a data lake (excuse me, data mesh) in the cloud.
Use cases are manifold, and include reporting, dashboarding, auditing, supply chain optimization, predictive maintenance, churn reduction, and many more. Clearly, implementing such use cases on a copy of the SAP data (where maybe some personal data is anonymized for GDPR reasons!) has many advantages, the most obvious being that you avoid any performance impact on SAP. Other advantages are that it's easier to blend data from multiple data sources and to use cloud-specific features for AI/ML.
You need power? SNP Glue has it.
Clearly, when a technology scales well enough for the second case, you can also cover the first one. However, looking at the massive scope of SAP’s ERP, the technology to integrate needs to be very powerful.
Such software needs to cover several different aspects and features. The obvious one is true scalability. SNP Glue can replicate tens of thousands of SAP tables from multiple SAP production systems to the cloud to make the data available for a wide range of use cases. This covers both the initial full load and the delta, ideally in a massively parallelized way. With SNP Glue, SNP customers can achieve exactly that, e.g. replicate 50,000 SAP tables from a long list of SAP systems (both ERP and BW) in near-real time to a data lake in the cloud.
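To make "massively parallelized" a bit more tangible, here is a minimal Python sketch of fanning an initial load out over a pool of workers. It is not how SNP Glue works internally; `replicate_tables` and the `load_table` callback are hypothetical names for illustration, and a real loader would read from SAP and write to the cloud target.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def replicate_tables(
    tables: Iterable[str],
    load_table: Callable[[str], int],
    max_workers: int = 8,
) -> Dict[str, int]:
    """Run load_table for every table name in parallel.

    load_table is a hypothetical callback that copies one table and
    returns the number of rows it moved. Returns rows moved per table.
    """
    names = list(tables)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so names and counts line up.
        counts = list(pool.map(load_table, names))
    return dict(zip(names, counts))
```

In this sketch a single failing table raises and aborts the whole run; at the scale of tens of thousands of tables you would rather collect errors per table and retry them.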
Without good CDC (Change Data Capture), i.e. delta replication, you would need to perform full loads periodically, and the data in the target would constantly be outdated. CDC can be achieved using different technologies for different SAP data sources, e.g. database triggers work well for ERP, but not for SAP BW.
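The difference between a full load and a delta can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not the SNP Glue implementation: the `Change` record and its "I"/"U"/"D" operation codes are hypothetical, and the target is just an in-memory dictionary keyed by the table's primary key.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple

Key = Tuple[str, ...]   # primary key values of one row
Row = Dict[str, str]    # column name -> value


@dataclass(frozen=True)
class Change:
    """One captured change (hypothetical format)."""
    op: str                      # "I" = insert, "U" = update, "D" = delete
    key: Key
    row: Row = field(default_factory=dict)  # empty for deletes


def apply_delta(target: Dict[Key, Row], changes: Iterable[Change]) -> None:
    """Apply captured changes, in order, to the replica in place."""
    for change in changes:
        if change.op in ("I", "U"):
            target[change.key] = change.row   # upsert the new row image
        elif change.op == "D":
            target.pop(change.key, None)      # tolerate already-missing rows
        else:
            raise ValueError(f"unknown operation: {change.op!r}")
```

Only the changed rows travel over the wire, so the replica can be refreshed far more often than with periodic full loads.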
To achieve good performance, you will want to aim at a very direct connection between your SAP data sources and the data store (e.g. Snowflake), keeping the chain of hops your data has to pass through as short as possible. By cutting out the middleman you also make the integration more robust, simply because there are fewer potential points of failure.
Depending on the source of data, the SAP module or application, and the scenario, you will want to distinguish between “Application integration” and “Data integration”. Application integration will be more event-driven on the source side (e.g. through SAP Business Events, SAP BAPIs), and event-broker-driven on the receiving end (e.g. Kafka). For data integration scenarios you will want to consider raw SAP tables to scale the integration solution and cover as much ground as possible.
SAP data – critical yet complex
SAP data tends to be “special”. This is the case on a technical level and on an application level. For example, once you have had to insert the decimal point into SAP amount fields based on a currency key that is not to be found in the same database table, you know what I mean by “technical difficulties”. An integration solution needs to not only scale and perform, but also be able to cover such technicalities (needless to say, SNP Glue does). Also, for any SAP integration solution, I would recommend not forgetting the need to integrate with data catalogues.
On the application level, the nature of SAP data differs heavily between structured data (e.g. SAP ERP tables), unstructured data (e.g. SAP archives or attachments to SAP postings), non-ERP data such as SAP BW Queries, or direct access to the output of SAP transaction codes. A good integration solution should be able to handle many of these data sources above and beyond mere tables.
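The decimal point problem can be shown with a short Python sketch. It assumes the common convention that amounts are stored as if every currency had two decimal places, while the real number of decimals depends on the currency (in SAP this is typically maintained in table TCURX). The dictionary below is a hand-written, hypothetical excerpt, and `to_real_amount` is an illustrative name, not an SNP Glue function.

```python
from decimal import Decimal

# Hypothetical excerpt of a currency -> decimal places lookup.
# Currencies that are not listed are assumed to have two decimals.
CURRENCY_DECIMALS = {"JPY": 0, "KWD": 3}

STORED_DECIMALS = 2  # assumed storage convention for amount fields


def to_real_amount(stored: Decimal, currency: str) -> Decimal:
    """Shift the decimal point of a stored amount to its real position."""
    decimals = CURRENCY_DECIMALS.get(currency, STORED_DECIMALS)
    # scaleb(n) multiplies by 10**n without floating-point rounding.
    return stored.scaleb(STORED_DECIMALS - decimals)


# A stored value of 1000.00 with currency key JPY means 100000 yen.
print(to_real_amount(Decimal("1000.00"), "JPY"))
```

The awkward part in practice is not the arithmetic but the lookup: the currency key often lives in a different table than the amount, so the extraction has to join the two before it can shift the decimal point.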
ETL to ELT
Finally, for massive scenarios, you will also want to move away from the age-old ETL paradigm (Extract-Transform-Load) to the ELT paradigm, where data is first moved (Extracted and Loaded) and only then transformed (i.e. filtered, cleaned, enriched, combined with other sources and more). However, that does not mean you won’t need any ETL features in the SAP integration solution. For example, masking some personal data should be possible directly during extraction (again, needless to say, SNP Glue does not only that, but also much more through the native SAP integration).
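As an illustration of a transformation that still belongs in the extraction step, here is a small Python sketch that replaces sensitive fields with a salted hash before a row leaves the source. The function names, the salt handling and the field names in the usage example are assumptions for illustration only; note also that a salted hash is pseudonymization rather than full anonymization.

```python
import hashlib
from typing import Dict, Iterable, Iterator, Set


def mask_value(value: str, salt: str) -> str:
    """Return a stable, shortened SHA-256 pseudonym for a value."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:16]


def extract_masked(
    rows: Iterable[Dict[str, str]],
    sensitive_fields: Set[str],
    salt: str,
) -> Iterator[Dict[str, str]]:
    """Yield rows with sensitive fields masked, all other fields untouched."""
    for row in rows:
        yield {
            name: mask_value(value, salt) if name in sensitive_fields else value
            for name, value in row.items()
        }
```

For example, `list(extract_masked(rows, {"NAME1"}, salt="s3cret"))` would keep a customer number usable for joins in the target while the name itself never leaves the source system in clear text.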
SNP Glue covers all kinds of data targets. Initially built to integrate SAP with Hadoop-based data lakes, our solution has grown to cover anyDB and anyCloud:
- anyDB, e.g. Oracle, SAP DB, SAP ASE, SAP IQ, Microsoft SQL Server, DB2, DB4, DB6, …
- anyCloud, i.e. AWS (e.g. Redshift), Azure (e.g. Azure SQL), GCP (e.g. BigQuery), and Snowflake
- anyTargets: the full range of data sources and data targets deserves blog posts of its own.
Lastly, some words about how to install and deploy SNP Glue. This is fairly simple: SNP Glue is deployed as an ABAP-based add-on, which works in all kinds of SAP deployments:
- classical on-premise SAP installations
- hosted SAP installations as well as private cloud-based SAP
- SAP Rise
In parallel, we are launching additional features that add native cloud-based deployment with a minimized SAP footprint, plus native non-SAP capabilities to tap into data sources beyond SAP NetWeaver, with great flexibility and scalability.