Big Data

Integration

ETL (Extract Transform Load) is the ability to extract data, transform it, and then load it to a database.

Any data integration solution must be able to collect, transform, and distribute large volumes of data and support data structures that range from simple to highly complex. A successful business needs to manage data collection, transformation, and delivery with agility, in a way that supports its specific requirements. Bulk data delivery is a very successful method for providing data, although it’s certainly not the only one. There are several ways of managing bulk data delivery, including:

  • ETL (Extract Transform Load) – the ability to extract data, transform it, and then load it to a database.
  • ELT (Extract Load Transform) – the ability to extract data, load it and then transform it inside a database engine. ELT is also sometimes referred to as ETL Pushdown.
  • TELT (Transform Extract Load Transform) – the ability to transform and extract data at the source, load it and then transform it inside a database engine.
  • TETLT (Transform Extract Transform Load Transform) – the ability to transform and extract data at the source, transform it again within the data integration engine, and then load it and transform it inside a database engine.
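
The difference between the first two styles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the target database engine, and the sample rows, table names, and function names are hypothetical. In the ETL path the cleanup runs in the integration code before loading; in the ELT path the raw rows are loaded first and the same cleanup is expressed as SQL that runs inside the database.

```python
import sqlite3

# Hypothetical source rows: (customer name, amount as text).
SOURCE_ROWS = [(" alice ", "10.50"), ("BOB", "3"), ("carol", "7.25")]


def run_etl(conn, rows):
    """ETL: transform in the integration layer, then load the finished rows."""
    cleaned = [(name.strip().title(), float(amount)) for name, amount in rows]
    conn.execute("CREATE TABLE sales_etl (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", cleaned)


def run_elt(conn, rows):
    """ELT: load the raw rows first, then transform with SQL in the database."""
    conn.execute("CREATE TABLE staging (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO staging VALUES (?, ?)", rows)
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT UPPER(SUBSTR(TRIM(customer), 1, 1))
               || LOWER(SUBSTR(TRIM(customer), 2)) AS customer,
               CAST(amount AS REAL) AS amount
        FROM staging
        """
    )


def compare():
    """Run both styles against the same rows and return both result sets."""
    conn = sqlite3.connect(":memory:")
    run_etl(conn, SOURCE_ROWS)
    run_elt(conn, SOURCE_ROWS)
    query = "SELECT customer, amount FROM {} ORDER BY customer"
    etl = conn.execute(query.format("sales_etl")).fetchall()
    elt = conn.execute(query.format("sales_elt")).fetchall()
    conn.close()
    return etl, elt
```

Both paths produce the same rows here; what changes is where the transformation work is done, which is the trade-off the styles above are named for.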

ELT (ETL Pushdown) is not efficient in all circumstances, because it has some limitations, which are more pronounced when dealing with big data. For example, it is not an appropriate option when:

  • Data is not stored in relational tables.
  • Data has not been cleansed.
  • Data must be integrated from heterogeneous data stores (SAP, flat files, etc.).

Additionally, some integration logic, such as data cleansing, data profiling, and changed-data capture, to name just a few, simply cannot be pushed into the database. Nor will the database always run faster than a fully scalable (MPP-based) ETL engine: for some data integration processes the database will be faster, while for others it will be much slower.
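
As a rough illustration of logic that runs in the integration layer before anything reaches a database, the sketch below merges records from two heterogeneous sources and cleanses them. Everything in it is an assumption made for the example: the flat-file and JSON samples, the field names, and the cleansing rules (normalize the email, drop records without one, keep the first record per id).

```python
import csv
import io
import json

# Hypothetical inputs: a flat file and a JSON export describing the same entity.
FLAT_FILE = "id,email\n1, Ann@Example.com \n2,\n"
JSON_EXPORT = (
    '[{"id": 3, "email": "bo@example.com"},'
    ' {"id": 1, "email": "ann@example.com"}]'
)


def read_flat_file(text):
    """Yield records from delimited text."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"id": int(row["id"]), "email": row["email"]}


def read_json(text):
    """Yield records from a JSON array."""
    for item in json.loads(text):
        yield {"id": int(item["id"]), "email": item["email"]}


def cleanse(records):
    """Normalize emails, drop records without one, keep the first per id."""
    seen = set()
    for rec in records:
        email = (rec["email"] or "").strip().lower()
        if not email or rec["id"] in seen:
            continue
        seen.add(rec["id"])
        yield {"id": rec["id"], "email": email}


def integrate():
    """Combine both sources, then cleanse the combined stream."""
    combined = list(read_flat_file(FLAT_FILE)) + list(read_json(JSON_EXPORT))
    return list(cleanse(combined))
```

Only the cleansed output of integrate() would then be loaded, so the target database never has to parse the flat file or the JSON export itself.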

A data integration solution should therefore provide:

  • Flexibility – In this era of Big Data, it’s important to have a data integration solution that supports all of the bulk data integration styles, not just one. As data requirements change, and then change again, organizations need options that help them keep up with their business requirements. This flexibility is important to any business, but it adds even greater value as organizations decide how, for example, they want to manage data transformation and processing with big data platforms.
  • Scalability & performance – It is not enough simply to be able to perform ETL (or ELT, etc.). Organizations need a high-performance data integration platform that can scale up and down (and then back up again) seamlessly, without disruption to the business, whether they are supporting batch or real-time data requirements.
  • Productivity – Whatever the method for data transformation and delivery, developers still have work to do. Businesses need a data integration solution that provides transformation components, including prebuilt objects that act on data to satisfy simple and complex data integration requirements.
  • Extensive enterprise connectivity – Successful enterprise-class information integration requires access to a full range of data sources (as well as targets and applications), whether structured, semi-structured or unstructured, within and outside of the enterprise.
  • Even more flexibility – There are use cases for which businesses need to augment their bulk data delivery strategy with federated or incremental (and real-time) data. Businesses need a data integration platform that can adapt to include and manage these requirements as well.
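
Incremental delivery, mentioned in the last point, is often implemented with a "high-water mark": each run picks up only the rows changed since the previous run. The sketch below is one minimal way to express that idea, not a description of any particular product. The function name, the updated_at field, and the use of ISO date strings (which sort correctly as plain text) are all assumptions made for the example.

```python
def incremental_batch(rows, last_watermark):
    """Return the rows changed after last_watermark, plus the new watermark.

    Assumes every row carries a hypothetical "updated_at" ISO date string.
    """
    changed = [r for r in rows if r["updated_at"] > last_watermark]
    # If nothing changed, the watermark stays where it was.
    new_watermark = max(
        (r["updated_at"] for r in changed), default=last_watermark
    )
    return changed, new_watermark


# Hypothetical source table contents.
rows = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-01-03"},
    {"id": 3, "updated_at": "2024-01-05"},
]
changed, watermark = incremental_batch(rows, "2024-01-02")
```

Storing the returned watermark and passing it to the next run means a later call with no new changes returns an empty batch, so the bulk load does not have to be repeated.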

IBM’s enterprise-class data integration solution is MPP-based, providing a high degree of performance, scalability, and flexibility. It also provides prebuilt transformation components and extensive enterprise connectivity to support varied data integration requirements.

Learn more about ETL and IBM:

  • Data integration.
  • InfoSphere DataStage.
  • InfoSphere Information Server.
