An Introduction to Big Data Engines and Frameworks for Building Machine Learning Data Pipelines
Data Engineers supply massive datasets to Data Scientists so they can build and train models that drive strong business outcomes.
Today’s Data Engineer builds not only data pipelines that support traditional data warehouses but also more technically demanding continuous data pipelines that feed Artificial Intelligence and Machine Learning applications.
Building cost-effective, fast, and reliable data pipelines, regardless of workload type or use case, is no small feat.
This white paper introduces common big data engines for building data pipelines and takes a deep dive into how these engines are used to explore and prepare data, build pipelines for batch and streaming data, orchestrate data pipelines, and deliver datasets to Machine Learning or Advanced Analytics applications.