Learn more about the talk and download the slides at
Sign up for our newsletter so you won't miss updates about the next Crunch Data Conference:
We introduce Dagster, an open-source Python library for building ETL processes, ML pipelines, and similar software systems, all of which we call data applications.
Data applications are graphs of functional computations that consume and produce data assets.
Dagster provides abstractions for modeling the semantics of these applications: a unified type system, a data dependency graph, a configuration system, and a structured API for emitting events such as data quality tests and materializations. On top of those abstractions it offers high-quality developer tools.
Builders can use the tools they already know (for example, Spark jobs for data engineers, SQL statements for analysts, and Python for data scientists), and the resulting application can be deployed in a pluggable fashion to arbitrary orchestration engines, such as Airflow, Dask, or Kubernetes-based execution.
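The abstract describes a data application as a graph of functional computations with typed outputs, per-computation configuration, structured events, and a pluggable execution engine. The plain-Python sketch below illustrates those ideas under stated assumptions. It does not use or reproduce Dagster's API, and every name in it (`Op`, `run_graph`, `in_process`, `thread_pool_executor`) is hypothetical. It needs Python 3.9+ for `graphlib`.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from functools import partial
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical names throughout: Op, run_graph, in_process and
# thread_pool_executor are illustrative only and are NOT Dagster's API.

Executor = Callable[[Callable[[], Any]], Any]  # runs a zero-argument computation
Event = Tuple[str, str]                        # (op name, event kind)


@dataclass
class Op:
    """One functional computation in the graph."""
    name: str
    fn: Callable[..., Any]
    inputs: List[str] = field(default_factory=list)  # names of upstream ops
    output_type: type = object                       # checked after each run


def in_process(thunk: Callable[[], Any]) -> Any:
    """Default executor: run the computation in the current thread."""
    return thunk()


def thread_pool_executor(pool: ThreadPoolExecutor) -> Executor:
    """Alternative executor: run each computation on a worker thread."""
    return lambda thunk: pool.submit(thunk).result()


def run_graph(
    ops: List[Op],
    config: Dict[str, Dict[str, Any]],
    executor: Executor = in_process,
) -> Tuple[Dict[str, Any], List[Event]]:
    by_name = {op.name: op for op in ops}
    # Data dependency graph: each op maps to the set of ops it consumes from.
    graph = {op.name: set(op.inputs) for op in ops}
    results: Dict[str, Any] = {}
    events: List[Event] = []
    for name in TopologicalSorter(graph).static_order():
        op = by_name[name]
        args = [results[dep] for dep in op.inputs]  # upstream data assets
        kwargs = config.get(name, {})               # per-op configuration
        value = executor(partial(op.fn, *args, **kwargs))
        # Minimal stand-in for a type system: check the declared output type.
        if not isinstance(value, op.output_type):
            raise TypeError(f"{name} returned {type(value).__name__}, "
                            f"expected {op.output_type.__name__}")
        # Structured events, loosely modeled on "data quality tests and materializations".
        events.append((name, "type_check_passed"))
        events.append((name, "materialization"))
        results[name] = value
    return results, events


if __name__ == "__main__":
    etl = [
        Op("extract", lambda rows: list(range(rows)), output_type=list),
        Op("transform", lambda xs, factor: [x * factor for x in xs],
           inputs=["extract"], output_type=list),
        Op("load", sum, inputs=["transform"], output_type=int),
    ]
    cfg = {"extract": {"rows": 4}, "transform": {"factor": 10}}
    results, events = run_graph(etl, cfg)
    print(results["load"])  # 60
    # Same graph, different execution engine.
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(run_graph(etl, cfg, executor=thread_pool_executor(pool))[0]["load"])
```

Swapping `executor` changes where each computation runs without touching the graph definition, which is the sense of "pluggable" this sketch aims at. A real orchestrator such as Airflow or Dask would also handle scheduling, retries, and parallelism across independent branches, none of which this sketch attempts.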
Captured live on Ustream at
Source: rutube.ru