Apache Airflow 2.10.0 introduces several enhancements, including dynamic datasets via the new DatasetAlias class, a refreshed UI with dark mode, the ability to run multiple executors side by side, broader OpenLineage coverage, and OpenTelemetry tracing. Deferrable operators can now start execution directly from the triggerer, reducing latency and cost, and security hardening such as a CSRF-protected logout endpoint rounds out the release. With these improvements, Airflow 2.10.0 is set to offer more robust and efficient workflow management for modern data engineering needs.
Table of Contents
- Introduction
- Dataset Enhancements: A Leap Forward in Flexibility
- User Interface Improvements: A New Look and Feel
- Lineage Enhancements: Greater Insights with OpenLineage
- Multiple Executor Configuration: The Best of Both Worlds
- Airflow 2.10: Additional Noteworthy Features
- Conclusion
- FAQs
- Q1. Which new feature in Apache Airflow 2.10 is the most noteworthy?
- Q2. How does Airflow 2.10 improve the user interface?
- Q3. What improvements have been made to lineage tracking in this release?
- Q4. Can I use multiple executors in Airflow 2.10?
- Q5. How has task instance history retention changed in this release?
- Q6. Does Airflow 2.10 offer any enhancements for error logging?
- Learn more about related or other topics
Introduction
The Apache Airflow community excitedly awaits the introduction of the newest version every few months, and with good reason. This update brings a slew of new features, enhancements, and bug fixes that push the boundaries of what’s possible with this powerful orchestration tool. The release of Apache Airflow 2.10 is no exception, offering a remarkable 40+ new features, over 80 improvements, and more than 40 bug fixes. But what truly sets this release apart is its emphasis on flexibility and user experience, especially with the enhancements made to the widely used dataset feature.
Dataset Enhancements: A Leap Forward in Flexibility
Datasets have been a cornerstone of Airflow’s functionality since their introduction in version 2.4, allowing Directed Acyclic Graphs (DAGs) that interact with the same data to have clear, visible relationships. This capability has proven invaluable for implementing use cases in MLOps and GenAI, where data-driven decisions are paramount. With the release of Airflow 2.10, datasets are enhanced in terms of flexibility and manageability.
Dynamic Dataset Definition
One of the standout features is the Dynamic Dataset Definition. Previously, datasets were static, with inlets and outlets defined at DAG parsing time. While this approach ensured well-formed dataset URIs, it lacked the flexibility to handle dynamic scenarios where dataset information becomes available only during task execution. Enter the new DatasetAlias class in Airflow 2.10, which allows for dynamic resolution of datasets at runtime. This means you can now define downstream schedules or inlets without knowing the exact dataset name in advance, offering unparalleled flexibility.
Add Metadata to Dataset Events
Another exciting update is the ability to Add Metadata to Dataset Events. With the introduction of metadata, you can now attach additional context to dataset events, such as the number of records processed or a new model accuracy score. This metadata can be leveraged by downstream tasks, enabling more informed and efficient workflows.
Dataset UI
To complement these enhancements, the Dataset UI has received a significant overhaul. The datasets page now focuses on dataset events, providing richer information and a more intuitive interface. The updated UI also includes separate tabs for the dependency graph and the list of all datasets, making navigation cleaner and more straightforward.
User Interface Improvements: A New Look and Feel
The Dark Mode
Speaking of UI improvements, Airflow 2.10 introduces several user-friendly features that enhance the overall experience. One of the most anticipated additions is the Dark Mode. With a simple toggle, users can now switch between light and dark themes, catering to their personal preferences or work environments.
Reparse DAGs on Demand
The UI also gains a new button to Reparse DAGs on Demand, thanks to the addition of a DAG reparsing endpoint to the API. This feature streamlines the process of updating and troubleshooting DAGs, saving users valuable time.
Visibility Improvements
Visibility improvements are another key highlight. Airflow 2.10 now displays task failed dependencies more clearly on the details page, and the XCom display has been revamped for better readability with a proper JSON react view.
Lineage Enhancements: Greater Insights with OpenLineage
Lineage enhancements mark another crucial improvement in this release, closing long-standing gaps in Airflow’s lineage capabilities and offering real-world benefits to users.
OpenLineage
Data lineage is critical for understanding data flows, ensuring compliance, and troubleshooting. The integration with OpenLineage, the industry-standard framework for data lineage, has been a vital feature of Airflow for some time. However, earlier versions had limitations, particularly with the widely used PythonOperator.
Airflow 2.10 addresses this with AIP 62, which introduces instrumentation to capture lineage information from key operators like PythonOperator, as well as the TaskFlow API and the Object Storage API. This enhancement closes significant gaps in lineage tracking, offering users deeper insights into their data ecosystems.
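Wiring Airflow up to an OpenLineage backend is done through the OpenLineage provider's configuration; a hedged sketch (the namespace and URL are placeholders, and the exact keys depend on your installed apache-airflow-providers-openlineage version):

```ini
# airflow.cfg -- illustrative OpenLineage provider settings
[openlineage]
namespace = my_airflow_instance
transport = {"type": "http", "url": "http://localhost:5000"}
```

Once a transport is configured, lineage events emitted by the instrumented operators flow to the backend without further per-DAG changes.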
Multiple Executor Configuration: The Best of Both Worlds
For those grappling with executor choices, Airflow 2.10 introduces the ability to configure multiple executors concurrently. This feature allows users to assign specific tasks to the executor that best optimizes resource utilization and meets custom execution requirements.
Choosing the right executor is a critical decision when setting up an Airflow instance, as it impacts performance, isolation, and resource utilization. Previously, users had to commit to a single executor, which often meant accepting trade-offs. With Airflow 2.10, that’s no longer the case.
The new Multiple Executor Configuration allows users to configure more than one executor simultaneously. This means you can assign tasks to the most suitable executor based on specific requirements, optimizing performance and resource management.
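A hypothetical hybrid setup might look like this (the executor pairing is illustrative; the first entry in the list acts as the environment's default):

```ini
# airflow.cfg -- run two executors side by side
[core]
executor = LocalExecutor,KubernetesExecutor
```

Individual tasks can then opt out of the default by naming an executor on the operator, e.g. `BashOperator(task_id="heavy_job", bash_command="...", executor="KubernetesExecutor")`, so lightweight tasks stay local while resource-hungry ones get their own pods.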
Airflow 2.10: Additional Noteworthy Features
Airflow 2.10 is packed with other notable updates that cater to a broad range of use cases:
Deferrable Operators
Deferrable operators can now start execution directly from the triggerer, bypassing the worker. This is especially efficient for operators like sensors, reducing latency and costs.
Task Instance History
Task instance history is now retained for all attempts, providing detailed information that supports DAG versioning—a feature to look forward to in future releases.
Executor Logs
Important executor logs are now integrated into task logs, making debugging easier by centralizing error messages.
Improved Try Number Handling
The try_number is now determined when a task is scheduled, rather than at the beginning of task execution. This change resolves issues with incorrect try numbers during task resumption or deferral. The try number remains constant during task execution and is only incremented for new tries.
Enhanced Logout Security
The /logout endpoint in the FAB Auth Manager now uses the POST method instead of GET and includes CSRF protection. This change applies to all existing AuthViews, improving security and preventing unauthorized logouts.
OpenTelemetry Tracing
Airflow 2.10 introduces the ability to emit OpenTelemetry traces for both Airflow system components (scheduler, triggerer, executor, processor) and DAG runs. This feature complements the existing OpenTelemetry metrics support, providing richer observability data.
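Enabling trace emission is a configuration change; a hedged sketch (host and port are placeholders for your OpenTelemetry collector):

```ini
# airflow.cfg -- illustrative OpenTelemetry trace settings
[traces]
otel_on = True
otel_host = localhost
otel_port = 8889
```

With this enabled, spans for scheduler activity and DAG runs appear alongside the metrics Airflow already exports, giving a fuller picture in your observability stack.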
New Task Flow Decorators
The release introduces @skip_if and @run_if decorators for the TaskFlow API. These decorators simplify the process of conditionally skipping or running tasks, enhancing the flexibility of workflow definitions.
These features further demonstrate Airflow 2.10’s focus on improving security, observability, and ease of use. The changes to try number handling and logout security address important edge cases and potential vulnerabilities. The addition of OpenTelemetry tracing capabilities significantly enhances Airflow’s observability features, while the new Task Flow decorators provide developers with more intuitive ways to control task execution.
Conclusion
Apache Airflow 2.10 is a significant milestone in the evolution of this powerful orchestration tool. Whether you’re a data engineer, a machine learning practitioner, or a DevOps professional, this release offers something to enhance your workflows. From dynamic datasets to UI improvements and lineage tracking, Airflow 2.10 is poised to be a game-changer in the way we manage and orchestrate complex data pipelines. Don’t miss out—dive into the new features and see how they can transform your projects today.
FAQs
Q1. Which new feature in Apache Airflow 2.10 is the most noteworthy?
The introduction of DatasetAlias for dynamic dataset definition is one of the most significant features, allowing for more flexible dataset handling and runtime resolution.
Q2. How does Airflow 2.10 improve the user interface?
Airflow 2.10 introduces a dark mode option, refreshes the datasets page with a focus on events, and adds convenient features like a button to reparse DAGs on demand.
Q3. What improvements have been made to lineage tracking in this release?
AIP 62 has added instrumentation to gather lineage information from important hooks, allowing popular operators like PythonOperator to emit lineage data.
Q4. Can I use multiple executors in Airflow 2.10?
Yes, Airflow 2.10 supports configuring multiple executors concurrently, allowing you to assign tasks to different executors based on specific requirements.
Q5. How has task instance history retention changed in this release?
Task instance history is now kept for all task instance tries, not just the most recent attempt, providing more comprehensive historical data.
Q6. Does Airflow 2.10 offer any enhancements for error logging?
Yes, important executor logs are now sent to the task logs, making it easier to debug issues when executors fail to start tasks.
Learn more about related or other topics
- Airflow 2.10.0 Full release details
- What’s new in Apache Airflow 2.9.0