
What’s New in Airflow 2.10? Everything You Need to Know


Apache Airflow 2.10.0 introduces several enhancements, including dynamic dataset definition with the new DatasetAlias class, the ability to attach metadata to dataset events, and a refreshed datasets UI. The release also adds a long-awaited dark mode, OpenLineage coverage for popular operators such as PythonOperator, OpenTelemetry tracing for system components and DAG runs, and support for running multiple executors concurrently. Deferrable operators can now start execution directly from the triggerer, reducing latency and cost for sensors. With these improvements, Airflow 2.10.0 is set to offer more flexible and efficient workflow management for modern data engineering needs.


Introduction

Every few months, the Apache Airflow community eagerly awaits a new release, and with good reason. Each update brings a slew of new features, enhancements, and bug fixes that push the boundaries of what’s possible with this powerful orchestration tool. Apache Airflow 2.10 is no exception, offering a remarkable 40+ new features, over 80 improvements, and more than 40 bug fixes. What truly sets this release apart, though, is its emphasis on flexibility and user experience, especially the enhancements made to the widely used dataset feature.

Apache Airflow 2.10: New Features Introduction

Dataset Enhancements: A Leap Forward in Flexibility

Datasets have been a cornerstone of Airflow’s functionality since their introduction in version 2.4, allowing Directed Acyclic Graphs (DAGs) that interact with the same data to have clear, visible relationships. This capability has proven invaluable for implementing use cases in MLOps and GenAI, where data-driven decisions are paramount. With the release of Airflow 2.10, datasets are enhanced in terms of flexibility and manageability.

Dynamic Dataset Definition

One of the standout features is the Dynamic Dataset Definition. Previously, datasets were static, with inlets and outlets defined during DAG parsing time. While this approach ensured well-formed dataset URIs, it lacked the flexibility to handle dynamic scenarios where dataset information becomes available only during task execution. Enter the new DatasetAlias class in Airflow 2.10, which allows for dynamic resolution of datasets at runtime. This means you can now define downstream schedules or inlets without knowing the exact dataset name in advance, offering unparalleled flexibility.
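As a minimal sketch of the pattern (the alias name, bucket, and task are hypothetical; the `DatasetAlias` and `outlet_events` usage follows the 2.10 release notes), a producer task can attach a dataset to an alias at runtime, and a downstream DAG can schedule on the alias without ever knowing the concrete URI:

```python
import pendulum

from airflow.datasets import Dataset, DatasetAlias
from airflow.decorators import dag, task


@dag(start_date=pendulum.datetime(2024, 8, 1), schedule=None, catchup=False)
def producer():
    @task(outlets=[DatasetAlias("daily_export")])
    def export(*, outlet_events):
        # The concrete dataset URI is only known here, at execution time.
        uri = "s3://my-bucket/exports/2024-08-01"  # hypothetical path
        outlet_events[DatasetAlias("daily_export")].add(Dataset(uri))

    export()


producer()


# The consumer schedules on the alias, not on a hard-coded dataset URI.
@dag(
    schedule=[DatasetAlias("daily_export")],
    start_date=pendulum.datetime(2024, 8, 1),
    catchup=False,
)
def consumer():
    ...


consumer()
```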

Add Metadata to Dataset Events

Another exciting update is the ability to Add Metadata to Dataset Events. With the introduction of metadata, you can now attach additional context to dataset events, such as the number of records processed or a new model accuracy score. This metadata can be leveraged by downstream tasks, enabling more informed and efficient workflows.
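One documented way to attach this metadata is to yield a `Metadata` object from a TaskFlow task; in the sketch below, the dataset URI and the keys (`row_count`, `accuracy`) are illustrative placeholders:

```python
from airflow.datasets import Dataset
from airflow.datasets.metadata import Metadata
from airflow.decorators import task

example_ds = Dataset("s3://my-bucket/processed")  # hypothetical dataset


@task(outlets=[example_ds])
def transform():
    records = 1000  # stand-in for real processing work
    # Attach extra context to the dataset event for downstream consumers,
    # which can read it from the triggering event's "extra" field.
    yield Metadata(example_ds, {"row_count": records, "accuracy": 0.91})
```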

Dataset UI

To complement these enhancements, the Dataset UI has received a significant overhaul. The datasets page now focuses on dataset events, providing richer information and a more intuitive interface. The updated UI also includes separate tabs for the dependency graph and the list of all datasets, making navigation cleaner and more straightforward.

User Interface Improvements: A New Look and Feel

The Dark Mode

Speaking of UI improvements, Airflow 2.10 introduces several user-friendly features that enhance the overall experience. One of the most anticipated additions is the Dark Mode. With a simple toggle, users can now switch between light and dark themes, catering to their personal preferences or work environments.

Reparse DAGs on Demand

The UI also gains a new button to Reparse DAGs on Demand, thanks to the addition of a DAG reparsing endpoint to the API. This feature streamlines the process of updating and troubleshooting DAGs, saving users valuable time.

Visibility improvements

Visibility improvements are another key highlight. Airflow 2.10 now displays a task's failed dependencies more clearly on the details page, and the XCom display has been revamped for better readability with a proper JSON React view.

Lineage Enhancements: Greater Insights with OpenLineage

Lineage enhancements mark another crucial improvement. With the implementation of AIP 62, popular operators like PythonOperator can now emit lineage information, closing gaps in Airflow’s lineage capabilities and offering real-world benefits to users.

OpenLineage

Data lineage is critical for understanding data flows, ensuring compliance, and troubleshooting. The integration with OpenLineage, the industry-standard framework for data lineage, has been a vital feature of Airflow for some time. However, earlier versions had limitations, particularly with the widely used PythonOperator.

Airflow 2.10 addresses this with AIP 62, which introduces instrumentation to capture lineage information from key operators like PythonOperator, TaskFlow API, and Object Storage API. This enhancement closes significant gaps in lineage tracking, offering users deeper insights into their data ecosystems.
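Lineage emission itself comes from the `apache-airflow-providers-openlineage` provider, which is typically pointed at a collector such as Marquez. A rough configuration sketch, with a placeholder namespace and URL:

```shell
# Configure the OpenLineage provider via Airflow's standard
# AIRFLOW__SECTION__KEY environment variables (values are examples).
export AIRFLOW__OPENLINEAGE__NAMESPACE="my_pipelines"
export AIRFLOW__OPENLINEAGE__TRANSPORT='{"type": "http", "url": "http://localhost:5000", "endpoint": "api/v1/lineage"}'
```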

Multiple Executor Configuration: The Best of Both Worlds

For those grappling with executor choices, Airflow 2.10 introduces the ability to configure multiple executors concurrently. This feature allows users to assign specific tasks to the executor that best optimizes resource utilization and meets custom execution requirements.

Choosing the right executor is a critical decision when setting up an Airflow instance, as it impacts performance, isolation, and resource utilization. Previously, users had to settle on a single executor, which often meant accepting trade-offs. With Airflow 2.10, that’s no longer the case.

The new Multiple Executor Configuration allows users to configure more than one executor simultaneously. This means you can assign tasks to the most suitable executor based on specific requirements, optimizing performance and resource management.
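A sketch of how this might look, assuming a deployment that configures both LocalExecutor and KubernetesExecutor (the task names are hypothetical; the comma-separated `[core] executor` setting and the per-task `executor` argument follow the 2.10 release notes):

```python
from airflow.decorators import task

# airflow.cfg (the first executor listed acts as the default):
#   [core]
#   executor = LocalExecutor,KubernetesExecutor


@task
def quick_check():
    # No executor specified: runs on the default (LocalExecutor here).
    print("cheap, fast task")


@task(executor="KubernetesExecutor")
def heavy_transform():
    # Pinned to the executor best suited for isolated, resource-hungry work.
    print("resource-intensive task")
```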

Airflow 2.10: Additional Noteworthy Features

Airflow 2.10 is packed with other notable updates that cater to a broad range of use cases:

Deferrable Operators

Deferrable operators can now start execution directly from the triggerer, bypassing the worker. This is especially efficient for operators like sensors, reducing latency and costs.
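A sketch of the class-level hooks involved, based on the `start_from_trigger` and `StartTriggerArgs` additions described in the 2.10 release notes (the sensor itself and its timings are hypothetical):

```python
from datetime import timedelta

from airflow.sensors.base import BaseSensorOperator
from airflow.triggers.base import StartTriggerArgs


class WaitForWindow(BaseSensorOperator):
    """Hypothetical sensor that defers immediately, never occupying a worker."""

    # Tell the scheduler to hand this task straight to the triggerer,
    # skipping the initial worker slot entirely.
    start_from_trigger = True
    start_trigger_args = StartTriggerArgs(
        trigger_cls="airflow.triggers.temporal.TimeDeltaTrigger",
        trigger_kwargs={"delta": timedelta(minutes=30)},
        next_method="execute_complete",
        next_kwargs=None,
        timeout=None,
    )

    def execute_complete(self, context, event=None):
        # Resumes on a worker only after the trigger fires.
        return None
```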

Task Instance History

Task instance history is now retained for all attempts, not just the most recent one, providing detailed information that lays the groundwork for DAG versioning, a feature to look forward to in future releases.

Executor Logs

Important executor logs are now integrated into task logs, making debugging easier by centralizing error messages.

Improved Try Number Handling

The try_number is now determined when a task is scheduled, rather than at the beginning of task execution. This change resolves issues with incorrect try numbers during task resumption or deferral. The try number remains constant during task execution and is only incremented for new tries.

Enhanced Logout Security

The /logout endpoint in the FAB Auth Manager now uses the POST method instead of GET and includes CSRF protection. This change applies to all existing AuthViews, improving security and preventing unauthorized logouts.

OpenTelemetry Tracing

Airflow 2.10 introduces the ability to emit OpenTelemetry traces for both Airflow system components (scheduler, triggerer, executor, processor) and DAG runs. This feature complements the existing OpenTelemetry metrics support, providing richer observability data.
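Trace emission is controlled through Airflow's configuration; as a rough sketch (the collector host and port are placeholders, and the `[traces]` option names are taken from the 2.10 configuration reference):

```shell
# Send spans to a local OpenTelemetry collector (values are examples).
export AIRFLOW__TRACES__OTEL_ON="True"
export AIRFLOW__TRACES__OTEL_HOST="localhost"
export AIRFLOW__TRACES__OTEL_PORT="4318"
```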

New Task Flow Decorators

The release introduces @skip_if and @run_if decorators for the Task Flow API. These decorators simplify the process of conditionally skipping or running tasks, enhancing the flexibility of workflow definitions.
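Usage looks roughly like the sketch below; the condition callables receive the task context, and the DAG params checked here (`env`, `dry_run`) are hypothetical:

```python
from airflow.decorators import task


@task.run_if(lambda context: context["params"].get("env") == "prod")
@task.python
def notify_on_call():
    # Runs only when the DAG is triggered with params={"env": "prod"}.
    print("paging on-call")


@task.skip_if(lambda context: context["params"].get("dry_run", False))
@task.python
def load_warehouse():
    # Skipped entirely when the run is flagged as a dry run.
    print("loading data")
```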

These features further demonstrate Airflow 2.10’s focus on improving security, observability, and ease of use. The changes to try number handling and logout security address important edge cases and potential vulnerabilities. The addition of OpenTelemetry tracing capabilities significantly enhances Airflow’s observability features, while the new Task Flow decorators provide developers with more intuitive ways to control task execution.

Conclusion

Apache Airflow 2.10 is a significant milestone in the evolution of this powerful orchestration tool. Whether you’re a data engineer, a machine learning practitioner, or a DevOps professional, this release offers something to enhance your workflows. From dynamic datasets to UI improvements and lineage tracking, Airflow 2.10 is poised to be a game-changer in the way we manage and orchestrate complex data pipelines. Don’t miss out: dive into the new features and see how they can transform your projects today.

FAQs

Q1. Which new feature in Apache Airflow 2.10 is the most noteworthy?

The introduction of DatasetAlias for dynamic dataset definition is one of the most significant features, allowing for more flexible dataset handling and runtime resolution.

Q2. How does Airflow 2.10 improve the user interface?

Airflow 2.10 introduces a dark mode option, refreshes the datasets page with a focus on events, and adds convenient features like a button to reparse DAGs on demand.

Q3. What improvements have been made to lineage tracking in this release?

AIP 62 has added instrumentation to gather lineage information from important hooks, allowing popular operators like PythonOperator to emit lineage data.

Q4. Can I use multiple executors in Airflow 2.10?

Yes, Airflow 2.10 supports configuring multiple executors concurrently, allowing you to assign tasks to different executors based on specific requirements.

Q5. How has task instance history retention changed in this release?

Task instance history is now kept for all task instance tries, not just the most recent attempt, providing more comprehensive historical data.

Q6. Does Airflow 2.10 offer any enhancements for error logging?

Yes, important executor logs are now sent to the task logs, making it easier to debug issues when executors fail to start tasks.
