Apache Airflow project
This page contains the details of a technical writing project accepted for Google Season of Docs.
Project summary
- Open source organization:
- Apache Airflow
- Technical writer:
- kartik khare
- Project name:
- How to create a workflow
- Project length:
- Standard length (3 months)
Project description
I'll be working on documentation explaining how to create new workflows easily and effectively. A workflow consists of the following steps:
- Read
- Preprocessing
- Processing
- Postprocessing
- Save/Action
- Monitoring
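The six steps above can be sketched as a simple pipeline. This is a minimal, framework-free illustration: the function names and the record format are hypothetical placeholders, and in Airflow each step would instead become a task in a DAG.

```python
# A minimal sketch of the six workflow steps chained in order.
# Function names and the record format are hypothetical placeholders;
# in Airflow each step would be a task in a DAG.

def read():
    # Pretend to read raw records from some source.
    return [" 1", "2 ", "3"]

def preprocess(records):
    # Clean up raw input, e.g. strip whitespace.
    return [r.strip() for r in records]

def process(records):
    # Core transformation: parse and double each value.
    return [int(r) * 2 for r in records]

def postprocess(values):
    # Derive a summary from the processed values.
    return {"values": values, "total": sum(values)}

def save(result):
    # Stand-in for persisting the result; here we just return it.
    return result

def monitor(result):
    # Simple sanity check that would feed a monitoring system.
    assert result["total"] == sum(result["values"])
    return result

result = monitor(save(postprocess(process(preprocess(read())))))
print(result)  # {'values': [2, 4, 6], 'total': 12}
```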
Each step can involve multiple tasks, and multiple actions can be taken after each step. For example, a job might need to be aborted if two or more tasks fail in a stage, or a task re-run if it has failed at least twice.
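The two policies just described could look like the following sketch. This is illustrative plain Python, not the Airflow API (Airflow expresses retries declaratively via a task's `retries` parameter); the thresholds and task names are assumptions.

```python
# Illustrative sketch (not Airflow API) of two failure policies:
# re-run a task until it has failed at least twice, and abort the
# whole stage when two or more tasks fail.

MAX_FAILURES = 2  # give up on a task after this many failures

def run_with_retries(task):
    failures = 0
    while failures < MAX_FAILURES:
        try:
            return True, task()
        except Exception:
            failures += 1
    return False, None  # exhausted retries

def run_stage(tasks):
    failed = 0
    results = {}
    for name, task in tasks.items():
        ok, value = run_with_retries(task)
        if ok:
            results[name] = value
        else:
            failed += 1
        if failed >= 2:
            raise RuntimeError("aborting stage: 2 or more tasks failed")
    return results

flaky_calls = {"count": 0}

def flaky():
    # Fails once, then succeeds — the retry policy absorbs it.
    flaky_calls["count"] += 1
    if flaky_calls["count"] == 1:
        raise ValueError("transient error")
    return "ok"

print(run_stage({"a": lambda: 1, "b": flaky}))  # {'a': 1, 'b': 'ok'}
```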
Another part of workflows is executing two or more jobs in parallel and then using their combined result in the next stage.
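This fan-out/fan-in pattern can be sketched with the standard library alone; the job bodies are made-up examples, and in Airflow the same shape would be two upstream tasks feeding one downstream task.

```python
# Sketch of fan-out/fan-in: two jobs run in parallel and the next
# stage consumes their combined result. Standard library only; the
# job bodies are hypothetical examples.
from concurrent.futures import ThreadPoolExecutor

def job_a():
    return [1, 2, 3]

def job_b():
    return [4, 5]

def combine_stage(parts):
    # The next stage works on the merged output of both jobs.
    merged = [x for part in parts for x in part]
    return sum(merged)

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(job_a), pool.submit(job_b)]
    parts = [f.result() for f in futures]

print(combine_stage(parts))  # 15
```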
Another aspect of a workflow is alerting the user when anything goes wrong, whether through email, Slack, or PagerDuty.
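A hedged sketch of that alerting idea: a callback fires when a task raises and fans the message out to the configured channels. The channel function here only records alerts rather than calling any real email, Slack, or PagerDuty API; Airflow exposes a similar hook via a task's `on_failure_callback`.

```python
# Hypothetical failure-alerting sketch. alert() is a stand-in for a
# real email/Slack/PagerDuty client; it only records what would be
# sent. Airflow offers a comparable hook via on_failure_callback.
sent_alerts = []

def alert(channel, message):
    # Stand-in for a real notification client.
    sent_alerts.append((channel, message))

def run_task(task, name, channels=("email", "slack")):
    try:
        return task()
    except Exception as exc:
        # Notify every configured channel, then re-raise.
        for channel in channels:
            alert(channel, f"task {name} failed: {exc}")
        raise

def broken():
    raise RuntimeError("disk full")

try:
    run_task(broken, "save_results")
except RuntimeError:
    pass

print(sent_alerts)
# [('email', 'task save_results failed: disk full'),
#  ('slack', 'task save_results failed: disk full')]
```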
I also plan to include some non-trivial ways workflows can be used, such as running real-time streaming jobs and restarting them when data is missing in downstream Kafka topics.
I'll be working with mentors to refine the scope of the project and then complete the tasks from there.
Looking forward to an amazing few months ahead.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-07-25 (UTC).