Data Engineering Digest, September 2025
Hello Data Engineers,
Welcome back to another edition of the data engineering digest - a monthly newsletter containing updates, inspiration, and insights from the community.
Here are a few things that happened this month in the community:
Rumors swirl: is Fivetran about to acquire dbt?
Microsoft Fabric under fire - how it stacks up against open-source alternatives
What makes a “good” data warehouse? The community weighs in
Fivetran doubles down with another acquisition (SQLMesh)
Uber open-sources Starlark Worker for Cadence workflows
Cloudflare launches a full data platform: ingest, store, and query at scale
Community Discussions
Here are the top posts you may have missed:
1. Fivetran to buy dbt? Spill the Tea
The community is buzzing right now about a rumor that Fivetran may be in talks to acquire dbt. A report by The Information on Saturday said the talks underscore Fivetran’s ongoing strategic effort to revamp its offerings and establish itself as a more important player at a time when data is becoming increasingly vital for enterprises’ artificial intelligence initiatives.
💡 Key Insight
While it’s unclear whether these talks are actually underway, Fivetran did acquire dbt’s largest competitor this month (more below). Meanwhile, dbt Labs recently announced a move away from dbt Core toward their new Fusion Engine - a blend of source-available, proprietary, and open-source code.
Overall, the community is split on whether this makes sense for Fivetran, and some are already looking for forks of dbt Core. Several folks speculate that dbt Labs, being VC-funded, is under pressure to exit through an acquisition or IPO to generate returns for investors. We will keep an eye on this and share any updates next month - stay tuned!
2. Microsoft Fabric vs. Open Source Alternatives for a Data Platform
One community member is building a new data platform on Microsoft Fabric. While it seems promising, they found that many features are still in beta and aren’t fully reliable yet. That led them to ask whether to fully commit to Fabric or to switch part or all of their stack to open source.
Microsoft Fabric was announced over two years ago and marketed as an end-to-end unified analytics platform that brings together all of the tools and technologies you need. That is enough time for the community to form an opinion on it and share the tradeoffs.
💡 Key Insight
Fabric hasn’t gained much traction, which isn’t surprising given data engineers’ general dislike of vendor lock-in. The main complaints: it’s expensive, many features remain in preview or are unreliable, and the parts that do work don’t stand out compared to alternatives. That being said, very few recommend going fully self-hosted with open source, because it can be painful and is unlikely to solve all of your problems either. With so many open-source options available these days, it’s quite easy to use managed services where you can and self-host only when you need to.
3. Have you ever built a good Data Warehouse?
The original post defines a good data warehouse as one that:
Doesn’t break every day
Has meaningful data quality tests
Has code that is well written (efficient) from a DB perspective
Is well documented
Brings real business value
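To make the "meaningful data quality tests" item concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the `run_quality_checks` helper, the `orders` sample rows, and the column names are hypothetical and do not come from the original post. In practice, checks like these usually live in your transformation or testing tool rather than in hand-rolled code.

```python
from collections import Counter


def run_quality_checks(rows, key, required):
    """Return a list of failure messages; an empty list means every check passed.

    Hypothetical helper for illustration only:
    - `key` is a column whose values should be unique.
    - `required` is a list of columns that must not be null (None).
    """
    failures = []

    # Uniqueness check: every value of the key column should appear exactly once.
    counts = Counter(row.get(key) for row in rows)
    dupes = [value for value, n in counts.items() if n > 1]
    if dupes:
        failures.append(f"duplicate {key} values: {dupes}")

    # Not-null check: count rows where a required column is missing or None.
    for col in required:
        missing = sum(1 for row in rows if row.get(col) is None)
        if missing:
            failures.append(f"{col} is null in {missing} row(s)")

    return failures


# Made-up sample data: one duplicated order_id and one missing customer_id.
orders = [
    {"order_id": 1, "customer_id": "a", "amount": 10.0},
    {"order_id": 2, "customer_id": None, "amount": 5.0},
    {"order_id": 2, "customer_id": "b", "amount": 7.5},
]

failures = run_quality_checks(orders, key="order_id", required=["customer_id", "amount"])
for message in failures:
    print(message)
```

A check like this earns the word "meaningful" when it encodes a rule the business actually relies on (for example, one row per order), so that a failure points to a real problem and not to noise.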
💡 Key Insight
Many data initiatives fail because delivering business value comes as an afterthought. Engineers often focus on outputs - tables, jobs, pipelines - rather than outcomes that drive productivity or revenue. The first comment sums it up nicely:
“There’s no such thing as a perfect data warehouse. I thought the same way as you did and regret for thinking like that cause it’s becoming too rigid. What matters is whether it delivers business value.
It’s like the meme where the CEO wants faster reports, a junior suggests Spark, but a senior just reschedules the job earlier. The real win is productivity and business outcomes, not tech for tech’s sake.”
Industry Pulse
Here are some industry topics and trends from this month:
1. Fivetran acquires Tobiko Data (SQLMesh)
The summer of data acquisitions continues: SQLMesh became the latest tool to be acquired, marking Fivetran’s second deal this year after buying Census in May. “We built Tobiko Data to make data transformation more collaborative, transparent, and predictable,” said Tyson Mao, co-founder of Tobiko Data. “By joining forces with Fivetran, we can scale these capabilities globally and help customers turn transformation into a strength.”
2. Open-Sourcing Starlark Worker: Define Cadence Workflows with Starlark
Uber strives to build platforms that enable their engineering teams to move faster while maintaining reliability at scale. This month they announced the open-source release of Starlark Worker, a powerful integration between Cadence workflow orchestration and the Starlark™ scripting language that simplifies how teams define and run workflows.
3. Announcing the Cloudflare Data Platform: ingest, store, and query your data directly on Cloudflare
This month, Cloudflare launched three new products that make up the Cloudflare Data Platform, a complete solution for ingesting, storing, and querying analytical data tables.
Cloudflare Pipelines receives events sent via Workers or HTTP, transforms them with SQL, and ingests them into Iceberg or as files on R2.
R2 Data Catalog manages the Iceberg metadata and now performs ongoing maintenance, including compaction, to improve query performance.
R2 SQL is Cloudflare’s in-house distributed SQL engine, designed to perform petabyte-scale queries over your data in R2.
🎁 Bonus:
🗳️ Metabase Community Data Stack Report 2025: What’s working (and what’s not)
🥇 My first DE project: Kafka, Airflow, ClickHouse, Spark, and more!
⚖️ Building RAG Systems at Enterprise Scale: Our Lessons and Challenges
📅 Upcoming Events
10/2-10/3: DataConnect
10/7-10/9: Airflow Summit
10/13-10/16: Flink Forward 2025
Share an event with the community here or view the full calendar
Opportunities to get involved:
Want to get involved in the community? Sign up here.
If you are reading this and are not subscribed, subscribe here.
Want more Data Engineering? Join our community.
Want to contribute? Learn how you can get involved.
Stay tuned next month for more updates, and thanks for being a part of the Data Engineering community.



