Data Engineering Digest, December 2023 & Annual Recap
Hello Data Engineers,
Welcome back to another edition of the Data Engineering Digest, a monthly newsletter containing updates, inspiration, and insights from the community. This edition contains a special end-of-year recap of highlights from the community.
Data Engineering 2023 Recap
Community Growth
It’s always incredible to see the community grow, and this year’s growth was even greater than anticipated. For everyone who joined this year, thank you for being here with us ❤️.
22.1 million views (+7.6 million from the previous year)
206 thousand monthly uniques on average (+87.5 thousand from the previous year)
71.9 thousand new members (+41 thousand from the previous year)
Top 10 Articles This Year
We filtered through all of the memes to bring you the top 10 posts in the community. If you haven’t already, bookmark these!
Step-by-step tutorial: Building a Kimball dimensional model with dbt
Tencent Data Engineer: Why We Went from ClickHouse to Apache Doris?
Secret To Optimizing SQL Queries - Understand The SQL Execution Order
Acknowledgements and Thank Yous
It’s all about the people and we couldn’t have done it without you.
Open Source Contributors
Thank you to everyone who contributed to our open source community projects like the wiki and the data engineering salary app. Contributing to the greater body of knowledge and helping others is one of the core values we love the most about this community.
Sponsors
Thank you to our generous sponsors for helping us cover organizing costs for data engineering meetups around the world! After the Covid pandemic hit, many meetup groups fizzled out, but this year we’ve been able to help revive several, and we hope to see even more in-person, community-run meetups next year.
Meetup Organizers
Thank you to all of the meetup organizers for all of the work that goes into planning an engaging event and bringing the community together. While Zoom might be great for some situations, nothing can beat meeting and engaging with each other in person.
Joseph Machado
Monica Miller
Jonathan Porter
Rigerta Demiri
Georg Heiler
Ed Pearson
Cameron Cyr
Dustin Dorsey
Online Community Builders
Thank you to the folks who work to foster an inclusive online space for data engineers around the world. Your work has helped create a unique place where hundreds of thousands of people come to discuss data engineering and keep up with the latest tools and technologies.
Felipe Hoffa
Georg Heiler
Jonathan Porter
Andrew Exley
And thank you to everyone else who played a role in the community's success by welcoming newcomers, sharing your knowledge, and pushing the field forward.
Looking towards the future, what’s the #1 thing you’d like to see the community accomplish together in 2024?
Comment on the post or reply to this email 👇
Now, back to our regularly scheduled newsletter!
Here are a few things that happened this month in the community:
Should data engineers make dashboards?
A career ladder after Senior Data Engineer.
What was data engineering like a decade ago?
Trends to watch out for in 2024.
How to make working with SQL…enjoyable?
And the best SQL environments to work in.
Community Discussions
Here are the top posts you may have missed:
1. How much dashboarding/viz do you do?
It’s unclear how often data engineers are involved in creating dashboards and visualizations, but should it really be part of the job at all? The author of the question doesn't think so, yet they find themselves building dashboards regularly at their new job.
While making basic data visualizations may sound straightforward, the work can get complicated quickly, and it requires a different skill set than data engineers are typically prepared for. So, is it common, or should you be strictly focusing on other data engineering responsibilities?
💡 Key Insight
The community agrees that building dashboards should not be part of a data engineer’s job responsibilities, and the majority of data engineers don’t build dashboards regularly. That being said, several members point out that it’s still important to have a basic understanding of how visualization works so that you can better collaborate cross-functionally with those who are building visualizations (typically data analysts and data scientists).
Also, while it might not be a core duty, practical considerations like team size and resource availability can influence your involvement in dashboard creation. In smaller teams and smaller companies, individuals often wear multiple hats and are expected to pitch in wherever they can.
Our advice? Embrace the opportunity to diversify your skills. Acquiring visualization skills can enhance your ability to communicate insights effectively and great communicators will grow faster in their careers. At the same time, advocate for clear role definitions within the team to ensure that responsibilities align with expertise. If you find yourself spending the majority of your time on visualizations, it might be a good idea to re-evaluate your current role and ensure that the skills you’re developing will be advantageous for your career.
2. What role did you go into after Senior Data Engineer?
“Curious to hear what your current role is now and how you made the decision that you were ready for a change.”
It’s important for engineers to think ahead about their career, no matter where they are in their journey. Whether you're contemplating managing people or sticking to technical work, it's smart to plan carefully and make conscious decisions about your future.
💡 Key Insight
While data engineering is a constantly evolving field, there are already established career tracks you can follow. The traditional software engineering career track has many parallels and there are several options you can pursue after senior DE.
Many of the career moves data engineers reported were promotions along the standard data engineering track. Similar to software engineering career progressions, the typical path is junior data engineer -> data engineer -> senior data engineer -> lead/staff data engineer -> principal data engineer. Other common paths are moving into roles like data architect (still an IC role) or tech lead, which is typically a precursor to full-time management.
3. What was data engineering like circa 2011-2013?
This member wanted to know what data engineering was really like a decade ago and what tools/technologies that were used back then are still being used today. Exploring the journeys of the early data engineers who shaped the field gives us a good grasp of the tough issues they had to tackle. Plus, it can offer us insights into the possible directions the profession might take in the future.
💡 Key Insight
Here’s the TL;DR version of the history of data engineering:
Using software for data processing began around the 1970s/1980s under the term information engineering methodology. Over the years, those techniques shifted toward using information systems to inform strategic business planning, under the term business intelligence, which is still used today. In the early 2010s, large data-driven tech companies (FAANG) started dealing with massive increases in data volumes, which forced them to move away from traditional ETL techniques and develop a version of software engineering focused on data, which we now call data engineering.
As for tooling, many early solutions were disrupted, but many are still around due to the stickiness of enterprise software and on-prem setups. For example, the Microsoft SQL Server stack was very popular among enterprises, and Azure now makes it easier to transition those workloads to the cloud while staying within the Microsoft ecosystem (for example, SSIS to Azure Data Factory).
Popular tools mentioned:
Microsoft SQL Server Stack (SSIS, SSMS, SSAS, SSRS)
Oracle stack (Data Integrator, DAC, Data Sync, Warehouse Builder)
Python, Stored Procedures
Informatica
Windows tools (Windows Scheduler, batch scripts)
Pentaho
Hadoop & Hive
Data engineering is constantly evolving for a reason: the field still faces challenges in transforming data into a valuable business asset.
4. Data trends for 2024
“What are the new trends/buzzwords that the data industry will lean into this year?”
Even though you may roll your eyes at the latest trendy phrase or idea flooding social media, it's still important for your career to stay in the loop with what's going on. These days, it feels like there's a new game-changing tool every week, but time is limited. So, which trends is the community keeping an eye on?
💡 Key Insight
Unsurprisingly, generative AI is the most popular trend at the moment. Even if you’re not personally using generative AI for data engineering work, many companies are trying to incorporate it into their products and services, and those efforts will lean on data engineers for support.
Here are a few other trends the community is talking about:
Zero ETL/HTAP databases - Integrations that eliminate/minimize the need to build data pipelines between OLTP and OLAP storage. Related to this are HTAP databases that can run both OLTP and OLAP workloads in one place. Examples include:
Amazon Aurora zero-ETL w/Redshift
Snowflake Unistore
SingleStoreDB
DataOps - Processes and technology that help automate and improve data quality and speed of development.
Streaming databases - Databases that allow you to use SQL for stream processing to provide real-time analytics, typically under high query concurrency. Examples include:
ksqlDB
Materialize
Rockset
StarRocks
ClickHouse
Firebolt
Data activation/reverse ETL - Taking your enriched data from your data warehouse and sending it back into your operational systems such as marketing, finance, and analytics software. Examples include:
Hightouch
Segment
Census
Which trends do you think will make it in 2024?
5. How do you make working with SQL enjoyable (or less tedious)?
This question comes from a Machine Learning engineer who feels that SQL can be complex and unreadable compared to a traditional programming language.
Readability and understandability are things every data engineer should prioritize when writing SQL, because a query will be read many more times than it is updated. Understanding what makes SQL unreadable or incomprehensible may help us write better SQL in the future.
💡 Key Insight
One of the challenges of writing SQL is that there is no standard for SQL projects. In other words, there is no “correct” way to write and format SQL statements, which still causes plenty of debate. You can also often find quirks in legacy SQL that stem from past database limitations, such as terse aliases like t1, t2, t3. Another large factor that wasn’t mentioned is how well the database tables are designed in the first place.
That being said, data engineers now have some tools to make life easier. SQL formatters and linters like SQLFluff can automatically format your SQL into something more readable. Data engineers also suggested using common table expressions (CTEs) to break down complex queries into manageable chunks.
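To make the CTE suggestion concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. The orders table, its columns, the sample rows, and the spending threshold of 100 are all hypothetical and exist only for illustration. Each CTE names one step of the logic, so the final SELECT reads like a short summary instead of a stack of nested subqueries.

```python
import sqlite3

# Hypothetical schema and sample rows, invented purely for illustration.
SETUP_SQL = """
CREATE TABLE orders (
    order_id    INTEGER,
    customer_id INTEGER,
    amount      REAL,
    status      TEXT
);
INSERT INTO orders VALUES
    (1, 10,  50.0, 'complete'),
    (2, 10,  70.0, 'complete'),
    (3, 11,  20.0, 'complete'),
    (4, 11, 500.0, 'cancelled'),
    (5, 12, 200.0, 'complete');
"""

# Each CTE is a named, readable step rather than a nested subquery.
REPORT_SQL = """
WITH completed_orders AS (
    -- Step 1: keep only orders that actually went through.
    SELECT customer_id, amount
    FROM orders
    WHERE status = 'complete'
),
customer_totals AS (
    -- Step 2: total spend per customer.
    SELECT customer_id, SUM(amount) AS total_spent
    FROM completed_orders
    GROUP BY customer_id
)
-- Step 3: report customers at or above the (hypothetical) threshold.
SELECT customer_id, total_spent
FROM customer_totals
WHERE total_spent >= 100
ORDER BY total_spent DESC;
"""


def top_customers():
    """Run the CTE-based report against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP_SQL)
        return conn.execute(REPORT_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for customer_id, total_spent in top_customers():
        print(customer_id, total_spent)
```

Descriptive names such as completed_orders and customer_totals also sidestep the t1, t2, t3 aliasing problem, since each intermediate result says what it contains.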
6. What is the best SQL environment you have ever worked in?
If you work with SQL, you might find many tools and IDEs lacking in features.
Since SQL isn’t going away anytime soon, let’s all make our lives easier by using tools that make working with SQL more enjoyable.
💡 Key Insight
DataGrip by JetBrains was the most popular choice, with some data engineers even paying out of pocket for the license. The second most popular choice was DBeaver Community Edition, a free, open-source database tool. In third place was using an existing IDE like VS Code with an extension for querying data, rather than a separate tool.
DataGrip (paid)
DBeaver (free)
VS Code with extensions (free)
📅 Upcoming Events
1/27: Data Day Texas 2024
Share an event with the community here or view the full calendar
Opportunities to get involved
Share your story on how you got started in DE
Want to get involved in the community? Sign up here.
What did you think of today’s newsletter?
Your feedback helps us deliver the best newsletter possible.
If you are reading this and are not subscribed, subscribe here.
Want more Data Engineering? Join our community.
Want to contribute? Learn how you can get involved.
Stay tuned next month for more updates, and thanks for being a part of the Data Engineering community.