Hello Data Engineers,
Welcome back to another edition of the data engineering digest - a monthly newsletter containing updates, inspiration, and insights from the community.
Here are a few things that happened this month in the community:
What’s considered DE “overkill” in small companies?
The top industries where DE isn’t just a cost center.
Which DE skills are the most in demand in 2025?
DE acquisitions that you may have missed.
Nielsen’s story on processing more data while reducing costs.
DB Engines announces the database of the year 2024.
Community Discussions
Here are the top posts you may have missed:
1. What DE practices do you consider overkill for a small company?
“Several months earlier, my small team thought that we needed an orchestrator like Prefect, a cloud like Neon, and dbt. But now I think developing and deploying a data pipeline inside Snowflake alone is more than enough to move sales and marketing data into it.”
Most organizations don’t deal with large volumes of data. Despite this, we are often marketed tools and frameworks that will quickly process large volumes of data and “scale” for future growth. How can data engineers cut through the hype and choose the actual services and products they need?
💡 Key Insight
At small companies and startups, the name of the game is moving fast and delivering value with limited resources. Many best practices like CI/CD, testing, and infrastructure as code deliver no immediate business value, so they are often deprioritized. Until data is ready to be analyzed, it might as well be worthless to the organization. With that in mind, here are a few highlights that data engineers shared:
Things to avoid:
Over-fixating on tools and architecture.
Building a real-time pipeline only for it to power a report that is looked at once a day.
Using distributed processing engines when your data fits on a single machine.
Building custom tools that will take up all of your time, be difficult to maintain, and probably be trashed when you leave.
Things to do:
Focus on delivering business value.
Keep your stack simple.
Consider your team's skill set and how easy it would be to hire people to maintain the stack.
Buy over build in most situations.
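To make the "your data can fit on a single machine" point concrete, here is a minimal sketch of a single-machine aggregation using only Python's standard library (the table and column names are made up for illustration). Many small-company workloads never need more than this:

```python
# Hedged sketch: at small data volumes, a single machine and the
# standard library's sqlite3 module can replace a distributed engine.
# Table/columns ("sales", "region", "amount") are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 120.0), ("west", 80.0), ("east", 40.0)],
)

# The same SQL you would run in a warehouse, executed locally.
totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
print(totals)  # {'east': 160.0, 'west': 80.0}
```

The design choice mirrors the thread's advice: start with the simplest tool that answers the business question, and reach for distributed engines only when the data genuinely outgrows one machine.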
2. Which industries view data engineering as a revenue driver rather than just a compliance expense?
“While working for an IT consultancy firm, I got opportunities to work with an investment bank and an insurance company where data engineering was often viewed as an expense and a compliance burden, with minimal direct impact on business growth.”
Data engineering is commonly seen as a cost center for most businesses. It’s not necessarily a bad thing, but when data is a revenue driver or core part of the business strategy it can offer the opportunity for much higher impact and more resources for your team.
💡 Key Insight
The biggest industries mentioned were big tech (e.g., Google, Meta, Amazon) and finance (banks, fintech, investment banks, high-frequency traders). In these industries, data is core to the product or service, and it directly generates revenue, which can lead to higher compensation for data engineers. While technically any company can turn data into a revenue driver, it must be core to the business strategy. This is often driven by leadership and organizational culture, which are not easy to change.
3. What skills are most in demand in 2025?
💡 Key Insight
For data engineering, the core skills are still the same:
SQL
Python
Cloud (pick any, they are all the same)
Data modeling
One skill that may be of increasing value is a basic understanding of Retrieval-Augmented Generation (RAG) and LLM architectures. RAG will be an increasingly adopted technique that allows businesses to use LLMs while reducing hallucinations and producing better answers from their own data.
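The core RAG idea can be sketched in a few lines: retrieve the documents most relevant to a question, then ground the LLM prompt in them. This toy version uses bag-of-words cosine similarity as a stand-in for real embedding models and vector stores; all function names and documents below are illustrative, not any library's API:

```python
# Toy RAG retrieval step: word-count cosine similarity stands in for
# embeddings; production systems use an embedding model + vector store.
import math
import re
from collections import Counter

def _tokens(text: str) -> Counter:
    """Lowercase word counts (a crude stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9%]+", text.lower()))

def similarity(a: str, b: str) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    va, vb = _tokens(a), _tokens(b)
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    return sorted(documents, key=lambda d: similarity(query, d), reverse=True)[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    """Ground the LLM prompt in retrieved company data."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Q3 revenue grew 12% year over year.",
    "The office dog is named Biscuit.",
    "Churn dropped to 4% after the pricing change.",
]
prompt = build_prompt("what is the churn rate", docs)
```

The prompt now contains the churn document, so the LLM answers from the company's own data instead of guessing, which is exactly how RAG reduces hallucinations.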
Industry Pulse
Here are some industry topics and trends from this month:
1. January Acquisitions
Jan 14th:
dbt Labs acquires SDF Labs: expected to improve performance, developer experience, and lineage. SDF website.
Qlik acquires Upsolver: expands their real-time data integration capabilities for open lakehouses. Upsolver website.
Jan 15th:
Tobiko acquires Quary: expected to improve developer experience and compile times. It’s unclear whether Quary’s BI features will be integrated or cannibalized.
2. How Nielsen uses serverless concepts on Amazon EKS for big data processing with Spark workloads
Nielsen Marketing Cloud, a leading ad tech company, processes 25 TB of data and 30 billion events daily in one of its pipelines. This post shares Nielsen’s journey to build a robust and scalable architecture with linear scaling. It starts by examining their initial challenges and the root causes behind them, then explores Nielsen’s solution: running Spark on Amazon Elastic Kubernetes Service (Amazon EKS) while adopting serverless concepts.
3. Snowflake is the DBMS of the Year 2024
DB-Engines announced Snowflake as the DBMS of the Year 2024, followed by PostgreSQL in second place and Oracle in third place.
🎁 Bonus:
📅 Upcoming Events
2/9-12: IEEE BigComp 2025
Share an event with the community here or view the full calendar
Opportunities to get involved:
Share your story on how you got started in DE
Want to get involved in the community? Sign up here.
What did you think of today’s newsletter?
Your feedback helps us deliver the best newsletter possible.
If you are reading this and are not subscribed, subscribe here.
Want more Data Engineering? Join our community.
Want to contribute? Learn how you can get involved.
Stay tuned next month for more updates, and thanks for being a part of the Data Engineering community.