AI News, Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department

“What is the relationship like between your team and the data scientists?” This is, without a doubt, the question I’m most frequently asked when conducting interviews for data platform engineers.

It’s a fine question – one that, given the state of engineering jobs in the data space, is essential to ask as part of doing due diligence in evaluating new opportunities.

If you read the recruiting propaganda of data science and algorithm development departments in the valley, you might be convinced that the relationship between data scientists and engineers is highly collaborative, organic, and creative.

Most companies structure their data science departments into three groups: data scientists, data engineers, and infrastructure engineers.

Data scientists are often frustrated that engineers are slow to put their ideas into production and that work cycles, road maps, and motivations are not aligned.

Data engineers are often frustrated that data scientists produce inefficient and poorly written code, have little consideration for the maintenance cost of productionizing ideas, demand unrealistic features that skew implementation effort for little gain… The list goes on, but you get the point.

Infrastructure engineers are kept at arm’s length from the scientists and engineers, which means they never gain solid context on how the infrastructure is being used, or on the business and technical problems it needs to solve.

I blame two things, offered here in the form of a couple of observations. Data processing tools and technologies have evolved massively over the last five years.

Unless you need to process many petabytes of data, or you’re ingesting hundreds of billions of events a day, most technologies have evolved to a point where they can trivially scale to your needs.

Unless you need to push the boundaries of what these technologies are capable of, you probably don’t need a highly specialized team of dedicated engineers to build solutions on top of them.

These data scientists occasionally manage to create some pretty cool and effective solutions, but by and large they focus on performing slightly higher level Report Developer-ing back to the business (which largely ignores their advice).

Thus was born the traditional, modern day data science department: data scientists (Report developers aka “thinkers”), data engineers (ETL engineers aka “doers”), and infrastructure engineers (DBAs aka “plumbers”).

The fundamental flaw that prevents the Thinker and Doer model from living up to its recruiting hype is the assumption that there exists an army of soulless non-mediocre Doer engineers who eagerly implement the ideas and vision of data scientists.

In order to attract talented engineers into a role like that, you need some really big scaling problems to serve as a distraction to the soulless, subservient role you have hired them into.

The end result is a team of data scientists who are empowered to be little more than report developers because they lack the support of a solid, innovative data platform.

What follows is a blueprint for building a data science team that can pivot and react quickly, so as to lead and innovate through the production of thought-leadership, APIs, and code, rather than react to changes and throw together some PowerPoint presentations in a desperate attempt to redirect gut feelings and intuitions.

However, it is important to recognize that engineers and data scientists are passionate about very different tasks. Data scientists love working on problems that are vertically aligned with the business, where their efforts make a big impact on the success of projects and of the organization.

They require a good overall understanding of how the business operates, but the abstracted nature of solutions means they are light on business logic and do not require a heavy partnership with or deep understanding of verticals within the business.

A common fear of engineers in the data space is that, regardless of the job description or recruiting hype you produce, you are secretly searching for an ETL engineer.

Rather than a report, dashboard, or PowerPoint presentation, it is some sort of algorithm or API that is integrated into the engineering stack – something that fundamentally changes the operation of the business.

To sum it up, engineers must deploy platforms, services, abstractions, and frameworks that allow the data scientists to conceive of, develop, and deploy their ideas with autonomy (such as a tool, framework, or service used to build, schedule, and execute ETL).

Rather, the engineering challenge becomes one of building self-service components such that the data scientists can iterate autonomously on the business logic and algorithms that deliver their ideas to the business.
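As a rough illustration of what a self-service component could look like, here is a minimal Python sketch. It assumes a hypothetical `Pipeline` class owned by platform engineers, while data scientists register plain functions containing the business logic. The class, the step names, and the sample records are all invented for illustration and are not taken from the article or from any real library.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Step = Callable[[Iterable[Record]], Iterable[Record]]


class Pipeline:
    """Hypothetical platform-owned runner: it only knows how to chain steps."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._steps: List[Step] = []

    def step(self, func: Step) -> Step:
        # Register a scientist-owned transformation and return it unchanged,
        # so it stays directly callable and testable on its own.
        self._steps.append(func)
        return func

    def run(self, records: Iterable[Record]) -> List[Record]:
        # Apply the registered steps in registration order.
        for func in self._steps:
            records = func(records)
        return list(records)


# Everything below is owned by the data scientist: business logic only.
orders = Pipeline("order_features")


@orders.step
def drop_refunds(records):
    # Keep only orders that were not refunded.
    return (r for r in records if not r["refunded"])


@orders.step
def add_margin(records):
    # Derive a margin column from price and cost.
    return ({**r, "margin": r["price"] - r["cost"]} for r in records)


result = orders.run([
    {"price": 50.0, "cost": 30.0, "refunded": False},
    {"price": 20.0, "cost": 25.0, "refunded": True},
])
print(result)
```

In this sketch the split of ownership is the point: scheduling, retries, or storage could be added inside `Pipeline` by engineers without the scientist's functions changing at all.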

You need very sharp platform engineers who can make intuitive decisions about what services, frameworks, and capabilities need to be in place before they are desperately needed.

A consequence of empowering data scientists to take on such a breadth of the stack is that they will be unlikely to produce code and solutions that are as technically efficient as an engineer’s.

For example, they can decide to sample data in certain places, use approximate methods where they make sense, and make decisions to nix or punt features that may produce only very marginal business impacts but come with extremely high development or support costs.
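To make the sampling trade-off concrete, the sketch below estimates a mean from a small random sample instead of scanning every value. The data, the sample size, and the function name `sampled_mean` are assumptions chosen purely for illustration; whether an approximation like this is acceptable depends on the business question.

```python
import random
from typing import Sequence


def sampled_mean(values: Sequence[float], sample_size: int, seed: int = 0) -> float:
    """Estimate the mean of `values` from a simple random sample.

    Hypothetical helper: it trades a little accuracy for reading
    only `sample_size` values instead of the whole data set.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    sample = rng.sample(values, sample_size)  # sampling without replacement
    return sum(sample) / len(sample)


# Stand-in data set: 100,000 synthetic values.
population = list(range(100_000))

exact = sum(population) / len(population)
estimate = sampled_mean(population, sample_size=1_000)

print(f"exact={exact:.1f} estimate={estimate:.1f}")
```

Here the estimate touches 1% of the values. The error shrinks roughly with the square root of the sample size, which is the kind of cost-versus-precision call the passage describes.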

In aggregate, the hope is that the benefits of autonomy, and the innovation it produces, will outweigh the technical inefficiencies that come from letting data scientists, who lack an engineer’s technical specialization, own their full stack.

It’s my sincere hope that sharing what we have done will encourage others with a non-traditionally structured department to do the same, inspire leaders of data science departments that are in a formative stage to think outside the box and find the courage to challenge tradition, and inform engineers and data scientists who are frustrated by traditional roles that there are different types of environments available to operate in.
