Six Questions for Jesse C. Daniel, author of Data Science with Python and Dask

Jesse Daniel is a software developer (Python, Scala, JavaScript, C#) who leads a team of data scientists at a media technology company. He has taught Python for Data Science at the University of Denver.

 


You can get his book for 39% off by entering intdaniel into the discount code box at checkout at manning.com.


Was Dask created expressly for use in data science? What about it makes it well-suited for working with data?
 

No, Dask was created as a general-purpose distributed computing framework, and that’s what makes it so powerful and flexible! Any application that can take advantage of parallel processing can generally be implemented using Dask’s distributed APIs, and can scale from a laptop to a large cluster without requiring any sort of refactoring. Analytics workloads tend to chew up a lot of computing resources, so I think it was natural for folks in the data science community to turn to solutions like Dask to speed up the processing of massive datasets. To make things even easier for these users, the developers of Dask implemented special APIs that mimic commonly used libraries like NumPy and Pandas, so that they can interact with Dask in familiar ways.
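To illustrate the point about familiar APIs, here is a minimal sketch that computes the same mean with NumPy and with Dask's array interface. It assumes both `numpy` and `dask` are installed; the array size and chunk size are arbitrary values chosen for illustration, not recommendations from the book.

```python
import numpy as np
import dask.array as da

# NumPy: the whole array lives in memory and the mean is computed eagerly.
np_values = np.arange(1_000_000, dtype="float64")
np_mean = np_values.mean()

# Dask: near-identical calls, but the array is split into chunks
# (an arbitrary 100,000 elements each here) that can be processed in parallel.
dask_values = da.arange(1_000_000, dtype="float64", chunks=100_000)
lazy_mean = dask_values.mean()   # only describes the work; nothing runs yet
dask_mean = lazy_mean.compute()  # executes the work and returns the result

print(np_mean, dask_mean)
```

The only visible differences are the `chunks` argument and the explicit `.compute()` call; the rest reads like ordinary NumPy code.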

 

What are some other fields where Dask is being used?
 

Great question. Dask is also being used in many scientific fields at academic institutions and research labs, where researchers rely on enormous models and simulations. These users work in atmospheric research, geophysics, chemical engineering, and so forth. I believe the original motivation for developing Dask was to support folks in such areas of computational science, but, over time, data scientists discovered that Dask was very useful for their purposes too (typically after getting frustrated with Spark).

 

Are most data scientists using Python?
 

Hard to say. The exact developer demographics probably depend on the industry, market, and region of the world you’re working in; other contenders include R, Julia, and MATLAB. However, there’s no question that Python is very popular with data scientists. In my opinion, that’s mostly thanks to the simplicity of the language, powerful library support, and the wealth of great, practical books on learning data science with Python.

 

What are DAGs and how are they important?
 

DAG stands for directed acyclic graph, and it’s a key concept for understanding how Dask schedules distributed tasks. You can essentially think of it as a flow chart with a couple of special restrictions (the arrows point in one direction only, and no path can loop back on itself) that dictates the steps, dependencies, and order of operations required to complete a multi-step process. It serves as the “blueprint” for how to get a job done, and Dask generates a DAG for every compute operation before sending out marching orders to individual worker processes.
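As a rough illustration of the idea, the sketch below models a multi-step process as a DAG using only Python's standard library (`graphlib`, available from Python 3.9). It borrows the pasta theme from chapter 1 of the book, but the step names are hypothetical, and this is not Dask's scheduler, only a way to see how dependencies determine a valid order of operations.

```python
from graphlib import TopologicalSorter

# Each key is a step; its value is the set of steps that must finish first.
# Step names are made up for illustration.
recipe = {
    "boil water": set(),
    "cook pasta": {"boil water"},
    "make sauce": set(),
    "toss pasta in sauce": {"cook pasta", "make sauce"},
    "eat": {"toss pasta in sauce"},
}

# A topological sort yields one order that respects every dependency.
order = list(TopologicalSorter(recipe).static_order())
print(order)
```

Note that "make sauce" does not depend on "boil water", so those two steps could run at the same time. Spotting independent branches like that is what lets a scheduler hand work to several workers in parallel.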

 

As a team manager, you practice “servant leadership.” Can you tell us what that is and what it looks like in practice?
 

Effectively, it’s the belief that a leader should use their position of power to be an advocate for the team that they are leading rather than using that power for self-serving purposes. In practice, this means empowering people on my team to make decisions, giving them a platform to provide frank and honest feedback, mentoring and developing their skills, and putting their well-being before all else. I think it’s extremely important to view people who work for you as stakeholders in decisions rather than simply saying “I’m in charge, so we’re going to do everything my way.”

Why do you have a rather mouth-watering recipe for a pasta dish, bucatini all’Amatriciana, in the first chapter of your book? I take it you like to cook?
 

Yes, I love to cook! In fact, bucatini all’Amatriciana is my favorite pasta dish–which, for a foodie like me, is an incredibly difficult decision! I used a recipe in chapter 1 because I wanted an accessible way to describe what DAGs are and how they work. Since DAGs are used to describe multi-step processes, a recipe seemed like a perfect example. Plus, cooking is universal enough that most people won’t have a hard time understanding why, for example, you have to boil dried pasta before you can toss it in sauce and eat it.

Other than pasta, I enjoy cooking dishes from all around the world–posole rojo from Mexico, mapo tofu from Sichuan, sauerbraten from Germany–you name it. Currently, my wife and I live in south Louisiana, so I’ve been exploring the world of Creole and Cajun cuisine local to us. One thing you can rely on is that there’s always something good simmering on my stove!