So You Want to Be a Data Scientist?

Target Audience:

Database pros, or anyone who works with a data scientist or wants to become one


Does R seem like an alien language to you? Does data science terminology seem overly confusing? Would you like to learn more about data science but are scared of the math? The fact is, you’re probably already doing “data science” today. You don’t need to know a lot of R or Python to be effective.

In this 90-minute session we’ll focus on terminology, process, some tooling, what makes the best data for predictive models, and some real-world applications anyone can use today. The goal is to get you comfortable working with a data scientist so you can collaborate for better results. Finally, we’ll put it all together and create a predictive web service using Azure Machine Learning.

Why I Want to Present This Session:

Data science is the hot skill in the market, but it comes with a lot of misconceptions. Traditional data professionals often think that data scientists sit around all day tweaking model parameters. This isn’t true. Data scientists spend much of their time struggling with data wrangling, which is a fancy way of saying ETL. This is where the traditional data professional can help.

Likewise, a lot of data scientists are coming out of college without a good grounding in relational data, which means they have a hard time communicating their data needs to us.  Wouldn’t it be great if we understood each other a little better?

We need a lingua franca. Each side has a lot to learn about the other. I work daily with customers who have these problems, and I’ve developed tricks that I think help the traditional data professional understand their data better and prepare it for “actionable intelligence” faster.

This is not a tool-based problem. SQL Server R Services won’t magically fix it. It’s a mindset change.

Additional Resources:

An older version of this session:

Session History

2017/05 – This session was chosen by attendees for GroupBy June, but Dave withdrew it from contention after voting finished.

Dave Wentzel is a Data Solution Architect with Microsoft. He helps customers with their Digital Transformation using Azure. He brings relational data, big data, fast data, and unstructured data together to create modern data ecosystems.


1 Comment

Thanks for the abstract.

The target audience needs to be boiled down. Here’s an example post on how to do it: Think about how many people in that post would include themselves as database pros, and how many people would improperly exclude themselves. I have a hunch that you want to include developers, for example, but I know a lot of developers who wouldn’t call themselves database pros. I also know a lot of senior production DBAs who would call themselves database pros, but have never written a line of application-purposed T-SQL, and don’t have the permissions to see their data. Finally, there’s a good chunk of “database pros” who don’t use SQL Server, let alone SQL Server 2016.

Once you’ve refined the target attendee, then I think the abstract is going to end up shifting. (For example, if you really are targeting everyone who calls themselves a “database pro,” then you’re going to need to do a hell of a lot of foundational education for the wide variety of attendees, and you’re not going to have time to do a demo. On the other hand, if you focus in on, say, app code developers who are comfortable writing T-SQL and have access to customer-facing data, then you might be able to cut a lot of corners.)

Finally, capitalize the P in Python. C’mon, man, I don’t even do analytics and I know that. 😉

