Will your Data Scientists pass this test?

I am amazed these days at how many people are assuming the title ‘Data Scientist’. It is the sexiest job, so everyone wants to be one. But are they real Data Scientists or wannabes?
 

Questions to Ask Your Data Scientists

 
While Eularis was building our team of Data Scientists, we found that some people applying for the role were very far from being Data Scientists, and that possessing a PhD in mathematics or statistics does not in itself make you one. I am sure these highly qualified individuals are not aiming to deceive and really believe that, because they work with data, they are Data Scientists… but this is not the case.
 
A real Data Scientist must have a unique skill set that includes specific mathematical expertise, statistics and programming, as well as the ability to apply the right techniques to the right problem.
 

Given that we had to define the set of skills needed for this position, we thought we should share it with our clients so they can also understand what expertise to look for in Data Scientists.

Here are some questions you can ask your prospective Data Scientists to sort out which are real and which are fake. A good Data Scientist is able to answer all of these questions accurately and quickly.
 

Questions:
    1.    Explain the difference between an artificial neural network with a softmax activation function, logistic regression and a maximum entropy (MaxEnt) classifier.


    2.    What software packages do you use for data visualization? Explain pros and cons of each.


    3.    How would you efficiently represent more than 3 dimensions in a chart?
    4.    What is Edward Tufte’s concept of ‘chartjunk’?


    5.    What language do you prefer for analysis and model creation (Python, R, etc.)? Explain your choice.


    6.    What are artificial neural networks and support vector machines? What kind of data is appropriate for each? Please include subtypes.


    7.    What is principal component analysis (PCA) and what is it used for?


    8.    What is cluster analysis and how does it differ from principal component analysis?


    9.    How do you determine what methods to use for a specific set of data?


    10.    Do you think the imputation of missing data is acceptable? Explain your answer.


    11.    Why is it important in certain methods to split your dataset into training sets and test sets?


    12.    What is a recommendation engine or suggestion engine? How does it work?


    13.    Can you provide an example of how you would use experimental design to answer a question about user behavior?


    14.    What is a false positive and a false negative? Give examples of situations where a false positive is more important than a false negative, a false negative is more important than a false positive, and when these two types of errors are equally important.
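
As a reference point for question 14, here is a minimal sketch of how the two error types are counted for a binary classifier. The labels and predictions are made-up illustrative values, and `count_errors` is a hypothetical helper written for this example, not a function from any library.

```python
def count_errors(actual, predicted):
    """Count false positives and false negatives for binary labels (1 = positive)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # False positive: the model predicted positive, but the true label is negative.
    false_positives = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    # False negative: the model predicted negative, but the true label is positive.
    false_negatives = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return false_positives, false_negatives


# Made-up example data (assumption for illustration only).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]

fp, fn = count_errors(actual, predicted)
print(f"false positives: {fp}, false negatives: {fn}")
```

Which of the two counts matters more depends on the cost of each mistake in the situation at hand, which is what the question asks the candidate to reason about.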

 
Bonus Question:

    1.    You have data on the duration of calls to your call centre. Create a plan for how you would code and analyze this data. Explain what the distribution of these durations might plausibly look like. How could you test whether your explanations are accurate?

Conclusion

Can your Data Scientists answer these questions? If not, perhaps you need to supplement your team. Take a look at the infographic for the key skills we require in our data science team.
 

For assistance in creating the perfect job interview for prospective Data Scientists for a specific role, and in scoring their answers, Eularis can help.


Found this article interesting?

To learn more about how Eularis can help you find the best solutions to the challenges faced by healthcare teams, please drop us a note or email the author at abates@eularis.com.
