We’ve recently been looking at how to introduce data science concepts to the wider team, including business analysts, management and engineers.

When you try to sell a concept like this, especially to management teams or senior stakeholders, a term that means nothing and is difficult to explain will simply be ignored.

These answers range from using mathematical models to solve problems with data, to investigating data to find insights, to using machine learning to solve complex problems… and the list goes on.

Funnily enough, we don’t think it is. It’s a rebranding: some of the techniques are exactly the same, and others have matured slightly thanks to the introduction of machine/deep learning techniques, higher computational power and Big Data.

You collect some data, explore it and run some “experiments” on a small sample, take your findings to a larger sample, learn something new and then represent your findings in a way that provides insight for the end user.

Someone with an analytical mind who can think outside the box, apply mathematics and statistics, and convey their findings in a user-friendly way.

Now that everyone is on board with what Data Science is and understands why it is popular, it’s time to share the core concepts so people can start to explore the domain further.

If validation is poor on the collection side, columns you expect to contain numeric values may contain unexpected values. You should never presume the data is correct unless it’s being collected from a structured database that validates its input.
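To make the point concrete, here is a minimal Python sketch of checking a column that is supposed to be numeric. The `ages` column and the `split_numeric` helper are hypothetical names we made up for illustration, and treating `nan`/`inf` as invalid is our assumption rather than a universal rule.

```python
import math


def split_numeric(values):
    """Separate values that parse as finite numbers from those that do not."""
    valid, invalid = [], []
    for raw in values:
        try:
            number = float(raw)
        except (TypeError, ValueError):
            # None, empty strings and free text all end up here.
            invalid.append(raw)
            continue
        if math.isfinite(number):
            valid.append(number)
        else:
            # Assumption: "nan" and "inf" parse as floats but are not usable values.
            invalid.append(raw)
    return valid, invalid


# Hypothetical "age" column as it might arrive from a loosely validated form.
ages = ["34", "27", "n/a", "", "41.5", None, "twenty"]
valid, invalid = split_numeric(ages)
print(valid)    # [34.0, 27.0, 41.5]
print(invalid)  # ['n/a', '', None, 'twenty']
```

Keeping the rejected values, rather than silently dropping them, makes it easier to go back to whoever owns the collection side and show exactly what is coming through.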

If you are dealing with a huge data set, it’s probably worth sampling the data down to a sensible size to improve the speed of the exploratory analytics.
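One way to take such a sample, sketched in Python, is reservoir sampling. This is a technique we are choosing for illustration, not something prescribed above: it picks a uniform random sample in a single pass without loading the whole data set into memory. The `reservoir_sample` name, the sample size and the seed are all assumptions.

```python
import random


def reservoir_sample(rows, sample_size, seed=42):
    """Return a uniform random sample of up to sample_size rows in one pass."""
    rng = random.Random(seed)  # fixed seed so the exploration is repeatable
    reservoir = []
    for index, row in enumerate(rows):
        if index < sample_size:
            # Fill the reservoir with the first sample_size rows.
            reservoir.append(row)
        else:
            # Replace an existing entry with probability sample_size / (index + 1).
            slot = rng.randint(0, index)  # both ends inclusive
            if slot < sample_size:
                reservoir[slot] = row
    return reservoir


# Stand-in for a large data set; in practice this could be a file or a cursor.
big_data = range(100_000)
subset = reservoir_sample(big_data, 1_000)
print(len(subset))  # 1000
```

Because the function only iterates over `rows` once, the same sketch works for a generator that streams records from disk.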

You supply the algorithm with some data and, alongside that data, the actual answer. For example, give the algorithm a bunch of sentences, each labelled as either positive or negative.
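The labelled-sentences example can be sketched in a few lines of Python. We use a tiny naive Bayes classifier with add-one smoothing here purely as an illustration; the choice of algorithm, the `train` and `predict` names and the six training sentences are all our own assumptions, and a real project would use far more data and an established library.

```python
import math
from collections import Counter, defaultdict


def train(labelled_sentences):
    """Count how often each word appears under each label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for sentence, label in labelled_sentences:
        label_counts[label] += 1
        word_counts[label].update(sentence.lower().split())
    return word_counts, label_counts


def predict(sentence, word_counts, label_counts):
    """Return the label with the highest naive Bayes log-probability."""
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total_examples)
        total_words = sum(word_counts[label].values())
        for word in sentence.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            seen = word_counts[label][word] + 1
            score += math.log(seen / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# The data plus the "actual answer" for each sentence (made-up examples).
training_data = [
    ("I love this product", "positive"),
    ("what a great experience", "positive"),
    ("really happy with the service", "positive"),
    ("I hate this product", "negative"),
    ("what a terrible experience", "negative"),
    ("really unhappy with the service", "negative"),
]
model = train(training_data)
print(predict("I love the service", *model))  # positive
print(predict("terrible product", *model))    # negative
```

The algorithm never sees a rule saying "love is positive"; it infers that from the labels, which is the essence of the approach.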

There are a number of analytics techniques, including simple mathematics (max, mean/average, etc.); standard SQL statements to join and/or perform calculations on the data; building functions programmatically using languages such as R or Scala; or using analytics tools such as SAS, Business Objects, Pentaho and many more.
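As a rough illustration of the first two techniques, the Python sketch below computes a max and a mean, then runs a SQL join with an aggregate against an in-memory SQLite database. Python and SQLite are simply our choice for a self-contained example, and the `customers` and `orders` tables and their contents are invented.

```python
import sqlite3
import statistics

# Simple mathematics on a handful of made-up order values.
order_values = [12.5, 40.0, 7.5, 20.0]
print(max(order_values))              # 40.0
print(statistics.mean(order_values))  # 20.0

# Standard SQL: join two hypothetical tables and calculate per-customer figures.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (1, 12.5), (1, 40.0), (2, 7.5), (2, 20.0);
""")
totals = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total, AVG(o.amount) AS average
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
conn.close()
print(totals)  # [('Ada', 52.5, 26.25), ('Grace', 27.5, 13.75)]
```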

