# AI News, Open Machine Learning Course. Topic 2. Visual Data Analysis with Python

- On Sunday, September 30, 2018

It is also possible to plot a distribution of observations with seaborn's `distplot()`. In seaborn 0.11 and later, `distplot()` is deprecated in favor of `histplot()` and `displot()`.

A box plot consists of a box (hence the name), the so-called whiskers, and a number of individual points (outliers).
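seaborn builds on matplotlib, and matplotlib's `Axes.boxplot()` returns the drawn pieces in a dictionary keyed by component (`"boxes"`, `"whiskers"`, `"medians"`, `"fliers"`, and others), which makes the anatomy easy to inspect. The sketch below uses a synthetic sample with one planted extreme value (hypothetical data) and checks that this value ends up among the individually drawn outliers.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Synthetic sample: 200 roughly normal values plus one planted extreme value
values = np.append(rng.normal(loc=50, scale=5, size=200), 100.0)

fig, ax = plt.subplots()
parts = ax.boxplot(values)  # dict of drawn artists, keyed by component name

box = parts["boxes"][0]                    # the box, spanning Q1..Q3
medians = parts["medians"]                 # line at the median inside the box
whiskers = parts["whiskers"]               # two lines, below and above the box
outliers = parts["fliers"][0].get_ydata()  # points drawn individually beyond the whiskers
plt.close(fig)

print(len(whiskers), outliers)
```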

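The whiskers represent the spread of the data points that fall within the interval $(Q_1 - 1.5 \cdot \mathrm{IQR},\ Q_3 + 1.5 \cdot \mathrm{IQR})$, where $\mathrm{IQR} = Q_3 - Q_1$ is the interquartile range. Points outside this interval are drawn individually as outliers.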
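To make the interval concrete, here is a small NumPy sketch on a hand-picked toy array. The helper name `whisker_fences` is hypothetical. It computes the quartiles with `np.percentile` (linear interpolation by default), derives the fences, and splits the data into points covered by the whiskers and outliers. Other tools may use a different quantile method, which can shift the fences slightly.

```python
import numpy as np


def whisker_fences(values):
    """Return the interval (Q1 - 1.5*IQR, Q3 + 1.5*IQR) used for box-plot whiskers."""
    q1, q3 = np.percentile(values, [25, 75])  # linear interpolation by default
    iqr = q3 - q1                              # interquartile range
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])
lower, upper = whisker_fences(values)  # Q1 = 3.25, Q3 = 7.75, IQR = 4.5
inside = values[(values >= lower) & (values <= upper)]
outliers = values[(values < lower) | (values > upper)]

print(lower, upper)                # -3.5 14.5
print(inside.min(), inside.max())  # whisker ends: 1 9
print(outliers)                    # [40]
```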

The difference between box and violin plots is that a box plot summarizes the data with a few statistics (quartiles and whisker bounds) and draws individual outlying examples, whereas a violin plot concentrates on the smoothed distribution as a whole (a kernel density estimate).
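One way to see the difference is to draw both plots side by side for the same sample. The sketch below uses a hypothetical bimodal synthetic feature. The box plot's quartiles do not reveal the two modes, while the violin plot's density outline shows them.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
# Hypothetical bimodal feature: two clusters centered at -2 and +2
values = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(2, 0.5, 300)])
df = pd.DataFrame({"feature": values})

fig, axes = plt.subplots(ncols=2, figsize=(8, 4))
sns.boxplot(data=df, y="feature", ax=axes[0])     # summary statistics + outliers
sns.violinplot(data=df, y="feature", ax=axes[1])  # smoothed density of the whole sample
axes[0].set_title("Box plot")
axes[1].set_title("Violin plot")
plt.close(fig)
```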

### describe()

In addition to graphical tools, we can get the exact numerical statistics of a distribution with the `describe()` method of a DataFrame. Its output is mostly self-explanatory.
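A minimal sketch of `describe()` on a tiny hypothetical DataFrame (the column names `minutes` and `calls` are made up for illustration). For numeric columns, the result has one row per statistic: count, mean, standard deviation, minimum, the 25%/50%/75% percentiles, and maximum.

```python
import pandas as pd

# Hypothetical numeric columns
df = pd.DataFrame({
    "minutes": [1.0, 2.0, 3.0, 4.0],
    "calls": [10, 20, 30, 40],
})

stats = df.describe()  # one column per numeric feature, one row per statistic
print(stats)

# A subset of columns is summarized the same way
print(df[["minutes"]].describe())
```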

As we will see in the following articles, this class imbalance may impose some restrictions on how we measure classification performance, and in the future we may want to additionally penalize our model's errors in predicting the minority “Churn” class.
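One common way to quantify the imbalance and to penalize minority-class errors more heavily is to weight each class inversely to its frequency. This is an assumption for illustration; the course may handle the problem differently later. The sketch uses hypothetical labels (85 loyal, 15 churned), not the course dataset. The weights follow the formula `n_samples / (n_classes * count_per_class)`, which scikit-learn uses for `class_weight="balanced"`.

```python
import pandas as pd

# Hypothetical labels: 0 = stayed, 1 = churned (synthetic, not the course dataset)
y = pd.Series([0] * 85 + [1] * 15, name="Churn")

shares = y.value_counts(normalize=True)  # fraction of examples in each class
counts = y.value_counts()                # absolute number of examples per class

# Inverse-frequency ("balanced") weights: rarer classes get larger weights
weights = len(y) / (counts.size * counts)

print(shares.sort_index())
print(weights.sort_index())  # class 0 -> 100/170, class 1 -> 100/30
```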

There is another seaborn function, somewhat confusingly called `barplot()`, that is mostly used to represent basic statistics of a numerical variable grouped by a categorical feature. By default, each bar's height is the mean of that variable within the corresponding category.
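A minimal sketch of `barplot()` on a hypothetical DataFrame (the columns `plan` and `calls` are made up for illustration). The per-category means computed with `groupby` are the values the bars show with seaborn's default `mean` estimator. seaborn also draws an error bar on each bar, which is omitted from the checks here because it is bootstrapped.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: a categorical feature and a numerical variable
df = pd.DataFrame({
    "plan": ["basic", "basic", "basic", "premium", "premium", "premium"],
    "calls": [1, 2, 3, 4, 6, 8],
})

# The statistic each bar represents: mean of "calls" per "plan"
means = df.groupby("plan")["calls"].mean()

fig, ax = plt.subplots()
sns.barplot(data=df, x="plan", y="calls", ax=ax)  # bar height = mean per category
plt.close(fig)

print(means)
```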

