AI News, Top 15 Python Libraries for Data Science in 2017

Python has gained a lot of traction in the data science industry in recent years, so we want to outline some of its most useful libraries for data scientists and engineers, based on our experience.

When starting on a scientific task in Python, one inevitably turns to Python’s SciPy Stack, a collection of software designed specifically for scientific computing in Python (not to be confused with the SciPy library, which is part of this stack, or with the community around the stack).

However, the stack is vast: it contains more than a dozen libraries, and we want to focus on the core packages (particularly the most essential ones).

However, the library is fairly low-level, meaning that you will need to write more code to produce advanced visualizations and will generally put in more effort than with higher-level tools, but the effort is worthwhile.

scikit-learn exposes a concise and consistent interface to common machine learning algorithms, making it simple to bring ML into production systems.

The library combines quality code and good documentation with ease of use and high performance, and it is the de facto industry standard for machine learning with Python.
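
As a rough illustration of that consistent interface, the sketch below trains two unrelated scikit-learn classifiers with exactly the same calls. The tiny dataset and the choice of models are assumptions made up for this example, not something taken from the article.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Made-up toy data: one feature, two well-separated classes.
X = [[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]]
y = [0, 0, 0, 1, 1, 1]

predictions = {}
for model in (LogisticRegression(), DecisionTreeClassifier(random_state=0)):
    model.fit(X, y)  # every estimator is trained with fit(X, y)
    # ...and queried with predict(X), whatever algorithm is behind it.
    predictions[type(model).__name__] = model.predict([[0.5], [9.5]]).tolist()

print(predictions)
```

Because both models share the fit/predict convention, swapping one algorithm for another usually means changing a single line.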

Efficiency and stability tweaks allow for much more precise results even with very small values; for example, computing log(1+x) gives meaningful results even for the smallest values of x.
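
The log(1+x) issue is a general floating-point effect and can be reproduced without any particular library. The sketch below uses only Python's standard math module, which is an assumption chosen for illustration and not the API of the library described above.

```python
import math

x = 1e-18

# Naive form: 1 + x rounds to exactly 1.0 in double precision,
# so the information carried by x is lost before the log is taken.
naive = math.log(1 + x)

# Stable form: log1p evaluates log(1 + x) without ever forming 1 + x.
stable = math.log1p(x)

print(naive, stable)
```

The naive expression collapses to zero, while the stable one stays close to x, which is the mathematically expected value for very small x.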

The key feature of TensorFlow is its multi-layered system of nodes, which enables quick training of artificial neural networks on large datasets.

NLTK supports many operations, such as text tagging, classification, tokenizing, named entity identification, building a corpus tree that reveals inter- and intra-sentence dependencies, stemming, and semantic reasoning.

These building blocks can be combined into complex research systems for different tasks, for example sentiment analysis and automatic summarization.

Gensim implements algorithms such as hierarchical Dirichlet processes (HDP), latent semantic analysis (LSA), and latent Dirichlet allocation (LDA), as well as tf-idf, random projections, word2vec, and doc2vec, which facilitate the examination of texts for recurring patterns of words in a set of documents (often referred to as a corpus).
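
To give a feel for one of the techniques named above, here is a minimal pure-Python sketch of tf-idf weighting. It does not use Gensim's API; the function name tf_idf, the toy corpus, and the particular weighting formula (relative term frequency times log(N / document frequency)) are assumptions chosen for illustration, and real libraries offer several variants.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(corpus)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        counts = Counter(doc)
        weights.append({
            # Relative term frequency times inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical three-document corpus, already split into tokens.
corpus = [
    "python is a language".split(),
    "python has many libraries".split(),
    "pandas is a python library".split(),
]
weights = tf_idf(corpus)
print(weights[0])
```

A word that occurs in every document ("python" here) receives a weight of zero, while words that are specific to one document score highest.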

It was originally designed strictly for scraping, as its name indicates, but it has evolved into a full-fledged framework that can gather data from APIs and act as a general-purpose crawler.

The library follows the well-known Don’t Repeat Yourself principle in its interface design: it prompts users to write general, reusable code, which makes it easier to build and scale large crawlers.

As you have probably guessed from the name, statsmodels is a Python library that lets its users explore data by estimating statistical models and performing statistical tests and analysis.

Among its many useful features are descriptive and result statistics for linear regression models, generalized linear models, discrete choice models, robust linear models, time series analysis models, and various estimators.

The library also provides extensive plotting functions that are designed specifically for statistical analysis and tuned for good performance with large data sets.
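
For readers unfamiliar with the simplest of the models listed above, the following sketch fits a one-variable linear regression by ordinary least squares in plain Python. It is not statsmodels code; the helper name fit_line and the data are hypothetical and exist only to show what "estimating a model" means in the most basic case.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data lying exactly on the line y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

A statistics library adds what this sketch leaves out, such as standard errors, significance tests, and diagnostics for the fitted model.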

Python: Top 5 Data Science Libraries

Top 5 Python data science analysis modules for developers.

Iris & Cartopy: Python packages for Atmospheric and Oceanographic science; SciPy 2013 Presentation

Iris & Cartopy: Open source Python packages for Atmospheric and Oceanographic science Authors: Elson, Philip, UK Met Office; Track: Meteorology, ...

Machine Learning for Time Series Data in Python | SciPy 2016 | Brett Naul

The analysis of time series data is a fundamental part of many scientific disciplines, but there are few resources meant to help domain scientists to easily explore ...

2015-12-14 "LMFIT: A Python tool for model fitting", by Alireza Hojjati

How to download and install Python Packages and Modules with Pip

This tutorial covers how to download and install packages using pip. Pip comes with newer versions of Python, and makes installing packages a breeze.

Dask Parallel and Distributed Computing | SciPy 2016 | Matthew Rocklin

Dask is a pure Python library for parallel and distributed computing. Last year Dask parallelized NumPy and Pandas computations on multi-core workstations.

Thomas Wiecki - Probablistic Programming Data Science with PyMC3

PyData London 2016 Probabilistic programming is a new paradigm that greatly increases the number of people who can successfully build statistical models ...

Arduino and Python LESSON 2: Installing the Software and Libraries

This tutorial shows you step-by-step instructions on how to download the free Python software and Libraries. These programs are the ones that will allow you to ...

Thomas Reineking - Plumbing in Python: Pipelines for Data Science Applications

PyData Berlin 2016 Bringing data science models from development to production can be a daunting task. To reduce the overhead in this process and to ...

Geospatial Data with Open Source Tools in Python | SciPy 2015 Tutorial | Kelsey Jordahl