AI News, The Startup Phenomena:  Making the next Netflix of Machine Learning, Uber of Data Modeling, or Chipotle of Data?

The authors took the NYC Data Science Academy 12-week full-time Data Science Bootcamp program from Sept 23 to Dec 18, 2015. This post is based on their second class project (due in the 4th week of the program).

Our project centers on the development of an open source workbench that provides data scientists with automated tools for exploratory analysis and model selection.

Before getting into the low-level details, let's take a step back and think about the trending term “Data Science.” It seems that everywhere you turn these days there’s someone starting a “Data Science” company.

There’s been a lot of attention on data science platforms and workbenches that attempt to improve the data scientist’s workflow or allow non-data-scientists to perform data science through an immersive user interface.

[Figures: CPU Prices, Hard Disk Space, Memory Prices, Network Prices. Captions: "The collapse in prices of Hard Disk Space, Memory, and Network Capacity." and "Negative trend between the year and memory price."]

It’s not just CPUs that are dropping in price, but *every part of the PC*.

Assuming that you live in the Northeastern corridor, California, or the Midwest, and assuming that your computer draws at least 1 kW of power, renting server space from Amazon is less expensive than just paying for your computer's electricity in your home state.
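The comparison above comes down to one multiplication per hour of use. The sketch below shows that arithmetic, treating the post's figure as a steady 1 kW power draw. The electricity rate and the rental price are hypothetical placeholders, not figures from the post, so substitute your own utility rate and the current price of the instance you would rent.

```r
# Hourly electricity cost of running a machine at home:
# power draw (kW) times the utility rate (USD per kWh).
hourly_power_cost <- function(draw_kw, rate_per_kwh) {
  draw_kw * rate_per_kwh
}

draw_kw         <- 1     # power draw assumed in the post
rate_per_kwh    <- 0.15  # HYPOTHETICAL residential rate, USD per kWh
rental_per_hour <- 0.10  # HYPOTHETICAL cloud instance price, USD per hour

home_cost <- hourly_power_cost(draw_kw, rate_per_kwh)

# TRUE means renting is cheaper than the electricity alone
renting_is_cheaper <- home_cost > rental_per_hour
print(c(home = home_cost, rental = rental_per_hour))
```

Whether the claim holds for you depends entirely on those two prices, which is why they are left as inputs.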

Old Software Pricing:

- SAS Enterprise Miner ($140,000 for the first year)
- IBM SPSS Statistics ($51,000 per user)
- Alteryx Server ($58,500 per user)
- H2O.ai, Dataiku DSS, etc. ($10,000 per user per year)

You get the idea.

 These companies have a business model of taking freely available open source tools, building a GUI on top of the system, and charging tens of thousands of dollars per year for support.

New Software Pricing:

- Scikit-Learn - Python libraries for Machine Learning (Free)
- Weka - Java libraries for Machine Learning (Free)
- TensorFlow (Google) Open Source Machine Learning (Free)
- FAIR (Facebook) Open Source Machine Learning (Free)
- R, Python, Spark, Hadoop, Caret (Free)

The Machine Learning servers and tools that used to be exclusively the domain of Hedge Funds, Fortune 1000 companies, and large drug manufacturers are now accessible to anyone.

The data science workbench tool that we built over a week is meant to illustrate how easy it is to duplicate the features of the more expensive institutional packages, using completely free software.

Our project focuses on a full-stack implementation in R, both to explore its computational nuances in a data science setting and to explore how R’s UI capabilities can contribute to a positive workflow and user experience.

| Model Class | Package | predict Function Syntax |
| --- | --- | --- |
| lda | MASS | predict(obj) (no options needed) |
| glm | stats | predict(obj, type = 'response') |
| gbm | gbm | predict(obj, type = 'response', n.trees) |
| mda | mda | predict(obj, type = 'posterior') |
| rpart | rpart | predict(obj, type = 'prob') |
| Weka | RWeka | predict(obj, type = 'probability') |
| LogitBoost | caTools | predict(obj, type = 'raw', nIter) |

The Open Source Alternative - Shiny

Shiny is not the only new tool for interactive visualization, but it is a fully functional web app development package that can turn R code directly into an interactive front end without the need to know JavaScript or HTML.

 Using a combination of Shiny, caret, and other great open source tools, we made a fairly workable platform that can perform basic data analysis, preprocessing, modeling, and validation.
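Part of caret's appeal is that it smooths over the per-package differences in prediction syntax listed earlier. A minimal sketch of that inconsistency, using only packages that ship with a standard R installation, is shown here. The built-in `iris` data and the choice of predictors are illustrative assumptions, not the data used in the project.

```r
library(MASS)  # provides lda(); included with standard R installations

# Two-class subset of the built-in iris data (illustrative choice)
df <- droplevels(subset(iris, Species != "setosa"))

# Logistic regression: class probabilities require type = "response"
glm_fit  <- glm(Species ~ Sepal.Length + Sepal.Width,
                data = df, family = binomial)
glm_prob <- predict(glm_fit, newdata = df, type = "response")

# LDA: no type argument; probabilities live in the $posterior element
lda_fit  <- lda(Species ~ Sepal.Length + Sepal.Width, data = df)
lda_prob <- predict(lda_fit, newdata = df)$posterior

str(glm_prob)  # numeric vector, one probability per row
str(lda_prob)  # matrix with one column per class
```

The two models answer the same question but return differently shaped objects through different arguments, which is the kind of bookkeeping a wrapper layer can hide from the user interface.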

 Keep in mind that our main intention was to create a functional prototype to showcase a small fraction of creative possibilities available to us through the open source community.

Essentially, reactive functions in Shiny mean that you create smaller “pseudo-functions” that automatically receive user input whenever the user interacts with features such as a check-box or slider.
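To make the idea concrete, here is a minimal sketch of a reactive expression wired to a slider and a check-box. It is not the project's code: the input IDs (`n`, `scale`), the output ID (`summary`), and the use of the built-in `iris` data are assumptions made for illustration.

```r
library(shiny)

# UI: one slider and one check-box feed the reactive expression below
ui <- fluidPage(
  sliderInput("n", "Number of rows to sample", min = 10, max = 150, value = 50),
  checkboxInput("scale", "Center and scale columns", value = FALSE),
  verbatimTextOutput("summary")
)

server <- function(input, output) {
  # Reactive expression: re-evaluated only when input$n or input$scale changes
  sampled <- reactive({
    rows <- iris[sample(nrow(iris), input$n), 1:4]
    if (input$scale) as.data.frame(scale(rows)) else rows
  })

  # Calling sampled() like a function registers the dependency
  output$summary <- renderPrint(summary(sampled()))
}

app <- shinyApp(ui, server)
# Launch from an interactive session with: runApp(app)
```

Because `sampled` reads `input$n` and `input$scale`, Shiny tracks those dependencies and refreshes the summary without any explicit event-handling code.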

 For the modeling sub-feature, we made four algorithms available for the user to choose and tune: KNN, LogitBoost, gradient boosting (GBM), and neural networks.
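Tuning one of these algorithms through caret can be sketched as follows, assuming the `caret` package is installed. Only KNN is shown; the `iris` data, the 5-fold cross-validation, and the candidate values of `k` are illustrative assumptions rather than the project's settings. The other three algorithms correspond to caret's `"LogitBoost"`, `"gbm"`, and `"nnet"` methods, which need the caTools, gbm, and nnet packages respectively.

```r
library(caret)

set.seed(42)  # make the cross-validation folds reproducible

# 5-fold cross-validation (illustrative resampling scheme)
ctrl <- trainControl(method = "cv", number = 5)

# Candidate values for KNN's single tuning parameter, k (illustrative)
grid <- data.frame(k = c(3, 5, 7, 9))

fit_knn <- train(
  Species ~ ., data = iris,
  method     = "knn",
  preProcess = c("center", "scale"),  # KNN is distance-based, so scale first
  trControl  = ctrl,
  tuneGrid   = grid
)

print(fit_knn$results[, c("k", "Accuracy")])  # accuracy per candidate k
print(fit_knn$bestTune)                       # k selected by cross-validation
```

In an app, the candidate grid would typically come from user inputs such as sliders instead of being hard-coded.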
