AI News, Sign Up Successful – Please Check Your Inbox
- On Sunday, September 30, 2018
Some of H2O’s mission-critical applications include predictive maintenance, operational intelligence, security, fraud, auditing, churn, credit scoring, user-based insurance, sepsis, ICU transfers and others in over 5,000 organizations.
Products: H2O: H2O makes it possible for anyone to easily apply machine learning and predictive analytics to solve today’s most challenging business problems.
Now data scientists and developers can launch turnkey compute environments for collaboratively training and deploying predictive models and integrate those models into real-time smart applications.
Sparkling Water = H2O + Apache Spark
Over the past few years, we watched Matei and the team behind Spark build a thriving open-source movement and, in Spark, a great development platform optimized for in-memory big data.
At the same time, H2O built a great open source product with a growing customer base focused on scalable machine learning and interactive data science.
Over the past couple of months, the Spark and H2O teams started brainstorming how best to combine H2O’s machine learning capabilities with the power of the Spark platform.
One of the primary draws of Spark is its unified nature, enabling end-to-end building of APIs within a single system.
Fast, fully featured algorithms in H2O will add to growing open source efforts in R, MLlib, Mahout and others, disrupting closed and proprietary vendors in machine learning and predictive analytics.
Welcome to H2O 3
When you launch H2O on Hadoop using the hadoop jar command, YARN allocates the necessary resources to launch the requested number of nodes.
If the cluster manager settings are configured for the default maximum memory size but the memory required for the request exceeds that amount, YARN will not launch the requested containers and H2O will time out.
To calculate the amount of memory required for a successful launch, use the following formula: The mapreduce.map.memory.mb value must be less than the YARN memory configuration values for the launch to succeed.
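The formula referred to above is missing from this text. As a rough sketch only, assuming the rule described in H2O's Hadoop documentation (the container size equals the mapper heap set with `-mapperXmx` plus an overhead set with `-extramempercent`, 10% by default), the check could be written as follows. The function names and the example YARN limit are illustrative and are not part of H2O.

```python
import math


def required_container_mb(mapper_xmx_mb: int, extra_mem_percent: int = 10) -> int:
    """Memory (MB) each H2O node asks YARN for: heap plus extra overhead.

    Assumption: container size = mapperXmx + mapperXmx * extramempercent / 100.
    """
    overhead_mb = math.ceil(mapper_xmx_mb * extra_mem_percent / 100)
    return mapper_xmx_mb + overhead_mb


def fits_in_yarn(mapper_xmx_mb: int, yarn_limit_mb: int,
                 extra_mem_percent: int = 10) -> bool:
    """True if the request stays below the YARN memory limit.

    The comparison is strict, following the wording of the text above.
    """
    return required_container_mb(mapper_xmx_mb, extra_mem_percent) < yarn_limit_mb


# Example: a 10 GB heap per node against a hypothetical 8 GB YARN limit.
print(required_container_mb(10240))   # heap + 10% overhead
print(fits_in_yarn(10240, 8192))      # request is too large for this limit
print(fits_in_yarn(4096, 8192))       # a 4 GB heap fits
```

If the check fails, either the heap requested per node has to shrink or the YARN memory limits have to be raised before the launch can succeed.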
How-to: Build a Machine-Learning App Using Sparkling Water and Apache Spark
Thanks to Michal Malohlava, Amy Wang, and Avni Wadhwa of H2O.ai for providing the following guest post about building ML apps using Sparkling Water and Apache Spark on CDH.
An entry point to the H2O programming world (called H2OContext) is created and allows for the launch of H2O, parallel import of frames into memory and the use of H2O algorithms.
Because all the data used in the modeling process needs to be read into memory, the recommended method of launching Spark and H2O is through YARN, which dynamically allocates available resources.
YARN allocates a container in which to launch the application master. When you launch with yarn-client, the Spark driver runs in the client process, and the application master submits a request to the resource manager to spawn the Spark executor JVMs.
(Although this app is tested on Spark 1.4, it should also work without modification on 1.3, the version inside CDH 5.4.)
We’ve seen some incredible applications of Deep Learning with respect to image recognition and machine translation, but this specific use case has to do with public safety.
The cool thing about these two cities (and many others!) is that they are both open data cities, which means anybody can access city data ranging from transportation information to building maintenance records.
For this example, we looked at the historical crime data from both Chicago and San Francisco and joined this data with other external data, such as weather and socioeconomic factors, using Spark’s SQL context:
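The join itself is not shown in this text. As a rough illustration of the kind of query involved, the sketch below uses Python's built-in `sqlite3` module as a stand-in for Spark's SQL context. The table names, column names, and rows are invented for the example and do not reflect the real Chicago or San Francisco schemas.

```python
import sqlite3

# In-memory database standing in for tables registered with a SQL context.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crimes  (id INTEGER, day TEXT, community_area INTEGER, arrest INTEGER);
    CREATE TABLE weather (day TEXT, mean_temp REAL);
    CREATE TABLE census  (community_area INTEGER, per_capita_income INTEGER);
""")

# Toy rows, made up purely to exercise the join.
conn.executemany("INSERT INTO crimes VALUES (?, ?, ?, ?)", [
    (1, "2015-02-08", 46, 1),
    (2, "2015-02-09", 12, 0),
    (3, "2015-02-10", 99, 0),   # no census row for area 99, so it drops out
])
conn.executemany("INSERT INTO weather VALUES (?, ?)", [
    ("2015-02-08", 25.5),
    ("2015-02-09", 30.0),
    ("2015-02-10", 28.0),
])
conn.executemany("INSERT INTO census VALUES (?, ?)", [
    (46, 30000),
    (12, 45000),
])

# Enrich each crime with the weather for its day and the
# socioeconomic data for its community area.
rows = conn.execute("""
    SELECT c.id, c.arrest, w.mean_temp, s.per_capita_income
    FROM crimes c
    JOIN weather w ON c.day = w.day
    JOIN census  s ON c.community_area = s.community_area
    ORDER BY c.id
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

An inner join is used here, so crimes without matching weather or census rows are dropped. Whether the original application kept or discarded such rows is not stated.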
Figure 3: Spark + H2O Workflow
We perform the data import, ad-hoc data munging (parsing the date column, for example), and joining of tables by leveraging the power of Spark.
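As an illustration of the date munging mentioned above, here is a minimal sketch in plain Python rather than Spark. The timestamp format and the derived field names are assumptions made for the example, not taken from the original application.

```python
from datetime import datetime


def date_features(raw: str, fmt: str = "%m/%d/%Y %I:%M:%S %p") -> dict:
    """Split a raw timestamp string into simple numeric features."""
    ts = datetime.strptime(raw, fmt)
    weekday = ts.isoweekday()          # 1 = Monday ... 7 = Sunday
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,               # 24-hour clock
        "weekday": weekday,
        "weekend": weekday >= 6,       # Saturday or Sunday
    }


features = date_features("02/08/2015 11:43:58 PM")
print(features)
```

Features such as the hour of day or a weekend flag give a model something numeric to learn from, where the raw date string would only be an opaque label.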
Figure 6: Chicago arrest rates and total % of all crimes by category
Once the data is transformed to an H2O Frame, we train a deep neural network to predict the likelihood of an arrest for a given crime.
Figure 7: Chicago validation data AUC
The last building block of the application is a function that predicts the probability of an arrest for a new crime.
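For readers unfamiliar with the AUC metric named in Figure 7, the sketch below computes it from scratch in plain Python: the probability that a randomly chosen positive example (an arrest) is scored above a randomly chosen negative one, with ties counted as one half. This is the generic definition, not H2O's implementation, and the labels and scores are toy values.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 ground-truth values (1 = arrest).
    scores: iterable of predicted probabilities, same length as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy validation set: two arrests, two non-arrests.
print(auc([1, 1, 0, 0], [0.9, 0.3, 0.6, 0.1]))
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration but too slow for large validation sets. Production code would normally use a sort-based computation.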
Figure 8: Geo-mapped predictions
Because each of the crimes reported comes with latitude-longitude coordinates, we scored our hold-out data using the trained model and plotted the predictions on a map of Chicago, specifically the Downtown district.
The color coding corresponds to the model’s prediction for likelihood of an arrest with red being very likely (X > 0.8) and blue being unlikely (X < 0.2).
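The thresholds above can be expressed as a small helper. This is only a sketch: the text does not say how predictions between 0.2 and 0.8 were colored, so the "intermediate" label below is a placeholder, and the function name is hypothetical.

```python
def arrest_color(probability: float) -> str:
    """Map a predicted arrest probability to a map-marker color."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability > 0.8:
        return "red"            # arrest very likely
    if probability < 0.2:
        return "blue"           # arrest unlikely
    return "intermediate"       # placeholder: color not given in the text


for p in (0.95, 0.5, 0.05):
    print(p, arrest_color(p))
```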
- On Monday, June 24, 2019
Sparkling Water 2.0: The Next Generation of Machine Learning on Apache Spark (Jakub Hava)
Sparkling Water integrates the H2O open source distributed machine learning platform with the capabilities of Apache Spark. It allows users to leverage H2O's ...
Productionizing H2O Models using Sparkling Water by Jakub Hava
Slides can be viewed here: In this webinar, Jakub Háva, ..
ISV Showcase: End-to-end Machine Learning using H2O on Azure : Build 2018
H2O's AI platform provides an open source machine learning framework that works with sparklyr and PySpark. H2O's Sparkling Water allows users to combine the ...
Sparkling Water Webinar
Sparkling Water is the latest innovation to combine two best-of-breed open source technologies Apache Spark and H2O. Sparkling Water is the newest ...
H2O.ai - Ep. 14 (Deep Learning SIMPLIFIED)
H2O.ai is a software platform that offers a host of machine learning algorithms, as well as one deep net model. It also provides sophisticated data munging, ...
Scaling Machine Learning at Booking.com with H2O Sparkling Water
This presentation was recorded at #H2OWorld 2017 in Mountain View, CA. Enjoy the slides: ...
Jo-fai Chow - Introduction to Machine Learning with H2O and Python
Description H2O.ai is focused on bringing AI to businesses through software. Its flagship product is H2O, the leading open source platform that makes it easy for ...
Start Getting Your Feet Wet in Open Source Machine and Deep Learning
At H2O.ai we see a world where all software will incorporate AI, and we're focused on bringing AI to business through software. H2O.ai is the maker behind H2O, ...
Ashrith Barthur | Getting started with H2O on Python
PyData DC 2016 Github: H2O helps Python users make the leap from single machine based processing to ..
H2O Quick Start with Sparkling Water
UPDATE: Spark versions 1.3 and 1.4 are now supported! H2O 3.0 (previously ...