AI News, A methodology for solving problems with DataScience for Internet of Things - Part One


We then extend this to a broader question: could we formulate a methodology to solve Data Science for IoT problems? I have illustrated my thinking through a number of companies and examples. I personally work with an open-source strategy (based on R, Spark and Python), but the methodology applies to any implementation.

I also mention some trends I am following, such as Apache NiFi. As we move towards a world of 50 billion connected devices, Data Science for IoT (IoT analytics) helps to create new services and business models. IoT analytics is the application of data science models to IoT datasets. The flow of data starts with the deployment of sensors, which detect events or changes in quantities.

Thus, classifiers are commonly used in IoT analytics to detect anomalies. But by also looking at historical trends and streaming data, and by combining data from multiple events (sensor fusion), we can gain new insights.

For example, if you want to detect the failure of a component, you could look for spikes in that component's values over a recent time span (thereby potentially predicting failure).
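The spike check described above can be sketched in Python as follows. This is a minimal illustration, not the author's method: it assumes readings arrive as a plain list of floats and flags a reading using a rolling z-score over the preceding readings. The function name `detect_spikes` and the defaults `window=5` and `threshold=3.0` are hypothetical choices.

```python
from collections import deque
from statistics import mean, pstdev


def detect_spikes(values, window=5, threshold=3.0):
    """Return indices whose value deviates from the mean of the preceding
    `window` readings by more than `threshold` standard deviations."""
    recent = deque(maxlen=window)  # only the most recent readings are kept
    spikes = []
    for i, v in enumerate(values):
        if len(recent) == window:
            history = list(recent)
            mu = mean(history)
            sigma = pstdev(history)
            # With a perfectly flat history (sigma == 0), any change counts as a spike.
            if (sigma == 0 and v != mu) or (sigma > 0 and abs(v - mu) > threshold * sigma):
                spikes.append(i)
        recent.append(v)
    return spikes


readings = [20.1, 20.3, 20.2, 20.4, 20.2, 20.3, 27.9, 20.2]
print(detect_spikes(readings))  # [6]
```

Note that a flagged reading stays in the window afterwards and inflates the standard deviation for the next few readings; a production detector might exclude flagged values from the history.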

Missing values in sensor data could be filled in (imputing values); sensor data could be combined to infer an event (complex event processing); data could be normalized across sensors, time and devices; and we could handle different data formats or multiple communication protocols, and manage thresholds.
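As a rough sketch of two of these steps, the following Python assumes each sensor's readings are a list in which `None` marks a missing value. Forward-fill imputation and min-max normalization are stand-in techniques chosen for illustration, and the helper names are hypothetical.

```python
from typing import List, Optional


def forward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Impute missing readings (None) with the last observed value.
    Leading gaps stay None because there is nothing to carry forward."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def min_max_normalize(values: List[float]) -> List[float]:
    """Rescale readings to [0, 1] so sensors with different ranges are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


raw = [18.0, None, 22.0, None, None, 26.0]
filled = forward_fill(raw)         # [18.0, 18.0, 22.0, 22.0, 22.0, 26.0]
print(min_max_normalize(filled))   # [0.0, 0.0, 0.5, 0.5, 0.5, 1.0]
```

Normalization here runs after imputation, since `min_max_normalize` expects no remaining `None` values.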

Complex event processing, or CEP, is event processing that combines data from multiple sources to infer events or patterns that suggest more complicated circumstances.

Typically, CEP engines act as event correlation engines: they analyze a mass of events, pinpoint the most significant ones, and trigger actions.

An aggregation-oriented CEP solution focuses on executing online algorithms in response to event data entering the system, for example, continuously calculating an average based on data in the inbound events.
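For example, the continuously updated average mentioned above can be computed with an online algorithm that keeps only a count and the current mean, so each inbound event is processed in constant time without storing past events. The sketch below is a hypothetical Python illustration, not code from any particular CEP engine.

```python
class RunningAverage:
    """Online (single-pass) mean over an unbounded stream of events."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        self.count += 1
        # Incremental update: new_mean = old_mean + (x - old_mean) / n
        self.mean += (value - self.mean) / self.count
        return self.mean


avg = RunningAverage()
for temperature in [20.0, 22.0, 24.0]:
    current = avg.update(temperature)
print(current)  # 22.0
```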

Detection-oriented CEP focuses on detecting combinations of events, called event patterns or situations; for example, a situation can be detected by looking for a specific sequence of events.
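A detection-oriented rule of this kind can be sketched as a small matcher that looks for a given sequence of event types within a time limit. The sketch is illustrative only: the `(timestamp, type)` tuple format, the function name, and the door/motion event types are assumptions. It allows unrelated events between pattern steps and reports a match for every start event that completes the pattern in time.

```python
from typing import List, Sequence, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, event type)


def match_sequence(events: List[Event], pattern: Sequence[str],
                   within: float) -> List[Tuple[float, float]]:
    """Return (start, end) timestamps for each completed occurrence of `pattern`,
    in order, spanning at most `within` seconds."""
    partial: List[Tuple[int, float]] = []  # (index of next expected type, start time)
    matches: List[Tuple[float, float]] = []
    for ts, kind in events:
        # Drop partial matches that can no longer complete in time.
        partial = [(i, start) for i, start in partial if ts - start <= within]
        advanced: List[Tuple[int, float]] = []
        for i, start in partial:
            if kind == pattern[i]:
                if i + 1 == len(pattern):
                    matches.append((start, ts))  # pattern completed
                else:
                    advanced.append((i + 1, start))
            else:
                advanced.append((i, start))  # unrelated event: keep waiting
        if kind == pattern[0]:
            # Every occurrence of the first type may start a new match.
            if len(pattern) == 1:
                matches.append((ts, ts))
            else:
                advanced.append((1, ts))
        partial = advanced
    return matches


stream = [(0.0, "door_open"), (2.0, "motion"), (3.0, "heartbeat"), (5.0, "door_close"),
          (20.0, "door_open"), (40.0, "motion"), (41.0, "door_close")]
print(match_sequence(stream, ("door_open", "motion", "door_close"), within=10.0))
```

The second `door_open` never completes because `motion` arrives 20 seconds later, outside the 10-second limit.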

Specifically, real-time systems perform analytics on short time windows over data streams. Hence, the scope of real-time analytics is a 'window', which typically comprises the last few time slots.
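A minimal time-based window might look like the following. It assumes readings arrive in timestamp order and keeps only those from the last `span` seconds; the class name and the choice of an average as the windowed query are hypothetical.

```python
from collections import deque
from typing import Deque, Tuple


class SlidingWindow:
    """Keeps only readings from the last `span` seconds and answers queries over them."""

    def __init__(self, span: float) -> None:
        self.span = span
        self.items: Deque[Tuple[float, float]] = deque()  # (timestamp, value)

    def add(self, ts: float, value: float) -> None:
        self.items.append((ts, value))
        # Evict readings that have fallen out of the window (assumes in-order arrival).
        while self.items and self.items[0][0] <= ts - self.span:
            self.items.popleft()

    def average(self) -> float:
        if not self.items:
            raise ValueError("window is empty")
        return sum(v for _, v in self.items) / len(self.items)


w = SlidingWindow(span=10.0)
w.add(0.0, 10.0)
w.add(5.0, 20.0)
w.add(12.0, 30.0)   # evicts the reading at t=0
print(w.average())  # 25.0
```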

Once built, the model can be validated against a real-time system to find deviations in the real-time stream data.

By collecting these logs for a period of time and analyzing the sequence of event patterns, a model to predict a fault can be built including the probability of failure for the sequence.
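One simple way to estimate the failure probability of a sequence is to count, in historical logs, how often each length-`n` run of events is immediately followed by a fault. The sketch below is a frequency-count (n-gram) estimate chosen for illustration, not the specific model the text has in mind; the log format, event names and the `FAULT` marker are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def failure_probabilities(log: List[str], n: int = 2,
                          fault: str = "FAULT") -> Dict[Tuple[str, ...], float]:
    """Estimate P(next event is `fault` | previous n events) from a historical log."""
    seen: Counter = Counter()
    followed_by_fault: Counter = Counter()
    for i in range(len(log) - n):
        seq = tuple(log[i:i + n])
        if fault in seq:
            continue  # skip windows that already contain a fault
        seen[seq] += 1
        if log[i + n] == fault:
            followed_by_fault[seq] += 1
    return {seq: followed_by_fault[seq] / count for seq, count in seen.items()}


history = ["temp_high", "vibration", "FAULT",
           "temp_high", "vibration", "ok", "temp_high", "ok"]
probs = failure_probabilities(history)
print(probs[("temp_high", "vibration")])  # 0.5
```

In this toy log, the sequence `temp_high` then `vibration` occurs twice and is followed by a fault once, giving an estimated failure probability of 0.5 for that sequence.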

Real-time bushfire alerting with Complex Event Processing in Apache Flink on Amazon EMR and IoT sensor network

Bushfires are frequent events in the warmer months of the year when the climate is hot and dry.

In this blog post, we use the event processing paradigm provided by Apache Flink’s Complex Event Processing (CEP) to detect potential bushfire patterns from incoming temperature events from IoT sensors in real time, and then send alerts via email.

The scenario for this post assumes that the sensors are long-lived, battery-powered devices deployed over a multi-hop wireless mesh network using technologies like LoRaWAN.

Key parameters recorded by the devices include temperature in degrees Celsius, timestamp, node id, and infectedBy, as illustrated in Figure 1.

List of sensor events containing the temperature measured by the IoT sensor devices at different time points

Once a potential bushfire starts to spread, the IoT sensors placed within its path can detect the subsequent temperature increase.

As shown in Figure 2, the parameter ‘infectedBy’ sent by a given node indicates that the node has been infected by a neighboring IoT device (that is, the bushfire has spread along this path); the parameter value is that neighbor’s ‘node id’.
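The event record described above could be modeled as follows. This is a sketch only: the JSON key names (`nodeId`, `temperature`, `timestamp`, `infectedBy`), the millisecond timestamp unit, and treating a missing or empty `infectedBy` as "not infected" are all assumptions rather than the blog's actual schema.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SensorEvent:
    node_id: str
    temperature: float          # degrees Celsius
    timestamp: int              # epoch milliseconds (assumed unit)
    infected_by: Optional[str]  # id of the neighboring node that spread the fire, if any

    @classmethod
    def from_json(cls, payload: str) -> "SensorEvent":
        # Key names are hypothetical; adapt them to the real payload.
        d = json.loads(payload)
        raw = d.get("infectedBy")
        return cls(
            node_id=str(d["nodeId"]),
            temperature=float(d["temperature"]),
            timestamp=int(d["timestamp"]),
            infected_by=str(raw) if raw not in (None, "") else None,
        )


event = SensorEvent.from_json(
    '{"nodeId": "node-2", "temperature": 52.5, "timestamp": 1546300800000, "infectedBy": "node-1"}'
)
print(event.infected_by)  # node-1
```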

High-level overview of an IoT sensor network monitoring the temperature of the surrounding geographical area

For the purposes of this scenario, Figure 2 shows an 11-node IoT sensor network that exhibits how the bushfire spreads over time.

A link between two neighboring nodes denotes the wireless connectivity within a multi-hop wireless ad hoc network.

Figure 2 illustrates this spread in detail. The analogy helps us visualize the overall spread of the bushfire over a network of IoT devices.

Once this pattern is detected by real-time event stream processing in Amazon EMR, an SNS alert email is sent to the subscribed email address.

This streamlines the temperature measurements emitted by the IoT devices over the infrastructure layer to build predictive alert and visualization systems for potential bushfire monitoring.

High-level block diagram of the real-time bushfire alert and visualization systems

The diagram shows that the IoT sensor events (that is, measured temperatures) feed into an IoT gateway.

The events are then consumed by a stream processing engine that matches the incoming events to a pattern and later sends out alerts to the subscribers, if necessary.

Architecture of the real-time IoT stream processing pipeline using AWS services

In this section, we depict a component-level architecture for an event processing system using several AWS services, as shown in Figure 4.

In this case, Amazon Kinesis Data Streams is chosen as the destination, acting as a reliable underlying stream storage system with a retention period of one day.

Apache Flink consumes the records from the Amazon Kinesis Data Streams shards and matches the records against a pre-defined pattern to detect the possibility of a potential bushfire.

AWS IoT rule and action for the incoming temperature events

In this blog post, we have chosen Apache Flink as the stream processing engine because it provides high-throughput, low-latency processing of real-time events.

Most IoT use cases deal with a large number of sensor devices continuously generating a high volume of events over time.

Celsius and are next followed by another event (from another IoT sensor), which has also reached the same threshold temperature and has been infected by the first IoT sensor node corresponding to the first event on the pattern.

If this condition repeats iteratively for the other three nodes like node-3, node-4, and node-5, then a complete pattern of four network degree path (N1 ->
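The spreading pattern described above can be illustrated with a plain-Python sketch; it does not use Flink's CEP API. It assumes readings arrive in order, uses a hypothetical threshold of 50 degrees Celsius because the exact value is not stated here, and reports the first chain of five hot nodes in which each node reports being infected by the previous one, which corresponds to a four-hop path. The `Reading` type and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

THRESHOLD_C = 50.0  # hypothetical threshold temperature in degrees Celsius
PATH_NODES = 5      # first node plus four further hops


@dataclass
class Reading:
    node_id: str
    temperature: float
    infected_by: Optional[str]


def detect_spread(readings: List[Reading], threshold: float = THRESHOLD_C,
                  path_nodes: int = PATH_NODES) -> Optional[List[str]]:
    """Return the first infection path of `path_nodes` hot nodes, or None."""
    paths: Dict[str, List[str]] = {}  # hot node id -> path of hot nodes ending at it
    for r in readings:
        if r.temperature < threshold or r.node_id in paths:
            continue  # ignore cool readings and nodes already on a path
        # Extend the infecting neighbor's path if that neighbor is already hot.
        parent = paths.get(r.infected_by) if r.infected_by else None
        paths[r.node_id] = (parent or []) + [r.node_id]
        if len(paths[r.node_id]) >= path_nodes:
            return paths[r.node_id]
    return None


sample = [
    Reading("node-1", 55.0, None),
    Reading("node-7", 30.0, None),
    Reading("node-2", 56.0, "node-1"),
    Reading("node-3", 58.0, "node-2"),
    Reading("node-4", 60.0, "node-3"),
    Reading("node-5", 61.0, "node-4"),
]
print(detect_spread(sample))  # ['node-1', 'node-2', 'node-3', 'node-4', 'node-5']
```

When a path is returned, the pipeline would send the alert (in the blog's architecture, via Amazon SNS) including the traversed node ids.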

A sample Amazon SNS email alert reporting a potential bushfire and its traversal path

All the incoming IoT event records (the unfiltered, raw events read from the Amazon Kinesis data stream) are pushed into an Amazon Elasticsearch Service cluster for durable storage and visualization on the Kibana web UI.

A sample bushfire heat-map visualization from Amazon Elasticsearch Service

The URL below explains in detail the steps to set up all the necessary AWS components and to run the IoT simulator and Apache Flink CEP.

It creates the architecture from start to finish, shown in Figure 4, by setting up the IoT simulator in an EC2 instance with all other respective components, and then automatically runs the stack.

This allows you to access the IoT simulator running in this EC2 instance from your workstation, and to use the same public IP address to access the Kibana web UI described later in this post.

For simplicity, you can provide the public IP address of the workstation from which you are running the CloudFormation stack; this IP address is granted access to the Amazon Elasticsearch domain for displaying the real-time Kibana dashboard.

Use the Kibana web URL and create an index pattern under the Management section with the name “weather-sensor-data,” and then choose the dashboard to see the visualization of the real-time spread of the bushfire covered by the IoT network.

Kibana web UI is not accessible

If you see the message “User: anonymous is not authorized to perform: es:ESHttpGet” while trying to access the Kibana web UI, the public IP address specified when the CloudFormation stack was created is either incorrect or has changed.

No SNS alert email notification

If you do not receive an SNS email alert about the potential bushfire after several minutes of observing the complete visualization, check your inbox to verify that you confirmed the SNS subscription at the beginning, while the CloudFormation stack was being created.

In this blog post, we discussed how to build a real-time IoT stream processing, visualization, and alerting pipeline using various AWS services.

We encourage you to explore the IoT simulator code and test it with different network configurations to ingest more records with different patterns, and to visualize the bushfire spread path on the Kibana dashboard.

Complex event processing

Event processing is a method of tracking and analyzing (processing) streams of information (data) about things that happen (events).[1]

Analysts suggest that CEP will give organizations a new way to analyze patterns in real-time and help the business side communicate better with IT and service departments.[5]

CEP is used in operational intelligence (OI) products to provide insight into business operations by running query analysis against live feeds and event data.

This new event may trigger a reaction process to note the pressure loss into the car's maintenance log, and alert the driver via the car's portal that the tire pressure has reduced.

This new event triggers a different reaction process to immediately alert the driver and to initiate onboard computer routines to assist the driver in bringing the car to a stop without losing control through skidding.

For example, in the final situation the car is moving normally and suffers a blown tire, which results in the car leaving the road and striking a tree, and the driver is thrown from the car.

Even though there is no direct measurement that can determine conclusively that the driver was thrown, or that there was an accident, the combination of events allows the situation to be detected and a new event to be created to signify the detected situation.
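The idea of deriving a new event from a combination of raw events can be sketched as a rule that fires only when all the required events occur within a short window. The event type names, the five-second window, and the derived event name below are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[float, str]  # (seconds, event type)

# Hypothetical raw event types; no single one of them proves an accident.
REQUIRED = {"blown_tire", "left_road", "impact", "driver_seat_vacant"}


def infer_situation(events: List[Event], window: float = 5.0) -> Optional[Event]:
    """Emit a derived event once every required raw event type has been seen
    within `window` seconds of each other."""
    latest: Dict[str, float] = {}  # most recent timestamp per required type
    for ts, kind in sorted(events):
        if kind in REQUIRED:
            latest[kind] = ts
            if set(latest) == REQUIRED and ts - min(latest.values()) <= window:
                return (ts, "accident_driver_thrown")
    return None


raw_events = [(0.0, "speed_normal"), (1.0, "blown_tire"), (1.8, "left_road"),
              (2.5, "impact"), (2.6, "driver_seat_vacant")]
print(infer_situation(raw_events))  # (2.6, 'accident_driver_thrown')
```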

the aerospace industry, it is good practice to monitor breakdowns of vehicles to look for trends (determine potential weaknesses in manufacturing processes, material, etc.).

One use for CEP is to link these separate processes, so that in the case of the initial process (breakdown monitoring) discovering a malfunction based on metal fatigue (a significant event), an action can be created to exploit the second process (life cycle) to issue a recall on vehicles using the same batch of metal discovered as faulty in the initial process.
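A sketch of this link between the two processes follows. The `Breakdown` record, the cause label `metal_fatigue`, and the life-cycle lookup (a dictionary from vehicle id to metal batch) are hypothetical stand-ins for real breakdown-monitoring and life-cycle systems.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Breakdown:
    vehicle_id: str
    cause: str
    metal_batch: str


def recall_actions(breakdown: Breakdown, lifecycle_db: Dict[str, str]) -> List[str]:
    """When breakdown monitoring reports metal fatigue (the significant event),
    return every vehicle built from the same metal batch as recall candidates."""
    if breakdown.cause != "metal_fatigue":
        return []
    return sorted(v for v, batch in lifecycle_db.items() if batch == breakdown.metal_batch)


fleet = {"V1": "B7", "V2": "B7", "V3": "B9"}  # vehicle id -> metal batch
print(recall_actions(Breakdown("V1", "metal_fatigue", "B7"), fleet))  # ['V1', 'V2']
```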

The integration of CEP and BPM must exist at two levels, both at the business awareness level (users must understand the potential holistic benefits of their individual processes) and also at the technological level (there needs to be a method by which CEP can interact with BPM implementation).

The CEP application may collect data about what customers on the phone are currently doing, or how they have recently interacted with the company in various other channels, including in-branch, or on the Web via self-service features, instant messaging and email.

The application then analyzes the total customer experience and recommends scripts or next steps that guide the agent on the phone, and hopefully keep the customer happy.[15]

The financial services industry was an early adopter of CEP technology, using complex event processing to structure and contextualize available data so that it could inform trading behavior, specifically algorithmic trading, by identifying opportunities or threats that indicate traders (or automatic trading systems) should buy or sell.[16]

Today, a wide variety of financial applications use CEP, including profit, loss, and risk management systems, order and liquidity analysis, quantitative trading and signal generation systems, and others.

The timestamps are required only to be non-decreasing, not strictly ascending, because the finite time resolution of some systems, such as financial data sources (milliseconds, microseconds or even nanoseconds), means that consecutive events may carry equal timestamps.
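The ordering requirement can be checked with a single comparison of neighboring timestamps. A minimal sketch, assuming integer timestamps in one clock unit (the function name is hypothetical):

```python
from typing import Sequence


def is_valid_event_order(timestamps: Sequence[int]) -> bool:
    """True if timestamps never decrease; equal neighbors are allowed because
    events within one clock tick share a timestamp."""
    return all(a <= b for a, b in zip(timestamps, timestamps[1:]))


print(is_valid_event_order([1000, 1000, 1001]))  # True
print(is_valid_event_order([1000, 999]))         # False
```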

Or the need to act upon live market prices may involve comparisons to benchmarks that include sector and index movements, whose intra-day and historic trends gauge volatility and smooth outliers.

The majority of these techniques rely on the fact that representing the IoT system's state and its changes is more efficient in the form of a data stream, instead of having a static, materialized model.

What is complex event processing and why is it needed for IoT?

Complex event processing is an emerging network technology commonly used in the “internet of things.”

It is a kind of computing in which incoming data about events is turned into more useful, higher level “complex” event data designed to provide insight into what is happening.

The events being analyzed can be happening across different parts of an organization, such as sales leads, orders or customer service calls, according to David Luckham, research professor of electrical engineering at Stanford.

These data types can include news items, text messages, social media posts, stock market feeds, traffic reports, weather reports or other kinds of data. An event may also be defined as a “change of state.”

Engineering health care solutions through event processing, using commercial or open-source CEP systems that have been tested across a wide range of use cases, may arguably deliver a higher level of safety.
