AI News, Advanced Machine Learning with Basic Excel

In this article, I present a few modern techniques that have been used in various business contexts, comparing performance with traditional methods.

The techniques have been used by the author in automated data science frameworks (AI to automate content production, selection and scheduling for digital publishers) as well as in other contexts. The methodology blends multiple algorithms that at first glance look traditional and math-heavy, such as decision trees, regression (logistic or linear) and confidence intervals.

The methodology presented here is the result of 20 years' worth of applied research on various large industrial data sets, where the author tried for years (eventually with success) to build a system that is simple and works.

It is aimed at people who are not professional coders: people who manage data scientists, BI experts, MBA professionals, and people from other fields who want to understand the mechanics of some state-of-the-art machine learning techniques without having to spend months or years learning mathematics, programming, and computer science.

10 observations) have the estimated / predicted response computed using the Jackknife regression (algorithm #2 above); the remaining points get scored using the pseudo-decision tree algorithm (algorithm #1 above).

In addition, while not incorporated in the spreadsheet, confidence intervals can be computed for each node with at least n observations (say n = 10) using percentiles for the response, computed over all data points in the node in question (in this case, the data points are articles). See the example at the bottom of section 3.

By response, I mean the variable that we are trying to predict: in this case the number of page views attached to an article (in fact its logarithm, to smooth out big spikes due to external factors, or the fact that older articles have by definition more page views -- see page view decay prediction for details). So no statistical theory is used anywhere in the methodology, not even to compute confidence intervals.

Classical decision trees, especially large ones with millions of nodes in a single tree and more than 5 or 6 features involved at each final node, suffer from similar issues: over-fitting and artificial feature selection, both of which make the results difficult to interpret.

Yet even with this restricted set of features, it reveals interesting insights about some keywords (Python, R, data, data science) associated with popularity (Python being more popular than R), and some keywords that, surprisingly, are not (keywords containing 'analy', such as analytic). Besides keywords found in the title, other features are used, such as time of publication; these have also been binarized to increase stability and avoid an explosion in the number of nodes.
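As a rough illustration of how binarized title features can define nodes, the sketch below encodes each title as a string of 0/1 keyword flags and averages the response (log page views) per node. The keyword list, the node-key format, and the sample articles are all assumptions made up for this example; they are not taken from the spreadsheet.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical keyword features; the actual feature list lives in the spreadsheet.
KEYWORDS = ["python", "r", "data"]


def node_key(title):
    """Encode a title as a string of 0/1 flags, one flag per keyword."""
    tokens = set(re.findall(r"[a-z]+", title.lower()))
    flags = "".join("1" if kw in tokens else "0" for kw in KEYWORDS)
    return "N-" + flags


def node_averages(articles):
    """Group (title, log page views) pairs by node and average the response."""
    groups = defaultdict(list)
    for title, log_pv in articles:
        groups[node_key(title)].append(log_pv)
    return {key: mean(values) for key, values in groups.items()}


# Made-up titles and log page views, for illustration only.
articles = [
    ("Python for data science", 6.2),
    ("Data science with Python", 6.0),
    ("R tips and tricks", 5.4),
    ("Excel shortcuts", 5.1),
]
averages = node_averages(articles)
for key, avg in sorted(averages.items()):
    print(key, round(avg, 2))
```

Because every feature is a 0/1 flag, the number of possible nodes is bounded by 2 to the power of the number of features, which is one way to read the remark about avoiding an explosion in the number of nodes.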

Side Note: Confidence intervals for response (example)

Node N-100-000000 in the spreadsheet has an average pv of 5.85 (pv is the response), and consists of the following pv values: 5.10, 6.80, 5.56, 5.66, 6.19, 6.01, 5.56, 5.10, 6.80, 5.69.
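A percentile-based interval for this node can be sketched as follows. The pv values are the ones listed above; the nearest-rank percentile rule and the choice of the 10th and 90th percentiles are assumptions made for illustration, since neither a percentile method nor a confidence level is specified here.

```python
import math

# pv values for node N-100-000000, as listed above.
pv = [5.10, 6.80, 5.56, 5.66, 6.19, 6.01, 5.56, 5.10, 6.80, 5.69]


def percentile(values, p):
    """Nearest-rank percentile: the value at rank ceil(p * n / 100) in sorted order."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


mean_pv = sum(pv) / len(pv)

# Assumed interval bounds: 10th and 90th percentiles of the node's responses.
interval = (percentile(pv, 10), percentile(pv, 90))

print(round(mean_pv, 2), interval)
```

With only ten observations the bounds land on or near the extremes of the node, which is why a minimum node size (say n = 10) matters before reporting such an interval.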

Competition is also dropping this kind of content for the same reasons, so, ironically, this is an opportunity to build a monopoly. Variety is also critical: only promoting blogs that work well today is a recipe for long-term failure, though it works well in the short term.

xlsx-style

For those contributing to this fork:
Supported read formats:
Supported write formats:

Demo: http://oss.sheetjs.com/js-xlsx
Source: http://git.io/xlsx

With npm:
In the browser:
With bower:

CDNjs automatically pulls the latest version and makes all versions available at http://cdnjs.com/libraries/xlsx

shim

To use the shim, add the shim before the script tag that loads xlsx.js:

For parsing, the first step is to read the file.

This example extracts the value stored in cell A1 from the first worksheet:

This example iterates through every nonempty cell of every sheet and dumps values:

Complete examples:

Note that older versions of IE do not support the HTML5 File API, so the base64 mode

On OSX you can get the base64 encoding with:

The node version installs a command line tool xlsx which can read spreadsheet files

workbook is a workbook object:

Complete examples:

XLSX is the exposed variable in the browser and the exported node variable.

XLSX.version is the version of the library (added by the build script).

XLSX.write(wb, write_opts) attempts to write the workbook wb.

XLSX.writeFile(wb, filename, write_opts) attempts to write wb to filename.

Utilities are available in the XLSX.utils object:
Exporting:
Cell and cell address manipulation:

js-xlsx conforms to the Common Spreadsheet Format (CSF):

Cell address objects are stored as {c:C, r:R} where C and R are 0-indexed column

the following pattern to walk each of the cells in a range:

Built-in export utilities (such as the CSV exporter) will use the w text if it is

Special worksheet keys (accessible as worksheet[key], each starting with !):

The following properties are currently used when generating an XLSX file, but not yet parsed:

workbook.SheetNames is an ordered list of the sheets in the workbook.

wb.Sheets[sheetname] returns an object representing the worksheet.

The exported read and readFile functions accept an options argument:

The defaults are enumerated in bits/84_defaults.js

The exported write and writeFile functions accept an options argument:

Cell styles are specified by a style object that roughly parallels the OpenXML structure.

COLOR_SPEC: Colors for fill, font, and border are specified as objects, either:

BORDER_STYLE: Border style is a string value which may take on one of the following values:

Borders for merged areas are specified for each cell within the merged area.

So to apply a box border to a merged area of 3x3 cells, border styles would need to be specified for eight different cells:

Tests utilize the mocha testing framework.

The simplest way to test is to move the script:

To produce the dist files, run make dist.
