AI News, BOOK REVIEW: My data science journey

My data science journey

Here I describe the projects I worked on and my career progress, from my start 25 years ago as a PhD student in statistics until today, including the slow transformation from statistician to data scientist that began more than 20 years ago.

My first project involved designing an alarm system that sent automated emails to channel managers whenever traffic numbers were too low or too high: a red flag indicated significant under-performance, and a bold red flag indicated extreme under-performance.

The alarm system used SAS to predict traffic (time series modeling with seasonality, and confidence intervals for daily estimates); Perl/CGI to expose it as an API, access databases, and send automated emails; Sybase (star schema) to access the traffic database and to create a small database of predicted traffic (to match against real, observed traffic); and, of course, cron jobs to run everything automatically in batch mode on a pre-specified schedule, and to resume automatically after a crash or other failure (e.g.
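The flagging step of such an alarm system can be sketched in a few lines of Python. This is a simplified stand-in, not the original SAS/Perl implementation: it replaces the seasonal time series model with a per-weekday mean and standard deviation, and the function names and the thresholds (2 and 3 standard deviations) are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def weekday_baseline(history):
    """Build a crude seasonal baseline from (weekday, traffic) pairs.

    Returns {weekday: (mean, standard deviation)}. Each weekday needs
    at least two observations, otherwise stdev() raises an error.
    """
    by_day = {}
    for weekday, traffic in history:
        by_day.setdefault(weekday, []).append(traffic)
    return {day: (mean(values), stdev(values)) for day, values in by_day.items()}


def classify(observed, expected, sd, warn_z=2.0, extreme_z=3.0):
    """Label a daily traffic number by its distance from the prediction.

    warn_z and extreme_z are hypothetical thresholds, not taken from
    the original system.
    """
    if sd == 0:
        # No variability in the history: any deviation is treated as extreme.
        return "ok" if observed == expected else "bold red flag"
    z = abs(observed - expected) / sd
    if z >= extreme_z:
        return "bold red flag"
    if z >= warn_z:
        return "red flag"
    return "ok"
```

In a batch setup like the one described, a scheduled job would call `classify` once per channel per day and email the channel manager whenever the result is not `"ok"`.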

It started with Visa in 2002, after a short stint at a statistical litigation company (William Wecker Associates), where I improved time-to-crime models that were biased because of right-censoring in the data (future crimes attached to a gun have not been observed yet; this analysis was connected to the gun manufacturers lawsuit).
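To see why right-censoring matters, consider a minimal Kaplan-Meier estimator. The text does not say which correction was actually used, so this is only an illustration of one standard technique: items whose event has not been observed yet still count as "at risk" up to the time they drop out of observation, instead of being discarded or treated as events. The function name and data layout are assumptions.

```python
def kaplan_meier(durations, observed):
    """Estimate a survival curve from right-censored data.

    durations: time until the event, or until observation ended.
    observed:  True if the event was seen, False if censored.
    Returns a list of (time, survival probability) at each event time.
    """
    pairs = sorted(zip(durations, observed))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        events = 0
        leaving = 0
        # Group all records that share the same time t.
        while i < len(pairs) and pairs[i][0] == t:
            events += int(pairs[i][1])
            leaving += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Censored records leave the risk set without counting as events.
        at_risk -= leaving
    return curve
```

For example, `kaplan_meier([1, 2, 3, 4], [True, False, True, False])` uses the two censored records to keep the risk set larger at times 1 and 2, which a model fitted only on completed events would ignore.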

At Visa, I developed multivariate features for real-time credit card fraud detection, especially single-ping fraud, working on data sets with 50 million transactions. That was too big for SAS to handle at the time (a SAS sort would crash), and that is when I first developed Hadoop-like systems (nowadays, a SAS sort can easily handle 50 million rows without visible Map-Reduce technology).

Most importantly, I used Perl's associative arrays (hash tables) to process hundreds of feature combinations and find the best one according to a lift metric, while SAS, at that time, would take a whole weekend to process a single feature combination.
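The hash-table approach can be sketched as follows, with Python dictionaries standing in for Perl associative arrays. The lift definition used here (fraud rate within a bucket divided by the overall fraud rate), the `min_support` cutoff, and all names are assumptions for illustration; the text does not specify the exact metric.

```python
from collections import defaultdict
from itertools import combinations


def lift_by_combination(transactions, features, size=2, min_support=2):
    """Score every feature combination by its best bucket lift.

    transactions: list of dicts holding feature values and a boolean "fraud".
    Returns {feature combination: highest lift among buckets with at
    least min_support transactions, or 0.0 if no bucket qualifies}.
    """
    overall_rate = sum(t["fraud"] for t in transactions) / len(transactions)
    if overall_rate == 0:
        raise ValueError("no fraud cases: lift is undefined")
    results = {}
    for combo in combinations(features, size):
        # Hash table keyed by the tuple of feature values: [count, fraud count].
        counts = defaultdict(lambda: [0, 0])
        for t in transactions:
            key = tuple(t[f] for f in combo)
            counts[key][0] += 1
            counts[key][1] += int(t["fraud"])
        lifts = [
            frauds / n / overall_rate
            for n, frauds in counts.values()
            if n >= min_support
        ]
        results[combo] = max(lifts, default=0.0)
    return results
```

Calling `max(results, key=results.get)` on the returned dictionary picks the best combination. Each combination needs only one pass over the data and one dictionary, which is why this style of processing scales to hundreds of combinations.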

It had been wrong for a long time without anyone noticing, well before I joined this project: Tealeaf sessions spanning multiple servers were broken into small sessions (we discovered this by simulating our own sessions and looking at what showed up in the log files the next day), making it impossible to really track user activity.

But I came back to the Internet around 2005, this time to focus on traffic quality, click fraud, taxonomy creation, and optimizing bids on Google keywords: projects that require text mining and NLP (natural language processing) expertise.
