AI News, Artificial Intelligence in Motion
- On Thursday, June 7, 2018
As I said in my last post, I will begin posting articles about some approaches I developed to find new users on Twitter (a real-time micro-blogging and social network service where users can post messages of up to 140 characters).
These articles will introduce some basic concepts of data clustering analysis, a method used to discover and visualize things, people, or ideas that are closely related.
To that end, I will show how to gather and prepare the data provided by Twitter and present some specific clustering algorithms together with well-known distance measures.
Developing those clustering algorithms helped me understand how I could design my recommendation algorithm using a special distance measure that yields a score for a given user of the social network.
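Supervised learning techniques use data and expected results to 'learn' how to extract new information, producing a result based on what they have learned up to that point. Clustering, by contrast, is an unsupervised technique: it receives no expected results and looks for structure in the data itself.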
Now, let's get started by exploring Twitter user profiles and showing how users can be grouped according to the text of their status updates, and how words can be grouped according to their use.
For this experiment, I used a subset of 100 friends picked at random from my social network; the data to be clustered is the number of times each word in a particular set appears in each user's Twitter statuses.
By clustering user profiles based on word frequencies, it may be possible to verify if there are groups of user profiles that frequently write about similar subjects or write in similar styles (e.g.
To generate this dataset, you'll be downloading the twitter statuses from a set of users, extracting the text from the entries, and creating a table of word frequencies.
Since words like 'the' will appear in almost all of them, you can reduce the total number of words included by selecting only those words that you consider worth keeping in the word list.
The final step is to use the list of words and the list of statuses to create a text file containing a big matrix of all word counts for each of the user statuses.
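The steps above (extract the text, count the words, drop the very common ones, and write one big matrix) can be sketched as follows. This is only a minimal sketch and not the code used in the experiment: the names `STOP_WORDS`, `get_words`, `build_matrix` and `write_matrix` are hypothetical, the stop-word list is an arbitrary example, and the statuses are assumed to be already downloaded into a dictionary that maps each user name to a list of status texts.

```python
import re
from collections import Counter

# Hypothetical stop-word list; which words to drop is left to the reader.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "it"}


def get_words(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_matrix(statuses_by_user):
    """Return (word_list, rows); rows maps each user to a list of word counts.

    statuses_by_user is assumed to look like {"user": ["status 1", "status 2"]}.
    """
    counts = {
        user: Counter(
            word
            for status in statuses
            for word in get_words(status)
            if word not in STOP_WORDS
        )
        for user, statuses in statuses_by_user.items()
    }
    # One column per distinct word seen across all users, in sorted order.
    word_list = sorted(set().union(*counts.values()))
    rows = {user: [c[word] for word in word_list] for user, c in counts.items()}
    return word_list, rows


def write_matrix(path, word_list, rows):
    """Write a tab-separated file: a header of words, then one line per user."""
    with open(path, "w", encoding="utf-8") as out:
        out.write("User\t" + "\t".join(word_list) + "\n")
        for user, row in rows.items():
            out.write(user + "\t" + "\t".join(str(n) for n in row) + "\n")
```

Calling `build_matrix` on the downloaded statuses and passing the result to `write_matrix` produces the text file with one row per user and one column per word.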
The advantage is that you can easily load and dump the data without losing the type of the object and avoid further parsing and processing to manage the data.
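The text here does not name the mechanism used to load and dump the data. Assuming it refers to Python's standard `pickle` module, a minimal sketch could look like this; the function names and the file layout are hypothetical.

```python
import pickle


def dump_statuses(path, statuses_by_user):
    """Serialize the Python object to disk, keeping its structure and types."""
    with open(path, "wb") as handle:
        pickle.dump(statuses_by_user, handle)


def load_statuses(path):
    """Read back the same object, with no text parsing needed."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

A dictionary of lists dumped this way comes back as a dictionary of lists. Note that `pickle` should only be used to load files you created yourself.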
Some Twitter users post more statuses, or much longer ones, than others, so their text will contain more words overall. The Pearson correlation corrects for this because it measures how well two sets of data fit a straight line instead of comparing the raw counts.
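To make the idea concrete, here is a minimal sketch of a distance based on the Pearson correlation, written with the standard library only. The function name `pearson_distance` is hypothetical, and returning 1.0 when one of the vectors has no variation is a convention I am assuming for this sketch, since the correlation is undefined in that case.

```python
from math import sqrt


def pearson_distance(v1, v2):
    """Return 1 - r, where r is the Pearson correlation of two count vectors.

    0.0 means the vectors fit a rising straight line exactly; 2.0 means
    they fit a falling one exactly.
    """
    n = len(v1)
    mean1 = sum(v1) / n
    mean2 = sum(v2) / n

    # Covariance and variances (left unnormalized; the 1/n factors cancel).
    cov = sum((x - mean1) * (y - mean2) for x, y in zip(v1, v2))
    var1 = sum((x - mean1) ** 2 for x in v1)
    var2 = sum((y - mean2) ** 2 for y in v2)

    denominator = sqrt(var1 * var2)
    if denominator == 0:
        # Assumed convention: a constant vector is treated as uncorrelated.
        return 1.0
    return 1.0 - cov / denominator
```

With this measure, a user whose word counts are `[2, 4, 6]` is at distance 0 from a user with counts `[1, 2, 3]`: the first one simply writes twice as much, but uses the words in the same proportions.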
In this article, to draw the dendrogram and save it as a JPG, I will use the Python Imaging Library (PIL), which is available at http://pythonware.com. PIL makes it very easy to generate images with text and lines, which is all you'll really need to construct a dendrogram.