AI News, Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Those efforts seem to have paid off, based on my view counts over the past year: And based on read counts, here are my top 10 blog posts, most of which are stats-related: It’s so nice to see people are enjoying the posts, even sharing them and reaching out with additional thoughts and questions.

We Analyzed 1.3 Million YouTube Videos. Here’s What We Learned About YouTube SEO

We analyzed 1.3 million YouTube videos to better understand how YouTube’s search engine works.

Videos that contain an exact match keyword in their video title appear to have a slight edge over videos that don’t. This means that including a keyword in your title may improve your rankings by a slim margin.

In fact, the average length of a video ranking on the first page of YouTube is 14 minutes, 50 seconds.

The value that longer videos provide may encourage more interaction signals (including comments and likes) that ultimately impact rankings.

In fact, if you do a cursory search of popular keywords, you’d be hard pressed to find a short video (<

The average video on the first page of YouTube’s search results is 14 minutes, 50 seconds long.

In fact, we did find that shares have a strong correlation with higher rankings in YouTube: It’s important to note that we used YouTube’s public share report for this analysis.
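A correlation like the one described above can be sketched in a few lines. The excerpt does not say which statistic the study used, so the Spearman rank correlation below is an assumption chosen for illustration, and the share counts in the usage note are invented.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values that starts at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

With made-up data such as `spearman([1, 2, 3, 4, 5], [900, 400, 450, 120, 30])` (search position against share count), the result is negative, because position 1 is the best ranking: more shares go with a smaller position number.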

One of the major issues of using social shares as a ranking signal is that they’re easily gamed.

Unlike sharing content using a webpage’s social sharing icons, YouTube knows which users share video content…and where they share.

Combine that with the fact that YouTube encourages publishers to create highly-shareable content (and that YouTube reports shares in YouTube Analytics), and you have a strong possibility that the relationship between shares and rankings is more than a chance correlation.

So they changed their algorithm to emphasize factors like audience retention and engagement: However, we discovered that a video’s total view count continues to have a significant correlation with rankings.

That’s because, without views, your video can’t generate the other signals that YouTube uses to evaluate your video’s quality (like total watch time and comments).

We found a moderate correlation between a channel’s total subscribers and rankings: This is good news if you run a small or new channel.

For example, for this popular keyword, videos from two small channels outrank a video from a channel that has over 2 million subscribers: This type of result isn’t uncommon on YouTube.

As they do with shares, YouTube displays the number of subscriptions driven underneath each video: (Publishers can choose not to show this information publicly).

However, you can also ask viewers to subscribe: I’ve found that a clear call-to-action to subscribe significantly boosts my “subscriptions driven”

We found a weak correlation between keyword-rich video tags and rankings: While tags don’t appear to be as important as they once were, our data shows that they still make a small dent.

However, we found that including an exact keyword in your video title only has a slight potential impact on rankings: These findings could mean a few things: It could be that YouTube has de-emphasized the importance of video titles.

However, this seems unlikely as YouTube has stated that: “Titles contain valuable information to help viewers find your videos in search results.”

In fact, it’s common to see videos ranking well in YouTube for popular keywords…even when they don’t contain the exact term in their title.

According to our data, keyword-optimized descriptions don’t have any impact on rankings: This finding contradicts a common “best practice”

There are a few possible explanations for this finding: First, like with titles, YouTube may not require an exact keyword in your description to understand what your video is about.

This implies that your video description isn’t nearly as important as user-generated signals (including views and “subscriptions driven”).

This is unlikely as YouTube states that: “Well-written descriptions with the right keywords can boost views and watch time because they help your video show up in search results.”

An optimized description helps you show up in the suggested videos sidebar, which is a significant source of views for most channels.

However, I still recommend writing keyword-rich descriptions as they can help your video rank for related terms (and appear as a “suggested video”).

We discovered that HD videos appear significantly more often than SD videos on YouTube’s first page: This data can be interpreted in two ways: First, it could be that YouTubers that create the best video content also tend to record in HD.

Warren Buffett’s Best Kept Secret to Success: The Art of Reading, Remembering, and Retaining More Books

This is how Warren Buffett, one of the most successful people in the business world, describes his day.

Staples (yes, the office supply chain) collected speed reading data as part of an advertising campaign for selling e-readers.

Still, if you can bump up your words per minute marginally while still maintaining your reading comprehension, it can certainly pay dividends in your quest to read more.

In this sense, a desire to read more might simply mean having more time to read, and reading more content—books, magazines, articles, blog posts—in whole.

There are huge extremes at either end, both those who read way more than 17 books per year and those who read way less—like zero.

A Huffington Post/YouGov poll from 2013 showed that number might be even higher: 28 percent of Americans haven’t read a book in the past year.

guide to reading 300% faster Tim Ferriss, author of the 4-Hour Workweek and a handful of other bestsellers, is one of the leading voices in lifehacks, experiments, and getting things done.

According to Ferriss: Untrained readers use up to ½ of their peripheral field on margins by moving from 1st word to last, spending 25-50% of their time “reading” margins with no content.

You’ll find similar ideas in a lot of speed reading tips and classes (some going so far as to suggest you read line by line in a snake fashion).

The takeaway here: If you can advance your peripheral vision, you may be able to read faster—maybe not 300 percent faster, but every little bit counts.

Spritz and Blinkist take unique approaches to helping you read more—one helps you read faster and the other helps you digest books quicker.

Each word is centered in the box according to the Optimal Recognition Point—Spritz’s term for the place in a word that the eye naturally seeks—and this center letter is colored red.
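The centering idea can be sketched as follows. Spritz's actual rule for choosing the Optimal Recognition Point is not given here, so the length-based `pivot_index` heuristic is an assumption, and a caret on a second line stands in for the red letter.

```python
def pivot_index(word: str) -> int:
    """Hypothetical pivot rule: longer words get a pivot further from the start.

    This is an illustrative heuristic, not Spritz's published algorithm.
    """
    n = len(word)
    if n <= 1:
        return 0
    if n <= 5:
        return 1
    if n <= 9:
        return 2
    if n <= 13:
        return 3
    return 4


def align_on_pivot(word: str, pivot_column: int = 4) -> tuple[str, str]:
    """Pad the word so its pivot letter lands on a fixed column.

    Returns the padded word and a marker line with '^' under the pivot letter.
    """
    p = pivot_index(word)
    pad = max(pivot_column - p, 0)
    return " " * pad + word, " " * (pad + p) + "^"


if __name__ == "__main__":
    for w in "Each word is centered on its recognition point".split():
        padded, marker = align_on_pivot(w)
        print(padded)
        print(marker)
```

Because every pivot letter lands on the same column, the eye can stay fixed while the words change, which is the effect the passage describes.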

Spritz has yet to launch anything related to its technology, but there is a bookmarklet called OpenSpritz, created by, that lets you use the Spritz reading method on any text you find online.

Though the way the information is delivered—designed to look great and be eminently usable on mobile devices so you can learn wherever you are—makes it one-of-a-kind.

If you look at it in terms of raw numbers, the average person watches 35 hours of TV each week, the average commute time is one hour per day round-trip, and you can spend at least another hour per week for grocery shopping.
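To put those raw numbers in reading terms, here is a rough back-of-the-envelope sketch. The 35 hours of weekly TV comes from the paragraph above; the reading speed, book length, and the share of TV time given up are placeholder assumptions, not figures from this article.

```python
def books_per_year(hours_per_week: float,
                   words_per_minute: float,
                   words_per_book: float,
                   weeks_per_year: int = 52) -> float:
    """Estimate how many books fit into a weekly reading budget."""
    if words_per_book <= 0:
        raise ValueError("words_per_book must be positive")
    words_per_year = hours_per_week * 60 * words_per_minute * weeks_per_year
    return words_per_year / words_per_book


if __name__ == "__main__":
    tv_hours_per_week = 35          # from the text
    share_reallocated = 0.10        # assumption: trade 10% of TV time for reading
    estimate = books_per_year(
        hours_per_week=tv_hours_per_week * share_reallocated,
        words_per_minute=250,       # assumption: placeholder reading speed
        words_per_book=80_000,      # assumption: placeholder book length
    )
    print(f"{estimate:.1f} books per year")
```

Changing any of the assumed inputs changes the answer proportionally, so the useful part is the shape of the calculation rather than the specific number.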

reading habits, Pew also noted that the average reader of e-books reads 24 books in a year, compared to a person without an e-reader who reads an average of 15.

How to Talk About Books You Haven’t Read, written by University of Paris literature professor Pierre Bayard, suggests that we view the act of reading on a spectrum and that we consider more categories for books besides simply “have or haven’t read.”

UB book unknown to me
SB book I have skimmed
HB book I have heard about
FB book I have forgotten
++ extremely positive opinion
+ positive opinion
– negative opinion
–– extremely negative opinion

Perhaps the key to reading more books is simply to look at the act of reading from a different perspective?

A great place to start with book retention is understanding some key ways our brain stores information. Here are three specific elements to consider: Let’s say you read Dale Carnegie’s How to Win Friends and Influence People, one of our favorites here at Buffer.

In the case of Carnegie’s book, if there is a particular principle you wish to retain, think back to a time when you were part of a specific example involving the principle.

Inspectional reading can take two forms: 1) a quick, leisurely read or 2) skimming the book’s preface, table of contents, index, and inside jacket.

Getting into detail with a book (as in the analytical and syntopical level) will help cement impressions of the book in your mind, develop associations to other books you’ve read and ideas you’ve learned, and enforce repetition in the thoughtful, studied nature of the different reading levels.

Even Professor Pierre Bayard, the author of How to Talk About Books You Haven’t Read, identifies the importance of note-taking and review: Once forgetfulness has set in, he can use these notes to rediscover his opinion of the author and his work at the time of his original reading.

We can assume that another function of the notes is to assure him that he has indeed read the works in which they were inscribed, like blazes on a trail that are intended to show the way during future periods of amnesia.

Response to “Proposal to Update Data Management of Genomic Summary Results Under the NIH Genomic Data Sharing Policy”

Introduction

The exome era turns five years old this fall, with the anniversary of the “Targeted capture and massively parallel sequencing of 12 human exomes”

Born at a time when whole genome sequencing cost about a quarter million dollars, exome sequencing provided the cost savings that made sequencing large numbers of individuals for research feasible.

If we run out of family members and still haven’t found a likely causal variant, only then do we turn to whole genome sequencing (WGS), often alongside functional analyses such as RNA sequencing from muscle tissue.

WGS has remained a last resort for a few reasons: it’s still a bit more expensive than WES, it produces more data we have to process and store, and despite some notable recent advances [Ritchie 2014, Kircher 2014] our ability to interpret the pathogenicity of variants in the 99% of the genome that doesn’t code for protein is still a far cry from what we can do with exons.

But we still turn to WGS for two main reasons: (1) its superior ability to call structural variations by covering the breakpoints, and (2) its more reliable coverage of the exome itself.

We now have 11 individuals from Mendelian disease families on whom we’ve performed both standard WES and very, very high-quality WGS, which gives us a chance to quantify exactly what we’re missing in the WES.

The question of what we’re missing in WES is one input to our periodic re-evaluation of whether WES is still the best value for research dollars or if we should be turning to WGS right away.

Again, this is not designed to be a totally fair comparison, and the WES and WGS datasets differ on several dimensions: Many of the limitations we see in our analysis below are not actually inherent in exome capture but rather relate to PCR amplification or read length.

For the coverage depth analysis, I used BEDtools to create four sets of genomic intervals: Note that in this analysis I did not consider all Gencode intervals, as Agilent SureSelect does not even attempt to target some of these intervals.

(The question of relative coverage of all genic regions is a topic for a separate post.) For each genomic interval, in WGS and WES, I then ran GATK DepthOfCoverage for every combination of minimum MAPQ of 0, 1, 10, or 20, and minimum base pair quality of 0, 1, 10 or 20.

In other words, I computed the read depth across every base of these genomic intervals, stratified by mapping quality and by base pair quality.
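As a rough illustration of what "stratified by mapping quality and by base pair quality" means, here is a toy sketch. It is not the GATK implementation: the `AlignedRead` type is hypothetical, reads are assumed to be ungapped (no indels or clipping), and thresholds are assumed to be inclusive minimums.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class AlignedRead:
    start: int             # 0-based position of the first aligned base
    mapq: int              # mapping quality of the whole read
    base_quals: list[int]  # one base quality per aligned base


def depth_at(reads: list[AlignedRead], pos: int,
             min_mapq: int = 0, min_baseq: int = 0) -> int:
    """Count reads covering pos that pass both quality thresholds."""
    depth = 0
    for read in reads:
        offset = pos - read.start
        if 0 <= offset < len(read.base_quals):
            if read.mapq >= min_mapq and read.base_quals[offset] >= min_baseq:
                depth += 1
    return depth


def stratified_mean_depth(reads: list[AlignedRead],
                          interval: tuple[int, int],
                          thresholds: tuple[int, ...] = (0, 1, 10, 20),
                          ) -> dict[tuple[int, int], float]:
    """Mean depth over a half-open interval for every (MAPQ, base quality) pair."""
    start, end = interval
    result = {}
    for min_mapq, min_baseq in product(thresholds, repeat=2):
        total = sum(depth_at(reads, p, min_mapq, min_baseq)
                    for p in range(start, end))
        result[(min_mapq, min_baseq)] = total / (end - start)
    return result
```

With four thresholds on each axis this yields 16 depth values per interval, one for each combination mentioned above.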

Although the samples were sequenced to a variety of mean exonic depths in both WES and WGS, it is categorically true across all 11 samples that the mean exonic depth of the WES data is higher than the WGS data:

As you’d expect, the WGS BAMs have similar coverage in a 10bp buffer around all exons (or muscle exons) as they do in the exons themselves, while the WES has reduced coverage in the ± 10bp buffer regions.
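The ± 10bp buffer regions can be pictured with a small interval sketch. The post does not say which BEDtools commands were used, so this pure-Python version is only an assumption about the general idea, roughly what `bedtools slop` followed by `bedtools merge` would produce; unlike `slop`, it does not clamp interval ends to chromosome lengths.

```python
Interval = tuple[str, int, int]  # (chrom, start, end), half-open like BED


def pad_and_merge(intervals: list[Interval], pad: int = 10) -> list[Interval]:
    """Extend each interval by pad bases on both sides, then merge overlaps."""
    padded = sorted((chrom, max(start - pad, 0), end + pad)
                    for chrom, start, end in intervals)
    merged: list[Interval] = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the previous interval on the same chromosome.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


if __name__ == "__main__":
    exons = [("chr1", 100, 200), ("chr1", 215, 300), ("chr2", 5, 50)]
    print(pad_and_merge(exons))
```

Merging after padding matters because two exons separated by fewer than 20 bases would otherwise have their buffers counted twice.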

In the whole genomes, on the other hand, requiring base quality of 10 or 20 loses almost a third of the depth in the one sample shown here (yellow curves).

My suspicion is that this is not due to it being WGS per se, but rather the fact that 250 bp reads are near the limit of what next generation sequencing is presently capable of, so that getting such long reads involves some sacrifice in base pair quality at the ends of the read.

Therefore I computed for each of the 11 samples what percent of the Broad Exome is covered at 20x or better, varying the minimum MAPQ threshold but applying no base quality cutoff.
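The "percent covered at 20x or better" metric is simple to state in code. This is a minimal sketch that assumes the per-base depths have already been computed and collected into a list; the depths in the usage note are invented.

```python
def percent_covered(depths: list[int], min_depth: int = 20) -> float:
    """Percent of bases whose read depth is at least min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)
```

For example, `percent_covered([25, 19, 20, 0, 40])` counts three of five bases at 20x or better, while lowering `min_depth` to 10 counts four of five.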

The comparable picture for 20x+ depth, stratified by base quality, is more predictable: as above, the WGS but not WES samples here take a hit at the higher base quality cutoffs:

Fixing these variables lets us look more quantitatively at the distribution of coverage depth across the exome, instead of looking at mean coverage or at the percent covered at ≥ 20x.

So if you consider 10x the minimum to have decent odds of calling a variant, then the combination of longer reads (thus greater mappability) and more uniform coverage in these WGS samples is bringing an additional 5% of the exome into the callable zone.

Surprisingly, though we think of muscle disease genes as being enriched for unmappable repetitive regions, the above figure is nearly identical when only muscle disease gene exons are included.

Moreover, even among the PASS variants in the WES set, many PASSed only because the WES samples had been joint-called with a few hundred other samples, while none of the 11 individuals considered here actually had an alternate allele call with GQ >

I looked at all such discordant sites in muscle disease gene exons (16 sites), and then intended to look at a random sampling of 10 WES-only and 10 WGS-only alt alleles exome-wide, though I ended up doing this a few separate times as I tweaked the filters above.

In the final run, with the PASS and genotype quality filters described above in place before the random sampling, I did not notice an obvious difference in the proportion of genotype calls that I concluded looked correct on IGV after viewing reads from both datasets.

6 of the 10 WES-only alleles looked like the correct genotype call had indeed been made based on WES, and symmetrically, 6 of the 10 WGS-only allele calls looked correct.

I chose this example because it’s interesting to note that the variable depth in WES doesn’t just lead to missed genotypes; it also leads to false positives.

In WES, almost half of reads supported that same interpretation, but a subset of reads (6 of them) failed to span the LLL repeat, so they didn’t see that there were 4 Ls instead of 3, and this resulted in a call of a 3-bp deletion of the R.

The capture probe for this exon in the Agilent baits file ends at the beginning of the polyA: So perhaps it has captured fragments centered further left, such that only their tail ends protrude into the polyA area.

The density of mismatches may be enough to throw off alignment of a 76bp read but not a 250bp read, or it may be that so many SNPs so close together is enough to reduce the affinity of the DNA for capture probes, introducing bias where the ref allele dominates.

Although calls like this with no read support are probably just technical errors, I added this category to the list to remind us that not all of the ~2000 WGS-only alleles and ~500 WES-only alleles reflect differences in sequencing technology –

This probably reflects the more uniform distribution of coverage in our WGS data due to lack of capture and lack of PCR amplification, as well as the superior mappability of the longer reads.

WGS gave superior variant calls where its longer reads gave better mappability or spanned short tandem repeats, where its coverage was more uniform either in total depth or in distribution of the position of a variant within a read, and where it gave more even allelic balance in loci where a high density of SNPs may have interfered with exome capture.

WES and WGS both gave spurious variant calls at sites of extraordinarily high depth, and many calls in both datasets were simply not supported by reads at all.

It is widely accepted that academic papers are rarely cited or even read.

Dahlia Remler takes a look at the academic research on citation practices and finds that whilst it is clear citation rates are low, much confusion remains over precise figures and methods for determining accurate citation analysis.

“90% of papers published in academic journals are never cited.” This damning statistic from a 2007 overview of citation analysis recently darted about cyberspace. A similar statistic had made the rounds in 2010 but that time it was about 60% of social and natural science articles that were said to be uncited.

I was not the only one who wanted the supporting evidence. So, I dove into Google Scholar, searching the disparaged academic literature for articles on academic citation rates.

For everything except humanities, those numbers are far from 90% but they are still high: One third of social science articles go uncited! Ten points for academia’s critics. Before we slash humanities departments, though, remember that much of their most prestigious research is published in books.

Within a given journal, some articles have many citations while others have few and many have zero—citations within a given journal are skewed.  The average rate of citations for a whole journal, the impact factor, is pulled up by the few articles with many citations.
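A quick sketch shows how a few heavily cited articles pull the average up. The citation counts in the usage note are invented for illustration, and the plain mean here is a simplification of how a real impact factor is calculated.

```python
from statistics import mean, median


def citation_summary(citations: list[int]) -> dict[str, float]:
    """Summarize a journal's per-article citation counts."""
    if not citations:
        raise ValueError("citations must not be empty")
    uncited = sum(1 for c in citations if c == 0)
    return {
        "mean": mean(citations),
        "median": median(citations),
        "share_uncited": uncited / len(citations),
    }


if __name__ == "__main__":
    # Hypothetical journal: one article collects most of the citations.
    print(citation_summary([0, 0, 0, 1, 1, 2, 3, 93]))
```

In this invented example the mean is far above the median, so the journal-level average says little about how often a typical article in it is cited.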

I had a hard time finding the rates at which articles were uncited, because the overwhelming majority of relevant articles were about other things, such as the effect of time windows, different inclusion criteria for citations, whether the Internet has changed citation practices and so on.

If I recall correctly, he got the figures from/during a lecture he attended in the UK in which the presenter claimed that 90% of the published papers goes uncited and 50% remains unread other than by authors/referees/editors.” Meho noted that the 90% figure was about right for the humanities but not other fields.

Quit social media | Dr. Cal Newport | TEDxTysons

'Deep work' will make you better at what you do. You will achieve more in less time. And feel the sense of true fulfillment that comes from the mastery of a skill.

Link Tracking with Google Analytics

How do you track Link Clicks with Google Analytics? There are two techniques that we are going to discover today in this video: Outbound Link Tracking with ...

An anthropological introduction to YouTube

presented at the Library of Congress, June 23rd 2008. This was tons of fun to present. I decided to forgo the PowerPoint and instead worked with students to ...

PlayStation Presents - PSX 2017 Opening Celebration | English CC

Join us as we kick off PSX 2017 with PlayStation Presents, starting at 8 PM Pacific on Friday December 8th. Listen in on candid discussions with some of ...