AI News, The hardest parts of data science
The hardest parts of data science
Contrary to common belief, the hardest part of data science isn’t building an accurate model or obtaining good, clean data.
Before discussing the hardest parts of data science, it’s worth quickly addressing the two main contenders: model fitting and data collection/cleaning.
Most Kaggle competitions are focused on model fitting: Participants are given a well-defined problem, a dataset, and a measure to optimise, and they compete to produce the most accurate model.
It is sometimes due to stakeholders who don’t know what they want, and expect data scientists to solve all their data problems (either real or imagined).
Often, it can be hard to get to the stage where the problem is agreed on, because this requires dealing with people who only have a fuzzy idea of what can be done with data science.
For example, improving the well-being of the population (e.g., a company’s customers or a country’s citizens) is an overarching problem that arises in many situations.
However, the reality is that experimental data is often censored, there are many constraints on running experiments (ethics, practicality, budget, etc.), and confounding factors may make it impossible to identify the true causal impact of interventions.
In both cases it is hard (or impossible) to perform experiments to determine causality, and in both cases this fact has been used to mislead the public by parties with commercial and ideological interests.
In the case of smoking, due to ethical reasons, one can’t perform an experiment where a random control group is forced not to smoke, while a treatment group is forced to smoke.
Tobacco companies have exploited this fact for years, claiming that there may be some genetic factor that causes both smoking and a higher susceptibility to smoking-related diseases.
While no serious climate scientist doubts the fact that human activities are causing climate change, this can’t be proved through experimentation on another Earth.
It doesn’t take a scientist to figure out that pumping your lungs full of smoke on a regular basis is likely to be harmful, as is pumping the atmosphere full of greenhouse gases that have been sequestered for millions of years.
by Frank Shostak summarises some of the problems with GDP: The GDP framework cannot tell us whether final goods and services that were produced during a particular period of time are a reflection of real wealth expansion, or a reflection of capital consumption.
For instance, if a government embarks on the building of a pyramid, which adds absolutely nothing to the well-being of individuals, the GDP framework will regard this as economic growth.
It is a bit odd that GDP growth is still considered a worthwhile goal by many people, given that it can easily be skewed by a few powerful individuals who choose to build unnecessary pyramids (though perhaps this is the real reason why the GDP persists –
For example, with the precision and recall measures that are commonly used to evaluate the performance of search engines, it is rare to be able to increase both precision and recall at the same time.
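The trade-off can be seen in a small sketch. The ranking below is made up for illustration, and `precision_recall_at_k` is a hypothetical helper rather than anything from the original article: returning more results can only raise recall, but it usually dilutes precision.

```python
def precision_recall_at_k(ranked_relevance, total_relevant, k):
    """Precision and recall when only the top-k ranked results are returned."""
    hits = sum(ranked_relevance[:k])   # relevant results among the top k
    precision = hits / k               # share of returned results that are relevant
    recall = hits / total_relevant     # share of all relevant results that were returned
    return precision, recall


# Hypothetical search ranking: 1 = relevant result, 0 = irrelevant result.
ranked_relevance = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
total_relevant = 5  # assumption: one relevant document never appears in the ranking

for k in (2, 5, 10):
    p, r = precision_recall_at_k(ranked_relevance, total_relevant, k)
    print(f"top {k:2d}: precision={p:.2f} recall={r:.2f}")
```

With this made-up ranking, widening the cutoff from 2 to 10 results raises recall while precision falls, which is the tension the two measures are meant to expose.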
For example, in the 1990s, the number of page views was a good measure of interaction with websites, but nowadays it is a pretty weak measure because many websites are single-page applications.
As demonstrated by the examples throughout this article, over-simplification of complicated matters is a pervasive issue that goes beyond what’s commonly considered “data science”.
I believe it’s also important to maintain one’s integrity and not just make up stories that people would buy, but it’d be naive to assume that this never happens.
Measuring economies: The trouble with GDP
ONE of Albert Einstein’s greatest insights was that no matter how, where, when or by whom it is measured, the speed of light in a vacuum is constant.
In the case of light, a measurement of inflation based on the cost of things that generated light and one based on a quality-adjusted measure of light itself would have differed by 3.6% a year. When a first-year undergraduate first encounters the idea of GDP as the value added in an economy, adjusted for inflation, it sounds pretty straightforward, says Sir Charles Bean, the author of a recent review of economic statistics for the British government.
The production boundary
Measuring GDP requires adding up the value of what is produced, net of inputs, across a wide variety of business lines, weighting each according to its importance in the economy.
For today’s rich economies, dominated by made-to-order services and increasingly geared to the quality of experience rather than the production of ever more stuff, the trickiness is raised to a higher level.
In a world where houses are Airbnb hotels and private cars are Uber taxis, where a free software upgrade renews old computers, and Facebook and YouTube bring hours of daily entertainment to hundreds of millions at no price at all, many suspect GDP is becoming an ever more misleading measure.
In Britain Colin Clark, an enterprising civil servant, had been collecting statistics on national income since the 1920s, and in 1940 John Maynard Keynes made a plea for more detailed figures on Britain’s capacity to make guns, tanks and aeroplanes.
He went on to establish the modern definition of GDP as the sum of private consumption and investment and government spending (with account taken for foreign trade).
Kuznets had treated government spending as a cost to the private sector, but Keynes saw that if wartime procurement by the state was not treated as demand, GDP would fall even as the economy grew.
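A rough sketch of the difference between the two treatments, using the standard expenditure identity (consumption plus investment plus government spending plus net exports). All figures are invented, and simply setting government spending to zero is a crude stand-in for the Kuznets view, not his actual method.

```python
def gdp_expenditure(consumption, investment, government, exports, imports):
    """Expenditure-side GDP: C + I + G + (X - M)."""
    return consumption + investment + government + (exports - imports)


# Hypothetical figures in arbitrary currency units.
peacetime = dict(consumption=80, investment=15, government=5, exports=10, imports=10)
wartime = dict(consumption=60, investment=10, government=50, exports=5, imports=5)

for label, year in (("peacetime", peacetime), ("wartime", wartime)):
    with_state = gdp_expenditure(**year)
    # Crude stand-in for not counting state procurement as demand.
    without_state = gdp_expenditure(**{**year, "government": 0})
    print(f"{label}: with G = {with_state}, without G = {without_state}")
```

In this toy example the measure that counts state demand rises from 100 to 120 between the two periods, while the one that ignores it falls from 95 to 70, even though total activity has expanded.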
A measure created when survival was at stake took little notice of things such as depreciation of assets, or pollution of the environment, let alone finer human accomplishments.
In a famous speech in March 1968, Robert Kennedy took aim at what he saw as idolatrous respect for GDP, which measures advertising and jails but does not capture “the beauty of our poetry or the strength of our marriages”.
In 1972 Mr Nordhaus and James Tobin, a colleague at Yale, came up with a “measure of economic welfare” which counted some bits of state spending, such as defence and education, not as output but as a cost to GDP.
In 2009 a report commissioned by the French president, Nicolas Sarkozy, and chaired by Joseph Stiglitz, a prominent economist, called for an end to “GDP fetishism” in favour of a “dashboard” of measures to capture human welfare.
This convention means that so-called “home production”, such as housework or caring for an elderly relative, is excluded from GDP, even though such unpaid services have considerable value.
It is only fairly recently that statisticians have started to measure some bits of public-sector output directly by, for instance, counting the number of operations performed by health services or the number of students taught in schools.
Typically financial services are not paid for directly in fees: banks make a large part of their income from charging more interest on loans than they pay on deposits.
To capture the value being added, statisticians use an imputed figure, the “spread” between a risk-free interest rate and a lending rate, and multiply this by the stock of loans.
But because fear of bank defaults was driving spreads up, GDP figures recorded a spike in the sector’s value added, and thus its contribution to GDP (see chart 1).
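The mechanism can be sketched in a few lines. This is a deliberately simplified version of the imputation described above (spread times loan stock); the rates and loan stock are invented, and real national-accounts methods are more involved.

```python
def imputed_bank_output(lending_rate, risk_free_rate, loan_stock):
    """Imputed value added: (lending rate - risk-free rate) * stock of loans."""
    spread = lending_rate - risk_free_rate
    return spread * loan_stock


loan_stock = 1_000.0  # hypothetical stock of loans, arbitrary currency units

# Hypothetical rates: the risk-free rate is unchanged, but lenders demand more
# in a crisis because they fear defaults.
calm = imputed_bank_output(lending_rate=0.05, risk_free_rate=0.03, loan_stock=loan_stock)
crisis = imputed_bank_output(lending_rate=0.07, risk_free_rate=0.03, loan_stock=loan_stock)

print(f"calm: {calm:.1f}  crisis: {crisis:.1f}")
```

Nothing about the service the banks provide has improved between the two cases; the recorded output doubles only because the spread has widened.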
The statisticians have to fall back on crude proxies to estimate what is going on: thus the paid-sex market is assumed to expand in line with the male population, and the charges at lap-dancing clubs are taken as a measure of the price of sex.
A further complication is that, for all the caution that statisticians offer against seeing GDP as a measure of welfare, the two are intertwined in perhaps the trickiest part of their calculations: adjusting for inflation.
Multiplying together the prices of n goods and then taking the nth root of the product allows price aggregations to take into account a degree of switching proportionate to the change in relative prices.
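As a minimal sketch of the idea, the snippet below compares a geometric mean with a plain arithmetic mean. It works on unweighted price relatives (new price divided by old price) for two hypothetical goods, which is an assumption made for brevity; official indices use expenditure weights.

```python
import math


def geometric_mean(relatives):
    """n-th root of the product of n price relatives."""
    return math.prod(relatives) ** (1 / len(relatives))


def arithmetic_mean(relatives):
    """Simple average of the price relatives."""
    return sum(relatives) / len(relatives)


# Hypothetical basket: one good gets 20% cheaper, the other 25% dearer.
price_relatives = [0.80, 1.25]

print(f"arithmetic index: {arithmetic_mean(price_relatives):.4f}")
print(f"geometric index:  {geometric_mean(price_relatives):.4f}")
```

The geometric mean never exceeds the arithmetic mean, so it reports lower inflation whenever relative prices move, which is how it allows for shoppers switching towards the good that has become relatively cheaper.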
An advisory committee of leading economists set up by America’s Senate in the mid-1990s and headed by Michael Boskin, of Stanford University, reckoned that failure to adjust for quality and new products meant measured inflation was overstated by at least 0.6% a year.
It called for greater use of “hedonic” estimation, a technique that captures the implicit value of each particular attribute of a product by measuring how variation in those traits affects the product’s price: for example, how much more do people pay for a brighter light bulb?
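A toy version of the light-bulb question might look like the sketch below. It fits a straight line of price against a single attribute (brightness in lumens) by ordinary least squares; the data, the single-attribute model and the `fit_line` helper are all assumptions for illustration, whereas real hedonic models regress price on many attributes at once.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one attribute)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical shelf prices for bulbs of different brightness.
lumens = [400, 800, 1100, 1600]
prices = [3.1, 4.9, 6.6, 8.9]

intercept, price_per_lumen = fit_line(lumens, prices)
print(f"implicit price of one extra lumen: {price_per_lumen:.4f}")

# Quality adjustment: if an 800-lumen bulb is replaced on the shelf by a
# 1600-lumen bulb, strip out the part of the price gap explained by brightness.
raw_gap = prices[3] - prices[1]
quality_part = price_per_lumen * (lumens[3] - lumens[1])
print(f"raw price gap: {raw_gap:.2f}, of which quality: {quality_part:.2f}")
```

The slope is the estimated implicit value of brightness; whatever price gap remains after removing the quality part is what a statistician would treat as pure price change.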
The Bureau of Economic Analysis, America’s main statistical body, has used market wage rates to estimate the value of home-production activities, such as cooking, cleaning and ironing.
Following a similar approach, Erik Brynjolfsson and Joo Hee Oh of MIT estimated that the welfare gain of free internet products added 0.74% a year to America’s GDP between 2007 and 2011 (other studies reach somewhat lower estimates).
A landmark post-war study reckoned that GDP per worker rose by 1.4% a year, an unprecedented rate, in the first half of the 19th century.
You say you measured a revolution
In the 1980s, research by Nicholas Crafts of Warwick University found that the 18th century’s glut of industrially transformative inventions had been applied rather narrowly, with madcap growth seen only in a few sectors of the economy.
But would a typical American really be indifferent between 1989 medical care at 1989 prices and today’s medical services at current prices, asks Ken Rogoff of Harvard University?
The challenge, said Mr Nordhaus in his paper on light, is to construct measures that “account for the vast changes in the quality and range of goods and services that we consume.” But that means finding ways to more readily compare hand-held e-mail with fax machine, self-driving car with jalopy, vinyl records with music-streaming services and custom-made prosthesis with health-service crutches.
The quest to measure happiness has missed a key metric—and it’s more important than money
The Nordic nations, with their generous social welfare systems supporting small, homogenous populations, are often trumpeted as somehow getting it all right.
Ask Icelanders why the country consistently tops happiness charts and some will cite women’s empowerment, while others point to the sense of community fostered by hanging out in the country’s many municipal swimming pools and hot springs.
Last year, Michael Porter, a Harvard economist, led the launch of the Social Progress Index (SPI), a new tool to measure how societies are doing in comparison to one another and on a range of non-economic measures. For years economists have been trying to answer the ultimate question: What do we really need to be happy?
The SPI teases out a range of indicators, from basic human needs like water and sanitation, to more complex ones like access to advanced education.
Despite a decent economic situation in aggregate, on an individual level, people weren’t able to pursue their goals, Porter suggested.
In 2016, Kuwait scored much lower than many places in the category of personal rights, for example, even though its wealth means that other needs were being met.
Saudi Arabia, where women aren’t allowed to drive, came 126th out of 160 countries on those same rights despite being one of the richest countries in the world.
Traditional economic indicators failed to capture the region’s discontent with a shortage of quality jobs, poor public services, and lack of government accountability, according to a World Bank report released in 2015.
“The old social contract of redistribution with limited voice had stopped working, especially for the middle class, prior to 2011.
Though by many measures people in the West are wealthy, they felt a lack of opportunity—in work or the ability to plan for the future—which led to deep discontent.
Yet the post-industrial Western world got used to measuring social progress using GDP, which led to prioritizing economic effort above most everything else.
Keeping populations safe, making them productive, or controlling them have all been goals of various administrations from far in the past up to the present day. Wealth is nothing without the opportunity provided by good health to live free of pain and worry.
A person’s happiness, and her perception of success or failure, ultimately depends on what measures the individual values over the course of her life—whether that’s providing for a family, fighting climate change, or writing poetry.
Is GDP the best measure of growth?
The extraordinary economic expansion of the past 50 years was clearly a success in terms of GDP: the world economy is six times larger, and average per capita income has almost tripled.
Designed to measure the physical production of goods in the market economy, GDP is not well suited to accounting for private- and public-sector services with no output that can be measured easily by counting the number of units produced.
For the latter, there are many alternative measures, including the Human Development Index (HDI), introduced by the United Nations in 1990, and the OECD’s Better Life Index. Per capita GDP as a measure of national economic performance and broader measures of well-being, such as the HDI, are not identical, but they correlate with one another.
These correlations reflect positive feedback mechanisms in both directions: healthier, more educated people are more productive, while higher national incomes generate resources that can be used to improve health and public services.
So while we have used GDP to define growth in our report, we welcome the portfolio of initiatives that aspire to improve the GDP accounts, define new metrics of importance, and create dashboards that reflect a more robust picture of well-being.
Sustaining rapid gains in productivity and standards of living requires leaders, in both the private and public sectors, to think about not only every aspect of how organizations operate but also the trade-offs that may be required.
Any new conversation needs to include fundamental questions about how the world economy is run, and every assumption about growth and the role it plays in people’s lives needs to be robustly debated.
What Are the Best Measurements of Economic Growth?
If a statistician wants to understand the productive output of the steel industry, for example, he needs only to track the dollar value of all of the steel that entered the market during a specific period.
The productive capacity of an economy does not grow because more dollars move around; an economy becomes more productive because resources are used more efficiently.
In other words, a measure of economic growth needs to somehow capture the relationship between total resource inputs and total economic outputs.
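A toy comparison makes the distinction concrete. The two scenarios and all numbers below are hypothetical: in one, prices rise while real output stays flat; in the other, the same inputs yield more output. The `productivity` helper is just output divided by inputs, a stand-in for the input-output relationship described above.

```python
def nominal_value(output_units, price):
    """Dollar value of output: what a naive tally of spending would record."""
    return output_units * price


def productivity(output_units, input_units):
    """Units of output obtained per unit of resource input."""
    return output_units / input_units


# Hypothetical economy in a base year and under two alternative second years.
base = dict(output_units=100.0, input_units=50.0, price=1.0)
more_dollars = dict(output_units=100.0, input_units=50.0, price=1.2)    # only prices rise
more_efficient = dict(output_units=120.0, input_units=50.0, price=1.0)  # same inputs, more output

for label, year in (("base", base), ("more dollars", more_dollars),
                    ("more efficient", more_efficient)):
    value = nominal_value(year["output_units"], year["price"])
    ratio = productivity(year["output_units"], year["input_units"])
    print(f"{label:15s} nominal value={value:6.1f}  output per input={ratio:.2f}")
```

Both second-year scenarios show the same dollar value of output, but only the second reflects resources being used more efficiently.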
The OECD itself described GDP as suffering from a number of statistical problems. Its solution was to use GDP to measure aggregate expenditures, which theoretically approximates the contributions of labor and output, and to use multi-factor productivity, or MFP, to show the contribution of technical and organizational innovation.
Most of the nation's resources are dedicated toward the war effort, such as producing tanks, ships, ammunition and transportation, and all of the unemployed are drafted into war service.
With an unlimited demand for war supplies and government financing, the standard metrics of economic health would show progress.
- On Saturday, August 24, 2019
Types of data, time series data, cross sectional data and pooled data
In this video tutorial you will learn Types of data and sources of data for empirical analysis. In types of data there are three types, which we discussed in this ...
What Is Economic Growth And How Is It Measured?
If the economy had grown at that pace for an entire year, annual growth would be 18 apr 2015 in general, economic is measured terms of production activities a ...
Forecast Accuracy: MAD, MSE, TS Formulas
We enter the formulas that measure the accuracy of the forecast. Based in Excel 2003/2000.
Sensitivity Analysis in Excel
How to do a sensitivity analysis in Excel with two input variables.
What Is The Genuine Progress Indicator?
A new index of changes in well being australiaexecutive director, the 6 jun 2014 bhutan's gross national happiness and vermont's genuine progress indicator ...
What Is The GNP Deflator Why Is It Used?
Definition. Gross National Product deflator. The ratio of a country's nominal GNP to its real GNP, expressed as a percentage. It measures the percentage ...
What Is The Meaning Of GDP Per Capita?
Per capita GDP definition & example per boundless. The per capita GDP is especially useful when comparing one country to another, because it shows the ...
The Gini Coefficient
This video introduces the Gini coefficient, which is a way to summarize income inequality using a single number. For more information and a complete listing of ...
Festival of Dangerous Ideas 2013: Vandana Shiva - Growth = Poverty
When natural resources like timber, water and mineral deposits can be extracted from ecosystems, they become assets with dollar values that can be bought ...
Macroeconomics - Chapter 19: GDP: Measuring Total Production and Income
Microeconomics is the study of how households and firms make choices, how they interact in markets, and how the government attempts to influence their ...