AI News, Data Treatment in Environmental Sciences

Data Treatment in Environmental Sciences

Data Treatment in Environmental Sciences presents the various methods used in the analysis of databases—obtained in the field or in a laboratory—by focusing on the most commonly used multivariate analyses in different disciplines of environmental sciences, from geochemistry to ecology.

The approach taken by the author details (i) the preparation of a dataset prior to analysis, in relation to the scientific strategy and objectives of the study, (ii) the preliminary treatment of datasets, and (iii) the establishment of a structure of objects (stations/dates) or relevant variables.

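As a rough illustration of this kind of workflow (not taken from the book), the sketch below standardizes a small, made-up stations-by-variables table and summarizes its structure with principal component analysis, one of the multivariate analyses commonly used in environmental data treatment.

```python
# Minimal sketch: preliminary treatment of a hypothetical
# stations-by-variables table followed by PCA. All data are synthetic.
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 20 stations x 5 environmental variables
# (e.g., temperature, pH, nitrate, phosphate, chlorophyll-a).
X = rng.normal(size=(20, 5))

# Preliminary treatment: centre and scale each variable (z-scores)
# so that variables measured in different units are comparable.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA via the eigendecomposition of the correlation matrix: the leading
# components summarize the structure of the stations on a few synthetic axes.
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
explained = eigvals[order] / eigvals.sum()
scores = Z @ eigvecs[:, order]          # station coordinates on the new axes

print("variance explained by each axis:", np.round(explained, 2))
print("first two station scores:\n", np.round(scores[:2, :2], 2))
```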

Big data

Big data is a term used to refer to data sets that are too large or complex for traditional data-processing application software to adequately deal with.

Data with many cases (rows) offer greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false discovery rate.[2]
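A short simulation makes the column effect concrete (synthetic data; the sizes and the 0.05 threshold are arbitrary choices, not from the cited source):

```python
# Illustrative sketch: with many columns, some purely random attributes
# will look "significant" by chance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_rows, n_cols = 500, 1000          # many cases, many attributes
X = rng.normal(size=(n_rows, n_cols))
y = rng.normal(size=n_rows)         # outcome unrelated to every column

# Test each column against the outcome and count nominal "discoveries".
p_values = np.array([stats.pearsonr(X[:, j], y)[1] for j in range(n_cols)])
false_hits = np.sum(p_values < 0.05)

print(f"{false_hits} of {n_cols} unrelated columns pass p < 0.05")
# Expected around 5% of 1000, i.e. roughly 50 false discoveries.
```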

Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy and data source.

Current usage of the term 'big data' tends to refer to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set.

Scientists, business executives, practitioners of medicine, advertising and governments alike regularly meet difficulties with large data-sets in areas including Internet search, fintech, urban informatics, and business informatics.

Data sets grow rapidly, in part because they are increasingly gathered by cheap and numerous information-sensing Internet of Things devices such as mobile devices, aerial sensors (remote sensing), software logs, cameras, microphones, radio-frequency identification (RFID) readers and wireless sensor networks.[10][11]

Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process data within a tolerable elapsed time.[22]
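One common tactic when a data set exceeds what a tool can hold in memory is to stream the records and keep only running aggregates. A minimal sketch, assuming a hypothetical file with one numeric value per line:

```python
# Minimal sketch: compute a mean over a file too large to load at once
# by streaming one record at a time with constant memory use.
def streaming_mean(path):
    count, total = 0, 0.0
    with open(path) as f:
        for line in f:               # one record at a time
            total += float(line)
            count += 1
    return total / count if count else float("nan")

# Example call (the file name is hypothetical):
# print(streaming_mean("measurements.csv"))
```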

Big data requires a set of techniques and technologies with new forms of integration to reveal insights from data sets that are diverse, complex, and of a massive scale.[25]

A 2016 definition states that 'Big data represents the information assets characterized by such a high volume, velocity and variety to require specific technology and analytical methods for its transformation into value'.[26]

A 2018 definition states 'Big data is where parallel computing tools are needed to handle data', and notes, 'This represents a distinct and clearly defined change in the computer science used, via parallel programming theories, and losses of some of the guarantees and capabilities made by Codd's relational model'.
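A minimal sketch of the map-and-reduce pattern behind such parallel tools, using Python's standard multiprocessing module on toy text chunks (the chunk contents are made up for illustration):

```python
# Sketch of parallel processing: split the work into chunks, map a
# function over them in separate processes, then reduce the partial results.
from multiprocessing import Pool
from collections import Counter
from functools import reduce

def count_words(chunk):
    """Map step: word counts for one chunk of text."""
    return Counter(chunk.split())

def merge(a, b):
    """Reduce step: combine two partial counts."""
    a.update(b)
    return a

if __name__ == "__main__":
    chunks = ["big data needs parallel tools",
              "parallel tools process big data",
              "data data data"]
    with Pool(processes=3) as pool:
        partials = pool.map(count_words, chunks)     # map in parallel
    totals = reduce(merge, partials, Counter())      # reduce
    print(totals.most_common(3))
```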

CERN and other physics experiments have collected big data sets for many decades, usually analyzed via high performance computing (supercomputers) rather than the commodity map-reduce architectures usually meant by the current 'big data' movement.

The methodology addresses handling big data in terms of useful permutations of data sources, complexity in interrelationships, and difficulty in deleting (or modifying) individual records.[48]

The data lake allows an organization to shift its focus from centralized control to a shared model to respond to the changing dynamics of information management.

Big data practitioners often prefer direct-attached storage (DAS) in its various forms, from solid-state drives (SSD) to high-capacity SATA disks buried inside parallel processing nodes.

Between 1990 and 2005, more than 1 billion people worldwide entered the middle class, which means more people became more literate, which in turn led to information growth.

The world's effective capacity to exchange information through telecommunication networks was 281 petabytes in 1986, 471 petabytes in 1993, 2.2 exabytes in 2000, and 65 exabytes in 2007.[12]

While many vendors offer off-the-shelf solutions for big data, experts recommend the development of in-house solutions custom-tailored to solve the company's problem at hand if the company has sufficient technical capabilities.[66]

Data analysis often requires multiple parts of government (central and local) to work in collaboration and create new and innovative processes to deliver the desired outcome.

Research on the effective usage of information and communication technologies for development (also known as ICT4D) suggests that big data technology can make important contributions but also presents unique challenges to international development.[68][69]

Advancements in big data analysis offer cost-effective opportunities to improve decision-making in critical development areas such as health care, employment, economic productivity, crime, security, and natural disaster and resource management.[70][71][72]

However, longstanding challenges for developing regions such as inadequate technological infrastructure and economic and human resource scarcity exacerbate existing concerns with big data such as privacy, imperfect methodology, and interoperability issues.[70]

Predictive manufacturing, as an approach to near-zero downtime and transparency, requires vast amounts of data and advanced prediction tools to systematically process data into useful information.[74]

A conceptual framework of predictive manufacturing begins with data acquisition, where different types of sensory data are available to acquire, such as acoustics, vibration, pressure, current, voltage and controller data.
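As a hedged illustration of that data-acquisition step (the signal, sampling rate and feature choices below are assumptions, not part of the cited framework), a vibration stream can be reduced to a few condition-monitoring features:

```python
# Sketch: turn a simulated vibration signal into simple features that
# could feed the prediction tools mentioned above. Not real machine data.
import numpy as np

rng = np.random.default_rng(1)
fs = 1000                                    # sampling rate in Hz (assumed)
t = np.arange(0, 1.0, 1 / fs)
signal = np.sin(2 * np.pi * 50 * t) + 0.3 * rng.normal(size=t.size)

rms = np.sqrt(np.mean(signal ** 2))          # overall vibration energy
peak = np.max(np.abs(signal))                # largest excursion
features = {
    "rms": float(rms),
    "peak": float(peak),
    "crest_factor": float(peak / rms),       # spikiness indicator
}
print(features)
```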

Big data analytics has helped healthcare improve by providing personalized medicine and prescriptive analytics, clinical risk intervention and predictive analytics, waste and care variability reduction, automated external and internal reporting of patient data, standardized medical terms and patient registries and fragmented point solutions.[77]

This includes electronic health record data, imaging data, patient-generated data, sensor data, and other forms of data that are difficult to process.

Human inspection at the big data scale is impossible, and there is a pressing need in health services for intelligent tools that control accuracy and believability and handle information that would otherwise be missed.[79]

The use of big data in healthcare has raised significant ethical challenges ranging from risks for individual rights, privacy and autonomy, to transparency and trust.[81]

Because one-size-fits-all analytical solutions are not desirable, business schools should give marketing managers broad knowledge of the different techniques used in these subdomains, so they can see the big picture and work effectively with analysts.

The industry appears to be moving away from the traditional approach of using specific media environments such as newspapers, magazines, or television shows and instead taps into consumers with technologies that reach targeted people at optimal times in optimal locations.

For example, publishing environments are increasingly tailoring messages (advertisements) and content (articles) to appeal to consumers, based on insights gleaned exclusively through various data-mining activities.[85]

Health insurance providers are collecting data on social 'determinants of health' such as food and TV consumption, marital status, clothing size and purchasing habits, from which they make predictions on health costs, in order to spot health issues in their clients.

Kevin Ashton, who is credited with coining the term, defines the Internet of Things in this quote: “If we had computers that knew everything there was to know about things—using data they gathered without any help from us—we would be able to track and count everything, and greatly reduce waste, loss and cost.”

By applying big data principles to the concepts of machine intelligence and deep computing, IT departments can predict potential issues and move to provide solutions before the problems even happen.[93]

During this time, ITOA businesses were also beginning to play a major role in systems management by offering platforms that brought individual data silos together and generated insights from the whole of the system rather than from isolated pockets of data.

In addition, race teams use big data to try to predict their race finish time beforehand, based on simulations using data collected over the season.[135]
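A minimal Monte Carlo sketch of such a simulation, with made-up lap statistics rather than any team's actual data or model:

```python
# Illustrative only: simulate race duration from per-lap statistics
# assumed to be estimated over the season, then report the expectation.
import numpy as np

rng = np.random.default_rng(7)
n_laps = 58                             # assumed race length
lap_mean, lap_std = 92.4, 1.1           # seconds per lap (hypothetical)
pit_stops, pit_loss = 2, 21.0           # assumed pit-stop time loss

sims = rng.normal(lap_mean, lap_std, size=(10_000, n_laps)).sum(axis=1)
sims += pit_stops * pit_loss
lo, hi = np.percentile(sims, [5, 95]) / 60
print(f"expected finish: {sims.mean()/60:.1f} min (90% interval {lo:.1f}-{hi:.1f} min)")
```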

They focused on the security of big data and on the orientation of the term toward the presence of different types of data in encrypted form at the cloud interface, providing raw definitions and real-time examples within the technology.

Moreover, they proposed an approach for identifying the encoding technique in order to enable expedited search over encrypted text, leading to security enhancements in big data.[141]

The SDAV Institute aims to bring together the expertise of six national laboratories and seven universities to develop new tools to help scientists manage and visualize data on the Department's supercomputers.

The British government announced in March 2014 the founding of the Alan Turing Institute, named after the computer pioneer and code-breaker, which will focus on new ways to collect and analyse large data sets.[151]

In May 2013, the IMS Center held an industry advisory board meeting focusing on big data, where presenters from various industrial companies discussed their concerns, issues, and future goals in a big data environment.

One study used Google Trends data to demonstrate that Internet users from countries with a higher per capita gross domestic product (GDP) are more likely to search for information about the future than about the past.

The authors of the study examined Google query logs, computing the ratio of the volume of searches for the coming year ('2011') to the volume of searches for the previous year ('2009'), which they call the 'future orientation index'.[158]
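A tiny sketch of that ratio with made-up search volumes (the numbers below are purely illustrative, not the study's data):

```python
# "Future orientation index": searches for the coming year ("2011")
# divided by searches for the previous year ("2009"). Volumes are fictitious.
searches_coming_year = 1_350_000     # hypothetical volume for "2011"
searches_previous_year = 900_000     # hypothetical volume for "2009"

future_orientation_index = searches_coming_year / searches_previous_year
print(f"future orientation index: {future_orientation_index:.2f}")
# Values above 1 indicate more interest in the future than in the past.
```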

Eugene Stanley introduced a method to identify online precursors for stock market moves, using trading strategies based on search volume data provided by Google Trends.[159]

An important research question about big data sets is whether the full data set is needed to draw certain conclusions about its properties, or whether a sample is good enough.
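A quick sketch of the trade-off on synthetic data (the distribution and sample size are arbitrary assumptions): for a simple property such as a mean, a modest random sample often comes very close to the full-data answer.

```python
# Compare a mean computed on "full" synthetic data with a 1% sample.
import numpy as np

rng = np.random.default_rng(3)
full = rng.lognormal(mean=0.0, sigma=1.0, size=1_000_000)   # "full" data
sample = rng.choice(full, size=10_000, replace=False)       # 1% sample

print(f"full-data mean: {full.mean():.4f}")
print(f"sample mean   : {sample.mean():.4f}")
# Whether a sample suffices depends on the property: rare-event counts
# or extreme quantiles need far more data than a simple mean does.
```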

Even as companies invest eight- and nine-figure sums to derive insight from information streaming in from suppliers and customers, less than 40% of employees have sufficiently mature processes and skills to do so.

In response to this critique, Alemany Oliver and Vayre suggested using 'abductive reasoning as a first step in the research process in order to bring context to consumers’ digital traces and make new theories emerge'.[177]

Agent-based models are getting increasingly better at predicting the outcome of social complexities, even of unknown future scenarios, through computer simulations based on a collection of mutually interdependent algorithms.[178][179]
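To make the idea concrete, here is a minimal, purely illustrative agent-based sketch (the ring topology, threshold and parameters are assumptions, not drawn from the cited work):

```python
# Agents sit on a ring and adopt a behaviour once enough of their four
# nearest neighbours have adopted it, so the global outcome emerges from
# mutually interdependent local rules rather than a closed-form equation.
N, STEPS, THRESHOLD = 100, 50, 0.25
adopted = [i in (0, 50) for i in range(N)]         # two seed adopters

for step in range(STEPS):
    nxt = adopted[:]
    for i in range(N):
        neighbours = [adopted[(i + d) % N] for d in (-2, -1, 1, 2)]
        if sum(neighbours) / len(neighbours) >= THRESHOLD:
            nxt[i] = True                          # peer influence, no un-adoption
    adopted = nxt
    if all(adopted):
        print(f"full adoption after {step + 1} steps")
        break
else:
    print(f"adoption after {STEPS} steps: {sum(adopted)}/{N} agents")
```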

Finally, the use of multivariate methods that probe for the latent structure of the data, such as factor analysis and cluster analysis, has proven useful as an analytic approach that goes well beyond the bivariate approaches (cross-tabs) typically employed with smaller data sets.
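A brief sketch of these two approaches on synthetic data, using scikit-learn's FactorAnalysis and KMeans (the data, the number of factors and the number of clusters are illustrative assumptions):

```python
# Factor analysis to probe latent structure, then cluster analysis on
# the resulting factor scores. All data are synthetic.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))                         # two hidden factors
loadings = rng.normal(size=(2, 10))
X = latent @ loadings + 0.5 * rng.normal(size=(300, 10))   # 10 observed variables

scores = FactorAnalysis(n_components=2, random_state=0).fit_transform(X)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)

print("cluster sizes:", np.bincount(labels))
```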

A new postulate is now accepted in the biosciences: the information provided by data in huge volumes (omics), gathered without a prior hypothesis, is complementary and sometimes necessary to conventional approaches based on experimentation.[181][182]

Large data sets have been analyzed by computing machines for well over a century, including the 1890 US census analytics performed by Herman Hollerith's punch-card machines (a forerunner of IBM), which computed statistics including means and variances of populations across the whole continent.

However, science experiments have tended to analyse their data using specialized, custom-built high-performance computing (supercomputing) clusters and grids rather than clouds of cheap commodity computers, as in the current commercial wave, implying a difference in both culture and technology stack.

Integration across heterogeneous data resources—some that might be considered big data and others not—presents formidable logistical as well as analytical challenges, but many researchers argue that such integrations are likely to represent the most promising new frontiers in science.[192]

The authors of one critique describe big data as a part of mythology: 'large data sets offer a higher form of intelligence and knowledge [...], with the aura of truth, objectivity, and accuracy'.

On the other hand, big data may also introduce new problems, such as the multiple comparisons problem: simultaneously testing a large set of hypotheses is likely to produce many false results that mistakenly appear significant. Ioannidis argued that 'most published research findings are false' due to essentially the same effect.
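A small simulation of the effect, with a Bonferroni-adjusted threshold shown as one standard safeguard (the number of tests and sample sizes are arbitrary; all data are null by construction):

```python
# Every test below compares two samples from the SAME distribution,
# so any "significant" result is a false positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m = 5000                                   # number of hypotheses tested
p = np.array([stats.ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
              for _ in range(m)])

print("raw p < 0.05 hits     :", int(np.sum(p < 0.05)))        # ~250 expected
print("Bonferroni 0.05/m hits:", int(np.sum(p < 0.05 / m)))    # ~0 expected
```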

Using Fourier transform IR spectroscopy to analyze biological materials

Although the spectral domain allows chemical identification, the combination with microscopy (microspectroscopy) permits the examination of complex tissues and heterogeneous samples5.

In transflection mode, for example, the sample is placed on an inexpensive IR-reflecting surface (such as that found on low-emissivity (Low-E) slides) and measurements are generated by a beam that passes through the sample, reflects back from the substrate (i.e., the reflective surface), and passes through the sample again.

One excellent way to interpret IR imaging data is to consider it as a metabolomic tool that allows the in situ, nondestructive analysis of biological specimens (e.g., determining glycogen levels in cervical cytology)25.

IR spectra representing distinguishing fingerprints of specific cell types (e.g., stem cells versus transit-amplifying cells versus terminally differentiated cells) within a defined tissue architecture (e.g., crypts of the gastrointestinal tract and cornea)9,26 are now easily recorded.
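As a hedged sketch of how such spectral fingerprints can be exploited computationally (the spectra below are simulated, not measured, and the pipeline is illustrative rather than the authors' protocol), dimension reduction followed by a simple classifier can separate two hypothetical cell types:

```python
# Classify synthetic "fingerprint region" spectra from two cell types
# that differ slightly in one absorption band.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
wavenumbers = np.linspace(900, 1800, 300)        # fingerprint region, cm^-1

def fake_spectrum(peak):
    """One synthetic absorbance spectrum with a class-specific band."""
    band = np.exp(-((wavenumbers - peak) ** 2) / (2 * 25 ** 2))
    return band + 0.05 * rng.normal(size=wavenumbers.size)

X = np.array([fake_spectrum(1080) for _ in range(40)] +
             [fake_spectrum(1100) for _ in range(40)])
y = np.array([0] * 40 + [1] * 40)                # two hypothetical cell types

model = make_pipeline(PCA(n_components=10), LinearDiscriminantAnalysis())
print("cross-validated accuracy:", cross_val_score(model, X, y, cv=5).mean())
```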

For biomedical analyses, the major goal today is to derive an image of tissue architecture expressing the underlying biochemistry in a label-free fashion28, a development that can considerably extend our diagnostic potential beyond present capabilities.

Screening cervical cytology specimens to distinguish normal versus low-grade versus high-grade cells4,29, grading primary neoplasia30, and determining whether tissue margins and potential metastatic sites are tumor-free31,32 are examples of this concept across many types of tissue.