
Making MySQL 5,888.6x faster! Yep, you read that right!

I had patiently waited 588 seconds (close to 10 minutes) for MySQL to execute a query and then watched as BigObject cranked out the same query in 0.1 seconds!

The data set we worked with:

    Sales: 1,000,004 records
    Product: 4,434 records
    Customer: 10,000 records

MySQL query:

    SELECT SUM(qty), customer.gender
    FROM sales
    JOIN customer ON sales.cid = customer.id
    GROUP BY customer.gender;

Without BigObject, MySQL takes a long time to perform queries, making real-time data analysis and feedback virtually impossible.

With BigObject and the Pre-Cache layer, there is a dramatic turbo-boost that allows you to build amazing real-time data analysis and feedback functionality into your application.

He has consulted for Fortune 500 companies, providing key direction in the utilization of future technologies to achieve market dominance.

As the chief evangelist at BigObject, Titus is driven to revolutionize data capture and analysis as well as help companies derive maximum value from their data.

SnappyData is Spark 2.0 compatible and open source.

One of Databricks' best-known blog posts describes joining a billion rows in a second on a laptop.

Start the SnappyData shell with some decent memory, as required for the data load test (the GC options are similar to a default SnappyData cluster's). Define a simple benchmark util function. Let's do a warmup first that will also initialize some Spark components and sum a billion numbers.

Incidentally, the Parquet reader is actually quite a bit faster than Spark caching when compression is disabled and the file is completely in the OS cache, again showing the power of vectorization and code generation.

The trick lies in Spark's optimized implementation for a single-column join on integral types when the values are contiguous: it can use a 'dense' array with upper and lower bounds instead of a full hash map.
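To make the idea concrete, here is a minimal sketch of such a dense-array join map, written in C# for illustration. This is not Spark's actual code; the class and its bounds handling are invented.

    class DenseJoinMap
    {
        readonly long _min, _max;
        readonly int[] _rows;   // maps (key - min) -> build-side row id

        public DenseJoinMap(long min, long max)
        {
            _min = min; _max = max;
            _rows = new int[checked((int)(max - min + 1))];
        }

        public void Put(long key, int rowId) { _rows[key - _min] = rowId; }

        // The probe side does one subtraction and two comparisons instead of hashing.
        public bool TryGet(long key, out int rowId)
        {
            if (key < _min || key > _max) { rowId = -1; return false; }
            rowId = _rows[key - _min];
            return true;
        }
    }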

Time taken in SnappyData (two column join): 0.81 seconds. Still sub-second, and ~19X faster than Spark 2.0. The improvement seen in this particular case is due to a more generic approach to optimizing joins on contiguous values.

While Spark uses a single-column 'dense' vector to optimize the single-column join case, SnappyData's hash-join implementation uses per-column MIN/MAX tracking to quickly reject streamed rows if any of the join keys lie beyond the limits.
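A conceptual sketch of that rejection filter, again in C# for illustration and not SnappyData's actual code (all names invented):

    class MinMaxFilter
    {
        readonly long[] _min, _max;   // one pair of bounds per join column

        public MinMaxFilter(int columns)
        {
            _min = new long[columns];
            _max = new long[columns];
            for (int i = 0; i < columns; i++) { _min[i] = long.MaxValue; _max[i] = long.MinValue; }
        }

        // Called for each build-side row while the hash table is constructed.
        public void Observe(long[] keys)
        {
            for (int i = 0; i < keys.Length; i++)
            {
                if (keys[i] < _min[i]) _min[i] = keys[i];
                if (keys[i] > _max[i]) _max[i] = keys[i];
            }
        }

        // Cheap pre-check on each streamed row: reject before any hash probe.
        public bool MightMatch(long[] keys)
        {
            for (int i = 0; i < keys.Length; i++)
                if (keys[i] < _min[i] || keys[i] > _max[i]) return false;
            return true;
        }
    }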

Time taken in Spark (join and groupBy): 12.4 seconds. So the performance drops by more than two orders of magnitude for such a join and groupBy (the data size is 100M, compared to 1 billion in the joins before).

For better response times to queries on non-changing data sets, Spark recommends caching data from external data sources as cached tables in Spark.

For instance, a scan of columns managed as byte arrays is copied into an 'UnsafeRow' object for each row, and then the column values are read from this row, breaking vectorization and introducing a lot of expensive copying.

A set of tasks scheduled from the driver to an executor is grouped into a single TaskSet message, with common task data sent only once instead of in separate messages for each task.

For example, time-range queries will now be able to scan only the batches that fall within the given range and skip the others, providing a huge boost to such queries on time-series data.

Alternate hash aggregation and hash join operators have been added that are finely optimized for SnappyData storage, making full use of the storage layout with vectorization and dictionary encoding to provide an order-of-magnitude performance advantage over Spark's default implementations.

Performance tips for Azure Cosmos DB and .NET

SQL .NET SDK version 1.9.0 and above support parallel queries, which enable you to query a partitioned collection in parallel (see Working with the SDKs and the related code samples for more info).

Parallel queries provide two parameters that users can tune to fit their requirements: (a) MaxDegreeOfParallelism, to control the maximum number of partitions that can be queried in parallel, and (b) MaxBufferedItemCount, to control the number of pre-fetched results.
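A minimal sketch of both knobs in use with the DocumentDB .NET SDK (the database and collection names, the query, the document type, and the tuning values below are all illustrative):

    using System;
    using System.Threading.Tasks;
    using Microsoft.Azure.Documents.Client;
    using Microsoft.Azure.Documents.Linq;

    class Order { public string Id { get; set; } }

    class ParallelQueryDemo
    {
        // Reads all matching documents from a partitioned collection in parallel.
        static async Task RunAsync(DocumentClient client)
        {
            var options = new FeedOptions
            {
                EnableCrossPartitionQuery = true,
                MaxDegreeOfParallelism = 8,   // up to 8 partitions queried concurrently
                MaxBufferedItemCount = 1000   // up to 1000 results pre-fetched
            };

            var query = client.CreateDocumentQuery<Order>(
                UriFactory.CreateDocumentCollectionUri("mydb", "orders"),
                "SELECT * FROM c WHERE c.status = 'open'",
                options).AsDocumentQuery();

            while (query.HasMoreResults)
                foreach (Order o in await query.ExecuteNextAsync<Order>())
                    Console.WriteLine(o.Id);
        }
    }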

If the collection is partitioned in such a way that all or a majority of the data returned by a query is concentrated in a few partitions (one partition in the worst case), then the performance of the query will be bottlenecked by those partitions.

Performance Tips and Tricks in .NET Applications

Contents:

    Overview
    Performance Tips for All Applications
    Tips for Database Access
    Performance Tips for ASP.NET Applications
    Tips for Porting and Developing in Visual Basic
    Tips for Porting and Developing in Managed C++
    Additional Resources
    Appendix: Cost of Virtual Calls and Allocations

This white paper is designed as a reference for developers writing applications for .NET and looking for various ways to improve performance.

This paper strictly builds on that knowledge, and assumes that the programmer already knows enough to get the program running.

Some of the tips here are helpful in the design phase, and provide information you should be aware of before you begin the port.

The first set of tips is a must-read for writing in any language, and contains advice that will help you with any target language on the Common Language Runtime (CLR).

Due to schedule limitations, the version 1 (v1) run time had to target the broadest functionality first, and then deal with special-case optimizations later.

Here's a simple example of how expensive exceptions can be: we'll simply run through a For loop, generating thousands of exceptions and then terminating.
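The original sample isn't reproduced here, but a minimal sketch of that experiment looks like this (the iteration count is illustrative):

    using System;

    class ExceptionCost
    {
        static void Main()
        {
            int caught = 0;
            for (int i = 0; i < 10000; i++)
            {
                try
                {
                    throw new Exception("expensive");  // each throw walks the stack
                }
                catch (Exception)
                {
                    caught++;                          // swallow and continue
                }
            }
            Console.WriteLine(caught);  // timing this loop makes the cost obvious
        }
    }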

A chunky call is a function call that performs several tasks, such as a method that initializes several fields of an object.

This is to be viewed against chatty calls, which do very simple tasks and require multiple calls to get things done (such as setting every field of an object with a different call).

It's important to make chunky, rather than chatty, calls in places where the overhead is higher than for simple, intra-AppDomain method calls.

In each of these cases, you should try to design your application so that it doesn't rely on small, frequent calls that carry so much overhead.
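As a sketch of the difference (Point3 is an invented type; imagine each call below crossing an expensive boundary such as interop or remoting):

    class Point3
    {
        double x, y, z;

        // Chatty: three separate calls, three boundary crossings.
        public void SetX(double v) { x = v; }
        public void SetY(double v) { y = v; }
        public void SetZ(double v) { z = v; }

        // Chunky: one call initializes every field in a single crossing.
        public void Init(double xv, double yv, double zv) { x = xv; y = yv; z = zv; }
    }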

A managed/unmanaged transition occurs whenever managed code is called from unmanaged code, and vice versa.

The overhead is as little as 31 instructions plus the cost of marshalling if data marshalling is required, and only 8 otherwise.

Make sure that data that gets passed across the managed boundary is only converted if it needs to be: it may turn out that simply by agreeing on a certain datatype or format across your program you can cut out a lot of marshalling overhead.

The following types are called blittable, meaning they can be copied directly across the managed/unmanaged boundary with no marshalling whatsoever: sbyte, byte, short, ushort, int, uint, long, ulong, float and double.
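For example, a P/Invoke signature built only from blittable types needs no conversion at the boundary ('mathlib.dll' and 'add_doubles' are hypothetical names used for illustration):

    using System.Runtime.InteropServices;

    static class Native
    {
        // Both parameters and the return value are blittable doubles,
        // so the call crosses the boundary with no marshalling conversion.
        [DllImport("mathlib.dll")]
        public static extern double add_doubles(double a, double b);
    }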

This adds extra boxing and unboxing overhead to your program, and can end up costing you more than it would if you had stuck with objects!
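The surrounding context isn't shown in this excerpt, but the classic case of this overhead is a value type stored in an object-based collection, sketched below:

    using System.Collections;

    class BoxingDemo
    {
        static void Main()
        {
            ArrayList list = new ArrayList();
            for (int i = 0; i < 1000; i++)
                list.Add(i);          // each Add boxes the int onto the heap
            int sum = 0;
            foreach (object o in list)
                sum += (int)o;        // each read unboxes it again
        }
    }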

Add is useful for adding a single item, whereas AddRange has some extra overhead but wins out when adding multiple items.

The JIT is smart enough (in many cases) to optimize away bounds-checking and other things inside a For loop, but is prohibited from doing this on foreach walks.

The end result is that in version 1, a For loop on strings is up to five times faster than using foreach.
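The two loop styles side by side (array contents are illustrative):

    class LoopStyles
    {
        static void Main()
        {
            string[] names = { "alpha", "beta", "gamma" };
            int total = 0;
            for (int i = 0; i < names.Length; i++)
                total += names[i].Length;   // the JIT can optimize away the bounds check here
            foreach (string s in names)
                total += s.Length;          // the same walk via foreach is not optimized in v1
        }
    }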

When a string is modified, the run time will create a new string and return it, leaving the original to be garbage collected.

Most of the time this is a fast and simple way to do it, but when a string is being modified repeatedly it begins to be a burden on performance: all of those allocations eventually get expensive.

Here's a simple example of a program that appends to a string 50,000 times, followed by one that uses a StringBuilder object to modify the string in place.
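The original listing isn't reproduced in this excerpt; a minimal sketch of the same comparison:

    using System.Text;

    class Concat
    {
        static void Main()
        {
            string s = "";
            for (int i = 0; i < 50000; i++)
                s += "x";                    // allocates a brand-new string on every pass

            var sb = new StringBuilder();
            for (int i = 0; i < 50000; i++)
                sb.Append("x");              // appends into one growable buffer
            string result = sb.ToString();
        }
    }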

It definitely makes the most sense to run ngen.exe during install time, since you can make sure that the application is optimized for the machine on which it is being installed.

The v1 JIT optimizes jagged arrays (simply 'arrays-of-arrays') more efficiently than rectangular arrays, and the difference is quite noticeable.

Here is a table demonstrating the performance gain resulting from using jagged arrays in place of rectangular ones in both C# and Visual Basic (higher numbers are better). The assignment benchmark is a simple assignment algorithm, adapted from the step-by-step guide found in Quantitative Decision Making for Business (Gordon, Pressman, and Cohn).

The neural net test runs a series of patterns over a small neural network, and the numeric sort is self-explanatory.

The optimizations made to jagged arrays will be added to future versions of the JIT, but for v1 you can save yourself a lot of time by using jagged arrays.
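For reference, the two layouts in question (sizes are illustrative):

    class Arrays
    {
        static void Main()
        {
            int[,] rect = new int[100, 100];    // rectangular: one contiguous block
            int[][] jagged = new int[100][];    // jagged: an array of independent arrays
            for (int i = 0; i < 100; i++)
                jagged[i] = new int[100];
            jagged[3][4] = rect[3, 4];          // element access in each style
        }
    }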

For very specific instances, you may be able to get an improvement from a larger buffer (loading large images of a predictable size, for example), but in 99.99% of cases it will only waste memory.

All buffers derived from BufferedStream allow you to set the size to anything you want, but in most cases 4 KB and 8 KB will give you the best performance.
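For example, a 4 KB buffer wrapped around a file stream ('data.bin' is an illustrative file name):

    using System.IO;

    class BufferDemo
    {
        static void Main()
        {
            using (var fs = new FileStream("data.bin", FileMode.Open))
            using (var bs = new BufferedStream(fs, 4096))   // explicit 4 KB buffer
            {
                int b;
                while ((b = bs.ReadByte()) != -1) { /* consume each byte */ }
            }
        }
    }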

The philosophy of tuning for database access is to use only the functionality that you need, and to design around a 'disconnected' approach: make several connections in sequence, rather than holding a single connection open for a long time.

If you use a more generic interface such as System.Data.Odbc when you could be using a specialized component, you will lose performance dealing with the added level of indirection.

A reader is simply a stateless stream that allows you to read data as it arrives, and then drop it without storing it to a dataset for more navigation.

Here's a small table demonstrating the difference between DataReader and DataSet on both ODBC and SQL providers when pulling data from a server (higher numbers are better). As you can see, the highest performance is achieved when using the optimal managed provider along with a data reader.
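A minimal sketch of that winning combination, using the SQL managed provider and a data reader (the connection string and procedure name are illustrative; note the CommandType.StoredProcedure setting, discussed next):

    using System;
    using System.Data;
    using System.Data.SqlClient;

    class ReaderDemo
    {
        static void Main()
        {
            using (var conn = new SqlConnection("Server=.;Database=Shop;Integrated Security=true"))
            using (var cmd = new SqlCommand("GetOpenOrders", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                conn.Open();
                using (SqlDataReader reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                        Console.WriteLine(reader.GetInt32(0));  // read and drop; nothing is cached
                }
            }
        }
    }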

Be sure to use CommandType.StoredProcedure instead of CommandType.Text. Connection pooling is a useful way to reuse connections for multiple requests, rather than paying the overhead of opening and closing a connection for each request.

There are a lot of options that you can set for the connection pool, and you can track the performance of the pool by using Perfmon to keep track of things like response time, transactions per second, etc.

For the SQL Managed Provider, it's done via the connection string. When filling a dataset with the data adapter, don't get primary key information if you don't have to (e.g.
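As a sketch of the connection-string approach mentioned above, pool settings appear directly in the SqlConnection string (the server name and pool sizes here are illustrative):

    // Pooling, Min Pool Size and Max Pool Size are standard connection-string keywords.
    var conn = new System.Data.SqlClient.SqlConnection(
        "Server=.;Database=Shop;Integrated Security=true;" +
        "Pooling=true;Min Pool Size=5;Max Pool Size=50");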

These require additional trips to the server to retrieve metadata, and give you a lower level of interaction control.

Remember that the dataset stores all of its data in memory, and that the more data you request, the longer it will take to transmit across the wire.

This is essential for dealing with blob data types since it allows data to be read off of the wire in small chunks.

Imagine a complex e-commerce site with several static pages for login, and then a slew of dynamically-generated pages containing images and text.

For even better performance, you could cache commonly used images and boilerplate text that appear frequently on the site using the Cache API.
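A sketch of that pattern with the Cache API (the 'SiteFooter' key and the loader method are hypothetical):

    using System.Web;

    class FooterCache
    {
        public static string GetFooter()
        {
            string footer = (string)HttpContext.Current.Cache["SiteFooter"];
            if (footer == null)
            {
                footer = LoadFooter();                         // expensive fetch, done once
                HttpContext.Current.Cache["SiteFooter"] = footer;
            }
            return footer;
        }

        static string LoadFooter() { return "..."; }           // stand-in for a database call
    }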

One extremely powerful feature of ASP.NET is its ability to store session state for users, such as a shopping cart on an e-commerce site or a browser history.

This carries less overhead than full read/write session state, and is useful when you need only part of the functionality and don't want to pay for the write capabilities.

An example of View State might be a long form that users must fill out: if they click Back in their browser and then return, the form will remain filled.

Perhaps the largest performance drain here is that a round-trip signal must be sent across the network each time the page is loaded to update and verify the cache.

The managed world is free-threaded, and using Single Threaded Apartment COM requires that all unmanaged threads essentially share a single thread for interop.

The majority of the performance issues come from areas where the run time does not support a feature of Visual Basic 6, and it has to be added to preserve the feature in Visual Basic 7.

However, this in itself may uncover other sections of your code that are doing more work than you had previously thought, and it may help you stomp some bugs in the process.

This can result in a substantial performance loss, and specifying that your characters are a full word long (using charw) eliminates this conversion.

Without a return statement, each function is given several local variables on the stack to transparently support returning values without the keyword.

If you aren't sure about MC++, there are many good resources to help you make your decision. This section is targeted at developers who have already decided that they want to use MC++ in some way, and want to know about its performance aspects.

For the purposes of this discussion, I'm going to focus on the 'port-everything' option and on writing MC++ from scratch, since those are the scenarios where the programmer will notice a performance difference.

Memory management, thread scheduling and type coercions can be left to the run time if you desire, allowing you to focus your energies on the parts of the program that need it.

The VC7 compiler, not bound by the time restrictions of the JIT, can perform certain optimizations that the JIT cannot, such as whole-program analysis, more aggressive inlining and enregistration.

If you use MC++, you can spend your time porting more code, rather than writing special wrappers to glue the ported and not-yet-ported code together, and that can result in a big win.

You can always interoperate with unsafe code if you need those features, but you will pay the performance penalty of marshalling data back and forth.

MC++'s ability to mix managed and unmanaged code smoothly provides the developer with a lot of power, and you can choose where on the gradient you want to sit when writing your code.

MC++ allows you to tweak some of the performance hits inherent in managed code, by giving you precise control over when to use unsafe features.

You can take the address of an item in the middle of an array and return that address from a function; we exploit this feature to return a pointer to the 'characters' in a System.String via our helper routine, and we can even loop through arrays using these pointers. You can also do a linked-list traversal with injection in MC++ by taking the address of the 'next' field (which you cannot do in C#). In C#, you can't point to 'Head', or take the address of the 'next' field, so you have to make a special case for inserting at the first location, or when 'Head' is null.
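Here is the C# special-casing described above, sketched with invented Node/Head names:

    class Node { public int Value; public Node Next; }

    class IntList
    {
        Node Head;

        public void InsertSorted(int v)
        {
            Node node = new Node { Value = v };
            if (Head == null || v < Head.Value)   // the special case: insert at the front
            {
                node.Next = Head;
                Head = node;
                return;
            }
            Node cur = Head;
            while (cur.Next != null && cur.Next.Value < v)
                cur = cur.Next;
            node.Next = cur.Next;                 // generic case: splice after 'cur'
            cur.Next = node;
        }
    }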

Just place the __box keyword before any type to represent its boxed form. In C#, you have to unbox to a 'v', then update the value and re-box back to an Object. The bad news: in C++, using the STL collections was often just as fast as writing that functionality by hand.
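The C# round-trip in code form (a trivial sketch, with 'v' as in the text above):

    object boxed = 5;      // a boxed Int32
    int v = (int)boxed;    // unbox to 'v'
    v++;                   // update the value
    boxed = v;             // re-box back to an Object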

The CLR frameworks are very fast, but they suffer from boxing and unboxing issues: everything is an object, and without template or generic support, all actions have to be checked at run time.

I recommend using this method in tight code where performance is absolutely critical, and you've identified two or three hot spots.

In the v1 run time, all indirect function calls are made natively, and therefore require a transition into unmanaged space.

Any indirect function call can only be made from native mode, which means that all indirect calls from managed code need a managed-to-unmanaged transition.

However, in the specific case of a regular C++ file that has been compiled using /clr, the method return will be considered managed.

One of the nicest things about MC++ is that you come to grips with all the performance issues up front, before you start coding: this is helpful in paring down work later on.

Watch for future articles currently under development, including an overview of design, architectural and coding philosophies, a walkthrough of performance analysis tools in the managed world, and a performance comparison of .NET to other enterprise applications available today.

This chart compares the cost associated with different types of method calls, as well as the cost of instantiating a type that contains virtual methods.

While these numbers will certainly vary on different machines and configurations, the relative cost of performing one call over another remains significant.

Notice that calling a non-virtual method within a ValueType is more than three times as fast as in a class, but once you treat it as a class you lose terribly.
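In code form, the cases being compared look roughly like this (types invented for illustration):

    interface IGetter { int Get(); }

    struct Point : IGetter
    {
        public int X;
        public int Get() { return X; }   // direct, non-virtual call on the struct
    }

    class CallCost
    {
        static void Main()
        {
            Point p = new Point { X = 1 };
            int a = p.Get();     // fast: no boxing, no virtual dispatch
            IGetter g = p;       // boxing: the struct is now "treated as a class"
            int b = g.Get();     // interface dispatch on the boxed copy
        }
    }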

Java Persistence Performance

And the entity looks like this:

    @Entity
    @Table(name = "TSC06_JOB_QUEUE")
    @XmlRootElement
    @NamedQueries({
        @NamedQuery(name = "Tsc06JobQueue.findAll", query = "SELECT t FROM Tsc06JobQueue t"),
        @NamedQuery(name = "Tsc06JobQueue.findByJobRunId", query = "SELECT t FROM Tsc06JobQueue t WHERE t.jobRunId = :jobRunId"),
        @NamedQuery(name = "Tsc06JobQueue.findByJobStartTime", query = "SELECT t FROM Tsc06JobQueue t WHERE t.jobStartTime = :jobStartTime"),
        @NamedQuery(name = "Tsc06JobQueue.findByJobEndTime", query = "SELECT t FROM Tsc06JobQueue t WHERE t.jobEndTime = :jobEndTime")})
    public class Tsc06JobQueue implements Serializable {
        private static final long serialVersionUID = 1L;

        // @Max(value=?) @Min(value=?)
        // if you know the range of your decimal fields consider using these annotations to enforce field validation
        @Id
        @Basic(optional = false)
        @NotNull
        @Column(name = "JOB_RUN_ID")
        // @SequenceGenerator( name = "appJobSeq", sequenceName = "TSC06_JOB_RUN_ID_SEQ", allocationSize = 1, initialValue = 1 )
        // @GeneratedValue( strategy = GenerationType.SEQUENCE, generator = "appJobSeq"

Optimizing and Troubleshooting Your Application, the Google Way (Cloud Next '18)

In order to operate highly available and performant services at scale, Google had to invent concepts like distributed tracing and production debugging. We now ...

Why ORM is an Anti-Pattern? (webinar #10)

We discussed why Object-Relational Mapping is actually an anti-pattern and why its usage should be replaced with SQL-speaking objects. The discussion was ...