Sunday, March 29, 2009

Primer on PLINQ

In this presentation, we present a primer on LINQ and then extend its reach to the parallel version, Parallel Language Integrated Query (Parallel LINQ, or PLINQ). PLINQ provides declarative data parallelism through the ParallelEnumerable and ParallelQuery classes. We concentrate on the ParallelQuery class and its two methods, AsParallel and AsSequential.

Language Integrated Query (LINQ) is a query facility built into the .NET languages that retrieves data from various data sources. LINQ shortens queries while simplifying connections to a variety of data sources. It is all about searching efficiently and consistently with less effort: LINQ queries an in-memory array with the same syntax it uses to query a Structured Query Language (SQL) server. LINQ comes in four common flavors: LINQ-to-Objects and LINQ-to-XML (both of which, as we will see, also support PLINQ), as well as LINQ-to-DataSet and LINQ-to-SQL.

The .NET Framework exposes LINQ through two main namespaces. The System.Linq namespace contains all the basic classes for LINQ, and the System.Linq.Expressions namespace contains the classes, interfaces, and enumerations used to create expressions.

The three stages in a query operation are:

1. Get the data source. If the source is an array, you must declare the array and assign values.
2. Define the query expression.
3. Execute the query to return the results.

For example, a LINQ query that retrieves data from an array would show these three stages as:

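// Stage 1: get the data source.
int[] nums = new int[] { 0, 4, 2, 6, 3, 8, 3, 0, 4, 2, 1 };

// Stage 2: define the query expression.
var result = from n in nums
             where n < 5
             orderby n
             select n;

// Stage 3: execute the query by consuming its results.
foreach (int i in result)
    Console.WriteLine(i);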

The standard LINQ query operators are a collection of about 50 methods, defined as extension methods in the static Enumerable and Queryable classes of the System.Linq namespace. The operators fall into two categories: deferred operators, where the query does not execute until you consume the results, and immediate operators, which execute the query as soon as they are called.
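The difference between the two categories can be sketched as follows. This is a minimal illustration, not part of the original presentation; the class name DeferredDemo and the sample values are assumptions. Where is a deferred operator, while ToList forces immediate execution.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredDemo
{
    // Returns { count seen by the deferred query, count stored in the snapshot }.
    public static int[] Run()
    {
        List<int> nums = new List<int> { 0, 4, 2, 6 };

        // Deferred: Where only describes the query; nothing runs yet.
        IEnumerable<int> deferred = nums.Where(n => n < 5);

        // Immediate: ToList executes the query now and stores the results.
        List<int> snapshot = nums.Where(n => n < 5).ToList();

        // Change the data source after both queries were defined.
        nums.Add(1);

        // The deferred query runs here and sees the new element; the snapshot does not.
        return new[] { deferred.Count(), snapshot.Count };
    }

    static void Main()
    {
        int[] counts = Run();
        Console.WriteLine("deferred: {0}, immediate: {1}", counts[0], counts[1]);
    }
}
```

Because the deferred query is evaluated only when Count is called, it reflects the element added afterwards, whereas the list built by ToList is fixed at the moment it was created.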

LINQ uses keywords to build a query. The from and in keywords come first and tell LINQ what to search: from introduces a range variable and in names the data source. The where, orderby, join, and let keywords add further conditions. A minimal LINQ query has four parts: a variable that holds the query (declared with the var keyword), a range variable that stands for each element in turn, the data source, and a select clause that chooses the values to return. For example:

var MyQuery =
from StringValue
in QueryString
select StringValue;
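A slightly fuller query shows how the additional keywords fit around from, in, and select. This is only a sketch: the class name KeywordDemo, the method LongWordsUpper, and the sample words are hypothetical, and QueryString is assumed here to be an array of strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class KeywordDemo
{
    // Returns the words longer than three characters, sorted and upper-cased.
    public static List<string> LongWordsUpper(string[] QueryString)
    {
        var MyQuery =
            from StringValue in QueryString
            let Length = StringValue.Length   // let stores a computed value
            where Length > 3                  // where filters the elements
            orderby StringValue               // orderby sorts what remains
            select StringValue.ToUpperInvariant();

        // ToList executes the query and returns the results.
        return MyQuery.ToList();
    }

    static void Main()
    {
        string[] words = { "pear", "fig", "apple", "kiwi" };
        foreach (string s in LongWordsUpper(words))
            Console.WriteLine(s);
    }
}
```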

Parallel LINQ (PLINQ) is a form of declarative data parallelism. Its types live in the System.Linq namespace; the Parallel Extensions preview ships them in the System.Threading.dll assembly, while the released .NET Framework 4 places them in System.Core.dll. Because a LINQ query declares what to compute rather than how, a PLINQ implementation is free to parallelize the work. PLINQ lets LINQ developers use multiple cores for their LINQ expressions by running any LINQ-to-Objects query with data parallelism. PLINQ supports the standard .NET query operators and the existing LINQ model.

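PLINQ uses two classes:

 System.Linq.ParallelEnumerable, which provides the parallel implementations of the standard query operators
 System.Linq.ParallelQuery, which exposes the AsParallel method that opts a query into parallel execution

In the released .NET Framework 4, AsParallel and AsSequential are defined on ParallelEnumerable, and ParallelQuery is the type that AsParallel returns.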

PLINQ is a query execution engine that accepts any LINQ-to-Objects or LINQ-to-XML query and automatically utilizes multiple processors or cores for execution when they are available. The change in programming model is tiny, meaning you don't need to be a concurrency guru to use it.
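To illustrate how small the change is, the sketch below runs the same query sequentially and in parallel. It assumes the released .NET Framework 4 (or later) API, where AsParallel is an extension method in System.Linq; the class name AsParallelDemo and the filter condition are made up for illustration.

```csharp
using System;
using System.Linq;

static class AsParallelDemo
{
    // Ordinary LINQ-to-Objects query.
    public static int CountSequential(int[] nums)
    {
        var query = from n in nums
                    where n % 3 == 0
                    select n;
        return query.Count();
    }

    // The same query; calling AsParallel() on the source is the only change.
    public static int CountParallel(int[] nums)
    {
        var query = from n in nums.AsParallel()
                    where n % 3 == 0
                    select n;
        return query.Count();
    }

    static void Main()
    {
        int[] nums = Enumerable.Range(0, 1000).ToArray();
        Console.WriteLine(CountSequential(nums));
        Console.WriteLine(CountParallel(nums));
    }
}
```

Both methods return the same count; only the execution strategy differs. For a filter this cheap, the parallel version is unlikely to be faster, so the gain depends on how much work each element requires.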

With PLINQ you don't need to move your entire database server processing logic over to in-memory LINQ-to-Objects queries in the client. Instead, PLINQ offers an incremental way of using parallelism for existing solutions.

Internally, PLINQ uses tasks, and a parallelized query is processed by multiple threads. As a result, the results are not guaranteed to come back in source order. Preserving the order is an extra step that gives up some of the performance gains.
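Order preservation can be requested explicitly. The following sketch assumes the .NET Framework 4 (or later) AsOrdered operator; the class name OrderDemo and the method names are hypothetical.

```csharp
using System;
using System.Linq;

static class OrderDemo
{
    // AsOrdered asks PLINQ to return results in source order.
    public static int[] EvensInSourceOrder(int[] nums)
    {
        return nums.AsParallel()
                   .AsOrdered()
                   .Where(n => n % 2 == 0)
                   .ToArray();
    }

    // Without AsOrdered the same elements come back, but their order is not guaranteed.
    public static int[] EvensAnyOrder(int[] nums)
    {
        return nums.AsParallel()
                   .Where(n => n % 2 == 0)
                   .ToArray();
    }

    static void Main()
    {
        int[] nums = Enumerable.Range(0, 20).ToArray();
        Console.WriteLine(string.Join(" ", EvensInSourceOrder(nums).Select(n => n.ToString()).ToArray()));
        Console.WriteLine(string.Join(" ", EvensAnyOrder(nums).Select(n => n.ToString()).ToArray()));
    }
}
```

The unordered version can skip the bookkeeping needed to reassemble results in sequence, which is why it is the default.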

The following listing, based on an example provided by Stephen Toub, finds prime numbers first in a sequential for loop and then in a parallel loop using Parallel.For, timing each version.

Counting Prime Numbers In For Loop

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks; // Parallel lives here in the released .NET Framework 4

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            while (true)
            {
                // Sequential baseline: test every candidate on one thread.
                Console.WriteLine(Time(delegate
                {
                    Queue<int> primes = new Queue<int>();

                    for (int i = 0; i < 2000000; i++)
                    {
                        if (CheckPrime(i)) primes.Enqueue(i);
                    }
                }));

                // Parallel version: iterations run on several threads, so the
                // results go into a thread-safe queue instead of Queue<int>.
                Console.WriteLine(Time(delegate
                {
                    ConcurrentQueue<int> primes = new ConcurrentQueue<int>();
                    Parallel.For(0, 2000000, i =>
                    {
                        if (CheckPrime(i)) primes.Enqueue(i);
                    });
                }));

                // Press Enter to repeat the measurement.
                Console.ReadLine();
            }
        }

        // Trial division up to the square root of p.
        private static bool CheckPrime(int p)
        {
            if (p < 2) return false;
            int upperBound = (int)Math.Sqrt(p);
            for (int i = 2; i <= upperBound; i++)
            {
                if (p % i == 0) return false;
            }
            return true;
        }

        // Runs the action once and returns the elapsed wall-clock time.
        static TimeSpan Time(Action a)
        {
            Stopwatch sw = Stopwatch.StartNew();
            a();
            return sw.Elapsed;
        }
    }
}
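The listing above parallelizes the loop with Parallel.For. For comparison, the same prime count could be written as a PLINQ query with AsParallel. This is a sketch under the assumption of the .NET Framework 4 (or later) API, not the original author's code; the class name PlinqPrimes and the method CountPrimes are hypothetical, and CheckPrime is copied from the listing.

```csharp
using System;
using System.Linq;

static class PlinqPrimes
{
    // Counts the primes in [0, limit) with a parallel query.
    public static int CountPrimes(int limit)
    {
        return Enumerable.Range(0, limit)
                         .AsParallel()
                         .Count(n => CheckPrime(n));
    }

    // Trial division up to the square root of p, as in the listing.
    private static bool CheckPrime(int p)
    {
        if (p < 2) return false;
        int upperBound = (int)Math.Sqrt(p);
        for (int i = 2; i <= upperBound; i++)
        {
            if (p % i == 0) return false;
        }
        return true;
    }

    static void Main()
    {
        Console.WriteLine(CountPrimes(2000000));
    }
}
```

Because the query only counts matches, no shared collection is needed, and PLINQ combines the per-thread counts itself.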

SUMMARY

In this presentation, we began with a primer on LINQ and then extended its reach to the parallel version, Parallel Language Integrated Query (Parallel LINQ, or PLINQ). PLINQ provides declarative data parallelism through the ParallelEnumerable and ParallelQuery classes. We concentrated on the ParallelQuery class and its two methods, AsParallel and AsSequential, and provided an application to demonstrate its capabilities.
