Friday, 6 December 2013

First view of parallel programming in C#

I have my current apprentice group sitting an assessment in software testing yesterday. Whilst they were doing this I thought I would investigate parallel computing in C#/.Net.

This is something I was very heavily involved with from about 1997 to 2000. At the time I was working in Schlumberger with MPP (massively parallel computers) – mainly IBM SP/2 (now IBM e-series), Silicon Graphics Origin Servers and Sun SparcStations. In the office (in 1999) we had a sixteen processor IBM SP/2 – but I also worked closely with IBM, SGI and Sun to visit them and use servers with up to 500 processors, including several weeks spent at IBM in Poughkeepsie in upstate New York and Cray Park (SGI’s HQ at the time) in Minnesota.

What I was looking at was something I could do on my own laptop, which has an Intel i5 with 4 cores. Could I get a speed-up and how easy would it be to implement. Previously I used libraries like MPI (Message Passing Interface) and PVM (Parallel Virtual Machine) to provide a common approach for all the platforms that we supported.

I looked for some reference material (see http://www.microsoft.com/en-gb/download/details.aspx?id=19222) which I could work through, but needed some examples to see it in operation. This gives an examples – however, I would strongly recommend read this to understand what some of the issues might be.

I you are reading this looking for asynchronous programming in another thread (i.e. doing some work/calculation whilst loading a page) then this is the wrong article. This will be about solving a problem – and I’ll start with a simple one, but eventually maybe use it for some mathematical puzzle.

The problem that I will look to solve will be to perform the addition of an array of numbers to get a sum. I’ll work with an array of 500,000,000 shorts – and these are initialised via a loop with the current number of milliseconds. I just needed some values in the array. Something more “real” or even “random” would be better, but I just need some numbers.

The code to do this will be

private static short[] numbers = new short[500000000];

static void InitialiseTask()
{
    for (int i = 0; i < numbers.Length; i++)
    {
        numbers[i] = (short) DateTime.Now.Millisecond;
    }
}

To perform the sum of this array the sequential code would be

static void SequentialTask()
{
    Console.WriteLine("Starting Sequential Task");
    long sum = 0;
    for (int i = 0; i < numbers.Length; i++)
    {
        sum += numbers[i];
    }
    Console.WriteLine("Sum {0}", sum);
}

I am going to package these up with a method to record start and end times

static void Task(Action task)
{
    DateTime startTime;
    Console.WriteLine("Starting {0}", task.Method);
    startTime = DateTime.Now;

    // Run the task
    task.Invoke();
    Console.WriteLine("{0}", task.Method);
    Console.WriteLine("-- START {0:hh:mm:ss.fff} FINISH {1:hh:mm:ss.fff} MS {2}",
        startTime, 
        DateTime.Now, DateTime.Now.Subtract(startTime).TotalMilliseconds);
}

So that my main program would be

static void Main(string[] args)
{
    Task(InitialiseTask);
    Task(SequentialTask);
}

And running this gave this result



So let’s turn the sequential task into a parallel task. We will split this up into two summations – the first one will sum the first 250,000,000 values and the second one the last 250,000,000 values. We will then add these two sum together to get the final sum.

We have method below which takes as parameter a task number – which will determine which part of the array we will count. The return is a long – not an int or short which will be too small.

static long ParallelSum(int taskNumber)
{
    Console.WriteLine("Starting Parallel Task {0}", taskNumber);
    long sum = 0;
    int start, end;
    if (taskNumber == 0)
    {
        start = 0;
        end = numbers.Length / 2;
    }
    else
    {
        start = numbers.Length / 2;
        end = numbers.Length;
    }

    for (int i = start; i < end; i++)
    {
        sum += numbers[i];
    }
    Console.WriteLine("Sum for task {0} is {1}", taskNumber, sum);
    return sum;
}

I know I have explicitly set the start/end points of the loop – rather than calculate them. Next for the .Net magic (I always tell my apprentices that there is no magic, but this is quite cool).

static void ParallelTask()
{
    Console.WriteLine("Starting Parallel Tasks");

    long[] sums = new long[2];
    Parallel.For(0, 2, task =>
        {
            sums[task] = ParallelSum(task);
        });

    long totalSum = 0;
    foreach (long sum in sums)
        totalSum += sum;

    Console.WriteLine("Total Sum {0}", totalSum);
}

We use the magic Parallel.For(inclusiveStart, exclusiveEnd, Action body). This will run each iteration – but in parallel. The body (delegate) is run once per iteration. So here the delegate ParallelSum will be run with arguments 0 and 1 (i.e. from and including inclusiveStart of 0 to exclusiveEnd of 2 – i.e. not 2).

I have created a local array variable to save the summations of each task and retrieved at the end of each task that summation. Once all the tasks have been completed a total sum is created. 

Modifying the main program to perform the parallel task as below

static void Main(string[] args)
{
    Task(SequentialInitialiseTask);
    Task(SequentialTask);
    Task(ParallelTask);
}

Giving output of



With the sequential task taking 1621 milliseconds and the parallel task 1296. In a further task (where I didn't do anything else on my laptop the results were 1588 ms against 844. Which is even more impressive. I am however concerned about the nearly six minutes to set the values in the array. So let’s make a parallel version of that. The main program will look at that

static void Main(string[] args)
{
    Task(ParallelInitialiseTask);
    Task(SequentialTask);
    Task(ParallelTask);
}

The task to initialise the array in parallel

static void ParallelInitialiseTask()
{
    Console.WriteLine("Starting Parallel Tasks");
    Parallel.For(0, 2, ParallelInitialise);
}

static void ParallelInitialise(int taskNumber)
{
    Console.WriteLine("Starting Parallel Task {0}", taskNumber);
    int start, end;
    if (taskNumber == 0)
    {
        start = 0;
        end = numbers.Length / 2;
    }
    else
    {
        start = numbers.Length / 2;
        end = numbers.Length;
    }

    for (int i = start; i < end; i++)
        numbers[i] = (short)DateTime.Now.Millisecond;
}

When you run this the results are



Showing a good speed up.

The splitting up of the problem (i.e. each task working with its own part of the array) is known as partitioning using domain decomposition. Here we have multiple tasks doing the same thing – each task handling a portion of the data. 

My processor has two cores and four logical processors – so the next step is to look at splitting the problem up more. I’d also like to look at the tools available to view the task running. Maybe more next time, if I get a few hours of fun!

Also need to investigate what thread safe really means, locking and a review of all the costs involved. The article mentioned earlier has a lot of detail – and hopefully I can get off the line with some simple examples. Although I need to dig out some other typical problems to solve.

Note: A server like a SP/2 was essentially multiple RS/6000 workstations networked together with a big fast networking switch – and a Silicon Graphics Origin has what could be described as a shared memory architecture, but each piece of memory will be associated with a processor – a processor accessing the memory local to that processor will get quicker access than to other memory. 

No comments:

Post a Comment