Array Distribution : TechWeb : Boston University

For many data parallel jobs, we often need to perform computations on arrays. A non-distributed array is one in which the whole array resides in a worker’s memory space while a distributed array is a single array with its parts distributed across workers in a specific fashion. For example, you may choose to distribute a two-dimensional array distinctly along its rows or columns; as a result, an array distributed across workers saves memory over replicating it on every worker. For modest sized problems, non-distributed arrays may be more convenient at the expense of memory usage. For problems with large arrays, distributing the arrays may be necessary. Furthermore, a distributed array may also enhance the computational efficiency (due to for instance improved cache usage) and communication efficiency (due to less frequent communication).

Non-distributed arrays

In the previous sections, we have dealt exclusively with non-distributed arrays. An array that has been defined on the MATLAB client, when referred to in the worker space, is automatically copied into the respective worker space as one of the following three types: replicated array, variant array, or private array.

A non-distributed array shares a common name across all workers and is typed in the MATLAB space as a composite array. It may either be generated directly with the composite command or indirectly through the context of array creation. Here is an example:
```
>> matlabpool open local 4
>> a = Composite();     % create composite array with 4 entries
>> a{2} = magic(3);    % use cell notation to create an array on worker 2
>> a{1} = randi(3); a{3} = rand(4); a{4} = ones(2);    % defines the rest
>> spmd, d = magic(3); end   % d is replicated on all workers
>> matlabpool close
```
Distributed arrays

Arrays may be distributed in two different ways depending on the application. The corresponding utilities for these are
- distributed
  This type of array is generated and accessible in the MATLAB client space. The arrays must be distributed along the last dimension. For example, for a two-dimensional array, the array is distributed along the second dimension, or the columns
```
>> c = distributed.rand(200);   % NOT in spmd; distributed along columns
```
- codistributed
  This is the most general form of distribution. Distribution may be along any of the dimensions
```
>> spmd
>> d1 = codistributed.rand(1000, codistributor1d(1));   % codistributed along rows
>> d2 = codistributed.ones(500, codistributor1d(2));   % codistributed along columns
>> end  %  end spmd
```

The whos command may be used to ascertain an array’s data type

>> whos
  Name         Size              Bytes  Class          Attributes

  a            1x4                1145  Composite                
  b            1x4                1145  Composite                
  c          200x200              1181  distributed              
  d1        1000x1000             1181  distributed              
  d2         500x500              1181  distributed

The byte sizes above only reflect the memory overhead for the arrays in the MATLAB space.

Ways to build distributed arrays
- Build by partitioning a larger array
- Build from smaller local arrays
- Build with MATLAB constructor functions

Tips

Sometimes, you may distribute an array in a two-step process:
```
>> A = rand(200);
>> d = codistributed(A);
```
However, when possible, it is more memory-efficient to generate d directly in one step.
```
>> d = codistributed.rand(200);
```
Partly due to backward compatibility considerations, there are multiple alternatives to accomplishing the task of array distribution.


Previous	Home	Next

Important alerts

Incident: Intermittent Internet Connectivity Issues Across Campus

Resolved: The Links are Unavailable

Non-distributed arrays

Distributed arrays

Ways to build distributed arrays

Tips