
DPA PDA Paper

From ESIWiki


Patterns for Distributed Applications

The aim of this paper is to identify and classify the common Distributed Application Patterns (Distributed Dwarfs?). Since we believe this is the first attempt to do so with scientific applications as the focus, we also ask: which vectors should be considered in defining a Distributed Application Pattern?

I love the Berkeley Dwarfs paper! However, I think they are a little too loose in some of the detail. In particular, they state that "A dwarf is an algorithmic method that captures a pattern of computation and communication", which I cheer, but the actual patterns quoted are too vague for me. Where is the pattern in "Dense Linear Algebra"? The communication patterns they show with the little 2D plots are really nice, but I think they are too static: a pattern should capture when as well as where. I'd like us to be tighter in this respect. (Murray)

What is a pattern anyway? There seem to be two competing definitions:

  • the less formal one, from SE, where a pattern is a repository of collected wisdom on some problem/structure class
  • a more formal one, where a pattern is something which can be both given a precise semantics and presented as a concrete programming construct (library call, or whatever is appropriate to the given programming model) (Murray)

Orthogonal to this distinction is the question of what qualifies for "patternhood" for our purposes. I suggest that, as a bare minimum, a pattern must both

  • have more than 2 instantiations in the application set
  • be complex "enough" to be non-trivial to implement directly, and/or to offer scope for non-trivial optimisation (Murray)

What are the common Distributed Application Patterns?

  • Client/Server
  • P2P
  • Master/Worker - A single master controls a set of workers. Conceptually, the master spawns a worker to do a task and gets the results back from that worker. Workers do not communicate with each other. The only ordering that may exist is in how the master chooses to spawn workers for specific tasks. Often, many workers run the same task with different input parameters, and the master assembles the outputs into some concatenated or derived product.
  • Pipelining
  • Dynamic Application Deployment
  • Distributed Database access
  • Data Distribution (also known as geometric decomposition). Here, a data set that maps onto some space is divided as the space is divided. Each division of the space is assigned to a process, and each process owns the data in its division: it is responsible for providing that data to any other process that requests it, and for updating it as requested by other processes or by an overarching algorithm. Data distribution is usually an implementation of an SPMD model, where a single program or algorithm runs on all the processes and only the data on each process differs. Ideally, communication here is local, meaning that each process should only require data from neighboring processes (processes that own neighboring divisions of the space), though there can be non-local or even global communication in some instances.
  • Workflows
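
To make the Master/Worker description above concrete, here is a minimal sketch in Python. It assumes a thread pool stands in for distributed workers, and the names master and worker, as well as the squaring task, are hypothetical placeholders rather than anything taken from the application set. It shows the properties listed above: the master hands out tasks, workers never talk to each other, and the master assembles the outputs into a derived product.

```python
from concurrent.futures import ThreadPoolExecutor


def worker(param):
    """Hypothetical task: every worker runs the same computation on its own input."""
    return param * param


def master(params, n_workers=4):
    """Hand one task per input parameter to a pool of workers and assemble the results."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in input order, whichever worker finishes first.
        results = list(pool.map(worker, params))
    # Derived product: in this sketch, simply the sum of the worker outputs.
    return results, sum(results)


if __name__ == "__main__":
    outputs, total = master(range(5))
    print(outputs, total)
```

In a real distributed setting the pool would be replaced by remote processes or a job scheduler; the control structure seen by the master would stay much the same.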

Also, see here for parallel programming patterns.
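
The Data Distribution (geometric decomposition) pattern listed above can also be sketched in a few lines. The sketch below is only an illustration under stated assumptions: the "processes" are simulated sequentially in one Python program, the space is one-dimensional, the algorithm is a hypothetical three-point average, and there are at least as many data points as parts. The function names decompose and smooth_step are invented for this example. Each block owner runs the same code (SPMD) and needs only one boundary value from each neighboring owner, which is the local communication the pattern aims for.

```python
def decompose(data, n_parts):
    """Split a 1D data set into contiguous blocks, one per (simulated) process."""
    size, extra = divmod(len(data), n_parts)
    blocks, start = [], 0
    for rank in range(n_parts):
        # The first `extra` blocks take one additional element each.
        stop = start + size + (1 if rank < extra else 0)
        blocks.append(list(data[start:stop]))
        start = stop
    return blocks


def smooth_step(blocks):
    """One SPMD step: every owner applies the same 3-point average to its block."""
    new_blocks = []
    for rank, block in enumerate(blocks):
        # Local communication: fetch one boundary value from each neighboring owner.
        # At the ends of the domain, reuse the block's own edge value instead.
        left = blocks[rank - 1][-1] if rank > 0 else block[0]
        right = blocks[rank + 1][0] if rank < len(blocks) - 1 else block[-1]
        padded = [left] + block + [right]
        new_blocks.append([
            (padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
            for i in range(1, len(padded) - 1)
        ])
    return new_blocks


if __name__ == "__main__":
    blocks = decompose([0, 3, 6, 9, 12, 15], 3)
    print(smooth_step(blocks))
```

Because the only data crossing a block boundary is a single neighboring value, the decomposed result matches what the same step produces on the undivided data set.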

What are the (eigen)vectors for defining a Distributed Application Pattern?

  • Communication patterns
    • type of comms: message, files, others?
  • Computation Patterns (and the temporal links between computation and communication)
  • Services
    • Resource allocation, Replication, Recovery, Load-sharing etc.
  • Form of data processing
    • Structured versus unstructured data
    • Data Control
  • What more?
  • Decomposition of Data vs. Replication of Data?


  • Useful Reference: Fethi Rabhi et al: Extend/distill their classification and then cast their pattern(s) using the vocabulary of scientific applications