Meeting 1 Notes
From ESIWiki
Day 1 Notes
- Title, Scope of the paper? grid versus distributed?
- Survey of Distributed Programming Methods, Abstraction and Applications?
- Has Dist Programming been mostly language focussed?
- Emphasise the interaction/coupling Gap analysis
- Existing models support representative set of applications...
- Application Driven Survey....
- Not just survey but a relationship
- Like to run in a distributed way?
- Drawing our pool of applications in some way from applications that are inherently distributed?? Is something fundamental making them distributed?
- Dist versus parallel: fault tolerance in distributed
- "fault tolerance tree" is that a pattern??
- Is it an application issue versus system?
- fault tolerance tree in Erlang, used in telecoms driven by application
- Similar issue in "Space born" , but is middleware
- Isn't one of the major new functionality in distributed systems fault tolerance?
- Predictability in computing cycles, memory, sharing, network much greater
- Security and privacy?
- Legion project -- mechanism to implement security
- project/middleware level.
- parallel is a subset to distributed
- parallel anything that can be done on a cluster -- which is a part
- Dedicated resources?
- predictability?
- From Katz's diagram: what are the properties/interactions that applications will encounter?
- Throughput systems -- exchange systems? web server? would they be a Berkeley dwarf?
- Application characteristics?
- PT: using a programming language perspective, it is easy to distinguish b/w implicit versus explicit, and when you use implicit parallel you're very similar to distributed!!
- MC: the simplest model of the server system is parallel-server-like; for long-running applications there are distributed issues, e.g., structure, talking to databases, etc.
- there is an unparallel component plus a distributed component
- DK: Server is a class that they don't have.
- Client-server: RMI, RPC, P2P
- Distributed Dwarf
- Dwarf: Algo method that captures a pattern of communication and computation.
- "structured composition of primitives"
- Supervision-and-monitoring
- Client-server: RMI, RPC,
- P2P (is like linear algebra)
- MC: Would they generate an application? Would they generate a programming construct that the application
- Want a random walk, what do i need to tell you? Flooding?
Day 2 Notes
- Defining the Vectors of the table [Jon to take first charge]
- What is the focus of the Application Set?
- All Distributed Application vs narrowed-focus?
- Scientific ? Computational Science?
- Compute Intensive, Data Intensive?
- communication? network intensive?
- End-user (?) programmable
- Web/Internet Services ? TK: This is an implementation technique, not
- What is the method of distributing and/or coordinating the distribution?
- Data-mining ?
- Amount of data, compute and transfer?
- DSK: Trying to limit isn't going to be successful! And Thilo did not say that!!
- What is the data? What is the processing on the data?
- Large-scale: computing, data, data-rate transfer, number of components, scalable, non-trivial....
- What is happening on the back-end should be considered, e.g. Expedia?
- Those applications that are not-trivially done on a dedicated petaflop machine
- Aim is to find some of the patterns and some of the gaps... here is a reduced set that enables (?) us to do so..
- Set of Applications (Application classes) for which we have expertise?
- Is it a distributed System or is it an Application?
- What do we want to exclude?
- E.g., Talking to Google? What happens inside Google..
- Internet telephony (skype)? Why exclude this if this is Large-Scale? (because this isn't scientific computing?)
- Distributed Gaming?
- Musings on the Paper
- Audience?
- CS PhD Students
- Developers of Programming Environment
- Application Programmers (non-CS)
- Contribution(s)
- Critical Perspectives <---> Gap Analysis
- Proposal of Pattern/Structure Design space for Distributed Application
- How are DA programmed? How can they be programmed? What needs to be overcome? Make the "are" converge with the "can"?
- Is the current "problem" a) technology (programming tools) limited? b) limitation of our understanding of our applications? c) is it a fundamental limitation?
Day 3 Notes
- Omer to begin discussion with ACM Computing Surveys
- Definitions and Concepts (Andre)
- Need to stay away from Design Patterns such as UML... say explicitly what is not a pattern for us
- Need a second example of Programming Model (SPMD)
- Streaming as a pattern? or a programming model?
- Something does not have to live in a single category, e.g. streaming can be both a pattern and a model..
- Other categories/examples that highlight overlap
- Discuss "distributed" examples rather than just simple parallel examples
- Decoupling and layering has performance issues. Needs mentioning.
- Aspects: distinguish from patterns. Is it common "functionality".. Cross-applications
- take file sharing, like bit-torrent, kazaa and analyse in the context
- Pattern: discovery, file fragmentation
- P2P is a programming model
- Abstraction: file
- Parallel model/pattern is closer to abstraction, than say in distributed world ??
- systems versus application models/patterns/abstractions..
- Need more examples, possibly more specific to the distributed computing domain..
- Application Focus Session (Thilo)
- Is performance of necessity? Or is it distributedness...
- scalability in other directions -- users, sites, heterogeneity..
- Need to identify the design-space, and then see how our App Classes fits
- "Large amounts of data that are moved around" -- probably not captured
- Say that the examples we have chosen cover the models/patterns/abstractions that are in use from other applications e.g., we don't discuss Skype but the abstraction and model it (Skype) uses are covered.
- Four-categories of P2P [Omer]
- Semi-centralized (napster, seti@home) neither scale nor are resilient
- structured decentralized, ocean-store, pastry(!) provides reliability
- unstructured decentralized e.g., gnutella
- supernode overlays (hierarchical) e.g. Kazaa
- Discuss Applications that fit into each category
- P2P (gnutella, kazaa, ) as a case study of our classification.
- Client-Server 5 classes with examples
- Afternoon Session
- Section 4
- recurring problem (brokering problem under fixed resources) versus recurring solution (pattern)
- Remember: Applications view, e.g., if applications use the components of brokering in the same way.. then that is a pattern (programming/application)
- Co-allocation: Find a set of resources that match a common slot; application requirement is to find such a set; are there common techniques that emerge when co-allocation is required in different contexts
- Visualize a data set, output of simulation -- which can't be done on a single resource. Requires co-allocation, which in turn requires brokering. So is there a pattern? But this brokering/co-allocation is a feature of the hosting environment.
- Task Farm is a pattern; parameter sweep is an implementation of that pattern. How do people perform that pattern? Using Condor? Need to look at Applications to uncover this..
- One example, of patterns for distributed computing -- @HOME, "fan out", one of the parameters would be "how to determine when the answer is satisfactory" (stopping criteria)?
- parameters to these patterns could be performance specific and not just application specific (say if # of processors available becomes less than X, do not compute)
- Is Voting a pattern?? [Jon] (in the context of task farm)
- interesting as a pattern if there is at least one parameter that needs to be programmed
- for BOINC there is a "system-hook" that provides a programmable consensus capability.
- @HOME is the pattern and the parameter is the termination choice [MC]
- Why is @HOME a different pattern than "conventional" Bag of Tasks?
- @HOME seems to require some kind of consensus approach because of the hosting environment...
- @HOME seems to require _replication_
- Q: Why this pattern is (possibly) a unique Distributed pattern
- A: arises from "fault tolerance" bullet (Need to clarify..)
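The "Task Farm with Consensus" idea discussed above (replication plus a programmable voting parameter) can be sketched as follows. This is only an illustrative sketch, not how BOINC or any @HOME project implements it: the names `vote` and `task_farm_with_consensus`, the round-robin worker choice, and the `replicas`/`quorum` parameters are all assumptions made for the example.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def vote(results: Iterable[Hashable], quorum: int) -> Optional[Hashable]:
    """Return the most common result if at least `quorum` replicas agree, else None."""
    counts = Counter(results)
    if not counts:
        return None
    value, n = counts.most_common(1)[0]
    return value if n >= quorum else None


def task_farm_with_consensus(tasks, workers, replicas=3, quorum=2):
    """Run each task on `replicas` workers; accept a result only if a quorum agrees."""
    accepted = {}
    for i, task in enumerate(tasks):
        # Hypothetical placement policy: round-robin over the worker pool.
        chosen = [workers[(i + r) % len(workers)] for r in range(replicas)]
        results = []
        for worker in chosen:
            try:
                results.append(worker(task))
            except Exception:
                # A failed or vanished host simply contributes no vote.
                continue
        accepted[task] = vote(results, quorum)
    return accepted


if __name__ == "__main__":
    honest = lambda x: x * x
    faulty = lambda x: x * x + 1  # stands in for an unreliable volunteer host

    def crashing(x):
        raise RuntimeError("host went away")

    print(task_farm_with_consensus([2, 3, 4], [honest, faulty, honest, crashing]))
```

The `quorum` argument plays the role of the "parameter that needs to be programmed": a conventional bag of tasks on trusted resources corresponds to `replicas=1, quorum=1`, while an untrusted hosting environment needs replication before a result can be accepted.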
- Latency Hiding for Distributed App/Computing
- caching, name-resolution, replica are latency hiding techniques more commonly used in distributed systems
- e.g, music/file sharing, where to place files versus queries..
- Coarse grained, bulk-operations
- speculative versus redundant
- uncomfortable with bulk-operations as a pattern [MC]
- doing the bulking is semantically safe..
- bulk transfer more as an optimization technique
- which may only be applied when the programmer has enough knowledge of the application to make sure it's safe... and the pattern aspect is that if the optimization is applicable, that relieves the programmer of the responsibility of verification/checking for semantics ?!
- Omer: All patterns need validation/pre-checking?!
- e.g., pipelining is the pattern(!) and bulking is the trick that can be invoked for pipelining
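The distinction drawn above, pipelining as the pattern and bulking as an optimization that can be switched on when it is semantically safe, can be sketched as follows. This is a minimal single-process sketch; the names `bulk`, `stage`, and `pipeline` and the `bulk_size` parameter are hypothetical. Bulking is safe here only because every stage is assumed to be stateless and element-wise, so grouping items cannot change the result.

```python
from typing import Callable, Iterable, Iterator, List


def bulk(items: Iterable, size: int) -> Iterator[List]:
    """Group a stream into lists of at most `size` items, preserving order."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly short, batch


def stage(fn: Callable, batches: Iterable[List]) -> Iterator[List]:
    """A pipeline stage: apply `fn` element-wise to each batch as it arrives."""
    for batch in batches:
        yield [fn(x) for x in batch]


def pipeline(items: Iterable, stages: List[Callable], bulk_size: int = 1) -> Iterator:
    """Chain the stages lazily; `bulk_size` is the tunable bulking parameter."""
    batches = bulk(items, bulk_size)
    for fn in stages:
        batches = stage(fn, batches)
    for batch in batches:
        yield from batch


if __name__ == "__main__":
    steps = [lambda x: x + 1, lambda x: x * 2]
    print(list(pipeline(range(5), steps, bulk_size=1)))
    print(list(pipeline(range(5), steps, bulk_size=2)))
```

Both calls produce the same sequence; only the granularity of what moves between stages changes, which is the sense in which bulking is an optimization rather than a pattern of its own.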
- Gap Analysis: A first pass...
- "If I tried to implement Application X using Model A, would I succeed"
- Previously missed programming/application construction opportunities
- Support for Application by providing abstractions
- Will people write applications if "patterns exist"...
- Possible Usage Scenario: Don't want to rewrite hurricane modelling code want help to compose it...
- Identify the patterns
- This should/will inform the tool developers (intermediate user) and possibly not the end-user (application)
- Provide examples of the kinds of tools that could possibly be developed to support applications to use the identified patterns more naturally..
- for example does your favourite workflow enactment engine support bulk operations?
- "What is the support for these patterns?"
- Other possible analysis/assessment directions
- Could we construct novel distributed algorithms from i) patterns ii) programming abstractions? Case study using Galaxy simulation, Triana and Patterns
Day 4 Notes
- Morning Session
- Yesterday we _identified_ two distributed patterns (i) Task Farming with Consensus & (ii) Pipelining. Can we identify more patterns by eye-balling the list of applications in section 2 ?
- What are the "Vectors" for defining Distributed Patterns? (or are we really talking about dwarfs again, and not patterns as defined in the paper -- section 1)?
- Fill in the "System Properties"/Distributed Characteristics of the Applications in Table 1..
- Notes
- Formalized relationship b/w interaction and computation (e.g., process algebra)
- Process Algebra: A way of writing a series of concurrent events and their interaction
- Easy to write for Master/Client and Pipelining
- A Fundamental Vector: Interaction vector
- Patterns here is a reference to something that can be put into a library and not from a s/w engineering perspective
- Could also do a UML Diagram
- Some Workflow Patterns:
- Star: Like task farm
- Ring:
- Facade: wrappers, say fortran code and present as C e.g., double precision ...
- Master/Slave:
- Streaming:
- Contract:
- Obs/Subscriber/Publisher:
- The above are classic design patterns!!
- The above could use application based
- Some of the above might be able to exploit programming skeletons...
- To define Programming Skeleton: not just a collection of primitives, but usage mode is required
- Architectural Pattern to describe the coupling: So now go inside the box to describe how to do coupling?
- How is Coordination handled?
- Does loose versus tightly imply a coordination pattern?
- Centralized, synchronized coordination
- Event based coordination
- Distributed coordination: each code has its own execution strategy, where each component runs ignorant of other components but periodically exchanges information/state if they need to
- Ownership Issues:
- Scheduled allocation: Two executables running where one relies on the other. Not strictly co-allocated, but coordinated allocation; i.e. there is a time dependency (analogous to frame-rate scheduling)..
- Who can send data to whom?
- What is the Site Policy? When do you need availability
- Or is Co-scheduling (more tolerant) OK? Is co-allocation of resources necessary? Policy of resources and co-allocation?
- Communication: Frequency, Size of message & Tolerance to delay (latency) in message exchange between components. All three of the above could be time-dependent/dynamic quantity! Coordination strategy is fixed. Loose coupling could also mean tolerance to other tasks being completed...
- Montage is loosely-coupled. What does that mean?? Montage is typically tolerant to delay. There isn't an equivalent to data flow.. with the entire job waiting for the message to be delivered..
- Redefined Vectors
- Communication: Degree of tolerance to say, event delivery, message size, frequency..
- Coordination: Determine dependencies. How do we manage the execution of tasks that are distributed?
- Ownership: Unique to distributed systems.
- DAG could be executed either centrally or distributed
- Montage components owned by same 'person'; but who determines if it can be run?
- Functional Specification of "LEAD" project and "SCOOP" project would be similar. Thus we don't need to go into the specific details but we are OK at the level of Application Class.
- Separate into "programming/application-level pattern" and "resource or scheduling abstraction/patterns".
- Co-allocation is one such resource-level pattern.
- To a first approximation, it appears that application-level patterns need to be cognizant of the resource-level patterns
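Co-allocation as described in these notes ("find a set of resources that match a common slot") can be sketched as an interval-intersection problem. The sketch below is an assumption-laden illustration, not a description of any real broker: the `Window` representation, the function names, and the resource names in the example are all hypothetical, and each resource is assumed to publish a list of non-overlapping free windows.

```python
from typing import Dict, List, Optional, Tuple

Window = Tuple[int, int]  # (start, end) in arbitrary time units, end exclusive


def intersect(a: List[Window], b: List[Window]) -> List[Window]:
    """Intersect two sorted lists of non-overlapping free windows."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever window finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def co_allocate(free: Dict[str, List[Window]], duration: int) -> Optional[Window]:
    """Earliest slot of length `duration` that is free on every resource, or None."""
    if not free:
        return None
    common = None
    for windows in free.values():
        w = sorted(windows)
        common = w if common is None else intersect(common, w)
    for start, end in common:
        if end - start >= duration:
            return (start, start + duration)
    return None


if __name__ == "__main__":
    free = {
        "cluster_a": [(0, 4), (6, 12)],
        "cluster_b": [(2, 8), (9, 15)],
        "viz_node": [(3, 11)],
    }
    print(co_allocate(free, duration=2))
```

In this framing the application-level requirement is only the `duration` and the set of resources; where the free windows come from (site policy, ownership, advance reservation) is a property of the hosting environment, which matches the application-level versus resource-level split noted above.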
- Afternoon Session
- Pattern suggestions, arising from "what is unique to distributed systems versus parallel systems":
- Fault Tolerance
- Security, privacy
- Replication
- Scalability versus Throughput
- External Administration (how is this different from ownership that we discussed earlier today?)
- Pattern suggestions, arising from Vectors characterising distributed applications:
- Characteristics of the Application Hosting Environment; Heterogeneity of the Hosting Environment
- Information services?
- Is there a fundamental dependency on the deployed systems? e.g., QoS
- network characteristics such as HD yes... others?
- Jha: Concern that section 3 is not "linked in" with the section 2 and 4
- Fault tolerance in a distributed sense?
- More complicated than parallel; non-shared
- Certainly a gap; we are unaware of a distributed checkpointing system that works across heterogeneous systems (as a whole)
- there are distributed checkpointing systems based upon consensus algorithm
- Not to check the entire distributed application, but "what" to checkpoint
- Are there different application classes with 'unique' checkpointing requirements? e.g., Master-Slave has a trivial requirement -- none (just rewind)
- To achieve fault tolerance: how many times to replicate?
- techniques Primary, secondary, shadows... many different levels
- Transactional Systems require Fault Tolerance
- Distributed System Fault Models
- Malleability(?) helps with Fault Tolerance.
- Security: There exist Security Models
- Might have models for Access Control
- http://SecurityPatterns.org
- Security patterns, if any, would help in the deployment of distributed applications. Ideally the security model (AC) should not influence the design/development of applications... if it does then mention in Critical Assessment..
- Replication: A means to an end. For fault tolerance, performance..
- What needs to be replicated? When to replicate? Replica placement?
- Content-distribution system: e.g., bit-torrent
- probably some patterns for data systems
- replicated-computation versus redundant computation?
- just an optimization technique?
- frequency of replication, number of replicas, etc. have to be dynamically determined parameters! This is something to mention...
- similar to (and with the same problems as) caching
- Scalability and throughput
- when you saturate, the throughput doesn't collapse
- there might be a pattern for scalability (obviously application/context dependent)
- Information Services
- Globus-MDS access pattern(s)? [J Schopf]
- Service Discovery??
- Which access method to use?
- Which registry to use?
- Service Discovery pattern(s)?
- How to propagate the query?
- Lack of a commonly agreed _way_ of accessing SD...
- QoS:
- Patterns within the SLA...
- Contract may have patterns..
- Closing Thoughts
- Paper: How do we feel? Gap Analysis of the paper (as it stands..)
- Section 3 (in isolation) vis-a-vis 2 & 4?
- Correct, but not a problem..
- Gaps in programming models (!) without talking about patterns
- Homework: identify specific work items in the paper
- Next Steps?
- identify work items and then get down to the business of addressing them!!!
Day 5 Notes
- Morning Session
- Paper Work Items
- Section 1: Introduction [Jha]
- Lots & Lots of work...
- Definitions and Concepts [Andre Merzky]
- Section 2: Applications [Katz]
- Application Table on Page
- Additional Entries [All..]
- Try to stick to one page?? [Jha: No]
- Description/Discussion of the Application Classes, connect with "scope" in the Introduction [Jon, Jha]
- Discussion of the "vectors" [Jon, Jha]
- Section 3: Programming Models and Abstractions [Omer]
- Composition Group Table (3)
- specific questions...
- Section 4: Patterns to Support Distributed Applications [Murray]
- Section 5: Critical Assessment of Programming Distributed Applications [Jon]
- Timeline:
- 17-20 Sep (Austin): Dennis Gannon, Manish Parashar, Craig Lee (?), Dave DeRoure, Reagan Moore (?), Radu Prodan (Innsbruck, ASKALON), Ann Chervenak/Ewa Deelman.....
- 15-20 Oct Week of OGF
- 24 Oct First Draft
- 31 Oct Workshop
- Possible Targets
- ACM Computing Surveys
- ACM TOPLAS (Transactions on Programming Languages and Systems)
- IEEE Transactions on Parallel and Distributed Systems
- Concurrency and Computation http://aspen.ucs.indiana.edu/CandCPandE/
- APS/IEEE Computational Science and Engineering [A modified article?]
- Scientific Computing
- IEEE DSOnline
