Trust and Security in Virtual Communities Third Workshop Notes
Trust and Security in Virtual Communities
Notes From The Third Workshop: Trusted Services: Requirements and Prospects
Edinburgh, July 8th and 9th 2008
Introduction by Andrew Martin
- Good progress has been made in e-science, but work is still constrained by a lack of technical understanding.
- Questions in identity management and federation.
- One aim for the workshop: find real examples and use cases which would benefit from emerging security technologies.
- Another aim: Can we engage more application groups in these meetings?
Summary of previous workshops
- Poor usability in AuthZ and AuthN due to certificates.
- Pioneers can make the technology work, but other people can't. Pioneers are often guilty of cutting security corners.
- Need for scalable solutions to reduce support costs and let people of all technical abilities use the grid.
- Need for increased usability for developers. In order to put together workflows, you need common AuthZ and AuthN.
- Best practice guides and published recommendations are in short supply.
- Poor interoperability
- Some common problems without general, off-the-shelf solutions as yet. E.g. anonymisation
- The good news is that people are doing risk analysis. However, the threats are not always rationalised or understood.
- Lots of people rolling their own solutions, lack of re-use of good systems.
Comment: Social engineering has been used at medical establishments to get patient records for insurance companies. The very best medical research institutes lose 20 records a year. In the UK it is more like 150. It's easy to overlook real risks like this.
Some themes and concerns
- The NGS cannot be used for sensitive data due to the lack of guarantees for confidentiality and integrity.
- Some people require confidentiality of parameters to routines or searches.
- Service reputation difficult to establish
- Provenance. Scientists running queries have a problem. DB Queries may not be repeatable! This is because the databases change, and it's quite possible bad queries insert incorrect data.
- Assurance. Gaining assurance that a service really does encrypt data when sending it to another service.
Comment: Is this one of the reasons that the Liberty Alliance failed?
Comment: Medical data *is* money. As a result accessing it is valuable. Even hosting companies are charging money for you to access data held by them. There are complex ethical and IP issues here.
All the problems involving PKI for usability in grid systems are slowly moving away, as certificates are abandoned. However, this is still a problem if you cross national borders. Certificates are being moved out of user hands and into repositories.
Comment: Hiding certificates is a really hard engineering problem, as what's "under the hood" often dictates usage. For example, certificate timeouts will drop grid jobs, so users have to be aware of what's going on.
Comment: Putting certificates in users' hands can be done successfully, but only with power users. This is the exception, not the rule.
Comment: There are two levels of trust: first joining the grid, then being able to trust the services offered. We could do with more quantitative metrics about how trustworthy a service is.
- Lots of barely-compatible alternatives, which makes monitoring who is violating policies very difficult. We need to give levers to the right people to enforce controls.
- Best practice guides are needed. Avoid both "overdoing" security and doing too little.
Open Question
Are security (assurance) requirements a continuum? Or do assurance levels go in steps? One dimension probably oversimplifies the issue. Diagram.
Comment: Assurance goes in jumps, not a smooth continuum. How big the steps are is difficult! Physicists do have high integrity requirements, but possibly not confidentiality. The strong perimeter model works for them. On the other hand, companies have high confidentiality requirements if we intend to persuade them to share machines with competitors.
Comment: There is a lack of skill/knowledge for security in the medical domain. There are also problems in primary care, as GPs only have 7 minutes to see a patient, no time for complex security requirements.
It would be great if the assurance continuum had smooth steps, as then we could offer incremental security measures. Big jumps will make it harder to persuade people to use an experimental grid system, because they can't trade off the risk as easily.
Trust
- We already trust services. What we need to do is make the grounds for this trust explicit. There is a middleware problem.
- Grid/Cloud abstractions are incompatible with accountability. They have not been engineered with provenance in mind. This makes chasing people difficult, and the more complex the system, the more difficult it can be.
- Does trust sometimes begin and end with signing certificates?
- And yet we need grid services, and if grids were to get a bad reputation (due to a lack of assurance) it would be a loss to research.
- Definition of trust. TCG Definition about predictable behaviour.
- 3 Steps for establishing trust, as defined by Graeme Proudler. Software must be unambiguously identified and behaving normally. This enables the assessment of trust. Last step is (by Proudler) based on previous good experience with the software. Alternatives are formal methods, certification. RFC 2828.
Experiences with Developing ePCRN Projects Security for Handling, Dealing with and Accessing Medical Data (Adel Taweel)
This is not a typical healthcare application. It is focused on extracting information for clinical trials in primary care.
Problems
- Finding Subjects (this is the biggest issue). Patients are not all in the same clinics, unlike in trials on a specific condition. It is therefore harder to identify common-case patients, and as a result most studies fail. The ones that succeed have huge expenses, employing nurses and staff to find the patients.
- It is then difficult to persuade patients to come to clinics, run the clinic, collect data and then manage it. Big problem that most data is in paper form, making analysis more difficult.
This project is trying to facilitate recruitment by establishing secure distributed query processing. It also aims to help with data retention and study design.
It is at least easier than in the US, as GPs tend to use one of 5 systems, and 50% use the same one - EMIS.
Problem with data collection. Data is created in lots of different places in the NHS. Every clinician will typically reassess a patient, as they either don't have the previous clinician's data or they don't trust it. It may also be quicker just to ask the patient. As a result the data is spread between hospitals.
Clinicians are reluctant to give out data. So instead this system only queries for counts, the number of records matching a search. All queries are performed on anonymised data. Having established if many patients exist at a clinic that match the criteria, the researchers contact the clinics and negotiate. Someone at the clinic (a nurse, possibly) then finds out who the patients are and tries to recruit them to the trial. If the patient is willing, contact information is then given to the researcher.
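The counts-only idea can be sketched as follows. This is a minimal illustration, not the ePCRN implementation: the table layout, column names and the function `count_matching` are assumptions made for the example, and an in-memory SQLite database stands in for the anonymised replica.

```python
import sqlite3


def build_replica() -> sqlite3.Connection:
    """Build a hypothetical anonymised replica: no names or contact details."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (age INTEGER, condition TEXT)")
    conn.executemany(
        "INSERT INTO patients VALUES (?, ?)",
        [(34, "asthma"), (61, "diabetes"), (58, "diabetes"), (47, "asthma")],
    )
    return conn


def count_matching(conn: sqlite3.Connection, condition: str, min_age: int) -> int:
    """Return only the number of matching records, never the rows themselves."""
    row = conn.execute(
        # Parameterised query: the researcher supplies values, not SQL text.
        "SELECT COUNT(*) FROM patients WHERE condition = ? AND age >= ?",
        (condition, min_age),
    ).fetchone()
    return row[0]


replica = build_replica()
print(count_matching(replica, "diabetes", 50))
```

The researcher-facing function exposes a single number per query, which is what lets the clinic keep the decision about contacting patients in its own hands.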
For the front end, a "what-you-see-is-what-you-access" RBAC model is used.
There are payment issues when negotiating with clinics. Some will want to negotiate for fees in order to let you use their data. This needs to be done with minimal fuss.
Issue: Doctors won't let you use their live systems, because of performance concerns. So a complete replica must be set up. This is connected to a gateway, giving researchers some access. How do we make this secure?
Ideally, it would be much nicer if records were collated together within Primary Care Trusts. However, this is unlikely.
Connecting for Health. There is a system for connecting JANET to N3 (NHS Net). However, it is currently only one way, so NHS can establish a connection with JANET, and not the other way around. Proposal to make the reverse possible.
Issue: There are 8500 clinics in England alone. Major management and security issue.
A grid solution was adopted. Use digital certificates to secure connections between clinics and research workspace. Access controls to control which records each individual researcher is allowed to view. The system basically maps the certificate to a database username and password. However, this mapping is difficult to maintain if it is replicated over 8500 locations.
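A rough sketch of the certificate-to-database mapping described above. The subject names, account names and table names are all hypothetical, and the real system's mapping format was not described in the talk.

```python
from typing import Dict, NamedTuple, Optional


class DbAccount(NamedTuple):
    username: str
    allowed_tables: frozenset


# Hypothetical mapping from certificate subject to database account.
CERT_MAP: Dict[str, DbAccount] = {
    "/C=UK/O=Example/CN=alice researcher": DbAccount(
        "trial_a_reader", frozenset({"trial_a_counts"})
    ),
    "/C=UK/O=Example/CN=bob researcher": DbAccount(
        "trial_b_reader", frozenset({"trial_b_counts"})
    ),
}


def lookup_account(subject_dn: str) -> Optional[DbAccount]:
    """Map an authenticated certificate subject to a database account."""
    return CERT_MAP.get(subject_dn.strip())


def may_query(subject_dn: str, table: str) -> bool:
    """Allow access only if the subject is mapped and the table is permitted."""
    account = lookup_account(subject_dn)
    return account is not None and table in account.allowed_tables


print(may_query("/C=UK/O=Example/CN=alice researcher", "trial_a_counts"))
```

Every site that holds its own copy of a table like `CERT_MAP` has to be updated whenever a researcher joins or leaves, which is the maintenance burden noted above.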
Problems with short-lived proxy certificates. This is a management issue. The gateway stops working when the certificates expire, which is a problem. Can extend the life of certificates, but that is not ideal. (Comments: Some disagreement about how the proxy certificates work).
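One way a gateway could notice an expiring credential before it stops working is sketched below. The 12-hour lifetime and the one-hour safety margin are assumptions for illustration, and `needs_renewal` is a hypothetical helper, not part of any grid toolkit.

```python
from datetime import datetime, timedelta, timezone


def needs_renewal(
    expires_at: datetime, now: datetime, margin: timedelta = timedelta(hours=1)
) -> bool:
    """True when the credential has expired or will expire within `margin`."""
    return expires_at - now <= margin


issued = datetime(2008, 7, 8, 9, 0, tzinfo=timezone.utc)
expires = issued + timedelta(hours=12)  # assumed proxy lifetime

print(needs_renewal(expires, issued + timedelta(hours=2)))
print(needs_renewal(expires, issued + timedelta(hours=11, minutes=30)))
```

A check like this only turns a silent failure into a warning; someone (or something holding a longer-lived credential) still has to issue the new proxy.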
Comment: How does the "counts-only" policy get enforced? It is controlled at the front end, and then Globus does the access control. You could potentially break the front end and run bad SQL, but the database is anonymised anyway, so it would not be very helpful.
Comment: A similar project is being undertaken by Queensland Health.
Next Stage
- Live links to Clinics.
- A clinical trial system that is easy to use and has access control.
- Secure transfer of patient data.
No good off the shelf solutions.
It is a shame that so many people are trying to cash in on collections of patient data. Some GP computer systems charge money for access.
Once data has been taken into a trial, the NHS is no longer liable. However, the danger of loss of reputation is still real, and any publicised data loss will affect the popularity of future trials.
Jens Jensen
Motivation. We need security for a number of reasons:
- We don't want to lose the data
- The data may be confidential
- The work you are doing on the data may be confidential
- The data may be valuable because it took a long time to process, and the resources used are expensive.
Who needs security?
- The owners of data, or the controllers.
- Resource owners may want to avoid being held liable for data breaches, or may be responsible for making sure resources are used efficiently.
Where should we be processing data?
- Clouds, Grids, distributed systems, desktops. They all need security
- Clouds seem to have a more intuitive, easy to use interface than grids.
Side note:
- Suppose encryption and computation commute? So a grid node can do the work without ever needing to decrypt data.
- f(E(d)) = E(f(d)), where E is encryption, f is the computation and d is the data.
- Comment: This has been studied before, called "Encrypted Computation".
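A toy instance of encryption commuting with a computation: textbook RSA is multiplicative, so multiplying two ciphertexts gives the ciphertext of the product. The parameters below are tiny and insecure, chosen only so the arithmetic is easy to follow. Unpadded RSA should not be used in practice, and this covers multiplication only, not arbitrary functions f.

```python
# Textbook RSA with tiny, insecure parameters (illustration only).
P, Q = 61, 53
N = P * Q          # modulus, 3233
PUB_EXP = 17       # public exponent
PRIV_EXP = 2753    # private exponent: (17 * 2753) % ((P - 1) * (Q - 1)) == 1


def encrypt(m: int) -> int:
    return pow(m, PUB_EXP, N)


def decrypt(c: int) -> int:
    return pow(c, PRIV_EXP, N)


def multiply_encrypted(c1: int, c2: int) -> int:
    """The 'grid node' multiplies ciphertexts without seeing the plaintexts."""
    return (c1 * c2) % N


a, b = 7, 6
product_ciphertext = multiply_encrypted(encrypt(a), encrypt(b))

# f(E(a), E(b)) equals E(f(a, b)) when f is multiplication modulo N.
print(product_ciphertext == encrypt(a * b))
print(decrypt(product_ciphertext))
```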
Old chestnuts:
- Security in Depth. Complex security systems can be bypassed through physical access. Local data access protocols are insecure, and can let you pretend to be anyone.
- Consistency
- Application level interoperability. Applications need to know how to access data, even when it is encrypted. Need for transparent ways to use legacy applications.
- Need to be able to trust that your program was written in a "secure" way.
Applications and APIs:
- They need fine-grained access control
- Some provenance is needed, we need to know that the result of calling a remote computation will be correct.
Licensing issues.
- The resource owner sometimes needs to restrict who can access what. E.g. commercial users need to be restricted from running academic-licensed applications.
- Service providers must make best efforts to see applications are used within licences. Should the efforts be bypassed, everything needs to be logged.
Attribute Authorities:
- Anybody in principle can set up a VOMS server. How do we know it can be trusted?
- We need to know the people who run them.
- Untrustworthy attribute authorities have the potential to be very damaging
- We need best practice guides.
- At the moment there are only 2 attribute authorities, but soon every IdP is going to be an attribute authority.
- Attribute translation is an issue. All departments use different naming conventions for different things. There is an initiative to standardise.
Question: Should other people (beyond this group) be concerned about authorities and attributes?
Challenges:
- Need to maintain signatures, keys and passwords for a long time.
- The crypto algorithms are changing. Legacy applications, and even browsers, have limited support for new algorithms.
- Browsers only support MD5 and SHA1.
- Secure applications have to maintain old security algorithms.
- Legacy applications are a problem.
Managing grid jobs:
- The Service Provider needs to account for resource usage, both in CPU and wall-clock time.
- VOs need to know if one user is eating up all the quota on a grid.
- Provide a safe environment, a sandbox, for potentially bad applications to run.
- Applications may want to reuse the results of previous grid jobs.
- Jobs forking or loading external applications with different user IDs are a big problem. What if data is saved with a different username and becomes inaccessible?
Interoperability.
- We need to be careful that systems are not built so securely that they can't interoperate with other programs.
- We should use standards and conversion processes. Use frameworks that users are already familiar with.
Levels of assurance
- Mostly about authentication.
- US Government has 4 levels of assurance, UK grid service decided to use the middle level.
- People consider these problems solved? Perhaps because higher-levels of assurance are not currently requested.
- Still problems though. OpenID, for example, some servers are definitely better than others. The same is true for Attribute Authorities.
Existing work
- A lot is known about local security. It is still not easy or trivial.
- Secure programming practices are known. However, non-computer scientists often don't know them.
- Local systems can be made (to some extent) trustworthy in their original context. However, put on a grid many of the assumptions break.
Example systems - caBIG, IGTF, XtreemOS, JSPG, TrustCom, GridTrust
Adapting applications. We want to be able to run legacy applications on the grid. This involves tweaking libraries. In open source, this is possible. But some we cannot alter. "Gridifying" is actually not much harder than enabling something to run on a distributed system. However, people rarely think about security until it is necessary to pick up a credential to make something work. Many different applications encounter the same problems, but have no common solutions.
Rethinking Applications
- Can we use TPMs?
- Should we escrow data to avoid problems if encryption keys are lost?
- Should applications be signed? If so, we need a framework for this.
We would like to encourage "paranoid" users to start using grid systems, for the interoperability benefits.
The Role of Deception:
- Users running fake (or anonymous) jobs.
- Service providers using honeypots.
Suggestions for solving some problems:
- Need a framework for understanding the risks.
- "Trust" requires special attention. We should create rewards and punishments for end users.
- We need to overcome the "security is hard, I'll do it later" attitude.
Question: Would Virtual Machines be useful? At the moment an individual user has their own Linux login account. The security advantages of virtualization are not obvious right now, and there are performance and manageability issues.
Question about SLAs and reputation systems. Currently nobody uses reputation systems. They could be used with uptime and availability as a metric?
Open question: Who is pushing security in e-science applications? There was the security taskforce which made recommendations to the research councils. However, they had little power to enforce these recommendations, and so it just became advice.
User Controlled Dynamic Collaborations (John Zic)
From the Network Technologies Laboratory. Focus on building systems that facilitate secure, trusted collaborations between people. Scenarios include telemedicine and mining. Fundamental assumption of a low latency, high-speed network connection.
CeNTIE Project
- 6 Years
- Collaborations with Digital Media, the people who did The Matrix and Happy Feet.
- Collaboration with Health, enterprise, regional, and 2 Australian banks.
- Focus on trying to help people, so the project is pulled by people with specific problems.
Braccetto Project
- Facilitating teams of teams around a shared task space.
- High demand for a finished product.
- Uses shared drawing areas and video conferencing.
Secure transfer of medical records and the TED Project
- Secure Transfer of medical records using TPM chips (perhaps a world first?).
- Augmented a privacy preserving system with TPMs.
- Problem with attestation. Software frequently changes due to upgrades and running different applications.
Trusted Services Project
- Trust between collaborators cannot be tied to a particular machine. People need to be able to use their laptops or tablet PCs.
- User defined e-contract based system. User defined configuration of SOAs. Implementation all done on a custom network.
Dynamic Collaboration Service
- Hierarchy of Virtual Service Operators.
- Each VSO can offer distinct storage, operations, (and so on) facilities and features.
- Users can exploit many different services at once, if one alone can't meet all their requirements.
- Resource collaborations are necessary. It needs to be possible to set these up quickly, and for a short period of time. Perhaps a few days, or even hours. At the moment this is very difficult. Need to automate as much as possible.
eContract life cycle
- Creation, negotiation, validation (check a contract IS possible at every level), instantiation (contract is maintained, need to handle exceptions, people joining and leaving, people doing bad things), termination.
- Can't get away from policies. Need to be decided ahead of time, then people need to be held accountable to them. People agree on these contracts and THEN put resources out there. Nobody else can even see the resources.
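The life cycle above reads naturally as a small state machine. The sketch below is one possible encoding: the stage names come from the notes, but the allowed transitions (including falling back from validation to negotiation, and terminating early) are assumptions, not part of the presented design.

```python
from enum import Enum


class Stage(Enum):
    CREATION = "creation"
    NEGOTIATION = "negotiation"
    VALIDATION = "validation"
    INSTANTIATION = "instantiation"
    TERMINATION = "termination"


# Assumed transitions: forward through the life cycle, back to negotiation
# if validation fails, and early termination from any live stage.
ALLOWED = {
    Stage.CREATION: {Stage.NEGOTIATION, Stage.TERMINATION},
    Stage.NEGOTIATION: {Stage.VALIDATION, Stage.TERMINATION},
    Stage.VALIDATION: {Stage.INSTANTIATION, Stage.NEGOTIATION, Stage.TERMINATION},
    Stage.INSTANTIATION: {Stage.TERMINATION},
    Stage.TERMINATION: set(),
}


class EContract:
    def __init__(self) -> None:
        self.stage = Stage.CREATION
        self.history = [Stage.CREATION]

    def advance(self, target: Stage) -> None:
        """Move to `target`, refusing transitions the life cycle does not allow."""
        if target not in ALLOWED[self.stage]:
            raise ValueError(f"cannot go from {self.stage.value} to {target.value}")
        self.stage = target
        self.history.append(target)


contract = EContract()
for step in (Stage.NEGOTIATION, Stage.VALIDATION, Stage.INSTANTIATION):
    contract.advance(step)
print([s.value for s in contract.history])
```

Keeping the history alongside the current stage gives a simple record for holding parties accountable to what was agreed.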
Future Work
- Formal semantics of eContract
- Continuation of protocol work
- Trust Extension Device – commercialising.
TED And the Attestation Solution
- Instead of having repeated failures due to changing environments, have a tailored, known environment.
- Boot from USB stick with its own OS and applications. It runs a Virtual Machine (QEMU) and has a software TPM.
- First PrivacyCA in the world.
- Synchronous trusted email clients, both parties need to be running this USB stick.
- Discussion of putting the TPM on a USB stick, apparently some manufacturers are doing this.
- Not a perfect model, though, as the CPU and BIOS still need to be trusted.
- A lot of interest in Gumstix USB devices.
- Interest in vTPMs. Work is being done at IBM and HP to develop these: Usenix paper on vTPMs
Question: Doesn't a USB TPM just encourage hardware attacks? Yes, but this does at least remove one class of attack. This is ultimately an unsolvable problem. Furthermore, the owner of these devices should want them to be used in a trustworthy way.
Provenance and Security (James Cheney)
Definitions and Motivation
- Provenance: a record of origin. Evidence of authenticity or quality. Certification.
- Provenance is valuable because it's hard to do, and assigns blame or responsibility. Necessary in a number of scenarios.
- With paper, you have a paper trail and you can judge a book by its cover! With data this is not the case. It's very easy to forge or alter data.
- Lab scientists produce results and put these into databases. These are then reused and queried to make new theories. However, what if one record was found to be invalid? This would potentially alter a number of papers based on this work, and has a recursive impact. Work based on databases is currently unrepeatable!
Databases
- Many databases are curated and have someone monitoring them the whole time. This is very expensive because it requires a skilled worker.
- Automated databases are considered less reliable.
- NIH funds (bare minimum estimate) $15m per year.
- Database records are copied from other database records. Some curators make different decisions about what to copy and what to record.
- Provenance is done manually, perhaps with a link or note about where data came from. This is labour intensive and fragile. A lack of interoperability makes this difficult to automate.
Solutions in the Databases World
- "Where Provenance" shows where each tuple was copied from. This deals only with queries. However, most curation deals with updates so this is not very thorough.
- "Copy-paste Provenance" covers insert-copy-delete operations. It has its own provenance link semantics. Provenance trees for each operation. Links between trees record a complete history of how a database has changed over time. This can be stored and queried very efficiently.
- "Why Provenance" shows the set of inputs which contributed to the final output. See also "Lineage".
- "How Provenance" gives an expression showing how the tuple was obtained from input. Provenance records are then terms and can be combined.
- Very little understanding of the relationships between provenance models. No formal understanding of where to stop with provenance.
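A simplified illustration of where- and why-provenance for a join query. The relations, identifiers and the dictionary layout are made up for the example, and the formal definitions in the database literature are richer than this.

```python
# Hypothetical input relations; each tuple has an identifier.
GENES = {"g1": ("geneA", "chr1"), "g2": ("geneB", "chr2")}
PAPERS = {"p1": ("geneA", "paper-X"), "p2": ("geneA", "paper-Y")}


def join_with_provenance():
    """Join GENES and PAPERS on gene name, annotating each output row.

    where: for each output field, the input tuple it was copied from.
    why:   the set of input tuples that together produced the row.
    """
    results = []
    for gid, (gene, chrom) in GENES.items():
        for pid, (paper_gene, paper) in PAPERS.items():
            if gene == paper_gene:
                results.append(
                    {
                        "row": (gene, chrom, paper),
                        "where": (gid, gid, pid),
                        "why": frozenset({gid, pid}),
                    }
                )
    return results


for result in join_with_provenance():
    print(result["row"], result["where"], sorted(result["why"]))
```

If tuple `g1` were later found to be invalid, the `why` sets identify exactly which output rows depended on it.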
Workflows
- Computations are packaged into workflows.
- The workflow engine executes the program. This can happen in many different orders.
- If you want to repeat the execution, or find out why something went wrong, provenance tracking systems should know what happened. The dependencies should be shown. This is useful for finding bugs.
- Little understanding of the provenance policies and specifications that these systems should be enforcing. What properties are desirable?
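As a sketch of what recording workflow dependencies might look like, the example below runs a hypothetical workflow in a valid order, logs each task's direct inputs, and answers "what could have influenced this result?". The task names are invented, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
WORKFLOW = {
    "fetch": set(),
    "clean": {"fetch"},
    "align": {"clean"},
    "stats": {"clean"},
    "report": {"align", "stats"},
}


def run_with_provenance(workflow):
    """Visit tasks in one valid order and log what each one depended on."""
    log = []
    for task in TopologicalSorter(workflow).static_order():
        log.append({"task": task, "inputs": sorted(workflow[task])})
    return log


def upstream(workflow, task):
    """All tasks that could have influenced `task` (transitive dependencies)."""
    seen = set()
    stack = list(workflow[task])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(workflow[current])
    return seen


for entry in run_with_provenance(WORKFLOW):
    print(entry)
print(sorted(upstream(WORKFLOW, "report")))
```

Because several execution orders are valid, the log records the order actually taken, which is the information needed to repeat a run or trace a bad result.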
Security and Trust
- In science, users are assumed to be cooperative. Maybe this is unwarranted? Human error, attacks, fraud, auditing.
- All of the earlier work relies upon authentication and logging.
- Information Flow Security is perhaps the inverse problem: non-interference.
- There is a common problem of trust, quality and data integrity. We need mechanisms for solving these problems.
Information Flow security
- Denning & Denning
- Sabelfeld & Myers
- Prevent disclosure via covert channels.
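The core of lattice-based information flow control can be sketched in a few lines: data may only flow to a destination at the same or a higher level, and a computed value takes the highest label of its inputs. The three level names are assumptions for the example, and a check like this does nothing about covert channels.

```python
# Hypothetical security levels, ordered from least to most restricted.
LEVELS = {"public": 0, "internal": 1, "confidential": 2}


def can_flow(source: str, destination: str) -> bool:
    """Information may flow only to an equal or more restricted level."""
    return LEVELS[source] <= LEVELS[destination]


def join(*labels: str) -> str:
    """Label for a value computed from several inputs: the most restricted one."""
    return max(labels, key=LEVELS.__getitem__)


result_label = join("public", "confidential")
print(result_label)
print(can_flow(result_label, "public"))
```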
Security Auditing
- Sarbanes-Oxley regulations about keeping/destroying data.
- Auditing/record keeping problem is very similar to provenance.
- Recent work by Vaughan et al., Evidence-based Audit, CSF 2008: recording proofs in a trusted way.
- Some aspects of provenance are a security problem.
Provenance Theme:
- URL: Principles of Provenance
- There will be a security-oriented workshop in Q4 2008 or Q1 2009.
Question: Do people use write-only storage?
Question: in databases, provenance is automated because you already have some rules and structure to data. More free-form data would require a semantic web type idea. What about for distributed databases? In a grid context some of the nice links within the databases are broken, so it's harder, but perhaps still easier than a complete free-for-all. We need to get good solutions for the simple case first.
Question: Is it worth trying to understand the boundary between security and provenance? Do you need non-repudiation? Do you need to draw a clear line? Part of the purpose of the theme is to bring the disciplines together where appropriate. Plenty of payback here. Wikipedia, for example, has quite weak data provenance. If security people learn more about provenance we could do something better?
Towards a Trusted Grid Architecture (Andy Cooper)
Motivation and Background
- Presenting a solution to the malicious host problem in grid and (potentially) cloud computing
- The general situation involves a computer which is remote to the user, but the user needs to trust it.
- Users need to be sure that system administrators are not watching the data, and that all the right patches have been applied
- Trust asymmetry. Need to protect the user against the OS as well as the OS against the user.
- Middleware problem. Middleware is common on many systems and is absolutely huge. Globus is roughly the same size as the Linux Kernel. These are the same properties which have made the OS the target of attacks previously.
- Globus security layer is the most complex part of it and is therefore likely to contain vulnerabilities!
- Complexity undermines the security "features".
- 23 vulnerabilities listed in version 4. Growing number of exploits due to increased value of data and increased features. Expect an explosion of vulnerabilities.
- Dangers of delegation. You must delegate your credentials to lots of systems. Vulnerability in ANY of these systems destroys all the mechanisms.
Proxy credentials to solve the problem?
- Not a great solution
- Giving away a password for even 12 hours is not ideal.
- Can't reduce the expiry date of the certificate for usability reasons.
- Attackers can repeatedly steal the credential, so expiry dates are just an inconvenience.
- Issued credentials cannot be revoked.
Existing Work
- The current state of the art adds new Trusted Computing services to existing middleware.
- Enhancing Grid Security Using Trusted Virtualization
- There are a number of problems with this approach.
- Problem 1: Interoperability of grid middleware. The system depends on everyone running this security stack. Upgrading an entire grid is probably infeasible.
- Problem 2: attestation of unreliable, insecure software is probably pointless. Furthermore, due to frequent updates it is hard to derive meaning from measurement changes.
- Fundamental problem. You know you are misusing attestation when you attest anything complex.
Suggested Solution: The Job Security Manager
- No complex middleware stack needed.
- The user provides the middleware alongside the stack. All the server provides is a VMM.
- Each grid job has two VMs, one for the job itself and one for the security manager. The two VMs are coupled together.
- Security manager has a secure storage and attestation manager.
(Diagram. See Presentation)
Question: Client is now sending a lot more data. Is this a concern? What about performance overheads of two VMs? Performance of the secure storage device will be fine, as capabilities will be in hard drives rather than software.
Question: Two VMs are created dynamically. Does this add complexity to the process? Does the administrator need access? Administrators should not have access to all the memory on the machine. Xen is currently designed badly to allow logging to the hypervisor.
Question: Why not put this secure storage/attestation stuff in the privileged domain? This implies more work for administrators to upgrade all grid nodes.
Solving The Middleware Problem
- We still have a problem, the grid middleware must still be trusted, as it downloads the data and holds credentials.
- Avoid this by encrypting the grid job.
- Middleware downloads encrypted data, but it is never given the key to decrypt it.
- The security manager gets given the key instead.
- Advantage: this works with any grid middleware stack. The grid middleware just downloads binary data and runs a Virtual Machine.
- The Job Security Manager runs, then contacts the key management service, attests, and gets the key from the service for the data.
- Your home institution keeps hold of the key management service. Key exchange is performed over a secure network channel.
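The key-release step can be sketched as below. This is a heavily simplified model, not the presented protocol: real attestation relies on TPM-signed measurements sent over a secure channel, whereas here a plain SHA-256 hash of the security manager image stands in for the measurement. The class and method names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional, Tuple


def measure(image: bytes) -> str:
    """Stand-in for a platform measurement: a hash of the software image."""
    return hashlib.sha256(image).hexdigest()


class KeyManagementService:
    """Runs at the user's home institution and decides when to release keys."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Tuple[str, bytes]] = {}

    def register_job(self, job_id: str, trusted_manager_image: bytes) -> bytes:
        """Create a job key and remember which security manager may receive it."""
        key = secrets.token_bytes(32)
        self._jobs[job_id] = (measure(trusted_manager_image), key)
        return key

    def release_key(self, job_id: str, reported_measurement: str) -> Optional[bytes]:
        """Release the key only if the reported measurement matches."""
        expected, key = self._jobs[job_id]
        if hmac.compare_digest(expected, reported_measurement):
            return key
        return None


kms = KeyManagementService()
job_key = kms.register_job("job-1", b"security-manager-v1")
print(kms.release_key("job-1", measure(b"security-manager-v1")) == job_key)
print(kms.release_key("job-1", measure(b"modified-manager")) is None)
```

The grid middleware never appears in this exchange: it only moves encrypted bytes, which is why the approach does not depend on a particular middleware stack.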
Question: Can the administrator know anything about the running job? Possibly, but this might add complexity and spoil the system.
Digital Rights Management
- Data owner has control over the grid.
- Distributor wants to enforce mandatory access control.
- Trusted "Sub grid" solution. Grid services cannot outsource grid jobs to other VOs, as your KMS releases the keys only when you want it to.
- This effectively gives you control of what software is being run on the grid. DRM is turned around so that users maintain access to their data
- A solution was presented for getting secure data back out. See slides.
- A record can be kept of what software was being used at each grid node, rather than controlling it absolutely.
Question: How many keys are used? Left open for the implementer. Either simplify and have only one key, or use multiple keys and prevent break-once-break-everywhere attack.
Need a trusted Key Management Service. There is no point attesting it, as you have already given it your key. A bad key management service will (hopefully) never be given your keys. This is not defined too strictly so that you have the ability to implement your own KMS.
Comment: Some people have too much data to encrypt. Will hardware encryption make this OK again? Perhaps. In this system you can encrypt only small bits of data if you like. Freedom to do this. Performance might be an issue, but some people run grid jobs of size 4GB, so one Job Security Manager is not a big deal.
Towards a Secure, Tamper-Proof Grid Platform. In Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06), pages 373-380. IEEE Computer Society, May 2006.
Secure Logging In a Grid Environment (Jun Ho Huh)
Logging facilities are useful for facilitating communication and recording diagnostic audit trails. They can be used to detect security violations.
Abstract view of the grid (See slides).
Problem
- There are many scenarios where log data is highly sensitive.
- Healthcare Grids. Logs themselves may contain private information, and therefore are privileged.
- Example: Malicious researcher tries to discover personal data through cumulative data accesses over time. This can be prevented by keeping a history. This is very difficult in a grid scenario.
- Take a distributed audit approach. However, it is difficult to negotiate trust on the grid.
- Example: Monitoring SLAs. Client complains a certain service violates the SLA. Service denies receiving any requests. The client has no way of finding out the truth.
- Service provider wishes to produce a trustworthy report on a log request.
- Threats: the service provider could insert arbitrary log records, and the client could change logs to make false accusations.
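For comparison, one common way to make inserted or altered log records detectable is a hash chain, where each entry commits to the one before it. This is a different technique from the one presented in the talk, which relies on Trusted Computing; the sketch is included only to make the threat concrete. The record fields are invented.

```python
import hashlib
import json

GENESIS = "0" * 64  # fixed starting value for the chain


def _digest(prev_hash: str, record: dict) -> str:
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(log: list, record: dict) -> None:
    """Add a record whose hash covers the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "hash": _digest(prev, record)})


def verify(log: list) -> bool:
    """Recompute the chain; any edited, inserted or removed entry breaks it."""
    prev = GENESIS
    for entry in log:
        if entry["hash"] != _digest(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


log = []
append(log, {"user": "client-1", "action": "request", "service": "svc-a"})
append(log, {"user": "svc-a", "action": "response", "status": "ok"})
print(verify(log))
log[0]["record"]["action"] = "nothing"  # tamper with an earlier record
print(verify(log))
```

A chain alone does not stop whoever holds the entire log from rebuilding it, so the latest hash has to be held by another party or protected in hardware.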
Existing Solutions:
- Very little emphasis on how the logs are generated and stored in a secure manner.
- No verification of the creation or storage. No point analysing untrustworthy logs.
- Very little work on protecting the confidentiality of log data
- There is a need for interoperability.
Requirements:
- Authorisation Policy Management for logs. Distributed access requests need to be controlled and policies applied.
- Trustworthy reconciliation service for logs. Logs need to be safeguarded from compromise at a remote host.
Solution:
- Uses remote sealing and remote attestation from Trusted Computing.
- Virtualized environment for strong isolation. This limits the effects of any malicious code.
- Architecture diagram (see slides). A small number of back-end domains which have access to device drivers.
- Log access VM has two trusted services – policy management and log migration. Users receive attestation along with logs. Per-user log access VM.
Questions: the source needs to know that ALL the logs are always being received. Not receiving the logs will imply a massive failure. Big issue is knowing that all log records are there.
Question: Are logs kept centrally? This would make sense, but at the moment it is all ad-hoc collaborations.
Question: Different log formats. Are they dealt with? Integrating with existing application logging services. Multiple trust levels of the logs? What about things that don't make I/O Requests? Failed requests, for example. Still need to keep less trustworthy logs.
Question: We need to log something else, the creation of certificates. We need to trust someone who we have delegated the ability to create certificates to. If something does not leave your virtual machine, is it an event? Could code be instrumented to make events when a certificate is created? File creation is logged. But more information needed. We want to know how the certificate is created, what the attributes are, etc. Similar to a journalling file system or DBMS.
Question: Is there too much data here? There is previous work on ordering system calls and looking at patterns. However, spotting suspicious behaviour is difficult. Might be interesting to see the results.
Applying Trusted Computing To A Workflow System (Po-Wah Yau)
Motivation:
- Trying to achieve automated synergy from multiple tasks.
- Complicated workflows can be created, including acyclic graphs, loops, etc. Once created, we can reuse these cleverly created workflows.
Two types of workflow:
- Low level, physical resource workflows
- High Level, abstract, user-created workflow. Examples in Pegasus, P-Grade, Taverna, Gridworkflows.org.
Workflow Resource Broker (WRB).
- This maps abstract workflow tasks to physical jobs. It must handle scheduling, security, requirements and hardware issues.
- Key entity in the system
- Must be trusted
- User credentials are delegated to it
- Chooses which service provider to use.
Generic Requirements
- Compromising one grid or workflow job might not be very critical.
- Exposing an entire workflow/sub-workflow might be, though, so we need to keep these confidential.
- May need to keep the location of the resource provider that is running the job confidential.
- Integrity is very important, perhaps more than confidentiality. Incorrect results early on can result in error propagation. This might cause wasted resources and loss of reputation if grid resources are used for incorrect work.
TPM Keys
- Migratable
- Non-migratable
- Certifiable migratable (need a TTP)
Suggested Solution
- n.b. some notes missing. See slides for details.
- Assume Resource Broker (WRB) has a TPM. Some means of knowing that the WRB can be trusted. Users may have their own means of doing this.
- Assume that the user has some way of describing and translating high level security requirements.
- Each job is associated with a symmetric key created by the WRB. Private key for each job too, held in a TPM.
- Sealed key approach, similar to Lohr et al.
- Each node on a workflow attests the previous node (for integrity).
- Each node is given the public key for the next node, and encrypts data with it before sending. This key can only be accessed by the next node if it is in the right configuration.
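The sealed key idea above can be modelled in a few lines. This is a toy simulation under stated assumptions, not the scheme from the slides and not a real TPM interface: the `ToyTPM` class, the component names and the use of SHA-256 are all hypothetical, and a real TPM would keep the sealed key encrypted under a storage key rather than in a dictionary. The sketch only shows the core property: a key bound to an expected platform configuration is released only when the measured configuration matches.

```python
import hashlib
import hmac
import os


def extend(pcr, measurement):
    """TPM-style extend: new PCR = H(old PCR || H(measurement))."""
    return hashlib.sha256(pcr + hashlib.sha256(measurement).digest()).digest()


def measure(components):
    """Fold an ordered list of software components into one PCR value."""
    pcr = bytes(32)  # the register starts as all zeros
    for component in components:
        pcr = extend(pcr, component)
    return pcr


class ToyTPM:
    """Stand-in for a TPM: releases a sealed key only in the expected state."""

    def __init__(self, components):
        self._pcr = measure(components)  # state of this (simulated) platform
        self._sealed = {}

    def seal(self, name, key, expected_pcr):
        """Bind a key to the configuration in which it may be released."""
        self._sealed[name] = (expected_pcr, key)

    def unseal(self, name):
        """Return the key only if the current configuration matches."""
        expected_pcr, key = self._sealed[name]
        if not hmac.compare_digest(expected_pcr, self._pcr):
            raise PermissionError("platform configuration does not match")
        return key


if __name__ == "__main__":
    trusted = [b"bootloader-v1", b"kernel-v1", b"job-runner-v1"]
    job_key = os.urandom(32)  # hypothetical per-job key

    good_node = ToyTPM(trusted)
    good_node.seal("job-1", job_key, measure(trusted))
    print(good_node.unseal("job-1") == job_key)

    bad_node = ToyTPM([b"bootloader-v1", b"kernel-v1", b"job-runner-modified"])
    bad_node.seal("job-1", job_key, measure(trusted))
    try:
        bad_node.unseal("job-1")
    except PermissionError as exc:
        print("refused:", exc)
```

Because the extend operation is order-sensitive and one-way, a node that loads a different component, or the same components in a different order, ends up with a different register value and cannot unseal the job key.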
Solution Properties
- Forward trusted resource selection
- Backward detection of compromised jobs.
- Efficiency through symmetric keys rather than asymmetric.
- Secure measurement log.
See website for more references: DistributedTrust.org
Question: Forward and backward attestation. How do all points in the chain know what is trustworthy? This is an assumption. The user must have some way of knowing this. Attestation is actually only performed backwards; the forward direction relies on the sealed key approach.
A catalogue service is used to maintain lists of sealed keys.
Citation: Securing Grid Workflows with Trusted Computing
Reputation Policy Based Trust (Yonatan Zetuny)
Background
- Currently there are two types of trust systems: policy based (Web services) and reputation based (P2P).
- In grid computing certificate authorities manage trust
- Existing reputation systems have limitations. They are esoteric. Metrics are embedded within the algorithm. Clients can't manipulate metrics.
- Some attempts to do reputation-based trust management: GridEigenTrust, PathTrust, PeerTrust.
Solution
- Synergistic model: Clients can weight different beliefs / trust evaluation criteria in different ways.
- Two-level synergy: the client defines both the evaluation criteria and the decision criteria.
- Extensible and "exoteric" model.
- Use fuzzy logic.
Details of the system and algorithm have been omitted from these notes. See the slides.
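As a rough illustration of client-weighted trust criteria combined with fuzzy labels, a minimal sketch follows. It is not the presenter's algorithm, which these notes do not record. The criterion names, weights, membership functions and function names are all assumptions chosen for the example.

```python
def weighted_trust(scores, weights):
    """Combine per-criterion scores in [0, 1] using client-chosen weights."""
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total


def triangular(x, left, peak, right):
    """Triangular fuzzy membership function; returns a degree in [0, 1]."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzy_labels(trust):
    """Degrees to which a trust value is 'low', 'medium' and 'high'."""
    return {
        "low": max(0.0, min(1.0, (0.5 - trust) / 0.5)),
        "medium": triangular(trust, 0.0, 0.5, 1.0),
        "high": max(0.0, min(1.0, (trust - 0.5) / 0.5)),
    }


if __name__ == "__main__":
    # Hypothetical observations about one service provider.
    scores = {"availability": 0.9, "accuracy": 0.6, "feedback": 0.3}

    # Two clients weight the same evidence differently.
    accuracy_focused = {"availability": 1, "accuracy": 3, "feedback": 1}
    uptime_focused = {"availability": 3, "accuracy": 1, "feedback": 1}

    for weights in (accuracy_focused, uptime_focused):
        trust = weighted_trust(scores, weights)
        print(round(trust, 3), fuzzy_labels(trust))
```

Keeping the weights outside the scoring function is the point of the sketch: the same evidence yields different trust values for different clients, and the continuous value remains available to anyone who wants it rather than a yes/no decision.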
Open Issues
- Grid client could be anyone – broker, monitor, scheduler, etc.
- Advantage of fuzzy logic is that it allows you to deal with uncertainties. You can add more opinions.
- It's easy to add more metrics or sources.
- Not dealing with incorrect feedback or people trying to game the system
Comment: Currently, VOs make the decisions on trust and use blacklists. Does this scale? Jobs currently report back to the VO.
Comment: Could we create an aggregated overall VO trust value?
Comment: Service providers could use such a system for monitoring, to spot when things go wrong. But the provider wants the floating point value, perhaps a graph over time, not an actual binary decision of "trusted" or "not trusted".
Post Lunch Discussion
See Andrew Martin's notes for information on the discussions and a summary of outcomes from this event.
Selected Notes
- We need more projects and case studies.
- Very difficult to unite the people with good security ideas and the people building/using systems.
- A common forum needed? There is a mailing list and website
- Interoperability is a limitation. Need to be able to seamlessly migrate to new, more secure ways of doing things. Need to be able to incorporate legacy environments.
Interesting hardware mentioned during discussions
- Virtualization
- Trusted computing, VT and TPMs.
- Gumstix
- encrypting storage devices
Virtualization
- Might solve many of the problems through stronger isolation.
- Lack of interoperability between VMs. This is going to become a big problem. Jobs are limited to running in only one kind of virtual machine. Discussion about OGF not standardising VMs. Need to be able to describe the configuration of VMs.
- People are using virtualization in other places, to create new resources very easily. Grid Ireland use it to simulate their entire grid, for testing purposes.
Spectrum of Assurance
- Persuading people who are on the "edges" of the currently-feasible assurance levels. Not the people who have needs far beyond what can be delivered.
- Give the customer the ability to do risk assessment
- Joining different VOs is tricky; grid may not be the right model. May need to add trust boundaries to persuade some organisations to climb on board.
References:
- Work from IBM Watson on virtual domains: getting virtual resources and aggregating them into Virtual Domains.
- Security Metrics
Next Steps
- Workshop on levels of assurance, or DRM?
- Get user communities with real security needs to talk to security experts.
- Security is currently considered either a hurdle or an operations issue. Not an opportunity to allow you to do new things.
- Potential collaboration with projects in Australia.
- Try to get security assurance as a key part of any project definition. Talk to funding councils? Outline draft of assurance levels?
- It would be really good to get some project proposals out of these ideas.