Abstract
Data security is a major concern in the cloud. Encrypting data protects it in storage, but it creates other challenges such as data searchability, shareability, and key management.
The EScube on Cloud pattern is formulated to store, search, and share data and documents securely in the cloud. The pattern prevents frequency analysis attacks by the cloud provider and eases key management. It also covers data verification, which ensures that the encrypted data has not been corrupted.
EScube on Cloud: pattern for Encrypted Searchable, Sharable data Storage.
This is an architectural pattern. It can be sub-categorized as a data security pattern.
A cloud store is a reliable and inexpensive data store that can be used for storing data and collaborating easily with partners across the globe. But keeping confidential, critical data (e.g. all my IT returns) in the cloud raises concerns. The concerns are of two types: 1. an external attacker can get to my critical data; 2. the cloud provider itself can read my data.
We need to encrypt our critical data to prevent access by the cloud provider and other unauthorized users. At the same time, we should be able to search this encrypted data, or else it will not be possible to locate our stored information. There should also be a way to share specific data (for example, the 2008 IT return documents) with a specific set of people (say, our auditor).
The system of forces that need to be resolved are as follows
The EScube on Cloud pattern solves the problem statement by combining cryptographic techniques with application-level logic. Although cryptography solves many of the listed security problems, it cannot solve all of them on its own. The blend of cryptographic and application solutions forms a unique architectural pattern that is effective and easy to implement.
Figure 1 gives the structure of EScube on Cloud pattern. This primarily contains 5 components:
Figure 1 – Structure of EScube on Cloud
Cloud Data Store (CDS) is the store for encrypted data or documents. CDS also contains the signature of each encrypted document. The signature and the encrypted document form a name-value pair in CDS. Access to this store is public, which means that the data can be retrieved by anyone without providing any access credentials.
Cloud Index Store (CIS) contains the search indices or tags of each document and the locator of that document on CDS. CIS is privately accessible, which means that account credentials are needed to retrieve data from it. This gives two levels of security for the index data: authentication and cryptography. CIS is explained in more detail in the ‘Building search capability’ section of Technical Implementation.
The documents and their tags are stored in different clouds to avoid a frequency analysis attack by a malicious cloud provider. If both were stored in the same cloud, the provider could map tags to documents, which is considered a cryptographic security flaw.
The Data Owner, as the name says, is the owner of the data uploaded to the cloud. The data owner manages the key store, which contains the Signature Private/Secret Key (SSK) and the Master Encryption Key (MEK). The data is first encrypted and then signed. The details of encryption are given in the ‘Securing the document’ and ‘Preparation for protected share’ sections. The data owner uses SSK to sign the encrypted data, which gives a unique signature for that encrypted data. The signature and the encrypted data are stored in CDS. The prepared index is stored in CIS, and the owner securely stores the access credentials for CIS.
When the document is required by the owner, the owner queries CIS with the appropriate ‘tag’ and gets the relevant document locators. The owner can download the document from CDS using the document locator and decrypt the data using the decryption key. Also, the owner can share the document locator and decryption key with others for accessing the document.
Data Consumer is an authorized reader of the document; this includes the data owner. The data consumer is given the document locator to locate the document in CDS and the decryption key to decrypt the encrypted document.
The encrypted document in CDS can be checked for accidental or intentional tampering through the process of Data Verification. The encrypted data is verified against its signature, which is stored along with the encrypted data in CDS. The owner’s Signature Public Key (SPK) is used for verification. Like the CDS data, the SPK is publicly available, so anyone can verify the data stored in CDS.
There are 2 trust zones defined in EScube on Cloud pattern. The Data Owner including the Key store is in the trusted zone and all the other components (CDS, CIS and Data Reader) are in the untrusted zone.
Following figure shows different regions of control.
Figure 2 – Regions of Control
Apart from these regions, owner’s Signature Public Key (SPK) stays public and can be accessed by everyone.
The technical details for enablement of this pattern are explained in the following sections:
The data or the document (D) which has to be secured needs to be encrypted using a symmetric algorithm, say AES.
Use encryption key (K):
AES (D, K) → D’
Sign the encrypted document using RSA algorithm.
Use Signature Private Key (SSK):
RSASign (D’, SSK) → ‘Signature’
Let us call the generated ‘Signature’ L.
Now the document can be stored in the cloud store (say Azure Blob Store or Amazon S3) with the location (L) as the key and the encrypted document (D’) as the value. Tampering can be detected by verifying the RSA signature with the public key (SPK).
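The sign, store and verify steps can be sketched as follows. The Python standard library has no AES or RSA implementation, so this sketch treats D’ as opaque bytes and substitutes textbook RSA (no padding) over two small, well-known Mersenne primes. Both choices are illustrative assumptions and are not secure; a real deployment would presumably use a vetted cryptographic library with a padded signature scheme. The names `toy_sign`, `toy_verify` and `cds` are hypothetical.

```python
import hashlib

# Toy key material (assumption): two known Mersenne primes, far too small
# for real use. PRIVATE_EXP stands in for SSK; (PUBLIC_EXP, N) for SPK.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
PUBLIC_EXP = 65537
PRIVATE_EXP = pow(PUBLIC_EXP, -1, (P - 1) * (Q - 1))  # needs Python 3.8+


def digest_as_int(data: bytes) -> int:
    """Hash the encrypted document and reduce the digest into the range of N."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def toy_sign(encrypted_doc: bytes) -> int:
    """Textbook RSA signature over the hash of D' (stands in for RSASign)."""
    return pow(digest_as_int(encrypted_doc), PRIVATE_EXP, N)


def toy_verify(encrypted_doc: bytes, signature: int) -> bool:
    """Anyone holding the public values can check D' against its signature."""
    return pow(signature, PUBLIC_EXP, N) == digest_as_int(encrypted_doc)


# In-memory stand-in for CDS: signature L is the key, D' is the value.
cds = {}
d_prime = b"ciphertext bytes of D1"  # placeholder for AES (D, K)
location = toy_sign(d_prime)
cds[location] = d_prime

untampered_ok = toy_verify(cds[location], location)
tampered_ok = toy_verify(cds[location] + b"x", location)
```

Because the signature doubles as the lookup key, a reader who fetches the value stored under L can immediately check that value against L itself.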
Sharing an uploaded document can be achieved just by sharing the location (L) and the encryption key (K), which is also used for decrypting. But if the same key (K) protects every document, the problem with sharing it is that the consumer will be able to decrypt all of the owner’s documents just by knowing their locations, which are public.
But if the owner uses different encryption keys (K1, K2, K3 … KN) for ‘N’ documents (D1, D2, D3 … DN) respectively, then the owner can share only ‘K3’ to give access to document D3. This gives a protected share but creates a key management problem, whose solution is discussed a little later. The following shows how different keys are used.
Use encryption key “K1”:
AES (D1, K1) → D1’
The signing procedure remains the same (RSASign (D1’, SSK) → L1).
To share document D5, the owner shares L5 and K5 with the consumer. The consumer can now get D5’ from the location L5 and decrypt it using K5 to get D5. The document can also be checked for tampering using the signature L5.
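A minimal sketch of the per-document key idea, assuming 256-bit random keys and an in-memory dictionary as the owner's working set. The names `doc_keys` and `share_package` are hypothetical, and the locator strings are placeholders for real signatures.

```python
import secrets


def new_document_key() -> bytes:
    """Generate a fresh random 256-bit key for one document."""
    return secrets.token_bytes(32)


# One independent key per document: K1..K5 for D1..D5.
doc_keys = {f"D{i}": new_document_key() for i in range(1, 6)}


def share_package(doc_id: str, locator: str) -> dict:
    """Bundle only what a consumer needs for a single document."""
    return {"locator": locator, "key": doc_keys[doc_id]}


# Sharing D5 reveals L5 and K5 and nothing about the other keys.
package = share_package("D5", "L5")
```

Since each key is generated independently, holding K5 gives the consumer no help in decrypting D1 to D4.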
Encrypted data is not searchable. So, we have to build this capability before encrypting the data/document. The general technique used for building the search capability on any data is through Indexing.
The document tokens (which are used for indexing) can be retrieved by passing the document through a ‘Tokenizer’. Different tokenizers are available to strip the tokens from different types of documents (this is the first step in any full text indexing). Let us take the tokens for document D1 as t11, t12, t13… t1M. The searchable index will be built as follows:
Keyed hash the tokens using key HK:
HMACSHA HK (t11, t12, t13… t1M) → H11, H12… H1M
Note: Hashing the tags before uploading them hides the actual tags from the cloud provider. With HMAC, which is keyed hashing, it becomes nearly impossible for the cloud provider to guess an uploaded tag without the key HK. This helps prevent information leakage.
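The keyed hashing step can be sketched with Python's standard `hmac` module. The text does not say which SHA variant is meant, so SHA-256 is an assumption here, as are the lowercase normalization and the sample tokens; `hash_tokens` is a hypothetical helper name.

```python
import hashlib
import hmac
import secrets


def hash_tokens(tokens, hk: bytes):
    """Return the hex HMAC-SHA256 of each token under the key HK."""
    return [
        # Lowercasing is an assumed normalization so that 'Education'
        # and 'education' map to the same hashed tag.
        hmac.new(hk, token.lower().encode("utf-8"), hashlib.sha256).hexdigest()
        for token in tokens
    ]


hk = secrets.token_bytes(32)  # HK, kept in the owner's key store
tokens_d1 = ["finance", "education", "2008"]  # hypothetical t11, t12, t13
hashed_d1 = hash_tokens(tokens_d1, hk)  # H11, H12, H13
```

The same token always yields the same hashed tag under the same HK, which is what makes lookup possible, while a party without HK cannot recompute the hashes to test its guesses.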
Encrypt the document location (or signature) L1 along with CDS store location information (C1) and document encryption key K1 using MEK.
Note: (C) is the actual store location, which could be something like https://azure.microsoft.com/blob/finance. It specifies the container of the store. Through this, the owner can store data in different containers or even different clouds and still use the same index store (CIS) for centralized quick search.
AES (L1 | C1 | K1, MEK) → CD1
Note: The owner can use a different symmetric algorithm or an asymmetric algorithm here instead of AES. With an asymmetric algorithm, the data upload can be delegated to someone else without sharing the master secret key. In that case, the master public key (MPK) is used for encryption instead of MEK.
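For the concatenation L1 | C1 | K1 to be split back into its parts after decryption, the fields need an unambiguous framing. The sketch below shows one possible framing, a 4-byte big-endian length prefix per field; this is an assumption and is not specified by the pattern. The encryption under MEK (or MPK) is deliberately left out, since it would be applied to the packed bytes by whichever cipher the owner chooses. `pack_fields` and `unpack_fields` are hypothetical names.

```python
import struct


def pack_fields(*fields: bytes) -> bytes:
    """Join fields, prefixing each with its length as a 4-byte big-endian int."""
    return b"".join(struct.pack(">I", len(f)) + f for f in fields)


def unpack_fields(blob: bytes) -> list:
    """Split a blob produced by pack_fields back into its fields."""
    fields, offset = [], 0
    while offset < len(blob):
        if offset + 4 > len(blob):
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack_from(">I", blob, offset)
        offset += 4
        if offset + length > len(blob):
            raise ValueError("truncated field")
        fields.append(blob[offset:offset + length])
        offset += length
    return fields


l1 = b"signature-bytes-of-D1"                       # placeholder for L1
c1 = b"https://azure.microsoft.com/blob/finance"    # C1 from the note above
k1 = bytes(range(32))                               # placeholder for K1
packed = pack_fields(l1, c1, k1)                    # plaintext of CD1
```

Length prefixes avoid the ambiguity a literal `|` separator would have, because signatures and keys are arbitrary bytes and may themselves contain that character.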
Create unique pointer P1.
Store the pairs P1 – CD1 and H1s – P1 (in different tables) in CIS. Because the per-document encryption key is packed under MEK and stored in the cloud with the index, the owner does not have to keep track of each document key, so the key management problem discussed in the previous section does not arise in this pattern.
Note: The indices (P1, H1s and CD1) and the data (L1, D1’) have to be stored in different clouds to avoid a frequency analysis attack by a malicious cloud provider. If both were stored in the same cloud, the provider could map tags to documents, which is considered a cryptographic security flaw.
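The two CIS tables described above can be sketched with in-memory dictionaries. This is a minimal sketch: a real CIS would be a cloud table store behind account credentials, the pointer format (a random hex string) is an assumption, and the strings `"H11"` and `b"CD1"` are placeholders for real hashed tags and encrypted locators. `add_to_index` is a hypothetical helper name.

```python
import secrets
from collections import defaultdict

pointer_table = {}            # pointer -> consolidated data locator (CD)
tag_table = defaultdict(set)  # hashed tag -> set of pointers


def add_to_index(hashed_tags, consolidated_locator: bytes) -> str:
    """Create a unique pointer and record both mappings for one document."""
    pointer = secrets.token_hex(16)  # random, so it reveals nothing about the document
    pointer_table[pointer] = consolidated_locator
    for hashed_tag in hashed_tags:
        tag_table[hashed_tag].add(pointer)
    return pointer


# Two documents that happen to share one hashed tag (H12).
p1 = add_to_index(["H11", "H12", "H13"], b"CD1")
p2 = add_to_index(["H12", "H22"], b"CD2")
```

Keeping the tag table and the pointer table separate means a hashed tag never sits next to a locator directly; the only link between them is the opaque pointer.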
Consolidated data will be stored as follows:
Pointer    Consolidated Data Locator
P1         CD1
P2         CD2
P3         CD3
P4         CD4
P5         CD5
Table 1 – Pointer detail table
Table 1 shows consolidated data locators for documents D1, D2, D3, D4 and D5 and their newly created respective pointers P1, P2, P3, P4 and P5.
Hashed tags with respect to document pointers are stored as follows:
Hashed Tag
H11
H12
H13
H22
Table 2 – Tag to Pointer mapping table
Table 2 shows the mapping between the hashed tags and the pointers. The meaning of Table 2 is as follows:
The following is the flow of a document search for the tag ‘education’. In the above example, the ‘education’ tag is mapped to H12 (which is the hash of t12).
Note: If asymmetric encryption is used for encrypting the document key, the owner needs MSK (master secret key) along with MPK (master public key) to decrypt it instead of MEK as mentioned above.
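The lookup part of the search flow can be sketched as follows, assuming HMAC-SHA256 for the tag hash and in-memory dictionaries standing in for the two CIS tables. Decrypting each returned locator with MEK (or MSK) and downloading D’ from CDS are not shown. The key, pointer names and table contents are illustrative placeholders, and `hashed` and `search` are hypothetical helper names.

```python
import hashlib
import hmac


def hashed(tag: str, hk: bytes) -> str:
    """Keyed hash of a search tag, computed the same way as at upload time."""
    return hmac.new(hk, tag.encode("utf-8"), hashlib.sha256).hexdigest()


def search(tag: str, hk: bytes, tag_table: dict, pointer_table: dict) -> list:
    """Return the encrypted consolidated locators of documents carrying the tag."""
    pointers = tag_table.get(hashed(tag, hk), set())
    return [pointer_table[p] for p in sorted(pointers)]


hk = b"demo-hmac-key-held-by-the-owner"  # placeholder for HK
tag_table = {
    hashed("education", hk): {"P1", "P2"},
    hashed("finance", hk): {"P1"},
}
pointer_table = {"P1": b"CD1", "P2": b"CD2"}

results = search("education", hk, tag_table, pointer_table)
```

CIS only ever sees the hashed tag in the query, so a search run without the correct HK simply finds no matching rows.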
The following table shows which persona or actor has access to which key, which serves as the proof of security for the EScube on Cloud pattern:
Keys
Actor
Data Owner
Readers
CIS
CDS
Document Encryption Key (K)
Generator, Full control
Provided by data owner
No access
Master Encryption Key (MEK)
Full control
Signature Private Key (SSK)
Signature Public Key (SPK)
Read access
HMAC Key (HK)
CIS Account
Indirect access (as this is the one providing account)
Master Secret Key (MSK) **
Master Public Key (MPK) **
Table 3 – Key access to Actor map
** Used in asymmetric algorithm
In this section let us see the collaboration of components with respect to different flows.
Following figure details the data upload flow.
Figure 3 – Data Upload process
‘Technical Implementation details’ section contains the implementation details of this.
Following figure details the search, share and consume flow.
Figure 4 – Search, Share and Consume process
Following figure details the data verification flow.
Figure 5 – Data Verification process
Following are the benefits of this pattern:
Following are the limitations of this pattern:
Description: Reference site
Azure Storage: http://download.microsoft.com/download/D/6/E/D6E0290E-8919-4672-B3F7-56001BDC6BFA/Windows%20Azure%20Blob%20-%20Dec%202008.docx
Amazon Storage: http://aws.amazon.com/s3/
Cryptographic Cloud Storage: http://research.microsoft.com/en-us/people/klauter/cryptostoragerlcps.pdf
Trusted Cloud Computing and Services Framework: http://www.freepatentsonline.com/y2010/0211782.html, http://www.freepatentsonline.com/20100211782.pdf
Secure and Private Backup Storage and Processing for Trusted Computing and Data Services: http://www.freepatentsonline.com/y2010/0318782.html, http://www.freepatentsonline.com/20100318782.pdf
AES: http://en.wikipedia.org/wiki/Advanced_Encryption_Standard
RSA Signature: http://en.wikipedia.org/wiki/RSA
Public Key Cryptography: http://en.wikipedia.org/wiki/Public-key_cryptography
HMAC: http://en.wikipedia.org/wiki/HMAC