Data security is a major concern in the cloud. Encrypting data allows it to be stored securely, but it creates other challenges such as data searchability, sharability and key management.

The EScube on Cloud pattern is formulated to store, search and share data/documents securely in the cloud. The pattern prevents frequency analysis attacks by the cloud provider and eases key management. It also covers data verification, to make sure that the encrypted data is not corrupted.


EScube on Cloud: pattern for Encrypted Searchable, Sharable data Storage.

This is an Architectural pattern. It can be sub-categorized as a Data Security pattern.

A cloud store can serve as a reliable and cheaper data store, used for storing data and collaborating easily with partners across the globe. But keeping confidential, critical data (e.g. all my IT returns) in the cloud raises concerns. The concerns are of two types: 1. an external attacker can get to my critical data, and 2. the cloud provider itself can read my data.

We need to encrypt our critical data to prevent access by the cloud provider and other unauthorized users. At the same time, we should be able to search this encrypted data, or else it will not be possible to locate our stored information. There should also be a way to share specific data (for example, 2008 IT return documents) with a specific set of people (say, our auditor).

The forces that need to be resolved are as follows:

  • Making sure the encrypted data is searchable
  • Sharing specific data with defined people just by giving them pointers, while keeping the documents/data that are not shared with them inaccessible
  • Avoiding frequency analysis attacks by the cloud provider
  • Public verification of encrypted data to make sure that it is not corrupted
  • Easy key management

The EScube on Cloud pattern solves the problem by combining cryptographic techniques with application-level design. Cryptography solves many of the listed security problems, but it cannot solve all of them on its own. The blend of crypto and application solutions builds a unique architectural pattern that is effective and easy to implement.

Figure 1 gives the structure of EScube on Cloud pattern. This primarily contains 5 components:

  1. Cloud Data store with Public Access
  2. Cloud Index store with Private Access
  3. Data Owner
  4. Data Consumer
  5. Data Verification

 

 

 

Figure 1 – Structure of EScube on Cloud

2.1.1 Cloud Data store with Public Access (CDS)

Cloud Data Store (CDS) is the store for encrypted data or documents. CDS also contains the signature of each encrypted document. The signature and the encrypted document form a Name-Value pair in CDS. Access to this store is public, which means that the data can be retrieved by anyone without providing any access credential.

2.1.2 Cloud Index store with Private Access (CIS)

Cloud Index Store (CIS) contains the search indices or tags of each document and the document locators of the document (which is on CDS). CIS is privately accessible, which means that account credentials are needed to retrieve data from it. This gives two levels of security for the data: the first through authentication and the second through cryptography. More detail on CIS is given in the ‘Building search capability’ section of Technical Implementation.

The documents and their tags are stored in different clouds to avoid frequency analysis attacks by a malicious cloud provider. If both were stored in the same cloud, the provider could map tags to documents, which is considered a cryptographic security flaw.

2.1.3 Data Owner

The data owner, as the name says, is the owner of the data uploaded to the cloud. The data owner manages the key store, which contains the Signature Private/Secret Key (SSK) and the Master Encryption Key (MEK). The data is first encrypted and then signed. The details of encryption are given in the ‘Securing the document’ and ‘Preparation for protected share’ sections. The data owner uses SSK to sign the encrypted data, which gives a unique signature for that encrypted data. The signature and the encrypted data are stored in CDS. The prepared index is stored in CIS, for which the access credentials are securely stored by the owner.

When the document is required by the owner, the owner queries CIS with the appropriate ‘tag’ and gets the relevant document locators. The owner can download the document from CDS using the document locator and decrypt the data using the decryption key. Also, the owner can share the document locator and decryption key with others for accessing the document.

2.1.4 Data Consumer

A Data Consumer is an authorized reader of the document; this includes the data owner. The data consumer is given the document locator to locate the document in CDS and the decryption key to decrypt the encrypted document held there.

2.1.5 Data Verification

The encrypted document in CDS can be checked for accidental or intentional tampering through the process of Data Verification. The encrypted data is verified against its signature, which is stored along with the encrypted data in CDS. The owner’s Signature Public Key (SPK) is used for verification. Like the CDS data, the SPK is publicly available, and hence anyone can verify the data stored in CDS.

2.1.6 Trust Zone

There are 2 trust zones defined in the EScube on Cloud pattern. The Data Owner, including the key store, is in the trusted zone, and all the other components (CDS, CIS and Data Reader) are in the untrusted zone.

The following figure shows the different regions of control.

 

Figure 2 – Regions of Control

  • Data Owner is in the first region of control and has complete control over all the components/boundaries.
  • Cloud Index Store (CIS) is in the second region of control. Only the Data Owner is able to access this region of control.
  • Cloud Data Store (CDS) is in the third region of control and is read accessible by all. But the data in CDS can be modified (i.e. write access) only by the Data Owner.
  • Data Reader forms the fourth region of control. It can access the CDS, request a document from the Data Owner and receive the document locators and decryption key.

Apart from these regions, the owner’s Signature Public Key (SPK) stays public and can be accessed by everyone.

2.3 Technical Implementation details

The technical details for enablement of this pattern are explained in the following sections:

  1. Securing the document
  2. Preparation for protected share
  3. Building search capability
  4. Document search
  5. Key access control
  6. Key versioning

2.3.1 Securing the document

The data or the document (D) that has to be secured needs to be encrypted using a symmetric algorithm, say AES.

Use encryption key (K):

AES (D, K) → D’

Sign the encrypted document using the RSA algorithm.

Use Signature Private Key (SSK):

RSASign (D’, SSK) → ‘Signature’

Let us call the generated ‘Signature’ L, since it also serves as the location of the document.

Now the document can be stored in the cloud store (say Azure Blob Store or Amazon S3) with the location (L) as the key and the encrypted document (D’) as the value. Tamper validation of the data can be done through RSA signature verification using the public key (SPK).

2.3.2 Preparation for protected share

Sharing the uploaded data or document can be achieved just by sharing the location (L) and the encryption key (K) (for decrypting). But if a single key (K) protects all documents, sharing it lets the consumer decrypt all the data/documents of the owner, because the locations are public.

If, instead, the owner uses different encryption keys (K1, K2, K3 … KN) for ‘N’ documents (D1, D2, D3 … DN) respectively, then the owner can share only ‘K3’ to give access to document D3. This gives a protected share but creates a key management problem, whose solution is discussed in the ‘Building search capability’ section. Following is the representation of how different keys are used.

Use encryption key ‘K1’:

AES (D1, K1) → D1’

The signing procedure remains the same (RSASign (D1’, SSK) → L1).

For sharing document D5, the owner shares L5 and K5 with the consumer. The consumer can now get D5’ from the location L5 and decrypt it using K5 to get D5. The document can also be verified for tampering using the signature L5.
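The per-document key idea can be sketched in Python as follows. This is only an illustration under assumptions: keys are 256 bits (as for AES-256), and the names `new_document_key`, `document_keys` and `share_bundle` are hypothetical and not part of the pattern. Encryption and signing are not shown; the locations are symbolic stand-ins for the signatures L1 … L5.

```python
import secrets

KEY_BYTES = 32  # assumption: 256-bit keys, as used by AES-256


def new_document_key() -> bytes:
    """Generate a fresh random symmetric key for exactly one document."""
    return secrets.token_bytes(KEY_BYTES)


# Hypothetical owner-side map: document id -> its own key (K1 .. K5).
document_keys = {f"D{i}": new_document_key() for i in range(1, 6)}

# Symbolic locations; in the pattern, Ln is the signature of Dn'.
locations = {f"D{i}": f"L{i}" for i in range(1, 6)}


def share_bundle(doc_id: str) -> tuple[str, bytes]:
    """Return only (Ln, Kn) for the single document being shared."""
    return locations[doc_id], document_keys[doc_id]


location, key = share_bundle("D5")
```

A consumer holding this bundle learns K5 only, so the other four documents stay unreadable even though their locations are public.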

2.3.3 Building search capability

Encrypted data is not searchable, so this capability has to be built before the data/document is encrypted. The general technique for building search capability on any data is indexing.

The document tokens (which are used for indexing) can be retrieved by passing the document through a ‘Tokenizer’. Different tokenizers are available to extract the tokens from different types of documents (this is the first step in any full text indexing). Let us take the tokens for document D1 as t11, t12, t13 … t1M. The searchable index is built as follows:

Keyed-hash the tokens using the key HK:

HMACSHA_HK (t11, t12, t13 … t1M) → H11, H12, H13 … H1M

Note: Hashing the tags before uploading them hides the actual tags from the cloud provider. With HMAC, which is keyed hashing, it becomes nearly impossible for the cloud provider to guess an uploaded tag without HK. This helps in preventing information leakage.
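The keyed hashing step can be sketched with the Python standard library. This is a minimal illustration, assuming HMAC-SHA256 as the concrete ‘HMACSHA’ variant and a naive word tokenizer; the names `tokenize` and `hash_tags` and the sample key are hypothetical and not prescribed by the pattern.

```python
import hashlib
import hmac
import re


def tokenize(text: str) -> list[str]:
    """Naive tokenizer (assumption): unique lowercase words in first-seen order."""
    seen: dict[str, None] = {}
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        seen.setdefault(word, None)
    return list(seen)


def hash_tags(tokens: list[str], hk: bytes) -> list[str]:
    """Keyed-hash every token with HK; only these digests are sent to CIS."""
    return [
        hmac.new(hk, token.encode("utf-8"), hashlib.sha256).hexdigest()
        for token in tokens
    ]


# Placeholder key for illustration only; a real HK would be random and secret.
HK = b"example-hmac-key-held-by-owner"

tokens = tokenize("Finance report: education finance")
hashed = hash_tags(tokens, HK)
```

The same token always produces the same digest under the same HK, which is what makes the index searchable, while a different key produces unrelated digests.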

Encrypt the document location (or signature) L1, along with the CDS store location information (C1) and the document encryption key K1, using MEK.

Note: (C) is the actual store location, which could be something like https://azure.microsoft.com/blob/finance. This specifies the container of the store. Through this, the owner can store data in different containers or even different clouds and use the same index store CIS for centralized quick search.

AES (L1 | C1 | K1, MEK) → CD1

Note: The owner can use a different symmetric or asymmetric algorithm here instead of AES. With an asymmetric algorithm, the data upload can be delegated to someone else without the need to share the master secret key. In that case, the master public key (MPK) is used for encryption instead of MEK.
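Before encryption, L1, C1 and K1 have to be combined in a way that lets them be split apart unambiguously after decryption. The pattern only writes ‘L1 | C1 | K1’ and does not prescribe an encoding, so the following Python sketch assumes a simple length-prefixed layout. The names `pack_locator` and `unpack_locator` are hypothetical, and the encryption with MEK (or MPK) is deliberately left out.

```python
import struct


def pack_locator(signature: bytes, store_url: str, doc_key: bytes) -> bytes:
    """Serialize (L, C, K) as three fields, each with a 4-byte length prefix."""
    fields = [signature, store_url.encode("utf-8"), doc_key]
    return b"".join(struct.pack(">I", len(f)) + f for f in fields)


def unpack_locator(blob: bytes) -> tuple[bytes, str, bytes]:
    """Inverse of pack_locator; raises ValueError on malformed input."""
    fields = []
    offset = 0
    for _ in range(3):
        if offset + 4 > len(blob):
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack_from(">I", blob, offset)
        offset += 4
        if offset + length > len(blob):
            raise ValueError("truncated field")
        fields.append(blob[offset:offset + length])
        offset += length
    if offset != len(blob):
        raise ValueError("trailing bytes")
    signature, store, doc_key = fields
    return signature, store.decode("utf-8"), doc_key


# Sample values are placeholders; the result would next be encrypted to give CD1.
plaintext = pack_locator(
    b"\x01\x02sig", "https://azure.microsoft.com/blob/finance", b"k" * 32
)
restored = unpack_locator(plaintext)
```

Length prefixes are used instead of a literal ‘|’ separator because signatures and keys are arbitrary bytes and may themselves contain that character.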

Create a unique pointer P1.

Store P1 – CD1 and H1s – P1 (in different tables) in CIS. Since the per-document encryption key is packed using MEK and stored in the cloud with the index, the key management problem discussed in the previous section is not an issue in this pattern.

Note: The indices (P1, H1s and CD1) and the data (L1, D1’) have to be stored in different clouds to avoid frequency analysis attacks by a malicious cloud provider. If both were stored in the same cloud, the provider could map tags to documents, which is considered a cryptographic security flaw.

Consolidated data will be stored as follows:

Pointers    Consolidated Data Locator
P1          CD1
P2          CD2
P3          CD3
P4          CD4
P5          CD5

Table 1 – Pointer detail table

Table 1 shows consolidated data locators for documents D1, D2, D3, D4 and D5 and their newly created respective pointers P1, P2, P3, P4 and P5.

Hashed tags with respect to document pointers are stored as follows:

Hashed Tag    Pointers
H11           P1, P5
H12           P1, P3
H13           P1, P2, P4, P5
H22           P2, P3, P4

Table 2 – Tag to Pointer mapping table

Table 2 shows the mapping between the hashed tags and the pointers. The meaning of Table 2 is as follows:

  • All Ps are mapped to documents Ds, i.e. P1 is mapped to D1, P2 is mapped to D2, etc.
  • P1 is listed against H11, H12 and H13. That means document D1 has hashed tags H11, H12 and H13.
  • P2 is listed against H13 and H22. This means document D2 has 2 tags – first hashed tag of D2 (H21) is equivalent to 3rd hashed tag of D1 (H13) and second hashed tag of D2 is H22.
    • For example, let the tags for document D1 be t11 = finance, t12 = education and t13 = report
    • Now document D2 has 2 tags t21 = report, t22 = status
    • Since t13 and t21 are same, generated hashes H13 and H21 will also be same
    • To save space and to have faster search, P2 is added under H13 itself & a separate row for H21 is not created.
    • This model will give compressed mapping index
  • Similarly, other document pointers (till D5) are updated in this table.
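The two CIS tables and the shared-row behaviour described above can be sketched in Python with in-memory dictionaries. This is an assumption-laden illustration: real CIS tables would live in a cloud store, the function name `add_document` is hypothetical, and symbolic strings such as "H13" stand in for real HMAC digests.

```python
from collections import defaultdict

# Pointer detail table: pointer -> consolidated data locator (as in Table 1).
pointer_table = {}
# Tag-to-pointer table: one row per distinct hashed tag (as in Table 2).
tag_table = defaultdict(list)


def add_document(pointer: str, locator: str, hashed_tags: list[str]) -> None:
    """Register one document in both CIS tables."""
    pointer_table[pointer] = locator
    for hashed in hashed_tags:
        # An existing row is reused, so equal hashes never create a second row.
        if pointer not in tag_table[hashed]:
            tag_table[hashed].append(pointer)


# D1 has tags finance, education, report; D2 has report, status.
# H21 equals H13 because both are the hash of "report".
add_document("P1", "CD1", ["H11", "H12", "H13"])
add_document("P2", "CD2", ["H13", "H22"])
```

After these two calls the row for H13 lists both P1 and P2, and no separate H21 row exists, which is the compressed mapping described in the list above.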

2.3.4 Document search

Following is the flow of a document search for the tag ‘education’. As per the above example, the ‘education’ tag is mapped to H12 (which is the hash of t12).

  • HMAC ‘education’ using HK, which gives H12
  • Get all the document pointers for H12 from CIS, which returns P1 and P3
  • Get the document locators for P1 and P3, which gives CD1 and CD3
  • AES decrypt CD1 and CD3 using MEK to get L1, C1, K1 and L3, C3, K3
  • Download the documents D1’ and D3’ from the locations C1 and C3, where they are stored under the names L1 and L3
  • AES decrypt D1’ using K1 to get D1 and D3’ using K3 to get D3

Note: If asymmetric encryption is used for encrypting the document key, the owner needs MSK (master secret key), the counterpart of MPK (master public key), to decrypt it instead of MEK as mentioned above.
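The index side of this search flow can be sketched in Python as follows. It assumes HMAC-SHA256 for the keyed hash and plain dictionaries in place of the CIS tables. Because the Python standard library has no AES, decryption of the consolidated locator is passed in as a callable; in the demo it is a lookup of pre-decrypted values, which is a placeholder and not real decryption. The function name `search` and all sample values are hypothetical.

```python
import hashlib
import hmac
from typing import Callable


def search(
    tag: str,
    hk: bytes,
    tag_table: dict[str, list[str]],
    pointer_table: dict[str, bytes],
    decrypt_locator: Callable[[bytes], tuple[str, str, bytes]],
) -> list[tuple[str, str, bytes]]:
    """Return (L, C, K) for every document indexed under the given tag."""
    hashed = hmac.new(hk, tag.encode("utf-8"), hashlib.sha256).hexdigest()
    pointers = tag_table.get(hashed, [])             # hashed tag -> P1, P3
    locators = [pointer_table[p] for p in pointers]  # Pn -> CDn
    return [decrypt_locator(cd) for cd in locators]  # CDn -> (Ln, Cn, Kn)


HK = b"example-hmac-key"  # placeholder key for illustration only
h_education = hmac.new(HK, b"education", hashlib.sha256).hexdigest()
tag_table = {h_education: ["P1", "P3"]}
pointer_table = {"P1": b"CD1", "P3": b"CD3"}

# Placeholder for "AES decrypt using MEK": a lookup of pre-decrypted values.
decrypted = {b"CD1": ("L1", "C1", b"K1"), b"CD3": ("L3", "C3", b"K3")}

results = search("education", HK, tag_table, pointer_table, decrypted.__getitem__)
```

Each returned triple tells the owner where to download the encrypted document (store C, name L) and which key K decrypts it; those last two steps are outside this sketch.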

2.3.5 Key Access Control

The following table gives the access that the different personas or actors have to the different keys, which serves as the proof of security for the EScube on Cloud pattern:

 

Keys \ Actor                  Data Owner               Readers                 CIS                CDS
Document Encryption Key (K)   Generator, Full control  Provided by data owner  No access          No access
Master Encryption Key (MEK)   Full control             No access               No access          No access
Signature Private Key (SSK)   Full control             No access               No access          No access
Signature Public Key (SPK)    Full control             Read access             Read access        Read access
HMAC Key (HK)                 Full control             No access               No access          No access
CIS Account                   Full control             No access               Indirect access *  No access
Master Secret Key (MSK) **    Full control             No access               No access          No access
Master Public Key (MPK) **    Full control             Read access             Read access        Read access

Table 3 – Key access to Actor map

* As this is the one providing the account

** Used in asymmetric algorithm

In this section let us see the collaboration of components with respect to different flows.

2.4.1 Data Upload

The following figure details the data upload flow.


Figure 3 – Data Upload process

2.4.1.1 Data storage

  • Generate a new symmetric encryption key (KX) for the chosen algorithm (say AES).
  • Get the input data/document and encrypt it using the key ‘KX’.
  • Sign the encrypted data using the owner’s signature private key (SSK).
  • Store the signature and the encrypted data as a ‘Name-Value’ pair in the Cloud Data Store (CDS).
  • Extract the tags from the document using any tokenizer, or add them manually.
  • Hash the extracted tags using HMAC with the key HK.
  • Encrypt the document encryption key with MEK using any algorithm (say AES).
  • Build the consolidated data from the signature of the encrypted document, the encrypted document encryption key and the CDS store information.
  • Store the hashed tags and the consolidated information in the Cloud Index Store (CIS).

2.4.1.2 Index storage

The ‘Technical Implementation details’ section contains the implementation details of this.

2.4.2 Search, Share and Consume

The following figure details the search, share and consume flow.


Figure 4 – Search, Share and Consume process

  • Owner searches for a tag in CIS and gets back the consolidated data with its pointer.
  • Owner parses the consolidated data and gets the encrypted document encryption key and the document location (data pointer).
  • Owner decrypts the encrypted document encryption key using MEK. The recovered key is the one used for decrypting the document.
  • Owner shares the encrypted document location (data pointer) and the decryption key with the consumer.
  • Consumer gets the encrypted document from CDS.
  • Consumer decrypts the document using the decryption key supplied by the owner and consumes the decrypted ‘Resultant Data’.

2.4.3 Data Verification

The following figure details the data verification flow.

 

 

Figure 5 – Data Verification process

  • Verification can be done by anyone.
  • First, the encrypted document and its signature need to be downloaded from CDS.
  • The encrypted document is verified against its signature using SPK (which is public) to identify whether the encrypted document has been tampered with.

Following are the benefits of this pattern:

  • Provides secure data storage in cloud
  • Ease of implementation and usage
  • Works with any cryptographic algorithm
  • Provides quick query response
  • Provides search on secured (encrypted) data
  • Provides data tamper detection
  • Provides document-level access permission by having a unique key for every document
  • Provides distributed data storage (distributed CDS) with a centralized index store (CIS)
  • Ease of key management
  • Can be applied to individual or to enterprise
  • Prevents information leakage
  • Protects from frequency analysis
  • Can be integrated with any existing application with minimal effort
  • Can be integrated with any identity systems including federated, as this pattern is outside of all the identity systems

 

Following are the limitations of this pattern:

  • For every new version of the key (MEK, MPK or MSK), the whole CIS store needs to be repopulated.
  • Once the key and document location are shared with a consumer, it is difficult to revoke the access or make the key obsolete. One workaround could be to delete that particular document from CDS and recreate it with a different key.

Following are the applications of this pattern:

  • Secured data collaboration on cloud
  • Tamper detection of secured data
  • Any application that requires secured cloud storage
