Difference between revisions of "Encryption"

Line 1: Line 1:
In today's world of research we are sending data regularly over the internet, and we are storing the data in the cloud. At any point when the internet is involved we expose our data to some risk. There are tools that keeps our data safe, and most of those tools is in one way or another related to encryption. This page is discussing the the concept of encryption, if you are looking for instruction on how to encrypt something at a specific stage of a project, then go to the specific page for that topic. For example, [[Questionnaire Programming]],
In today's world of research, researchers regularly handle data, send it over the internet and store it in the cloud. At any point, especially when the internet is involved, the data is exposed to some risk. Keeping data safe and encrypted is hence a key component of [[IRB Approval | IRB requirements]] and [[Research Ethics | research ethics]]. Encryption should take place whenever dealing with [[Personally Identifiable Information (PII) | sensitive data]] in any stage of research: from [[Sampling & Power Calculations | sampling]] and [[Primary Data Collection | data collection]] to [[Data Cleaning | cleaning]] and [[Data Analysis | analysis]]. This page discusses encryption in transit and at rest; key pairs; password management; and encryption with SurveyCTO data.  


== Read First ==
== Read First ==
* Almost all encryption depends on a [[Encryption#Public.2FPrivate_Key_Pair |public/private key pair]].
*Store encrypted data <code>[[iefolder]]</code>’s [[DataWork_Folder#Survey_Encrypted_Data | EncryptedData]] folder. Note that while <code>[[iefolder]]</code> makes the folder, it does not encrypt it.
* [[Encryption#Encryption_In_Transit | Encryption in transit]], i.e. encryption while data is sent over the internet is extremely important, but it is easy to implement so most services do this without you even noticing. There is '''never''' a case when ''not'' using encryption in transit is at all ok.
*World Bank SurveyCTO server data must be encrypted via SurveyCTO.
* [[Encryption#Encryption_At_Rest | Encryption at rest]] , i.e. encryption when data is stored on a server or computer, is also important but not as extremely important as encryption at rest. There is no as seamless implementation of encryption at rest as the files are encrypted over a longer period of time, compared to the second or so the file has to be encrypted when it is sent over the internet.
* Almost all encryption depends on a [[Encryption#Public.2FPrivate_Key_Pair |public/private key pair]], which should be securely stored in a password manager.
* [[Encryption#Encryption_In_Transit | Encryption in transit]], or encryption while data is sent over the internet, is extremely important: there is never a case when not using encryption in transit is at all ok. [[Encryption#Encryption_At_Rest | Encryption at rest]], or encryption while data is stored on a server or computer, is also important.


== Encryption in transit ==
==Encryption in Transit and at Rest==
This is by far the most important type of encryption, but luckily it is almost always taken care of by the service provider of the service we are using. Survey Solutions, [[SurveyCTO_Coding_Practices|SurveyCTO]], OneDrive etc. all take care of this. But if you are using a less well-establish service you should make sure that they use encryption in transit.


If your service provider is using ''secure'' transfer methods, you will see <code>https://</code> in the internet address instead of simply  <code>http://</code>. In reality it is more complicated than that, as page you see in the browser use <code>https://</code> but the data is sent back and from an API that is using <code>http://</code>, that can easily be spied on.
=== Encryption in Transit ===
Encryption in transit is by far the most important type of encryption. Service providers almost always take care of (i.e. Survey Solutions, [[SurveyCTO_Coding_Practices|SurveyCTO]], OneDrive). However, when using less established services, confirm that they use encryption in transit by looking at the internet address. If the service provider uses secure transfer methods, you will see <code>https://</code> in the internet address instead of simply  <code>http://</code>. Data sent from an API using <code>http://</code> can easily be spied on.


You should never send anything of importance over the internet unless the URL starts with ''HTTPS''. Data transferred over an ''HTTP'' connection can often be openly read by every server that data passes through. Those servers are controlled by governments and private companies, and hackers can easily tap in to ''HTTP'' traffic and read data, copy files, read passwords etc. ''HTTPS'' is not the only secure way to transfer data over the internet, but it is the one researchers should know of as we use it frequently. If you set up advanced protocols to send files, you should make sure that they are set up to be secure. For example, if you are using ''FTP'' you should be using ''FTPS''.  
Never send anything of importance over the internet unless the URL starts with <code>https://</code>. Data transferred over an <code>http://</code> connection can often be openly read by every server through which the data passes. Those servers are controlled by governments and private companies; hackers can easily tap into their traffic and read data, copy files, read passwords etc. <code>https://</code> is not the only secure way to transfer data over the internet, but it is the one that researchers should know of as we use it frequently. If setting up advanced protocols to send files, make sure that they are set up to be secure. For example, instead of using ''FTP,'' use ''FTPS''.  


Encryption in transit has nothing to do with a service requiring a username or password. A password-protected resource can only be requested by someone with the correct password, but that does not protect the resource from being seen by the servers handling it while it being transferred to the authorized user, once they have entered the correct password. What you need to know is ''HTTP'' is never secure enough to send data, and that if a data collection service does not encrypt your data in transit, then it should absolutely never be used to send sensitive data.
Note that encryption in transit has nothing to do with a service requiring a username or password. While a password-protected resource can only be requested by someone with the correct password, once it is in transit to the authorized user, the servers handling it can still see it if it is not encrypted.  


== Encryption at rest ==
=== Encryption at Rest ===
Encryption at rest means that, in addition to being transferred securely, the data as it is stored on the server is scrambled in such a way that it is unreadable by anyone, even if they are able to access the file directly. If the data is not encrypted at rest, then anyone who has access to that server can read that data (including, for example, the host company and your team’s administrators). If the data is encrypted at rest, then the data is impossible to read even if someone gained unauthorized access to the database or the files where the data is stored. Encryption at rest uses an authorization tool called a [[Encryption#Public.2FPrivate_Key_Pair |public/private key pair]].
Encryption at rest means that the data stored on the server is scrambled in such a way that it is unreadable by anyone even if they can access the file directly. If the data is not encrypted at rest, then anyone with access to the server can read that data (including, for example, the host company and your team’s administrators). If the data is encrypted at rest, however, the data is impossible to read even if someone gained unauthorized access to the database or the files where the data is stored. Encryption at rest uses an authorization tool called a [[Encryption#Public.2FPrivate_Key_Pair |public/private key pair]].


Encryption at rest is unfortunately not as easy to implement as encryption in transit. Both encryption use a private/public key pair to make sure that no unauthorized person gets access to the data. Since the time the data needs to be encrypted in transit is so short, the web servers never need to give us humans the key pair. Once the transfer is complete the key pair is discarded and never needed ever again. However, in encryption at rest we encrypt the data at one point in time and will access the data at some later point in the future. The computers therefore has to give the key pair to a human, and as so often, the weakest link is always the human factor. No private/public key pair is secure if the computer that generated the key saves it or is able to re-generate it, so we humans must safe keep the key pair, and if we lose it the data is lost for ever, and there is no way whatsoever to regain access to it.  
Encryption at rest is not as easy to implement as encryption in transit. Both types of encryption use a private/public key pair to ensure that no unauthorized person gets access to the data. Since the time during which the data must be encrypted in transit is so short, the web servers never need to give users the key pair. Once the transfer is complete, the key pair is discarded and never needed or used again. However, in encryption at rest, the research team encrypts the data at one point in time and will access the data at a later point in time. The computer therefore must give the key pair to a human. As is so often the case, the weakest link is always the human factor. No private/public key pair is secure if the computer that generated the key saves it or is able to re-generate it. Thus, humans must safekeep the key pair. If we lose it, the data is lost forever and there is no way whatsoever to regain access to it.  


How encryption at rest is implemented depends on which service you use. So read the instruction specific to your service. See the section on private/public key pairs in this article for instructions to securely store your key pair.
The exact way in which encryption at rest is implemented depends on the service. For more information, read the instruction specific to your service.  


== Public/Private Key Pair ==
== Public/Private Key Pair ==
Almost all encryption uses some version of private/public key pair, but in many cases you do not have to worry about them. For example, in encryption in transit temporary keys are created by the server and then thrown away once the transfer is complete. Each time you browse the internet this happens hundreds of times, but you do not need to worry about it as the server can keep the key pair for the short duration of the data transfer. However, the server cannot securely do that if the encryption should be used over time, such as in encryption at rest, so then you must keep your own private key.


The keys in the private/public key pair is either a string or a small file. There is a complex mathematical relationship between the two keys (that we never need to understand) that allows anyone with the private key to decrypt anything someone with the public key has encrypted.  
The keys used in the private/public key pair are either strings or small files. Exactly how the private/public key pair is created differs depending on the service. Complex mathematical relationships connect the two keys (we never need to understand these), allowing anyone with the private key to decrypt anything encrypted by someone with the public key. In essence, the public/private key pair system is like a vault with two doors. One door has a tiny opening where you can only put things in the vault but not take anything out. To open this door, you only need the public key. Since the door cannot be used to take anything out of the vault, it is safe for multiple people to have this key. The second door is a big door that can be used to take out all the content of the vault. To open this door, you need the private key. It is therefore very important to restrict access to the private key.


It can be described as a vault with two doors. One door has a tiny opening where you can only put things in to the vault but not take anything out. To open this door you only need the public key, and since that door cannot be used to take anything out of the vault, it is safe for multiple people to have this key. The second door is a big door that can be used to take out all the content of the vault. For this door you need the private key, and it is therefore very important that access to the private key is very restricted.
The following principles always hold for public/private key pairs:


Exactly how the private/public key pair is created differ depending on the service you are using, but the following principles should always be the same:
* If you lose your private key, there is no way of decrypting your data ever again. Your data is lost forever. To safeguard the private key, use a password manager.
 
* The key pair can only be created once. Any services that claim to be able to re-generate the private key for you are not safe: this is equivalent to them having your password. This either means that the data is not properly encrypted or that they saved a copy of your private key that gives them full access to your encrypted data.
* If you lose your private key there is absolutely no way of decrypting your data ever again. Your data is lost forever.
* The key pair can only be created once. Any services that claim to be able to re-generate the private key for you are not safe: this is equivalent to them having your password. This either means that the data is not properly encrypted or that they saved a copy of your private key which gives them full access to your encrypted data.
* It is perfectly fine if the service keeps a copy of your public key. This allows them to encrypt new data as it is coming in. Some services only give you the private key and you do not have to worry about the public key. It depends on the context.
* It is perfectly fine if the service keeps a copy of your public key. This allows them to encrypt new data as it is coming in. Some services only give you the private key and you do not have to worry about the public key. It depends on the context.
* Most services provide a convenient way of decrypting your data, but if you are not asked for the key every time the data is viewed or decrypted, then your data is not encrypted properly.
* Most services provide a convenient way of decrypting your data, but if you are not asked for the key every time the data is viewed or decrypted, then your data is not encrypted properly.


The main burden with this type of encryption is that you need to keep a lot of keys safe and well-organized (one for each encryption object); and you can never store, organize, or transfer them using unsecure methods like email or Dropbox. However, there are tools that are convenient to use for this purpose as well as store your keys themselves safely. These are called password managers.  
=== Password and Key Management ===
When using public/private key pairs, you need to keep a lot of keys safe and well-organized. You can never store, organize, or transfer them using unsecure methods like email or Dropbox. DIME Analytics recommends using password managers as a convenient and safe solution for storing private keys and passwords. While there are many password managers out there, DIME Analytics recommends [https://www.lastpass.com/business-password-manager LastPass], whose free, basic tier satisfies all requirements a research team would ever need.  


=== Storage of passwords and private keys ===
Password managers like LastPass allow you to store small text files or string fields, as private/public key pairs sometimes come in file format. LastPass can save keys as small text files or as strings for you as ''secure notes,'' which you can also use for other important information, like noting to which encryption container the private key pertains. Password managers like LastPass can also help you randomize passwords so that your passwords are impossible-to-guess, long strings. Since the password manager store and remembers this password for you, you will not need to memorize it and you can easily have a different long random string for each account you have.
There are many solutions to storage of private keys and passwords that are both safe and convenient to use. Some of them even allows you to safely share passwords and keys across multiple accounts. The solution we are recommending is a software tool called password managers.  


There are many password managers out there, and if you are already using one and are comfortable with it you can keep using it. We have identified one password manager that satisfies all requirements we think a research team will ever need and that is free to use for all its basic features. That password manager is [https://www.lastpass.com/business-password-manager LastPass]. Note that it will promote its paid tiers to you (and which you may decide to prefer), but there is a free tier that you can use that will satisfy the typical needs of a research team if you create an account for the group.
Some password managers allow you to share passwords across accounts. This is a great feature: if one person updates the password, then everyone else has the updated password in their account immediately. Password sharing is sometimes a paid feature. One way around paying is for the research team to create one account and share access to that account. However, make sure to not share this account to more people than those who really needs it.


Password managers can help you randomize passwords so that you have long strings that are impossible to guess as passwords. Since the password manager keeps this password for you, you will not need to memorize it, so it is not a problem that it is a long string that is impossible for a human to remember and you can easily have a different long random string for each account you have.
==Encryption with SurveyCTO Data==
When [[Questionnaire Programming | programming]] questionnaires in SurveyCTO, use the key generator to create a public-private key pair for your data with the name of your intended survey. These will download two files to your laptop: these are the keys. Create an encrypted survey with the public key. Then put the keys in the vault via LastPass or VeraCrypt. Note that you will need the private key for the form data download. Store encrypted data <code>[[iefolder]]</code>’s [[DataWork_Folder#Survey_Encrypted_Data | EncryptedData]] folder. Note, however, that while <code>[[iefolder]]</code> makes the folder, it does not encrypt it.


Some password managers allows you to store small text files or string fields which may be needed as private/public key pairs sometimes comes in the format of files. LastPass can save keys as small text files files or as strings for you as ''secure notes'', which you can also use for other important information, like noting which encryption container the private key is for.
In your first [[Data Cleaning | cleaning]] do-file, [[De-identification | de-identify]] the [[Personally Identifiable Information (PII) | PII data]] to create a de-identified dataset. If you’re using Veracrypt, the <code>veracrypt</code> command allows Stata to call for VeraCrypt to mount the drive. Then you have to manually enter the password anytime you run the do-file. The first cleaning file should move the non-PII version of your data to the regular data folder.  
 
Some password managers allows you to share passwords across accounts. This is great feature as if one person updates the password then everyone else has the updated password in their account immediately. Sharing passwords or sharing passwords to many accounts are sometimes a paid feature, and one way around this is that the research team creates one account and share access to that account. However, make sure to not share this account to more people than who really needs it.


== Back to Parent ==
== Back to Parent ==
Line 53: Line 50:
==Additional Research==
==Additional Research==
*DIME Analytics' slides on [https://github.com/worldbank/DIME-Resources/blob/master/onboarding-5-encryption.pdf Encryption]  
*DIME Analytics' slides on [https://github.com/worldbank/DIME-Resources/blob/master/onboarding-5-encryption.pdf Encryption]  
*Poverty Action Lab’s [https://www.povertyactionlab.org/sites/default/files/documents/Data_Security_Procedures_December.pdf Data Security Procedures for Researchers]
[[Category: Data_Management ]]
[[Category: Data_Management ]]

Revision as of 22:08, 28 May 2019

In today's world of research, researchers regularly handle data, send it over the internet and store it in the cloud. At any point, especially when the internet is involved, the data is exposed to some risk. Keeping data safe and encrypted is hence a key component of IRB requirements and research ethics. Encryption should take place whenever dealing with sensitive data in any stage of research: from sampling and data collection to cleaning and analysis. This page discusses encryption in transit and at rest; key pairs; password management; and encryption with SurveyCTO data.

Read First

  • Store encrypted data in iefolder’s EncryptedData folder. Note that while iefolder makes the folder, it does not encrypt it.
  • World Bank SurveyCTO server data must be encrypted via SurveyCTO.
  • Almost all encryption depends on a public/private key pair, which should be securely stored in a password manager.
  • Encryption in transit, or encryption while data is sent over the internet, is extremely important: there is never a case when not using encryption in transit is at all ok. Encryption at rest, or encryption while data is stored on a server or computer, is also important.

Encryption in Transit and at Rest

Encryption in Transit

Encryption in transit is by far the most important type of encryption. Service providers almost always take care of it (e.g. Survey Solutions, SurveyCTO, OneDrive). However, when using less established services, confirm that they use encryption in transit by looking at the internet address. If the service provider uses secure transfer methods, you will see https:// in the internet address instead of simply http://. Data sent from an API using http:// can easily be spied on.

Never send anything of importance over the internet unless the URL starts with https://. Data transferred over an http:// connection can often be openly read by every server through which the data passes. Those servers are controlled by governments and private companies; hackers can easily tap into their traffic and read data, copy files, read passwords etc. https:// is not the only secure way to transfer data over the internet, but it is the one that researchers should know of as we use it frequently. If setting up advanced protocols to send files, make sure that they are set up to be secure. For example, instead of using FTP, use FTPS.
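The address check described above can be sketched in a few lines of Python. This is a minimal illustration, not a full security audit: the function name and the example URLs are hypothetical, and the check only looks at the scheme at the start of the address, exactly as a person reading the address bar would.

```python
from urllib.parse import urlsplit

# Schemes treated as encrypted in transit in this sketch (an assumption;
# the list only covers the two protocols named in the text above).
SECURE_SCHEMES = {"https", "ftps"}


def uses_encryption_in_transit(url: str) -> bool:
    """Return True if the address starts with one of the secure schemes."""
    scheme = urlsplit(url).scheme.lower()
    return scheme in SECURE_SCHEMES


# Hypothetical addresses used only for illustration.
for url in [
    "https://example.org/upload",
    "http://example.org/upload",
    "ftp://example.org/data.csv",
]:
    status = "ok" if uses_encryption_in_transit(url) else "NOT encrypted in transit"
    print(f"{url}: {status}")
```

A check like this can be run on the server address of a data collection tool before any sensitive data is sent to it.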

Note that encryption in transit has nothing to do with a service requiring a username or password. While a password-protected resource can only be requested by someone with the correct password, once it is in transit to the authorized user, the servers handling it can still see it if it is not encrypted.

Encryption at Rest

Encryption at rest means that the data stored on the server is scrambled in such a way that it is unreadable by anyone – even if they can access the file directly. If the data is not encrypted at rest, then anyone with access to the server can read that data (including, for example, the host company and your team’s administrators). If the data is encrypted at rest, however, the data is impossible to read even if someone gained unauthorized access to the database or the files where the data is stored. Encryption at rest uses an authorization tool called a public/private key pair.

Encryption at rest is not as easy to implement as encryption in transit. Both types of encryption use a private/public key pair to ensure that no unauthorized person gets access to the data. Since the time during which the data must be encrypted in transit is so short, the web servers never need to give users the key pair. Once the transfer is complete, the key pair is discarded and never needed or used again. However, in encryption at rest, the research team encrypts the data at one point in time and will access the data at a later point in time. The computer therefore must give the key pair to a human. As is so often the case, the weakest link is the human factor. No private/public key pair is secure if the computer that generated the key saves it or is able to re-generate it. Thus, humans must keep the key pair safe. If we lose it, the data is lost forever and there is no way whatsoever to regain access to it.

The exact way in which encryption at rest is implemented depends on the service. For more information, read the instructions specific to your service.

Public/Private Key Pair

The keys used in the private/public key pair are either strings or small files. Exactly how the private/public key pair is created differs depending on the service. Complex mathematical relationships connect the two keys (we never need to understand these), allowing anyone with the private key to decrypt anything encrypted by someone with the public key. In essence, the public/private key pair system is like a vault with two doors. One door has a tiny opening where you can only put things in the vault but not take anything out. To open this door, you only need the public key. Since the door cannot be used to take anything out of the vault, it is safe for multiple people to have this key. The second door is a big door that can be used to take out all the content of the vault. To open this door, you need the private key. It is therefore very important to restrict access to the private key.
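The two-door vault can be made concrete with a toy example. The sketch below uses textbook RSA with deliberately tiny numbers, which is an assumption made for illustration: real services use vetted libraries and keys that are hundreds of digits long, and nothing here should be used to protect actual data. The function and variable names are hypothetical.

```python
# Toy RSA key pair with tiny primes: for illustration only, never for real data.
p, q = 61, 53                 # two small primes (real keys use enormous primes)
n = p * q                     # modulus shared by both keys: 3233
phi = (p - 1) * (q - 1)       # 3120, used only while creating the keys
e = 17                        # public exponent, chosen to share no factor with phi
d = pow(e, -1, phi)           # private exponent: modular inverse of e (Python 3.8+)

public_key = (e, n)           # the "small door": safe to share, can only put things in
private_key = (d, n)          # the "big door": must be kept secret, takes things out


def encrypt(message: int, key) -> int:
    """Scramble a number (smaller than n) using the public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext: int, key) -> int:
    """Recover the original number using the private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


secret = 65
ciphertext = encrypt(secret, public_key)
print("scrambled value:", ciphertext)
print("recovered value:", decrypt(ciphertext, private_key))
```

Anyone holding only `public_key` can produce `ciphertext`, but recovering `secret` from it requires `private_key`, which is why losing the private key means losing the data.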

The following principles always hold for public/private key pairs:

  • If you lose your private key, there is no way of decrypting your data ever again. Your data is lost forever. To safeguard the private key, use a password manager.
  • The key pair can only be created once. Any service that claims to be able to re-generate the private key for you is not safe: this is equivalent to the service having your password. It means either that the data is not properly encrypted or that the service saved a copy of your private key, which gives it full access to your encrypted data.
  • It is perfectly fine if the service keeps a copy of your public key. This allows them to encrypt new data as it is coming in. Some services only give you the private key and you do not have to worry about the public key. It depends on the context.
  • Most services provide a convenient way of decrypting your data, but if you are not asked for the key every time the data is viewed or decrypted, then your data is not encrypted properly.

Password and Key Management

When using public/private key pairs, you need to keep a lot of keys safe and well-organized. You can never store, organize, or transfer them using insecure methods like email or Dropbox. DIME Analytics recommends using password managers as a convenient and safe solution for storing private keys and passwords. While there are many password managers out there, DIME Analytics recommends LastPass, whose free, basic tier satisfies all requirements a research team would ever need.

Password managers like LastPass allow you to store small text files or string fields, which is useful because private/public key pairs sometimes come in file format. LastPass can save keys as small text files or as strings in secure notes, which you can also use for other important information, such as noting to which encryption container the private key pertains. Password managers like LastPass can also help you randomize passwords so that your passwords are long strings that are impossible to guess. Since the password manager stores and remembers this password for you, you will not need to memorize it and you can easily have a different long random string for each account you have.
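To show what a long random password looks like, the sketch below generates one with Python's standard `secrets` module. This is only an illustration of the idea, not how LastPass or any other password manager works internally; the function name and the default length of 24 characters are assumptions.

```python
import secrets
import string

# Characters the password may contain (letters, digits and punctuation).
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def random_password(length: int = 24) -> str:
    """Return a random password drawn from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# Each call gives a different string, so every account can have its own password.
print(random_password())
print(random_password(32))
```

A string like this is impractical to memorize, which is exactly why it belongs in a password manager rather than in someone's head or in an email.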

Some password managers allow you to share passwords across accounts. This is a great feature: if one person updates the password, then everyone else has the updated password in their account immediately. Password sharing is sometimes a paid feature. One way around paying is for the research team to create one account and share access to that account. However, make sure not to share this account with more people than those who really need it.

Encryption with SurveyCTO Data

When programming questionnaires in SurveyCTO, use the key generator to create a public-private key pair for your data with the name of your intended survey. This will download two files to your laptop: these are the keys. Create an encrypted survey with the public key. Then put the keys in the vault via LastPass or VeraCrypt. Note that you will need the private key for the form data download. Store encrypted data in iefolder’s EncryptedData folder. Note, however, that while iefolder makes the folder, it does not encrypt it.

In your first cleaning do-file, de-identify the PII data to create a de-identified dataset. If you’re using VeraCrypt, the veracrypt command allows Stata to call VeraCrypt to mount the drive. You then have to manually enter the password every time you run the do-file. The first cleaning file should move the non-PII version of your data to the regular data folder.
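The de-identification step can be sketched as follows. The workflow above uses a Stata do-file; the Python below is only an illustration of the same logic, and the column names, the sample rows, and the `deidentify` function are all hypothetical. In practice the raw file would be read from the mounted encrypted folder rather than from an in-memory string.

```python
import csv
import io

# Hypothetical PII column names; replace with the identifying variables in your survey.
PII_COLUMNS = {"respondent_name", "phone_number", "gps_latitude", "gps_longitude"}


def deidentify(rows, pii_columns=PII_COLUMNS):
    """Return copies of the rows with every PII column dropped."""
    return [
        {name: value for name, value in row.items() if name not in pii_columns}
        for row in rows
    ]


# Tiny in-memory stand-in for a decrypted raw survey file (made-up values).
raw_csv = io.StringIO(
    "hh_id,respondent_name,phone_number,household_size\n"
    "101,Jane Doe,555-0100,4\n"
    "102,John Roe,555-0101,6\n"
)
raw_rows = list(csv.DictReader(raw_csv))

# Only this de-identified version should leave the encrypted folder.
clean_rows = deidentify(raw_rows)
print(clean_rows)
```

Only `clean_rows` would be written to the regular data folder; `raw_rows` stays inside the encrypted location.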

Back to Parent

This article is part of the topic Data Security

Additional Research