Authentication and authorization in a Hadoop cluster

Here we explain the concepts behind enabling security in a Hadoop cluster.

As an example setup we use HDFS and Hive; the task is to configure security for these two components.

Authentication

Everything starts with the user. You enter a user name and password on the login screen and press the submit button, and the system starts to authenticate you. Behind the scenes the user/password pair is compared with the user/password stored in an external directory. This can be LDAP, a database, or a simple properties file.

Hive comes with predefined authentication providers. For example, you can use the predefined LDAP provider. You need to fill in the required properties, such as the LDAP URL pointing to your LDAP server (for example ldap://yourserver.com:389) and the base path to your users, for example ou=Users,o=Org. The default LDAP provider takes your user name and generates the search string uid=<user name>,ou=Users,o=Org. If your organization uses cn=<user name> instead of uid, you should define this in the guidKey property.
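The LDAP setup described above can be sketched in hive-site.xml as follows. The server URL and base DN are the example values from this section, so replace them with your own:

```xml
<!-- hive-site.xml: enable LDAP authentication for HiveServer2 -->
<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>ldap://yourserver.com:389</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.baseDN</name>
  <value>ou=Users,o=Org</value>
</property>
<!-- Only needed if your directory keys users by cn instead of the default uid -->
<property>
  <name>hive.server2.authentication.ldap.guidKey</name>
  <value>cn</value>
</property>
```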

Sometimes technical processes are not authenticated against LDAP and the user/password pair is taken from other sources; in this case you should override and extend the default LDAP provider with your own custom code.
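Hive supports this through its CUSTOM authentication mode: you implement the org.apache.hive.service.auth.PasswdAuthenticationProvider interface, put the jar on the HiveServer2 classpath, and point Hive at your class. A minimal hive-site.xml sketch, where com.example.MyAuthProvider is a hypothetical class name standing in for your implementation:

```xml
<!-- hive-site.xml: plug in a custom authentication provider -->
<property>
  <name>hive.server2.authentication</name>
  <value>CUSTOM</value>
</property>
<property>
  <!-- com.example.MyAuthProvider is a placeholder for your own class -->
  <name>hive.server2.custom.authentication.class</name>
  <value>com.example.MyAuthProvider</value>
</property>
```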

Authorization

Authorization can be achieved by defining user groups. If a user belongs to a group, that group can carry special permissions, and a user can be in more than one group. Assignment of users to groups can also be part of an external directory. By default, groups are imported from the underlying Linux system using ShellBasedUnixGroupsMapping. You can see the groups available in Hadoop with the following command:

hdfs dfs -groups <user name>

If you use groups from Linux, the output of the above command will be equal to the output of:

groups <user name>

But it is possible to use another group provider, or a mix of them. For example, if you use LdapGroupsMapping, groups are imported from LDAP, and you can still see them in HDFS with the same hdfs dfs -groups command, but these groups are not available in Linux. Moreover, it is possible to have a mixed group provider that imports groups from both LDAP and Linux using CompositeGroupsMapping.
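As a sketch, LDAP group mapping is configured in core-site.xml roughly like this; the URL reuses the example server from above, and the bind user and search base are placeholder values for your directory:

```xml
<!-- core-site.xml: import groups from LDAP instead of the local OS -->
<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.LdapGroupsMapping</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.url</name>
  <value>ldap://yourserver.com:389</value>
</property>
<property>
  <!-- placeholder bind account with read access to the directory -->
  <name>hadoop.security.group.mapping.ldap.bind.user</name>
  <value>cn=admin,ou=Users,o=Org</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.base</name>
  <value>o=Org</value>
</property>
```

To mix providers instead, set hadoop.security.group.mapping to org.apache.hadoop.security.CompositeGroupsMapping and list the individual providers in its configuration.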

As a next step, we should define permissions based on the groups that were imported in the previous step. In HDFS you can use:

hdfs dfs -chown test:GroupFromLdap /user/test
hdfs dfs -chmod 555 /user/test 

In Hive you can define permissions as:

create role new_role;
grant role new_role to group GroupFromLdap;
grant all on table new_table to role new_role;
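These grants only take effect when authorization checks are enabled in Hive. A minimal hive-site.xml sketch (the exact authorization manager to use depends on your deployment, so only the switch itself is shown):

```xml
<!-- hive-site.xml: turn on authorization checks -->
<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
</property>
```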