Authentication in a Hadoop cluster: MIT Kerberos and Active Directory

There are several ways to enable Kerberos authentication in a Hadoop cluster.

In a typical organization, Microsoft Active Directory is already installed and maintains all user accounts. Active Directory provides both a Kerberos service and an LDAP service.

The first and simplest option is to use the Kerberos service of Active Directory directly. Keep in mind, however, that Hadoop needs a large number of Kerberos principals: roughly one principal per service (HDFS NameNodes and DataNodes, Hive, YARN, Spark, HBase, …) multiplied by the number of hosts in the cluster. Creating and maintaining all of these principals in a company-wide Active Directory can be difficult to arrange.
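To give a feel for the numbers, the Python sketch below enumerates the service principals for a small, hypothetical cluster. It assumes the common Hadoop naming pattern `service/host-fqdn@REALM`; the host names, the realm `EXAMPLE.COM`, and the assignment of services to hosts are invented for illustration.

```python
# Hypothetical cluster layout: which services run on which host.
# "HTTP" stands for the per-host principal used by SPNEGO-protected web endpoints.
CLUSTER = {
    "master1.example.com": ["hdfs", "yarn", "hive", "HTTP"],
    "master2.example.com": ["hdfs", "yarn", "hbase", "HTTP"],
    "worker1.example.com": ["hdfs", "yarn", "hbase", "HTTP"],
    "worker2.example.com": ["hdfs", "yarn", "hbase", "HTTP"],
    "worker3.example.com": ["hdfs", "yarn", "hbase", "HTTP"],
}

REALM = "EXAMPLE.COM"  # hypothetical Kerberos realm


def service_principals(cluster, realm):
    """Return one principal of the form service/host@REALM per service instance."""
    return sorted(
        f"{service}/{host}@{realm}"
        for host, services in cluster.items()
        for service in services
    )


if __name__ == "__main__":
    principals = service_principals(CLUSTER, REALM)
    for principal in principals:
        print(principal)
    print(len(principals), "principals for", len(CLUSTER), "hosts")
```

With five hosts and four services per host, this toy layout already yields 20 principals; in a real cluster the count grows with every added host and service.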

The second option is to install an MIT Kerberos KDC dedicated to the Hadoop cluster and set up a one-way trust with the company-wide Active Directory. This approach is described here: . The Hadoop service principals are maintained in the dedicated MIT Kerberos realm, while users remain in Active Directory; thanks to the one-way trust, these users can still access the Hadoop services.
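With a one-way trust, AD users obtain a cross-realm ticket from the AD KDC that the Hadoop KDC accepts; for that, both KDCs must share the key of the principal `krbtgt/<HADOOP_REALM>@<AD_REALM>`. Hadoop then maps the incoming principal to a short user name through its `hadoop.security.auth_to_local` rules. The Python sketch below only illustrates these two naming steps. The realms are hypothetical, and the rule evaluation is a heavily simplified emulation of a single rule of the shape `RULE:[1:$1@$0](...)s/@.*//`, not Hadoop's actual rule engine.

```python
import re
from typing import Optional

HADOOP_REALM = "HADOOP.EXAMPLE.COM"  # hypothetical realm of the dedicated MIT KDC
AD_REALM = "AD.EXAMPLE.COM"          # hypothetical Active Directory realm

# Regex part of the hypothetical rule RULE:[1:$1@$0](.*@AD\.EXAMPLE\.COM)s/@.*//
AD_RULE_PATTERN = r".*@AD\.EXAMPLE\.COM"


def cross_realm_principal(service_realm: str, client_realm: str) -> str:
    """krbtgt principal whose key both KDCs must share so that clients of
    client_realm can obtain tickets for services in service_realm."""
    return f"krbtgt/{service_realm}@{client_realm}"


def apply_simple_rule(principal: str, realm_pattern: str) -> Optional[str]:
    """Simplified emulation of RULE:[1:$1@$0](<realm_pattern>)s/@.*//.

    Handles only this one rule shape; returns the short name, or None
    if the rule does not apply to the principal.
    """
    name, _, realm = principal.partition("@")
    if "/" in name:  # [1:...] matches only single-component principals
        return None
    formatted = f"{name}@{realm}"  # the "$1@$0" format string
    if not re.fullmatch(realm_pattern, formatted):
        return None
    return re.sub(r"@.*", "", formatted, count=1)  # the "s/@.*//" substitution


if __name__ == "__main__":
    print(cross_realm_principal(HADOOP_REALM, AD_REALM))
    print(apply_simple_rule("alice@AD.EXAMPLE.COM", AD_RULE_PATTERN))
```

On the MIT side, the cross-realm principal would typically be created with kadmin's `addprinc`, using the same password that is configured for the trust on the Active Directory side.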

The third option applies when Active Directory is completely independent and a trust relationship is not possible either, but users should still be able to use Hadoop services. In this case you again install an MIT Kerberos KDC dedicated to the Hadoop cluster, as in the second option, but instead of a trust relationship with Active Directory you use its LDAP service to authenticate users. You create one technical principal in MIT Kerberos that accesses Hadoop services on behalf of the LDAP users. To make this impersonation work, you have to use the proxy user concept of the Hadoop cluster: .
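The proxy user concept is configured in `core-site.xml` through the `hadoop.proxyuser.<superuser>.hosts`, `.groups` and `.users` properties, where `*` allows everything. The Python sketch below generates such properties for a hypothetical technical principal `zeppelin` and models, in simplified form, the check Hadoop performs before letting it impersonate an end user. It assumes the LDAP password check of the end user has already happened in the gateway application and does not show it; the host name, group name and user names are invented, and Hadoop's real host matching (IP addresses, ranges) is richer than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical technical principal and the scope in which it may impersonate.
SUPERUSER = "zeppelin"
PROXY_SETTINGS = {
    "hosts": "gateway1.example.com",  # hosts the superuser may connect from
    "groups": "hadoop-users",         # members of these groups may be impersonated
}


def proxyuser_properties(superuser, settings):
    """Return core-site.xml <property> elements for hadoop.proxyuser.<superuser>.*"""
    props = []
    for key, value in settings.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = f"hadoop.proxyuser.{superuser}.{key}"
        ET.SubElement(prop, "value").text = value
        props.append(prop)
    return props


def _allowed(setting, candidates):
    """'*' allows everything; otherwise the setting is a comma-separated allow list."""
    if setting is None:
        return False
    if setting.strip() == "*":
        return True
    allowed = {item.strip() for item in setting.split(",") if item.strip()}
    return any(candidate in allowed for candidate in candidates)


def may_impersonate(settings, remote_host, user, user_groups):
    """Simplified check: the superuser's host must be allowed, and the end user
    must be allowed by name (users) or by group membership (groups)."""
    if not _allowed(settings.get("hosts"), [remote_host]):
        return False
    return _allowed(settings.get("users"), [user]) or _allowed(
        settings.get("groups"), user_groups
    )


if __name__ == "__main__":
    config = ET.Element("configuration")
    config.extend(proxyuser_properties(SUPERUSER, PROXY_SETTINGS))
    print(ET.tostring(config, encoding="unicode"))
    print(may_impersonate(PROXY_SETTINGS, "gateway1.example.com", "alice", ["hadoop-users"]))
```

In Java client code, the technical principal typically logs in from its keytab and then acts for the end user through Hadoop's `UserGroupInformation.createProxyUser(user, realUser)`.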

How to implement the third option for Apache Zeppelin is described here: