As I mentioned in a previous post, I’ve been working on some import/export functionality.
One side of the fence is a database and the other is HDFS.
I have an authenticated user on the database who is authorized to access some data and wants it on HDFS owned by him in his home directory. How do I propagate the authentication/authorization?
Turns out that Hadoop does support a secure impersonation feature. In some sense it’s kind of close to how this database supports a proxy impersonation.
In essence, we will configure the Hadoop cluster to recognize the UNIX process account running the middle-ware component as a super-user. We will further specify the IPs that the proxy requests can originate from. Lastly, we specify which groups of users the proxy super-user can impersonate. Of course, the middle-ware components will need to invoke the security proxy.
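The three settings described above live in `core-site.xml` as `hadoop.proxyuser.<account>.hosts` and `.groups` (or `.users`). The following is an added example, not code from this post: a small audit that parses those properties and flags wildcard or incomplete rules. The account names (`etlsvc`, `broadsvc`, `halfsvc`), IPs and group names are constructed for illustration.

```python
import xml.etree.ElementTree as ET

# Constructed core-site.xml fragment; accounts, IPs and groups are made up.
CORE_SITE = """<configuration>
  <property><name>hadoop.proxyuser.etlsvc.hosts</name><value>10.20.30.41,10.20.30.42</value></property>
  <property><name>hadoop.proxyuser.etlsvc.groups</name><value>db_export_users</value></property>
  <property><name>hadoop.proxyuser.broadsvc.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.broadsvc.groups</name><value>*</value></property>
  <property><name>hadoop.proxyuser.halfsvc.groups</name><value>analysts</value></property>
</configuration>"""

PREFIX = "hadoop.proxyuser."

def parse_proxyusers(xml_text):
    """Return {superuser: {"hosts": [...], "groups": [...], "users": [...]}}."""
    rules = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = (prop.findtext("name") or "").strip()
        if not name.startswith(PREFIX):
            continue
        # The account name may itself contain dots, so split from the right.
        superuser, _, kind = name[len(PREFIX):].rpartition(".")
        if kind not in ("hosts", "groups", "users") or not superuser:
            continue
        value = (prop.findtext("value") or "").strip()
        rules.setdefault(superuser, {})[kind] = [
            v.strip() for v in value.split(",") if v.strip()]
    return rules

def audit(rules):
    """Flag wildcard or missing restrictions for each proxy super-user."""
    findings = []
    for su, r in sorted(rules.items()):
        if "*" in r.get("hosts", []):
            findings.append((su, "hosts is '*': requests accepted from any address"))
        elif not r.get("hosts"):
            findings.append((su, "no hosts list: rule is incomplete"))
        scopes = r.get("groups", []) + r.get("users", [])
        if "*" in scopes:
            findings.append((su, "groups/users is '*': may impersonate any user"))
        elif not scopes:
            findings.append((su, "no groups or users: rule is incomplete"))
    return findings

if __name__ == "__main__":
    for su, msg in audit(parse_proxyusers(CORE_SITE)):
        print(f"{su}: {msg}")
```

Only `etlsvc` passes cleanly: it is pinned to two source addresses and one group, which is the shape the middle-ware account should have.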
So that’s how it’s supposed to work. I’m in the process of setting up an environment for testing. I hope things go smoothly, but I already know that once the Hadoop cluster goes to Kerberos this will break until the middle-ware process account goes to Active Directory, which will be an issue.