Spark Driver: Is DRIVER_HOST_ADDRESS a Hostname or an IP?

The Problem

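When a Spark job is submitted in client mode from outside the cluster, the driver sends its own hostname or IP address to the AM (ApplicationMaster). But when exactly is it a hostname, and when is it an IP?
Note: whether DRIVER_HOST_ADDRESS is a hostname or an IP matters a great deal. It decides whether the Spark application can run at all, because cluster nodes cannot necessarily resolve the driver's hostname to an IP address.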

Analysis

When SparkContext is initialized, it writes the resolved value of spark.driver.host back into its configuration:

```scala
_conf.set(DRIVER_HOST_ADDRESS, _conf.get(DRIVER_HOST_ADDRESS))
```

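The default value of DRIVER_HOST_ADDRESS (used when spark.driver.host has not been set explicitly) follows these rules:

  • If SPARK_LOCAL_HOSTNAME is set, its value is used directly;
  • Otherwise the value is obtained with getCanonicalHostName, a method of the InetAddress class in the java.net package, called on the local address Spark has picked.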
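The rules above amount to a simple precedence chain. Below is a sketch of that chain as a pure function; `DriverHostSketch` and `resolveDriverHost` are hypothetical names, not Spark APIs, and the sketch assumes that a `spark.driver.host` value set explicitly by the user wins over the computed default (that is what `_conf.get(DRIVER_HOST_ADDRESS)` returns when the key is set).

```scala
object DriverHostSketch {
  /** Hypothetical model of how the value of spark.driver.host is chosen.
    * explicitDriverHost: spark.driver.host, if the user set it
    * env:                process environment (SPARK_LOCAL_HOSTNAME is read here)
    * canonicalHostName:  passed by name, so in this sketch the (possibly slow)
    *                     reverse lookup only runs when no override is present
    */
  def resolveDriverHost(
      explicitDriverHost: Option[String],
      env: Map[String, String],
      canonicalHostName: => String): String =
    explicitDriverHost
      .orElse(env.get("SPARK_LOCAL_HOSTNAME"))
      .getOrElse(canonicalHostName)

  def main(args: Array[String]): Unit = {
    // Real run: falls back to the JDK's canonical name of getLocalHost.
    // Unlike Spark, this skips the loopback-replacement step.
    val host = resolveDriverHost(None, sys.env,
      java.net.InetAddress.getLocalHost.getCanonicalHostName)
    println(s"spark.driver.host would default to: $host")
  }
}
```

Note that Spark itself does not necessarily defer the canonical-name lookup; the by-name parameter is only a convenience of this sketch.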

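Next, when Spark builds that local InetAddress it calls getLocalHost, which only uses the first address the lookup returns. If that address is a loopback address, Spark walks the network interfaces looking for a replacement: loopback and link-local addresses are skipped, an IPv4 address is preferred, and if an interface only has IPv6 candidates the first of them is used.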
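The replacement rule can be pulled out into a pure function, which makes it testable without the real network stack. In this sketch `AddressPickSketch`, `pickDriverAddress` and the `(interfaceName, addresses)` pairs are hypothetical stand-ins for `InetAddress.getLocalHost` and `NetworkInterface.getNetworkInterfaces`. Spark's `Utils.findLocalInetAddress` also honours a `SPARK_LOCAL_IP` environment variable before any of this; it is modelled here as `localIpOverride`, an assumption worth checking against the Spark version in use.

```scala
import java.net.{Inet4Address, InetAddress}

object AddressPickSketch {
  /** Hypothetical model of how the driver's local address is chosen.
    * localIpOverride: value of SPARK_LOCAL_IP, if set (assumed override)
    * localHost:       the address InetAddress.getLocalHost returned
    * interfaces:      (interface name, addresses) in JVM enumeration order
    * Returns the chosen address and, if replaced, the interface it came from.
    */
  def pickDriverAddress(
      localIpOverride: Option[String],
      localHost: InetAddress,
      interfaces: Seq[(String, Seq[InetAddress])]): (InetAddress, Option[String]) =
    localIpOverride match {
      case Some(ip) => (InetAddress.getByName(ip), None)
      case None if !localHost.isLoopbackAddress => (localHost, None)
      case None =>
        // Scan interfaces in reverse order; skip loopback and link-local
        // addresses; prefer IPv4, else take the first remaining address.
        val replacements = interfaces.reverse.iterator.flatMap { case (name, addrs) =>
          val usable = addrs.filterNot(a => a.isLoopbackAddress || a.isLinkLocalAddress)
          usable.find(_.isInstanceOf[Inet4Address])
            .orElse(usable.headOption)
            .map(a => (a, Option(name)))
        }
        // No usable interface: keep the loopback address.
        if (replacements.hasNext) replacements.next() else (localHost, None)
    }

  def main(args: Array[String]): Unit = {
    // Example data mirroring the test machine (eth1 = 192.168.10.2)
    val lo = InetAddress.getByName("127.0.0.1")
    val ifaces = Seq(
      "eth1" -> Seq(InetAddress.getByName("192.168.10.2")),
      "lo" -> Seq(lo, InetAddress.getByName("::1")))
    println(pickDriverAddress(None, lo, ifaces))
  }
}
```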

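getCanonicalHostName calls getHostFromNameService, which works as follows:

  • Look up the host name for the address in the nameService. If one is found, it becomes the hostname; if not, an UnknownHostException is thrown.
  • If a security check is requested, perform it; a failed check throws a SecurityException.
  • Resolve the host name found back to its addresses; if none of them equals the original address, the textual address is returned instead (an anti-spoofing check in the JDK 8 implementation).

Both exceptions are caught, and in either case host is set to the textual address; host is then returned.
In practice, the nameService here is mostly the hostname/IP table in /etc/hosts (on Linux the lookup goes through the system resolver, so DNS may also answer).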
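The lookup can be sketched as a pure function over a hosts table. `CanonicalNameSketch`, `HostsEntry`, `reverseFrom`, `forwardFrom` and `canonicalName` are hypothetical names; the real JDK method goes through the configured name service rather than a Scala list and compares `InetAddress` objects rather than strings. The forward-confirmation step (the name found must resolve back to the same address, otherwise the address is returned) follows the JDK 8 source of `getHostFromNameService`.

```scala
object CanonicalNameSketch {
  /** One hosts-file line: an address followed by its names. */
  final case class HostsEntry(ip: String, names: Seq[String])

  /** Reverse lookup against a hosts table: first name on the first line for ip. */
  def reverseFrom(table: Seq[HostsEntry])(ip: String): Option[String] =
    table.find(_.ip == ip).flatMap(_.names.headOption)

  /** Forward lookup against a hosts table: every address listed for name. */
  def forwardFrom(table: Seq[HostsEntry])(name: String): Seq[String] =
    table.filter(_.names.contains(name)).map(_.ip)

  /** Models getCanonicalHostName: reverse lookup, then forward confirmation.
    * Any failure (no name found, or the name does not map back to ip)
    * falls back to the textual address. */
  def canonicalName(
      reverse: String => Option[String],
      forward: String => Seq[String],
      ip: String): String =
    reverse(ip) match {
      case Some(name) if forward(name).contains(ip) => name
      case _ => ip
    }

  def main(args: Array[String]): Unit = {
    val baseline = Seq(
      HostsEntry("127.0.0.1", Seq("hostnameA", "hostnameA")),
      HostsEntry("::1", Seq("hostnameA", "hostnameA")))
    val withLanEntry = baseline :+ HostsEntry("192.168.10.2", Seq("hostnameA", "hostnameA"))
    println(canonicalName(reverseFrom(baseline), forwardFrom(baseline), "192.168.10.2"))         // prints 192.168.10.2
    println(canonicalName(reverseFrom(withLanEntry), forwardFrom(withLanEntry), "192.168.10.2")) // prints hostnameA
  }
}
```

Taking the two lookups as separate functions makes it possible to see the confirmation step reject a reverse answer (for example a DNS PTR record) whose name resolves elsewhere.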

Experimental Verification

  1. First, reproduce the relevant code in minimal form:
```scala
import java.net.{Inet4Address, InetAddress, NetworkInterface}
import scala.collection.JavaConverters._

object HostName {
  // Minimal copy of the logic Spark uses to pick the local address
  private def findLocalInetAddress(): InetAddress = {
    val address = InetAddress.getLocalHost
    if (address.isLoopbackAddress) {
      // Scan the network interfaces in reverse enumeration order
      val activeNetworkIFs = NetworkInterface.getNetworkInterfaces.asScala.toSeq
      val reOrderedNetworkIFs = activeNetworkIFs.reverse
      for (ni <- reOrderedNetworkIFs) {
        val addresses = ni.getInetAddresses.asScala
          .filterNot(addr => addr.isLinkLocalAddress || addr.isLoopbackAddress).toSeq
        if (addresses.nonEmpty) {
          // Prefer IPv4, otherwise take the first remaining address
          val addr = addresses.find(_.isInstanceOf[Inet4Address]).getOrElse(addresses.head)
          // Inet6Address.toHostName may append the interface name, so rebuild from raw bytes
          val strippedAddress = InetAddress.getByAddress(addr.getAddress)
          print("Your hostname, " + InetAddress.getLocalHost.getHostName + " resolves to" +
            " a loopback address: " + address.getHostAddress + "; using " +
            strippedAddress.getHostAddress + " instead (on interface " + ni.getName + ")")
          return strippedAddress
        }
      }
    }
    address
  }

  def main(args: Array[String]): Unit = {
    val addr = findLocalInetAddress()
    // print, not println: the warning above and this output share one line
    print(s"addr :$addr\nCanonicalHostName:${addr.getCanonicalHostName}")
  }
}
```

Package the code as GetHostName-1.0-SNAPSHOT-jar-with-dependencies.jar and upload it to the test server, here called hostnameA.
Run the jar with:

```
java -cp GetHostName-1.0-SNAPSHOT-jar-with-dependencies.jar HostName
```

  2. Check the hosts file on hostnameA:
```
127.0.0.1 hostnameA hostnameA
127.0.0.1 localhost.localdomain localhost
127.0.0.1 localhost4.localdomain4 localhost4

::1 hostnameA hostnameA
::1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
```

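Expected result: the hostname maps to the loopback address, so it is replaced by a non-loopback local IP; since no hostname maps to that IP, the IP itself is returned.
Actual result: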

```
Your hostname, hostnameA resolves to a loopback address: 127.0.0.1; using 192.168.10.2 instead (on interface eth1)addr :/192.168.10.2
CanonicalHostName:192.168.10.2
```

Variant 1:

```
127.0.0.1 hostnameA hostnameA
127.0.0.1 localhost.localdomain localhost
127.0.0.1 localhost4.localdomain4 localhost4

::1 hostnameA hostnameA
::1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.10.2 hostnameA hostnameA
```

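Expected result: the loopback IP is again replaced by the non-loopback IP, but this time a hostname can be found for that IP, so the hostname is returned.
Actual result: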

```
Your hostname, hostnameA resolves to a loopback address: 127.0.0.1; using 192.168.10.2 instead (on interface eth1)addr :/192.168.10.2
CanonicalHostName:hostnameA
```

Variant 2:

Starting from variant 1, comment out the loopback entry for the hostname:

```
# 127.0.0.1 hostnameA hostnameA
127.0.0.1 localhost.localdomain localhost
127.0.0.1 localhost4.localdomain4 localhost4

::1 hostnameA hostnameA
::1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.10.2 hostnameA hostnameA
```

Expected result: no loopback replacement happens, and the local hostname is returned.
Actual result:

```
addr :hostnameA/192.168.10.2
CanonicalHostName:hostnameA
```

Variant 3:

Starting from variant 2, also comment out the IPv6 entry for the hostname and the last line (its non-loopback address):

```
# 127.0.0.1 hostnameA hostnameA
127.0.0.1 localhost.localdomain localhost
127.0.0.1 localhost4.localdomain4 localhost4

# ::1 hostnameA hostnameA
::1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
# 192.168.10.2 hostnameA hostnameA
```

Expected result: no address can be found for the hostname.
Actual result:

```
Exception in thread "main" java.net.UnknownHostException: hostnameA: hostnameA: Name or service not known
    at java.net.InetAddress.getLocalHost(InetAddress.java:1506)
    at com.wtx.hostname.HostName.main(HostName.java:13)
Caused by: java.net.UnknownHostException: VM_16_17_centos: Name or service not known
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
    at java.net.InetAddress.getLocalHost(InetAddress.java:1501)
    ... 1 more
```
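The four runs can be replayed without touching a real machine by modelling the lookups over the hosts-file text. This is a sketch under explicit assumptions: `HostsExperimentSketch`, `parseHosts` and `predict` are hypothetical names; it assumes getLocalHost returns the first IPv4 entry listed for the hostname (else the first IPv6 one), which matches the runs above but is not guaranteed for every resolver configuration; and it assumes a single non-loopback interface whose address is passed in as `nicIp`. Addresses are handled as strings, so loopback detection only covers `127.x` and `::1`.

```scala
object HostsExperimentSketch {
  /** One hosts-file line: an address followed by its names. */
  final case class HostsEntry(ip: String, names: Seq[String])

  /** Parses hosts-file text: drops "#" comments and blank lines, splits on whitespace. */
  def parseHosts(text: String): Seq[HostsEntry] =
    text.split("\n").toList
      .map(_.takeWhile(_ != '#').trim)
      .filter(_.nonEmpty)
      .map(_.split("\\s+").toList)
      .collect { case ip :: names if names.nonEmpty => HostsEntry(ip, names) }

  // String-based simplifications (assumption): enough for the addresses used here.
  private def isLoopback(ip: String): Boolean = ip.startsWith("127.") || ip == "::1"
  private def isIpv4(ip: String): Boolean = !ip.contains(':')

  /** Predicts (addr, CanonicalHostName), or None where getLocalHost would
    * throw UnknownHostException. nicIp is the hypothetical address of the
    * machine's only non-loopback interface. */
  def predict(hosts: Seq[HostsEntry], hostname: String, nicIp: String): Option[(String, String)] = {
    val forward = hosts.filter(_.names.contains(hostname)).map(_.ip)
    // Assumption: getLocalHost yields the first IPv4 entry, else the first IPv6 one.
    val localHost = forward.find(isIpv4).orElse(forward.headOption)
    localHost.map { ip =>
      val chosen = if (isLoopback(ip)) nicIp else ip
      // Reverse lookup; within a single table the forward check always passes.
      val canonical = hosts.find(_.ip == chosen).flatMap(_.names.headOption)
      (chosen, canonical.getOrElse(chosen))
    }
  }

  def main(args: Array[String]): Unit = {
    // Usage (hypothetical): HostsExperimentSketch <hostname> <nic-ip> [hosts-file]
    val path = if (args.length > 2) args(2) else "/etc/hosts"
    val source = scala.io.Source.fromFile(path)
    try println(predict(parseHosts(source.mkString), args(0), args(1)))
    finally source.close()
  }
}
```

With the hosts contents of the four runs and `nicIp = 192.168.10.2`, `predict` returns the IP, the hostname, the hostname, and `None` respectively, which lines up with the recorded output.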

Summary

Configure the driver host according to your setup:

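  1. If the Spark driver runs outside the cluster and the cluster nodes cannot necessarily resolve the driver's hostname,
    the /etc/hosts file on the driver's client machine should preferably not map that hostname to a non-loopback IP address,
    such as 192.168.10.2 hostnameA hostnameA. Keeping the hostname mapped only to the loopback address, as in the first experiment, makes the driver advertise its IP.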

  2. If the driver runs on a cluster node, it does not matter.