
Nacos cluster (clustering) error: unable to find local peer: *.*.*.*, all peers: []


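Symptom:

Our company's new system was about to go live. After setting up a Nacos cluster in the production environment, we found that some nodes were not being recognized.

nacos.log showed that the nodes had started normally, with no exceptions logged. naming-server.log, however, reported exceptions indicating that node information could not be matched.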

2022-01-11 08:31:03,630 WARN NamingProxy

java.io.IOException: failed to req API:http://10.*.*.*:8848/nacos/v1/ns/operator/cluster/state. code:500 msg: caused: unable to find local peer: 10.*.*.*:8848, all peers: [];
at com.alibaba.nacos.naming.misc.NamingProxy.reqCommon(NamingProxy.java:321)
at com.alibaba.nacos.naming.cluster.ServerListManager$ServerInfoUpdater.run(ServerListManager.java:183)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2022-01-11 08:31:04,688 WARN [STATUS-SYNCHRONIZE] failed to request serverStatus, remote server: 10.*.*.*:8848

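If a server has multiple network interfaces, or if Nacos runs in virtual machines or Docker, you are quite likely to hit this kind of problem: the cluster is configured with internal IPs while Nacos detects the external IP, or the other way around.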
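To see which addresses a multi-homed host actually exposes, a small diagnostic can help. The following is a minimal sketch, not Nacos code: the class name `LocalAddressCandidates` and its methods are hypothetical, and the filter (non-loopback, IPv4 only) is an assumption modeled on the `SystemConfig` excerpt quoted in this post. It prints every interface address that would pass such a filter, so you can compare the list against the IPs in cluster.conf.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical diagnostic helper, not part of Nacos.
class LocalAddressCandidates {

    // Assumed filter: skip loopback addresses and anything containing ':' (IPv6).
    static boolean isCandidate(InetAddress ip) {
        return !ip.isLoopbackAddress() && ip.getHostAddress().indexOf(':') == -1;
    }

    // Collect "interface -> address" pairs for every address that passes the filter.
    static List<String> listCandidates() throws SocketException {
        List<String> result = new ArrayList<>();
        Enumeration<NetworkInterface> interfaces = NetworkInterface.getNetworkInterfaces();
        if (interfaces == null) {
            return result; // no interfaces reported by the platform
        }
        for (NetworkInterface ni : Collections.list(interfaces)) {
            for (InetAddress ip : Collections.list(ni.getInetAddresses())) {
                if (isCandidate(ip)) {
                    result.add(ni.getName() + " -> " + ip.getHostAddress());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws SocketException {
        listCandidates().forEach(System.out::println);
    }
}
```

If more than one line is printed, automatic detection may pick an address other than the one you listed in cluster.conf, which matches the symptom described above.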

Solution:

Edit the startup.sh file on every node and add the -Dnacos.server.ip JVM parameter; this resolves the problem.


JAVA_OPT="${JAVA_OPT} -Dnacos.server.ip=selfIp" # replace selfIp with this node's cluster IP, i.e. the IP listed for it in cluster.conf
JAVA_OPT="${JAVA_OPT} -Dloader.path=${BASE_DIR}/plugins/health,${BASE_DIR}/plugins/cmdb"
JAVA_OPT="${JAVA_OPT} -Dnacos.home=${BASE_DIR}"
JAVA_OPT="${JAVA_OPT} -jar ${BASE_DIR}/target/${SERVER}.jar"
JAVA_OPT="${JAVA_OPT} ${JAVA_OPT_EXT}"
JAVA_OPT="${JAVA_OPT} --spring.config.additional-location=${CUSTOM_SEARCH_LOCATIONS}"
JAVA_OPT="${JAVA_OPT} --logging.config=${BASE_DIR}/conf/nacos-logback.xml"
JAVA_OPT="${JAVA_OPT} --server.max-http-header-size=524288"
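Because the value given to -Dnacos.server.ip has to match an entry in cluster.conf, it can be worth checking that before restarting a node. The sketch below is one way to do so under stated assumptions: `ClusterConfCheck` and `containsPeer` are hypothetical names, and the parsing rules (one entry per line, `#` starts a comment, an entry without a port means 8848) are assumptions about the file format rather than a copy of the Nacos parser.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical pre-flight check, not part of Nacos.
class ClusterConfCheck {

    static final int DEFAULT_PORT = 8848;

    // Returns true if "ip:port" appears among the given cluster.conf lines.
    // Assumptions: one entry per line, '#' starts a comment, missing port means 8848.
    static boolean containsPeer(List<String> confLines, String ip, int port) {
        String wanted = ip + ":" + port;
        for (String raw : confLines) {
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (line.isEmpty()) {
                continue; // blank line or comment-only line
            }
            String entry = line.contains(":") ? line : line + ":" + DEFAULT_PORT;
            if (entry.equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: ClusterConfCheck <path-to-cluster.conf> <ip>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        boolean listed = containsPeer(lines, args[1], DEFAULT_PORT);
        System.out.println(args[1] + (listed ? " is listed" : " is NOT listed") + " in " + args[0]);
    }
}
```

Pass the path to cluster.conf and the IP you intend to use as selfIp; a "NOT listed" result suggests the node would still fail to find itself among the peers.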

For reference, see the source: com.alibaba.nacos.config.server.utils.SystemConfig.java in the config module.

/**
 * System config.
 *
 * @author Nacos
 */
public class SystemConfig {

    public static final String LOCAL_IP = getHostAddress();

    private static final Logger LOGGER = LoggerFactory.getLogger(SystemConfig.class);

    private static String getHostAddress() {
        String address = System.getProperty("nacos.server.ip");
        if (StringUtils.isNotEmpty(address)) {
            return address;
        } else {
            address = InternetAddressUtil.localHostIP();
        }
        try {
            Enumeration<NetworkInterface> en = NetworkInterface.getNetworkInterfaces();
            while (en.hasMoreElements()) {
                NetworkInterface ni = en.nextElement();
                Enumeration<InetAddress> ads = ni.getInetAddresses();
                while (ads.hasMoreElements()) {
                    InetAddress ip = ads.nextElement();
                    // Compatible group does not regulate 11 network segments
                    if (!ip.isLoopbackAddress() && ip.getHostAddress().indexOf(":") == -1
                            /* && ip.isSiteLocalAddress() */) {
                        return ip.getHostAddress();
                    }
                }
            }
        } catch (Exception e) {
            LOGGER.error("get local host address error", e);
        }
        return address;
    }

}
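The important point in this excerpt is the order of checks: an explicitly set nacos.server.ip property is returned immediately, and interface scanning only happens when it is absent. The following stripped-down sketch isolates that precedence. `LocalIpPrecedence` and `resolve` are hypothetical names, and the address in `main` is a made-up stand-in for whatever scanning would return on a real host.

```java
import java.util.function.Supplier;

// Hypothetical illustration of the precedence rule, not Nacos code.
class LocalIpPrecedence {

    // Explicit configuration wins; auto-detection is consulted only as a fallback.
    static String resolve(String configured, Supplier<String> autoDetect) {
        if (configured != null && !configured.isEmpty()) {
            return configured;
        }
        return autoDetect.get();
    }

    public static void main(String[] args) {
        // Made-up address standing in for the result of scanning network interfaces.
        Supplier<String> detected = () -> "172.17.0.1";
        System.out.println(resolve(System.getProperty("nacos.server.ip"), detected));
    }
}
```

Running it once with -Dnacos.server.ip set on the java command line and once without shows why pinning the property removes the dependency on interface order.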

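Whenever the cluster environment changes, it is best to delete the derby-data and protocol directories under the data directory before restarting, to avoid further problems caused by stale cached data.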