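Symptom:
Our company's new system was about to go live. After setting up a Nacos cluster in production, we found that some nodes were not being recognized.
nacos.log showed every node starting normally with no exceptions, but naming-server.log reported errors saying the node information could not be matched.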
2022-01-11 08:31:03,630 WARN NamingProxy
java.io.IOException: failed to req API:http://10.*.*.*:8848/nacos/v1/ns/operator/cluster/state. code:500 msg: caused: unable to find local peer: 10.*.*.*:8848, all peers: [];
    at com.alibaba.nacos.naming.misc.NamingProxy.reqCommon(NamingProxy.java:321)
    at com.alibaba.nacos.naming.cluster.ServerListManager$ServerInfoUpdater.run(ServerListManager.java:183)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2022-01-11 08:31:04,688 WARN [STATUS-SYNCHRONIZE] failed to request serverStatus, remote server: 10.*.*.*:8848
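This problem is quite likely when a server has multiple network interfaces or runs in a virtual machine or Docker: the cluster is configured with internal IPs while Nacos detects the external IP, or the other way around.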
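One way to confirm this kind of mismatch is to compare the entries in cluster.conf with the addresses the host actually owns. The following is a minimal sketch, not part of Nacos: the function name find_local_peer is hypothetical, and it assumes cluster.conf holds IPv4 entries of the form ip or ip:port, with # starting a comment.

```shell
# Print every cluster.conf entry whose IP equals one of the given local IPs.
# Usage: find_local_peer <path to cluster.conf> <ip> [<ip> ...]
# Example on a Linux host: find_local_peer conf/cluster.conf $(hostname -I)
find_local_peer() {
  conf="$1"; shift
  # Strip comments and whitespace, then drop empty lines.
  sed -e 's/#.*//' -e 's/[[:space:]]//g' -e '/^$/d' "$conf" |
  while IFS= read -r entry || [ -n "$entry" ]; do
    ip="${entry%%:*}"   # remove the optional :port suffix (IPv4 only)
    for candidate in "$@"; do
      if [ "$ip" = "$candidate" ]; then
        echo "$entry"
      fi
    done
  done
}
```

If the function prints nothing on a node, none of that node's addresses appear in cluster.conf, which matches the "unable to find local peer" message above.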
Solution:
Edit startup.sh on every node and add the -Dnacos.server.ip JVM parameter; that resolves the problem.
JAVA_OPT="${JAVA_OPT} -Dnacos.server.ip=selfIp" # replace selfIp with this node's own cluster IP, i.e. the IP listed for it in cluster.conf
JAVA_OPT="${JAVA_OPT} -Dloader.path=${BASE_DIR}/plugins/health,${BASE_DIR}/plugins/cmdb"
JAVA_OPT="${JAVA_OPT} -Dnacos.home=${BASE_DIR}"
JAVA_OPT="${JAVA_OPT} -jar ${BASE_DIR}/target/${SERVER}.jar"
JAVA_OPT="${JAVA_OPT} ${JAVA_OPT_EXT}"
JAVA_OPT="${JAVA_OPT} --spring.config.additional-location=${CUSTOM_SEARCH_LOCATIONS}"
JAVA_OPT="${JAVA_OPT} --logging.config=${BASE_DIR}/conf/nacos-logback.xml"
JAVA_OPT="${JAVA_OPT} --server.max-http-header-size=524288"
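Because selfIp differs on every node, hard-coding it means maintaining a different startup.sh per server. A possible alternative, sketched below under assumptions, is to read the IP from an environment variable so the same script can be deployed everywhere. NACOS_SELF_IP and add_server_ip_opt are hypothetical names chosen for this example; they are not defined by Nacos.

```shell
# Append -Dnacos.server.ip to JAVA_OPT only when NACOS_SELF_IP is set.
# NACOS_SELF_IP is a hypothetical per-node variable, e.g. exported by the
# service unit or a wrapper script before startup.sh runs.
add_server_ip_opt() {
  if [ -n "${NACOS_SELF_IP:-}" ]; then
    JAVA_OPT="${JAVA_OPT:-} -Dnacos.server.ip=${NACOS_SELF_IP}"
  fi
}
```

When the variable is unset, JAVA_OPT is left untouched and Nacos falls back to its own IP detection.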
For reference, see the source: com.alibaba.nacos.config.server.utils.SystemConfig.java in the config module.
/**
 * System config.
 *
 * @author Nacos
 */
public class SystemConfig {

    public static final String LOCAL_IP = getHostAddress();

    private static final Logger LOGGER = LoggerFactory.getLogger(SystemConfig.class);

    private static String getHostAddress() {
        String address = System.getProperty("nacos.server.ip");
        if (StringUtils.isNotEmpty(address)) {
            return address;
        } else {
            address = InternetAddressUtil.localHostIP();
        }
        try {
            Enumeration<NetworkInterface> en = NetworkInterface.getNetworkInterfaces();
            while (en.hasMoreElements()) {
                NetworkInterface ni = en.nextElement();
                Enumeration<InetAddress> ads = ni.getInetAddresses();
                while (ads.hasMoreElements()) {
                    InetAddress ip = ads.nextElement();
                    // Compatible group does not regulate 11 network segments
                    if (!ip.isLoopbackAddress() && ip.getHostAddress().indexOf(":") == -1
                            /* && ip.isSiteLocalAddress() */) {
                        return ip.getHostAddress();
                    }
                }
            }
        } catch (Exception e) {
            LOGGER.error("get local host address error", e);
        }
        return address;
    }
}
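The loop above returns the first non-loopback IPv4 address it encounters, whichever interface that happens to belong to. To see which addresses are candidates on a given host, something like the following sketch can help. It assumes a Linux host with iproute2 and parses the one-line-per-address output of `ip -o -4 addr show`; the function name list_candidate_ips is hypothetical.

```shell
# Read `ip -o -4 addr show` output on stdin and print "<interface> <ipv4>"
# for every non-loopback address.
# Usage: ip -o -4 addr show | list_candidate_ips
list_candidate_ips() {
  awk '$3 == "inet" { split($4, a, "/"); if (a[1] !~ /^127\./) print $2, a[1] }'
}
```

If more than one line is printed (for example eth0 plus docker0), auto-detection has several addresses to choose from. The order in which Java enumerates interfaces may differ from the order shown here, so treat the output as a list of candidates rather than a prediction of the chosen IP.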
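Whenever the cluster environment changes, it is best to delete the derby-data and protocol directories under the data directory before restarting, so that stale cached state does not cause other problems.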
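The cleanup step can be scripted. The sketch below is a cautious variant that moves the two directories into a timestamped backup folder instead of deleting them outright, so they can be restored if needed. The function name reset_cluster_state and the data-backup-* folder name are hypothetical, and the node is assumed to be stopped when it runs.

```shell
# Move data/derby-data and data/protocol out of the way so the node starts
# with clean local state. Run only while the node is stopped.
# Usage: reset_cluster_state <nacos home directory>
reset_cluster_state() {
  nacos_home="$1"
  backup_dir="${nacos_home}/data-backup-$(date +%Y%m%d%H%M%S)"
  for name in derby-data protocol; do
    if [ -d "${nacos_home}/data/${name}" ]; then
      mkdir -p "${backup_dir}"
      mv "${nacos_home}/data/${name}" "${backup_dir}/"
      echo "moved ${name} to ${backup_dir}"
    fi
  done
}
```

Once the cluster is confirmed healthy, the backup folder can be removed by hand.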