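Hi, I'm 风一样的树懒, a backend developer with more than ten years of experience, including time at JD.com, Alibaba, and other leading internet companies.
This article is fairly long because it goes into a lot of detail and touches on some low-level internals; it is aimed at advanced interview questions. Read selectively according to your own level of understanding.
What follows is an end-to-end guide to analyzing Java thread dumps. It covers the key analysis steps, how to recognize common problem patterns, and hands-on troubleshooting techniques, so that you can pinpoint multithreading problems quickly: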
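jstack <PID> > thread_dump.txt # basic command
jstack -l <PID> >> thread_dump.txt # long listing with additional lock information (useful for deadlock analysis)
# Run after attaching Arthas to the Java process
thread --all # list all threads
thread -n 3 # top 3 busiest threads
thread -b # find the thread that is blocking other threads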
The graphical interface shows thread states and stacks directly
Supports real-time monitoring and exporting thread dumps
kubectl exec <pod-name> -- jstack 1 > dump.txt # inside a container the Java process is usually PID 1
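When no external tool can be attached (for example, the image ships only a JRE without jstack), a dump can also be produced from inside the process. The following is a minimal sketch using the standard `ThreadMXBean` API; the class name `InProcessThreadDump` is hypothetical, and the output format is a simplified approximation, not identical to jstack's.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class InProcessThreadDump {

    /** Returns a text snapshot of all live threads (name, id, state, full stack). */
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Ask for lock details only when this JVM supports collecting them.
        boolean monitors = mx.isObjectMonitorUsageSupported();
        boolean synchronizers = mx.isSynchronizerUsageSupported();

        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(monitors, synchronizers)) {
            sb.append('"').append(info.getThreadName()).append('"')
              .append(" id=").append(info.getThreadId())
              .append(' ').append(info.getThreadState()).append('\n');
            // ThreadInfo.toString() truncates the stack, so print every frame explicitly.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

The id printed here is the Java-level thread ID, not the native `nid` that jstack shows, so this output cannot be matched against `top -H` directly.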
"http-nio-8080-exec-1" #31 daemon prio=5 os_prio=0 tid=0x00007f8d3c1d8000 nid=0x1e3 runnable [0x00007f8d25ef9000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x0000000716e0c4c8> (a sun.nio.ch.Util$3)
- locked <0x0000000716e0c4e0> (a java.util.Collections$UnmodifiableSet)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:743)
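Field | Description |
Thread name | Custom name or one generated by the framework |
daemon | Present when the thread is a daemon thread |
prio | Java thread priority (1-10) |
tid | Address of the JVM's internal thread structure (hexadecimal); not the value returned by Thread.getId() |
nid | Native (OS) thread ID in hexadecimal; convert the decimal thread ID from top -H to hexadecimal to match it |
Thread.State | Thread state (the core of the analysis) |
locked | Address of a lock object (monitor) the thread currently holds |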
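To make the field layout concrete, here is a small sketch that extracts these fields from a header line with a regular expression. It assumes the JDK 8 style header shown in the sample above; newer JDKs add fields such as cpu= and elapsed=, and may format some values differently, so treat the pattern as a starting point. `ThreadHeaderParser` and `Header` are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreadHeaderParser {

    /** Parsed fields of one thread header line. */
    static final class Header {
        final String name;
        final boolean daemon;
        final int prio;
        final String tid;     // hexadecimal address, kept as text
        final long nativeId;  // nid converted to decimal, comparable with top -H

        Header(String name, boolean daemon, int prio, String tid, long nativeId) {
            this.name = name;
            this.daemon = daemon;
            this.prio = prio;
            this.tid = tid;
            this.nativeId = nativeId;
        }
    }

    // "name" #seq [daemon] prio=N ... tid=0x... nid=0x... <rest>
    private static final Pattern HEADER = Pattern.compile(
            "^\"(?<name>[^\"]*)\" #\\d+(?<daemon> daemon)? prio=(?<prio>\\d+) .*?"
            + "tid=(?<tid>0x[0-9a-f]+) nid=0x(?<nid>[0-9a-f]+).*$");

    /** Returns the parsed header, or empty if the line is not a thread header. */
    static Optional<Header> parse(String line) {
        Matcher m = HEADER.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new Header(
                m.group("name"),
                m.group("daemon") != null,
                Integer.parseInt(m.group("prio")),
                m.group("tid"),
                Long.parseLong(m.group("nid"), 16)));
    }

    public static void main(String[] args) {
        String line = "\"http-nio-8080-exec-1\" #31 daemon prio=5 os_prio=0 "
                + "tid=0x00007f8d3c1d8000 nid=0x1e3 runnable [0x00007f8d25ef9000]";
        parse(line).ifPresent(h -> System.out.println(
                h.name + " daemon=" + h.daemon + " prio=" + h.prio + " nativeId=" + h.nativeId));
    }
}
```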
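State | Trigger | Typical clue |
RUNNABLE | Running or waiting to be scheduled on a CPU; also covers threads blocked in native I/O such as epollWait | CPU-bound code, infinite loops |
BLOCKED | Waiting to enter a synchronized block or method | Lock contention, synchronization bottleneck |
WAITING | Waiting indefinitely (Object.wait(), Thread.join(), LockSupport.park()) | Missed wake-up, deadlock on a resource |
TIMED_WAITING | Waiting with a timeout (sleep, wait(timeout), join(timeout), parkNanos) | Badly chosen timeouts, blocking on slow resources |
NEW | Created but not yet started | Possible early sign of a thread leak |
TERMINATED | Finished executing | Normal completion |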
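The states in this table are the values of the `Thread.State` enum, so a running JVM can report its own distribution. A minimal sketch, assuming it is acceptable to sample from inside the application (the class name `ThreadStateSnapshot` is hypothetical):

```java
import java.util.EnumMap;
import java.util.Map;

class ThreadStateSnapshot {

    /** Counts live threads in the current JVM, grouped by Thread.State. */
    static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // getAllStackTraces() returns one entry per live thread.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        countByState().forEach((state, n) -> System.out.println(state + " " + n));
    }
}
```

A sudden rise in the BLOCKED count between two samples is usually a better signal than any single absolute number.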
Found one Java-level deadlock:
=============================
"Thread-1":
waiting to lock monitor 0x00007f88d8005278 (object 0x000000076d2a9c58, a java.lang.Object),
which is held by "Thread-0"
"Thread-0":
waiting to lock monitor 0x00007f88d8006208 (object 0x000000076d2a9c68, a java.lang.Object),
which is held by "Thread-1"
Solution: use jstack -l or the Arthas thread -b command to detect this automatically
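The same detection is available programmatically through `ThreadMXBean`, which can be useful for a health check or a scheduled probe. The sketch below is one possible shape; `DeadlockProbe` is a hypothetical name, and how the result is reported (log, metric, alert) is left open.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class DeadlockProbe {

    /** Returns one description per deadlocked thread, or an empty list if none. */
    static List<String> findDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads() also covers java.util.concurrent locks,
        // but is only usable when synchronizer monitoring is supported.
        long[] ids = mx.isSynchronizerUsageSupported()
                ? mx.findDeadlockedThreads()
                : mx.findMonitorDeadlockedThreads();
        if (ids == null) {
            return Collections.emptyList(); // null means no deadlock was found
        }
        List<String> result = new ArrayList<>();
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info == null) {
                continue; // thread died between the two calls
            }
            result.add("\"" + info.getThreadName() + "\" waiting for " + info.getLockName()
                    + " held by \"" + info.getLockOwnerName() + "\"");
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> deadlocks = findDeadlocks();
        System.out.println(deadlocks.isEmpty() ? "no deadlock detected" : String.join("\n", deadlocks));
    }
}
```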
"pool-1-thread-3" #17 prio=5 os_prio=0 tid=0x00007f9aec09d800 nid=0x1daf waiting on condition [0x00007f9ae2ae7000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000717023e08> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
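Analysis: this worker is idle, parked in LinkedBlockingQueue.take() until a new task arrives, so the queue is empty at this moment. A large number of threads in this state usually means the pool is oversized. Also check whether the pool uses an unbounded LinkedBlockingQueue, because under load such a queue can grow without limit.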
"http-nio-8080-exec-5" #39 daemon prio=5 os_prio=0 tid=0x00007f9aec2d4000 nid=0x1dc3 waiting on condition [0x00007f9ae23e6000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000717159d38> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at com.mchange.v2.resourcepool.BasicResourcePool.awaitAvailable(BasicResourcePool.java:1465)
at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:644)
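Diagnosis: the request thread is parked in BasicResourcePool.awaitAvailable, waiting for a free connection from the c3p0 pool. Many threads stuck here for a long time indicate an exhausted pool (too small, slow queries, or leaked connections).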
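Use top -H -p <PID> to get the native thread ID (NID) of the thread with the highest CPU usage
Convert the decimal NID to hexadecimal (printf "%x\n" 12345)
Search the dump file for nid=0x...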
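The three steps above can be sketched in Java as follows. The helper takes the decimal thread ID reported by top, builds the matching nid token, and returns the header lines that contain it. `NidLookup` is a hypothetical name, and the code assumes the hexadecimal nid format shown in the samples in this article.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class NidLookup {

    /** Converts a decimal OS thread ID (from top -H) to the token used in dumps. */
    static String toNidToken(long osThreadId) {
        return "nid=0x" + Long.toHexString(osThreadId);
    }

    /** Returns the thread header lines whose nid matches the given OS thread ID. */
    static List<String> findHeaders(List<String> dumpLines, long osThreadId) {
        // The trailing space keeps nid=0x1e3 from also matching nid=0x1e3f.
        String needle = toNidToken(osThreadId) + " ";
        List<String> hits = new ArrayList<>();
        for (String line : dumpLines) {
            if (line.startsWith("\"") && line.contains(needle)) {
                hits.add(line);
            }
        }
        return hits;
    }

    // Usage: java NidLookup thread_dump.txt 12345
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        findHeaders(lines, Long.parseLong(args[1])).forEach(System.out::println);
    }
}
```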
- locked <0x0000000716e0c4c8> (a com.example.OrderService)
- waiting to lock <0x0000000716e0c4c8> (a com.example.OrderService)
Optimization: reduce lock granularity, switch to concurrent collections, or use a read-write lock
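As an illustration of the "concurrent collections" option only, the sketch below keeps per-user counters without any service-wide lock. It is not the `com.example.OrderService` from the dump, whose code is not shown; `OrderCounters` and its methods are hypothetical. Where a coarse version would put `synchronized` on every method and make all callers queue on one monitor, this version lets updates for different keys proceed in parallel.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

class OrderCounters {

    // One LongAdder per user; ConcurrentHashMap handles contention internally,
    // so no caller ever blocks on a single shared monitor.
    private final ConcurrentHashMap<String, LongAdder> ordersByUser = new ConcurrentHashMap<>();

    /** Records one order for the given user. Safe to call from many threads. */
    void record(String userId) {
        ordersByUser.computeIfAbsent(userId, k -> new LongAdder()).increment();
    }

    /** Returns the number of orders recorded so far for the given user. */
    long count(String userId) {
        LongAdder adder = ordersByUser.get(userId);
        return adder == null ? 0L : adder.sum();
    }

    public static void main(String[] args) {
        OrderCounters counters = new OrderCounters();
        counters.record("u1");
        counters.record("u1");
        System.out.println(counters.count("u1"));
    }
}
```

A read-write lock is the better fit when the guarded state is a larger structure that is read often and modified rarely.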
Check the thread pool configuration (especially corePoolSize/maxPoolSize)
Look for the stack that creates the threads:
"Thread-987" #1001 daemon prio=5 os_prio=0...
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:717)
at com.example.TaskProcessor.createNewThread(TaskProcessor.java:23)
FastThread[1]: upload a thread dump to get an automatically generated analysis report
# Count threads per state
grep "java.lang.Thread.State" thread_dump.txt | sort | uniq -c
# Find threads waiting for a specific lock
grep -B 1 "waiting to lock" thread_dump.txt
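When grep output gets too long to read, grouping the waiters by lock address shows which lock is the hottest. The following sketch does that in Java over the lines of a dump file; `LockContention` is a hypothetical name, and it only looks at `waiting to lock` lines, that is, contention on synchronized monitors.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LockContention {

    private static final Pattern WAITING = Pattern.compile("- waiting to lock <(0x[0-9a-f]+)>");

    /** Maps each lock address to the number of threads waiting to acquire it. */
    static Map<String, Integer> waitersByLock(List<String> dumpLines) {
        Map<String, Integer> waiters = new TreeMap<>();
        for (String line : dumpLines) {
            Matcher m = WAITING.matcher(line);
            if (m.find()) {
                waiters.merge(m.group(1), 1, Integer::sum);
            }
        }
        return waiters;
    }

    // Usage: java LockContention thread_dump.txt
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        waitersByLock(lines).forEach((lock, n) -> System.out.println(n + " waiting on " + lock));
    }
}
```

Searching the dump for `- locked <address>` with the top address then identifies the thread that holds it.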
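1. Scheduled capture: grab thread dumps automatically when CPU or memory usage spikes
2. Comparative analysis: compare dumps taken during normal and abnormal periods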
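3. Thread naming: give pool threads meaningful names through a custom ThreadFactory so they are easy to find in a dump, for example:
new ThreadPoolExecutor(..., new CustomThreadFactory("OrderProcessor"));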
4. Monitoring and alerting: set an alert on a thread count threshold (for example, > 500)
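The `CustomThreadFactory("OrderProcessor")` call above refers to a class that is not shown in this article. One possible shape for it is sketched below, together with a bounded pool and a simple thread count check. The pool sizes, queue capacity, rejection policy, and the `OrderPoolExample` class are all assumptions chosen for illustration, not recommended values.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Names threads "<prefix>-<n>" so they are easy to spot in a thread dump. */
class CustomThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger sequence = new AtomicInteger(1);

    CustomThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable task) {
        return new Thread(task, prefix + "-" + sequence.getAndIncrement());
    }
}

class OrderPoolExample {

    /** Builds a pool with a bounded queue so pending work cannot grow without limit. */
    static ThreadPoolExecutor newOrderPool() {
        return new ThreadPoolExecutor(
                4, 8,                                 // core and maximum pool size (assumed)
                60, TimeUnit.SECONDS,                 // idle timeout for non-core threads
                new ArrayBlockingQueue<>(100),        // bounded queue (assumed capacity)
                new CustomThreadFactory("OrderProcessor"),
                new ThreadPoolExecutor.CallerRunsPolicy()); // back-pressure when saturated
    }

    /** Returns true when the JVM's live thread count is above the threshold. */
    static boolean threadCountExceeds(int threshold) {
        return ManagementFactory.getThreadMXBean().getThreadCount() > threshold;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newOrderPool();
        pool.execute(() -> System.out.println(Thread.currentThread().getName()));
        pool.shutdown();
        System.out.println("over 500 threads: " + threadCountExceeds(500));
    }
}
```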
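Systematic thread dump analysis lets you quickly diagnose:
✅ Deadlock and livelock problems
✅ Resource contention bottlenecks
✅ Thread leak risks
✅ Misconfigured thread pools
✅ Latent defects in third-party libraries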
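That's all for today. If you found this useful, feel free to follow and like. If you spot anything lacking, please point it out in the comments; your support keeps me going!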