Arthas 3.0 New Features
===
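Arthas 3.0 is a major architectural overhaul. By introducing `termd`, it fully supports the telnet protocol, and by wrapping telnet in WebSocket it implements `Telnet over HTTP`, which makes online diagnosis possible. The experience is identical across all three access modes (local startup, remote telnet access, and online diagnosis): each supports auto-completion, highlighting, and similar features. It also supports `simultaneous online diagnosis by multiple users`.
### Online Diagnosis
The most important feature of Arthas 3.0: with the Arthas online diagnosis platform you no longer need to log in to the target machine. Arthas can be started and a diagnosis session opened with one click.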
![image](TODO image.png)
For detailed usage, see the [online diagnosis guide](https://github.com/alibaba/arthas/wiki/arthas_3_0/home).
### Pipe Support
Starting with 3.0, Arthas supports pipes. The first commands available are `grep`, `wc`, and `plaintext`.
```bash
java.vendor.url http://java.oracle.com/
java.vm.vendor Oracle Corporation
java.runtime.name Java(TM) SE Runtime Environment
sun.java.command org.apache.catalina.startup.Bootstrap start
java.class.path /Users/wangtao/work/ali-tomcat-home/ant-develop/output/build/bin/bootstrap.jar
java.vm.specification.name Java Virtual Machine Specification
java.vm.specification.version 1.8
java.awt.headless true
java.io.tmpdir /Users/wangtao/work/ali-tomcat-home/ant-develop/output/build/temp
java.vendor.url.bug http://bugreport.sun.com/bugreport/
java.awt.graphicsenv sun.awt.CGraphicsEnvironment
java.ext.dirs /Users/wangtao/Library/Java/Extensions:/Library/Java/JavaVirtualMachines/jdk1.
/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java
java.vm.name Java HotSpot(TM) 64-Bit Server VM
java.specification.version 1.8
$ sysprop | grep java | wc -l
36
```
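As a rough illustration of what the `grep` and `wc -l` stages compute, the following sketch filters and counts system properties in plain Java. The class `PropertyPipe` and its method are hypothetical and are not part of Arthas; the sketch also assumes one output line per property, which ignores the line wrapping visible in the terminal output above.

```java
import java.util.Properties;

// Hypothetical helper (not part of Arthas): mimics `sysprop | grep <keyword> | wc -l`.
final class PropertyPipe {
    // Counts the "key value" lines that contain the keyword.
    static long countMatching(Properties props, String keyword) {
        return props.entrySet().stream()
                .map(e -> e.getKey() + " " + e.getValue())   // one line per property (assumption)
                .filter(line -> line.contains(keyword))      // grep stage
                .count();                                    // wc -l stage
    }

    public static void main(String[] args) {
        System.out.println(countMatching(System.getProperties(), "java"));
    }
}
```

The printed count depends on the JVM it runs in, so it will generally differ from the `36` shown above.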
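### Startup Self-Check
Permission problems are common at startup, so the startup script now includes a self-check: it automatically determines whether the target process can be attached to and, if not, prints hints on how to proceed.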
```
[huxing.zhx@v125056161.bja /home/huxing.zhx]
$./as.sh 32260
The current user (huxing.zhx) does not match with the owner of process 32260 (admin).
To solve this, choose one of the following command:
1) sudo su admin && ./as.sh
2) sudo -u admin -EH ./as.sh
```
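The real check lives in the `as.sh` shell script. Purely to illustrate the comparison it performs, here is a minimal Java sketch that compares the current user with the owner of a process. The class `AttachPrecheck` and its method are hypothetical, and `ProcessHandle` requires Java 9 or later, which the original script does not depend on.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch of the pre-attach check; the actual logic is in as.sh.
final class AttachPrecheck {
    // Returns a hint when the current user differs from the process owner, or empty if they match.
    static Optional<String> mismatchHint(String currentUser, String processOwner, long pid) {
        if (currentUser.equals(processOwner)) {
            return Optional.empty();
        }
        return Optional.of(String.format(Locale.ROOT,
                "The current user (%s) does not match with the owner of process %d (%s).",
                currentUser, pid, processOwner));
    }

    public static void main(String[] args) {
        // Falls back to the current JVM's own pid when no argument is given.
        long pid = args.length > 0 ? Long.parseLong(args[0]) : ProcessHandle.current().pid();
        String me = System.getProperty("user.name");
        // The owner is only available when the operating system exposes it.
        Optional<String> owner = ProcessHandle.of(pid).flatMap(h -> h.info().user());
        owner.flatMap(o -> mismatchHint(me, o, pid)).ifPresent(System.out::println);
    }
}
```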
### Removing the Groovy Dependency
Groovy expressions were used heavily in Arthas 2.0, for example in watch expressions:
```bash
watch com.alibaba.sample.petstore.web.store.module.screen.ItemList add "params + ' ' + returnObj" params.size()==2
```
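Here both `"params + ' ' + returnObj"` and `params.size()==2` are evaluated by Groovy behind the scenes. When such expressions are run repeatedly in large numbers, Groovy creates many classloaders, fills up the perm area, and triggers FGC (full GC).
To avoid this, Arthas 3.0 replaces Groovy with OGNL, a more lightweight expression evaluation library, which eliminates the FGC risk caused by Groovy. However, this replacement means that custom scripts written in Groovy no longer work. This issue is left to be resolved later.
In 3.0 the syntax of the expression part of the watch command has changed slightly; see [here](https://alibaba.github.io/arthas/watch) for details.
### Improved RT Measurement Precision
Arthas 2.0 measured rt (response time) in `ms`. Calls to small methods that took less than a millisecond were all reported as 0ms, which caused problems such as the total trace time not matching the sum of the individual method times (some error does remain, mainly from the overhead of Arthas itself). Arthas 3.0 measures all rt in `ns`, capturing method cost precisely so that meaningless figures such as 0ms no longer appear.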
```
$ tt -l
INDEX TIMESTAMP COST(ms) IS-RET IS-EXP OBJECT CLASS METHOD
------------------------------------------------------------------------------------------------------------------------------------------------------------
1000 2017-02-24 10:56:46 808.743525 true false 0x3bd5e918 TestTraceServlet doGet
1001 2017-02-24 10:56:55 805.799155 true false 0x3bd5e918 TestTraceServlet doGet
1002 2017-02-24 10:57:04 808.026935 true false 0x3bd5e918 TestTraceServlet doGet
1003 2017-02-24 10:57:22 805.036963 true false 0x3bd5e918 TestTraceServlet doGet
1004 2017-02-24 10:57:24 803.581886 true false 0x3bd5e918 TestTraceServlet doGet
1005 2017-02-24 10:57:39 814.657657 true false 0x3bd5e918 TestTraceServlet doGet
```
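The difference between the two units can be sketched in a few lines of Java. The class `CostFormat` below is a hypothetical helper, not Arthas source code; it only contrasts truncating a measurement to whole milliseconds with rendering a nanosecond measurement in ms with six decimal places, similar to the `COST(ms)` column above.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not taken from Arthas.
final class CostFormat {
    // Whole-millisecond view: anything shorter than 1 ms collapses to 0.
    static long toWholeMillis(long nanos) {
        return nanos / 1_000_000L;
    }

    // Nanosecond measurement rendered in ms with six decimal places.
    static String toPreciseMillis(long nanos) {
        return String.format(Locale.ROOT, "%d.%06d", nanos / 1_000_000L, nanos % 1_000_000L);
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        new StringBuilder().append("x").toString();   // a call far shorter than 1 ms
        long cost = System.nanoTime() - start;
        System.out.println(toWholeMillis(cost) + " ms vs " + toPreciseMillis(cost) + " ms");
    }
}
```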
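### Filtering by Cost in watch/stack/trace
When tracing, a method's rt often spikes only intermittently. We only want to know where the time went when rt was high and do not care about calls with normal rt. Arthas 3.0 supports filtering by `#cost` (method execution time, in `ms`), so only trace paths that satisfy the condition are printed.
Usage: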
```
trace/watch/stack class_name method cost_expression
```
For example:
```
$ trace test.arthas.TestTraceServlet doGet #cost>800
Press Ctrl+C to abort.
Affect(class-cnt:1 , method-cnt:1) cost in 137 ms.
trace test.arthas.TestTraceServlet doGet #cost>800
`---thread_name=http-bio-8080-exec-9;id=6e;is_daemon=true;priority=5;TCCL=org.apache.catalina.loader.WebappClassLoader
`---[816.880001ms] test.arthas.TestTraceServlet:doGet()
+---[min=0.019223ms,max=1.192115ms,total=1.211338ms,count=2] java.lang.System:currentTimeMillis()
+---[1.407006ms] test.arthas.TestTraceServlet:call1()
+---[381.970892ms] test.arthas.TestTraceServlet:callhsfL1()
+---[0.018866ms] test.arthas.TestPathTrace:<init>()
+---[375.753301ms] test.arthas.TestPathTrace:callPathTrace()
+---[12.352252ms] test.arthas.TestTraceServlet:call3()
+---[2.758025ms] test.arthas.TestTraceServlet:call4()
+---[1.246057ms] test.arthas.TestTraceServlet:call5()
+---[10.306568ms] test.arthas.TestTraceServlet:call6()
+---[8.891933ms] test.arthas.TestTraceServlet:call7()
+---[4.030325ms] test.arthas.TestTraceServlet:call8()
+---[6.51316ms] test.arthas.TestTraceServlet:call9()
+---[0.059405ms] javax.servlet.http.HttpServletResponse:getWriter()
+---[0.013107ms] java.lang.StringBuilder:<init>()
+---[min=0.004892ms,max=0.06357ms,total=0.100672ms,count=3] java.lang.StringBuilder:append()
+---[0.018255ms] java.lang.StringBuilder:toString()
`---[0.028812ms] java.io.PrintWriter:write()
```
The command above produces output only when `test.arthas.TestTraceServlet#doGet` takes longer than 800ms.
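The idea behind a cost filter can be sketched as follows: measure the call, then report it only if the measured cost passes the threshold. This is a simplified stand-in under stated assumptions, not how Arthas instruments methods; the class `CostFilter` and both method names are hypothetical.

```java
import java.util.Locale;
import java.util.function.Supplier;

// Hypothetical illustration of a `#cost>threshold` style filter.
final class CostFilter {
    // True when the measured cost, converted to ms, is above the threshold.
    static boolean exceeds(long costNanos, double thresholdMs) {
        return costNanos / 1_000_000.0 > thresholdMs;
    }

    // Runs the call and prints its cost only when the threshold is exceeded.
    static <T> T timed(String name, double thresholdMs, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            long cost = System.nanoTime() - start;
            if (exceeds(cost, thresholdMs)) {
                System.out.printf(Locale.ROOT, "[%.6fms] %s%n", cost / 1_000_000.0, name);
            }
        }
    }
}
```

With a threshold of 800, a call that finishes in a few milliseconds returns its result silently, while a slow call also prints one line with its cost.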
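### trace Command Improvements
#### Automatic highlighting of the most time-consuming method call
The trace command now automatically highlights the most time-consuming method call.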
![image](TODO/image.png)
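#### Multi-level trace with conditional filtering
By default, trace reports the cost of only the first level of method calls, which is sometimes not enough to locate the problem. But if multiple levels are expanded and the cost of every method is measured, the number of methods grows rapidly and the overhead of trace increases substantially.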
TODO
#### Displaying current thread information and the EagleEye traceId
```
trace test.arthas.TestTraceServlet doGet
`---thread_name=http-bio-8080-exec-10;id=da;is_daemon=true;priority=5;TCCL=org.apache.catalina.loader.WebappClassLoader;trace_id=1e09489014879085429791006d969d
```
### Operating on System Properties with sysprop
The sysprop command can list all system properties, and can view or modify a specific property.
```
$ sysprop
...
os.arch x86_64
java.ext.dirs /Users/wangtao/Library/Java/Extensions:/Library/Java/JavaVirtualMachines/jdk1.
8.0_51.jdk/Contents/Home/jre/lib/ext:/Library/Java/Extensions:/Network/Library
/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java
user.dir /Users/wangtao/work/ali-tomcat-home/ant-develop/output/build
catalina.vendor alibaba
line.separator
java.vm.name Java HotSpot(TM) 64-Bit Server VM
file.encoding UTF-8
org.apache.tomcat.util.http.ServerCookie.ALLOW_EQUA true
LS_IN_VALUE
com.taobao.tomcat.info Apache Tomcat/7.0.70.1548
java.specification.version 1.8
$ sysprop java.version
java.version=1.8.0_51
$ sysprop production.mode true
Successfully changed the system property.
production.mode=true
```
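Inside the target JVM, viewing and changing a property corresponds to the standard `System.getProperty` and `System.setProperty` calls. The sketch below shows that correspondence only; the class `SysPropDemo` is hypothetical, and how Arthas itself implements the command is not shown in this document.

```java
// Hypothetical illustration using the standard java.lang.System API.
final class SysPropDemo {
    // Sets the property and returns its previous value, or null if it was unset.
    static String change(String key, String value) {
        return System.setProperty(key, value);
    }

    public static void main(String[] args) {
        // Equivalent in spirit to `sysprop java.version`.
        System.out.println("java.version=" + System.getProperty("java.version"));
        // Equivalent in spirit to `sysprop production.mode true`.
        change("production.mode", "true");
        System.out.println("production.mode=" + System.getProperty("production.mode"));
    }
}
```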
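### Specifying the Sampling Interval in thread
By default, the thread command computes per-thread CPU usage by sampling each thread's CPU usage over 100ms and calculating its share of CPU consumption. A 100ms interval is sometimes too short to reveal the problem. In Arthas 3.0 the thread command accepts a sampling interval (in `ms`), so you can observe the CPU consumption share over a period of any length.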
```
$ thread -i 1000
Threads Total: 74, NEW: 0, RUNNABLE: 17, BLOCKED: 0, WAITING: 15, TIMED_WAITING: 42, TERMINATED: 0
ID NAME GROUP PRIORITY STATE %CPU TIME INTERRUPTED DAEMON
78 com.taobao.config.client.timer main 5 TIMED_WAITING 22 0:0 false true
92 Abandoned connection cleanup thread main 5 TIMED_WAITING 15 0:2 false true
361 as-command-execute-daemon system 10 RUNNABLE 14 0:0 false true
67 HSF-Remoting-Timer-10-thread-1 main 10 TIMED_WAITING 12 0:2 false true
113 JamScheduleThread system 9 TIMED_WAITING 2 0:0 false true
14 Thread-3 main 5 RUNNABLE 2 0:0 false false
81 com.taobao.remoting.TimerThread main 5 TIMED_WAITING 2 0:0 false true
104 http-bio-7001-AsyncTimeout main 5 TIMED_WAITING 2 0:0 false true
123 nioEventLoopGroup-2-1 system 10 RUNNABLE 2 0:0 false false
127 nioEventLoopGroup-3-2 system 10 RUNNABLE 2 0:0 false false
345 nioEventLoopGroup-3-3 system 10 RUNNABLE 2 0:0 false false
358 nioEventLoopGroup-3-4 system 10 RUNNABLE 2 0:0 false false
27 qos-boss-1-1 main 5 RUNNABLE 2 0:0 false true
22 EagleEye-AsyncAppender-Thread-BizLog main 5 TIMED_WAITING 1 0:0 false true
```
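Interval-based sampling of this kind can be approximated with the standard `ThreadMXBean` API: read each thread's CPU time, wait for the interval, read again, and divide each delta by the total. This is a minimal sketch that assumes the share is computed relative to the CPU time consumed by all sampled threads. The class `CpuSampler` is hypothetical, and the exact formula Arthas uses is not stated here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sampler; an approximation of interval-based CPU sampling.
final class CpuSampler {
    // Returns thread id -> share (0..100) of the CPU time used by all sampled threads in the interval.
    static Map<Long, Double> sample(long intervalMs) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Long, Double> share = new HashMap<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return share;                              // nothing to measure on this JVM
        }
        Map<Long, Long> before = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            before.put(id, mx.getThreadCpuTime(id));   // -1 if disabled or the thread is gone
        }
        try {
            Thread.sleep(intervalMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();        // keep the interrupt flag and use what we have
        }
        Map<Long, Long> delta = new HashMap<>();
        long total = 0;
        for (Map.Entry<Long, Long> e : before.entrySet()) {
            long after = mx.getThreadCpuTime(e.getKey());
            if (e.getValue() < 0 || after < 0) {
                continue;                              // skip threads without a valid reading
            }
            long d = Math.max(0, after - e.getValue());
            delta.put(e.getKey(), d);
            total += d;
        }
        for (Map.Entry<Long, Long> e : delta.entrySet()) {
            share.put(e.getKey(), total == 0 ? 0.0 : 100.0 * e.getValue() / total);
        }
        return share;
    }
}
```

Calling `CpuSampler.sample(1000)` is the rough counterpart of `thread -i 1000`; a longer interval smooths out short bursts.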