Sunday, March 27, 2016

Log4jv2 Maximum Performance without loosing location info

Apache Log4j (version 1.x) had been around for a while now and is considered as the defacto standard in log management among the Java Community. For typical web applications the performance levels provided by this version could be good enough. But when it comes to applications where millisecond level or beyond performance levels are required it is always a concern how fast your log management system should be. The more logs we add to code it makes production issue debugging easy. But when the logging system is not fast enough it makes your code executions slow. Log4j supports log levels to address this (ALL,DEBUG,ERROR,INFO etc.). If you have properly set log levels to your log messages when you are debugging your application you can set a log level which prints lots of logs and when you run the system in production you can set a log level which prints less number of logs to make system works fast. When a production bug is raised and you have been given log files, how many times we have wished if our code had more logs? More logs means a performance degradation. So we always try to balance the number of log lines versus speed of code execution.
In Log4j version 1.x a bit of performance gain can be achieved using Async Appenders. Although it could be a good way to get some quick boost to the logging system without upgrading to version 2.x, the gain can be achieved is very limited.
Apache released the Log4j version 2.x recently which introduced Async Loggers (this should not be confused with Async Appenders). Async loggers made a huge impact on performance of the Log4J system due to the outstanding performance of the Disrupter, a lock free inter thread communications library from LMAX. Async loggers can boost the speed of logging up to 68 times compared to log4J version 1.x synchronous loggers.
But for this exceptional performance gain comes at the cost of restricting some formatting options. Simply put "Location information" should not be used in formatting. And %C or $class%F or %file%l or %location%L or %line%M or %method should not be used. Log4j uses stack trace to figure out the class name and line number of the log and this cannot be done once log entry is handed over to the logger thread. Hence Log4j need to wait till stack trace snapshot is taken before passing it to the internal thread before returning. This makes the logging slow. Hence for the real top speed of logging we cannot print things like class name and line number. Any experienced developer knows that it is useless to have a log file without class names and line numbers of the log entries.
Obvious alternative is adding class name and line number to the log line itself. In this way log4j system doesn't need to find them at run time making performance complications. Example,
public Order processNewOrder(NewOrderTRSRequest newOrderRequest) {
        logger.info("[NormalOrderController.java:261] processNewOrder() called. Time passed ={}ms", NanoWatch.getDuration());
        logger.info("[NormalOrderController.java:262] New Order Waiting to obtain the lock - {}", newOrderRequest);
        long startTime = System.currentTimeMillis();
        Order order;
As you can see, NormalOrderController.java is the class name and 261, 262 are the line numbers of two log lines here.
This might look like a very stupid idea. Obvious questions raised are; we have a huge code base with thousands of log lines, how can we add class name and line number to each of them? Not only that when we add or remove lines to or from the classes, the hard coded line numbers are no longer valid, and have to be adjusted. This is why we developed an Intellij IDEA plugin to do this. Using this plugin it is possible to add classnames and line numbers and correct them in one click. This plugin is freely available in Intellij IDEA plugin repository for anyone to download. If you are interested in the source code, feel free to download it from here. Although this works only on Intellij IDEA it should be a simple task to port this to other IDEs like Eclipse and NetBeans.
Finally with this we have the super performance of log4j async loggers 'with location information'.
Note: It was a debate whether to develop an IDEA plugin or Maven plugin. I believe same result can be achieved in either way. If you prefer not to have classname/line numbers in the code (hence in the source code repository) better option is to develop a maven plugin so that class names and line numbers get added to the releases only. For your local debugging you can enable the location info in log4j. But if you tend to do performance testing quite often while optimising code then adding classname and line number to the code using this IDEA plugin make sense.

No comments: