Sunday, March 27, 2016

JBoss EAP 6.x MDB FIFO Configuration

MDBs (Message Driven Beans) provide a perfect way to implement asynchronous request processing in J2EE applications. For most applications, the sequence in which such messages get processed is not very important. But there are situations where we need to make sure that request processing is strictly FIFO (First In First Out). JBoss EAP 6.x Message Driven Beans follow an almost-FIFO path, but unfortunately not always. When a burst of messages arrives at the JMS provider (HornetQ), there is a chance that messages get pushed to the MDBs non-sequentially. As per Red Hat: "The JBoss-EAP6's default JMS broker, HornetQ supports preserve message ordering complying JMS 1.1 specification. However this does not always guarantee the "strict order" and is the expected behaviour."
There are two ways to address this issue in EAP 6.4 (i.e. to make sure that message delivery to the MDBs is FIFO).
Auto Group Feature
 <connection-factory name="RemoteConnectionFactory">
      <connectors>
          <connector-ref connector-name="netty"/>
      </connectors>
      <entries>
          <entry name="java:jboss/exported/jms/RemoteConnectionFactory"/>
      </entries>
      <auto-group>true</auto-group>
 </connection-factory>

This is the easiest way to implement FIFO: just add the auto-group entry to your connection factory. Please note that the official Red Hat documentation has a typo and says "autogroup"; the correct setting is "auto-group". With this setting, the HornetQ JMS provider makes sure that the messages pushed to the MDBs are strictly FIFO. It is important to understand that only one instance of your MDB pool will be used when you set auto-group. (This is obviously the way it should be. If messages were passed to more than one MDB instance, then even if the instances were fired in FIFO order it would be useless, because each runs in its own thread and there is no guarantee that the first one finishes before the second.)
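For context, a consuming MDB looks something like the sketch below. This is only an illustration; OrderMDB and the queue name queue/OrderQueue are made-up examples. The FIFO discussion here is about the order in which the container hands messages to instances of such a bean.

    import javax.ejb.ActivationConfigProperty;
    import javax.ejb.MessageDriven;
    import javax.jms.Message;
    import javax.jms.MessageListener;

    @MessageDriven(activationConfig = {
            @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
            @ActivationConfigProperty(propertyName = "destination", propertyValue = "queue/OrderQueue")
    })
    public class OrderMDB implements MessageListener {
        @Override
        public void onMessage(Message message) {
            // Process the message. With auto-group (or JMSXGroupID) only one
            // instance from the MDB pool receives the grouped messages, in order.
        }
    }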
The problem with the auto-group feature is that the configuration is at the connection-factory level. This means that if you have more than one MDB, all of them become strictly FIFO and, as explained above, only one instance of each pool is going to be used. If your application needs one or a few MDBs to be strictly FIFO while the other MDBs stay normal, with messages processed by a pool of instances, then auto-grouping is not the ideal solution. The best solution is to use the JMSXGroupID JMS property to set a group id at the client application level (in other words, at the message producer level).
JMS Property JMSXGroupID
 message.setStringProperty("JMSXGroupID", "Group-0");  
Please note this is not a JBoss-level setting. It should be done at the JMS message producer level (e.g. the application which sends JMS messages to JBoss), and you can set any string; there is no need to specifically use "Group-0". In this solution, too, only one MDB instance from the pool is going to be used. But the other MDBs, whose messages do not carry this property, are unaffected and work in the usual way with a pool of instances. This way you can have some of your MDBs following strict FIFO while others work with a pool of instances. A minimal producer sketch is shown below.
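In the sketch, the JNDI names jms/RemoteConnectionFactory and jms/queue/OrderQueue are hypothetical; use whatever your client actually looks up.

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.naming.InitialContext;

    public class GroupedProducer {
        public static void main(String[] args) throws Exception {
            InitialContext ctx = new InitialContext(); // assumes jndi.properties points at the JBoss server
            ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/RemoteConnectionFactory");
            Connection connection = cf.createConnection();
            try {
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                Queue queue = (Queue) ctx.lookup("jms/queue/OrderQueue");
                MessageProducer producer = session.createProducer(queue);
                for (int i = 0; i < 10; i++) {
                    TextMessage message = session.createTextMessage("order-" + i);
                    // Messages sharing the same group id go to a single consumer, in order
                    message.setStringProperty("JMSXGroupID", "Group-0");
                    producer.send(message);
                }
            } finally {
                connection.close();
            }
        }
    }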

Socket Server for Log4jv2 Socket Appender

Log4Jv2 supports different types of appenders. The Socket Appender can be used to send log messages to a remote machine where a socket server is running. There are a couple of socket server applications available which work with Socket Appenders.
  • SimpleSocketServer - This comes with the log4j-1.2.x.x.jar file and you can start it like below,
            java -classpath <path>/log4j-1.2.17.jar org.apache.log4j.net.SimpleSocketServer 4712 log4j-server.properties
  • Chainsaw V2 - This is an elegant application with a proper UI.
We had a requirement to integrate the logs into our own monitoring application running on a remote machine, so we had to write our own server. The code sample below elaborates how to read logs and extract all the details of a log entry. Please note that this is not a proper server implementation; it is written only to elaborate how a log event can be received and its information extracted.
You will have to add the below dependencies to your POM file if your application is Maven based. Otherwise you can download these from the Log4j download site and include them in your CLASSPATH.
        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>1.2.17</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>2.4</version>
        </dependency>
        <dependency>
            <groupId>com.lmax</groupId>
            <artifactId>disruptor</artifactId>
            <version>3.2.0</version>
        </dependency>
package com.directfn.oms.integration.sim;

/**
 * Created by dasunperera on 10/8/15.
 */

import org.apache.logging.log4j.core.async.RingBufferLogEvent;
import java.io.ObjectInputStream;
import java.net.ServerSocket;
import java.net.Socket;

class TCPServer {
    public static void main(String argv[]) throws Exception {
        // Listen for a single Socket Appender connection on port 9500
        ServerSocket welcomeSocket = new ServerSocket(9500);
        Socket socket = welcomeSocket.accept();
        ObjectInputStream inputStream = new ObjectInputStream(socket.getInputStream());
        while (true) {
            // Each serialized log event is read back as an object
            RingBufferLogEvent loggingEvent = (RingBufferLogEvent) inputStream.readObject();
            System.out.println("Logger Name: " + loggingEvent.getLoggerName());
            System.out.println("Level: " + loggingEvent.getLevel());
            System.out.println("Message: " + loggingEvent.getMessage().getFormattedMessage());
            System.out.println("Thread Name: " + loggingEvent.getThreadName());
            System.out.println("Date:Time: " + loggingEvent.getTimeMillis()); // Comes in milliseconds and needs to be converted to a date
        }
    }
}
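As the comment above says, getTimeMillis() comes in epoch milliseconds; converting it to a readable timestamp is straightforward, for example:

    // Convert the epoch milliseconds of the log event into a readable timestamp
    String timestamp = java.time.Instant.ofEpochMilli(loggingEvent.getTimeMillis())
            .atZone(java.time.ZoneId.systemDefault())
            .toLocalDateTime()
            .toString();
    System.out.println("Date:Time: " + timestamp);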

Git [for subversion users]

Git is a source code management system for software development. It was initially developed by Linus Torvalds when he found that the similar tools available were not meeting his requirements for Linux kernel development. The main differentiators of Git compared to similar tools are speed, data integrity, and, most importantly, the support for distributed and non-linear workflows. Popular tools like CVS and Subversion are by design non-distributed: they have a central repository and everybody works with that. In the case of Git there can be many repositories, and it supports collaboration between repositories. The local copy itself is a fully-fledged repository which has the complete history and all the version tracking commands and tools. In the case of Subversion, the local copy is just a set of files with some metadata used to figure out which local changes, if any, have been made.
In a recent video I watched (a Google Tech Talk), Linus explains Git from a design point of view. In it he talks about things like speed, data integrity, merge capabilities etc., but the most notable of all was the distributed nature of Git and why such behaviour is needed [at least for the development of the Linux kernel]. How large open source projects work is very interesting, and he explains it very clearly in this presentation.
Large open source projects receive contributions from thousands of developers all over the world; they are not all experts, they do not all have the same beliefs and styles of work, and most importantly they do not know each other well. So how do such projects proceed, and how do they make sure the code is not broken by somebody? This all happens in a model which Linus nicely explains, "the network of trust", using his own project, Linux kernel development.
What is the network of trust?
As Linus explains, he trusts a small set of people because he knows them personally and has seen their coding and the quality of their work. So when he gets an enhancement done by one of them he knows it works (the 'trust'), and with a limited amount of review he can merge the changes into the main code base. Each of these trusted people also has their own set of trusted people and works in the same way. This continues down many levels, and this is what is called the network of trust.
If you are a node in this network of trust, you know the people below you and what capabilities they have. So when they make changes you know how much to test and how much to review before passing them upwards in this trust chain. You also know what sort of expectation the person above you has on this chain of trust, so before passing anything to the top you make sure everything works. This goes on, and using this model thousands of developers work on one project based on the network of trust. This is how the Linux kernel gets enhanced and how many other open source projects actually progress with thousands of developers around the world.
Can Subversion support a network of trust?
Well, it can, but not very easily. Subversion and CVS are both centralized repositories. The closest thing you can do is to have branches for various sub-groups to work in. But the network of trust is not a two-level concept; it can go many more levels deep, and there is nothing like sub-branches or sub-sub-branches in Subversion or in similar tools. You could have a separate repository for each node in the network of trust, but Subversion does not have tools to merge between repositories to implement this model. So, with all respect to Subversion, it is fair to say that Subversion is not the right tool for projects which need a network of trust.
What makes Git suitable for Network of Trust?
In Git even your local copy is a fully-fledged repository. If you are a node in the network of trust for a project, you can open your repository to the people you trust. They take a clone of your repo and create their own repos, to which they in turn give access to the people they trust. The people you trust can push their work to your repo using Git commands, and once you have done the due diligence and made sure everything works, you can push it to the repository of the node above you, to whom you are a trusted person. So the features of Git are exactly what is needed to support this network of trust.
Does this network of trust model fit all projects?
Well, what Linus does definitely fits (otherwise he wouldn't have bothered to write Git!). Other large open source projects also fit. They may not be as religious as Linus when it comes to network-of-trust-based work coordination, but they work in more or less the same fashion, so Git fits them. What about a project with 4 people? Well... that is the main subject of this write-up.
Are you saying Git is of no use for small projects?
In my humble view, the answer is 'it depends'.
Network of trust model support is the key feature of Git, but there are more. Speed is also an important feature: Git is definitely much faster than Subversion when it comes to commands. But 1 second becoming 10 milliseconds makes little difference when it comes to a code commit. Data integrity... well, how many of you have ever found a Subversion repo corrupted? Even if it happens, we have backups of the repo and all developers have copies on their local hard disks, so for me that is still not a solid reason to jump to Git.
But there is one: how well Git merges code. Subversion really sucks when it comes to code merges.
What is the problem in subversion when it comes to code merge?
In Subversion, when you have a branch and start making changes to it, the code in the branch diverges from the trunk. Branching is done when you want to work on a feature or a bug fix without disturbing the people working on the trunk. But finally we need to merge things back to the trunk so the main code base is up to date. As long as you merge a branch to the trunk just once, everything is fine in Subversion. Problems start to crop up when you merge the branch to the trunk once and then want to merge the same branch to the trunk again. Subversion doesn't keep track of the changes that have already been merged to the trunk, so it cannot merge only what has not been merged before. This was the case up until Subversion 1.5. Version 1.6 can manage this as per the documentation, but I believe it is still incomplete. For example, if you delete a file and recreate it in a branch, Subversion still cannot merge it to the trunk properly. By the way, these problems exist not only between trunk and branches but between two branches as well.
When it comes to Git, merging really works! Of course, in the network of trust model, merging is the name of the game. If you are really in pain with Subversion merge issues, that alone is a good enough reason to say goodbye to Subversion. Git is that good at merging.
Didn't we learn branching is evil?
I tend to follow Martin Fowler a lot, and his Continuous Integration concepts in particular. Martin discourages branching; his motto is "branching is evil", hence we should avoid it by working on the trunk as much as we can. As he explains, when you are on a branch you are away from integrating with the rest of the code, and the longer you stay away, the higher the chance that you are breaking the code. When you code on the trunk, which has all the changes from all developers, you know soon if you break something. Continuous Integration means integrating code as frequently as possible; when you make all changes on the trunk you have the maximum continuous integration.
But branching is something you cannot completely avoid; you have to branch when you want to do something without affecting the trunk for a limited duration. The point is that during the branch's lifetime you lose continuous integration. So to branch, you need a reason that has more merit than keeping continuous integration for that period. This decision has to be taken by somebody who is mature enough to weigh the pluses and minuses for that particular case.
Given this "branches are evil" concept, the obvious issue with Git is that everybody has a local copy (not just a branch; it is a completely separate repository for each person) and works on that. This is the complete opposite of continuous integration. In fact, the network of trust is the complete opposite of continuous integration; in continuous integration your motto is to test and prove things work rather than just trust.
Can't we really have both (Network of Trust and Continuous Integration)?
Well... maybe, maybe not. When you try to have both, you really have neither. But it is worth mentioning that Git can be used in a centralized way; this means you commit to the main repository as frequently as you should. But Git is really built for the network of trust model, so it is always tempting to follow that model when you use Git.
Network of Trust model versus Continuous Integration
The Git versus Subversion discussion is really about which model you want to follow. If you think your project fits the network of trust, then using Git is a no-brainer. If you want a centralized repository where the team commits as frequently as they can, and continuous integration checks for integration issues and notifies the team on the spot, then you are better off staying with Subversion.
My humble conclusion: If you are on a huge project modeled as a network of trust, then using Git is a no-brainer; go for it, there is nothing other than Git for your project. If you are on a very small project where you work on the trunk almost all the time, then it is easier to manage things with simple Subversion than with Git. If you are on a project that needs merging quite often, then Git comes in handy even if you do not use all the jazzy features of Git. If you are working with developers of average IQ, do not make things complex by bringing Git into your project; you are better off staying with Subversion.

Log4jv2 Maximum Performance without losing location info

Apache Log4j (version 1.x) has been around for a while now and is considered the de facto standard in log management among the Java community. For typical web applications, the performance levels provided by this version could be good enough. But when it comes to applications where millisecond-level (or better) performance is required, how fast your log management system is becomes a real concern. The more logs we add to code, the easier production issue debugging becomes; but when the logging system is not fast enough, it makes your code execution slow. Log4j supports log levels to address this (ALL, DEBUG, ERROR, INFO etc.). If you have properly set log levels on your log messages, then when debugging your application you can set a log level which prints lots of logs, and when running the system in production you can set a log level which prints fewer logs to keep the system fast. When a production bug is raised and you have been given the log files, how many times have we wished our code had more logs? More logs mean performance degradation, so we always try to balance the number of log lines against the speed of code execution.
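As a quick illustration of the level trade-off (the class and messages below are made up), a DEBUG line written with parameterized messages costs almost nothing when the configured level is INFO:

    import org.apache.logging.log4j.LogManager;
    import org.apache.logging.log4j.Logger;

    public class OrderService {
        private static final Logger logger = LogManager.getLogger(OrderService.class);

        public void process(String orderId) {
            // Skipped entirely (no string formatting) when the configured level is INFO or above
            logger.debug("Full order details for {}", orderId);
            // Printed in production as well
            logger.info("Order {} accepted", orderId);
        }
    }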
In Log4j version 1.x a bit of performance gain can be achieved using Async Appenders. Although this can be a good way to get a quick boost to the logging system without upgrading to version 2.x, the gain that can be achieved is very limited.
Apache released Log4j version 2.x recently, which introduced Async Loggers (not to be confused with Async Appenders). Async Loggers made a huge impact on the performance of the Log4j system thanks to the outstanding performance of the Disruptor, a lock-free inter-thread communication library from LMAX. Async Loggers can boost the speed of logging up to 68 times compared to Log4j version 1.x synchronous loggers.
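For reference, in Log4j 2.x all loggers are made asynchronous by starting the JVM with the context selector system property shown below; this is also why the Disruptor dependency appears in the POM snippet earlier.

    -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector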
But this exceptional performance gain comes at the cost of restricting some formatting options. Simply put, "location information" should not be used in formatting: %C or %class, %F or %file, %l or %location, %L or %line, and %M or %method should not be used. Log4j uses a stack trace to figure out the class name and line number of the log call, and this cannot be done once the log entry has been handed over to the logger thread. Hence Log4j needs to wait until a stack trace snapshot is taken before passing the entry to the internal thread and returning, and this makes the logging slow. So for the real top speed of logging we cannot print things like class name and line number. Any experienced developer knows that a log file without the class names and line numbers of the log entries is close to useless.
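For example, a pattern along the lines of the one below (shown only as an illustration) uses no location converters and is therefore safe for Async Loggers; %logger prints the logger name, which is available without taking a stack trace:

    %d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n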
The obvious alternative is adding the class name and line number to the log line itself. This way the Log4j system doesn't need to find them at run time, avoiding the performance penalty. Example:
public Order processNewOrder(NewOrderTRSRequest newOrderRequest) {
        logger.info("[NormalOrderController.java:261] processNewOrder() called. Time passed ={}ms", NanoWatch.getDuration());
        logger.info("[NormalOrderController.java:262] New Order Waiting to obtain the lock - {}", newOrderRequest);
        long startTime = System.currentTimeMillis();
        Order order;
As you can see, NormalOrderController.java is the class name, and 261 and 262 are the line numbers of the two log lines here.
This might look like a very stupid idea. The obvious questions are: we have a huge code base with thousands of log lines, how can we add a class name and line number to each of them? And when we add or remove lines in a class, the hard-coded line numbers are no longer valid and have to be adjusted. This is why we developed an IntelliJ IDEA plugin to do it. Using this plugin it is possible to add class names and line numbers, and to correct them, in one click. The plugin is freely available in the IntelliJ IDEA plugin repository for anyone to download. If you are interested in the source code, feel free to download it from here. Although this works only on IntelliJ IDEA, it should be a simple task to port it to other IDEs like Eclipse and NetBeans.
Finally, with this, we have the super performance of Log4j Async Loggers 'with location information'.
Note: There was a debate over whether to develop an IDEA plugin or a Maven plugin. I believe the same result can be achieved either way. If you prefer not to have class names/line numbers in the code (and hence in the source code repository), the better option is to develop a Maven plugin so that class names and line numbers get added to the releases only; for your local debugging you can enable the location info in Log4j. But if you tend to do performance testing quite often while optimising code, then adding the class name and line number to the code using this IDEA plugin makes sense.