Edited on November 23, 2016
Being able to analyze data in real time while an application is running is great. In the early cycle (development and test environments), on-demand sampling and instrumentation may be enough to diagnose your performance problems, because you control the moment at which the problem occurs.
But it's not enough.
On shared test environments (integration phases) and production, there's no telling when the hiccoughs are going to start. Therefore, it's important to set up round-the-clock monitoring on these environments so as to be able to go back in time and perform delayed or even post-mortem analysis.
In addition, the level of comfort you get from having a collector harvesting thread stacks continuously is terrific, since you only have to configure the connection to your JVMs once. Then all you have to do is query your Persistence Store to analyze past runtime behaviour.
By default, your thread dumps will be kept for 24 hours in the store, but you can increase that value or even let the collection grow endlessly (until you run out of disk space). See the "Disk space budget" section down below for more information.
djigger's collector requires a command line option (system property) to be set in order to load its configuration and boot. However, we've packed everything you need in the startCollector script (in the bin folder), so you can focus on configuring your JVM connections and quickly get a collector instance running.
The command line option below points to the collector's configuration.
That XML file contains the host name and port number of the target mongoDB store, as well as the TTL (Time To Live) value, in seconds, which defines how long thread dumps are kept. It also contains a list of paths to the files and folders in which you'll describe the connections to your monitored JVMs (see the "Connection list" paragraph below for more details):
<Collector>
	<servicePort>8080</servicePort>
	<dataTTL>86400</dataTTL>
	<db host="localhost" port="27017" />
	<connectionFiles>
		<string>../conf/Connections.xml</string>
		<string>../conf/Connections.csv</string>
	</connectionFiles>
</Collector>
There are two types of structure and syntax for specifying JMX connections. One is an XML tree structure that allows you to use various connectors (agent connections with subscriptions, tail-kill connectors, etc.), and the other is a simple flat CSV structure, designed to make life easy for those who simply want to list a bunch of JMX connections.
Templates of those files can be found in the 'conf' folder, ready to use. You can use as many files as you want, but you need to make sure that they're correctly referenced in the collector's config (Collector.xml).
The XML file contains a hierarchy of generic connection groups. At each level, you can set as many attributes as you want, which allows you to organize your connections as you wish (for example, you can start with attributes that represent the environment the JVM is part of, then, at a lower level, the application name, and at the last level, node-specific attributes). Here's what that structure looks like:
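The sketch below is only a hypothetical illustration of such a hierarchy; the element names, attribute names (env, application, node) and connection properties are assumptions, not the actual djigger schema. Refer to the template in the 'conf' folder for the authoritative structure.

```xml
<!-- Hypothetical sketch only: element and attribute names are assumptions.
     Check the Connections.xml template in the conf folder for the real schema. -->
<ConnectionGroup>
	<attributes>
		<env>PROD</env>
	</attributes>
	<group>
		<attributes>
			<application>OrderService</application>
		</attributes>
		<connection>
			<attributes>
				<node>node01</node>
			</attributes>
			<host>appserver01</host>
			<port>9999</port>
			<samplingRate>1000</samplingRate>
		</connection>
	</group>
</ConnectionGroup>
```

The idea is that every attribute set at a group level is inherited by the connections underneath it, so node-specific entries stay minimal.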
The flat CSV structure offers an easier way to define connections. In the CSV file, each line defines a different connection, and hence a different JVM. The first 5 columns are reserved for the connection properties, but starting at the 6th column you can set as many key-value pairs as you want:
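As an illustration, and assuming the five reserved columns are host, port, username, password and sampling interval (check the Connections.csv template in the 'conf' folder for the authoritative column order), a line could look like this:

```csv
appserver01,9999,monitor,secret,1000,Environment,PROD,Application,OrderService,Node,node01
```

Here the columns from the 6th onwards form the key-value pairs Environment=PROD, Application=OrderService and Node=node01, which you'll later use as query filters.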
Regardless of the connection file structure you go for, don't neglect these attributes, as they're going to be very important at query time. Along with the time window, they're the primary filter in your searches, ensuring that you don't load the entire store into your client's memory.
Simply click on the Store radio button on your connection screen and click Open. You'll notice an additional filter above the Thread timeline pane in the top part of the screen, with default time-based selectors.
Much like the stacktrace and node filters described in section 2 of this wiki, you can query your connection attributes (the ones you set in your XML or CSV connections config file) to find your stacktraces.
But here you can also use our time-based selectors to investigate a specific time window.
The same operators are available as with the stacktrace and node filters (and, not, or, (), etc).
Depending on the sampling rate you chose for each individual connection in your XML or CSV file, you'll harvest more or fewer thread dumps. Since your client's memory is limited, be careful to avoid selecting very large time windows, and also avoid attribute-less queries or queries with combinations of attributes that aren't selective.
In most cases, you only need a few samples (a few minutes worth of data) harvested on a single JVM.
However, if you insist on retrieving large batches of thread dumps, you can increase your client's memory size via the java -Xmx option. All you have to do is add this option to the JAVA_OPTS variable of the startClient file in your bin folder:
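For example, to allow the client to use up to 4 GB of heap, the JAVA_OPTS line in startClient could be extended along these lines (the exact shape of the variable assignment may differ between the shell and batch versions of the script):

```shell
# In bin/startClient: append -Xmx to the existing JAVA_OPTS variable
JAVA_OPTS="$JAVA_OPTS -Xmx4g"
```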
Here are a few bits of information on how we implemented certain aspects of the collector. They could be useful to you if you're going to use the collector extensively.
The TTL value set in the collector's config file is translated into an expiration constraint at index level in mongoDB. You can read https://docs.mongodb.org/manual/core/index-ttl/ to understand how that works. This way, no tedious housekeeping jobs have to be executed; mongoDB takes care of it for you.
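Assuming the thread dump collection is called "threaddumps" (the actual collection name may differ in your store), the expiring index the collector creates for a 24-hour TTL would be equivalent to running the following in the mongo shell:

```javascript
// Illustrative only: a TTL index on the timestamp field, expiring documents
// 86400 seconds (24 h) after their timestamp. The collection name is an assumption.
db.threaddumps.createIndex({ "timestamp": 1 }, { expireAfterSeconds: 86400 })
```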
There are two key collections in the store: the stacktrace collection and the thread dump collection.
The stacktrace collection will grow initially, but since we only persist distinct stacktraces (deduplicated via a hash value), its size will eventually approach a bound. After a few days you no longer need to worry about the size of that collection, but expect big bumps when adding a completely new application to your connection list.
The thread dump collection grows continuously if you set the TTL value to 0. Otherwise, mongoDB will take care of removing the oldest entries for you every few minutes, thus effectively capping that collection.
The net effect of setting a TTL value is that you indirectly cap the maximum size the thread dump collection can reach. Setting the value to 0 results in an uncapped collection size (note that this behaviour is a modification of ours; it is not how mongoDB natively interprets a 0 value).
By default, the collector will only set an index on the hash value of the stacktrace collection (the attribute is called "hashcode"), and on the "timestamp" attribute of the threaddump collection.
If you notice that stacktrace counts and retrieval times are unusually high, it means you're either querying an "unreasonable" number of stacktraces, or that your stacktraces are not flagged and organized properly. In that case, add more specific attributes to the definitions of your JMX connections. In certain cases, you may also want to add a custom secondary index on some of these attributes, or even a compound index.
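For instance, if you filter heavily on an application attribute, a compound index covering that attribute and the timestamp could be added in the mongo shell along these lines (the collection name and attribute path are assumptions; inspect your store's documents to find the actual field layout):

```javascript
// Hypothetical example: secondary compound index on a connection attribute
// plus the timestamp. Field names depend on how your store lays out documents.
db.threaddumps.createIndex({ "attributes.Application": 1, "timestamp": 1 })
```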