Java: Performance Tuning and Memory Management Part 1 - Introduction
By: Developer Shed
    2004-04-17
    by Wrox Books

    A typical application spends the majority of its time waiting for user input or for the completion of some other type of I/O operation, which is why multitasking is practical. However, some applications are CPU-bound, and advances in both hardware and software technology have allowed Java code to execute more quickly and to provide acceptable performance for those applications. Hardware improvements have come primarily in the form of faster processors, but the declining cost of memory also allows many applications to run more quickly. In addition, driven by competition among vendors, Java Virtual Machine implementations have become faster and more sophisticated, allowing Java code to execute much more quickly.

    Despite the changes that have taken place, you may find your Java code running more slowly than you (or your users) would like, and it's sometimes necessary to take steps to improve its speed. Another area that's sometimes a concern to Java programmers is that of memory utilization. Java applications can sometimes use a large amount of memory, and it's sometimes difficult to determine why that occurs.

    This chapter:

    • Shows how to diagnose problem areas in your code.
    • Describes some of the many options available to improve the speed of execution.
    • Explains how to identify the portions of your code responsible for consuming memory, and discusses ways to reduce the amount used.
    • Describes how to use the garbage collector effectively.

    However, before we can look at how to make our code run better, we first need to identify its current performance bottlenecks.

    Locating the Source of a Performance Problem

    Although it may seem obvious, the first thing you should do when you detect a problem with your application's performance is to determine where most of its time is spent. It's often said that the "80/20 rule" (sometimes called the "90/10 rule") applies to how much of a processor's time is spent executing an application's code. In other words, the implication is that a large portion (say, 80%) of the processor's time will be spent executing a small portion (the other 20%) of your application code. Assuming that this is true, and it usually is, you can significantly improve your application's performance by identifying the areas of your code that consume most of the processor's time (sometimes called hot spots) and changing that code so that it runs more quickly.
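    As a hypothetical illustration (the class and numbers below are ours, not the chapter's), consider a classic hot spot: building a String by repeated concatenation copies the entire string on every pass, while a StringBuffer appends into a growable internal buffer. A profiler would show most of the time, and most of the char[] allocation, in the naive version.

```java
// Hypothetical hot-spot example: two ways to build the same string.
public class HotSpotDemo {

    // O(n^2) work overall: each concatenation copies what was built so far.
    static String concatNaive(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += "x";
        }
        return s;
    }

    // O(n) work overall: one growable buffer, one final copy.
    static String concatBuffered(int n) {
        StringBuffer sb = new StringBuffer(n);
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Both produce identical results; only their cost profiles differ.
        System.out.println(concatNaive(1000).equals(concatBuffered(1000)));
    }
}
```

    Rewriting only the hot method, once a profiler has identified it, is usually far more productive than tuning code at random.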

    One way to identify hot spots and other performance problems in your application is to use a commercial profiler application, such as OptimizeIt from Intuitive Systems or JProbe from Sitraka (formerly KL Group). These products are easy to use and will usually provide the functionality you need to locate the source of a performance problem. However, while reasonably priced, neither OptimizeIt nor JProbe is free, and they may not be available for the platform on which you're debugging.

    Another easy way to identify hot spots is to execute your application with HPROF enabled, if the Java Virtual Machine you're using supports it. This feature is currently integrated into JavaSoft's Windows and Solaris Java 2 JVM implementations, and it uses the Java Virtual Machine Profiler Interface (JVMPI) to collect and record information about the execution of Java code. Among other things, JVMPI allows you to profile object creation and memory utilization, monitor contention, and method execution times.

    HPROF Output

    When HPROF is enabled, its default behavior is to write a report to a disk file named java.hprof.txt once your application terminates. As we'll see, you can control the format and contents of the report to some extent, but the output will contain some or all of the following items of information:

    • Explanation/comments
    • Thread information
    • Trace entries
    • Monitor dump
    • Heap dump
    • Allocation sites
    • CPU samples/times

    Before we look at the sections of the report, let's have a quick look at some of the options we have when running HPROF.

    HPROF Options

    To view the options that are available with HPROF, you can execute the following command:

    java -Xrunhprof:help
    

    If the virtual machine implementation you're using supports HPROF, it will produce output similar to that shown below. Note that the -X indicates that this is a non-standard option that may or may not be supported in future JVM releases:

    Hprof usage: -Xrunhprof[:help]|[<option>=<value>, ...]
    Option Name and Value  Description             Default
    ---------------------  -----------             -------
    heap=dump|sites|all    heap profiling          all
    cpu=samples|times|old  CPU usage               off
    monitor=y|n            monitor contention      n
    format=a|b             ascii or binary output  a
    file=<file>            write data to file      java.hprof(.txt for ascii)
    net=<host>:<port>      send data over a socket write to file
    depth=<size>           stack trace depth       4
    cutoff=<value>         output cutoff point     0.0001
    lineno=y|n             line number in traces?  y
    thread=y|n             thread in traces?       n
    doe=y|n                dump on exit?           y
    Example: java -Xrunhprof:cpu=samples,file=log.txt,depth=3 FooClass
    

    We'll look at the uses of the more general options here, and cover the others as we discuss the individual parts of the HPROF output.

    The doe Option

    It's acceptable in some cases to generate profile information when your application exits, which is HPROF's default behavior, but in other cases you'll want the information to be written during execution. You can prevent the profile from being written on exit by specifying a value of "n" for the doe ("dump on exit") parameter. To generate profile information during execution, rather than after the program has finished, you can press Ctrl-Break on Windows or Ctrl-\ (backslash) on Solaris.

    The format Option

    HPROF's default behavior is to create a report in human-readable (ASCII) form, although you can store the data in binary format using this option. If you do specify format=b, you must use a utility such as the Heap Analysis Tool (HAT) provided by JavaSoft to view the output. That utility is currently unsupported, but is useful because it provides an easy way to analyze memory utilization.

    When you execute the HAT utility, you must specify the name of a file that was created by HPROF, and you may specify a port number or allow it to use its default (7000). When the utility runs, it creates a web server that listens on the specified port for HTTP requests, allowing you to view a summary of the HPROF data through a browser. You can do so by entering a URL of http://localhost:7000 (assuming that you're using the default port number), which will produce a display that lists the classes currently in use by the JVM. By following the various hyperlinks, you can examine different representations of the HPROF data, including one that lists the number of instances currently in existence for every class that's in use. In other words, the HAT utility provides a convenient way to browse the HPROF data and many different summaries of that data.

    The file Option

    HPROF normally sends its output to a disk file named java.hprof.txt if it's creating ASCII output, or java.hprof if it's creating binary data (when you specify format=b). However, you can use this option to specify that the output should be sent to a different file, as shown below:

    java -Xrunhprof:file=myfile.txt MyTest
    

    The net Option

    You can use this option to have HPROF's output sent over a network connection instead of being written to a disk file. For example, it might be useful to use the net option if you wish to record profile information for an application that's running on a middleware server. An example of how to use this option is shown below, where a host name and port number are specified:

    java -Xrunhprof:net=brettspc:1234 MyTest
    

    Before the MyTest class is executed, a socket connection is established to port 1234 on the machine with a host name of brettspc, and the profile information generated by HPROF will be transmitted over that socket connection.

    Explanation/Comments Section

    This section is identical for every file that's created by HPROF, and it provides a brief description of how to interpret the remainder of the output generated by the utility.

    Thread Summary Section

    This portion of the report generated by HPROF shows a summary of the threads that were used. To illustrate how that information appears, we'll define a simple application that creates and starts two threads:

    public class ThreadTest implements Runnable {
        public static void main(String[] args) {
            ThreadTest tt = new ThreadTest();
            Thread t1 = new Thread(tt, "First");
            t1.setDaemon(true);
            Thread t2 = new Thread(tt, "Second");
            t2.setDaemon(true);
            t1.start();
            t2.start();
            try {
                Thread.sleep(2000);
            } catch (Exception e) {}
            synchronized (tt) {
                tt.notify();
            }
            try {
                Thread.sleep(2000);
            } catch (Exception e) {}
        }

        public synchronized void run() {
            try {
                wait();
            } catch (InterruptedException ie) {}
        }
    }
    

    This application creates two threads, each of which executes the run() method of a single ThreadTest instance, and starts those threads. It then immediately sleeps for two seconds to allow both threads to become blocked by the wait() method in run(), calls the notify() method to wake up one of the threads, and then sleeps for another two seconds. Therefore, one of the two threads that were created should exit prior to the main() method's completion (because it is removed from the wait queue when notify() is called), while the other should still be blocked by the wait() call. If we use HPROF to profile the thread activity within this application, we should see that behavior reflected in the output:

    THREAD START (obj=7c5e60, id = 1, name="Signal dispatcher", 
    group="system")
    THREAD START (obj=7c6770, id = 2, name="Reference Handler", 
    group="system")
    THREAD START (obj=7ca700, id = 3, name="Finalizer", group="system")
    THREAD START (obj=8427b0, id = 4, name="SymcJIT-LazyCompilation-PA", 
    group="main")
    THREAD START (obj=7c0e70, id = 5, name="main", group="main")
    THREAD START (obj=87d910, id = 6, name="First", group="main")
    THREAD START (obj=87d3f0, id = 7, name="Second", group="main")
    THREAD START (obj=842710, id = 8, name="SymcJIT-LazyCompilation-0", 
    group="main")
    THREAD END (id = 6)
    THREAD END (id = 5)
    THREAD START (obj=87e820, id = 9, name="Thread-0", group="main")
    THREAD END (id = 9)
    

    This portion of the HPROF output lists each thread that was started, and if the thread ended while profiling was taking place, that fact is recorded as well. Each entry identifies the address of the Thread object, the identifier ("id") value assigned to the thread, the name of the thread, and the name of the associated ThreadGroup.

    As expected, this output indicates that two threads ("First" and "Second") were started after the system-generated thread that executes the main() method ("main"). In addition, one of those two threads ("First" with an identifier value of 6) exits prior to the termination of the main thread as expected, while the other was still running when the profile information was created.

    Note that some of the threads listed above are dependent on the particular JVM implementation used, in this case the Symantec JIT JVM.

     

    Trace Entries Section

    While profile information is being recorded, HPROF generates a number of trace entries, which are simply stack traces like those you've seen when an uncaught exception is thrown by your application. Each trace created by HPROF is assigned a unique identifier value, and as we'll see later, trace identifiers can be useful in determining which execution paths represent hot spots within your application. An example of a trace entry created by HPROF is shown below:

    TRACE 1856:
    java/lang/String.toCharArray(String.java:1899)
    java/lang/Thread.init(Thread.java:268)
    java/lang/Thread.<init>(Thread.java:398)
    ThreadTest.main(ThreadTest.java:7)
    

    This particular trace entry has been assigned an identifier value of 1856, and it represents the execution of line 1899 within the String class, and that line is part of the toCharArray() method. That method was in turn called from line 268 of the Thread class, and that line is part of the init() method, which was in turn called from line 398 of a Thread constructor ("<init>"), etc.

    Each line that identifies a class, method, source file, and line is called a frame, and there are four frames in the above example. By default, HPROF generates up to four frames for each trace entry, although as we'll see you can increase or decrease that value if it's helpful to do so.

    The depth Option

    You can use this option to specify the maximum number of frames that will be included in each trace entry. The default value for this parameter is four, and you will not normally need to increase that value unless four frames do not supply enough information to allow you to uniquely identify an execution path.

    The lineno Option

    Each frame normally includes the line number that was executed, but you can use this option to suppress the line numbers from the output listing. When you do so, each frame will contain the text "Unknown line" instead of the line number:

    TRACE 1056:
    java/util/Locale.toUpperCase(Locale.java:Unknown line)
    java/util/Locale.<init>(Locale.java:Unknown line)
    java/util/Locale.<clinit>(Locale.java:Unknown line)
    sun/io/CharacterEncoding.aliasName(CharacterEncoding.java:Unknown line)
    

    The thread Option

    HPROF always records the thread summary information in its report, but by default does not associate the trace entries with a specific thread. This is not a problem if your application is single-threaded in nature, but if your code supports multiple threads, it can be helpful for the trace entries to identify the thread with which they're associated. When thread=y is specified, each trace entry contains additional text that includes the identifier value of the thread associated with the entry as shown below:

    TRACE 1910: (thread=8)
    java/lang/Object.wait(Object.java:Native method)
    java/lang/Object.wait(Object.java:424)
    ThreadTest.run(ThreadTest.java:20)
    java/lang/Thread.run(Thread.java:479)
    TRACE 1664: (thread=1)
    java/lang/StringBuffer.toString(StringBuffer.java:1063)
    java/io/BufferedReader.readLine(BufferedReader.java:Compiled method)
    java/io/BufferedReader.readLine(BufferedReader.java:329)
    java/util/Properties.load(Properties.java:179)
    

    Monitor Dump Section

    The monitor dump section of HPROF output displays the status of each thread and a report of the monitors that are owned by threads and for which threads have invoked the wait() method. To provide an example of this output, we'll run a slightly modified version of the ThreadTest class defined earlier. This application will start two threads that execute an object's run() method, and once inside that method, the threads will invoke wait(). After a brief delay, the main thread that started those two new threads will wake up one of the threads with a call to notify(), but in this case, that thread will call sleep() instead of exiting:

    public class ThreadTest implements Runnable {
        public static void main(String[] args) {
            ThreadTest tt = new ThreadTest();
            Thread t1 = new Thread(tt, "First");
            t1.setDaemon(true);
            Thread t2 = new Thread(tt, "Second");
            t2.setDaemon(true);
            t1.start();
            t2.start();
            try {
                Thread.sleep(2000);
            } catch (Exception e) {}
            synchronized (tt) {
                tt.notify();
            }
            try {
                Thread.sleep(2000);
            } catch (Exception e) {}
        }

        public synchronized void run() {
            try {
                wait();
            } catch (InterruptedException ie) {}
            sleepAWhile(5000);
        }

        protected synchronized void sleepAWhile(long time) {
            try {
                Thread.sleep(time);
            } catch (Exception e) {}
        }
    }
    

    If this application is run with HPROF enabled, it will produce the following thread summary:

    THREAD START (obj=835e60, id = 2, name="Signal dispatcher", 
    group="system")
    THREAD START (obj=8367c0, id = 3, name="Reference Handler", 
    group="system")
    THREAD START (obj=83a700, id = 4, name="Finalizer", group="system")
    THREAD START (obj=8b17c0, id = 5, name="SymcJIT-LazyCompilation-PA", 
    group="main")
    THREAD START (obj=8b2cb0, id = 6, name="HPROF CPU profiler", 
    group="system")
    THREAD START (obj=830e70, id = 1, name="main", group="main")
    THREAD START (obj=8ece00, id = 7, name="First", group="main")
    THREAD START (obj=8ec940, id = 8, name="Second", group="main")
    THREAD START (obj=8b1e00, id = 9, name="SymcJIT-LazyCompilation-0", 
    group="main")
    THREAD END (id = 1)
    THREAD START (obj=8edb00, id = 10, name="Thread-0", group="main")
    THREAD END (id = 10)
    

    Unfortunately, at the time of writing, the JDK 1.3 "Hotspot" virtual machine fails with an internal error when using the monitor dump HPROF option. This problem does not occur when using the -classic option to turn off the Hotspot JVM. This bug is recorded in Sun's bug database (Bug ID 4378856).

    In addition, the HPROF output will include the monitor dump information listed below:

    MONITOR DUMP BEGIN
    THREAD 11, trace 1896, status: R
    THREAD 8, trace 1897, status: CW
    THREAD 7, trace 1898, status: CW
    THREAD 6, trace 1899, status: MW
    THREAD 9, trace 1888, status: CW
    THREAD 5, trace 1900, status: CW
    THREAD 4, trace 1901, status: CW
    THREAD 3, trace 1902, status: CW
    THREAD 2, trace 1903, status: R
    MONITOR ThreadTest(8ecb50)
    owner: thread 7, entry count: 2
    waiting to be notified: thread 8
    MONITOR SymantecJITCompilationThread(8b1e00) unowned
    waiting to be notified: thread 5
    MONITOR java/lang/ref/Reference$Lock(836b00) unowned
    waiting to be notified: thread 3
    MONITOR java/lang/ref/ReferenceQueue$Lock(8392a0) unowned
    waiting to be notified: thread 4
    RAW MONITOR "_Hprof CPU sampling lock"(0x8b2f70)
    owner: thread 6, entry count: 1
    RAW MONITOR "SymcJIT Lazy Queue Lock"(0x8b1d34) unowned
    waiting to be notified: thread 9
    RAW MONITOR "_hprof_dump_lock"(0x7d0ef0)
    owner: thread 11, entry count: 1
    waiting to enter: thread 6
    RAW MONITOR "Monitor cache lock"(0x7d0df0)
    owner: thread 11, entry count: 1
    RAW MONITOR "Thread queue lock"(0x7d0cc0)
    owner: thread 11, entry count: 1
    RAW MONITOR "Monitor registry"(0x7d0c70)
    owner: thread 11, entry count: 1
    MONITOR DUMP END
    

    The first portion of the monitor dump identifies each active thread, its state, and the trace entry that identifies the code the thread executed. The thread's state will be one of the following four values, although you should not normally see suspended threads in Java 2 applications:

    • R - Runnable: the thread is running or ready to run
    • S - Suspended: the thread has been suspended
    • CW - Condition Wait: the thread is waiting, for example in wait() or sleep()
    • MW - Monitor Wait: the thread is blocked waiting to enter a synchronized block

    The second section of the monitor dump identifies the monitors that are currently owned and those that one or more threads are currently waiting on. In this case, the first thread created by our ThreadTest application owns the monitor of the ThreadTest object. The second thread had previously obtained the monitor, but implicitly released it by invoking the ThreadTest object's wait() method and is now waiting to be notified. The states of these two threads are reflected in the first entry in the monitor dump, which indicates that thread 7 owns the monitor of the ThreadTest object at location 8ecb50, as shown below:

    MONITOR ThreadTest(8ecb50)
    owner: thread 7, entry count: 2
    waiting to be notified: thread 8
    

    The entry count refers to the number of nested synchronized blocks the owner thread has entered, and is used by the Java Virtual Machine to determine when a thread should release the monitor. For example, when thread 7 first entered the run() method and successfully obtained the monitor, the entry count value was set to one. However, once that thread executed the sleepAWhile() method, which is synchronized on the same object, the entry count value was incremented to two. When the thread exits sleepAWhile(), the entry count will be reduced to one, and when it exits the run() method, the count will be decremented to zero and the monitor released.
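    This re-entry behavior can be sketched in a few lines (the class and field names below are ours): a synchronized method calling another synchronized method on the same object re-acquires the monitor it already holds, so the entry count rises to two and falls back as each method returns, rather than deadlocking.

```java
// Sketch of the reentrant locking that the entry count tracks.
public class EntryCountDemo {
    static int maxDepth = 0;

    synchronized void outer() {   // acquires the monitor: entry count 1
        inner();                  // same monitor, re-entered: entry count 2
    }                             // back to 1, then 0 when outer() returns

    synchronized void inner() {
        maxDepth = 2;             // still holding the lock; no deadlock occurs
    }

    public static void main(String[] args) {
        new EntryCountDemo().outer();
        System.out.println(maxDepth);  // prints 2
    }
}
```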

    The monitor Option

    While HPROF is capable of including a monitor dump section in its report, it does not do so by default. To include the monitor information, you must include monitor=y in the list of parameters.

    Heap Dump Section

    The heap dump is perhaps the most complex part of the output generated by HPROF, but in practice, you will not normally need to analyze it. However, we'll look briefly at some of the entries you'll find.

    The heap dump is simply a list of the areas of memory that are allocated at the time that the profile information is recorded, and describes areas allocated for class definitions, objects, and arrays. These types are identified in the HPROF output with the abbreviations CLS, OBJ, and ARR, respectively, and each entry includes information that describes how it is used. For example, entries like those shown below are added to the output the first time an instance of java.util.Vector is created:

    CLS 7b6ac0 (name=java/util/Vector, trace=10)
    super    7b8530
    constant[6]  7a58a0
    constant[10]  7b9290
    constant[13]  7a4de0
    constant[16]  7c2240
    constant[19]  7b8530
    constant[23]  7b99d0
    constant[25]  7b6ac0
    CLS 7b8530 (name=java/util/AbstractList, trace=10)
    super    7b8f00
    constant[6]  7b8f00
    constant[7]  7b8530
    constant[12]  7b99d0
    CLS 7b8f00 (name=java/util/AbstractCollection, trace=10)
    super    7a4de0
    constant[6]  7a4de0
    constant[11]  7b8f00
    constant[12]  7b93c0
    CLS 7b93c0 (name=java/util/Collection, trace=10)
    super    7a4de0
    constant[2]  7b93c0
    CLS 7b99d0 (name=java/util/List, trace=10)
    super    7a4de0
    constant[2]  7b93c0
    constant[3]  7b99d0
    CLS 7b9290 (name=java/lang/Cloneable, trace=10)
    super    7a4de0
    constant[1]  7b9290
    OBJ 7ba560 (sz=20, trace=11, class=java/util/Vector@7b6ac0)
    elementData  7ba6a0
    ARR 7ba6a0 (sz=44, trace=12, nelems=10, elem type=java/lang/Object@7a4de0)
    [0]    83d8f0
    [1]    84b640
    

    The first entry represents the Vector class definition and describes the address at which it is located. In addition, it identifies the trace entry that corresponds to this memory allocation, the address of the class definition of Vector's superclass, and points to the constants defined in Vector. Constants include string and numeric constant values defined within the Vector class, and the superclass address is simply the pointer to the class definition of Vector's superclass, which is AbstractList. The second entry in the above listing is a CLS entry for AbstractList, because that class must also be loaded before an instance of Vector can be created. In fact, the entire superclass hierarchy from Vector to Object, and all the implemented interfaces, must be loaded, although no entry for the Object class definition appears in the above section because it was already loaded.
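    As a quick check of that hierarchy (the SuperChain class here is ours), reflection can walk the same superclass chain that the CLS entries record:

```java
import java.util.Vector;

// Walk a class's superclass chain, the same chain the heap dump's
// CLS entries reflect when the class is first loaded.
public class SuperChain {
    static String chain(Class<?> c) {
        StringBuffer sb = new StringBuffer(c.getName());
        for (Class<?> s = c.getSuperclass(); s != null; s = s.getSuperclass()) {
            sb.append(" -> ").append(s.getName());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // java.util.Vector -> java.util.AbstractList
        //   -> java.util.AbstractCollection -> java.lang.Object
        System.out.println(chain(Vector.class));
    }
}
```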

    Once the hierarchy of class definitions has been loaded, the instance of Vector is created as illustrated by the OBJ entry shown above. If you examine the source code for Vector, you'll find that it has three instance fields defined like those shown below:

    protected Object[] elementData;
    protected int elementCount;
    protected int capacityIncrement;
    

    The two primitive values (elementCount and capacityIncrement) are stored in the area allocated for the Vector instance, but a separate area is allocated for the array of Object instances, and that fact is reflected in the ARR entry.

     

    The report created by HPROF allows you to determine what has been allocated, and identifies the code that caused the memory to be allocated. For example, we can use these entries to determine which section of code caused this Vector to be created and all the appropriate class definitions loaded. To do so, let's review the entry that identifies the creation of the Vector object:

    OBJ 7ba560 (sz=20, trace=11, class=java/util/Vector@7b6ac0)
    elementData  7ba6a0
    

    The trace portion of this entry identifies the trace entry that is associated with the creation of this object, and by searching the file created by HPROF for the trace entry with an identifier of 11, we find the following:

    TRACE 11:
    java/lang/ClassLoader.<clinit>(ClassLoader.java:1156)
    

    This indicates that line 1156 in ClassLoader caused a Vector to be created and its associated class definitions loaded, and the <clinit> "method" indicates that the Vector was created during the initialization of the static fields in ClassLoader. By examining line 1156 of ClassLoader, we find the following code:

    private static Vector loadedLibraryNames = new Vector();
    

    As you can see from this example, the heap dump allows you to easily associate each memory allocation with a specific location within the Java code. However, as we'll see later, there's an easier way to identify which portions of your code allocate large amounts of memory.

    The heap Option

    You can use this option to control the type of heap information provided by HPROF, and valid values are dump, sites, and all. The dump value causes HPROF to generate only the heap dump information in its output, while sites causes it to generate only the allocation site listing, and all will result in both sections being created. The default value is all.

    Allocation Sites Section

    As mentioned earlier, the heap dump lists each area of storage that's allocated by your code, but it does not provide an easy way to identify which areas of your code are allocating most of the memory that's used. Fortunately, the HPROF output also includes a list of allocation sites, sorted in descending order by the amount of storage allocated at each site.

    To provide an example of this, we'll use the following MemoryTest application that allocates two byte arrays: one very large, and one relatively small. In this simple example, it's obvious which part of the code is responsible for allocating most of the memory. However, we'll see how HPROF's allocation site information can be used to identify the relevant section of code, and the approach illustrated here can just as easily be used with larger, more complex applications:

    public class MemoryTest {
        private static byte[] buffer1;
        private static byte[] buffer2;

        public static void main(String[] args) {
            allocateALittle();
            allocateALot();
        }

        private static void allocateALittle() {
            buffer1 = allocateBuffer(10000);
        }

        private static void allocateALot() {
            buffer2 = allocateBuffer(1000000);
        }

        private static byte[] allocateBuffer(int length) {
            byte[] buffer = new byte[length];
            return buffer;
        }
    }
    

    When this application is executed, output similar to the following will appear in the allocation site section of the report generated by HPROF:

    SITES BEGIN (ordered by live bytes) Sat Apr 15 09:43:35 2000
    percent         live       alloc'ed  stack class
    rank   self  accum    bytes objs   bytes objs trace name
    1 84.84% 84.84%  1000004    1 1000004    1  1882 [B
    2  1.39% 86.24%    16388    1   16388    1  1228 [C
    3  1.39% 87.63%    16388    1   16388    1  1235 [C
    4  0.99% 88.61%    11628    3   11628    3   984 [C
    5  0.85% 89.46%    10004    1   10004    1  1881 [B
    6  0.70% 90.16%     8196    1    8196    1  1226 [B
    7  0.70% 90.85%     8196    1    8196    1  1233 [B
    8  0.68% 91.54%     8068    1    8068    1   987 [S
    9  0.62% 92.15%     7252    3    7252    3   981 [C
    10  0.31% 92.46%     3612    1    3612    1   763 [L<Unknown>;
    11  0.22% 92.68%     2600   70    3188   85    57 [C
    12  0.21% 92.89%     2532    1    2532    1   988 [I
    13  0.20% 93.10%     2380   17    2380   17     1 java/lang/Class
    14  0.20% 93.30%     2356    1    2356    1  1220 [C
    15  0.17% 93.47%     2052    1    2052    1    71 [B
    16  0.10% 93.57%     1236    3    1236    3    56 [L<Unknown>;
    17  0.09% 93.66%     1028    1    1028    1   986 [B
    18  0.08% 93.75%      996   83    1020   85    57 java/lang/String
    19  0.08% 93.83%      980    7     980    7  1407 java/lang/Class
    20  0.07% 93.90%      840   42     840   42    58 java/util/Hashtable$Entry
    

    Each entry in this list represents storage allocations associated with a stack trace, and the entries are ordered from most to least memory allocated. In this case, trace 1882 allocated roughly one megabyte of storage, which accounts for approximately 85% of the total storage in use. Note also that the fifth entry in the list indicates that trace 1881 allocated a much smaller 10K, which accounts for less than one percent of the storage in use.

    The information provided in this report is largely self-explanatory, with the first column providing a ranking of each trace based on how much memory it allocated. The second and third columns identify the amount of storage (as a percentage of the total) allocated by that trace, and a cumulative value for the entries so far listed, respectively. The two pairs of columns containing bytes and objs represent the number of bytes and objects currently in use ("live") and the numbers originally allocated ("alloc'ed").

    Finally, the last two columns identify the stack trace associated with the allocation(s) and the type of object created. In this case, most of the entries in the last column start with an open bracket, which indicates that an array of primitives was allocated, and in the case of MemoryTest, our allocation of an array of byte values is denoted [B. In other cases, however, the name of a class appears, which indicates that the allocation was due to object instantiation instead of the creation of an array. The following list identifies the "class name" that appears for each type that can be allocated:

    • [Z - array of boolean
    • [B - array of byte
    • [S - array of short
    • [C - array of char
    • [I - array of int
    • [J - array of long
    • [F - array of float
    • [D - array of double
    • [L<classname>; - array of object references
    • <classname> - instance of the named class

    Now that we understand the information in the allocation site report, we can use it to identify the code that's responsible for allocating most of the storage in use by the application. The site report indicates that the code path associated with trace entry 1882 allocated approximately 85% of the storage in use, and by simply searching the HPROF output for trace 1882, we find the following listing:

    TRACE 1882: (thread=1)
    MemoryTest.allocateBuffer(MemoryTest.java:20)
    MemoryTest.allocateALot(MemoryTest.java:16)
    MemoryTest.main(MemoryTest.java:8)
    

    This indicates that most of the storage used by this application was allocated in the allocateBuffer() method when it was called from allocateALot(). As expected, the HPROF output identified the path that creates the large array of byte values. Similarly, trace entry 1881 identifies the execution path through the allocateALittle() method that was responsible for allocating a much smaller portion of storage:

    TRACE 1881: (thread=1)
    MemoryTest.allocateBuffer(MemoryTest.java:20)
    MemoryTest.allocateALittle(MemoryTest.java:12)
    MemoryTest.main(MemoryTest.java:7)
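
    The MemoryTest source itself isn't shown in this excerpt, but its shape can be inferred from the two traces. The following is a hypothetical reconstruction: the buffer sizes are assumptions chosen to match the roughly 1MB and 10K figures in the report, and the line numbers will not match those in the traces:

    ```java
    // Hypothetical reconstruction of MemoryTest, inferred from the stack
    // traces above. The buffer sizes are assumptions chosen to match the
    // roughly 1MB (trace 1882) and 10K (trace 1881) allocation figures.
    public class MemoryTest {

        public static void main(String[] args) {
            allocateALittle();   // the path reported as trace 1881
            allocateALot();      // the path reported as trace 1882
        }

        private static byte[] allocateALittle() {
            return allocateBuffer(10 * 1024);    // ~10K
        }

        private static byte[] allocateALot() {
            return allocateBuffer(1024 * 1024);  // ~1MB
        }

        private static byte[] allocateBuffer(int size) {
            return new byte[size];
        }
    }
    ```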
    

    CPU Samples and Times Section

    In addition to reporting on memory allocation, HPROF is able to provide information that identifies which portions of your code the processor spends the most time executing, and it offers two different ways of doing so. You select which of these sections HPROF generates using the cpu option.

    The first technique involves taking periodic samples during execution of the code and counting the number of times that the processor was found to be executing a certain code path. With a reasonably large number of samples, that approach will provide an accurate representation of time spent executing a given section of code.

    The second approach that HPROF is able to use is to simply record each method call and the time it takes to complete. The times are then added together for each execution path and reported as a percentage of the total execution time of the code.

    In this example, we'll use the first (sampling) approach with another simple application. As with the MemoryTest application, it's easy to identify the portion of the code that consumes the most resources, but again, the technique used here is applicable to larger and more complex applications as well:

    // Run with HPROF sampling enabled, for example:
    //   java -Xrunhprof:cpu=samples CPUTest
    public class CPUTest {

        public static void main(String[] args) {
            executeQuickly();
            executeSlowly();
        }

        private static void executeQuickly() {
            loopAWhile(100000);
        }

        private static void executeSlowly() {
            loopAWhile(1000000);
        }

        private static void loopAWhile(int count) {
            for (int i = 0; i < count; i++) {
                // Do something to slow down this loop
                Object o = new Object();
            }
        }
    }
    

    If this code is executed with HPROF enabled and cpu=samples specified, it will generate a section with CPU sample information like the one shown below (though the data will vary between runs):

    CPU SAMPLES BEGIN (total = 655) Sat Apr 15 10:57:51 2000
    rank   self  accum   count trace method
    1 78.32% 78.32%     513    77 CPUTest.loopAWhile
    2  7.94% 86.26%      52    75 CPUTest.loopAWhile
    3  0.76% 87.02%       5    64 sun/net/www/protocol/file/Handler.openConnection
    4  0.46% 87.48%       3     9 sun/misc/URLClassPath$JarLoader.getJarFile
    5  0.46% 87.94%       3    50 java/io/BufferedReader.readLine
    6  0.46% 88.40%       3    62 java/lang/Class.forName0
    7  0.31% 88.70%       2    25 java/util/jar/JarFile.getManifest
    8  0.31% 89.01%       2     7 sun/misc/URLClassPath$2.run
    9  0.31% 89.31%       2    15 java/util/zip/ZipFile.getEntry
    10  0.31% 89.62%       2    48 sun/io/ByteToCharISO8859_1.convert
    11  0.31% 89.92%       2    41 java/security/Policy$1.run
    12  0.31% 90.23%       2    74 CPUTest.executeQuickly
    13  0.31% 90.53%       2    11 java/io/Win32FileSystem.canonicalize
    14  0.31% 90.84%       2    44 java/util/Properties.load
    15  0.15% 90.99%       1    28 java/util/jar/Attributes.read
    16  0.15% 91.15%       1    38 java/io/FileInputStream.open
    

    Upon viewing this information, it's obvious that there is a hot spot in this application that's responsible for approximately 80% of its execution time, specifically the loopAWhile() method in CPUTest. Note, however, that there are actually two entries associated with loopAWhile(), one of which represents an execution path that was active when 513 of the 655 samples were taken, while the other was active only 52 times. By locating the trace information associated with these, we find that the first entry corresponds to loopAWhile() being called from executeSlowly(), while the second corresponds to a call from executeQuickly():

    TRACE 77:
    CPUTest.loopAWhile(CPUTest.java:Compiled method)
    CPUTest.executeSlowly(CPUTest.java:13)
    CPUTest.main(CPUTest.java:5)
    TRACE 75:
    CPUTest.loopAWhile(CPUTest.java:Compiled method)
    CPUTest.executeQuickly(CPUTest.java:9)
    CPUTest.main(CPUTest.java:4)
    

    The difference in these two execution paths illustrates an important point related to identifying hot spots in your code. It's sometimes not sufficient to simply identify the methods that the processor spends the most time executing, but also where those methods were called from (the execution path) and/or the parameter values passed.
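
    As an aside on how the percentages in the samples section are derived: each trace's "self" value is simply its share of the total sample count, rounded to two decimal places (the rounding behavior here is an assumption; HPROF's exact formatting may differ). This small sketch reproduces the figures from the listing above:

    ```java
    // Reproduce HPROF's "self" percentages from the raw sample counts
    // in the CPU samples listing: 513 and 52 out of 655 total samples.
    public class SamplePercent {
        static double selfPercent(int count, int total) {
            // Each trace's share of all samples, rounded to two decimals
            return Math.round(count * 10000.0 / total) / 100.0;
        }
        public static void main(String[] args) {
            System.out.println(selfPercent(513, 655));  // prints 78.32, as reported for trace 77
            System.out.println(selfPercent(52, 655));   // prints 7.94, as reported for trace 75
        }
    }
    ```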

    The cpu Option

    This option allows you to control how HPROF gathers information about where the processor spends its time in your code, and valid values are times, samples, and old:

    • times
      HPROF will record each method call and the amount of time spent executing the method each time it is called, and a "CPU Time" section will appear in the generated report.
    • samples
      HPROF periodically determines what code is being executed and the execution path that resulted in the execution of that code, and counts the number of times it finds the processor in that execution path. Once the report is generated, it will contain a "CPU Samples" section that lists execution paths in descending order based on how frequently they were found to be executing. By using this option, you can easily identify hot spots within your code as was done previously with the CPUTest application.
    • old
      HPROF generates profile information that's similar to that supported in earlier versions of Java. However, that information is not as detailed or useful as the new format, so you should not normally specify cpu=old.

    The cutoff Option

    As illustrated in the listing below, many of the entries represent an extremely small percentage of the code's total execution time:

    CPU SAMPLES BEGIN (total = 655) Sat Apr 15 10:57:51 2000
    rank   self  accum   count trace method
    1 78.32% 78.32%     513    77 CPUTest.loopAWhile
    2  7.94% 86.26%      52    75 CPUTest.loopAWhile
    3  0.76% 87.02%       5    64 sun/net/www/protocol/file/Handler.openConnection
    4  0.46% 87.48%       3     9 sun/misc/URLClassPath$JarLoader.getJarFile
    5  0.46% 87.94%       3    50 java/io/BufferedReader.readLine
    6  0.46% 88.40%       3    62 java/lang/Class.forName0
    7  0.31% 88.70%       2    25 java/util/jar/JarFile.getManifest
    8  0.31% 89.01%       2     7 sun/misc/URLClassPath$2.run
    9  0.31% 89.31%       2    15 java/util/zip/ZipFile.getEntry
    10  0.31% 89.62%       2    48 sun/io/ByteToCharISO8859_1.convert
    11  0.31% 89.92%       2    41 java/security/Policy$1.run
    12  0.31% 90.23%       2    74 CPUTest.executeQuickly
    13  0.31% 90.53%       2    11 java/io/Win32FileSystem.canonicalize
    14  0.31% 90.84%       2    44 java/util/Properties.load
    15  0.15% 90.99%       1    28 java/util/jar/Attributes.read
    16  0.15% 91.15%       1    38 java/io/FileInputStream.open
    

    By default, HPROF will include all execution paths that represent at least 0.01 percent of the processor's time, but you can raise that cutoff point using this option. For example, to eliminate all entries that represent less than one-fourth of one percent of the total execution time, you could specify cutoff=0.0025, which would result in only the first 14 of the above entries being included.
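
    To see why exactly 14 entries survive a cutoff of 0.0025, compare each entry's sample count against that fraction of the 655 total samples. This is a small illustrative check, not part of the original article; the counts come from the listing above:

    ```java
    // Count how many entries from the CPU samples listing meet or exceed
    // cutoff=0.0025 (0.25% of 655 samples): entries with 2 samples are
    // kept (2/655 is about 0.31%), but the two single-sample entries
    // (1/655 is about 0.15%) fall below the cutoff.
    public class CutoffCheck {
        static int countKept(int[] counts, int total, double cutoff) {
            int kept = 0;
            for (int count : counts) {
                if ((double) count / total >= cutoff) {
                    kept++;
                }
            }
            return kept;
        }
        public static void main(String[] args) {
            int[] counts = {513, 52, 5, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1};
            System.out.println(countKept(counts, 655, 0.0025));  // prints 14
        }
    }
    ```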

