Introduction to Java Virtual Machine

If you come from an embedded background, as I do, and have never used Java but suddenly need to learn it because your project uses it, this article is for you. One of the key reasons for the wide popularity of the Java programming language is its portability. Once a Java program is written and compiled, it can be executed on any Java-enabled machine. This is made possible by the JRE (Java Runtime Environment). What many people don't know is that the JRE includes an implementation of the Java Virtual Machine (JVM), which loads the bytecode, verifies it, and executes it.

JAVA(TM): Write once, run anywhere

As developers, it helps to know the architecture of the JVM, because it lets us write more efficient code. Let's quickly look at how a program written in Java gets executed on any Java-enabled machine.

The user writes a program and saves it in a .java file. The Java compiler (javac) then converts this high-level source into bytecode, and its output is a .class file. Finally, the JVM translates the bytecode instructions in the .class file into instructions for the underlying hardware. Many of us know this much even if we do not use Java in our day-to-day projects.

Image 1: Java program compilation (simplified)
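
The compilation flow can be tried out with a very small class. This is only a sketch: the file name Hello.java and the greeting() method are my own illustrative choices. The commands in the comments are the standard JDK tools: javac compiles the source, java starts the JVM, and javap -c prints the bytecode stored in the .class file.

```java
// Hello.java (hypothetical file name chosen for this illustration)
// Compile:      javac Hello.java   -> produces Hello.class (bytecode)
// Run:          java Hello         -> the JVM loads and executes Hello.class
// Disassemble:  javap -c Hello     -> prints the bytecode instructions
class Hello {
    static String greeting() {
        return "Hello from the JVM";
    }

    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```

The same Hello.class file can be copied to another machine with a JVM and run there without recompiling, which is the portability described above.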

Let's get a bit more involved with the JVM and try to understand how its underlying architecture allows the execution of Java programs. The diagram below shows a very high-level view of the JVM architecture.

The journey from a .java file to actual execution on hardware involves multiple stages. For example, once the Java compiler generates bytecode (a .class file), the JVM decides how to allocate memory and execute the instructions. We can divide the JVM into three sub-systems:

Class loader sub-system
Run-time data area (various memory areas)
Execution engine

1. Class Loader Sub-System

The class loader sub-system is responsible for loading, linking, and initializing a Java class (i.e. its .class file) when the program refers to that class for the first time at runtime. These three tasks are described below.

1.1 Loading

Classes are loaded by this component. There are three types of class loader: Bootstrap, Extension, and Application.

Bootstrap Class Loader is responsible for loading classes from the bootstrap classpath. In Java 8 and earlier this is essentially rt.jar, which contains all the core Java APIs. This loader is given the highest priority.
Extension Class Loader is responsible for loading classes that are inside the ext folder (jre\lib\ext).
Application Class Loader is responsible for loading classes from the application-level classpath, i.e. the path given by the CLASSPATH environment variable, the -cp option, etc.

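These class loaders use the Delegation Hierarchy Algorithm while loading class files: a loader first delegates the request to its parent, and tries to load the class itself only if the parent cannot find it.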
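
The loader hierarchy can be inspected from inside a running program. The sketch below uses only the standard ClassLoader API; the class name LoaderChain and the method chainOf are made up for this example. Note that the exact loader class names depend on the JDK version (for instance, newer JDKs report a platform class loader where older ones had the extension class loader), so I am not showing an expected output.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChain {
    // Walks from the loader of the given class up through its parents.
    // A null loader stands for the bootstrap class loader.
    static List<String> chainOf(Class<?> type) {
        List<String> names = new ArrayList<>();
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            names.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        names.add("bootstrap (reported as null)");
        return names;
    }

    public static void main(String[] args) {
        // Application class: its own loader comes first, then the parents.
        System.out.println(chainOf(LoaderChain.class));
        // Core class: loaded directly by the bootstrap loader.
        System.out.println(chainOf(String.class));
    }
}
```

For a core class such as String, getClassLoader() returns null, which is how the JVM represents the bootstrap loader at the top of the hierarchy.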

1.2 Linking

The output of the loading stage acts as the input to the linking stage. The linking stage has three sub-phases.

Verify - The bytecode verifier checks that the loaded bytecode is well-formed and safe to execute. If any discrepancy is identified, a verification error (java.lang.VerifyError) is thrown. This way the JVM validates bytecode before allowing it to run, whichever compiler or machine produced it.
Prepare - At this stage memory is allocated for all the static variables of the class, and they are assigned default values (for example 0, false, or null).
Resolve - All symbolic references are replaced with direct references to the corresponding entries in the method area.

1.3 Initialization

This is the final phase of class loading. In this phase all the static variables are assigned their original values (the ones written in the source code), and static initializer blocks are executed.
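
The difference between the default values of the prepare phase and the original values of the initialization phase can be made visible with a small experiment. This is a sketch with made-up names (InitDemo, Config, peek): the static initializers run in textual order, so a method called from the first initializer still sees the default value of the second variable.

```java
import java.util.ArrayList;
import java.util.List;

class InitDemo {
    static final List<String> log = new ArrayList<>();

    static class Config {
        // Initializers run in textual order during the initialization phase.
        static int first = peek();   // runs while `second` still holds its default value
        static int second = 42;

        static {
            InitDemo.log.add("Config initialized");
        }

        static int peek() {
            return second; // 0 at this point: the default from the prepare phase
        }
    }

    public static void main(String[] args) {
        System.out.println(log.isEmpty());   // true: Config has not been used yet
        System.out.println(Config.first);    // 0: first use triggers initialization
        System.out.println(Config.second);   // 42
        System.out.println(log);             // [Config initialized]
    }
}
```

The first line printed is true because Config is initialized only when the program refers to it for the first time, not when InitDemo starts.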

2. Runtime Data Area

This subsystem contains the various memory areas used to execute the program. The runtime data area is divided into five major components:

Method Area - This memory area stores all the class-level data, including static variables. There is only one method area per JVM, so it is a shared resource.
Heap Area - All Java objects, with their instance variables, and arrays are stored here. Like the method area, there is only one heap area per JVM. Because both areas are shared by all threads, the data stored in them is not thread-safe.
Stack Area - For every thread, a separate runtime stack is created. For each method call, one entry, known as a stack frame, is pushed onto that stack. All the local variables used by the method are created in the stack. The stack frame of each executing method contains the local variables, the operand stack, and frame data. The operand stack is the area used to store intermediate results of expressions. Frame data contains all the symbols corresponding to the method, as well as catch block information in case of exceptions.
PC Register - For every thread, a separate Program Counter (PC) register is created. It holds the address of the JVM instruction that the thread is currently executing, and is updated as the thread moves on to the next instruction.
Native Method Stack - The native method stack holds native method information. For every thread, a separate native method stack is created.
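
The split between the shared heap and the per-thread stacks shows up as soon as more than one thread is involved. In the sketch below (the names SharedHeapDemo and run are my own), the loop counter is a local variable in each thread's own stack frame and needs no protection, while the counter object lives on the heap and is shared, so I use AtomicInteger from java.util.concurrent to keep the updates safe.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SharedHeapDemo {
    // This object lives on the heap and is visible to every thread.
    static final AtomicInteger shared = new AtomicInteger();

    static int run(int threads, int incrementsPerThread) {
        shared.set(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                // `i` is a local variable: it lives in this thread's own stack frame.
                for (int i = 0; i < incrementsPerThread; i++) {
                    shared.incrementAndGet(); // atomic update of shared heap state
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for workers", e);
        }
        return shared.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000)); // prints 40000
    }
}
```

With a plain int field and no synchronization in place of the AtomicInteger, the total could come out lower, because increments from different threads can overwrite each other.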

3. Execution Engine

The bytecode loaded into the runtime data area is executed by the execution engine. The execution engine is responsible for optimizing the code and generating target instructions compatible with the underlying hardware, along with garbage collection, security, and linking the native library functions referenced in application code.

3.1 Interpreter - The interpreter reads the bytecode and executes it instruction by instruction. It starts executing quickly, but overall execution is slow because a repeatedly called method is interpreted again on every call. If a block of bytecode is used repeatedly, the execution engine uses the JIT compiler to generate native code and speed up the execution.

3.2 JIT compiler - The JIT (Just In Time) compiler removes the speed disadvantage of the interpreter. When a piece of bytecode is executed repeatedly, the JIT compiler compiles it as a whole into native code. The resulting native code is then used directly for repeated method calls, which improves the system's performance. The JIT compiler has the following sub-stages.

Intermediate Code Generator is responsible for producing intermediate code which is machine-independent.
Code Optimizer is responsible for optimizing the above code for speed and space.
Target Code Generator is responsible for generating machine code or native code, which is compatible with the underlying hardware.
Profiler is a special component that is used to find hotspots in the code i.e. whether a method is called multiple times or not.
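
A rough way to watch the interpreter hand over to the JIT compiler is to run the same small method many times and time each round. This is a sketch with made-up names (HotLoop, square, sumOfSquares). The timings depend on the machine and the JVM, so I am not claiming specific numbers; on HotSpot-based JVMs the later rounds are often faster than the first, and running with the -XX:+PrintCompilation option lists the methods as they get compiled.

```java
class HotLoop {
    // A small method that is called many times and so becomes a JIT candidate.
    static long square(int x) {
        return (long) x * x;
    }

    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += square(i);
        }
        return total;
    }

    public static void main(String[] args) {
        // Repeat the work so the profiler sees the same bytecode again and again.
        for (int round = 0; round < 10; round++) {
            long start = System.nanoTime();
            long result = sumOfSquares(1_000_000);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("round " + round + ": " + result + " in " + micros + " us");
        }
    }
}
```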

3.3 Garbage Collector

This component removes unreferenced objects and makes the freed-up memory available for further use by the program. Unlike languages such as C or C++, where the programmer is responsible for managing the memory of dynamically created objects, Java uses the Garbage Collector to manage the underlying memory in the JVM.
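
What "unreferenced" means can be illustrated with a weak reference, which lets us observe an object without keeping it alive. The sketch below uses the standard java.lang.ref.WeakReference class; the names GcDemo and survivesWhileReferenced are my own. Keep in mind that System.gc() is only a request to the JVM, so the second result in main is typical rather than guaranteed.

```java
import java.lang.ref.WeakReference;

class GcDemo {
    // While a strong reference exists, the object is reachable and must survive.
    static boolean survivesWhileReferenced() {
        Object payload = new Object();
        WeakReference<Object> weak = new WeakReference<>(payload);
        System.gc(); // only a request; the JVM decides whether and when to collect
        return weak.get() == payload;
    }

    public static void main(String[] args) {
        System.out.println("reachable object kept: " + survivesWhileReferenced());

        Object payload = new Object();
        WeakReference<Object> weak = new WeakReference<>(payload);
        payload = null; // drop the only strong reference: the object is now unreferenced
        System.gc();
        // Often true after a collection, but the timing is not guaranteed.
        System.out.println("unreferenced object collected: " + (weak.get() == null));
    }
}
```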

Java Native Interface (JNI)

The Java Native Interface connects the execution engine with the native libraries used by Java programs, and provides the interfaces through which Java code calls native functions.
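
A complete JNI example needs a separately compiled native library, which does not fit in a self-contained snippet. As a smaller sketch, the code below uses reflection to check which methods of the core library are declared with the native keyword, i.e. implemented outside Java. The names NativeCheck and isNative are made up for this example, and the result for Object.hashCode is what I expect on OpenJDK-based runtimes; other JVM implementations may differ.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class NativeCheck {
    // Reports whether a public no-argument method is declared with the `native` keyword.
    static boolean isNative(Class<?> type, String methodName) {
        try {
            Method method = type.getMethod(methodName);
            return Modifier.isNative(method.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such method: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Object.hashCode native?  " + isNative(Object.class, "hashCode"));
        System.out.println("String.length native?    " + isNative(String.class, "length"));
    }
}
```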

Native Method Libraries

This is a collection of native libraries.
