Android Application Reverse Engineering

Dear readers,

This post is for those interested in reverse engineering Android apps, whether to understand what is inside them or to make changes to them, for example in a Capture-The-Flag (CTF) competition. The content starts at complete-beginner level and works up to intermediate level. Note that all of this content is for educational purposes.

Introduction to APKs

All Android applications are distributed as Android Application Package (APK) files (.apk file extension) before you install them on your Android phone. Once installed, the application originally ran on the Dalvik Virtual Machine (DVM) on your phone. This is similar to the Java Virtual Machine (JVM) you will know if you are a Java developer: the app is shipped as bytecode rather than as a native compiled binary, just like a Java desktop application. Like the JVM, the DVM acts as an interpreter that reads the bytecode, which slows down app execution. You should be familiar with this if you have taken Compiler Techniques at university. Newer versions of Android use the Android Runtime (ART) instead. ART improves on the DVM by introducing ahead-of-time (AOT) compilation: it uses the dex2oat tool to compile the DEX file in the APK into native code, which makes the app execute faster.

APK contents

APK files are actually ZIP files. This means you can unzip them and access the files stored in them directly, using tools such as 7-Zip or unzip. Each APK file contains a number of important files, shown in Fig 2a.

Fig 2a. Disassemble Android APK
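
Since an APK is just a ZIP archive, any ZIP library can open it. Below is a minimal Python sketch using the standard zipfile module. The function names list_apk_entries and extract_apk are made up for this illustration, and the file name in the usage note is only a placeholder.

```python
import zipfile


def list_apk_entries(apk):
    """Return the names of all entries stored in the APK (a ZIP archive).

    `apk` can be a file path or an open binary file object.
    """
    with zipfile.ZipFile(apk) as archive:
        return archive.namelist()


def extract_apk(apk, out_dir):
    """Unpack every entry of the APK into out_dir, like `unzip` would."""
    with zipfile.ZipFile(apk) as archive:
        archive.extractall(out_dir)
```

For example, list_apk_entries("app.apk") would typically include entries such as AndroidManifest.xml and classes.dex. Keep in mind that the manifest extracted this way is still in its binary form.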

Android Manifest

The Android Manifest file (AndroidManifest.xml) is a binary XML file that stores information about the app, such as the permissions required (e.g., android.permission.WRITE_EXTERNAL_STORAGE), the API levels of the devices that can install it, the launcher class, etc. As the Android Manifest file is in binary form, tools such as xml2axml are required to read or write it. Note that if Smali code you add to the APK needs permissions that are not already in the Android Manifest, you must add them. An example of the permission section can be seen in Fig 2b.

Fig 2b. Example of permissions in Android Manifest XML file

The launcher class in the Android Manifest tells us which activity is executed first. Malware payload injection often exploits this during APK repackaging by injecting code into the onCreate() of the launcher class, which guarantees that the payload is executed. An example can be seen here. The launcher class can be identified through its intent filters, “android.intent.action.MAIN” and “android.intent.category.LAUNCHER”, shown in the blue box in Fig 2c; the launcher class in the example is in the red box.

Fig 2c. Launcher class and intent filters in Android Manifest XML file
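
The lookup described above can also be done programmatically once the manifest has been decoded to plain-text XML. Here is a rough Python sketch using the standard xml.etree.ElementTree module. It assumes the manifest is already decoded (it will not work on the binary form), and the sample manifest, the package name com.example, and the function names are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Attributes such as android:name live in this XML namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hypothetical decoded manifest, only for demonstration.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example">
    <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE"/>
    <application>
        <activity android:name="com.example.MainActivity">
            <intent-filter>
                <action android:name="android.intent.action.MAIN"/>
                <category android:name="android.intent.category.LAUNCHER"/>
            </intent-filter>
        </activity>
        <activity android:name="com.example.SettingsActivity"/>
    </application>
</manifest>"""


def list_permissions(manifest_xml):
    """Return the android:name of every <uses-permission> element."""
    root = ET.fromstring(manifest_xml)
    return [p.get(ANDROID_NS + "name") for p in root.iter("uses-permission")]


def find_launcher_activities(manifest_xml):
    """Return activities whose intent filter has both MAIN and LAUNCHER."""
    root = ET.fromstring(manifest_xml)
    launchers = []
    for activity in root.iter("activity"):
        for intent_filter in activity.findall("intent-filter"):
            actions = {a.get(ANDROID_NS + "name")
                       for a in intent_filter.findall("action")}
            categories = {c.get(ANDROID_NS + "name")
                          for c in intent_filter.findall("category")}
            if ("android.intent.action.MAIN" in actions
                    and "android.intent.category.LAUNCHER" in categories):
                launchers.append(activity.get(ANDROID_NS + "name"))
    return launchers
```

With the sample above, find_launcher_activities(SAMPLE_MANIFEST) returns only the MainActivity entry, which is the class whose onCreate() would run first.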

Resources

The resource files consist of things such as assets, your User Interface (UI) designs, etc. We won’t be touching on them in this post.

META-INF

META-INF is a folder that holds the signature and certificate files from when the current APK was signed. All APKs have to be signed before they can be installed on any Android device or emulator. Once you make any changes to the APK, it has to be re-signed. The signing process can be done using Jarsigner (found in the Java JDK). For APKs targeting newer API levels, signing with APK Signature Scheme v2 (for example with apksigner) may be required.
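
To see which signature files an APK carries, you can filter the entries under META-INF. The Python sketch below assumes the usual JAR (version 1) signing layout, i.e. MANIFEST.MF, a .SF file, and a .RSA, .DSA, or .EC file. The function name is made up for this illustration. Version 2 signatures are not stored in META-INF, so this will not detect them.

```python
import zipfile


def list_v1_signature_files(apk):
    """Return META-INF entries that belong to a JAR (version 1) signature.

    `apk` can be a file path or an open binary file object.
    """
    suffixes = (".SF", ".RSA", ".DSA", ".EC")
    with zipfile.ZipFile(apk) as archive:
        return [
            name for name in archive.namelist()
            if name.startswith("META-INF/")
            and (name == "META-INF/MANIFEST.MF"
                 or name.upper().endswith(suffixes))
        ]
```

An empty result on a modified APK is a strong hint that it still needs to be signed with Jarsigner before it can be installed.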

Classes Dex

The classes dex file (i.e., classes.dex) holds the bytecode instructions for the logic of the app. This is where all the code the developer has written is stored as bytecode. There may be more than one classes dex file, but many APKs have only one. As it is still bytecode, we have to disassemble it into Smali, a human-readable assembly language. This can be done using the Baksmali tool.
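
To check how many classes dex files an APK contains, you can look for them in the archive. The Python sketch below assumes the common naming convention classes.dex, classes2.dex, classes3.dex, and so on at the top level of the APK, and it checks that each file starts with the DEX magic bytes "dex\n". The function name is invented for this illustration.

```python
import re
import zipfile

# classes.dex, classes2.dex, classes3.dex, ... at the top level of the APK
DEX_NAME = re.compile(r"^classes(\d*)\.dex$")


def list_dex_files(apk):
    """Return (entry name, has_dex_magic) for every top-level classes dex file."""
    results = []
    with zipfile.ZipFile(apk) as archive:
        for name in archive.namelist():
            if DEX_NAME.match(name):
                with archive.open(name) as dex:
                    magic = dex.read(8)
                # A DEX file starts with b"dex\n" followed by a version string
                results.append((name, magic[:4] == b"dex\n"))
    return results
```

Each file reported here has to be disassembled with Baksmali separately, unless you use a tool that handles all of them for you.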

Alternate tool to make this simple

It may seem confusing and messy that the xml2axml tool is needed for the binary Android Manifest, the Baksmali tool for the classes dex file, and so on. A tool that simplifies the whole disassembling process solves this: APKTool. APKTool disassembles the classes dex file into Smali files and converts the binary Android Manifest XML file into a human-readable XML file, just like the one you would have seen if you have developed Android apps.

Class file name

For those new to this, you will commonly see that the class name and the file path of the Smali file match. However, not every class name matches its file path exactly, although most of the time the file will still be in the expected directory. An example can be seen in Table 3.

Class name | File path
Landroid/activity/A | /android/activity/a.smali
Landroid/activity/a | /android/activity/a.1.smali
Table 3. Example of class name mismatch file path
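
When searching for a class, it helps to compute where its Smali file should be and then fall back to searching the directory if it is not there. The Python sketch below assumes the disassembled files sit under a folder named smali (the default value of the hypothetical smali_root parameter), and both function names are made up for this illustration.

```python
import posixpath


def expected_smali_path(class_descriptor, smali_root="smali"):
    """Return the path where a class such as Lcom/test; is normally found."""
    if not (class_descriptor.startswith("L") and class_descriptor.endswith(";")):
        raise ValueError("not a class descriptor: " + class_descriptor)
    # Drop the leading 'L' and the trailing ';', then add the file extension
    return posixpath.join(smali_root, class_descriptor[1:-1] + ".smali")


def path_matches_class(class_descriptor, actual_path, smali_root="smali"):
    """Check whether a Smali file sits exactly where its class name suggests."""
    return expected_smali_path(class_descriptor, smali_root) == actual_path
```

A file such as a.1.smali from Table 3 would fail this check even though it sits in the expected directory, which is why the .class line inside the file is the reliable source for the class name.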

Smali file structure

When opening a Smali file, we can roughly see the file structure: it begins with the class name and information about the class, then the superclass, the interfaces, the annotations, the static and instance fields of the class, the direct methods, and finally the virtual methods. This can be seen in Fig 4a. You can identify the sections easily because the beginning of each one is labeled with a comment.

.class <accessor> <Class name>
.super <Super class>

# annotations
...

# static fields
...

# instance fields
...

# direct methods
...

# virtual methods
...

Class name

The class name usually consists of the full package path, with a slash (“/”) as the separator instead of the usual dot (“.”) in Java. It always starts with the capital letter ‘L’ and ends with a semi-colon (“;”). For example, the library class String is “Ljava/lang/String;”.

In the example below, the class name starts with “L”, ends with “;”, and is a full path, for the current class, the superclass, and the interface implemented.

.class public Lcom/appyet/data/FeedItem;
.super Ljava/lang/Object;

# interfaces
.implements Ljava/io/Serializable;
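
The header lines above are simple enough to read with a few lines of code. The Python sketch below pulls the class name, access flags, superclass, and interfaces out of the text of a Smali file, and converts a class name back to the dotted Java form. The function names are made up for this illustration, and the parser only looks at the .class, .super, and .implements lines.

```python
def descriptor_to_java_name(class_descriptor):
    """Convert Ljava/lang/String; to java.lang.String."""
    if not (class_descriptor.startswith("L") and class_descriptor.endswith(";")):
        raise ValueError("not a class descriptor: " + class_descriptor)
    return class_descriptor[1:-1].replace("/", ".")


def parse_smali_header(smali_text):
    """Read the .class, .super and .implements lines of a Smali file."""
    header = {"class": None, "access": [], "super": None, "interfaces": []}
    for line in smali_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        directive = tokens[0]
        if directive == ".class":
            header["class"] = tokens[-1]     # the class name comes last
            header["access"] = tokens[1:-1]  # e.g. ["public", "final"]
        elif directive == ".super":
            header["super"] = tokens[-1]
        elif directive == ".implements":
            header["interfaces"].append(tokens[-1])
        elif directive in (".field", ".method"):
            break  # the header is over once fields or methods start
    return header
```

Running parse_smali_header on the FeedItem example gives the class Lcom/appyet/data/FeedItem;, the superclass Ljava/lang/Object;, and a single interface, Ljava/io/Serializable;.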

Static fields & instance fields

These two kinds of fields should be quite understandable: static fields are class variables that were declared static, while instance fields belong to each object. Both use the following format:

.field <accessor (private/protected/public)> [static] [final] <variable name>:<type> [= <value>]

Items in square brackets are optional. A class type ends with a semi-colon, while a primitive type does not. Below are examples of static and instance variables in a Smali file.

# static fields
.field public static final COLUMN_ARTICLE:Ljava/lang/String; = "Article"

# instance fields
.field private mArticle:Ljava/lang/String;
    .annotation runtime Lcom/j256/ormlite/field/DatabaseField;
        columnName = "Article"
        useGetSet = false
    .end annotation
.end field
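
The field format can be captured with a regular expression, which is handy when you want to list all fields of a class. This is only a rough Python sketch: the pattern and the function name parse_field_line are made up for this illustration, and it handles a single .field line, not the annotation block that may follow it.

```python
import re

FIELD_LINE = re.compile(
    r"^\.field\s+"
    r"(?P<modifiers>(?:[a-z]+\s+)*)"   # e.g. "public static final "
    r"(?P<name>[^\s:]+):"              # variable name, up to the colon
    r"(?P<type>\S+)"                   # e.g. Ljava/lang/String; or I
    r"(?:\s*=\s*(?P<value>.+))?$"      # optional initial value
)


def parse_field_line(line):
    """Split a .field line into modifiers, name, type and optional value."""
    match = FIELD_LINE.match(line.strip())
    if match is None:
        return None
    return {
        "modifiers": match.group("modifiers").split(),
        "name": match.group("name"),
        "type": match.group("type"),
        "value": match.group("value"),  # None when no value is given
    }
```

A field is static when "static" appears in the returned modifiers list. Otherwise it is an instance field.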

Direct methods

The direct methods section consists of the constructors of the class, the private methods, and the static methods. The methods are sorted by name in ascending order, so when you insert a new method, it is safest to keep to this ordering.

There are two kinds of constructors, both stored in the direct methods section: <init> and <clinit>. <init> is the constructor for an instance of a class, while <clinit> is the static constructor (static initializer) of a class.

Below is an example of a constructor:

.method public constructor <init>()V
    .registers 2

    invoke-direct {p0}, Ljava/lang/Object;-><init>()V

    sget-object v0, Lcom/appyet/data/FeedItem$DisplayModeEnum;->None:Lcom/appyet/data/FeedItem$DisplayModeEnum;

    iput-object v0, p0, Lcom/appyet/data/FeedItem;->mDisplayMode:Lcom/appyet/data/FeedItem$DisplayModeEnum;

    sget-object v0, Lcom/appyet/data/FeedItem$ArticleStatusEnum;->None:Lcom/appyet/data/FeedItem$ArticleStatusEnum;

    iput-object v0, p0, Lcom/appyet/data/FeedItem;->mArticleStatus:Lcom/appyet/data/FeedItem$ArticleStatusEnum;

    sget-object v0, Lcom/appyet/data/FeedItem$EnclosureStatusEnum;->None:Lcom/appyet/data/FeedItem$EnclosureStatusEnum;

    iput-object v0, p0, Lcom/appyet/data/FeedItem;->mEnclosureStatus:Lcom/appyet/data/FeedItem$EnclosureStatusEnum;

    return-void
.end method

Below is an example of a static constructor where there must be a static keyword as well:

.method static constructor <clinit>()V
    .locals 1

    new-instance v0, Landroid/support/v4/app/BackStackState$1;

    invoke-direct {v0}, Landroid/support/v4/app/BackStackState$1;-><init>()V

    sput-object v0, Landroid/support/v4/app/BackStackState;->CREATOR:Landroid/os/Parcelable$Creator;

    return-void
.end method

To call a static method, we have to use invoke-static. Other direct methods (constructors and private methods) are called using invoke-direct.

Examples of calling static method void a() and direct method void b(), both from class Lcom/test;. As b() is an instance method, the object it is called on (here p0, i.e. “this”) must be passed as the first register:

invoke-static {}, Lcom/test;->a()V
invoke-direct {p0}, Lcom/test;->b()V

Virtual methods

The virtual methods section consists of methods such as the public and protected methods of the class. These methods are also sorted by name in ascending order, so when you insert a new method, it is safest to keep to this ordering. In this section, you can also find popular event methods such as onCreate() for Activity classes.

To call virtual methods, we have to use invoke-virtual. Below is an example of calling the protected void method c() from class Lcom/test; on the current object (p0):

invoke-virtual {p0}, Lcom/test;->c()V

Registers

Registers are like variables in programming: they store values. If you have learned assembly languages such as x86, x64, or ARM, these are the same kind of registers. Many instructions can only address the first 16 registers of a method, v0 to v15, although a method may declare more. Besides these, there are also parameter registers, which start from p0. They are not extra registers: if a method has parameters, its last few registers, v<number>, double as the parameter registers, p<number>. Table 9 shows an example where there are 3 parameters in a method and two local registers. In this example, parameter register p0 and register v2 refer to the same register. If there are no parameter registers, the method is free to use all of v0 to v15 without worrying about modifying the values in the parameter registers.

Local registers | Parameter registers | Description
v0 | - | First local register.
v1 | - | Second local register.
v2 | p0 | First parameter register.
v3 | p1 | Second parameter register.
v4 | p2 | Third parameter register.
Table 9. Example of existence of local and parameter registers in a method
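
The mapping in Table 9 is plain arithmetic: the parameter registers are the last ones in the method, so the first of them sits at (total registers - parameter registers). Below is a small Python sketch of that calculation. The function name is invented for this illustration.

```python
def parameter_register_aliases(total_registers, parameter_registers):
    """Map each parameter register pN to the vN register it shares.

    The parameter registers occupy the last registers of the method.
    """
    if parameter_registers > total_registers:
        raise ValueError("a method cannot have more parameter registers "
                         "than registers in total")
    first_parameter = total_registers - parameter_registers
    return {
        "p%d" % i: "v%d" % (first_parameter + i)
        for i in range(parameter_registers)
    }
```

For Table 9 (5 registers in total, 3 of them parameters), this maps p0 to v2, p1 to v3, and p2 to v4. For the constructor shown earlier, which declares .registers 2 and only receives p0, it maps p0 to v1.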

Say we have a static method a(Ljava/lang/String;Lcom/test1;): p0 contains the value for Ljava/lang/String;, while p1 contains the value for Lcom/test1;.

However, it is different for instance methods. For an instance method, p0 contains the object itself (“this” in Java/C++, “self” in Python). For example, for an instance method b(Ljava/lang/String;), p0 contains “this” while p1 contains the value for Ljava/lang/String;.
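
The rule for static and instance methods can be sketched in Python as follows. The function name and the list-of-types input are made up for this illustration. The sketch also accounts for long (J) and double (D) values, which take up two registers each.

```python
# 64-bit types occupy two consecutive registers
WIDE_TYPES = {"J", "D"}


def assign_parameter_registers(param_types, is_static):
    """Return (register, content) pairs for the parameters of a method."""
    assignment = []
    index = 0
    if not is_static:
        # Instance methods receive the object itself in p0
        assignment.append(("p0", "this"))
        index = 1
    for smali_type in param_types:
        assignment.append(("p%d" % index, smali_type))
        index += 2 if smali_type in WIDE_TYPES else 1
    return assignment
```

For the static method a(Ljava/lang/String;Lcom/test1;), the String lands in p0 and the test1 object in p1. For the instance method b(Ljava/lang/String;), p0 is “this” and the String moves to p1.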

If you would like to see more details of registers in Smali, you can find them here.

Primitive types, method calls, and returns

We know that class names begin with the letter ‘L’, end with a semi-colon, and contain the full package path. Primitive types are different. Examples of primitive types are integer, char, boolean, etc. Below is the Smali representation of each primitive type. Note that there is no semi-colon at the end of the representation.

Primitive types | Smali representations
Void | V
Byte | B
Short | S
Char | C
Integer | I
Long | J (uses two registers)
Float | F
Double | D (uses two registers)
Boolean | Z
Table 10. Smali representations of primitive types
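
Table 10 translates directly into a lookup table. The Python sketch below turns Java type names into their Smali representations and joins them into the parameter-and-return part of a method signature. The function names are made up for this illustration, and array types are not handled.

```python
# Java primitive type name -> Smali representation (see Table 10)
PRIMITIVES = {
    "void": "V", "byte": "B", "short": "S", "char": "C", "int": "I",
    "long": "J", "float": "F", "double": "D", "boolean": "Z",
}


def to_smali_type(java_type):
    """Convert a Java type name such as int or java.lang.String."""
    if java_type in PRIMITIVES:
        return PRIMITIVES[java_type]
    # Anything else is treated as a class: L<path with slashes>;
    return "L" + java_type.replace(".", "/") + ";"


def build_method_descriptor(param_types, return_type):
    """Build the (<parameter types>)<return type> part of a method call."""
    params = "".join(to_smali_type(t) for t in param_types)
    return "(" + params + ")" + to_smali_type(return_type)
```

For a method that takes two int values, a boolean, and a com.test object and returns an int, build_method_descriptor gives (IIZLcom/test;)I.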

After we have gone through the primitive types, we can now touch on the examples of method calls using primitive types, class names, and the return value. For method calls, they have the following format:

invoke-<direct/virtual/static,etc> {<registers as arguments>}, <classname>;-><methodname>(<method parameter types>)<return type>

Say we have a method public int ab(int a, int b, boolean c, test d), where the full path of class test is Lcom/test;. We can invoke it and store the return value in register v2 as follows. The register p0 must be included because we are calling an instance method, the same as passing “self” to a method in Python:

invoke-virtual {p0, v0, v1, v4, v3}, Lcom/test;->ab(IIZLcom/test;)I

move-result v2

Note that the method’s parameter types are written without spaces or commas. Therefore, the integers I, the boolean Z, and the class Lcom/test; sit side by side.
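
Because the parameter types are packed together without separators, reading them back takes a little care. Here is a rough Python sketch that splits a method reference into its parts. The regular expressions and the function name are made up for this illustration. The pattern also accepts a leading "[" for array types, which are not covered in this post.

```python
import re

METHOD_REF = re.compile(
    r"^(?P<cls>L[^;]+;)->(?P<name>[^(]+)\((?P<params>[^)]*)\)(?P<ret>.+)$"
)
# One type: optional array brackets, then a primitive letter or L<path>;
SINGLE_TYPE = re.compile(r"\[*(?:[VZBSCIJFD]|L[^;]+;)")


def parse_method_reference(reference):
    """Split e.g. Lcom/test;->ab(IIZLcom/test;)I into its components."""
    match = METHOD_REF.match(reference)
    if match is None:
        raise ValueError("not a method reference: " + reference)
    params = SINGLE_TYPE.findall(match.group("params"))
    # Every character of the parameter list must belong to some type
    if "".join(params) != match.group("params"):
        raise ValueError("malformed parameter list in: " + reference)
    return {
        "class": match.group("cls"),
        "name": match.group("name"),
        "params": params,
        "return": match.group("ret"),
    }
```

For Lcom/test;->ab(IIZLcom/test;)I this gives the parameter list I, I, Z, Lcom/test; and the return type I.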

Compiling and signing

To compile the program, we can use the smali tool or APKTool. If you use the smali tool to compile the files manually, remember to ZIP the files back into a file with the .apk extension, as APK files are ZIP files. When you zip the files, you must zip them recursively: include the “-r” flag if you are using the zip command-line tool, or the “a” command and the “-tzip” switch if you are using 7-Zip. To save yourself the trouble, use APKTool instead.

Fig 11. Assemble Smali files
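
If you prefer to script the manual zipping step, the recursive behaviour of “zip -r” can be approximated in Python with the standard zipfile module. This is only a sketch, the function name is made up for this illustration, and the output file must be written outside the folder being zipped so that it does not end up inside itself.

```python
import os
import zipfile


def zip_directory(src_dir, apk_path):
    """Recursively add every file under src_dir to a new ZIP at apk_path."""
    with zipfile.ZipFile(apk_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder, _subdirs, files in os.walk(src_dir):
            for file_name in files:
                full_path = os.path.join(folder, file_name)
                # Store paths relative to src_dir, using forward slashes
                relative = os.path.relpath(full_path, src_dir)
                archive.write(full_path, relative.replace(os.sep, "/"))
```

The result is an unsigned APK, so it still has to be signed before it can be installed.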

Finally, all modified APKs must be re-signed, as the META-INF files hold the signatures of the original APK before modification. If your modified APK is not signed, it cannot be installed on an Android emulator or physical device. To sign the APK, we first have to create a keystore. This can be done using keytool, which can be found in the bin folder of your Java JDK. You can create a key using the following command:

> keytool -genkey -noprompt -alias alias_name -keyalg RSA -keysize 2048 -validity 10000 -keystore my-release-key.keystore -dname "CN=test, OU=test, O=test, L=test, S=test, C=SG" -storepass testing

After generating a keystore, we can sign the APK using Jarsigner, which can also be found in the bin folder of your Java JDK. This can be done using the following command:

> jarsigner -verbose -sigalg SHA1withRSA -digestalg SHA1 -keystore my-release-key.keystore -storepass testing result.apk alias_name
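
If you re-sign APKs often, the two commands above can be wrapped in a small Python script. The sketch below only reuses the flags shown in this post. The function names are made up for this illustration, and it assumes keytool and jarsigner can be found on your PATH. Passing the arguments as a list means the -dname value needs no extra quoting.

```python
import subprocess


def build_keytool_command(keystore, alias, store_password, dname):
    """Return the keytool command from this post as an argument list."""
    return [
        "keytool", "-genkey", "-noprompt", "-alias", alias,
        "-keyalg", "RSA", "-keysize", "2048", "-validity", "10000",
        "-keystore", keystore, "-dname", dname,
        "-storepass", store_password,
    ]


def build_jarsigner_command(keystore, alias, store_password, apk_path):
    """Return the jarsigner command from this post as an argument list."""
    return [
        "jarsigner", "-verbose", "-sigalg", "SHA1withRSA",
        "-digestalg", "SHA1", "-keystore", keystore,
        "-storepass", store_password, apk_path, alias,
    ]


def create_keystore_and_sign(keystore, alias, store_password, dname, apk_path):
    """Run both commands in order; raises CalledProcessError on failure."""
    subprocess.run(
        build_keytool_command(keystore, alias, store_password, dname),
        check=True)
    subprocess.run(
        build_jarsigner_command(keystore, alias, store_password, apk_path),
        check=True)
```

Calling create_keystore_and_sign("my-release-key.keystore", "alias_name", "testing", "CN=test, OU=test, O=test, L=test, S=test, C=SG", "result.apk") would run the same two steps as the commands above.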

Jarsigner only produces version 1 signatures. If the APK requires version 2 signing (this applies to apps with higher API levels), this can be done easily by following the steps in this video, using the tool in the video’s description.

If you are interested in other Smali instructions that are not covered in this blog post, such as the goto instruction, you can refer to this link here.

I hope this post has been helpful to you. Feel free to leave any comments below. You may also send me some tips if you like my work and want to see more of such content. Funds will mostly be used for my milk tea addiction. The link is here. 🙂
