Object serialization is an important aspect of any non-trivial program. When poking around in the wild, one easily runs into serialized files.

As serialized files will get deserialized by the application at some point, an interesting vulnerability comes to mind: Insecure Object Deserialization. It occurs when an application attempts to deserialize untrusted data. Without experience it can be pretty tricky to recognize serialized files, and thus one might overlook this type of vulnerability.
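To make the risk concrete, here is a minimal and deliberately harmless sketch in Python. The class name Innocent is made up for illustration, and the built-in len only stands in for whatever callable an attacker would pick.

```python
import pickle


class Innocent:
    """Hypothetical class; __reduce__ tells pickle how to rebuild the object."""

    def __reduce__(self):
        # On load, pickle calls the returned callable with the given arguments.
        # A harmless built-in is used here as a placeholder.
        return (len, ("abc",))


payload = pickle.dumps(Innocent())

# Never do this with untrusted input: loading runs the callable chosen above.
result = pickle.loads(payload)
print(result)
```

The value that comes back is not an Innocent instance but the return value of the call made during loading, which is the core of the problem.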

In the following I will demonstrate characteristics of serialized objects in Java, PHP and Python.

Java

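Detecting objects serialized by Java is fairly easy. The serialized stream always starts with the constant hex value aced0005 (the stream header), and for a serialized object this is typically followed by 7372, giving aced00057372. When the stream is Base64-encoded, it starts with rO0AB instead. The table below explains these bytes: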

Constant        Hex value
STREAM_MAGIC    aced
STREAM_VERSION  0005
TC_OBJECT       73
TC_CLASSDESC    72

So, whenever you see these hex values at the start of a file or blob, you know it was serialized by Java!
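A check for these bytes could be sketched roughly like this in Python. The function name is my own, and the Base64 prefix rO0AB is simply what the header bytes aced0005 look like once encoded.

```python
import base64

# STREAM_MAGIC (aced) followed by STREAM_VERSION (0005).
JAVA_STREAM_HEADER = bytes.fromhex("aced0005")


def looks_like_java_serialized(data: bytes) -> bool:
    """Heuristic: True if data starts with the Java stream header, raw or Base64."""
    if data.startswith(JAVA_STREAM_HEADER):
        return True
    # Base64 of the bytes ac ed 00 05 always begins with "rO0AB".
    return data.lstrip().startswith(b"rO0AB")


sample = bytes.fromhex("aced00057372")
print(looks_like_java_serialized(sample))
print(looks_like_java_serialized(base64.b64encode(sample)))
```

This only looks at the first few bytes, so it tells you what to inspect more closely rather than proving anything about the rest of the stream.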

PHP

Objects serialized by PHP are fairly distinctive, too. But you can't just check for a fixed hex value to detect them. The cool part is that the format is plain text, so you can understand the structure of a serialized object without deserializing it. Nevertheless, I would recommend deserializing it (in a safe environment) when attempting to dig deeper, just to avoid some headaches.

Let's just jump into some examples and examine these closely:

Plaintext                      Serialized
NULL                           N;
myVar = true                   b:1;
myVar = false                  b:0;
i = 1337                       i:1337;
j = 1.337                      d:1.337;
myString = "test"              s:4:"test";
myArray = array(1, 2, 3)       a:3:{i:0;i:1;i:1;i:2;i:2;i:3;}
myArray = array("ref" => 0)    a:1:{s:3:"ref";i:0;}

We can see that each value gets transformed into a kind of short form. Variable names are not stored; only a one-letter type tag and the value remain. NULL gets transformed into "N;" and boolean values into b:1; or b:0;. Integers and floats are written as <type>:<value>;, with i for integers and d for floats (the exact digits of a float depend on PHP's serialize_precision setting). Strings also include their length in bytes, thus: s:<length of string>:"<value>";. Arrays combine these formats and introduce curly brackets: a:<number of elements>:{<serialized key><serialized value>...}. Every element is stored as a key followed by a value, which is why array(1, 2, 3) contains the indexes 0, 1 and 2 next to its values. Arrays with string keys work the same way, with the key stored as a string. Objects follow the same idea with the tag O and the class name: O:<length of class name>:"<class name>":<number of properties>:{...}.

If you treat this table as a small vocabulary, or just remember the general format, you will easily recognize PHP serialized objects!
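As a rough sketch, the type tags that PHP's serialize() emits (N, b, i, d, s, a, O) can be turned into a prefix check in Python. The pattern and function name are my own and only inspect the beginning of the data, so treat a match as a hint, not a proof.

```python
import re

# Matches the start of a value as produced by PHP's serialize().
PHP_SERIALIZED_PREFIX = re.compile(
    rb"""(?:
        N;              # NULL
      | b:[01];         # boolean
      | i:-?\d+;        # integer
      | d:[^;]+;        # float (double)
      | s:\d+:"         # string: s:<length>:"<value>";
      | a:\d+:\{        # array:  a:<count>:{...}
      | O:\d+:"         # object: O:<length>:"<class name>":<count>:{...}
    )""",
    re.VERBOSE,
)


def looks_like_php_serialized(data: bytes) -> bool:
    """Heuristic: True if data begins like a PHP serialized value."""
    return PHP_SERIALIZED_PREFIX.match(data) is not None


print(looks_like_php_serialized(b'a:1:{s:3:"ref";i:0;}'))
print(looks_like_php_serialized(b"just some text"))
```

Short tags such as N; can of course also appear at the start of ordinary text, so a real scanner would want to parse the whole value.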

Python

In Python one can choose to serialize objects with JSON, but Python's native serialization format is Pickle. Note that serialization with JSON comes with restrictions, such as not being able to represent pointer sharing.

Let's just jump into an example:

import pickle

serializeMe = dict(myArray=[1, 3, 3, 7],
                   myString="1337",
                   myNumber=1337,
                   myBoolean=True,
                   myBoolean2=False,
                   myNone=None)

# Protocol 0 is the oldest, ASCII-based pickle format.
print(pickle.dumps(serializeMe, protocol=0))
b'(dp0\nVmyArray\np1\n(lp2\nI1\naI3\naI3\naI7\nasVmyString\np3\nV1337\np4\nsVmyNumber\np5\nI1337\nsVmyBoolean\np6\nI01\nsVmyBoolean2\np7\nI00\nsVmyNone\np8\nNs.'

As we can see, we defined a dictionary containing several data types (list, string, integer, boolean, None). This way we can examine how each of them is serialized.

The first thing we notice is that Python returns the serialized dictionary as raw bytes (indicated by the prefix b), which here consist almost entirely of ASCII characters. Looking closely enough, one can extract the dictionary keys and their values without deserialization. However, I explicitly chose the lowest protocol (protocol=0). Most applications will probably use the default, which is protocol 3 in Python 3.0 through 3.7 and protocol 4 since Python 3.8.

The following example will serialize the dictionary with the default of Python 3.8:

import pickle

serializeMe = dict(myArray=[1, 3, 3, 7],
                   myString="1337",
                   myNumber=1337,
                   myBoolean=True,
                   myBoolean2=False,
                   myNone=None)

# No protocol given: the interpreter's default is used (4 on Python 3.8).
print(pickle.dumps(serializeMe))
b'\x80\x04\x95`\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x07myArray\x94]\x94(K\x01K\x03K\x03K\x07e\x8c\x08myString\x94\x8c\x041337\x94\x8c\x08myNumber\x94M9\x05\x8c\tmyBoolean\x94\x88\x8c\nmyBoolean2\x94\x89\x8c\x06myNone\x94Nu.'

This time there are way more non-ASCII bytes. I took all the non-ASCII bytes out of the string above and tried to print them using echo -e, but nothing readable came out, so this won't help. Not only that, but Pickle also has several other protocols to choose from for serialization, for example "HIGHEST_PROTOCOL".

import pickle

serializeMe = dict(myArray=[1, 3, 3, 7],
                   myString="1337",
                   myNumber=1337,
                   myBoolean=True,
                   myBoolean2=False,
                   myNone=None)

# HIGHEST_PROTOCOL is the newest protocol the interpreter supports (5 on Python 3.8).
print(pickle.dumps(serializeMe, protocol=pickle.HIGHEST_PROTOCOL))
b'\x80\x05\x95`\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x07myArray\x94]\x94(K\x01K\x03K\x03K\x07e\x8c\x08myString\x94\x8c\x041337\x94\x8c\x08myNumber\x94M9\x05\x8c\tmyBoolean\x94\x88\x8c\nmyBoolean2\x94\x89\x8c\x06myNone\x94Nu.'

Comparing all these outputs:

b'(dp0\nVmyArray\np1\n(lp2\nI1\naI3\naI3\naI7\nasVmyString\np3\nV1337\np4\nsVmyNumber\np5\nI1337\nsVmyBoolean\np6\nI01\nsVmyBoolean2\np7\nI00\nsVmyNone\np8\nNs.'
b'\x80\x04\x95`\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x07myArray\x94]\x94(K\x01K\x03K\x03K\x07e\x8c\x08myString\x94\x8c\x041337\x94\x8c\x08myNumber\x94M9\x05\x8c\tmyBoolean\x94\x88\x8c\nmyBoolean2\x94\x89\x8c\x06myNone\x94Nu.'
b'\x80\x05\x95`\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x07myArray\x94]\x94(K\x01K\x03K\x03K\x07e\x8c\x08myString\x94\x8c\x041337\x94\x8c\x08myNumber\x94M9\x05\x8c\tmyBoolean\x94\x88\x8c\nmyBoolean2\x94\x89\x8c\x06myNone\x94Nu.'

The comparison reveals that in Python 3.8 the default (protocol 4) and "HIGHEST_PROTOCOL" (protocol 5) are not the same, but for this data outputs 2 and 3 differ only in the second byte, which holds the protocol number (\x04 versus \x05). Furthermore, it shows that deciding whether something is serialized could get tricky if we attempt to detect it based on its raw content alone.
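To see how the first bytes change from protocol to protocol, a small sketch like the following can help. The helper name declared_protocol is my own; it relies on the fact that protocol 2 and later begin with the PROTO opcode (\x80) followed by the protocol number.

```python
import pickle


def declared_protocol(blob: bytes):
    """Return the protocol number from the PROTO header, or None if there is none."""
    if len(blob) >= 2 and blob[0] == 0x80:
        return blob[1]
    return None


sample = {"myNumber": 1337}

for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    blob = pickle.dumps(sample, protocol=protocol)
    # Show the first two bytes, the declared protocol and the trailing STOP opcode.
    print(protocol, blob[:2], declared_protocol(blob), blob.endswith(b"."))
```

Protocols 0 and 1 carry no such header, so for those the helper returns None and other clues are needed.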

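The good part is that we don't need to dig that deep. The b shown above is only Python's notation for a bytes literal and is not part of the data, but two markers are: every pickle ends with a dot (., the STOP opcode), and pickles of protocol 2 or later start with the byte \x80 followed by the protocol number. Protocols 0 and 1 have no such header, but their output is largely readable, as seen in the first example. Thus, when seeing these in the wild we know it's some Python serialized stuff :).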
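If you want a slightly stronger check than eyeballing the bytes, the standard library's pickletools module can walk through the opcodes without actually unpickling anything. The following is only a sketch; the function name looks_like_pickle is my own.

```python
import io
import pickle
import pickletools


def looks_like_pickle(data: bytes) -> bool:
    """Heuristic: True if data ends with STOP and disassembles as a valid pickle."""
    if not data.endswith(b"."):
        return False
    try:
        # dis() only disassembles and checks the opcode stream; it does not
        # import modules or call anything. The listing itself is discarded.
        pickletools.dis(data, out=io.StringIO())
    except Exception:
        return False
    return True


blob = pickle.dumps({"myArray": [1, 3, 3, 7], "myNone": None})
print(looks_like_pickle(blob))
print(looks_like_pickle(b"not a pickle"))
```

Since this never calls pickle.loads, it is safe to run on files of unknown origin, which is exactly the situation where you want to know what you are looking at.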