Object serialization is an important aspect of any non-trivial program, so when running around through the jungle one easily encounters serialized files.
Since serialized files will get deserialized by the application at some point, an interesting vulnerability comes to mind: Insecure Object Deserialization. It occurs when an application attempts to deserialize untrusted data. Without experience it can be pretty tricky to recognize serialized files, and thus one might overlook this type of vulnerability.
In the following I will demonstrate characteristics of serialized objects in Java, PHP and Python.
Java
Detecting objects serialized by Java is fairly easy. The serialized output always starts with the header bytes aced0005, and for a plain serialized object the next two bytes are usually 7372, giving the constant hex prefix aced00057372.
The table below explains these bytes:
Constant | Hex value |
---|---|
STREAM_MAGIC | aced |
STREAM_VERSION | 0005 |
TC_OBJECT | 73 |
TC_CLASSDESC | 72 |
So, whenever you see these hex values at the start of a serialized file, you know it was serialized with Java!
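The check for these header bytes can be sketched in a few lines of Python. This is a minimal sketch, not a complete detector: the function name `looks_like_java_serialized` is made up for illustration, and the Base64 variant is included on the assumption that serialized objects often travel Base64-encoded (for example in cookies or request parameters), where the same header shows up as the prefix rO0AB.

```python
# Header bytes from the table: STREAM_MAGIC (aced) followed by STREAM_VERSION (0005).
JAVA_MAGIC = bytes.fromhex("aced0005")
# The same four bytes, Base64-encoded, always begin with "rO0AB".
JAVA_MAGIC_B64 = b"rO0AB"


def looks_like_java_serialized(data: bytes) -> bool:
    """Return True if data starts with the Java serialization header.

    Hypothetical helper: it only inspects the first bytes and never
    deserializes anything.
    """
    return data.startswith(JAVA_MAGIC) or data.lstrip().startswith(JAVA_MAGIC_B64)


if __name__ == "__main__":
    sample = bytes.fromhex("aced00057372")
    print(looks_like_java_serialized(sample))
```

Only the four header bytes are checked here, because the following 7372 depends on what was written to the stream first.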
PHP
Objects serialized by PHP are fairly recognizable, too, but you can't just check for a fixed hex value to detect them. The cool part is that the format is plain text, so you can understand the internal structure of a PHP serialized object without deserializing it. Nevertheless, I would recommend deserializing it when attempting to dig deeper, just to avoid some headaches.
Let's jump into some examples and examine them closely:
PHP value | Serialized |
---|---|
NULL | N; |
true | b:1; |
false | b:0; |
1337 | i:1337; |
1.337 | d:1.337; |
"test" | s:4:"test"; |
array(1, 2, 3) | a:3:{i:0;i:1;i:1;i:2;i:2;i:3;} |
array("ref" => 0) | a:1:{s:3:"ref";i:0;} |
We can see that each value gets transformed into a kind of shorthand. The name of the variable is not stored; instead, each value is prefixed with a one-letter type tag, followed by the value itself.
NULL gets transformed into "N;" and booleans into b:0; or b:1;.
Integers and floats are transformed into <type>:<value>; with the type tag i for integers and d for floats (doubles).
Strings also include their length in bytes, thus: s:<length of string>:"<value>";
Arrays combine the formats of strings and integers and also introduce curly brackets: a:<number of elements>:{<serialized key><serialized value>...}
Every element is stored as a key followed by its value, so even a plain list such as array(1, 2, 3) carries its numeric indexes 0, 1 and 2. Arrays with string keys work the same way, only the keys are serialized as strings.
If you treat this table as a small vocabulary, or just remember the general <type>:<value>; format, you will easily recognize PHP serialized objects!
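Because PHP's serialize() output is plain text with a one-letter type tag in front of every value (N, b, i, d, s, a, and O for objects), a rough recognizer can be sketched as a regular expression. The snippet below is a heuristic under stated assumptions, not a parser: the names `PHP_SERIALIZED` and `looks_like_php_serialized` are made up for illustration, nested content is not validated, and string lengths are not checked against the actual payload.

```python
import re

# One alternative per top-level type tag of PHP's serialize() format.
PHP_SERIALIZED = re.compile(
    r"""
        N;                            # null
      | b:[01];                       # boolean
      | i:[+-]?\d+;                   # integer
      | d:[^;]+;                      # float (double)
      | s:\d+:".*";                   # string, prefixed with its byte length
      | a:\d+:\{.*\}                  # array, prefixed with its element count
      | O:\d+:"[^"]+":\d+:\{.*\}      # object: class name length, name, property count
    """,
    re.VERBOSE | re.DOTALL,
)


def looks_like_php_serialized(text: str) -> bool:
    """Heuristic check; never unserializes the input."""
    return PHP_SERIALIZED.fullmatch(text.strip()) is not None


if __name__ == "__main__":
    for candidate in ['s:4:"test";', 'a:1:{s:3:"ref";i:0;}', "myVar = true"]:
        print(candidate, looks_like_php_serialized(candidate))
```

A match only says the data has the right outer shape, so a hit is a reason to look closer rather than proof.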
Python
In Python one can choose to serialize objects with JSON, but Python's native serialization format is Pickle. Note that JSON has restrictions, such as not being able to represent pointer sharing (the same object referenced from several places).
Let's jump into an example:
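import pickle

# A dictionary covering several data types
serializeMe = dict(myArray = [1, 3, 3, 7],
                   myString = "1337",
                   myNumber = 1337,
                   myBoolean = True,
                   myBoolean2 = False,
                   myNone = None)
# Serialize with the oldest, human-readable protocol and print the raw bytes
print(pickle.dumps(serializeMe, protocol=0))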
b'(dp0\nVmyArray\np1\n(lp2\nI1\naI3\naI3\naI7\nasVmyString\np3\nV1337\np4\nsVmyNumber\np5\nI1337\nsVmyBoolean\np6\nI01\nsVmyBoolean2\np7\nI00\nsVmyNone\np8\nNs.'
As we can see, we defined a dictionary containing several data types (list, string, integer, boolean, None). This way we can examine how each of them is serialized.
The first thing we notice is that Python returns the serialized dictionary as raw bytes (indicated by the prefix b), which at this protocol level consist of readable ASCII characters. Looking closely, one can extract the key names and their values without deserialization. However, I explicitly chose the lowest protocol (protocol=0). Most applications will probably use the default, which is protocol 3 since Python 3.0 and protocol 4 since Python 3.8.
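Reading keys and values out of a pickle without deserializing it does not have to be done by eye. The standard library module pickletools can walk the opcodes of a pickle without executing them. The following is a minimal sketch; the helper name `list_opcodes` and the shortened sample dictionary are my own choices for illustration.

```python
import pickle
import pickletools


def list_opcodes(data: bytes):
    """Return (opcode name, argument) pairs without unpickling the data."""
    return [(op.name, arg) for op, arg, _pos in pickletools.genops(data)]


if __name__ == "__main__":
    data = pickle.dumps({"myString": "1337", "myNumber": 1337}, protocol=0)
    for name, arg in list_opcodes(data):
        # Opcodes such as UNICODE and INT carry the keys and values as arguments.
        print(name, arg)
```

Since genops only decodes the opcode stream, nothing from the pickle is ever constructed or called, which makes it suitable for looking at untrusted data.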
The following example will serialize the dictionary with the default of Python 3.8:
import pickle

serializeMe = dict(myArray = [1, 3, 3, 7],
                   myString = "1337",
                   myNumber = 1337,
                   myBoolean = True,
                   myBoolean2 = False,
                   myNone = None)
# No protocol given, so the interpreter's default protocol is used
print(pickle.dumps(serializeMe))
b'\x80\x04\x95`\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x07myArray\x94]\x94(K\x01K\x03K\x03K\x07e\x8c\x08myString\x94\x8c\x041337\x94\x8c\x08myNumber\x94M9\x05\x8c\tmyBoolean\x94\x88\x8c\nmyBoolean2\x94\x89\x8c\x06myNone\x94Nu.'
This time there are way more non-ASCII bytes. I took all non-ASCII bytes out of the string above and tried to print them using echo -e, but nothing useful came out, so this won't help.
Not only that, Pickle also offers several other protocols to choose from for serialization, for example "HIGHEST_PROTOCOL".
import pickle

serializeMe = dict(myArray = [1, 3, 3, 7],
                   myString = "1337",
                   myNumber = 1337,
                   myBoolean = True,
                   myBoolean2 = False,
                   myNone = None)
# Explicitly request the newest protocol the interpreter supports
print(pickle.dumps(serializeMe, protocol=pickle.HIGHEST_PROTOCOL))
b'\x80\x05\x95`\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x07myArray\x94]\x94(K\x01K\x03K\x03K\x07e\x8c\x08myString\x94\x8c\x041337\x94\x8c\x08myNumber\x94M9\x05\x8c\tmyBoolean\x94\x88\x8c\nmyBoolean2\x94\x89\x8c\x06myNone\x94Nu.'
Comparing all these outputs:
b'(dp0\nVmyArray\np1\n(lp2\nI1\naI3\naI3\naI7\nasVmyString\np3\nV1337\np4\nsVmyNumber\np5\nI1337\nsVmyBoolean\np6\nI01\nsVmyBoolean2\np7\nI00\nsVmyNone\np8\nNs.'
b'\x80\x04\x95`\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x07myArray\x94]\x94(K\x01K\x03K\x03K\x07e\x8c\x08myString\x94\x8c\x041337\x94\x8c\x08myNumber\x94M9\x05\x8c\tmyBoolean\x94\x88\x8c\nmyBoolean2\x94\x89\x8c\x06myNone\x94Nu.'
b'\x80\x05\x95`\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x07myArray\x94]\x94(K\x01K\x03K\x03K\x07e\x8c\x08myString\x94\x8c\x041337\x94\x8c\x08myNumber\x94M9\x05\x8c\tmyBoolean\x94\x88\x8c\nmyBoolean2\x94\x89\x8c\x06myNone\x94Nu.'
The comparison reveals that outputs 2 and 3 differ only in their second byte, the protocol version: Python 3.8 defaults to protocol 4 (\x80\x04), while "HIGHEST_PROTOCOL" is protocol 5 (\x80\x05). Furthermore, it shows that detecting whether data is serialized could get tricky if we attempt to base it on the raw content.
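How the binary protocols relate to each other can be checked directly by comparing the byte strings. The sketch below assumes Python 3.8 or newer (protocol 5 does not exist before that); the variable names are my own.

```python
import pickle

serialize_me = dict(myArray=[1, 3, 3, 7],
                    myString="1337",
                    myNumber=1337,
                    myBoolean=True,
                    myBoolean2=False,
                    myNone=None)

p4 = pickle.dumps(serialize_me, protocol=4)
p5 = pickle.dumps(serialize_me, protocol=5)

# The first two bytes are the PROTO opcode (0x80) and the protocol version.
print(p4[:2], p5[:2])
# For this dictionary, everything after those two bytes is identical.
print(p4[2:] == p5[2:])
```

For detection purposes this means the second byte tells us which protocol wrote the data.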
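The good part is that we don't need to dig that deep.
The b prefix mentioned above is only Python's notation for a bytes literal and is not part of the data. What the data itself gives us: pickles written with protocol 2 or higher start with the byte \x80 followed by the protocol version, and every pickle ends with a dot (.), the STOP opcode. Thus, when seeing these in the wild we know it's some Python serialized stuff :).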
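A rough version of this check, operating on the raw bytes themselves (the b prefix is only how Python prints bytes), could look like the sketch below. It is a heuristic under stated assumptions: the function name `looks_like_pickle` is made up, the \x80 header only exists for protocol 2 and newer, and for older protocols the sketch falls back to pickletools.dis, which analyzes the opcodes symbolically without unpickling.

```python
import io
import pickle
import pickletools


def looks_like_pickle(data: bytes) -> bool:
    """Heuristic check for pickled data; never calls pickle.loads."""
    # Every pickle ends with the STOP opcode, a single dot.
    if not data.endswith(b"."):
        return False
    # Protocol 2 and newer start with PROTO (0x80) plus the version byte.
    if len(data) >= 2 and data[0] == 0x80 and data[1] <= pickle.HIGHEST_PROTOCOL:
        return True
    # Protocols 0 and 1 have no header: let pickletools walk the opcodes
    # and discard the disassembly it writes.
    try:
        pickletools.dis(data, out=io.StringIO())
    except Exception:
        return False
    return True


if __name__ == "__main__":
    print(looks_like_pickle(pickle.dumps({"a": 1}, protocol=0)))
    print(looks_like_pickle(pickle.dumps({"a": 1})))
    print(looks_like_pickle(b"just some text"))
```

Short inputs that happen to end in a dot can still produce false positives, so a hit is best treated as a hint to inspect the data further.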