.NET Deobfuscation
Intro
In the last post, we ended up with a couple of .NET Framework malware samples that had obfuscation applied to them.
.NET is a runtime environment that executes intermediate code (CIL) in a managed way, using a just-in-time compiler to translate it into native machine code at runtime. To make that possible, a .NET binary (more commonly called an assembly) carries clearly laid-out metadata about the types, fields, methods, and more that it contains. In short, there’s much more structure and less ambiguity than in native binaries, and there is very mature tooling support for working with those structures programmatically.
This post provides an overview of some obfuscation techniques that are commonly encountered in .NET assemblies and how to deal with them. Often, you’ll find tools that automatically deobfuscate everything for you, but what if those tools don’t work correctly in your particular case? It’s good to know exactly how they work their magic and, if required, to be able to fix or extend them. At times, this post takes on a tutorial-like tone, because there are very few resources that explain how to extend de4dot.
Common obfuscation techniques
Our survey of techniques won’t be exhaustive; we’ll just look at some of the most commonly used ones.
We’ll be working with de4dot’s code, because it’s an excellent framework for .NET code deobfuscation. It provides many little helpers and strategies that can be easily customized/extended to undo most common types of obfuscation.
Technique: Packing
The classic. The actual malware is encrypted and/or compressed, unpacked into memory at runtime, and then loaded into the process.
In most cases, this involves a more or less obvious call to Assembly.Load, followed by some reflection code to invoke a method and begin execution.
Take, for example, the following decompiled code:
internal static void SetupConnection()
{
    ResolverParamWrapper.SortConnection().GetTypes().Where(new Func<Type, bool>(ResolverParamWrapper.<>c.factory.QueryConnection))
        .First<Type>()
        .InvokeMember(MapperFactoryAdapter.LoginConnection(-857235653 ^ -1181772793 ^ <Module>{4b736177-9522-4a80-a461-378c599d9c8d}.m_a5d71990efab4dc4a1801d81bc710517.m_327fb7ca8cde492882cd8691cd3b5b85), BindingFlags.InvokeMethod, null, null, null);
}

private static Assembly SortConnection()
{
    return Assembly.Load(ParamsConsumerComp.ResetConnection());
}
SetupConnection() is directly called by the entry point of the packed malware.
Essentially, we have the following sequence of events:
- SortConnection() is called.
- ResetConnection() is called to retrieve a byte array. If we were to look into that method, we’d see a huge byte array of around 300 KB and some crypto logic.
- The byte array is loaded as a new .NET assembly and returned.
- Back in SetupConnection(), the first type matching a predicate (QueryConnection) is looked up, and InvokeMember is used on that type to get execution going.
If you’re wondering about the absurd method names: the obfuscator probably auto-generated them to make the code look somewhat legitimate at a very superficial glance.
Apart from some light obfuscation (you might have noticed the XOR operations and long hexadecimal field names, which are part of a string encryption mechanism), this could be called a textbook example of .NET packing. Sometimes it won’t be quite so easy to spot; for example, the assembly loading could be buried deep in the call tree, or control flow flattening might have been applied.
The most straightforward way of dealing with this is to load the sample in dnSpyEx, set a breakpoint on Assembly.Load, and save the unpacked byte array to disk.
Alternatively, a tool like DotDumper could prove useful, especially in cases where the assembly loading part isn’t blatantly obvious in the packer. DotDumper is a neat tool in general, since it can reveal quite a lot about a sample via hooking and detailed logging.
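Whichever way the payload is dumped, a cheap sanity check before opening it in a decompiler is to confirm that the bytes form a PE image with a CLR runtime header. The minimal Python sketch below does this by hand, using offsets from the PE/COFF specification. The function name is our own, and the check is a heuristic rather than a full PE validator.

```python
import struct

CLR_HEADER_INDEX = 14  # data directory entry for the CLR runtime header


def looks_like_dotnet_assembly(data: bytes) -> bool:
    """Heuristic: MZ and PE signatures present, CLR runtime header directory non-empty."""
    try:
        if len(data) < 0x40 or data[:2] != b"MZ":
            return False
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
        if data[pe_offset:pe_offset + 4] != b"PE\0\0":
            return False
        opt = pe_offset + 4 + 20  # skip the PE signature and the 20-byte COFF file header
        (magic,) = struct.unpack_from("<H", data, opt)
        if magic == 0x10B:    # PE32
            count_off, dirs_off = opt + 92, opt + 96
        elif magic == 0x20B:  # PE32+
            count_off, dirs_off = opt + 108, opt + 112
        else:
            return False
        (num_dirs,) = struct.unpack_from("<I", data, count_off)
        if num_dirs <= CLR_HEADER_INDEX:
            return False
        rva, size = struct.unpack_from("<II", data, dirs_off + CLR_HEADER_INDEX * 8)
        return rva != 0 and size != 0
    except struct.error:  # truncated buffer
        return False
```

A native PE dumped from the same process fails the last check, because its CLR runtime header directory is empty.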
Technique: Delegates / Proxies
Using delegates is a very popular technique to obfuscate the method calls being made. The call is wrapped in an object (the delegate), and the delegate is invoked instead of the original method. The severity of this obfuscation can range from ‘annoying’, where you can see what’s being called by just checking where the object is constructed, to ‘impossible by hand’, where the call target cannot be discerned without tooling support.
A simplistic example might look like this:
using System;
class Program
{
    delegate void ObfuscatedDelegate();

    static void CallMe() => Console.WriteLine("CallMe was called!");

    static void Main()
    {
        ObfuscatedDelegate obfuscatedCall = new ObfuscatedDelegate(CallMe);
        obfuscatedCall(); // invokes CallMe through the delegate
    }
}
The construction of the delegate object doesn’t necessarily need to happen directly before the call; it could also be initialized as a static variable, for example, and then be used whenever CallMe should be called.
You can see how this adds an indirection, because you need to look elsewhere for what obfuscatedCall() actually means.
For a single method it’s manageable, but often enough, all calls in the assembly will get this treatment.
The following, more complicated example gives each call target its own separate class:
internal sealed class SpecificationValue : MulticastDelegate
{
    public extern bool Invoke(object);

    public static bool WriteWorker(object A_0, SpecificationValue A_1)
    {
        return A_1(A_0);
    }

    public extern SpecificationValue(object, IntPtr);

    static SpecificationValue()
    {
        AnnotationParserInstance.RateIndexer(typeof(SpecificationValue).TypeHandle);
    }

    internal static SpecificationValue CountIssuer;
}
And the accompanying call:
var result = SpecificationValue.WriteWorker(model, SpecificationValue.CountIssuer);
We can see that the method takes an object and returns a boolean, but we can’t see which function is actually called.
WriteWorker takes the original call arguments plus an instance of the SpecificationValue class, which is taken from the static CountIssuer field, and then invokes the delegate.
However, the field does not have an apparent value assigned to it.
We can see that the static constructor is doing something, and, quite conspicuously, it passes a TypeHandle argument, which indicates that reflection is used to dynamically manipulate the class at runtime.
It stands to reason that this RateIndexer call is responsible for assigning a value to CountIssuer.
public static void RateIndexer(RuntimeTypeHandle value)
{
    Type typeFromHandle = Type.GetTypeFromHandle(value);
    if (AnnotationParserInstance.m_InterpreterModel == null)
    {
        lock (AnnotationParserInstance._SpecificationModel)
        {
            Dictionary<int, int> mapping = new Dictionary<int, int>();
            BinaryReader binaryReader = new BinaryReader(typeof(AnnotationParserInstance).Assembly.GetManifestResourceStream("Value.Expression"));
            binaryReader.BaseStream.Position = 0L;
            byte[] array = binaryReader.ReadBytes((int)binaryReader.BaseStream.Length);
            binaryReader.Close();
            // <snip>, decrypt array and fill mapping
            AnnotationParserInstance.m_InterpreterModel = mapping;
        }
    }
    FieldInfo[] fields = typeFromHandle.GetFields(BindingFlags.Static | BindingFlags.NonPublic | BindingFlags.GetField);
    for (int m = 0; m < fields.Length; m++)
    {
        FieldInfo fieldInfo = fields[m];
        int metadataToken = fieldInfo.MetadataToken;
        int targetToken = AnnotationParserInstance.m_InterpreterModel[metadataToken];
        bool isVirtualCall = (targetToken & 0x40000000) > 0;
        targetToken &= 0x3fffffff;
        MethodInfo methodInfo = (MethodInfo)typeof(AnnotationParserInstance).Module.ResolveMethod(targetToken, typeFromHandle.GetGenericArguments(), new Type[0]);
        if (methodInfo.IsStatic)
        {
            fieldInfo.SetValue(null, Delegate.CreateDelegate(fieldInfo.FieldType, methodInfo));
        }
        else
        {
            // <snip>, dynamically generate code for calling non-static method
        }
    }
}
The code above has been shortened and prettied up to only contain the most important pieces. Breaking it down, it performs the following steps:
- It checks whether the token mapping has already been loaded. If not:
  - An assembly resource is looked up and loaded into a byte array.
  - The byte array is decrypted with a custom algorithm.
  - Pairs of 32-bit integers are read from the byte array and inserted into a dictionary.
  - This dictionary is stored as the token mapping in m_InterpreterModel.
- All static fields in the type of interest (our delegate class) are looked up. In the previous example, this is essentially just the CountIssuer field.
- For each such field, its metadata token is used as a key into the token mapping dictionary to obtain a new token.
- After the virtual-call flag (0x40000000) is masked off, the new token is resolved into a method, which in turn is used to create a delegate object.
- The delegate is assigned to the static field, solving the mystery of where its value comes from.
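The elided parsing step and the bit masking from RateIndexer can be sketched in Python as follows. The sketch assumes that the decrypted buffer is a flat sequence of little-endian 32-bit (field token, target token) pairs. The byte order matches BinaryReader, but the exact layout is our assumption, since the original code was snipped. The function names are hypothetical.

```python
import struct

VIRTUAL_CALL_FLAG = 0x40000000  # bit 30 set: the call should be emitted as callvirt
TOKEN_MASK = 0x3FFFFFFF         # remaining bits hold the actual metadata token


def parse_token_mapping(decrypted: bytes) -> dict[int, int]:
    """Builds {field token: mapped value} from consecutive little-endian int32 pairs."""
    if len(decrypted) % 8 != 0:
        raise ValueError("buffer does not consist of whole 8-byte pairs")
    return {field: mapped for field, mapped in struct.iter_unpack("<ii", decrypted)}


def decode_target(mapped: int) -> tuple[int, bool]:
    """Splits a mapped value into (method token, is_virtual_call), as RateIndexer does."""
    return mapped & TOKEN_MASK, (mapped & VIRTUAL_CALL_FLAG) != 0
```

In a de4dot-based fixer, the raw dictionary would serve as the mapping that is later consulted per delegate field, with the same masking applied at lookup time.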
So, to sum it up, we get a mapping from the static field tokens in the delegate types to the actual call targets.
>> Short primer on .NET metadata tokens
Internally, types, methods, fields, and so on are not referenced by their name. Rather, .NET uses so-called metadata tokens, which are 32-bit integers that uniquely identify an entity in the current assembly. Each token consists of a metadata table index (upper 8 bits) and a row index into that table (lower 24 bits, also called the RID).
dnSpy shows the tokens as comments by default, and it also shows the decimal representation of the RID (0x927 equals 2343):
// Token: 0x04000927 RID: 2343
internal static SpecificationValue CountIssuer;
The table indexes are defined in the ECMA-335 spec - some important ones are:
- 02: TypeDef
- 04: Field
- 06: Method
- 0A: MemberRef (fields/methods located in other assemblies)
Table contents can be viewed directly in dnSpy by expanding the Tables Stream under PE -> Storage Stream #n in the Assembly Explorer.
Some examples from our obfuscated assembly (field token -> target token):
04000479 -> 0A0001D3 System.Void System.Environment::Exit(System.Int32)
0400047A -> 060009EC System.Void ProtoBuf.Listeners.MessageModelListener::EnableDatabase()
0400047B -> 060009F2 System.Void Dbpkc.Maps.RoleAuthenticationMapping::kLjw4iIsCLsZtxc4lksN0j()
0400047C -> 0A0001D4 System.Void System.Threading.Timer::Dispose()
As can be seen, the targets can be either member references into the class library (0A) or method definitions in the malware itself (06).
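Splitting a token into its table index and RID is plain bit arithmetic. The small Python helper below applies it to two of the mapping entries listed above. Its names are our own, and the table names cover only the four tables mentioned in this primer.

```python
TABLE_NAMES = {0x02: "TypeDef", 0x04: "Field", 0x06: "Method", 0x0A: "MemberRef"}


def split_token(token: int) -> tuple[int, int]:
    """Returns (table index, RID): the upper 8 bits and the lower 24 bits."""
    return token >> 24, token & 0x00FFFFFF


def describe(token: int) -> str:
    table, rid = split_token(token)
    name = TABLE_NAMES.get(table, f"table 0x{table:02X}")
    return f"{name} #{rid}"


examples = [
    (0x04000479, 0x0A0001D3),  # -> System.Environment::Exit
    (0x0400047A, 0x060009EC),  # -> MessageModelListener::EnableDatabase
]
for field, target in examples:
    print(f"{describe(field)} -> {describe(target)}")
# Field #1145 -> MemberRef #467
# Field #1146 -> Method #2540
```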
Deobfuscation with de4dot
On the surface, breaking this type of obfuscation is easy.
Obtain the mapping, find all delegate call sites, and replace the calls with their real targets.
In fact, finding the call sites and replacing them is something that is pretty much entirely handled by de4dot - you just need to tell it how to identify the delegate classes and provide the mapped call.
There are ProxyCallFixer classes that can be inherited from when adding code for a new deobfuscator.
The classes are numbered from 1 to 4, which is not terribly helpful, but comments above the classes describe what sort of situation each one applies to.
In our case, ProxyCallFixer3 would be the most appropriate, because our call sites consist of static method calls, and a delegate instance is pushed as the last argument:
// Fixes proxy calls that call a static method with the instance of
// a delegate as the last arg, which then calls the Invoke method.
// ...push args...
// ldsfld delegate instance
// call static method
public abstract class ProxyCallFixer3 : ProxyCallFixer1
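Conceptually, the rewrite a proxy call fixer performs at each call site is small. The ldsfld of the delegate field and the call to the static proxy method collapse into a single call to the real target, while the pushed arguments stay where they are. The Python model below illustrates that idea. The tuple-based instructions and the resolve callback (standing in for GetCallInfo) are hypothetical simplifications, not de4dot's actual data structures.

```python
from typing import Callable, Optional

Instr = tuple[str, object]  # simplified IL instruction: (opcode, operand)


def fix_proxy_calls(body: list[Instr],
                    resolve: Callable[[object], Optional[Instr]]) -> list[Instr]:
    """Replaces `ldsfld <delegate field>; call <proxy>` with the resolved (opcode, target)."""
    fixed: list[Instr] = []
    i = 0
    while i < len(body):
        op, operand = body[i]
        if op == "ldsfld" and i + 1 < len(body) and body[i + 1][0] == "call":
            real_call = resolve(operand)
            if real_call is not None:
                # The arguments pushed before the ldsfld are what the real
                # target consumes, so they can stay untouched.
                fixed.append(real_call)
                i += 2  # drop both the ldsfld and the proxy call
                continue
        fixed.append(body[i])
        i += 1
    return fixed
```

With a decoded token mapping, resolve would look up the field, check the virtual-call flag, and return either ("call", target) or ("callvirt", target).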
When inheriting from this class, there’s a bare minimum of three things you need to do:
- Override GetCallInfo
- Override CheckCctor
- Call SetDelegateCreatorMethod in the constructor
That’s all. Next, you instantiate your fixer class, call Deobfuscate, and it’ll work its magic. You don’t need to worry about things like replacing IL instructions at call sites; it’s all handled for you.
GetCallInfo
We’re going to start with GetCallInfo, because it’s pretty straightforward.
It’s called when de4dot wants to map a delegate field to the actual called method.
You need to provide the called method and the IL opcode for the call, which is mostly going to be either Call or Callvirt.
protected override void GetCallInfo(object context, FieldDef field, out IMethod calledMethod, out OpCode callOpcode) {
    calledMethod = null;
    callOpcode = null;
    if (_mapping == null || !_mapping.TryGetValue(field.MDToken.ToInt32(), out var mappedToken)) {
        return;
    }
    bool virtFlag = (mappedToken & 0x40000000) > 0;
    mappedToken &= 0x3fffffff;
    calledMethod = module.ResolveToken(mappedToken) as IMethod;
    callOpcode = virtFlag ? OpCodes.Callvirt : OpCodes.Call;
}
You may recognize some of the code from the RateIndexer method above, especially the bit masking.
This is not surprising, because the token values obtained from the mapping need to be interpreted in the same way that the obfuscator handles them.
SetDelegateCreatorMethod
One slightly less intuitive thing you need to do is call SetDelegateCreatorMethod, for example in the constructor of your derived class.
It’s easy to miss because the compiler doesn’t force you to do it, but if you don’t, your new code will simply do nothing, because the base class will bail early in a bunch of places.
It expects a MethodDef argument, which, as the name says, is the obfuscator method that creates delegates.
In our case, that’s the RateIndexer method.
Finding this method programmatically can, for example, be done by iterating over all types in the assembly, inspecting those that have a static constructor (cctor), and checking whether the cctor contains a single call to a method taking a RuntimeTypeHandle parameter.
In the end, de4dot itself doesn’t do much with this information - it’s mainly used to detect that the current binary can in fact be deobfuscated with the custom proxy call fixer.
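The heuristic described above for locating the delegate creator can be sketched as follows. To keep it self-contained, the sketch uses a hypothetical, heavily simplified model of types and IL instead of dnlib's TypeDef/MethodDef. When several candidates show up, it picks the one called from the most cctors. That tie-breaking rule is our own addition.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MethodRef:
    full_name: str
    param_types: tuple[str, ...]


@dataclass
class TypeInfo:
    name: str
    cctor: Optional[list[tuple[str, object]]]  # simplified static constructor IL, or None


def find_delegate_creator(types: list[TypeInfo]) -> Optional[MethodRef]:
    """Finds the method that is the sole call in most cctors and takes one RuntimeTypeHandle."""
    votes: Counter = Counter()
    for t in types:
        if not t.cctor:
            continue
        calls = [operand for op, operand in t.cctor if op == "call"]
        if len(calls) == 1 and calls[0].param_types == ("System.RuntimeTypeHandle",):
            votes[calls[0]] += 1
    return votes.most_common(1)[0][0] if votes else None
```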
CheckCctor
Finally, you need to override the method object CheckCctor(ref TypeDef type, MethodDef cctor).
This method is called for every delegate type that has a static constructor and at least one field.
You need to analyze whether this is in fact a delegate type that can be handled.
One could reuse parts of the logic from above to, again, check for a call to a method that looks like RateIndexer.
If the type/cctor looks fitting, an object called context needs to be returned.
This can be any object of your choosing, as long as it is not null.
It is simply passed to each call of GetCallInfo, and de4dot itself doesn’t do anything with it.
You can use it to store extra info you’ve gathered in CheckCctor so you don’t need to find the info twice, but many obfuscator types don’t require anything here, meaning you can return new object() or even this, since it literally does not matter.
What about the encrypted mapping?
So far, we’ve glossed over the fact that we still need to get hold of the token-to-token mapping and have just assumed that there’s a _mapping in GetCallInfo.
Since we’ve established that the decryption code is individually generated per protected binary, we have little choice but to run that exact code and apply it to the byte array of the encrypted resource.
Finding the resource itself and fetching its data using dnlib is trivial: just check which string is passed to GetManifestResourceStream.
For the decryption code, we have several options with de4dot:
- Load the assembly and directly call the decryption code via reflection. This is the easiest method by far, but also the most dangerous one, since you’ll be executing arbitrary code from a potentially malicious binary.
- Isolate the decryption code and copy it to a new temporary in-memory assembly. This works well if the code has very local behavior and does not make any calls. It’s reasonably safe, because you can add safety checks for the IL code to disallow certain instructions (skipping deobfuscation if something seems fishy).
- Use IL emulation - de4dot offers an instruction emulator for algorithmic code (no calls, no allocations) that would be applicable to our case. The tricky part is to find the proper start and end point for emulation - but the same also goes for the option above.
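To illustrate what emulation means in this context, here is a minimal Python interpreter for straight-line IL on 32-bit integers. The opcode subset, the tuple representation, and the convention of keeping all values as unsigned 32-bit integers are our own simplifications. de4dot's emulator covers far more and works on dnlib instructions.

```python
MASK32 = 0xFFFFFFFF

BINARY_OPS = {
    "add": lambda a, b: (a + b) & MASK32,
    "sub": lambda a, b: (a - b) & MASK32,
    "mul": lambda a, b: (a * b) & MASK32,
    "xor": lambda a, b: a ^ b,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "shl": lambda a, b: (a << (b & 31)) & MASK32,  # shift count masked to 5 bits, as on x86
    "shr.un": lambda a, b: a >> (b & 31),
}


def emulate(instrs, locals_=None):
    """Runs branch-free IL without calls or allocations; returns the final local variables."""
    stack: list[int] = []
    env = {k: v & MASK32 for k, v in (locals_ or {}).items()}
    for op, operand in instrs:
        if op == "ldc.i4":
            stack.append(operand & MASK32)
        elif op == "ldloc":
            stack.append(env[operand])
        elif op == "stloc":
            env[operand] = stack.pop()
        elif op == "dup":
            stack.append(stack[-1])
        elif op == "pop":
            stack.pop()
        elif op == "conv.u1":
            stack.append(stack.pop() & 0xFF)
        elif op == "conv.u4":
            pass  # values are already kept as unsigned 32-bit integers
        elif op in BINARY_OPS:
            b = stack.pop()
            a = stack.pop()
            stack.append(BINARY_OPS[op](a, b))
        else:
            raise NotImplementedError(f"opcode not supported by this sketch: {op}")
    return env
```

Raising on anything unknown is deliberate. It mirrors the safety argument above: if the region contains a call or allocation, emulation should stop rather than guess.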
We initially went for the isolation path, and it worked well.
As we became more familiar with this obfuscator, we realized that de4dot already had support for it (albeit broken)! It turns out that what we’ve been dealing with is in fact .NET Reactor, one of the more popular obfuscators still around.
The existing code in de4dot is relying on instruction emulation.
It was broken because it had trouble identifying the proper start/end points for emulation, thus being unable to properly decrypt the resource containing the mapping.
The old code would first locate the end point (which is marked by a series of conv instructions), and from there it would iterate upwards to find the start.
It did so by counting how many unique local variables it encountered.
This is a somewhat brittle strategy, because temporary variables can easily be added, so their count may fluctuate.
We changed it to look for a ldelem instruction instead and then go down again until there is a ldloc.
You can find the diff for our changes here.
/* 0x00033826 91 */ IL_0136: ldelem.u1
/* 0x00033827 60 */ IL_0137: or
/* 0x00033828 130A */ IL_0138: stloc.s V_10
// <start of crypto algo>
// num3 = num3;
/* 0x0003382A 1109 */ IL_013A: ldloc.s V_9
/* 0x0003382C 1309 */ IL_013C: stloc.s V_9
// uint num9 = num3;
/* 0x0003382E 1109 */ IL_013E: ldloc.s V_9
// uint num10 = num3;
/* 0x00033830 1109 */ IL_0140: ldloc.s V_9
// uint num11 = 114135216U;
/* 0x00033832 20B090CD06 */ IL_0142: ldc.i4 114135216
/* 0x00033837 FE0E2600 */ IL_0147: stloc V_38
// .......
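The revised search for the start of the emulated region can be sketched in Python and checked against the listing above. This is our reading of the heuristic, not the actual patch. It assumes that the scan starts at the end marker, walks up to the nearest ldelem*, and then walks down to the first ldloc*. That presumes the crypto algorithm itself contains no array loads.

```python
from typing import Optional


def find_emulation_start(instrs: list[tuple[str, object]], end_index: int) -> Optional[int]:
    """Walks up from end_index to the nearest ldelem*, then down to the first ldloc*."""
    i = end_index
    while i >= 0 and not instrs[i][0].startswith("ldelem"):
        i -= 1
    if i < 0:
        return None  # no array load before the end marker
    for j in range(i + 1, end_index + 1):
        if instrs[j][0].startswith("ldloc"):
            return j
    return None


# The listing above, from IL_0136 onwards, plus a stand-in for the end marker.
listing = [
    ("ldelem.u1", None), ("or", None), ("stloc.s", "V_10"),
    ("ldloc.s", "V_9"), ("stloc.s", "V_9"), ("ldloc.s", "V_9"), ("ldloc.s", "V_9"),
    ("ldc.i4", 114135216), ("stloc", "V_38"),
    ("conv.u1", None),  # placeholder for the series of conv instructions marking the end
]
print(find_emulation_start(listing, len(listing) - 1))  # 3, i.e. IL_013A: ldloc.s V_9
```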
Before and after
// Before
WatcherWatcher.WriteWorker(WorkerWatcher.WriteWorker(WorkerWatcher.ConnectWorker), new UnhandledExceptionEventHandler(MappingTagDescriptor.SetupFactory), WatcherWatcher.ConcatWorker);
FieldWatcher.WriteWorker(FieldWatcher.CallWorker);
// After
AppDomain.CurrentDomain.UnhandledException += MappingTagDescriptor.SetupFactory;
Authentication.ForgotFactory();
The first call adds an unhandled exception handler.
After deobfuscation, the decompiler is able to turn it into idiomatic C# code with the += syntax.
The SetupFactory and ForgotFactory methods still sound weird, but they’re not delegates; they’re proper methods in the malware that were renamed by the obfuscator.
Technique: String encryption
If you design a string encryption scheme, you need to put in some work. de4dot supports multiple strategies for dealing with string encryption, and if you implement a naive scheme, de4dot will be able to break it without you having to write a single line of custom deobfuscation code.
The strategies de4dot supports via the --strtyp parameter are:
- Static: Specialized code for the detected obfuscator is used in order to deobfuscate strings without directly executing code from the target assembly. This only works when de4dot has an implementation for the particular obfuscator.
- Delegate: The target assembly is loaded and the original, unmodified string decrypter method is invoked for each string.
- Emulate: This is somewhat misnamed. There is no instruction emulation going on; it actually executes IL code. In terms of methodology, it’s similar to the Delegate decrypter type. The difference is that the code is first processed/rewritten and checked for anti-deobfuscation measures. Such measures include checking the calling assembly or inspecting the call stack for frames that are not supposed to be there (if de4dot calls the decryption method, de4dot’s methods will show up on the stack). Any discovered checks are patched so that nothing incriminating shows up.
Delegate and Emulate require an additional --strtok <token> parameter that specifies the decrypter method, which has to be identified by manually inspecting the assembly. The parameter may be specified multiple times if there are multiple such methods.
Consider this simple example, which has two AES-encrypted strings accessed via an index:
using System;
using System.Security.Cryptography;
using System.Text;

public static class StrCrypto
{
    private static readonly string[] _strings =
    {
        "9guJzfcL9QVZQkZLtBT/ug==",
        "He8w3I012LDGTbGD7x+4VQ=="
    };

    public static string GetString(int index)
    {
        var crypted = Convert.FromBase64String(_strings[index]);
        var aes = new RijndaelManaged();
        aes.Mode = CipherMode.ECB;
        aes.Padding = PaddingMode.PKCS7;
        var decryptor = aes.CreateDecryptor(Encoding.ASCII.GetBytes("any16charswilldo"), null);
        var decrypted = decryptor.TransformFinalBlock(crypted, 0, crypted.Length);
        return Encoding.UTF8.GetString(decrypted);
    }
}

// Usage, e.g., in Main():
Console.WriteLine(StrCrypto.GetString(0));
Console.WriteLine(StrCrypto.GetString(1));
If you compile this and run de4dot --strtyp delegate --strtok 06000003, you’ll obtain the following code with all references to GetString gone. Note: 06000003 is the token of GetString and may differ in your binary.
// After de4dot
Console.WriteLine("Hello");
Console.WriteLine("World");
So the question is, what does one need to do so de4dot’s delegate/emulate strategies don’t work out of the box? The answer is, not much.
// In StrCrypto, introduce a static field:
public static int Offset = -2;
// In GetString, add it to the index parameter:
var crypted = Convert.FromBase64String(_strings[Offset + index]);
// In calls to GetString, pass a sum:
Console.WriteLine(StrCrypto.GetString(StrCrypto.Offset + 2));
Console.WriteLine(StrCrypto.GetString(StrCrypto.Offset + 3));
If you now attempt deobfuscation, you’ll get WARNING: Could not find all arguments to method System.String ConsoleApplication1.StrCrypto::GetString(System.Int32) (06000003), instr: IL_0006: add.
It chokes on the sum because de4dot will not attempt to obtain the values of fields.
That’s because, in the general case, it’s not clear whether a field’s value is really constant or whether it might be assigned in multiple places.
Another complication is that the field’s initial value (-2) is not part of the metadata for the field declaration or anything like that.
It’s set in IL code in an auto-generated static constructor, so if you want the value, you need to parse the code.
This is the approach .NET Reactor has taken, except that its fields are not static (but still constant) and there are many such fields.
.NET Reactor
Similar to the delegate token mapping we saw earlier, the strings are stored in another embedded .NET resource, encrypted with two layers: AES and more of the custom crypto we saw earlier. The decryption method, which is called whenever a string is needed, takes an offset parameter. At this offset in the resource, there’s a 32-bit integer specifying the length of the string, followed by the UTF-16 string data.
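Reading one string out of the decrypted resource is then straightforward. In the Python sketch below, two details are assumptions, since the text above doesn't pin them down: the little-endian byte order, and the unit of the length prefix (bytes by default, UTF-16 code units if length_in_chars is set). The function name is hypothetical.

```python
import struct


def read_string(resource: bytes, offset: int, length_in_chars: bool = False) -> str:
    """Decodes the length-prefixed UTF-16LE string stored at the given offset."""
    (length,) = struct.unpack_from("<i", resource, offset)
    size = length * 2 if length_in_chars else length
    start = offset + 4
    if length < 0 or start + size > len(resource):
        raise ValueError(f"invalid length prefix {length} at offset {offset}")
    return resource[start:start + size].decode("utf-16-le")
```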
de4dot has a static string deobfuscator for .NET Reactor.
In this instance, the de4dot code was suffering from the same emulation bounds problem we already fixed.
With the fix, the string resource was getting decrypted successfully - however, the actual string deobfuscation still failed due to the above-mentioned field technique.
Typically, you can avail yourself of de4dot’s inlining facilities and simply call staticStringInliner.Add(theDecryptMethod, (method, gim, args) => stringDecrypter.Decrypt((int)args[0])).
This will look for all references to the obfuscator’s decrypt method (theDecryptMethod), pass the argument of each call site to your static decrypter, and then replace the call with a direct load of the decrypted string (hence the name inliner).
Sadly, it runs into the same warning we saw earlier about not being able to find all arguments, so we cannot make use of the inliner for this version of .NET Reactor.
So we ended up manually iterating over the code, checking for string decryption calls, and properly parsing the argument (an XOR of a field and a 32-bit constant).
The constant field values are obtained by first looking for a method that has many ldc.i4 (load integer constant) and stfld (store field) instructions, and then going through them one by one and pairing them up to get a field name → constant mapping.
With de4dot’s little helpers and generally good support for working with IL code, the fix for this came out to around 80 lines - not too bad at all.
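The two steps described above can be modeled in a few lines of Python, with several simplifications that are our own assumptions:
- instructions are (opcode, operand) tuples, and field and method operands are plain strings;
- the argument has already been reduced to a single XOR by constant folding;
- the field holder is loaded with ldsfld and the constant field with ldfld, matching the shape of the decompiled code that follows.

The exact IL emitted by .NET Reactor may differ, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def collect_field_constants(init_body):
    """Pairs each `ldc.i4 C` directly followed by `stfld F` into {F: C}."""
    constants = {}
    pending = None
    for op, operand in init_body:
        if op == "ldc.i4":
            pending = operand & MASK32
        elif op == "stfld" and pending is not None:
            constants[operand] = pending
            pending = None
        else:
            pending = None  # anything in between breaks the pattern
    return constants


def find_decrypt_offsets(body, decrypt_method, constants):
    """Evaluates `C ^ field` arguments of every call to decrypt_method (None if unknown)."""
    stack = []
    offsets = []

    def pop():
        return stack.pop() if stack else None

    for op, operand in body:
        if op == "ldc.i4":
            stack.append(operand & MASK32)
        elif op == "ldsfld":
            stack.append(None)  # the object holding the constant fields
        elif op == "ldfld":
            pop()  # consume the object reference
            stack.append(constants.get(operand))
        elif op == "xor":
            b, a = pop(), pop()
            stack.append(None if a is None or b is None else a ^ b)
        elif op == "call" and operand == decrypt_method:
            offsets.append(pop())
            stack.append(None)  # the decrypted string
        else:
            stack.clear()  # not tracked by this sketch
    return offsets
```

Each recovered offset can then be used to read the plaintext from the decrypted resource, and the call can be replaced with a direct load of that string.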
Before and after
// Before
ManagementObjectSearcher managementObjectSearcher = new ManagementObjectSearcher(
    AnnotationParserInstance.RegisterIndexer(
        2006988233 ^ <Module>{eda734e7-e71c-4abd-ae26-a8ec222af5ee}.m_d30267c0a00f44b5bec1a2d65bcab519.m_75c1500b23e14cdb9b27f078275c425a
    ),
    AnnotationParserInstance.RegisterIndexer(
        1912974033 ^ <Module>{eda734e7-e71c-4abd-ae26-a8ec222af5ee}.m_d30267c0a00f44b5bec1a2d65bcab519.m_cd6b2680ab2d441381513c7d9cf6dfd5
    )
);
// After
ManagementObjectSearcher managementObjectSearcher = new ManagementObjectSearcher("root\\SecurityCenter2", "SELECT * FROM AntiVirusProduct");
Note: In a raw protected assembly, there will be more bitwise operations around the XOR. They’re folded by de4dot’s constant folding pass, which is always applied as part of generic code deobfuscation.
de4dot situation
de4dot has been around since 2011, and it is still able to make a difference for many binaries even when only relying on its generic algorithms, but unfortunately, the original project has been unmaintained since 2020. As a result, it has been forked many times over the years. Whenever someone wanted to implement a new deobfuscator, they’d fork the original repo and add their code. Sometimes, other forks were used as a base, e.g., mobile46’s fork was popular for a time.
This led to a lot of fragmentation, which is unfortunate for a couple of reasons:
- Forks that applied minor fixes to existing code are very hard to discover, because you cannot search for them with terms like <new obfuscator name> de4dot.
- Often, it’s not exactly clear which obfuscator was used on an assembly. This is where de4dot’s detection logic comes in handy, but if there are 5 forks for 5 different obfuscators, you need to try 5 different binaries.
Since it seems there has been no concerted effort to combine the fixes and additions people have made over the years into a single build, we have decided to do so.
The result: de4dotEx!
This fork contains code for deobfuscating ConfuserEx, DoubleZero, VirtualGuard and more. It also contains various fixes for existing deobfuscators made by mobile46, andywu188, KOLANICH, kant2002 and others.
We’ll try to keep integrating any new forks that come up, and of course, we’d also be delighted to accept pull requests!
PS: We are aware that there are tools like NETReactorSlayer that are excellent for unprotecting .NET Reactor binaries. Nevertheless, we found this to be a good opportunity to get more familiar with .NET internals and deobfuscation tooling development.
The examples in this post are from the following sample (SHA256):
adf938304aad6e63955e2f404436d9d5d86b3e37b3ed450028ca3c4023c9e3da