Notes on Pyarmor BCC Mode

May 30, 2025 •
Hendrik Eckardt

Intro

Over the last month, multiple parties have brought to our attention that threat actors are increasingly using Pyarmor's BCC mode to protect their Python scripts.

Dashingsoft, the company behind Pyarmor, describes BCC mode as follows on their homepage:

“Converts some Python functions to C functions and compiles them into machine instructions using high optimization options for irreversible obfuscation.”

We assume BCC is short for something akin to “Bytecode to C Compilation”.

From the sound of it, this does not bode well. In today’s post, we’ll delve into this form of obfuscation and see what can be done to get some insight into the protected code for the purpose of malware analysis.

Enter BCC

The first question is: Where can we even find the functions that have been compiled to native code? It turns out that the tooling we developed as part of the last blog post already dumped them, albeit by accident. Some users of the script were confused that they weren’t able to unmarshal any Python data after initially decrypting the Pyarmor blob, because… well, see for yourself.

Screenshot of a hex dump showing an ELF file at offset 0x10
Figure 1: That's not marshaled Python data!

Seeing an ELF (Executable and Linkable Format) file here is quite unusual, especially since the malware in question is targeting Windows. However, that’s not the only unusual thing: The ELF structure is quite malformed and IDA, for example, refuses to load the file because it can’t make heads or tails of the headers. Our suspicion is that Pyarmor produces an ELF object during its compilation process and then heavily customizes the header for its own purposes.
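
A quick way to tell the two kinds of decrypted blobs apart is to look for the ELF magic number. The following is a minimal sketch, assuming the ELF starts at offset 0x10 as it does in Figure 1; the function name and the sample blobs are made up for illustration and are not part of our tooling.

```python
ELF_MAGIC = b"\x7fELF"


def looks_like_bcc_elf(blob: bytes, offset: int = 0x10) -> bool:
    """Return True if the ELF magic number sits at the given offset.

    The default offset of 0x10 is taken from Figure 1 and is an assumption;
    other samples or Pyarmor versions may place the ELF elsewhere.
    """
    return blob[offset:offset + len(ELF_MAGIC)] == ELF_MAGIC


# Synthetic stand-ins for decrypted blobs (not real Pyarmor output).
fake_elf_blob = bytes(0x10) + ELF_MAGIC + bytes(32)
fake_other_blob = bytes(0x10) + b"\x01\x02\x03\x04" + bytes(32)

for label, blob in (("first", fake_elf_blob), ("second", fake_other_blob)):
    kind = "BCC ELF" if looks_like_bcc_elf(blob) else "not an ELF, try unmarshaling"
    print(f"{label} blob: {kind}")
```

A check like this only identifies the blob type. Because the headers are malformed, a regular ELF parser is still unlikely to be of much help afterwards.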

At runtime, VirtualAlloc is used to allocate a single area of memory with RWX permissions that the entire ELF is written into. The Pyarmor runtime then performs some computations to retrieve the following information from the header:

  • A pointer to a function table, describing which functions are present in the binary and where they begin. Located at 0x160 in Figure 1.
  • An offset where the Pyarmor runtime should place a pointer to an “interface” object that allows the BCC code to interact with the outside world. Located at 0xE0 in Figure 1.

Screenshot of BCC function table in the binary
Figure 2: Function table in the ELF

As can be seen in Figure 2, each table entry consists of 4 qwords: A pointer to the null-terminated function name (not very expressive; it’s bcc_ followed by the line number in the original script), a pointer to the function itself, and two values that are always 1 and 0 here. This happens to be a perfect match for the PyMethodDef struct. The value denoted as 1 is the method flags, METH_VARARGS. The 0 is for the function docstring pointer, which Pyarmor has no reason to supply.
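
The table layout described above can be parsed with a few lines of Python. This is a sketch under stated assumptions, not the code from our repository: it treats the two pointers as plain offsets into the blob, and it assumes the table ends with an all-zero entry as is customary for PyMethodDef arrays. The test blob is synthetic.

```python
import struct
from typing import List, NamedTuple

METH_VARARGS = 0x0001  # flag value from CPython's methodobject.h
ENTRY = struct.Struct("<4Q")  # four little-endian qwords per table entry


class MethodDef(NamedTuple):
    name: str
    func_offset: int
    flags: int
    doc_offset: int


def read_cstring(blob: bytes, offset: int) -> str:
    """Read a null-terminated ASCII string starting at offset."""
    end = blob.index(b"\x00", offset)
    return blob[offset:end].decode("ascii")


def parse_function_table(blob: bytes, table_offset: int) -> List[MethodDef]:
    """Parse PyMethodDef-like entries until a zero name pointer is found.

    Assumption: pointers are offsets relative to the start of the blob.
    """
    entries = []
    offset = table_offset
    while offset + ENTRY.size <= len(blob):
        name_ptr, func_ptr, flags, doc_ptr = ENTRY.unpack_from(blob, offset)
        if name_ptr == 0:  # assumed sentinel entry
            break
        entries.append(MethodDef(read_cstring(blob, name_ptr), func_ptr, flags, doc_ptr))
        offset += ENTRY.size
    return entries


# Synthetic blob: 8 bytes of padding, two names, then the table at offset 24.
table_offset = 24
blob = (bytes(8) + b"bcc_41\x00bcc_57\x00").ljust(table_offset, b"\x00")
blob += ENTRY.pack(8, 0x200, METH_VARARGS, 0)
blob += ENTRY.pack(15, 0x340, METH_VARARGS, 0)
blob += ENTRY.pack(0, 0, 0, 0)

for entry in parse_function_table(blob, table_offset):
    print(entry.name, hex(entry.func_offset), entry.flags, entry.doc_offset)
```

In a real sample, the stored pointer values may need to be rebased before they can be used as file offsets.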

How are BCC functions called?

So far, we’ve dumped the BCC ELF file. Where is the Python module that we expected in the first place, which should contain bytecode that calls the BCC parts in some fashion?

It turns out there are two encrypted Pyarmor blobs in the bytes-string passed to __pyarmor__ when BCC mode is used. The first is the ELF, the second one the compiled Python module. We adjusted our tooling so that it is aware of this situation and saves both.

The disassembly of a BCC stub function looks like this:

Disassembly of <code object main at 0x73c1cb604ab0, file "<frozen main>", line 41>:
 41           0 RESUME                   0

  1           2 LOAD_CONST               0 (None)
              4 STORE_FAST               0 (__assert_bcc__)

  2           6 PUSH_NULL
              8 LOAD_CONST               1 ('__pyarmor_bcc_54438__')
             10 NOP
             12 PRECALL                  0
             16 CALL                     0

 41          26 RETURN_VALUE

There’s nothing spectacular going on here - it assigns an __assert_bcc__ variable and then does a call. The call is on a string constant __pyarmor_bcc_54438__ - we’d expect it to go to bcc_41 in the ELF. How do the two relate?

If you read our previous blog post on Pyarmor, you may remember that we encountered some extra data in code objects marshaled by Pyarmor. At the time we ignored it and didn’t do a deeper analysis, because it didn’t really matter for our purposes. That has changed now. We figured out that it’s essentially a mapping that describes how to patch the co_consts of the code object being loaded. The patches inject native function references.

There are multiple types of patches. The most important ones are:

  • Replace constant with reference to one of three Pyarmor runtime functions: C_ASSERT_ARMORED, C_ENTER_CO_OBJECT or C_LEAVE_CO_OBJECT. In the example above, this happens for constant index 0, which is specified as None. The patch data dictates that it gets overwritten with a reference to C_ASSERT_ARMORED.
  • Replace constant with a PyCMethod that references a definition in the BCC function table. This happens for constant index 1 in the example. The patch data maps it to BCC table index 0, which is bcc_41 as expected.

The previous contents of the constant are ignored - as far as we understand, the __pyarmor_bcc_54438__ string may as well have been None without breaking anything.
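
To make the mechanism more tangible, here is a small Python model of the patching step. It is only a sketch: the patch representation (a dict of index to kind and target) and all function names are hypothetical, and the real runtime performs this in C on the extra data embedded in the marshaled code object.

```python
def c_assert_armored(*args):
    """Hypothetical stand-in for the runtime's C_ASSERT_ARMORED."""
    return None


def bcc_41(*args):
    """Hypothetical stand-in for the native function bcc_41 in the ELF."""
    return "bcc_41 called"


def apply_patches(co_consts, patches, runtime_funcs, bcc_table):
    """Return a copy of co_consts with the patched slots replaced.

    patches maps a constant index to (kind, target); this format is made up
    for illustration and does not mirror Pyarmor's binary encoding.
    """
    patched = list(co_consts)
    for index, (kind, target) in patches.items():
        if kind == "runtime":
            patched[index] = runtime_funcs[target]
        elif kind == "bcc":
            patched[index] = bcc_table[target]
        else:
            raise ValueError(f"unknown patch kind: {kind!r}")
    return tuple(patched)


co_consts = (None, "__pyarmor_bcc_54438__")
patches = {0: ("runtime", "C_ASSERT_ARMORED"), 1: ("bcc", 0)}
patched_consts = apply_patches(
    co_consts,
    patches,
    runtime_funcs={"C_ASSERT_ARMORED": c_assert_armored},
    bcc_table=[bcc_41],
)

# After patching, "calling constant 1" reaches the BCC function.
print(patched_consts[1]())
```

Without the patch, the CALL in the stub would try to call a str object and fail with a TypeError, which is why the placeholder value itself is irrelevant.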

All in all, this nifty injection/replacement logic enables everything to work together seamlessly, without requiring further trickery such as intercepting CALL instructions in the bytecode.

We’ve expanded our tool repository with a new bcc_info.py script that parses the data and prints the mappings, among other things.

Interactions between BCC code, Pyarmor, and Python

The native code does not exist in a vacuum, or in other words, it’s not at all self-contained. It needs to have frequent interactions with the Python interpreter for tasks such as retrieving object attributes and calling other functions. This also goes for calling other BCC functions: they aren’t called directly. Instead, each call takes a round trip through the interpreter.

Another interesting topic is math. Consider the following Python expression: (100000000000 + 300000000000) ** 2. While the addition would work with a standard C + operator (given that 64-bit integer types are used), C has no exponentiation operator, and the final result is so large that it would require using a library for working with big numbers. Pyarmor thus cannot translate any numeric operation directly into native code. The numbers stay wrapped in Python objects, just as they would during “normal” script execution, and the native code needs to call into the Python interpreter to perform any kind of operation such as addition, multiplication, and so on.
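
The size argument is easy to verify in Python itself. The snippet below checks the ranges and then models an operator dispatch in the spirit of the runtime's helper; the numeric operator codes are hypothetical placeholders, since we are not documenting the real values here.

```python
import operator

INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1

total = 100000000000 + 300000000000
result = total ** 2

print("sum fits in int64:", total <= INT64_MAX)
print("square exceeds uint64:", result > UINT64_MAX)

# Hypothetical operator codes; the runtime's actual numbering may differ.
OPERATORS = {0: operator.add, 1: operator.mul, 2: operator.pow}


def number_operator(a, b, operator_type):
    """Model of a single helper that dispatches on an operator code."""
    return OPERATORS[operator_type](a, b)


# The native code would issue one helper call per operation.
step1 = number_operator(100000000000, 300000000000, 0)
step2 = number_operator(step1, 2, 2)
print(step2 == result)
```

Every arithmetic step in the original script therefore shows up as an indirect call in the disassembly rather than as an add or mul instruction.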

All of these interactions are made possible by a special struct that the Pyarmor runtime shares with the BCC code. It contains mostly function pointers, but also a handful of object references.

We’ve reverse engineered the most important parts of the structure.

enum GLOBAL_OPS
{
  GLOBAL_DELETE = 0x0,
  GLOBAL_GET = 0x1,
  GLOBAL_RETURN_GLOBALS = 0x2,
  GLOBAL_SPECIAL_ENTER = 0x4, /* __enter__ */
  GLOBAL_SPECIAL_EXIT = 0x5, /* __exit__ */
  GLOBAL_SET_MIN = 0x10, /* anything above means Set (pointer value instead of int) */
};

struct bcc_ftable
{
  _QWORD p_stdin;
  void (__fastcall *memset)(void *, _QWORD, _QWORD);
  _QWORD p_stderr;
  _QWORD fprintf;
  void *(__fastcall *PyNumber_operator)(void *a, void *b, int operatortype);
  __int64 (*build_collection)(__int64 colltype, __int64 count, ...);
  __int64 *(__fastcall *call_python_func)(__int64 bDontCall, __int64 *pyCallable, int bArgsRequired, int bKwArgsRequired, __int64 *argsTuple, __int64 *kwargsDict);
  int (__fastcall *set_exception_if_none_was_raised)(int mode);
  void *(__fastcall *comparison)(void *unused, int operatortype, void *left, void *right);
  _QWORD qword48;
  _QWORD fetch_exception;
  _QWORD string_format;
  void *(__fastcall *globals_operation)(void *unused, void *key, GLOBAL_OPS modeOrValueForSet);
  _QWORD op_mkfunc_not_available;
  void *(__fastcall *iter_next)(void *);
  _QWORD qword78;
  _QWORD qword80;
  _QWORD update_exception_info;
  _QWORD qword90;
  __int64 (__fastcall *unpack_values)(void *unused, void *input, int maxCount, void **output);
  _QWORD qwordA0;
  _QWORD new_function;
  _QWORD qwordB0;
  _QWORD import_stuff;
  _BYTE gapC0[32];
  _QWORD Py_NoneStruct;
  _QWORD Py_TrueStruct;
  _QWORD Py_FalseStruct;
  _QWORD gapF8;
  int (__fastcall *PyBytes_AsStringAndSize)(void *obj, char **buffer, _QWORD *length);
  void *(__fastcall *PyCell_Get)(void *cell);
  void *(__fastcall *PyCell_New)(void *ob);
  int (__fastcall *PyCell_Set)(void *cell, void *value);
  void (__fastcall *PyErr_Clear)();
  void *(__fastcall *PyErr_Occurred)();
  void (__fastcall *PyErr_SetObject)(void *type, void *value);
  void *(__fastcall *PyEval_GetGlobals)();
  void *(__fastcall *PyImport_ImportModule)(const char *name);
  void *(__fastcall *PyImport_ImportModuleLevel)(const char *name, void *globals, void *locals, void *fromlist, int level);
  int (__fastcall *PyList_Append)(void *list, void *item);
  void *(__fastcall *PyList_New)(__int64 len);
  void *(*PyObject_CallFunction_SizeT)(void *callable, const char *format, ...);
  void *(*PyObject_CallFunctionObjArgs)(void *callable, ...);
  void *(*PyObject_CallMethod_SizeT)(void *obj, const char *name, const char *format, ...);
  int (__fastcall *PyObject_DelItem)(void *, void *key);
  void *(__fastcall *PyObject_GetAttr)(void *, void *attr_name);
  void *(__fastcall *PyObject_GetItem)(void *, void *key);
  void *(__fastcall *PyObject_GetIter)(void *);
  int (__fastcall *PyObject_IsTrue)(void *);
  int (__fastcall *PyObject_SetAttr)(void *, void *attr_name, void *v);
  int (__fastcall *PyObject_SetItem)(void *, void *key, void *v);
  int (__fastcall *PySet_Add)(void *, void *key);
  void *(__fastcall *PySet_New)(void *iterable);
  void *(__fastcall *PySlice_New)(void *start, void *stop, void *step);
  void *(__fastcall *PyTuple_GetItem)(void *p, __int64 pos);
  void (__fastcall *Py_DecRef)(void *);
  void (__fastcall *Py_IncRef)(void *);
};

In the first half we have a couple of C standard library functions and stream pointers, followed by helper functions defined in the Pyarmor runtime. The second half consists of pointers to functions and structs exposed by the standard Python C API.
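
The placeholder names in the struct (qword48, gapC0 and so on) encode their own offsets, which allows a quick sanity check of the layout. The sketch below rebuilds the first part of the struct with ctypes, modeling every pointer as an 8-byte integer since the target is 64-bit; the class and helper names are ours and purely illustrative.

```python
import ctypes
import re


def qwords(*names):
    """Model each named slot as one 8-byte field."""
    return [(name, ctypes.c_uint64) for name in names]


class BccFtablePrefix(ctypes.Structure):
    """Leading part of bcc_ftable, up to the first Python C API pointer."""
    _fields_ = (
        qwords(
            "p_stdin", "memset", "p_stderr", "fprintf", "PyNumber_operator",
            "build_collection", "call_python_func",
            "set_exception_if_none_was_raised", "comparison", "qword48",
            "fetch_exception", "string_format", "globals_operation",
            "op_mkfunc_not_available", "iter_next", "qword78", "qword80",
            "update_exception_info", "qword90", "unpack_values", "qwordA0",
            "new_function", "qwordB0", "import_stuff",
        )
        + [("gapC0", ctypes.c_uint8 * 32)]
        + qwords("Py_NoneStruct", "Py_TrueStruct", "Py_FalseStruct", "gapF8",
                 "PyBytes_AsStringAndSize")
    )


def offset_of(name):
    return getattr(BccFtablePrefix, name).offset


def placeholder_mismatches():
    """Return placeholders whose hex suffix disagrees with their offset."""
    bad = {}
    for name, _ctype in BccFtablePrefix._fields_:
        match = re.fullmatch(r"(?:qword|gap)([0-9A-F]+)", name)
        if match and int(match.group(1), 16) != offset_of(name):
            bad[name] = offset_of(name)
    return bad


print("mismatches:", placeholder_mismatches())
print("call_python_func at", hex(offset_of("call_python_func")))
print("Py_NoneStruct at", hex(offset_of("Py_NoneStruct")))
```

An empty mismatch dictionary means that no field was dropped or duplicated while transcribing the struct.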

The struct is created in the __pyarmor__ C implementation:

  ftable_->memset = memset;
  ftable_->p_stderr = v67(2LL);
  ftable = *bcc_state_;
  ftable->fprintf = fprintf;
  ftable->PyNumber_operator = bcc_PyNumber_operator;
  ftable->build_collection = bcc_build_collection;
  ftable->call_python_func = bcc_call_python_func;
  ftable->set_exception_if_none_was_raised = bcc_set_exception_if_none_was_raised;
  ftable->comparison = bcc_comparison;
  ftable->qword48 = sub_648D2580;
  [ ... ]

An example of BCC code

Let’s study a small example of what Python code that was translated to C could look like.

  call_python_func = ftable->call_python_func;
  v9 = (_DWORD *)v4[3];
  if ( v9 )
  {
    ++*v9;
    v10 = ftable->globals_operation(v23, v9, GLOBAL_GET);
    v7 = ftable;
  }
  else
  {
    v10 = 0LL;
  }
  v7->Py_DecRef(v9);
  v11 = (_DWORD *)v4[4];
  if ( v11 && (++*v11, v10) )
    v12 = (__int64 *)ftable->PyObject_GetAttr(v10, v11);
  else
    v12 = 0LL;
  ftable->Py_DecRef(v10);
  ftable->Py_DecRef(v11);
  build_collection = ftable->build_collection;
  if ( v26 )
    ++*(_DWORD *)v26;
  v14 = (__int64 *)build_collection(1LL, 1LL);
  v15 = call_python_func(0LL, v12, 1, 0, v14, 0LL);

The code performs the following steps:

  1. A global variable is retrieved by name.
  2. An attribute is retrieved from the object pointed to by the variable.
  3. A collection with a single element is built to hold the call arguments.
  4. The retrieved attribute happens to be a Callable, which is called at the end with that collection as its arguments.

If we imagine the global was sys, and the attribute was exit, we’d have something like sys.exit(argument). The argument resides in v26, which is initially set somewhere above in code not shown here. Unfortunately, the decompiler doesn’t seem to understand the variadic nature of the build_collection function and never shows more than two parameters (it also doesn’t offer the option to add them manually in this case).
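
Expressed in Python, the decompiled snippet behaves roughly like the model below. It is a simplification under assumptions: null checks and reference counting are left out, the helper signatures are reduced to what the example needs, and a fake sys object is used so that running the sketch does not terminate the interpreter.

```python
import types

GLOBAL_GET = 0x1  # value from the GLOBAL_OPS enum


def globals_operation(module_globals, key, mode):
    """Model of ftable->globals_operation, GET case only."""
    if mode != GLOBAL_GET:
        raise NotImplementedError("only GLOBAL_GET is modeled here")
    return module_globals[key]


def call_python_func(py_callable, args_tuple, kwargs_dict=None):
    """Model of ftable->call_python_func with the flags omitted."""
    return py_callable(*args_tuple, **(kwargs_dict or {}))


def bcc_snippet(module_globals, consts, argument):
    name, attr = consts                  # v4[3] and v4[4]
    obj = globals_operation(module_globals, name, GLOBAL_GET)  # v10
    func = getattr(obj, attr)            # v12, via PyObject_GetAttr
    args = (argument,)                   # v14, via build_collection
    return call_python_func(func, args)  # v15


# A harmless replacement for the real sys module.
fake_sys = types.SimpleNamespace(exit=lambda code: f"exit({code})")
print(bcc_snippet({"sys": fake_sys}, ("sys", "exit"), 3))
```

Reading the C pseudocode with such a mental model in mind makes it much easier to skip over the bookkeeping and focus on the handful of helper calls that carry the actual logic.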

The example shows the effect of BCC mode: a single line of Python code is easily blown up into around 30 lines of C pseudocode. While it’s not too hard to follow what’s going on, it’s not so easy to intuit at a glance what the code is doing as a whole. Everything is quite spread out and interspersed with low-level details such as reference counting.

If you analyze a BCC code blob, you’ll find that it doesn’t contain any strings to speak of, nor any other constants. In the example above, we saw references such as v4[3] and v4[4]. This pattern can be found in all BCC functions, and the indexes mostly increase, with some rarely being reused. It stands to reason that we found the usages of the constants - next, we need to figure out where to find the actual values.

Constant In All Other Things

At the beginning of each BCC function you’ll see something like this:

v4 = *(__int64 **)(a1 + 8LL * *(int *)(a1 + 16) + 16);

This is followed by references to v4, starting with array index 3.

The first question is, what is a1? We’ve seen that BCC functions are created as PyCMethod instances, which get PyObject *self as first parameter. Normally, a method is a function that is part of a class and is invoked on object instances of said class, but the Pyarmor authors decided to repurpose it: they pass co_consts as self.

The co_consts of the example in the previous section looks like this: (None, '__pyarmor_bcc_54440__', ('sys', 'exit')). That last tuple looks interesting - it would be a good fit. Thus, we’re looking for the access pattern tuple-in-a-tuple.

The items in a (native) Python tuple start at offset 24, while the size resides at offset 16 (note: these values depend on the Python version and build settings). With the knowledge that our tuple of interest is the last one, we can rewrite the code in multiple steps in order to make sense of it:

v4 = *(__int64 **)(a1 + 8LL * a1->count + 16);
v4 = *(__int64 **)(a1 + 8LL * (a1->count - 1) + 24);
v4 = a1->items[a1->count - 1]; // last one in co_consts

sys_str = v4[3]; // aka v4 + 24
sys_str = v4->items[0];
exit_str = v4[4]; // aka v4 + 24 + 8
exit_str = v4->items[1];
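
The same pointer arithmetic can be replayed against a live tuple with ctypes. This sketch assumes CPython, where id() returns the object's address. Instead of hard-coding 16 and 24, it reads the offsets from the running interpreter, since they depend on the Python version and build; the helper names are ours.

```python
import ctypes

SIZE_OFFSET = object.__basicsize__  # offset of ob_size (16 in the post)
ITEMS_OFFSET = tuple.__basicsize__  # offset of the first item (24 in the post)
PTR_SIZE = ctypes.sizeof(ctypes.c_void_p)


def read_size(address):
    """Read the ob_size field of the variable-size object at address."""
    return ctypes.c_ssize_t.from_address(address + SIZE_OFFSET).value


def read_item_pointer(address, index):
    """Read the raw pointer stored in the tuple's item slot."""
    slot = address + ITEMS_OFFSET + PTR_SIZE * index
    return ctypes.c_void_p.from_address(slot).value


def tuple_item(address, index):
    """Turn the pointer in an item slot back into a Python object."""
    return ctypes.cast(read_item_pointer(address, index), ctypes.py_object).value


co_consts = (None, "__pyarmor_bcc_54440__", ("sys", "exit"))
a1 = id(co_consts)

count = read_size(a1)
v4 = read_item_pointer(a1, count - 1)  # last entry in co_consts

print("offsets:", SIZE_OFFSET, ITEMS_OFFSET)
print("count:", count)
print("v4 items:", tuple_item(v4, 0), tuple_item(v4, 1))
```

If the offsets printed on your interpreter differ from 16 and 24, the array indexes seen in the decompiler (v4[3], v4[4]) will shift accordingly for samples built against that Python version.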

Now, it would be quite annoying to have to manually look up every single constant, especially in larger methods with dozens of values. We developed an IDA script that works with the output of the bcc_info.py script and adds comments to all constant references in the decompiler view (ida_annotate_bcc.py on GitHub). As an added bonus, it creates and names functions in the ELF if they don’t exist yet.
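
The core idea of the annotation step can be shown without IDA. The following is a simplified, hypothetical sketch that works on pseudocode text with a regular expression; it is not the code of ida_annotate_bcc.py, which operates on the decompiler's data structures instead. The index of the first item (3) matches the layout discussed above and is an assumption for other Python builds.

```python
import re

FIRST_ITEM_INDEX = 3  # v4[3] is items[0] when items start at byte offset 24


def annotate(pseudocode, var, constants, first_index=FIRST_ITEM_INDEX):
    """Append a comment with the constant value to each line using var[N]."""
    pattern = re.compile(rf"\b{re.escape(var)}\[(\d+)\]")
    annotated = []
    for line in pseudocode.splitlines():
        notes = []
        for match in pattern.finditer(line):
            item = int(match.group(1)) - first_index
            if 0 <= item < len(constants):
                notes.append(f"{match.group(0)} = {constants[item]!r}")
        annotated.append(f"{line}  // {', '.join(notes)}" if notes else line)
    return "\n".join(annotated)


snippet = "v9 = (_DWORD *)v4[3];\nv7 = ftable;\nv11 = (_DWORD *)v4[4];"
print(annotate(snippet, "v4", ("sys", "exit")))
```

Lines without a constant reference are passed through unchanged, and indexes that fall outside the constants tuple are ignored rather than guessed.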

Conclusion

Analyzing samples protected with BCC mode can be a formidable challenge, especially without tooling support. The translation to C makes any existing tools that work with Python bytecode completely useless, and the clever separation of code and data adds further complications.

We hope that this research and the support scripts we’ve provided will prove helpful to the community in analyzing threats protected with this obfuscation.


Pyarmor-Tooling repository on GitHub: https://github.com/GDATAAdvancedAnalytics/Pyarmor-Tooling