Want fancy syntax highlighting for syzkaller snippets? I implemented it for syzlang, the language syzkaller uses to describe syscall interfaces. The highlighting is based on Rouge, the default syntax highlighter on GitHub Pages. Syzkaller programs can be highlighted as well.

The pull request with the highlighter hasn’t been merged yet, as Rouge is experiencing technical difficulties. For now, highlighting has to be set up manually.

Background

The main reason I decided to implement this: I was working on a text that includes syzlang snippets, and I wanted them highlighted.

There’s another thought behind this, though. I see many people publishing deep technical content, both security-related and otherwise. And a lot of it (including mine) suffers from the same problem: poor presentation. I’m interested in exploring better ways of presenting technical material, and properly highlighting code snippets is a small step in that direction.

Process

My goal was to have syzlang syntax highlighted in an article hosted on GitHub Pages, so I started looking for a way to add custom highlighting rules for code snippets. I already knew that GitHub Pages uses Jekyll as its static site generator. After googling around for a bit, I found out that Jekyll relies on a Ruby library called Rouge for highlighting snippets.

I looked through Rouge’s docs and found out that adding support for another language requires implementing a new lexer. A lexer is a module that parses code and splits it into a series of tokens, such as Keyword, Name::Function, or plain Text. Each token is then highlighted according to the active theme.
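To make the idea of a lexer concrete, here is a minimal sketch in Python. It is loosely modeled on the way Rouge’s regex-based lexers try an ordered list of rules at the current position. This is not the Rouge lexer from the pull request. The rule table, the token type strings, and the small keyword subset are all illustrative assumptions.

```python
import re

# Illustrative rule table (assumption): (pattern, token type), tried in order.
# Token type names imitate Rouge's token hierarchy but are plain strings here,
# and the keyword list is only a small subset of syzlang.
RULES = [
    (re.compile(r"\s+"), "Text"),
    (re.compile(r"#.*"), "Comment"),
    (re.compile(r'"[^"]*"'), "Literal.String"),
    (re.compile(r"0x[0-9a-fA-F]+|\d+"), "Literal.Number"),
    (re.compile(r"\b(?:resource|include|incdir|define|type)\b"), "Keyword"),
    (re.compile(r"[A-Za-z_][A-Za-z0-9_]*(?:\$[A-Za-z0-9_]+)?"), "Name"),
    (re.compile(r"[()\[\]{},:=<>]"), "Punctuation"),
]


def lex(code):
    """Split code into (token_type, text) pairs using the first matching rule."""
    tokens, pos = [], 0
    while pos < len(code):
        for pattern, token_type in RULES:
            m = pattern.match(code, pos)
            if m:
                tokens.append((token_type, m.group()))
                pos = m.end()
                break
        else:
            # No rule matched: emit the character as an error token and move on.
            tokens.append(("Error", code[pos]))
            pos += 1
    return tokens


print(lex("resource fd_hiddev[fd]"))
```

For the resource declaration, this yields a Keyword token for resource, Name tokens for fd_hiddev and fd, and Punctuation tokens for the brackets. Every character lands in some token, so joining the token texts reproduces the input exactly. A highlighter depends on that property, because it only wraps text and never drops any.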

Fast forward a few evenings, and the implementation was ready. Even though I had never written any Ruby code before, writing a new lexer was straightforward. The abundance of examples helped a lot. Funny thing: I ended up with almost twice as much code for the tests as for the lexer itself.

Initially, I tried a simple lexer that split the code word by word and highlighted every word matching a keyword. But this highlights syscall arguments and struct fields incorrectly whenever they share a name with a keyword. Instead, I made the lexer parse the full structure of each statement.
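The failure mode of the word-by-word approach is easy to reproduce. The Python sketch below compares it with a slightly structure-aware pass on the syz_open_dev$hiddev declaration from the Demos section. The TYPE_NAMES set and the helper functions are assumptions made for illustration. The real lexer parses much more of syzlang’s grammar. This sketch only looks at the argument list and does not handle strings that contain brackets or commas.

```python
import re

# Illustrative subset of syzlang type names (assumption, not the full list).
TYPE_NAMES = {"ptr", "flags", "const", "intptr", "int32", "string", "len"}


def naive_types(decl):
    """Word by word: every word that matches a type name gets highlighted."""
    return [w for w in re.findall(r"[A-Za-z_]\w*", decl) if w in TYPE_NAMES]


def split_top_level(args):
    """Split an argument list on commas that are not nested inside [...]."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(args):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(args[start:i])
            start = i + 1
    parts.append(args[start:])
    return [p.strip() for p in parts if p.strip()]


def structural_args(decl):
    """Parse `call(name type, ...)` and return (name, type) pairs.

    Only the type position would be highlighted, so an argument that happens
    to be called `flags` stays plain.
    """
    start = decl.index("(")
    depth, end = 0, len(decl)
    for i in range(start, len(decl)):
        if decl[i] == "(":
            depth += 1
        elif decl[i] == ")":
            depth -= 1
            if depth == 0:
                end = i
                break
    pairs = []
    for arg in split_top_level(decl[start + 1:end]):
        name, type_expr = arg.split(None, 1)
        pairs.append((name, re.match(r"\w+", type_expr).group()))
    return pairs


decl = ('syz_open_dev$hiddev(dev ptr[in, string["/dev/usb/hiddev#"]], '
        'id intptr, flags flags[open_flags]) fd_hiddev')
print(naive_types(decl).count("flags"))       # 2: argument name and type
print([t for _, t in structural_args(decl)])  # ['ptr', 'intptr', 'flags']
```

The naive pass flags both occurrences of flags. The structural pass only sees the word in type position.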

Besides a lexer for syzlang, I also implemented one for syzkaller programs. Syzkaller programs have simpler syntax than syzlang, so the simple word-by-word approach (while still keeping track of parentheses) worked perfectly.
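Here is a Python sketch of such a word-by-word pass over one program line. It assumes two rules. A word directly followed by ( at nesting depth 0 is a syscall name, and a word of the form rN is a resource. The token pattern and the category names are illustrative, not the ones used by the actual syzprog lexer. Characters that no alternative matches, such as the ... placeholders, are simply skipped.

```python
import re

# Illustrative token pattern (assumption) for a single syzkaller program line.
TOKEN = re.compile(r"""
    (?P<string>'(?:[^'\\]|\\.)*')   # quoted string such as 'vxcan0\x00'
  | (?P<number>0x[0-9a-fA-F]+|\d+)  # hex or decimal numbers
  | (?P<word>[A-Za-z_][\w$]*)       # syscall names, resources, AUTO
  | (?P<punct>[()\[\]{}<>=&,])      # punctuation
  | (?P<space>\s+)
""", re.VERBOSE)


def classify(line):
    """Label each token; track '(' depth so only top-level calls count as syscalls."""
    tokens = [(m.lastgroup, m.group()) for m in TOKEN.finditer(line)]
    tokens = [t for t in tokens if t[0] != "space"]
    labeled, depth = [], 0
    for i, (kind, text) in enumerate(tokens):
        if text == "(":
            depth += 1
        elif text == ")":
            depth -= 1
        elif kind == "word":
            followed_by_paren = i + 1 < len(tokens) and tokens[i + 1][1] == "("
            if depth == 0 and followed_by_paren:
                kind = "syscall"
            elif re.fullmatch(r"r\d+", text):
                kind = "resource"
        labeled.append((kind, text))
    return labeled


labels = classify(r"ioctl$ifreq_SIOCGIFINDEX_vcan(r0, AUTO, &AUTO={'vxcan0\x00', <r1=>0x0})")
print([t for k, t in labels if k == "syscall"])   # ['ioctl$ifreq_SIOCGIFINDEX_vcan']
print([t for k, t in labels if k == "resource"])  # ['r0', 'r1']
```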

Demos

Here’s the result:

resource fd_hiddev[fd]

syz_open_dev$hiddev(dev ptr[in, string["/dev/usb/hiddev#"]],
		    id intptr, flags flags[open_flags]) fd_hiddev (timeout[50])

ioctl$HIDIOCGCOLLECTIONINDEX(fd fd_hiddev, cmd const[HIDIOCGCOLLECTIONINDEX],
			     arg ptr[in, hiddev_usage_ref])

hiddev_usage_ref {
	report_type	int32[HID_REPORT_TYPE_MIN:HID_REPORT_TYPE_MAX]
	report_id	flags[hid_report_ids, int32]
	field_index	int32
	usage_index	int32
	usage_code	int32
	value		int32
}

hid_report_ids = 1, 2, 3, HID_REPORT_ID_UNKNOWN,
		 HID_REPORT_ID_FIRST, HID_REPORT_ID_NEXT



What gets highlighted: syscall names, including the variant (the part that comes after $); keywords; types; numbers; and strings. The colors depend on the active theme.

Notice that only one of the two occurrences of flags in the arguments of syz_open_dev$hiddev is highlighted. That’s because the lexer distinguishes argument names from types.
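The lexer itself never picks colors. With Jekyll, Rouge wraps each token in an HTML span carrying a CSS class, and the theme is a stylesheet that colors those classes (for example, a rule like .highlight .nf { color: ... }). The Python sketch below illustrates that mapping step. The class names follow the short Pygments-style convention that Rouge’s HTML output also uses, such as nf for Name.Function. The CSS_CLASS table and the to_html function are illustrative assumptions, not Rouge’s code.

```python
from html import escape

# Assumed mapping from token types to short CSS class names, following the
# Pygments convention that Rouge's HTML output also uses ("nf" = Name.Function).
CSS_CLASS = {
    "Keyword": "k",
    "Keyword.Type": "kt",
    "Name.Function": "nf",
    "Literal.Number": "m",
    "Literal.String": "s",
    "Comment": "c",
}


def to_html(tokens):
    """Wrap each (token_type, text) pair in a <span> with its CSS class.

    Token types without a class (plain text, punctuation) are emitted as-is,
    HTML-escaped. Colors come only from the theme's CSS for these classes.
    """
    parts = []
    for token_type, text in tokens:
        css = CSS_CLASS.get(token_type)
        parts.append(f'<span class="{css}">{escape(text)}</span>' if css else escape(text))
    return "".join(parts)


print(to_html([("Name.Function", "syz_open_dev$hiddev"), ("Text", "(")]))
# prints: <span class="nf">syz_open_dev$hiddev</span>(
```

Because the output contains only class names, switching themes means swapping the stylesheet. Nothing needs to be re-highlighted.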

Another example:

syz_usb_connect$hid(				 # connects a USB-HID device
	speed flags[usb_device_speed],		 # device speed
	dev_len len[dev],			 # device descriptor's length
	dev ptr[in, usb_device_descriptor_hid],	 # USB-HID device descriptor
	descs ptr[in, vusb_connect_descriptors]	 # USB descriptors requested
						 #   during enumeration
) fd_usb_hid (timeout[3000], prog_timeout[3000])

syz_usb_control_io$hid(fd fd_usb_hid,
		       descs ptr[in, vusb_descriptors_hid],
		       resps ptr[in, vusb_responses_hid]) (timeout[300])


The highlighter uses less strict parsing rules than syzkaller does. This allows splitting declarations across multiple lines and adding end-of-line comments, which helps readability.

Here’s a snippet with a syzkaller program:

r0 = syz_usb_connect$hid(0x0, 0x36, &(0x7f0000000000)={...}, 0x0)
syz_usb_control_io$hid(r0, 0x0, 0x0)
syz_usb_control_io(r0, &(0x7f0000000340)={...}, 0x0)
r1 = syz_open_dev$hiddev(&(0x7f0000000740)='/dev/usb/hiddev#\x00', 0x0, 0x0)
ioctl$HIDIOCGCOLLECTIONINDEX(r1, 0x4018480c,
			&(0x7f0000000000)={0x2, 0xffffffff, 0x0, 0x0, 0x400})


For programs, the two important parts are syscall names and resources. Syscalls provide the structure, and resources bind syscalls together. Both are highlighted.
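To illustrate how resources bind calls together, here is a Python sketch. For each resource, it records the line that produces it, either through r0 = ... or through an inline <r1=> annotation. It also records the lines that consume the resource. The regexes and the resource_graph function are hypothetical helpers, not part of syzkaller. The sketch treats each physical line separately, so a call wrapped across two lines counts as two lines.

```python
import re

# Assumed patterns: "rN = call(...)" at the start of a line,
# or an inline "<rN=>" annotation inside the arguments.
DEF_RE = re.compile(r"^(r\d+) = |<(r\d+)=>")
# Any standalone rN token counts as a mention of that resource.
USE_RE = re.compile(r"\b(r\d+)\b")


def resource_graph(program):
    """Return ({resource: defining line}, {resource: [using lines]}), 0-based."""
    defs, uses = {}, {}
    for lineno, line in enumerate(program.splitlines()):
        defined_here = set()
        for m in DEF_RE.finditer(line):
            name = m.group(1) or m.group(2)
            defs[name] = lineno
            defined_here.add(name)
        for name in USE_RE.findall(line):
            # A resource mentioned on the line that defines it is not a use.
            if name not in defined_here:
                uses.setdefault(name, []).append(lineno)
    return defs, uses


program = r"""r0 = socket$can_j1939(AUTO, AUTO, AUTO)
ioctl$ifreq_SIOCGIFINDEX_vcan(r0, AUTO, &AUTO={'vxcan0\x00', <r1=>0x0})
bind$can_j1939(r0, &AUTO={AUTO, r1, 0x0, {0x0, 0x0, 0x0, 0x0}, 0x0}, AUTO)"""
defs, uses = resource_graph(program)
print(defs)  # {'r0': 0, 'r1': 1}
print(uses)  # {'r0': [1, 2], 'r1': [2]}
```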

Being able to easily spot resources is crucial when reading complex programs like this one:

r0 = socket$can_j1939(AUTO, AUTO, AUTO)
ioctl$ifreq_SIOCGIFINDEX_vcan(r0, AUTO, &AUTO={'vxcan0\x00', <r1=>0x0})
bind$can_j1939(r0, &AUTO={AUTO, r1, 0x0, {0x0, 0x0, 0x0, 0x0}, 0x0}, AUTO)
r2 = socket$can_j1939(AUTO, AUTO, AUTO)
ioctl$ifreq_SIOCGIFINDEX_vcan(r2, AUTO, &AUTO={'vxcan1\x00', <r3=>0x0})
bind$can_j1939(r2, &AUTO={AUTO, r3, 0x0, {0x0, 0x0, 0x0, 0x0}, 0x0}, AUTO)
connect$can_j1939(r2, &AUTO={AUTO, r3, 0x0, {0x0, 0x0, 0x0, 0x0}, 0x0}, AUTO)
sendmsg$can_j1939(r2, &AUTO={0x0, 0x0, &AUTO={&AUTO='data', AUTO},
			     0x1, 0x0, 0x0, 0x0}, 0x0)
recvmsg$can_j1939(r0, &AUTO={0x0, 0x0, &AUTO=[{&AUTO='----', AUTO}],
			     0x1, 0x0, 0x0, 0x0}, 0x0)


Usage

Setup

The pull request with the highlighter hasn’t been merged yet, so GitHub Pages doesn’t support highlighting syzlang snippets right now. I hope the code will be merged in a few months and included in the next Rouge release. When that happens, the highlighter will become available on GitHub Pages once it picks up the new release.

For now, the highlighter has to be set up manually. There are two ways to do that:

  1. Deploy Jekyll on another hosting provider with the custom Rouge gem.
  2. Keep using GitHub Pages, but generate the site locally with the same custom Rouge gem.

I’m not providing instructions for these; there are plenty online. I took approach #1.

Snippets

Once you’ve set up Jekyll with the custom Rouge gem, use the syzlang language tag:

``` syzlang
socket(domain flags[socket_domain], type flags[socket_type], proto int32) sock
```

This gets rendered as:

socket(domain flags[socket_domain], type flags[socket_type], proto int32) sock

For syzkaller programs, use syzprog:

``` syzprog
r0 = socket(AUTO, AUTO, AUTO)
```

The result:

r0 = socket(AUTO, AUTO, AUTO)

Theme

Jekyll’s default _syntax-highlighting.scss theme relies heavily on bold fonts. I find them distracting, so I’m using a custom theme with almost no bold fonts and GitHub-like colors.

There are also several standard themes from which you can choose.

Future

A couple of highlighting-related ideas for the future.

It would be cool to have syzlang code highlighted on GitHub. However, GitHub requires the language to be used in at least 200 repos before syntax highlighting for it can be added. I doubt this is the case for syzlang right now. Before usage can be measured, syzkaller needs to standardize file extensions.

Another thing that would be great is having Jekyll highlight inline snippets. I couldn’t find any reference to anyone doing this, but I don’t think it would be hard to implement.

About me

I’m a software engineer focusing on Linux kernel security. I worked on bug-finding tools (KASAN), fuzzers (syzkaller), and mitigations (Arm Memory Tagging Extension). I also wrote a few kernel exploits for the bugs I found.

Occasionally, I have fun with hardware hacking, teaching, and other random stuff.

Follow me @andreyknvl on Twitter for notifications about new blog posts.