🌈 Highlighting syzkaller descriptions syntax with Rouge

Want to have fancy syntax highlighting for syzkaller snippets? I implemented it for syzlang — the language for describing syscall interfaces. The highlighting is based on Rouge — the default syntax highlighter on GitHub Pages. syzkaller programs can be highlighted as well.



~~The pull request with the highlighter has been merged into Rouge but hasn’t propagated to GitHub Pages yet.~~ ~~For now, highlighting has to be set up manually.~~

Update from 2024: Support for highlighting syzlang snippets has been merged into Rouge and pulled into GitHub Pages, so there is no need to set up a custom Rouge gem anymore; the instructions below are kept for reference only. Note, however, that syzkaller has added a few new syzlang features since I implemented the highlighting, so snippets that use them will not be highlighted properly.

🏙 Background

The main reason I decided to implement this: I was working on a text that includes syzlang snippets, and I wanted them highlighted.

There’s another motivation, though. I see many people publishing deep technical content, both security-related and otherwise. And a lot of it (including mine) suffers from the same problem: poor presentation. Finding better ways to present technical material is something I’m interested in exploring. Properly highlighting relevant code snippets is a small step in that direction.

🏗 Process

My goal was to have syzlang syntax highlighted in a GitHub Pages–hosted article, so I started looking for a way to add custom highlighting rules to code snippets. I already knew that GitHub Pages uses Jekyll as its static site generator. After googling around for a bit, I found out that Jekyll relies on a Ruby library called Rouge for highlighting snippets.

I looked through Rouge’s docs and found out that adding support for another language requires implementing a new lexer. A lexer is a module that parses code and splits it into a series of tokens, e.g., Keyword, Name::Function, or plain Text. Each token gets highlighted according to the theme in use.
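To make the idea of a lexer concrete, here is a minimal sketch of a tokenizer written with Ruby's standard `strscan` library. It is not the Rouge API and not the actual syzlang lexer: the `tokenize` helper, the `KEYWORDS` list, and the symbol token names are assumptions made up for this illustration.

```ruby
require "strscan"

# Hypothetical keyword list, chosen only for this illustration.
KEYWORDS = %w[resource flags ptr const].freeze

# Splits +source+ into [token_type, text] pairs, the way a lexer would
# before a theme assigns a color to each token type.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    if scanner.scan(/\s+/)
      tokens << [:Text, scanner.matched]
    elsif scanner.scan(/[A-Za-z_]\w*/)
      type = KEYWORDS.include?(scanner.matched) ? :Keyword : :Name
      tokens << [type, scanner.matched]
    else
      # Anything else is consumed one character at a time.
      tokens << [:Punctuation, scanner.getch]
    end
  end
  tokens
end

tokenize("resource fd_hiddev[fd]").each { |type, text| puts "#{type}: #{text.inspect}" }
```

Concatenating the text of all tokens gives back the original input, which is a handy property to check in lexer tests.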

Fast forward a few evenings, and the implementation was ready. Even though I had never written any Ruby code before, writing a new lexer was straightforward. The abundance of examples helped a lot. Funny thing: I ended up with almost twice as much code for tests as for the lexer itself.

Initially, I tried a simple lexer that splits the code word by word and highlights every word that matches a keyword. But that resulted in syscall arguments and struct fields being highlighted incorrectly when they have the same name as a keyword. Instead, I made the lexer fully parse the structure of each statement.
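The difference between the two approaches can be sketched on an argument list taken from the demo below. This is a simplified illustration, not the code of the real lexer: `TYPES`, `naive_marks`, and `structured_marks` are hypothetical names, the type list is a small assumed subset, and string literals are not handled.

```ruby
require "strscan"

# Hypothetical subset of type names, for illustration only.
TYPES = %w[flags ptr intptr].freeze

# Naive approach: every word that matches a known type is marked as one.
def naive_marks(args)
  args.scan(/[A-Za-z_]\w*/).map { |w| [w, TYPES.include?(w) ? :Type : :Name] }
end

# Structure-aware approach: the first word of each argument is its name,
# and only the words after it are candidates for being a type.
def structured_marks(args)
  scanner = StringScanner.new(args)
  expect_name = true
  depth = 0 # nesting level of square brackets
  marks = []
  until scanner.eos?
    if scanner.scan(/[A-Za-z_]\w*/)
      word = scanner.matched
      if expect_name
        marks << [word, :Name]
        expect_name = false
      else
        marks << [word, TYPES.include?(word) ? :Type : :Name]
      end
    else
      ch = scanner.getch
      depth += 1 if ch == "["
      depth -= 1 if ch == "]"
      # A top-level comma starts the next argument.
      expect_name = true if ch == "," && depth.zero?
    end
  end
  marks
end

args = "id intptr, flags flags[open_flags]"
p naive_marks(args)
p structured_marks(args)
```

The naive version marks both occurrences of `flags` as a type, while the structure-aware version treats the first one as an argument name.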

Besides a lexer for syzlang, I also implemented one for syzkaller programs. syzkaller programs are less convoluted than syzlang in terms of syntax. Thus, using the simple word-by-word parsing approach (while still keeping track of parentheses) worked perfectly.
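As a rough sketch of what word-by-word scanning with parenthesis tracking can look like, the snippet below marks resources and syscall names in a single program line. The details are my assumptions for this illustration rather than a description of the real lexer: the `highlight_program_line` helper is hypothetical, resources are assumed to look like `r0`, `r1`, and so on, and a word counts as a syscall name only when it appears outside of any parentheses.

```ruby
require "strscan"

# Returns [token_type, text] pairs for one line of a syzkaller program.
def highlight_program_line(line)
  scanner = StringScanner.new(line)
  depth = 0 # nesting level of parentheses
  tokens = []
  until scanner.eos?
    if scanner.scan(/r\d+\b/)
      tokens << [:Resource, scanner.matched]
    elsif scanner.scan(/0x\h+|\d+/)
      tokens << [:Number, scanner.matched]
    elsif scanner.scan(/'[^']*'/)
      tokens << [:String, scanner.matched]
    elsif scanner.scan(/[A-Za-z_][\w$]*/)
      # Outside of parentheses a word is a syscall name; inside it is
      # an argument such as AUTO.
      tokens << [depth.zero? ? :Syscall : :Other, scanner.matched]
    else
      ch = scanner.getch
      depth += 1 if ch == "("
      depth -= 1 if ch == ")"
      tokens << [:Other, ch]
    end
  end
  tokens
end

line = "bind$can_j1939(r0, &AUTO={AUTO, r1, 0x0}, AUTO)"
marked = highlight_program_line(line).select { |type, _| %i[Syscall Resource].include?(type) }
p marked
```

Tracking the depth is what keeps `AUTO` from being mistaken for a syscall name here.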

🗃 Demos

Here’s the result:

resource fd_hiddev[fd]

syz_open_dev$hiddev(dev ptr[in, string["/dev/usb/hiddev#"]],
		    id intptr, flags flags[open_flags]) fd_hiddev (timeout[50])

ioctl$HIDIOCGCOLLECTIONINDEX(fd fd_hiddev, cmd const[HIDIOCGCOLLECTIONINDEX],
			     arg ptr[in, hiddev_usage_ref])

hiddev_usage_ref {
	report_type	int32[HID_REPORT_TYPE_MIN:HID_REPORT_TYPE_MAX]
	report_id	flags[hid_report_ids, int32]
	field_index	int32
	usage_index	int32
	usage_code	int32
	value		int32
}

hid_report_ids = 1, 2, 3, HID_REPORT_ID_UNKNOWN,
		 HID_REPORT_ID_FIRST, HID_REPORT_ID_NEXT


What gets highlighted: syscall names, including the variant (the part that comes after $); keywords; types; numbers; and strings. The colors depend on the theme in use.

Notice how only one instance of flags is highlighted in the arguments of syz_open_dev$hiddev. This works because the lexer differentiates argument names from types.

Another example:

syz_usb_connect$hid(				 # connects a USB-HID device
	speed flags[usb_device_speed],		 # device speed
	dev_len len[dev],			 # device descriptor's length
	dev ptr[in, usb_device_descriptor_hid],	 # USB-HID device descriptor
	descs ptr[in, vusb_connect_descriptors]	 # USB descriptors requested
						 #   during enumeration
) fd_usb_hid (timeout[3000], prog_timeout[3000])

syz_usb_control_io$hid(fd fd_usb_hid,
		       descs ptr[in, vusb_descriptors_hid],
		       resps ptr[in, vusb_responses_hid]) (timeout[300])

The highlighter uses less strict parsing rules than syzkaller. This allows splitting declarations across multiple lines and adding end-of-line comments, which is useful for readability.

Here’s a snippet with a syzkaller program:

r0 = syz_usb_connect$hid(0x0, 0x36, &(0x7f0000000000)={...}, 0x0)
syz_usb_control_io$hid(r0, 0x0, 0x0)
syz_usb_control_io(r0, &(0x7f0000000340)={...}, 0x0)
r1 = syz_open_dev$hiddev(&(0x7f0000000740)='/dev/usb/hiddev#\x00', 0x0, 0x0)
ioctl$HIDIOCGCOLLECTIONINDEX(r1, 0x4018480c,
			&(0x7f0000000000)={0x2, 0xffffffff, 0x0, 0x0, 0x400})

For programs, the two important parts are syscall names and resources. Syscalls provide the structure, and resources bind syscalls together. Both are highlighted.

Being able to easily spot resources is crucial when reading complex programs like this one:

r0 = socket$can_j1939(AUTO, AUTO, AUTO)
ioctl$ifreq_SIOCGIFINDEX_vcan(r0, AUTO, &AUTO={'vxcan0\x00', <r1=>0x0})
bind$can_j1939(r0, &AUTO={AUTO, r1, 0x0, {0x0, 0x0, 0x0, 0x0}, 0x0}, AUTO)
r2 = socket$can_j1939(AUTO, AUTO, AUTO)
ioctl$ifreq_SIOCGIFINDEX_vcan(r2, AUTO, &AUTO={'vxcan1\x00', <r3=>0x0})
bind$can_j1939(r2, &AUTO={AUTO, r3, 0x0, {0x0, 0x0, 0x0, 0x0}, 0x0}, AUTO)
connect$can_j1939(r2, &AUTO={AUTO, r3, 0x0, {0x0, 0x0, 0x0, 0x0}, 0x0}, AUTO)
sendmsg$can_j1939(r2, &AUTO={0x0, 0x0, &AUTO={&AUTO='data', AUTO},
			     0x1, 0x0, 0x0, 0x0}, 0x0)
recvmsg$can_j1939(r0, &AUTO={0x0, 0x0, &AUTO=[{&AUTO='----', AUTO}],
			     0x1, 0x0, 0x0, 0x0}, 0x0)

⚙ Usage

🛠 Setup

Support for highlighting syzlang snippets has been merged into Rouge but hasn’t been included in a release yet. Once the changes are included and GitHub Pages picks up the new release, the highlighter will become available there.

For now, the highlighter has to be set up manually. There are two options for doing that:

1. Deploy Jekyll on another hosting provider with the custom Rouge gem.
2. Rely on GitHub Pages but generate the site locally, still using the same custom Rouge gem.

I took approach #1.

✂ Snippets

Once you’ve set up Jekyll with a Rouge version that supports syzkaller, use the syzlang language tag:

``` syzlang
socket(domain flags[socket_domain], type flags[socket_type], proto int32) sock
```

This gets rendered as:

socket(domain flags[socket_domain], type flags[socket_type], proto int32) sock

For syzkaller programs, use syzprog:

``` syzprog
r0 = socket(AUTO, AUTO, AUTO)
```

The result:

r0 = socket(AUTO, AUTO, AUTO)

🎨 Theme

Jekyll’s default _syntax-highlighting.scss theme relies heavily on bold fonts. I find them distracting, so I’m using a custom theme with almost no bold fonts and GitHub-like colors instead.

There are also several standard themes from which you can choose.

🚀 Future

A couple of highlighting-related ideas for the future.

It would be cool to have syzlang code highlighted on GitHub. However, GitHub requires the language to be used in at least 200 repos before syntax highlighting for it can be added. I doubt this is the case for syzlang right now. Before usage can be measured, syzkaller needs to standardize file extensions.

Another thing that would be great is to have Jekyll highlight inline snippets. I couldn’t find any references to anyone doing this, but I don’t think it would be hard to implement.

💜 Thank you for reading!

🐵 About me

I’m a security researcher and a software engineer focusing on the Linux kernel.

I contributed to several security-related Linux kernel subsystems and tools, including KASAN — a fast dynamic bug detector, syzkaller — a production-grade kernel fuzzer, and Arm Memory Tagging Extension — an exploit mitigation. I also wrote a few Linux kernel exploits for the bugs I found.

Occasionally, I have fun with hardware hacking, teaching, and other random stuff.

Follow me @andreyknvl on X, @andreyknvl.bsky.social on Bluesky, @xairy@infosec.exchange on Mastodon, or @xairy on LinkedIn for notifications about new articles, talks, and training sessions.