Want to have fancy syntax highlighting for syzkaller snippets? I implemented it for syzlang, the language for describing syscall interfaces. The highlighting is based on Rouge, the default syntax highlighter on GitHub Pages. syzkaller programs can be highlighted as well.

The pull request with the highlighter has been merged into Rouge but hasn't propagated to GitHub Pages yet. For now, highlighting has to be set up manually.

๐Ÿ™ Background

The main reason I decided to implement this: I was working on a text that includes syzlang snippets, and I wanted them highlighted.

There's another motivation, though. I see many people publishing deep technical content, both security-related and otherwise. And a lot of it (including mine) suffers from the same problem: poor presentation. Finding better ways to present technical material is something I'm interested in exploring. Properly highlighting relevant code snippets is a small step in that direction.

๐Ÿ— Process

My goal was to have syzlang syntax highlighted in a GitHub Pages-hosted article, so I started looking for a way to add custom highlighting rules to code snippets. I already knew that GitHub Pages uses Jekyll as a static site generator. After googling around for a bit, I found out that Jekyll relies on a Ruby library called Rouge for highlighting snippets.

I looked through Rouge's docs and found out that adding support for another language requires implementing a new lexer. A lexer is a module that parses code and splits it into a series of tokens, e.g., Keyword, Name::Function, or plain Text. Each token gets highlighted according to the theme in use.
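To make the idea of a lexer concrete, here is a minimal sketch in plain Ruby. It does not use the Rouge API and is not the code from the pull request; the `tokenize` helper, the keyword list, and the simplified token names are assumptions made for illustration.

```ruby
require "strscan"

# Illustrative subset of syzlang keywords (an assumption, not the full set).
KEYWORDS = %w[resource type include define].freeze

# Splits source text into [token_type, text] pairs.
# The token names loosely mimic Rouge's, but this is plain Ruby.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    if (text = scanner.scan(/\s+/))
      tokens << [:Text, text]
    elsif (text = scanner.scan(/[A-Za-z_][A-Za-z0-9_\$]*/))
      tokens << [KEYWORDS.include?(text) ? :Keyword : :Name, text]
    else
      # Any other single character, e.g. a bracket or a comma.
      tokens << [:Punctuation, scanner.getch]
    end
  end
  tokens
end

tokenize("resource fd_hiddev[fd]").each do |type, text|
  puts "#{type}\t#{text.inspect}"
end
```

A theme then only has to map each token type to a style; the lexer never deals with colors itself.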

Fast forward a few evenings, and the implementation was ready. Even though I had never written any Ruby code before, writing a new lexer was straightforward. The abundance of examples helped a lot. Funny thing: I ended up with almost twice as much code for tests as for the lexer itself.

Initially, I tried a simple lexer implementation that split the code word by word and highlighted every word that matched a keyword. But that resulted in syscall arguments and struct fields getting highlighted improperly when they shared a name with a keyword. Instead, I made the lexer fully parse the structure of each statement.
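The difference between the two approaches can be sketched in a few lines of plain Ruby. This is not the actual Rouge lexer: `naive_highlight`, `structured_highlight`, and the `TYPE_WORDS` list are hypothetical names, and the structured version only handles a bare argument list without string literals or comments.

```ruby
# Illustrative subset of syzlang type names (an assumption).
TYPE_WORDS = %w[flags ptr string intptr int32 len const].freeze

# Naive approach: mark every word that matches a known type name.
def naive_highlight(args)
  args.scan(/\w+/).map do |word|
    [word, TYPE_WORDS.include?(word) ? :type : :plain]
  end
end

# Structure-aware approach: the first word of each top-level argument is a
# name; only the words after it are candidates for type highlighting.
def structured_highlight(args)
  depth = 0          # current nesting level of square brackets
  expect_name = true # true at the start of each top-level argument
  result = []
  args.scan(/\w+|[\[\],]/).each do |tok|
    case tok
    when "["
      depth += 1
    when "]"
      depth -= 1
    when ","
      # Only a comma outside brackets starts a new argument.
      expect_name = true if depth.zero?
    else
      if expect_name
        result << [tok, :name]
        expect_name = false
      else
        result << [tok, TYPE_WORDS.include?(tok) ? :type : :plain]
      end
    end
  end
  result
end

args = "id intptr, flags flags[open_flags]"
p naive_highlight(args)
p structured_highlight(args)
```

With the naive version, both occurrences of `flags` are marked as a type. The structured version marks the first one as a name and only the second as a type.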

Besides a lexer for syzlang, I also implemented one for syzkaller programs. syzkaller programs are less convoluted than syzlang in terms of syntax. Thus, using the simple word-by-word parsing approach (while still keeping track of parentheses) worked perfectly.

Demos

Here's the result:

resource fd_hiddev[fd]

syz_open_dev$hiddev(dev ptr[in, string["/dev/usb/hiddev#"]],
		    id intptr, flags flags[open_flags]) fd_hiddev (timeout[50])

			     arg ptr[in, hiddev_usage_ref])

hiddev_usage_ref {
	report_id	flags[hid_report_ids, int32]
	field_index	int32
	usage_index	int32
	usage_code	int32
	value		int32
}

hid_report_ids = 1, 2, 3, HID_REPORT_ID_UNKNOWN,

What gets highlighted: syscall names, including the variant (the part that comes after $); keywords; types; numbers; and strings. The colors depend on the theme in use.

Notice how only one instance of flags is highlighted in the arguments of syz_open_dev$hiddev. That is thanks to the lexer distinguishing names from types.

Another example:

syz_usb_connect$hid(				 # connects a USB-HID device
	speed flags[usb_device_speed],		 # device speed
	dev_len len[dev],			 # device descriptor's length
	dev ptr[in, usb_device_descriptor_hid],	 # USB-HID device descriptor
	descs ptr[in, vusb_connect_descriptors]	 # USB descriptors requested
						 #   during enumeration
) fd_usb_hid (timeout[3000], prog_timeout[3000])

syz_usb_control_io$hid(fd fd_usb_hid,
		       descs ptr[in, vusb_descriptors_hid],
		       resps ptr[in, vusb_responses_hid]) (timeout[300])

The highlighter uses less strict parsing rules than syzkaller does. This allows splitting declarations across multiple lines and adding end-of-line comments, which helps readability.

Here's a snippet with a syzkaller program:

r0 = syz_usb_connect$hid(0x0, 0x36, &(0x7f0000000000)={...}, 0x0)
syz_usb_control_io$hid(r0, 0x0, 0x0)
syz_usb_control_io(r0, &(0x7f0000000340)={...}, 0x0)
r1 = syz_open_dev$hiddev(&(0x7f0000000740)='/dev/usb/hiddev#\x00', 0x0, 0x0)
ioctl$HIDIOCGCOLLECTIONINDEX(r1, 0x4018480c,
			&(0x7f0000000000)={0x2, 0xffffffff, 0x0, 0x0, 0x400})

For programs, the two important parts are syscall names and resources. Syscalls provide the structure, and resources bind syscalls together. Both are highlighted.
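To illustrate how resources tie syscalls together, here is a rough plain-Ruby sketch that lists, for each resource, the syscalls that produce or consume it. The `resource_usage` helper and its regular expressions are assumptions made for this example (it expects one call per line); this is not how the Rouge lexer works internally.

```ruby
# First four calls of the program above, one call per line.
PROGRAM = <<~'PROG'
  r0 = syz_usb_connect$hid(0x0, 0x36, &(0x7f0000000000)={...}, 0x0)
  syz_usb_control_io$hid(r0, 0x0, 0x0)
  syz_usb_control_io(r0, &(0x7f0000000340)={...}, 0x0)
  r1 = syz_open_dev$hiddev(&(0x7f0000000740)='/dev/usb/hiddev#\x00', 0x0, 0x0)
PROG

# Maps each resource (r0, r1, ...) to the syscalls whose line mentions it.
def resource_usage(program)
  usage = Hash.new { |hash, key| hash[key] = [] }
  program.each_line do |line|
    # Syscall name: an identifier, optionally with a $variant, followed by "(".
    syscall = line[/([A-Za-z_]\w*(?:\$\w+)?)\(/, 1]
    next unless syscall
    line.scan(/\br\d+\b/).uniq.each { |res| usage[res] << syscall }
  end
  usage
end

resource_usage(PROGRAM).each do |res, syscalls|
  puts "#{res}: #{syscalls.join(', ')}"
end
```

Here `r0` connects the three USB calls, while `r1` only appears in the call that opens the hiddev device.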

Being able to easily spot resources is crucial when reading complex programs like this one:

r0 = socket$can_j1939(AUTO, AUTO, AUTO)
ioctl$ifreq_SIOCGIFINDEX_vcan(r0, AUTO, &AUTO={'vxcan0\x00', <r1=>0x0})
bind$can_j1939(r0, &AUTO={AUTO, r1, 0x0, {0x0, 0x0, 0x0, 0x0}, 0x0}, AUTO)
r2 = socket$can_j1939(AUTO, AUTO, AUTO)
ioctl$ifreq_SIOCGIFINDEX_vcan(r2, AUTO, &AUTO={'vxcan1\x00', <r3=>0x0})
bind$can_j1939(r2, &AUTO={AUTO, r3, 0x0, {0x0, 0x0, 0x0, 0x0}, 0x0}, AUTO)
connect$can_j1939(r2, &AUTO={AUTO, r3, 0x0, {0x0, 0x0, 0x0, 0x0}, 0x0}, AUTO)
sendmsg$can_j1939(r2, &AUTO={0x0, 0x0, &AUTO={&AUTO='data', AUTO},
			     0x1, 0x0, 0x0, 0x0}, 0x0)
recvmsg$can_j1939(r0, &AUTO={0x0, 0x0, &AUTO=[{&AUTO='----', AUTO}],
			     0x1, 0x0, 0x0, 0x0}, 0x0)

Usage

Setup

Support for highlighting syzlang snippets has been merged into Rouge but hasn't been included in a release yet. Once the changes are released and GitHub Pages picks up the new release, the highlighter will become available there.

For now, the highlighter has to be set up manually. There are two options:

  1. Deploy Jekyll on another hosting provider with a custom Rouge gem.
  2. Keep using GitHub Pages, but generate the site locally with the same custom Rouge gem.

I took approach #1.
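For either option, one possible way to get an unreleased Rouge is to point Bundler at the upstream repository. The Gemfile below is only a sketch: it assumes the syzlang and syzprog lexers are on the default branch of the Rouge repository, and it is not necessarily the setup used for this site.

```ruby
# Gemfile (sketch): use Rouge from its Git repository instead of a release.
source "https://rubygems.org"

gem "jekyll"

# Assumption: the default branch already contains the syzlang lexers.
# Pin a specific ref here once you know which one you need.
gem "rouge", git: "https://github.com/rouge-ruby/rouge.git"
```

After that, running `bundle install` followed by `bundle exec jekyll build` should generate the site with the Git version of Rouge.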

Snippets

Once you've set up Jekyll with a Rouge build that supports syzlang, use the syzlang language tag:

``` syzlang
socket(domain flags[socket_domain], type flags[socket_type], proto int32) sock
```

This gets rendered as:

socket(domain flags[socket_domain], type flags[socket_type], proto int32) sock

For syzkaller programs, use syzprog:

``` syzprog
r0 = socket(AUTO, AUTO, AUTO)
```

The result:

r0 = socket(AUTO, AUTO, AUTO)

Theme

Jekyll's default _syntax-highlighting.scss theme relies heavily on bold fonts. I find them distracting, so I'm using a custom theme with almost no bold fonts and GitHub-like colors.

There are also several standard themes from which you can choose.

Future

A couple of highlighting-related ideas for the future.

It would be cool to have syzlang code highlighted on GitHub. However, GitHub requires the language to be used in at least 200 repos before syntax highlighting for it can be added. I doubt this is the case for syzlang right now. Before usage can be measured, syzkaller needs to standardize file extensions.

Another thing that would be great is to have Jekyll highlight inline snippets. I haven't found any references to anyone doing this, but I don't think it would be hard to implement.

Thank you for reading!

๐Ÿง Support

Just in case you found this article particularly useful.

Bitcoin 1LiaK6wwNTnKGBq6n583yJj6BHWcF1FGiE
Ethereum 0x7A3268383AD9ea129d143999eb09197D830D7e25
Cardano addr1v8vsqgm2sjz8mux8rxufmdwrveya5ue29u37tafscdkf30cnm5vah

๐Ÿฑ About me

Iโ€™m a security researcher and a software engineer focusing on the Linux kernel.

I contributed to several security-related Linux kernel subsystems and tools: KASAN โ€” a fast dynamic bug detector, syzkaller โ€” a production-grade kernel fuzzer, and Arm Memory Tagging Extension โ€” an exploit mitigation.

I also wrote a few Linux kernel exploits for the bugs I found.

Occasionally, Iโ€™m having fun with hardware hacking, teaching, and other random stuff.

Follow me @andreyknvl on Twitter or @xairy on LinkedIn for notifications about new articles and talks.