BitTorrent bencode format tools

Project Nayuki


The BitTorrent file distribution system defines its own format for serializing structured data, known as bencode or bencoding. It is used in .torrent files and in network communications with trackers. This page provides tools for encoding, decoding, and inspecting bencoded data.

Bencode supports four types of values: integer, byte string (representing text or raw binary data), list (sequence of bencode values), and dictionary (each key is a byte string, and each value is a bencode value).
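To make the four types concrete, here is a minimal encoder sketch in Python. It is not the code from the libraries offered on this page; the function name bencode and the mapping onto Python's int, bytes, list, and dict are assumptions chosen for illustration. The wire syntax follows the BitTorrent specification: i<digits>e for integers, <length>:<bytes> for byte strings, l...e for lists, and d...e for dictionaries.

```python
def bencode(value) -> bytes:
    """Serialize an int, bytes, list, or dict (with bytes keys) to bencode.

    Illustrative sketch only; the name and type mapping are assumptions.
    """
    # bool is a subclass of int in Python, but bencode has no Boolean type.
    if isinstance(value, bool):
        raise TypeError("bencode has no Boolean type")
    if isinstance(value, int):
        # Python's str() already yields canonical decimal (no leading zeros).
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, bytes):
        # Length prefix in decimal, a colon, then the raw bytes.
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        for key in value:
            if not isinstance(key, bytes):
                raise TypeError("dictionary keys must be byte strings")
        parts = [b"d"]
        # Keys are emitted in ascending order, compared as raw bytes.
        for key in sorted(value):
            parts.append(bencode(key))
            parts.append(bencode(value[key]))
        parts.append(b"e")
        return b"".join(parts)
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(bencode({b"name": b"demo", b"sizes": [1, 20, -3]}))
```

Text has to be converted to bytes by the caller (for example with str.encode("utf-8")), because bencode itself only knows byte strings.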

The name “bencode” is used by Wikipedia, and I think it makes the most sense. By contrast, the official specification documents refer to the act of “bencoding” and the resulting “bencoded” data, but do not seem to call the format itself “bencode”.

Source code

Includes a library, test suite, and runnable demo program.

Java
Python
Rust
TypeScript / JavaScript

Comparison with JSON

Bencode has striking similarities with JSON. Both formats support the same four basic types with unbounded size: integer, string, list/array/sequence, dictionary/object/mapping. Hence, knowing one of these formats helps in understanding the other.

The two formats have many differences as well:

  • BitTorrent became popular some years before JSON did. Real-world usage of JSON has since greatly eclipsed that of BitTorrent and bencode, so there is a natural bias to view bencode through the lens of JSON even though JSON was adopted later (though not necessarily invented later).

  • On the output side, bencode produces raw bytes, while JSON produces Unicode text (typically serialized as UTF-8). This makes JSON easier to read and write by hand in a text editor, whereas bencode often requires a hex editor.

  • On the input side, bencode has byte strings, while JSON has Unicode text strings.

  • JSON allows arbitrary amounts of whitespace before and after tokens; bencode is a tight encoding without whitespace.

  • JSON supports three additional special values: null, false, true.

  • Bencode requires canonical integers, i.e. no negative zero and no leading zeros. JSON also forbids leading zeros (e.g. 001 is invalid), but it allows negative zero (-0), numbers with a decimal point (e.g. 12.34), and scientific notation (e.g. 567e+8), so the same value can be written in several ways.

  • Bencode enforces a canonical representation for dictionaries (keys in ascending order with no repeats). JSON allows dictionary keys to be ordered arbitrarily, and does not mandate a behavior for handling duplicate keys.

  • Given an input data structure, there is exactly one way to bencode it. Meanwhile, there are many ways to encode the structure as valid JSON data, due to options like whitespace, number formats, and character escapes in strings.
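The canonical-form rules in the list above can be sketched as a strict decoder. This is a minimal Python illustration, not the code from the libraries on this page; the names bdecode, _parse, and _parse_digits are hypothetical. It rejects negative zero, leading zeros, unsorted or duplicate dictionary keys, and stray bytes such as whitespace. Rejecting leading zeros in string length prefixes is an assumption on my part, made so that each value has exactly one accepted encoding.

```python
import json


def _parse_digits(data: bytes, pos: int, terminator: bytes):
    """Read a canonical non-negative decimal number ending at terminator."""
    end = data.index(terminator, pos)  # raises ValueError if terminator is missing
    digits = data[pos:end]
    if not digits.isdigit():
        raise ValueError("expected decimal digits")
    if len(digits) > 1 and digits.startswith(b"0"):
        raise ValueError("leading zeros are not allowed")
    return int(digits), end + 1


def _parse(data: bytes, pos: int):
    """Parse one value starting at pos; return (value, next position)."""
    if pos >= len(data):
        raise ValueError("unexpected end of data")
    lead = data[pos:pos + 1]
    if lead == b"i":
        negative = data[pos + 1:pos + 2] == b"-"
        number, pos = _parse_digits(data, pos + (2 if negative else 1), b"e")
        if negative and number == 0:
            raise ValueError("negative zero is not allowed")
        return (-number if negative else number), pos
    if lead.isdigit():
        length, pos = _parse_digits(data, pos, b":")
        if pos + length > len(data):
            raise ValueError("byte string is truncated")
        return data[pos:pos + length], pos + length
    if lead == b"l":
        items, pos = [], pos + 1
        while data[pos:pos + 1] != b"e":
            item, pos = _parse(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        result, last_key, pos = {}, None, pos + 1
        while data[pos:pos + 1] != b"e":
            key, pos = _parse(data, pos)
            if not isinstance(key, bytes):
                raise ValueError("dictionary key must be a byte string")
            # Strictly ascending order rules out both misordering and repeats.
            if last_key is not None and key <= last_key:
                raise ValueError("dictionary keys must be strictly ascending")
            result[key], pos = _parse(data, pos)
            last_key = key
        return result, pos + 1
    raise ValueError(f"unexpected byte at offset {pos}")


def bdecode(data: bytes):
    """Decode exactly one bencoded value, rejecting non-canonical input."""
    value, end = _parse(data, 0)
    if end != len(data):
        raise ValueError("trailing data after value")
    return value


# One structure, exactly one accepted bencoding:
assert bdecode(b"d1:ai1e1:bli2eee") == {b"a": 1, b"b": [2]}

# The same structure has many valid JSON spellings (whitespace, key order, escapes):
assert json.loads('{"a":1,"b":[2]}') == json.loads('{ "b": [2], "\\u0061": 1 }')
```

A practical consequence of the single canonical form is that decoding a value and encoding it again reproduces the original bytes, so two bencoded values can be compared or hashed byte for byte.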

More info