sorsarre | Entries tagged with programming

While writing the previous rant, it occurred to me that many binary formats are not, in fact, context-free.

It is often the case that the structure you are trying to read depends on elements already encountered earlier in the stream. This means that no DSL invented for binary parsing can escape managing state if it is to be a full-blown parser that produces a usable model, object or otherwise. And that state management can be pretty damn complicated.
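To make the point concrete, here is a minimal sketch of a context-dependent read. The record layout is made up for illustration (a 1-byte version followed by a timestamp whose width depends on that version), and `parse_record` is a hypothetical name, not something from a real library.

```python
import struct


def parse_record(buf: bytes) -> dict:
    """Parse a made-up record whose layout depends on an earlier field.

    Assumed layout (hypothetical): 1-byte version, then a big-endian
    timestamp that is 4 bytes wide for version 0 and 8 bytes for version 1.
    """
    version = buf[0]
    if version == 0:
        (timestamp,) = struct.unpack_from(">I", buf, 1)
        size = 5
    elif version == 1:
        (timestamp,) = struct.unpack_from(">Q", buf, 1)
        size = 9
    else:
        raise ValueError(f"unknown version {version}")
    # 'size' tells the caller how many bytes this record consumed.
    return {"version": version, "timestamp": timestamp, "size": size}
```

Even this toy case needs a branch on a value read earlier, which is exactly what a purely declarative "fields and structs" description has no place for.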

One glorious example would be MPEG-TS (MPEG Transport Stream). That format has at least three layers of binary syntax. The lower layers have to be parsed first, then selected bits of them have to be fed into the next layer's parser, and so on, ad nauseam.
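A rough sketch of just the lowest layer shows where the layering comes from. It assumes fixed 188-byte packets starting with the 0x47 sync byte and ignores everything above the packet level (PSI sections, PES); the function names are made up for this example.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def parse_ts_packet(packet: bytes) -> dict:
    """Parse the 4-byte transport packet header and locate the payload."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a transport stream packet")
    payload_unit_start = bool(packet[1] & 0x40)
    pid = ((packet[1] & 0x1F) << 8) | packet[2]  # 13-bit packet identifier
    adaptation_field_control = (packet[3] >> 4) & 0x3
    continuity_counter = packet[3] & 0x0F
    offset = 4
    if adaptation_field_control & 0x2:
        # Adaptation field present: one length byte plus that many bytes.
        offset += 1 + packet[4]
    payload = packet[offset:] if adaptation_field_control & 0x1 else b""
    return {
        "pid": pid,
        "payload_unit_start": payload_unit_start,
        "continuity_counter": continuity_counter,
        "payload": payload,
    }


def payloads_for_pid(stream: bytes, wanted_pid: int):
    """Yield (payload_unit_start, payload) for one PID; this is what
    would be fed into the next layer's parser."""
    for start in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        pkt = parse_ts_packet(stream[start:start + TS_PACKET_SIZE])
        if pkt["pid"] == wanted_pid:
            yield pkt["payload_unit_start"], pkt["payload"]
```

The next layer would have to reassemble these payloads across packets before it can parse anything, and which PIDs are worth following is itself only known after parsing tables carried in other PIDs.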

Another example would be the messier varieties of ISOBMFF like Apple QuickTime or Sony camera output. With those, you literally have to check the flags already encountered elsewhere to parse them correctly. Long story short, any DSL sufficiently complex for creating a complete binary parser that spews out an object model implies inventing a whole programming language, control flow and all.
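The mildest form of this shows up even in plain ISOBMFF, where the flags of a full box decide which optional fields follow. Below is a sketch for the payload of a `tfhd` box, with the flag values as I understand them from ISO/IEC 14496-12 (worth checking against the spec before relying on them). `parse_tfhd_payload` and the table name are made up for this example, and the vendor-specific quirks mentioned above are not covered.

```python
import struct

# (flag bit, field name, struct format), in the order the fields appear.
TFHD_OPTIONAL_FIELDS = [
    (0x000001, "base_data_offset", ">Q"),
    (0x000002, "sample_description_index", ">I"),
    (0x000008, "default_sample_duration", ">I"),
    (0x000010, "default_sample_size", ">I"),
    (0x000020, "default_sample_flags", ">I"),
]


def parse_tfhd_payload(payload: bytes) -> dict:
    """Parse a tfhd payload (everything after the size/type box header)."""
    version = payload[0]
    flags = int.from_bytes(payload[1:4], "big")  # 24-bit flags
    (track_id,) = struct.unpack_from(">I", payload, 4)
    fields = {"version": version, "flags": flags, "track_id": track_id}
    offset = 8
    for bit, name, fmt in TFHD_OPTIONAL_FIELDS:
        if flags & bit:
            # The field exists only if its flag bit was set earlier.
            (fields[name],) = struct.unpack_from(fmt, payload, offset)
            offset += struct.calcsize(fmt)
    return fields
```

Here the flags at least live in the same box. In fragmented files the defaults parsed from this box are then needed to interpret samples in a different box, which is where the state management starts to hurt.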

I have struggled with the issue for quite some time, ever since I first ran into binary formats while working at MainConcept. So far I haven't found a silver bullet. I can only say that the best bet so far is code generation: it saves the effort of writing most of the boilerplate while still leaving reasonable leeway for the crazier cases.

I wonder what the actual matter with binary parsing is.

I mean, it's not that much harder than bona fide text parsing. It does get a bit convoluted when error recovery is involved, yes, but in the end, why not? You google up text parsing and you've got yourself a whole lot of stuff, ranging from small parser combinator libraries to full-blown parser generators with bells and whistles like ANTLR. But try the same with binary parsing and all you get is half-assed solutions that either can't do bit twiddling or are written by someone trying to go all declarative (unfortunately, the JSON kind of declarative, fields and structs) without going into any of the uglier binary format issues.
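For reference, the bit twiddling in question is not much code. Here is a minimal MSB-first bit reader; the `BitReader` class is a sketch written for this post, not taken from any existing library, and it reads one bit at a time for clarity rather than speed.

```python
class BitReader:
    """Read unsigned integers of arbitrary bit width, MSB first."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0  # current position, in bits

    def read(self, nbits: int) -> int:
        if self._pos + nbits > len(self._data) * 8:
            raise EOFError("not enough bits left")
        value = 0
        for _ in range(nbits):
            byte = self._data[self._pos >> 3]
            bit = (byte >> (7 - (self._pos & 7))) & 1
            value = (value << 1) | bit
            self._pos += 1
        return value
```

Usage is along the lines of `r = BitReader(data)` followed by `r.read(13)` for a 13-bit field that straddles byte boundaries.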

To be honest, I have always been rather careless about comments. Programs tended to be short-lived, so they contained more assorted jokes than any clear description of what was going on.

Once again I sat down to read last year's code. Only now am I starting to realize the real value of comments and documentation. There is not a single coherent comment (apart from "WHERE THE F#CK IS MY DOUBLE DISPATCH?!"), some things are flat-out incomprehensible, and in some places there is outright black magic™. The description in the term paper is superficial, because rambling on and going into the details, of which there are legion, would not have been quite comme il faut. Great. I will have to spend another week untangling this stuff, adding comments, and writing at least minimal documentation for the code (hello, doxygen).
The good news is that at least I won't have to do all of this in a hysterical one-month rush, like last year.

Current Mood: busy
Crossposts: http://sorsarre.livejournal.com/23217.html


Coffee overdose

and The Shrieking Trousers of Impending Doom

