| <html> |
| <head> |
| <title> |
| PyASN1 codecs |
| </title> |
| </head> |
| <body> |
| <center> |
| <table width=60%> |
| <tr> |
| <td> |
| <h3> |
| 2. PyASN1 Codecs |
| </h3> |
| |
| <p> |
| In the ASN.1 context, a |
| <a href=http://en.wikipedia.org/wiki/Codec>codec</a> |
| is a program that transforms between concrete data structures and a stream |
| of octets suitable for transmission over the wire. This serialized form of |
| data is sometimes called <i>substrate</i> or <i>essence</i>. |
| </p> |
| |
| <p> |
| In the pyasn1 implementation, substrate takes the form of Python 3 bytes or |
| Python 2 string objects. |
| </p> |
| |
| <p> |
| One of the properties of a codec is its ability to cope with incomplete |
| data and/or substrate, which implies that the codec is stateful. In other |
| words, when a decoder runs out of substrate while the data item being |
| recovered is still incomplete, a stateful codec suspends and completes the |
| data item recovery once the rest of the substrate becomes available. |
| Similarly, a stateful encoder encodes data items in multiple steps, waiting |
| for source data to arrive. Codec restartability is especially important when |
| an application deals with large volumes of data and/or runs with little RAM. |
| For an interesting discussion of codec options and design choices, refer to the |
| <a href=http://directory.apache.org/subprojects/asn1/>Apache ASN.1 project</a>. |
| </p> |
| |
| <p> |
| As of this writing, codecs implemented in pyasn1 are all stateless, mostly |
| to keep the code simple. |
| </p> |
| |
| <p> |
| The pyasn1 package currently supports |
| <a href=http://en.wikipedia.org/wiki/Basic_encoding_rules>BER</a> codec and |
| its variations -- |
| <a href=http://en.wikipedia.org/wiki/Canonical_encoding_rules>CER</a> and |
| <a href=http://en.wikipedia.org/wiki/Distinguished_encoding_rules>DER</a>. |
| More ASN.1 codecs are planned for implementation in the future. |
| </p> |
| |
| <a name="2.1"></a> |
| <h4> |
| 2.1 Encoders |
| </h4> |
| |
| <p> |
| An encoder transforms pyasn1 value objects into substrate. Only pyasn1 value |
| objects can be serialized; attempts to process pyasn1 type objects will |
| cause the encoder to fail. |
| </p> |
| |
| <p> |
| The following code creates a pyasn1 Integer object and serializes it with |
| the BER encoder: |
| </p> |
| |
| <table bgcolor="lightgray" border=0 width=100%><TR><TD> |
| <pre> |
| >>> from pyasn1.type import univ |
| >>> from pyasn1.codec.ber import encoder |
| >>> encoder.encode(univ.Integer(123456)) |
| b'\x02\x03\x01\xe2@' |
| >>> |
| </pre> |
| </td></tr></table> |
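| |
| <p> |
| To see where the octets in the listing above come from, here is a minimal |
| pure-Python sketch of a BER INTEGER encoding: one tag octet (0x02), one |
| length octet and the value in big-endian two's complement. The function name |
| <b>ber_encode_integer</b> is an illustration only and is not part of pyasn1; |
| the sketch covers short-form lengths only. |
| </p> |
| |
```python
def ber_encode_integer(value):
    """Encode a Python int as a BER INTEGER (short-form length only).

    Illustrative helper, not part of the pyasn1 API.
    """
    # Minimal two's-complement width: one spare bit is needed for the sign.
    magnitude = value if value >= 0 else ~value
    size = magnitude.bit_length() // 8 + 1
    if size > 127:
        raise ValueError('long-form length is not covered by this sketch')
    content = value.to_bytes(size, 'big', signed=True)
    # 0x02 is the universal, primitive INTEGER tag.
    return bytes([0x02, size]) + content


print(ber_encode_integer(123456))
```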
| |
| <p> |
| The BER standard also defines a so-called <i>indefinite length</i> encoding |
| form, which makes processing of large data items more memory efficient. It is |
| mostly useful when the encoder does not have the whole value at once and the |
| length of the value cannot be determined at the beginning of encoding. |
| </p> |
| |
| <p> |
| <i>Constructed encoding</i> is another feature of BER closely related to the |
| indefinite length form. In essence, a large scalar value (such as an ASN.1 |
| character string or BitString type) can be chopped into smaller chunks by |
| the encoder and transmitted incrementally to limit memory consumption. Unlike |
| the indefinite length case, the length of the whole value must be known in |
| advance when using the constructed, definite length encoding form. |
| </p> |
| |
| <p> |
| Since pyasn1 codecs are not restartable, the pyasn1 encoder can only encode |
| a data item all at once. However, even in this case, generating indefinite |
| length encoding may help a low-memory receiver, running a restartable decoder, |
| to process a large data item. |
| </p> |
| |
| <table bgcolor="lightgray" border=0 width=100%><TR><TD> |
| <pre> |
| >>> from pyasn1.type import univ |
| >>> from pyasn1.codec.ber import encoder |
| >>> encoder.encode( |
| ... univ.OctetString('The quick brown fox jumps over the lazy dog'), |
| ... defMode=False, |
| ... maxChunkSize=8 |
| ... ) |
| b'$\x80\x04\x08The quic\x04\x08k brown \x04\x08fox jump\x04\x08s over \ |
| t\x04\x08he lazy \x04\x03dog\x00\x00' |
| >>> |
| >>> encoder.encode( |
| ... univ.OctetString('The quick brown fox jumps over the lazy dog'), |
| ... maxChunkSize=8 |
| ... ) |
| b'$7\x04\x08The quic\x04\x08k brown \x04\x08fox jump\x04\x08s over \ |
| t\x04\x08he lazy \x04\x03dog' |
| </pre> |
| </td></tr></table> |
| |
| <p> |
| Setting the <b>defMode</b> encoder parameter to False disables definite |
| length encoding mode in favor of the indefinite length form, while the |
| optional <b>maxChunkSize</b> parameter specifies the desired substrate chunk |
| size, which influences memory requirements at the decoder's end. |
| </p> |
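| |
| <p> |
| The layout of the two substrates above can be reproduced by hand. The |
| following pure-Python sketch splits a byte string into primitive OCTET STRING |
| chunks and wraps them in a constructed OCTET STRING, in either definite or |
| indefinite length form. The helper names <b>encode_length</b> and |
| <b>encode_octets_chunked</b> are hypothetical and only mimic the effect of |
| defMode and maxChunkSize; this is not how pyasn1 is implemented internally. |
| </p> |
| |
```python
def encode_length(length):
    """Return BER definite-length octets (short or long form)."""
    if length < 0x80:
        return bytes([length])
    octets = length.to_bytes((length.bit_length() + 7) // 8, 'big')
    return bytes([0x80 | len(octets)]) + octets


def encode_octets_chunked(data, chunk_size, definite=True):
    """Encode bytes as a constructed OCTET STRING made of primitive chunks.

    Illustrative helper, not part of the pyasn1 API.
    """
    body = b''
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # Each chunk is a primitive OCTET STRING: tag 0x04, length, content.
        body += b'\x04' + encode_length(len(chunk)) + chunk
    # 0x24 is the OCTET STRING tag (0x04) with the "constructed" bit (0x20) set.
    if definite:
        return b'\x24' + encode_length(len(body)) + body
    # Indefinite form: length octet 0x80, terminated by end-of-contents octets.
    return b'\x24\x80' + body + b'\x00\x00'


text = b'The quick brown fox jumps over the lazy dog'
print(encode_octets_chunked(text, 8, definite=False))
print(encode_octets_chunked(text, 8))
```
| |
| <p> |
| Note that the definite form needs the total length of all chunks before the |
| first octet of the body can be emitted, while the indefinite form does not. |
| </p> |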
| |
| <p> |
| To use the CER or DER encoders, one needs to explicitly import and call them; |
| the APIs are all compatible. |
| </p> |
| |
| <table bgcolor="lightgray" border=0 width=100%><TR><TD> |
| <pre> |
| >>> from pyasn1.type import univ |
| >>> from pyasn1.codec.ber import encoder as ber_encoder |
| >>> from pyasn1.codec.cer import encoder as cer_encoder |
| >>> from pyasn1.codec.der import encoder as der_encoder |
| >>> ber_encoder.encode(univ.Boolean(True)) |
| b'\x01\x01\x01' |
| >>> cer_encoder.encode(univ.Boolean(True)) |
| b'\x01\x01\xff' |
| >>> der_encoder.encode(univ.Boolean(True)) |
| b'\x01\x01\xff' |
| >>> |
| </pre> |
| </td></tr></table> |
| |
| <a name="2.2"></a> |
| <h4> |
| 2.2 Decoders |
| </h4> |
| |
| <p> |
| In the process of decoding, pyasn1 value objects are created and linked to |
| each other, based on the information contained in the substrate. Thus, |
| the original pyasn1 value object(s) are recovered. |
| </p> |
| |
| <table bgcolor="lightgray" border=0 width=100%><TR><TD> |
| <pre> |
| >>> from pyasn1.type import univ |
| >>> from pyasn1.codec.ber import encoder, decoder |
| >>> substrate = encoder.encode(univ.Boolean(True)) |
| >>> decoder.decode(substrate) |
| (Boolean('True(1)'), b'') |
| >>> |
| </pre> |
| </td></tr></table> |
| |
| <p> |
| As the code snippet above shows, the pyasn1 decoder accepts substrate |
| as an argument and returns a tuple of a pyasn1 value object (the top-level |
| one in the case of a constructed object) and the unprocessed part |
| of the input substrate. |
| </p> |
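| |
| <p> |
| The "value plus unprocessed remainder" convention makes it easy to walk |
| through several items concatenated in one substrate. Below is a minimal |
| pure-Python sketch of the same idea at the tag-length-value level. The |
| <b>read_tlv</b> function is a hypothetical helper, not a pyasn1 API, and it |
| assumes single-octet tags and definite lengths. |
| </p> |
| |
```python
def read_tlv(substrate):
    """Split one definite-length TLV off the front of substrate.

    Returns (tag_octet, content, rest). Illustrative helper that handles
    single-octet tags only; not part of the pyasn1 API.
    """
    tag = substrate[0]
    first = substrate[1]
    offset = 2
    if first < 0x80:
        # Short form: the octet itself is the length.
        length = first
    else:
        # Long form: low 7 bits give the number of length octets that follow.
        count = first & 0x7F
        if count == 0:
            raise ValueError('indefinite length is not covered by this sketch')
        length = int.from_bytes(substrate[2:2 + count], 'big')
        offset += count
    end = offset + length
    if end > len(substrate):
        raise ValueError('substrate is shorter than the declared length')
    return tag, substrate[offset:end], substrate[end:]


# Two items back to back: BOOLEAN TRUE followed by INTEGER 12.
rest = b'\x01\x01\xff\x02\x01\x0c'
while rest:
    tag, content, rest = read_tlv(rest)
    print(hex(tag), content, rest)
```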
| |
| <p> |
| All pyasn1 decoders handle both definite and indefinite length |
| encoding modes automatically; explicit switching from one mode |
| to another is not required. |
| </p> |
| |
| <table bgcolor="lightgray" border=0 width=100%><TR><TD> |
| <pre> |
| >>> from pyasn1.type import univ |
| >>> from pyasn1.codec.ber import encoder, decoder |
| >>> substrate = encoder.encode( |
| ... univ.OctetString('The quick brown fox jumps over the lazy dog'), |
| ... defMode=False, |
| ... maxChunkSize=8 |
| ... ) |
| >>> decoder.decode(substrate) |
| (OctetString(b'The quick brown fox jumps over the lazy dog'), b'') |
| >>> |
| </pre> |
| </td></tr></table> |
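| |
| <p> |
| A decoder can tell the two modes apart from the length octet alone: 0x80 |
| announces the indefinite form, anything else carries a definite length. The |
| pure-Python sketch below reassembles a primitive or constructed OCTET STRING |
| in either mode. The names <b>read_length</b> and <b>decode_octet_string</b> |
| are assumptions made for illustration and do not reflect pyasn1 internals. |
| </p> |
| |
```python
def read_length(substrate, pos):
    """Return (length, next_pos); length is None for the indefinite form."""
    first = substrate[pos]
    pos += 1
    if first < 0x80:
        return first, pos
    if first == 0x80:
        return None, pos
    count = first & 0x7F
    return int.from_bytes(substrate[pos:pos + count], 'big'), pos + count


def decode_octet_string(substrate):
    """Reassemble an OCTET STRING; return (payload, rest).

    Illustrative helper, not part of the pyasn1 API.
    """
    tag = substrate[0]
    length, pos = read_length(substrate, 1)
    if tag == 0x04:
        # Primitive encoding always uses a definite length.
        if length is None:
            raise ValueError('primitive encoding needs a definite length')
        return substrate[pos:pos + length], substrate[pos + length:]
    if tag != 0x24:
        raise ValueError('not an OCTET STRING')
    payload = b''
    if length is None:
        # Indefinite form: read chunks until the end-of-contents octets.
        rest = substrate[pos:]
        while rest[:2] != b'\x00\x00':
            if not rest:
                raise ValueError('missing end-of-contents octets')
            chunk, rest = decode_octet_string(rest)
            payload += chunk
        return payload, rest[2:]
    # Definite form: read chunks until the declared length is consumed.
    body, tail = substrate[pos:pos + length], substrate[pos + length:]
    while body:
        chunk, body = decode_octet_string(body)
        payload += chunk
    return payload, tail
```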
| |
| <p> |
| With BER/CER/DER encoding, in many situations the substrate may not contain |
| all the information needed for complete and accurate recovery of ASN.1 |
| values. The most obvious cases include implicitly tagged ASN.1 types |
| and constrained types. |
| </p> |
| |
| <p> |
| As discussed earlier in this handbook, when an ASN.1 type is implicitly |
| tagged, the previous outermost tag is lost and never appears in the substrate. |
| If it is the base tag that gets lost, the decoder cannot pick a type-specific |
| value decoder from its table of built-in types, and therefore cannot recover |
| the value part, based only on the information contained in the substrate. The |
| approach taken by the pyasn1 decoder is to use a prototype pyasn1 type object |
| (or a set of them) to <i>guide</i> the decoding process by matching (possibly |
| incomplete) tags recovered from the substrate with those found in the prototype |
| pyasn1 type objects (called pyasn1 specification objects further in this paper). |
| </p> |
| |
| <table bgcolor="lightgray" border=0 width=100%><TR><TD> |
| <pre> |
| >>> from pyasn1.type import univ |
| >>> from pyasn1.codec.ber import decoder |
| >>> decoder.decode(b'\x02\x01\x0c', asn1Spec=univ.Integer()) |
| (Integer(12), b'') |
| >>> |
| </pre> |
| </td></tr></table> |
| |
| <p> |
| The decoder neither modifies the pyasn1 specification object nor uses |
| its current value (if it is a pyasn1 value object); rather, it uses it as |
| a hint for choosing the proper decoder and as a pattern for creating new objects: |
| </p> |
| |
| <table bgcolor="lightgray" border=0 width=100%><TR><TD> |
| <pre> |
| >>> from pyasn1.type import univ, tag |
| >>> from pyasn1.codec.ber import encoder, decoder |
| >>> i = univ.Integer(12345).subtype( |
| ... implicitTag=tag.Tag(tag.tagClassContext, tag.tagFormatSimple, 40) |
| ... ) |
| >>> substrate = encoder.encode(i) |
| >>> substrate |
| b'\x9f(\x0209' |
| >>> decoder.decode(substrate) |
| Traceback (most recent call last): |
| ... |
| pyasn1.error.PyAsn1Error: |
| TagSet(Tag(tagClass=128, tagFormat=0, tagId=40)) not in asn1Spec |
| >>> decoder.decode(substrate, asn1Spec=i) |
| (Integer(12345), b'') |
| >>> |
| </pre> |
| </td></tr></table> |
| |
| <p> |
| Notice in the example above that an attempt to run the decoder without passing |
| a pyasn1 specification object fails because the recovered tag does not belong |
| to any of the built-in types. |
| </p> |
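| |
| <p> |
| The tag reported in the error message can be read directly from the first |
| two octets of the substrate. The following pure-Python sketch decomposes BER |
| identifier octets into class, format and tag number, including the |
| high-tag-number form used for tag numbers above 30. The <b>parse_tag</b> |
| function is a hypothetical helper; its return values are chosen to mirror the |
| numbers printed in the error message, not taken from the pyasn1 source. |
| </p> |
| |
```python
def parse_tag(substrate):
    """Return (tag_class, tag_format, tag_id, rest) for the leading tag.

    Illustrative helper, not part of the pyasn1 API.
    """
    first = substrate[0]
    # Top two bits: 0x00 universal, 0x40 application, 0x80 context, 0xC0 private.
    tag_class = first & 0xC0
    # Bit 0x20: 0x00 for primitive (simple) encoding, 0x20 for constructed.
    tag_format = first & 0x20
    tag_id = first & 0x1F
    pos = 1
    if tag_id == 0x1F:
        # High-tag-number form: base-128 digits follow, with the high bit
        # set on every octet except the last one.
        tag_id = 0
        while True:
            octet = substrate[pos]
            pos += 1
            tag_id = (tag_id << 7) | (octet & 0x7F)
            if not octet & 0x80:
                break
    return tag_class, tag_format, tag_id, substrate[pos:]


print(parse_tag(b'\x9f(\x0209'))
```
| |
| <p> |
| Only the context-specific tag 40 survives in the substrate; nothing in these |
| octets says that the content is an INTEGER, which is why a specification |
| object is needed. |
| </p> |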
| |
| <p> |
| Another important feature of guided decoder operation is the use of |
| value constraints possibly present in the pyasn1 specification object. |
| To explain this, we will decode an arbitrary integer object first into a |
| generic Integer and then into a constrained one. |
| </p> |
| |
| <table bgcolor="lightgray" border=0 width=100%><TR><TD> |
| <pre> |
| >>> from pyasn1.type import univ, constraint |
| >>> from pyasn1.codec.ber import encoder, decoder |
| >>> class DialDigit(univ.Integer): |
| ... subtypeSpec = constraint.ValueRangeConstraint(0,9) |
| >>> substrate = encoder.encode(univ.Integer(13)) |
| >>> decoder.decode(substrate) |
| (Integer(13), b'') |
| >>> decoder.decode(substrate, asn1Spec=DialDigit()) |
| Traceback (most recent call last): |
| ... |
| pyasn1.type.error.ValueConstraintError: |
| ValueRangeConstraint(0, 9) failed at: 13 |
| >>> |
| </pre> |
| </td></tr></table> |
| |
| <p> |
| As with encoders, to use the CER or DER decoders an application has to |
| explicitly import and call them; all APIs are compatible. |
| </p> |
| |
| <table bgcolor="lightgray" border=0 width=100%><TR><TD> |
| <pre> |
| >>> from pyasn1.type import univ |
| >>> from pyasn1.codec.ber import encoder as ber_encoder |
| >>> substrate = ber_encoder.encode(univ.OctetString('http://pyasn1.sf.net')) |
| >>> |
| >>> from pyasn1.codec.ber import decoder as ber_decoder |
| >>> from pyasn1.codec.cer import decoder as cer_decoder |
| >>> from pyasn1.codec.der import decoder as der_decoder |
| >>> |
| >>> ber_decoder.decode(substrate) |
| (OctetString(b'http://pyasn1.sf.net'), b'') |
| >>> cer_decoder.decode(substrate) |
| (OctetString(b'http://pyasn1.sf.net'), b'') |
| >>> der_decoder.decode(substrate) |
| (OctetString(b'http://pyasn1.sf.net'), b'') |
| >>> |
| </pre> |
| </td></tr></table> |
| |
| <a name="2.2.1"></a> |
| <h4> |
| 2.2.1 Decoding untagged types |
| </h4> |
| |
| <p> |
| It has already been mentioned that ASN.1 has two "special case" types: |
| CHOICE and ANY. They differ from other types with respect to tagging: |
| unless these two are additionally tagged, neither of them has |
| its own tag. Therefore these types become invisible in the substrate |
| and cannot be recovered without passing a pyasn1 specification object to |
| the decoder. |
| </p> |
| |
| <p> |
| To explain the issue, we will first prepare a Choice object to deal with: |
| </p> |
| |
| <table bgcolor="lightgray" border=0 width=100%><TR><TD> |
| <pre> |
| >>> from pyasn1.type import univ, namedtype |
| >>> class CodeOrMessage(univ.Choice): |
| ... componentType = namedtype.NamedTypes( |
| ... namedtype.NamedType('code', univ.Integer()), |
| ... namedtype.NamedType('message', univ.OctetString()) |
| ... ) |
| >>> |
| >>> codeOrMessage = CodeOrMessage() |
| >>> codeOrMessage.setComponentByName('message', 'my string value') |
| >>> print(codeOrMessage.prettyPrint()) |
| CodeOrMessage: |
| message=b'my string value' |
| >>> |
| </pre> |
| </td></tr></table> |
| |
| <p> |
| Let's now encode this Choice object and then decode its substrate |
| with and without pyasn1 specification object: |
| </p> |
| |
| <table bgcolor="lightgray" border=0 width=100%><TR><TD> |
| <pre> |
| >>> from pyasn1.codec.ber import encoder, decoder |
| >>> substrate = encoder.encode(codeOrMessage) |
| >>> substrate |
| b'\x04\x0fmy string value' |
| >>> encoder.encode(univ.OctetString('my string value')) |
| b'\x04\x0fmy string value' |
| >>> |
| >>> decoder.decode(substrate) |
| (OctetString(b'my string value'), b'') |
| >>> codeOrMessage, substrate = decoder.decode(substrate, asn1Spec=CodeOrMessage()) |
| >>> print(codeOrMessage.prettyPrint()) |
| CodeOrMessage: |
| message=b'my string value' |
| >>> |
| </pre> |
| </td></tr></table> |
| |
| <p> |
| The first thing to notice in the listing above is that the substrate produced |
| for our Choice value object is identical to the substrate for an OctetString |
| object initialized to the same value. In other words, the encoding carries no |
| information about the Choice type itself. |
| </p> |
| |
| <p> |
| Sure enough, that kind of substrate will decode into an OctetString object, |
| unless the original Choice type object is passed to the decoder to guide the |
| decoding process. |
| </p> |
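| |
| <p> |
| Conceptually, guiding the decoder with a Choice specification amounts to |
| looking up the tag found in the substrate among the tags of the alternatives. |
| The pure-Python sketch below shows that lookup for the CodeOrMessage type. |
| The <b>CODE_OR_MESSAGE</b> table and the <b>match_choice</b> function are |
| hypothetical stand-ins for a specification object; they assume single-octet |
| tags and short-form lengths and are not how pyasn1 represents a Choice. |
| </p> |
| |
```python
# Hypothetical stand-in for a Choice specification:
# leading tag octet -> name of the matching alternative.
CODE_OR_MESSAGE = {
    0x02: 'code',     # universal INTEGER
    0x04: 'message',  # universal OCTET STRING
}


def match_choice(substrate, alternatives):
    """Return (component_name, content) for the alternative matching the tag.

    Illustrative helper, not part of the pyasn1 API.
    """
    tag, length = substrate[0], substrate[1]
    if tag not in alternatives:
        raise ValueError('tag 0x%02x matches no alternative' % tag)
    return alternatives[tag], substrate[2:2 + length]


print(match_choice(b'\x04\x0fmy string value', CODE_OR_MESSAGE))
```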
| |
| <p> |
| Similarly, the untagged ANY type behaves differently in the decoding phase: |
| when the decoder bumps into an Any object in the pyasn1 specification, it |
| stops decoding and puts all the substrate into a new Any value object in the |
| form of an octet string. A concerned application can then re-run the decoder |
| with an additional, more exact pyasn1 specification object to recover the |
| contents of the Any object. |
| </p> |
| |
| <p> |
| As mentioned elsewhere in this paper, the Any type allows an incomplete |
| or changing ASN.1 specification to be handled gracefully by the decoder and |
| applications. |
| </p> |
| |
| <p> |
| To illustrate how the Any type works, we will first set the stage |
| by encoding a pyasn1 object and then putting its substrate into an Any |
| object. |
| </p> |
| |
| <table bgcolor="lightgray" border=0 width=100%><TR><TD> |
| <pre> |
| >>> from pyasn1.type import univ |
| >>> from pyasn1.codec.ber import encoder, decoder |
| >>> innerSubstrate = encoder.encode(univ.Integer(1234)) |
| >>> innerSubstrate |
| b'\x02\x02\x04\xd2' |
| >>> any = univ.Any(innerSubstrate) |
| >>> any |
| Any(b'\x02\x02\x04\xd2') |
| >>> substrate = encoder.encode(any) |
| >>> substrate |
| b'\x02\x02\x04\xd2' |
| >>> |
| </pre> |
| </td></tr></table> |
| |
| <p> |
| As with Choice type encoding, there are no traces of the Any type in the |
| substrate. Obviously, the substrate we are dealing with will decode into the |
| inner (Integer) component, unless a pyasn1 specification is given to guide the |
| decoder. Continuing the previous code: |
| </p> |
| |
| <table bgcolor="lightgray" border=0 width=100%><TR><TD> |
| <pre> |
| >>> from pyasn1.type import univ |
| >>> from pyasn1.codec.ber import encoder, decoder |
| |
| >>> decoder.decode(substrate) |
| (Integer(1234), b'') |
| >>> any, substrate = decoder.decode(substrate, asn1Spec=univ.Any()) |
| >>> any |
| Any(b'\x02\x02\x04\xd2') |
| >>> decoder.decode(any.asOctets()) |
| (Integer(1234), b'') |
| >>> |
| </pre> |
| </td></tr></table> |
| |
| <p> |
| Both CHOICE and ANY types are widely used in practice. The reader is welcome |
| to take a look at the |
| <a href=http://www.cs.auckland.ac.nz/~pgut001/pubs/x509guide.txt> |
| ASN.1 specifications of X.509 applications</a> for more information. |
| </p> |
| |
| <a name="2.2.2"></a> |
| <h4> |
| 2.2.2 Ignoring unknown types |
| </h4> |
| |
| <p> |
| When dealing with a loosely specified ASN.1 structure, the receiving |
| end may not be aware of some types present in the substrate. It may then be |
| convenient to switch the decoder into a recovery mode. In that mode, the |
| decoder will not bail out when it hits an unknown tag but will rather treat |
| it as an Any type. |
| </p> |
| |
| <table bgcolor="lightgray" border=0 width=100%><TR><TD> |
| <pre> |
| >>> from pyasn1.type import univ, tag |
| >>> from pyasn1.codec.ber import encoder, decoder |
| >>> taggedInt = univ.Integer(12345).subtype( |
| ... implicitTag=tag.Tag(tag.tagClassContext, tag.tagFormatSimple, 40) |
| ... ) |
| >>> substrate = encoder.encode(taggedInt) |
| >>> decoder.decode(substrate) |
| Traceback (most recent call last): |
| ... |
| pyasn1.error.PyAsn1Error: TagSet(Tag(tagClass=128, tagFormat=0, tagId=40)) not in asn1Spec |
| >>> |
| >>> decoder.decode.defaultErrorState = decoder.stDumpRawValue |
| >>> decoder.decode(substrate) |
| (Any(b'\x9f(\x0209'), b'') |
| >>> |
| </pre> |
| </td></tr></table> |
| |
| <p> |
| It is also possible to configure a custom decoder to handle unknown tags |
| found in the substrate. This can be done by means of the |
| <b>defaultRawDecoder</b> attribute, which holds a reference to a type decoder |
| object. Refer to the source for API details. |
| </p> |
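| |
| <p> |
| The general idea behind such a recovery mode can be sketched in pure Python |
| as a dispatch table of known tags with a raw fallback for everything else. |
| All names below (<b>KNOWN_DECODERS</b>, <b>decode_item</b>, |
| <b>raw_fallback</b>) are hypothetical and are not pyasn1 APIs; the sketch |
| assumes single-octet tags and short-form lengths, so it uses a context tag 5 |
| rather than the two-octet tag 40 from the listing above. |
| </p> |
| |
```python
def decode_integer(content):
    """Decode BER INTEGER content octets (big-endian two's complement)."""
    return int.from_bytes(content, 'big', signed=True)


def decode_boolean(content):
    """Decode BER BOOLEAN content: any non-zero octet means TRUE."""
    return content != b'\x00'


# Hypothetical dispatch table keyed by single-octet tag.
KNOWN_DECODERS = {0x01: decode_boolean, 0x02: decode_integer}


def decode_item(substrate, raw_fallback=False):
    """Decode one TLV; return (value, rest).

    With raw_fallback=True an unknown tag yields the untouched TLV octets
    instead of an error. Illustrative helper, not part of the pyasn1 API.
    """
    tag, length = substrate[0], substrate[1]
    end = 2 + length
    content, rest = substrate[2:end], substrate[end:]
    handler = KNOWN_DECODERS.get(tag)
    if handler is not None:
        return handler(content), rest
    if raw_fallback:
        # Keep tag and length octets so the item can be re-decoded later.
        return substrate[:end], rest
    raise ValueError('tag 0x%02x is not in the decoder table' % tag)


print(decode_item(b'\x02\x0209'))
print(decode_item(b'\x85\x0209', raw_fallback=True))
```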
| |
| <hr> |
| |
| </td> |
| </tr> |
| </table> |
| </center> |
| </body> |
| </html> |