`struct` - Interpret bytes as packed binary data

Source code: Lib/struct.py

This module converts between Python values and C structs represented as Python bytes objects. Compact format strings describe the intended conversions to/from Python values. The module’s functions and objects can be used for two largely distinct applications, data exchange with external sources (files or network connections), or data transfer between the Python application and the C layer.

Note

When no prefix character is given, native mode is the default. It packs or unpacks data based on the platform and compiler on which the Python interpreter was built. The result of packing a given C struct includes pad bytes which maintain proper alignment for the C types involved; similarly, alignment is taken into account when unpacking. In contrast, when communicating data between external sources, the programmer is responsible for defining byte ordering and padding between elements. See Byte Order, Size, and Alignment for details.

Several struct functions (and methods of Struct) take a buffer argument. This refers to objects that implement the Buffer Protocol and provide either a readable or read-writable buffer. The most common types used for that purpose are bytes and bytearray, but many other types that can be viewed as an array of bytes implement the buffer protocol, so that they can be read/filled without additional copying from a bytes object.
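As a rough illustration of the buffer argument, the sketch below passes a bytearray, a memoryview slice, and an array.array to struct functions. The variable names are illustrative only, and the standard little-endian prefix ('<') is an assumption made here so that the results do not depend on the host machine.

```python
import array
import struct

# A writable buffer: pack_into() fills it in place.
raw = bytearray(4)
struct.pack_into('<HH', raw, 0, 1, 2)

# A memoryview slice exposes part of the buffer without copying it.
view = memoryview(raw)
second = struct.unpack('<H', view[2:4])

# array.array also implements the buffer protocol.
arr = array.array('B', [0] * 4)
struct.pack_into('<I', arr, 0, 0x01020304)

print(raw)           # bytearray(b'\x01\x00\x02\x00')
print(second)        # (2,)
print(arr.tolist())  # [4, 3, 2, 1]
```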

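Functions and Exceptions

The module defines the following exception and functions:

exception struct.error: Exception raised on various occasions; argument is a string describing what is wrong.

struct.pack(format, v1, v2, ...): Return a bytes object containing the values v1, v2, … packed according to the format string format. The arguments must match the values required by the format exactly.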
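A minimal sketch of pack(), assuming a little-endian standard layout chosen only for illustration. It also shows that supplying fewer values than the format requires raises struct.error.

```python
import struct

# '<' = little-endian, standard sizes: H is 2 bytes, I is 4 bytes.
packed = struct.pack('<HI', 1, 2)
print(packed)  # b'\x01\x00\x02\x00\x00\x00'

# The format asks for two values; supplying one raises struct.error.
try:
    struct.pack('<HI', 1)
except struct.error as exc:
    mismatch = exc
else:
    mismatch = None
print(type(mismatch).__name__)  # error
```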

struct.pack_into(format, buffer, offset, v1, v2, ...): Pack the values v1, v2, … according to the format string format and write the packed bytes into the writable buffer buffer starting at position offset. Note that offset is a required argument. A negative offset counts from the end of buffer.
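The following sketch writes two values into a preallocated bytearray, once with a positive offset and once with a negative one. The buffer size and the values are arbitrary choices for the example.

```python
import struct

buf = bytearray(8)                        # eight zero bytes
struct.pack_into('<H', buf, 2, 0xABCD)    # writes bytes 2 and 3
struct.pack_into('<H', buf, -2, 0x1234)   # writes the last two bytes
print(buf.hex(' '))  # 00 00 cd ab 00 00 34 12
```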

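struct.unpack(format, buffer): Unpack from the buffer buffer (presumably packed by pack(format, ...)) according to the format string format. The result is a tuple even if it contains exactly one item. The buffer's size in bytes must match the size required by the format, as reflected by calcsize().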

struct.unpack_from(format, /, buffer, offset=0): Unpack from buffer starting at position offset, according to the format string format. The result is a tuple even if it contains exactly one item. The buffer’s size in bytes, starting at position offset, must be at least the size required by the format, as reflected by calcsize(). A negative offset counts from the end of buffer.
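To contrast unpack() with unpack_from(), the sketch below reads a small record embedded in a larger buffer. The surrounding bytes are made up for the example.

```python
import struct

# Two bytes of unrelated data, a 3-byte record, then trailing bytes.
data = b'\xff\xff' + struct.pack('>HB', 513, 7) + b'tail'

fields = struct.unpack_from('>HB', data, 2)   # skip the first two bytes
single = struct.unpack_from('>H', data, 2)    # still a tuple
print(fields)  # (513, 7)
print(single)  # (513,)

# unpack() insists on an exact size match, so the same buffer is rejected.
try:
    struct.unpack('>HB', data)
    exact_ok = True
except struct.error:
    exact_ok = False
print(exact_ok)  # False
```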

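struct.iter_unpack(format, buffer)

Iteratively unpack from the buffer buffer according to the format string format. This function returns an iterator which will read equally sized chunks from the buffer until all its contents have been consumed. The buffer's size in bytes must be a multiple of the size required by the format, as reflected by calcsize().

Each iteration yields a tuple as specified by the format string.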

Added in version 3.4.
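A short sketch of iter_unpack() over a buffer of fixed-size records. The record layout (an unsigned short followed by an unsigned char) is an assumption for the example.

```python
import struct

# Three fixed-size records of (unsigned short, unsigned char), 3 bytes each.
records = b''.join(struct.pack('<HB', i, i * 10) for i in (1, 2, 3))

rows = list(struct.iter_unpack('<HB', records))
print(rows)  # [(1, 10), (2, 20), (3, 30)]

# A buffer whose length is not a multiple of the record size is rejected.
try:
    list(struct.iter_unpack('<HB', records + b'\x00'))
    partial_ok = True
except struct.error:
    partial_ok = False
print(partial_ok)  # False
```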

struct.calcsize(format): Return the size of the struct (and hence of the bytes object produced by pack(format, ...)) corresponding to the format string format.
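One possible use of calcsize() is sizing a buffer before filling it with pack_into(). The sketch below assumes a hypothetical two-record buffer and a standard little-endian layout.

```python
import struct

fmt = '<10sHHb'
size = struct.calcsize(fmt)   # 10 + 2 + 2 + 1 with standard sizes
print(size)  # 15

# Pre-size a buffer for two records and fill both slots.
buf = bytearray(2 * size)
struct.pack_into(fmt, buf, 0, b'first', 1, 2, 3)
struct.pack_into(fmt, buf, size, b'second', 4, 5, 6)
print(len(struct.pack(fmt, b'x', 0, 0, 0)) == size)  # True
```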

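Format Strings

Format strings describe the data layout when packing and unpacking data. They are built up from format characters, which specify the type of data being packed/unpacked. In addition, special characters control the byte order, size and alignment. Each format string consists of an optional prefix character which describes the overall properties of the data and one or more format characters which describe the actual data values and padding.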

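Byte Order, Size, and Alignment

By default, C types are represented in the machine's native format and byte order, and properly aligned by skipping pad bytes if necessary (according to the rules used by the C compiler). This behavior is chosen so that the bytes of a packed struct correspond exactly to the memory layout of the corresponding C struct. Whether to use native byte ordering and padding or standard formats depends on the application.

Alternatively, the first character of the format string can be used to indicate the byte order, size and alignment of the packed data, according to the following table:

Character	Byte order	Size	Alignment
`@`	native	native	native
`=`	native	standard	none
`<`	little-endian	standard	none
`>`	big-endian	standard	none
`!`	network (= big-endian)	standard	none

If the first character is not one of these, '@' is assumed.

Note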

The number 1023 (0x3ff in hexadecimal) has the following byte representations:

03 ff in big-endian (>)
ff 03 in little-endian (<)

Python example:

>>> import struct
>>> struct.pack('>h', 1023)
b'\x03\xff'
>>> struct.pack('<h', 1023)
b'\xff\x03'

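The native byte order is big-endian or little-endian, depending on the host system. For example, Intel x86, AMD64 (x86-64), and Apple M1 are little-endian; IBM z and many legacy architectures are big-endian. Use sys.byteorder to check the endianness of your system.

Native size and alignment are determined using the C compiler's sizeof expression. This is always combined with native byte order.

Standard size depends only on the format character; see the table in the Format Characters section.

Note the difference between '@' and '=': both use native byte order, but the size and alignment of the latter is standardized.

The form '!' represents the network byte order which is always big-endian as defined in IETF RFC 1700.

There is no way to indicate non-native byte order (force byte-swapping); use the appropriate choice of '<' or '>'.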
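The prefix characters can be compared directly. In this sketch the value 1023 is reused from the note above; the '@' result is printed without an expected value because it depends on the platform and compiler.

```python
import struct
import sys

value = 1023  # 0x03ff

little = struct.pack('<H', value)
big = struct.pack('>H', value)
network = struct.pack('!H', value)   # '!' is always big-endian
native = struct.pack('=H', value)    # native order, standard size

print(network == big)  # True
# '=' follows the host byte order reported by sys.byteorder.
expected_native = little if sys.byteorder == 'little' else big
print(native == expected_native)  # True

# '=' uses standard sizes and no alignment; '@' may insert padding and
# use platform sizes, so its result is platform-dependent.
print(struct.calcsize('=ci'))  # 5
print(struct.calcsize('@ci'))  # commonly 8, but depends on the platform
```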

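Notes:

Padding is only automatically added between successive structure members. No padding is added at the beginning or the end of the encoded struct.
No padding is added when using non-native size and alignment, e.g. with '<', '>', '=', and '!'.
To align the end of a structure to the alignment requirement of a particular type, end the format with the code for that type with a repeat count of zero. See Examples.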
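The padding rules in these notes can be checked with calcsize(). The sketch below uses standard little-endian formats for the deterministic results and leaves the native-mode sizes unannotated, since they vary by platform.

```python
import struct

# Standard modes never pad: 1-byte char followed directly by a 4-byte int.
print(struct.calcsize('<ci'))  # 5

# Explicit pad bytes reproduce a 4-byte-aligned layout by hand.
padded = struct.pack('<c3xi', b'#', 0x12131415)
print(struct.calcsize('<c3xi'))  # 8
print(padded)  # b'#\x00\x00\x00\x15\x14\x13\x12'

# A zero-repeat code adds no padding in standard mode.
print(struct.calcsize('<ci0i'))  # 5

# In native mode the zero-repeat code aligns the end of the struct;
# the results depend on the platform.
print(struct.calcsize('@ic'), struct.calcsize('@ic0i'))
```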

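Format Characters

Format characters have the following meaning; the conversion between C and Python values should be obvious given their types. The 'Standard size' column refers to the size of the packed value in bytes when using standard size; that is, when the format string starts with one of '<', '>', '!' or '='. When using native size, the size of the packed value is platform-dependent.

Format	C Type	Python type	Standard size	Notes
`x`	pad byte	no value		(7)
`c`	char	bytes of length 1	1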
`b`	signed char	int	1	(2)
`B`	unsigned char	int	1	(2)
`?`	_Bool	bool	1	(1)
`h`	short	int	2	(2)
`H`	unsigned short	int	2	(2)
`i`	int	int	4	(2)
`I`	unsigned int	int	4	(2)
`l`	long	int	4	(2)
`L`	unsigned long	int	4	(2)
`q`	long long	int	8	(2)
`Q`	unsigned long long	int	8	(2)
`n`	`ssize_t`	int		(2), (3)
`N`	`size_t`	int		(2), (3)
`e`	_Float16	float	2	(4), (6)
`f`	float	float	4	(4)
`d`	double	float	8	(4)
`F`	float complex	complex	8	(10)
`D`	double complex	complex	16	(10)
`s`	char[]	bytes		(9)
`p`	char[]	bytes		(8)
`P`	void*	int		(2), (5)

Changed in version 3.3: Added support for the 'n' and 'N' formats.

Changed in version 3.6: Added support for the 'e' format.

Changed in version 3.14: Added support for the 'F' and 'D' formats.

The array and ctypes modules, as well as third-party modules like numpy, use similar – but slightly different – type codes.

Notes:

(1) The '?' conversion code corresponds to the _Bool type defined by C standards since C99. In standard mode, it is represented by one byte.
(2) When attempting to pack a non-integer using any of the integer conversion codes, if the non-integer has a __index__() method then that method is called to convert the argument to an integer before packing.

Changed in version 3.2: Added use of the __index__() method for non-integers.
(3) The 'n' and 'N' conversion codes are only available for the native size (selected as the default or with the '@' byte order character). For the standard size, you can use whichever of the other integer formats fits your application.
(4) For the 'f', 'd' and 'e' conversion codes, the packed representation uses the IEEE 754 binary32, binary64 or binary16 format (for 'f', 'd' or 'e' respectively), regardless of the floating-point format used by the platform.
(5) The 'P' format character is only available for the native byte ordering (selected as the default or with the '@' byte order character). The byte order character '=' chooses to use little- or big-endian ordering based on the host system. The struct module does not interpret this as native ordering, so the 'P' format is not available.
(6) The IEEE 754 binary16 “half precision” type was introduced in the 2008 revision of the IEEE 754 standard. It has a sign bit, a 5-bit exponent and 11-bit precision (with 10 bits explicitly stored), and can represent numbers between approximately 6.1e-05 and 6.5e+04 at full precision. This type is not widely supported by C compilers: it’s available as _Float16 type, if the compiler supports the Annex H of the C23 standard. On a typical machine, an unsigned short can be used for storage, but not for math operations. See the Wikipedia page on the half-precision floating-point format for more information.
(7) When packing, 'x' inserts one NUL byte.
(8) The 'p' format character encodes a “Pascal string”, meaning a short variable-length string stored in a fixed number of bytes, given by the count. The first byte stored is the length of the string, or 255, whichever is smaller. The bytes of the string follow. If the byte string passed in to pack() is too long (longer than the count minus 1), only the leading count-1 bytes of the string are stored. If the byte string is shorter than count-1, it is padded with null bytes so that exactly count bytes in all are used. Note that for unpack(), the 'p' format character consumes count bytes, but that the bytes object returned can never contain more than 255 bytes. When packing, arguments of types bytes and bytearray are accepted.
(9) For the 's' format character, the count is interpreted as the length of the byte string, not a repeat count like for the other format characters; for example, '10s' means a single 10-byte string mapping to or from a single Python byte string, while '10c' means 10 separate one byte character elements (e.g., cccccccccc) mapping to or from ten different Python byte objects. (See Examples for a concrete demonstration of the difference.) If a count is not given, it defaults to 1. For packing, the byte string is truncated or padded with null bytes as appropriate to make it fit. For unpacking, the resulting bytes object always has exactly the specified number of bytes. As a special case, '0s' means a single, empty byte string (while '0c' means 0 characters). When packing, arguments of types bytes and bytearray are accepted.
(10) For the 'F' and 'D' format characters, the packed representation uses the IEEE 754 binary32 and binary64 format for components of the complex number, regardless of the floating-point format used by the platform. Note that complex types (F and D) are available unconditionally, despite complex types being an optional feature in C. As specified in the C11 standard, each complex type is represented by a two-element C array containing, respectively, the real and imaginary parts.
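Three of the notes above ('s', 'p', and 'e') are easier to follow with concrete bytes. The sketch below is illustrative only; the field widths and values are arbitrary, and the little-endian prefix is assumed so that the 'e' bytes are reproducible.

```python
import struct

# 's': the count is the field length; values are truncated or NUL-padded.
print(struct.pack('<5s', b'hi'))       # b'hi\x00\x00\x00'
print(struct.pack('<2s', b'hello'))    # b'he'

# 'p': Pascal string; the first byte stores the length of the data.
short = struct.pack('<5p', b'hi')
long_ = struct.pack('<5p', b'toolong')  # only count-1 = 4 bytes are kept
print(short)   # b'\x02hi\x00\x00'
print(long_)   # b'\x04tool'
print(struct.unpack('<5p', short))  # (b'hi',)

# 'e': IEEE 754 binary16 keeps about three decimal digits.
half = struct.pack('<e', 1.0)
print(half)  # b'\x00<'  (0x3c00, little-endian)
roundtrip = struct.unpack('<e', struct.pack('<e', 0.1))[0]
print(roundtrip)  # close to, but not exactly, 0.1
```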

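A format character may be preceded by an integral repeat count. For example, the format string '4h' means exactly the same as 'hhhh'.

Whitespace characters between formats are ignored; a count and its format must not contain whitespace though.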
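Both rules, repeat counts and whitespace handling, can be verified with a few calls. This is a minimal sketch with arbitrary values.

```python
import struct

# A repeat count is shorthand for repeating the format character.
a = struct.pack('<4h', 1, 2, 3, 4)
b = struct.pack('<hhhh', 1, 2, 3, 4)
print(a == b)  # True

# Whitespace between format codes is ignored ...
c = struct.pack('<2h 2h', 1, 2, 3, 4)
print(c == a)  # True

# ... but not between a count and its format character.
try:
    struct.calcsize('<4 h')
    spaced_ok = True
except struct.error:
    spaced_ok = False
print(spaced_ok)  # False
```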

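When packing a value x using one of the integer formats ('b', 'B', 'h', 'H', 'i', 'I', 'l', 'L', 'q', 'Q'), if x is outside the valid range for that format then struct.error is raised.

Changed in version 3.1: Previously, some of the integer formats wrapped out-of-range values and raised DeprecationWarning instead of struct.error.

For the '?' format character, the return value is either True or False. When packing, the truth value of the argument object is used. Either 0 or 1 in the native or standard bool representation will be packed, and any non-zero value will be True when unpacking.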
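The range check and the '?' behavior can be sketched as follows; the specific values are chosen only to trigger each case.

```python
import struct

# 'B' holds 0..255, so 256 is out of range and raises struct.error.
try:
    struct.pack('<B', 256)
    in_range = True
except struct.error:
    in_range = False
print(in_range)  # False

# '?' packs the truth value of any object as a single 0 or 1 byte.
flags = struct.pack('<???', [], 'text', 0.0)
print(flags)  # b'\x00\x01\x00'

# Any non-zero byte unpacks as True.
decoded = struct.unpack('<??', b'\x05\x00')
print(decoded)  # (True, False)
```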

Examples

Note

Native byte order examples (designated by the '@' format prefix or lack of any prefix character) may not match what the reader’s machine produces as that depends on the platform and compiler.

Pack and unpack integers of three different sizes, using big endian ordering:

>>> from struct import *
>>> pack(">bhl", 1, 2, 3)
b'\x01\x00\x02\x00\x00\x00\x03'
>>> unpack('>bhl', b'\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)
>>> calcsize('>bhl')
7

Attempt to pack an integer which is too large for the defined field:

>>> pack(">h", 99999)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
struct.error: 'h' format requires -32768 <= number <= 32767

Demonstrate the difference between 's' and 'c' format characters:

>>> pack("@ccc", b'1', b'2', b'3')
b'123'
>>> pack("@3s", b'123')
b'123'

Unpacked fields can be named by assigning them to variables or by wrapping the result in a named tuple:

>>> record = b'raymond   \x32\x12\x08\x01\x08'
>>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)

>>> from collections import namedtuple
>>> Student = namedtuple('Student', 'name serialnum school gradelevel')
>>> Student._make(unpack('<10sHHb', record))
Student(name=b'raymond   ', serialnum=4658, school=264, gradelevel=8)

The ordering of format characters may have an impact on size in native mode since padding is implicit. In standard mode, the user is responsible for inserting any desired padding. Note in the first pack call below that three NUL bytes were added after the packed '#' to align the following integer on a four-byte boundary. In this example, the output was produced on a little endian machine:

>>> pack('@ci', b'#', 0x12131415)
b'#\x00\x00\x00\x15\x14\x13\x12'
>>> pack('@ic', 0x12131415, b'#')
b'\x15\x14\x13\x12#'
>>> calcsize('@ci')
8
>>> calcsize('@ic')
5

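The following format 'llh0l' adds two pad bytes at the end, assuming the platform's longs are aligned on 4-byte boundaries (the output shown was produced on a big-endian machine with 4-byte longs):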

>>> pack('@llh0l', 1, 2, 3)
b'\x00\x00\x00\x01\x00\x00\x00\x02\x00\x03\x00\x00'

See also

Module array: Packed binary storage of homogeneous data.
Module json: JSON encoder and decoder.
Module pickle: Python object serialization.

Applications

Two main applications for the struct module exist, data interchange between Python and C code within an application or another application compiled using the same compiler (native formats), and data interchange between applications using agreed upon data layout (standard formats). Generally speaking, the format strings constructed for these two domains are distinct.

Native Formats

When constructing format strings which mimic native layouts, the compiler and machine architecture determine byte ordering and padding. In such cases, the @ format character should be used to specify native byte ordering and data sizes. Internal pad bytes are normally inserted automatically. It is possible that a zero-repeat format code will be needed at the end of a format string to round up to the correct byte boundary for proper alignment of consecutive chunks of data.

Consider these two simple examples (on a 64-bit, little-endian machine):

>>> calcsize('@lhl')
24
>>> calcsize('@llh')
18

Data is not padded to an 8-byte boundary at the end of the second format string without the use of extra padding. A zero-repeat format code solves that problem:

>>> calcsize('@llh0l')
24

The 'x' format code can be used to specify the repeat, but for native formats it is better to use a zero-repeat format like '0l'.

By default, native byte ordering and alignment is used, but it is better to be explicit and use the '@' prefix character.

Standard Formats

When exchanging data beyond your process such as networking or storage, be precise. Specify the exact byte order, size, and alignment. Do not assume they match the native order of a particular machine. For example, network byte order is big-endian, while many popular CPUs are little-endian. By defining this explicitly, the user need not care about the specifics of the platform their code is running on. The first character should typically be < or > (or !). Padding is the responsibility of the programmer. The zero-repeat format character won’t work. Instead, the user must explicitly add 'x' pad bytes where needed. Revisiting the examples from the previous section, we have:

>>> calcsize('<qh6xq')
24
>>> pack('<qh6xq', 1, 2, 3) == pack('@lhl', 1, 2, 3)
True
>>> calcsize('@llh')
18
>>> pack('@llh', 1, 2, 3) == pack('<qqh', 1, 2, 3)
True
>>> calcsize('<qqh6x')
24
>>> calcsize('@llh0l')
24
>>> pack('@llh0l', 1, 2, 3) == pack('<qqh6x', 1, 2, 3)
True

The above results (executed on a 64-bit machine) aren’t guaranteed to match when executed on different machines. For example, the examples below were executed on a 32-bit machine:

>>> calcsize('<qqh6x')
24
>>> calcsize('@llh0l')
12
>>> pack('@llh0l', 1, 2, 3) == pack('<qqh6x', 1, 2, 3)
False
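As a sketch of a standard format used for data exchange, the example below frames a payload with a fixed-size header in network byte order. The header layout, the magic bytes b'MG', and the helper names encode_message and decode_message are all hypothetical and not part of the struct module.

```python
import struct

# Hypothetical header: magic (2 bytes), version (1 byte), one pad byte,
# payload length (4 bytes), all in network (big-endian) byte order.
HEADER_FORMAT = '!2sBxI'
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 8 on every platform

def encode_message(payload: bytes, version: int = 1) -> bytes:
    """Prefix payload with the fixed-size header."""
    return struct.pack(HEADER_FORMAT, b'MG', version, len(payload)) + payload

def decode_message(data: bytes) -> tuple[int, bytes]:
    """Return (version, payload); raise ValueError on a malformed message."""
    if len(data) < HEADER_SIZE:
        raise ValueError('message shorter than header')
    magic, version, length = struct.unpack_from(HEADER_FORMAT, data)
    if magic != b'MG':
        raise ValueError('unexpected magic bytes')
    payload = data[HEADER_SIZE:HEADER_SIZE + length]
    if len(payload) != length:
        raise ValueError('truncated payload')
    return version, payload

message = encode_message(b'hello')
print(message)                  # b'MG\x01\x00\x00\x00\x00\x05hello'
print(decode_message(message))  # (1, b'hello')
```

Because the prefix is '!', the header is 8 bytes with the same layout on 32-bit and 64-bit machines, and the explicit 'x' supplies the padding that native mode would otherwise insert implicitly.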

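Classes

The struct module also defines the following type:

class struct.Struct(format)

Return a new Struct object which writes and reads binary data according to the format string format. Creating a Struct object once and calling its methods is more efficient than calling module-level functions with the same format since the format string is only compiled once in that case.

Note

The compiled versions of the most recent format strings passed to the module-level functions are cached, so programs that use only a few format strings needn't worry about reusing a single Struct instance.

Compiled Struct objects support the following methods and attributes:

pack(v1, v2, ...): Identical to the pack() function, using the compiled format. (len(result) will equal size.)

pack_into(buffer, offset, v1, v2, ...): Identical to the pack_into() function, using the compiled format.

unpack(buffer): Identical to the unpack() function, using the compiled format. The buffer's size in bytes must equal size.

unpack_from(buffer, offset=0): Identical to the unpack_from() function, using the compiled format. The buffer's size in bytes, starting at position offset, must be at least size.

iter_unpack(buffer): Identical to the iter_unpack() function, using the compiled format. The buffer's size in bytes must be a multiple of size.

Added in version 3.4.

format: The format string used to construct this Struct object.

Changed in version 3.7: The format string type is now str instead of bytes.

size: The calculated size of the struct (and hence of the bytes object produced by the pack() method) corresponding to format.

Changed in version 3.13: The repr() of structs has changed. It is now: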

>>> Struct('i')
Struct('i')
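A minimal sketch of reusing a compiled Struct object. The record layout (two signed shorts and an unsigned byte) and the name point are assumptions for the example.

```python
import struct

# Compile the format once and reuse it for every record.
point = struct.Struct('<hhB')   # x, y, colour index (hypothetical layout)
print(point.format, point.size)  # <hhB 5

packed = point.pack(-1, 2, 255)
print(len(packed) == point.size)  # True
print(point.unpack(packed))       # (-1, 2, 255)

# pack_into / unpack_from / iter_unpack mirror the module-level functions.
buf = bytearray(point.size * 2)
point.pack_into(buf, 0, 1, 2, 3)
point.pack_into(buf, point.size, 4, 5, 6)
print(list(point.iter_unpack(buf)))        # [(1, 2, 3), (4, 5, 6)]
print(point.unpack_from(buf, point.size))  # (4, 5, 6)
```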
