
64-bit and Data Size Neutrality

Data Size Neutrality and 64-bit Support

The Single UNIX Specification, Version 2 provides enhanced support for 64-bit programming models by being n-bit clean and data size neutral. This article is a brief introduction to 64-bit programming models, data size neutrality, and application porting issues.

Introduction

When the UNIX operating system was first created in 1969 it was developed to run on a 16-bit computer architecture. The C language of the time supported 16-bit integer and pointer data types and also supported a 32-bit integer data type that could be emulated on hardware which did not support 32-bit arithmetic operations.

When 32-bit computer architectures, which supported 32-bit integer arithmetic operators and 32-bit pointers, were introduced in the late 1970s, the UNIX operating system was quickly ported to this new class of hardware platforms. The C language data model developed to support these 32-bit architectures quickly evolved to consist of a 16-bit short-integer type, a 32-bit integer type, and a 32-bit pointer. During the 1980s, this was the predominant C data model available for 32-bit UNIX platforms.

To describe these two data models in modern terms, the 16-bit UNIX platforms used an IP16 data model, while 32-bit UNIX platforms use the ILP32 programming model. The notation describes the width assigned to the basic data types; for example, ILP32 denotes that int (I), long (L), and pointer (P) types are all 32-bit entities. This notation is used extensively throughout this article.

The first UNIX standardization effort was begun in 1983 by a /usr/group committee. This work was merged into the work program of the IEEE POSIX committee in 1985. By 1988, both POSIX and X/Open committees had developed detailed standards and specifications that were based upon the predominant UNIX implementations of the time. These committees endeavored to develop architecture-neutral definitions that could be implemented on any hardware architecture.

The transition from 16-bit to 32-bit processor architectures happened quite rapidly just before the UNIX standardization work was begun. Since the specifications were based on existing practice and the predominant data model did not change during this gestation period, some dependencies upon the ILP32 data model were inadvertently incorporated into the final specifications.

Most of today's 32-bit UNIX platforms use the ILP32 data model. However, another data model, the LP32 model, is also very popular for other operating systems. The majority of C-language programs written for Microsoft Windows 3.1 are written for the Win-16 API, which uses the LP32 data model. The Apple Macintosh also uses the LP32 data model.

32-bit platforms have a number of limitations which are increasingly a source of frustration to developers of large applications, such as databases, who wish to take advantage of advances in computer hardware. There is much discussion today in the computer industry about the barrier presented by 32-bit addresses. 32-bit pointers can only address 4GB of virtual address space. There are ways of overcoming this limitation, but application development is more complicated and performance is significantly reduced. Until recently the size of a data file could not exceed 4GB. However, the 4GB file size limitation was overcome by the Large File Summit extensions which are included in XSH, Issue 5.

Disk storage has been improving in areal density at the rate of 70% compounded annually, and drives of 8GB and larger are readily available. Memory prices have not dropped as sharply, but 64MB chips are readily available. CPU processing power continues to increase by about 50% every 18 months, providing the power to process ever larger quantities of data. This conjunction of technological forces, along with the continued demand for systems capable of supporting ever larger databases, simulation models, and full-motion video, has generated requirements for support of larger addressing structures.

A number of 64-bit processors are now available, and the transition from 32-bit to 64-bit architectures is rapidly occurring amongst all the major hardware vendors. 64-bit UNIX platforms do not suffer from the file size or flat address space limitations of 32-bit platforms. Applications can access files that occupy terabytes of disk space because 64-bit file offsets are possible. Similarly, applications can now theoretically access terabytes of memory because pointers can be 64 bits. More physical memory results in faster operations. The performance of memory-mapped file access, caching, and swapping is greatly improved. 64-bit virtual addresses simplify the design of large applications. All the major database vendors now support 64-bit platforms because of dramatically improved performance for very large database applications available on very large memory (VLM) systems.

The world is currently dominated by 32-bit computers, a situation that is likely to continue to exist for the near future. These computers run 16-bit or 32-bit applications or some mixture of the two. Meanwhile, 64-bit computers will run 32-bit code, 64-bit code, or mixtures of the two (and perhaps even some 16-bit code). New 64-bit applications and operating systems must integrate smoothly into this environment. Key issues facing the computing industry are the interchange of data between 64-bit and 32-bit systems (in some cases on the same system) and the cost of maintaining software in both environments. Such interchange is especially needed for large application suites such as database systems, where one may want to distribute most of the applications as 32-bit binaries that run across a large installed base, but be able to choose 64 bits for a few crucial applications.

64-bit Data Models

Prior to the introduction of 64-bit platforms, it was generally believed that 64-bit UNIX operating systems would naturally use the ILP64 data model. However, this view was too simplistic and overlooked optimizations that could be obtained by choosing a different data model.

Unfortunately, the C programming language does not provide a mechanism for adding new fundamental data types. Thus, providing 64-bit addressing and integer arithmetic capabilities involves changing the bindings or mappings of the existing data types or adding new data types to the language.

ISO/IEC 9899:1990, Programming Languages - C (ISO C) left the definition of the short int, the int, the long int, and the pointer deliberately vague to avoid artificially constraining hardware architectures that might benefit from defining these data types independently of one another. The only constraints were that ints must be no smaller than shorts, longs must be no smaller than ints, and size_t must represent the largest unsigned type supported by an implementation. It is possible, for instance, to define a short as 16 bits, an int as 32 bits, a long as 64 bits and a pointer as 128 bits. The relationship between the fundamental data types can be expressed as:

sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) = sizeof(size_t)

Ignoring non-standard types, all three of the following64-bit pointer data models satisfy the above relationship:

LP64 (also known as 4/8/8)
ILP64 (also known as 8/8/8)
LLP64 (also known as 4/4/8).

The differences between the three models lie in the non-pointer data types. The table below details the widths in bits of the data types for the above three data models and includes LP32 and ILP32 for comparison purposes.

Data Type	LP32	ILP32	ILP64	LLP64	LP64
char	8	8	8	8	8
short	16	16	16	16	16
int32	-	-	32	-	-
int	16	32	64	32	32
long	32	32	64	32	64
long long (int64)	-	-	-	64	-
pointer	32	32	64	64	64

When the width of one or more of the C data types is changed, applications may be affected in various ways. These effects fall into two main categories:

Data objects, such as a structure, defined with one of the 64-bit data types will be different in size from those declared in an identical way on a 16-bit or 32-bit system.
Common assumptions about the relationships between the fundamental data types may no longer be valid in a 64-bit data model. Applications which depend on those relationships often cease to work properly when compiled on a 64-bit platform. A typical assumption made by many application developers is that:
sizeof(int) = sizeof(long) = sizeof(pointer)
This relationship is not codified in any C programming standard, but it is valid for the ILP32 data model. However, it is not valid for two of the three 64-bit data models described above, nor is it valid for the LP32 data model.

The ILP64 data model attempts to maintain the relationship between int, long, and pointer which exists in the ILP32 model by making all three types the same size. Assignment of a pointer to an int or a long will not result in data loss.

The downside of this model is that it depends on the addition of a new 32-bit data type such as int32 to handle true 32-bit quantities. There is thus a potential for conflict with existing typedefs in applications. An application which was developed on an ILP32 platform, and subsequently ported to an ILP64 platform, may be forced to make frequent use of the int32 data type to preserve the size and alignment of data because of interoperability requirements or binary compatibility with existing data files.

The LLP64 data model preserves the relationship between int and long by leaving both as 32-bit types. Data objects, such as structures, which do not contain pointers will be the same size as on a 32-bit system. This model is sometimes described as a 32-bit model with 64-bit addresses. Most of the run-time problems associated with the assumptions between the sizes of the data types are related to the assumption that a pointer will fit in an int. To solve this class of problems, int or long variables which should be 64 bits in length are changed to long long (or int64), a non-standard data type. This data model is thus again dependent on the introduction of a new data type. Again there is potential for conflict with existing typedefs in applications.

The LP64 data model takes the middle road. 8, 16, and 32-bit scalar types (char, short, and int) are provided to support objects that must maintain size and alignment with 32-bit systems. A 64-bit type, long, is provided to support the full arithmetic capabilities, and is available to use in conjunction with pointer arithmetic. Applications that assign addresses to scalar objects need to specify the object as long instead of int.

In the LP64 data model, data types are natural. Each scalar type is larger than the preceding type. No new data types are required. As a language design issue, the purpose of having long in the language anticipates cases where there is an integral type longer than int. The fact that int and long represent different width data types is a natural and common sense approach, and is the standard in the PC world where int is 16 bits and long is 32 bits.

A major test for any C data model is its ability to support the very large existing UNIX applications code base. The investment in code, experience, and data surrounding these applications is the largest determiner of the rate at which new technology is adopted and spread. In addition, it must be easy for an application developer to build code which can be used in both existing and new environments.

The UNIX development community is driven technically by a set of API agreements embodied in standards and specifications documents from groups such as X/Open, IEEE, ANSI, and ISO. These documents were developed over many years to codify existing practice and define agreement on new capabilities. As a result these specifications are of major value to the system developers, application developers, and end-users. There are numerous test suites which verify that implementations correctly embody the details of a specification and certify that fact to interested parties. No 64-bit data model can invalidate large portions of these specifications and expect to achieve wide adoption.

A number of vendors have extensive experience with the LP64 data model. By far, the largest body of existing 32-bit code already modified for 64-bit environments runs on LP64 platforms. Experience has shown that it is relatively easy to modify existing code so that it can be compiled on either a 32-bit or 64-bit platform. Interoperability with existing ILP32 platforms is well proven and is not an issue. At least one LP64-based operating system (Digital UNIX V4.0) has met and passed the majority of existing verification suites and has obtained the UNIX 95 brand.

A small number of ILP64-based platforms have also shipped. These have demonstrated that it is feasible to complete the implementation of an ILP64 environment. However, as of early 1997, no LLP64 or ILP64-based systems have achieved the same level of standards conformance or met the requirements of the UNIX 95 brand.

Although the number of applications written in C requiring a large virtual address space is growing rapidly, there has not been a requirement to date for a 64-bit int data type. The majority of existing 64-bit applications previously ran only on 32-bit platforms, and had no expectation of a greater range for the int data type. The extra 32 bits of data space in a 64-bit int would appear to be wasted. Any future applications which require a larger scalar data type can use the long type.

Nearly all applications moving from a 32-bit platform require some minor modifications to handle 64-bit pointers, especially where erroneous assumptions about the relative size of int and pointer data types were made. Common assumptions about the relative sizes of int, char, short, and float data types generally do not cause problems on LP64 platforms (since the sizes of those data types are identical to those on an ILP32 platform), but do so on an ILP64 platform.

Other language implementations will continue to support a 32-bit int type. For example, the FORTRAN-77 standard requires that the type INTEGER be the same size as REAL, which is half the size of DOUBLE PRECISION. This, together with customer expectations, means that FORTRAN-77 implementations will generally leave INTEGER as a 32-bit type, even on 64-bit platforms. A significant number of applications use C and FORTRAN together, either calling each other or sharing data files. Such applications have been amongst the first to move to 64-bit environments. Experience has shown that it is usually easier to modify the data sizes and types on the C side than the FORTRAN side of such applications. These applications will continue to require a 32-bit integer data type in C regardless of the size of the int data type.

In 1995, a number of major UNIX vendors agreed to standardize on the LP64 data model for a number of reasons:

Experience suggests that neither the LP64 nor the ILP64 data models provide a painless porting path from a 32-bit platform, but that all other things being equal, the smaller data types in the LP64 data model enable better application performance.
A crucial investment for end-users is the existing data built up over decades in their computer systems. Any proposed solution must make it easy to utilize such data on a continuing basis. Unfortunately, the ILP64 data model does not provide a natural way to describe 32-bit data types, and must resort to non-portable constructs such as int32 to describe such types. This is likely to cause practical problems in producing code which can run on both 32-bit and 64-bit platforms without numerous #ifdef constructions. It has been possible to port large quantities of code to LP64 platforms without the need to make such changes, while maintaining the investment made in data sets, even in cases where the typing information was not made externally visible by the application.
Most ints in existing applications can remain as 32 bits in a 64-bit environment; only a small number are expected to be the same size as pointer or long. Under the ILP64 data model, most ints will need to change to int32. However, int32 does not behave like a 32-bit int. Instead, int32 is like short in that all operations have to be converted to int (64 bits, sign extended) and performed in 64-bit arithmetic. Thus, int32 in the ILP64 data model is not exactly the same as int in the ILP32 data model. These differences may cause subtle and hard-to-find bugs.
Instruction cycle penalties are incurred whenever additional cycles are required to properly implement the semantics of the intended data model. For example, in the LP64 data model it is only necessary to perform sign extension on int when you have a mixed expression including longs. However, most integral expressions do not include longs and compilers can be made smart enough to only sign extend when necessary.
int is by far the most frequent data type to be found (statically and sometimes dynamically) within C and C++ programs. 64-bit integers require twice as much space as 32-bit integers. Applications using 64-bit integers consume additional memory and CPU cycles transporting that memory throughout the system. Furthermore, the latency penalty of 64-bit integers can be enormous, especially to disk, where it can exceed 1,000,000 CPU cycles (3 nsec to 3 msec). The memory size penalty for unneeded 64-bit integers could therefore be very high for some applications.
Portability, especially for combined FORTRAN and C applications, is enhanced by the LP64 data model, and the most common types of problems that can occur are susceptible to automatic detection.
Interoperability is improved by the ability to use a standard data type to declare data structures that can be used in both 32-bit and 64-bit environments.
Standards conformance has been demonstrated both in the practical sense by the porting of many programs and in the formal sense of compliance with industry standards through verification test suites.
Transition from the current industry practice is smooth and direct following a path grooved with experience and demonstrated success.
No new non-portable data types are required. The data model makes natural use of the C fundamental data types.

Data Size Neutrality

When it was understood that the Single UNIX Specification was constraining system implementations that were other than ILP32, the relevant specifications were reviewed and recommendations drafted to make these specifications data size- and architecture-neutral. These recommendations were incorporated into the Single UNIX Specification, Version 2 published in 1997.

The following is a summary of the changes that were made. Changes are identified with respect to the CAE Specifications which made up the previous version of the Single UNIX Specification.

System Interface Definitions, Issue 5 (XBD)

The following text was added to System Interface Definitions, Issue 4, Version 2 (XBD), Chapter 10, page 130, point 6, as a fourth bullet item:


"Ranges greater than those listed here are allowed."

This section describes the argument syntax of the standard utilities and introduces terminology used throughout the Single UNIX Specification for describing the arguments processed by the utilities. It was updated so that the maximum value of a numerical argument is allowed to be greater than a 32-bit value, thus permitting support of 64-bit values.

System Interfaces and Headers, Issue 5 (XSH)

Two general changes were made to System Interfaces and Headers, Issue 4, Version 2 (XSH):

Use of the type int for return values, arguments, and structure members.
Several functions using the type int for return values, arguments, or structure members are not able to represent 64-bit values correctly on architectures implementing an LP64 data model. Where alternate functions were available which do not have this limitation, the functions were marked LEGACY and the alternate functions noted in the Application Usage section. Where no alternative function was available, the types were changed in a data model-neutral manner to overcome this limitation.
size_t versus ssize_t.
Several functions have a parameter declared to be size_t, where the parameter specifies the length of an object to manipulate, and return the portion of the length of the object processed in a type ssize_t. The type ssize_t is required so that a negative return value can be used to indicate an error. However, in these functions it is possible for the return value to exceed the range of the type ssize_t (since size_t has a larger range of positive values than ssize_t). Some functions, such as mq_receive(), msgrcv(), read(), strfmon(), and write(), resolve this conflict by restricting the object size in the description section. For example, the description section for the read() function states: "If the value of nbyte is greater than {SSIZE_MAX}, the result is implementation-dependent."

The following were the detailed changes:

getdtablesize()
The getdtablesize() function returns the size of the file descriptor table. This is equivalent to getrlimit() with the RLIMIT_NOFILE option. However, getrlimit() returns a value of type rlim_t, whereas getdtablesize() returns an int and may have problems representing appropriate values in the future. A note about this was added to the Application Usage section, and the function marked LEGACY, with the recommendation that applications should use the getrlimit() function instead.
getpagesize()
The getpagesize() function returns the current page size. It is equivalent to sysconf(_SC_PAGE_SIZE) and sysconf(_SC_PAGESIZE). This function, returning an int, may have problems representing appropriate values on non-32-bit platforms. Also, the behavior is not specified for this function on systems that support variable size pages. On variable page size systems, a page can be extremely large (theoretically, up to the size of memory). This allows very efficient address translations for large segments of memory that have common page attributes. A note about this has been added to the Application Usage section, and the function marked LEGACY, with the recommendation that applications should use the sysconf() function instead.
readlink()
The readlink() function returns the size of the information that it reads as a type int, but the size of the buffer area is specified by a size_t. This function is specified in the IEEE PASC P1003.1a draft standard, and the return value may change in a future version of the Single UNIX Specification to reflect the final POSIX.1a standard.
sbrk()
The parameter to the sbrk() function is a type int defining the number of bytes by which to change the break value. This function may not be able to address the full memory range in the future for certain data models. A new type has been introduced to be used in place of the type int. This is the intptr_t type, which is an abstract data type equating to a signed integral type large enough to hold any pointer. This new type is one of a new set of types introduced in a new header inttypes.h to address the issues of data sizes for specific types.
inttypes.h
The inttypes.h header is a new header in the Single UNIX Specification, Version 2 and includes definitions for at least the following types:
int16_t
16-bit signed integral type.
int32_t
32-bit signed integral type.
int64_t
64-bit signed integral type.
uint16_t
16-bit unsigned integral type.
uint32_t
32-bit unsigned integral type.
uint64_t
64-bit unsigned integral type.
intptr_t
Signed integral type large enough to hold any pointer.
uintptr_t
Unsigned integral type large enough to hold any pointer.
sys/shm.h
The element shm_segsz of structure shmid_ds, specifying the size of a memory segment, was of type int. This has been changed to type size_t.
sys/stat.h and sys/statvfs.h
The element st_blocks of the structure stat was changed to the new abstract blkcnt_t type.
The elements f_blocks, f_bfree, and f_bavail of the structure statvfs were changed to the new abstract fsblkcnt_t type.
The elements f_files, f_ffree, and f_favail of the structure statvfs were changed to the new abstract fsfilcnt_t type.
To support the above changes, the following definitions were added to sys/types.h:
blkcnt_t
A signed arithmetic type, used for file block counts.
fsblkcnt_t
An arithmetic type, used for filesystem block counts.
fsfilcnt_t
An arithmetic type, used for file system file counts.
sys/time.h
The tv_usec element of the timeval structure was of type long. This has been changed to use a new abstract data type for signed integral time values, known as suseconds_t. suseconds_t was added to sys/types.h.
msgrcv()
In XSH, Issue 4, Version 2, msgrcv() returns the size of the message received as an integer value, but the size of the message area is specified by a size_t. On 64-bit systems where size_t may be a different data type to int, this will cause problems. XSH, Issue 5 addresses this problem by changing the type of the return value from int to ssize_t, and adding a warning to the Description about values of msgsz larger than {SSIZE_MAX}.
sysconf() and unistd.h
There is now a way to find out the data model supported by the platform. This can be queried at compile time, using the constants defined in unistd.h, or at run time using the sysconf() function.
The following symbolic constants are defined to have the value -1 if the implementation never provides the feature, and to have a value other than -1 if the implementation always provides the feature. If these are undefined, the sysconf() function can be used to determine whether the feature is provided for a particular invocation of the application.
_XBS5_ILP32_OFF32
Implementation provides a C-language compilation environment with 32-bit int, long, pointer, and off_t types.
_XBS5_ILP32_OFFBIG
Implementation provides a C-language compilation environment with 32-bit int, long, and pointer types, and an off_t type using at least 64 bits.
_XBS5_LP64_OFF64
Implementation provides a C-language compilation environment with 32-bit int and 64-bit long, pointer, and off_t types.
_XBS5_LPBIG_OFFBIG
Implementation provides a C-language compilation environment with an int type using at least 32 bits and long, pointer, and off_t types using at least 64 bits.

Commands and Utilities, Issue 5 (XCU)

A new section of text was added to the end of the first paragraph of Section 1.9, Utility Description Defaults, to align with requirements in POSIX.2. It restates that integer variables and constants used by utilities are permitted to be 64-bit values.

Programming Environments

The c89 reference page has some new text describing programming environments. All conforming implementations must support one of the following programming environments by default. Applications can use the sysconf() function or the getconf utility to determine which programming environments the implementation supports.

The following table describes the supported programming environments.

Programming Environment	int	long	pointer	off_t
XBS5_ILP32_OFF32	32	32	32	32
XBS5_ILP32_OFFBIG	32	32	32	>=64
XBS5_LP64_OFF64	32	64	64	64
XBS5_LPBIG_OFFBIG	>= 32	>= 64	>= 64	>= 64

The c89 reference page also has text describing new support in getconf and sysconf() to determine configuration strings for C compiler flags, linker/loader flags, and libraries for each supported environment.

When an application wishes to use a specific programming environment rather than an implementation's default programming environment while compiling, the application must first verify that the implementation supports the desired environment. If the desired programming environment is supported, the application can then invoke c89 with the appropriate C compiler flags as the first options for the compile, the appropriate linker/loader flags after any other options but before any operands, and the appropriate libraries at the end of the operands.

The following table shows the various options available:

Programming Environment	Use	Configuration String
XBS5_ILP32_OFF32	C Compiler Flags	XBS5_ILP32_OFF32_CFLAGS
	Linker/Loader Flags	XBS5_ILP32_OFF32_LDFLAGS
	Libraries Flags	XBS5_ILP32_OFF32_LIBS
XBS5_ILP32_OFFBIG	C Compiler Flags	XBS5_ILP32_OFFBIG_CFLAGS
	Linker/Loader Flags	XBS5_ILP32_OFFBIG_LDFLAGS
	Libraries Flags	XBS5_ILP32_OFFBIG_LIBS
XBS5_LP64_OFF64	C Compiler Flags	XBS5_LP64_OFF64_CFLAGS
	Linker/Loader Flags	XBS5_LP64_OFF64_LDFLAGS
	Libraries Flags	XBS5_LP64_OFF64_LIBS
XBS5_LPBIG_OFFBIG	C Compiler Flags	XBS5_LPBIG_OFFBIG_CFLAGS
	Linker/Loader Flags	XBS5_LPBIG_OFFBIG_LDFLAGS
	Libraries Flags	XBS5_LPBIG_OFFBIG_LIBS

Porting Issues

Porting an application to a 64-bit UNIX system can be accomplished with a minimal amount of effort if the application was developed using good modern software engineering practices such as:

ISO C function prototypes
consistent and careful use of data types
placing all declarations in headers.

First of all, determine which data model is available on the platform you are porting to. The data model you are porting to will have a major impact on the amount of work required to achieve a successful port.

Then take the time to create and use ISO C function prototypes if they are absent from the source code. Unfortunately, large quantities of perfectly good legacy code developed in the days before portability was a major issue may not have function prototypes. Fortunately, many compilers have an option to generate ISO C function prototypes.

The remainder of this article assumes that you are porting to an LP64 platform since this is the data model of choice amongst major vendors, but the issues raised are equally valid on some or all of the other 64-bit data models.

General

Use utilities such as grep to locate and check all instances of the following:

Shift and complement operators; that is, "<<", ">>", "~". If used with long, add "L" to the value shifted to avoid an incorrect result.
Uses of the address-of operator "&". Make sure the addresses are not being stored in an int.
Declarations of type long. Many of these can be converted to type int to save space. This is particularly true for network code.
The functions lseek(), fseek(), ftell(), fgetpos(), and so on. Use either off_t or fpos_t as appropriate for offset arguments. Do not use int or long to store file offsets.
All (int *) and (long *) casts.
Use of (char *)0 for zero or in (char *) comparisons. Use NULL instead.
Hard-coded byte counts or memory sizes. These will be wrong if they assume longs or pointers are 32 bits. Applications should use the sizeof() operator to avoid such problems.

Declarations

To enable application code to work on both 32-bit and 64-bit platforms, check all int and long declarations. Declare integer constants using "L" or "U" as appropriate. Ensure an unsigned int is used where appropriate to prevent sign extension. If you have specific variables that need to be 32 bits on both platforms, define the type to be int. If the variable should be 32 bits on an ILP32 platform and 64 bits on an LP64 platform, define the variable to be long.

Declare numeric variables as int or long for alignment and performance. Don't worry about trying to save bytes by using char or short. Remember that if the type specifier is missing from a declaration, it defaults to an int. Declare character pointers and character bytes as unsigned to avoid sign extension problems with 8-bit characters.

Assignments and Function Parameters

All assignments require checking. Since pointer, int, and long are no longer the same size on LP64 platforms, problems may arise depending on how the variables are assigned and used within an application.

Do not use int and long interchangeably because of the possible truncation of significant digits, as shown in the following example:

int iv;
long lv;
iv = lv;    /* may truncate: long is 64 bits, int is 32 bits on LP64 */

Do not use int to store a pointer. The following example works on an ILP32 platform but fails on an LP64 platform because a 32-bit integer cannot hold a 64-bit pointer:

unsigned int i, *p;
i = (unsigned) p;    /* the upper 32 bits of the pointer are lost on LP64 */

The converse of the above example is sign extension:

int *p;
int i;
p = (int *)i;    /* i is sign-extended to 64 bits before becoming a pointer */

Do not pass long arguments to functions expecting int arguments. Avoid assignments similar to the following:

int foo(int);
int iv;
long lv;
iv = foo( lv );    /* lv is narrowed to int at the call */

Do not freely exchange pointers and ints. Assigning a pointer to an int, assigning back to a pointer, and dereferencing the pointer may result in a segmentation fault. Avoid assignments similar to the following example:

int iv;
char *buffer;
buffer = (char *) malloc((size_t)MAX_LINE);
iv = (int) buffer;       /* the upper 32 bits of the address are lost */
buffer = (char *) iv;    /* no longer the address malloc() returned */

Do not pass a pointer to a function expecting an int as this will result in lost information. For example, avoid assignments similar to the following:

void f();
char *cp;
f(cp);

Use of ISO C function prototypes should avoid this problem. Use the void* type if you need to use a generic pointer type. This is preferable to converting a pointer to type long.
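A short sketch of both recommendations follows. The function text_length is a hypothetical example: it has a full ISO C prototype and takes its argument as a generic pointer.

```c
#include <stddef.h>
#include <string.h>

/* The prototype tells the compiler that the parameter is a pointer,
   so a call site cannot silently pass it as an int. */
size_t text_length(const void *generic);

size_t text_length(const void *generic)
{
    /* Convert the generic pointer back to its real type before use. */
    const char *s = generic;
    return strlen(s);
}
```

Because the value stays a pointer throughout, none of its bits are lost on an LP64 platform.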

Examine all assignments of a long to a double as this can result in a loss of accuracy. On an ILP32 platform, an application can assume that a double contains an exact representation of any value stored in a long (or a pointer). On LP64 platforms this is no longer a valid assumption, because a 64-bit long can carry more significant bits than a double can represent exactly.

External Interfaces

An external interface mismatch occurs when an external interface requires data in a particular size or layout, but the data is not supplied in the correct format.

For example, an external interface may expect a 64-bit quantity, but receive instead a 32-bit quantity. Another example is an external interface which expects a pointer to a structure with 2 ints (8 bytes) but instead receives a pointer to a structure with an int and a long (16 bytes, 12 of data, 4 of alignment padding). External interface mismatching is a major cause of porting problems.

Format Strings

The function printf() and related functions can be a major source of problems. For example, on 32-bit platforms, using "%d" to print either an int or long will usually work, but on LP64 platforms "%ld" must be used to print a long. Use the modifier "l" with the d, u, o, and x conversion characters to specify an argument of type long or unsigned long. When printing a pointer, use "%p". If you wish to print the pointer as a specific representation, the pointer should be cast to an appropriate integer type before using the desired format specifier. For example, to print a pointer as an unsigned long decimal number, use %lu:

char *p;

printf( "%p %lu\n", (void *)p, (unsigned long)p );

As a rule, to print an integer of arbitrary size, cast the integer to long or unsigned long and use the "%ld" or "%lu" conversion respectively.
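The rule can be wrapped in two small helpers, sketched below. The names format_signed and format_unsigned are hypothetical, and snprintf() is used here only so that the result lands in a buffer:

```c
#include <stdio.h>

/* Formats any signed integer no wider than long; callers cast to long. */
int format_signed(char *buf, size_t size, long value)
{
    return snprintf(buf, size, "%ld", value);
}

/* Formats any unsigned integer no wider than unsigned long. */
int format_unsigned(char *buf, size_t size, unsigned long value)
{
    return snprintf(buf, size, "%lu", value);
}
```

A call such as format_signed(buf, sizeof buf, (long)count) then behaves the same whether count is a short, an int, or a long.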

Constants

The results of arithmetic operations on a 64-bit platform can differ from those obtained using the same code on a 32-bit platform. Differing results are often caused by sign extension problems. These are generally the result of mixing signed and unsigned types and the use of hexadecimal constants. Consider the following code example:

long lv = 0xFFFFFFFF;
if ( lv < 0 ) {

On an ILP32 platform, lv is interpreted as -1 and the if condition succeeds. On an LP64 platform lv is interpreted as 4294967295 and the if condition fails.
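One way to avoid the ambiguity, sketched here with hypothetical function names, is to write the intended value rather than a bit pattern whose meaning depends on the width of long:

```c
/* Width-dependent: the constant is a 32-bit pattern, not "minus one". */
int all_ones_is_negative(void)
{
    long lv = 0xFFFFFFFF;
    return lv < 0;
}

/* Width-independent: -1L is negative whatever the size of long. */
int minus_one_is_negative(void)
{
    long lv = -1L;
    return lv < 0;
}
```

If the intent is instead "all bits set", ~0L expresses that for any width of long.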

Pointers

On ILP32 platforms, an int and a pointer are the same size (32 bits) and application code can generally use them interchangeably. For example, a structure could contain a field declared as an int, and most of the time contain an integer, but occasionally be used to store a pointer.

Another example, which most 32-bit lint utilities will not catch, is the following:

int iv, *pv;
iv = (int) pv;
pv = (int *) iv;

This code fails on an LP64 platform. Not only do you lose the high 4 bytes of pv, but by default these high bytes are significant.
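If an integer genuinely has to carry a pointer value, later C standards offer the optional uintptr_t type in <stdint.h>. This is an addition that the article itself does not mention, so treat the following as a sketch under that assumption; the function names are hypothetical:

```c
#include <stdint.h>

/* Stores a pointer in an integer type wide enough to hold it. */
uintptr_t pointer_to_integer(void *p)
{
    return (uintptr_t)p;
}

/* Recovers the pointer; the round trip compares equal to the original. */
void *integer_to_pointer(uintptr_t v)
{
    return (void *)v;
}
```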

Sizeof()

On ILP32 platforms sizeof(int) = sizeof(long) = sizeof(ptr *). Using the wrong sizeof() operand does not cause a problem. On LP64 platforms, however, using the wrong sizeof() operand will almost certainly cause a problem. For example, the following 32-bit code which copies an array of pointers to ints:

memcpy((char *)dest, (char *)src, number * sizeof(int))

must be changed to use sizeof(int *):

memcpy((char *)dest, (char *)src, number * sizeof(int *))

on an LP64 platform.

Note that the result of the sizeof() operation is type size_t which is an unsigned long on LP64 platforms.
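A common way to make the operand impossible to get wrong is to apply sizeof to the object being copied rather than to a type name. The sketch below uses a hypothetical copy_pointers function:

```c
#include <string.h>

/* Copies `number` pointers to int from src to dest. */
void copy_pointers(int **dest, int **src, size_t number)
{
    /* sizeof *src is the size of one array element, here an int *,
       so the byte count follows the declaration automatically. */
    memcpy(dest, src, number * sizeof *src);
}
```

If the element type later changes, the byte count changes with it and no sizeof operand has to be edited.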

Structures and Unions

The size of structures and unions on 64-bit platforms can be different from those on 32-bit platforms. For example, on ILP32 platforms the size of the following structure is 8 bytes:

struct Node {
    struct Node *left;
    struct Node *right;
};

but on an LP64 platform its size is 16 bytes.

If you are sharing data defined in structures between 32-bit and 64-bit platforms, be careful about using longs and pointers as members of shared structures. These data types introduce sizes that are not generally available on 32-bit platforms. Avoid storing structures with pointers in data files. Code that does so is not portable between 32-bit and 64-bit platforms.

To increase the portability of your code, use typedef'd types for the fields in structures to set up the types as appropriate for the platform, and use the sizeof() operator to determine the size of a structure. If necessary, use the #pragma pack statement to avoid compiler structure padding. [Note: This is not portable and is not a general solution.] This is important if data alignment cannot change (network packets, and so on).

Structures are aligned according to the strictest aligned member. Padding may be added to ensure proper alignment. This padding may be added within the structure, or at the end of the structure to terminate the structure on the same alignment boundary on which it started.
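Padding can be measured rather than guessed. The sketch below uses a hypothetical structure, struct mixed, with an int followed by a long, and reports where the long lands and how many bytes are padding:

```c
#include <stddef.h>

/* Hypothetical structure mixing a 32-bit and a platform-sized member. */
struct mixed {
    int  i;
    long l;
};

/* Offset of the long member; any gap after `i` is internal padding. */
size_t mixed_long_offset(void)
{
    return offsetof(struct mixed, l);
}

/* Total padding bytes: structure size minus the sizes of its members. */
size_t mixed_padding(void)
{
    return sizeof(struct mixed) - (sizeof(int) + sizeof(long));
}
```

On a typical LP64 platform with a 4-byte int and an 8-byte long, this structure occupies 16 bytes, of which 4 are padding.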

Problems can occur when the use of a union is based on an implicit assumption, such as the size of member types.

Consider the following code fragment which works on ILP32 platforms. The code assumes that an array of two unsigned longs overlays a double.

union double_union {
    double d;
    unsigned long ul[2];
};

To work on an LP64 platform, ul must be changed to an unsigned int type:

union double_union {
    double d;
    unsigned int ul[2];
};

This problem also occurs when building unions between ints and pointers since they are not the same size on LP64 platforms.
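The implicit size assumption behind such a union can be made explicit so that a mismatch stops the build. The sketch below uses the negative-array-size idiom for a compile-time check; the typedef name and the function double_is_all_zero_bits are hypothetical:

```c
union double_union {
    double       d;
    unsigned int ul[2];
};

/* Compile-time check: the array size is -1, and the build fails,
   if two unsigned ints do not exactly cover a double. */
typedef char double_union_overlay_ok[
    (sizeof(double) == 2 * sizeof(unsigned int)) ? 1 : -1];

/* Returns 1 if every bit of the stored double is zero. */
int double_is_all_zero_bits(double x)
{
    union double_union u;
    u.d = x;
    return u.ul[0] == 0 && u.ul[1] == 0;
}
```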

Beware of aliasing, that is, multiple different definitions of the same data. For example, assume the following two structures refer to the same data in different ways:

struct node {
    int src_addr, dst_addr;
    char *name;
};

struct node {
    struct node *src, *dst;
    char *name;
};

This works on an ILP32 platform, but fails on an LP64 platform. The two structure definitions should be replaced with a union declaration to ensure portability.
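One possible shape for such a union declaration is sketched below. The member names addr and link and the helper node_link are assumptions made for the illustration, not taken from the article:

```c
#include <stddef.h>

/* Single definition: each endpoint is either an integer address
   or a link to another node, and the union is sized for the larger. */
struct node {
    union {
        int          addr;
        struct node *link;
    } src, dst;
    char *name;
};

/* Hypothetical helper: store both endpoints as links. */
void node_link(struct node *n, struct node *src, struct node *dst)
{
    n->src.link = src;
    n->dst.link = dst;
}
```

Because every user of the data sees the same definition, the layout stays consistent whether pointers are 32 or 64 bits wide.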

More Information

This article is derived from The Open Group Source Book, "Go Solo 2 - The Authorized Guide to Version 2 of the Single UNIX Specification". This is published herein with permission of The Open Group. More information on the Single UNIX Specification, Version 2 can be obtained from the following sources:

The online version of the Single UNIX Specification can be found at the URL http://www.UNIX-systems.org/online.html.
The Open Group Source Book "Go Solo 2 - The Authorized Guide to Version 2 of the Single UNIX Specification", 600 pages, ISBN 0-13-575689-8. This book provides complete information on what's new in Version 2, with technical papers written by members of the working groups that developed the specifications, and a CD-ROM containing the complete 3000 page specification in both HTML and PDF formats (including PDF reader software). For more information on the book, see URL http://www.UNIX-systems.org/gosolo2.
Additional information on the Single UNIX Specification can be obtained at The Open Group world wide web site, see the URL http://www.UNIX-systems.org.
