
64-Bit Programming Models: Why LP64?

Participation from: Digital Equipment Corporation, Hewlett-Packard Company, IBM Corporation, Intel Corporation, Novell Inc., NCR Corporation (formerly AT&T GIS), Santa Cruz Operation Inc., Sunsoft Inc., and X/Open Company Ltd. (a.k.a. the Aspen Data Model Committee)

EXECUTIVE SUMMARY

The Open Systems community is at an opportune moment to make a choice regarding how 64-bit architectures will be supported by our implementations. The technical arguments included in this paper defend the position that LP64 is a better solution for 64-bit programming than the other potential models (ILP64 and LLP64). The key evaluation issues are portability, interoperability with 32-bit environments, standards conformance, performance effects and transition costs. As we analyzed each of the data models against these evaluation metrics, LP64 was our clear choice.

PROBLEM STATEMENT

Demand for a larger flat address space is the key opportunity for 64-bit systems. There is much discussion today in the computer industry about the barrier presented by 32-bit addresses. A 32-bit byte address can select any one of 2^32 bytes, that is, 4 gigabytes (GB) of memory. Disk storage has been improving in areal density at a rate of 70% compounded annually; 4 GB drives are readily available, with higher-density drives coming shortly. Memory prices have not dropped as sharply, but 16-Mbit memory chips are readily available, with 64-Mbit chips in active development. CPU processing power continues to increase by about 50% every 18 months, providing the power to process ever larger quantities of data. This conjunction of technological forces, along with the continued demand for systems capable of supporting larger databases, larger simulation models and newer data types (e.g., full-motion video), has generated requirements for support of larger addressing structures in today's systems.

Major chip architectures have begun to provide direct addressing of uniform address spaces larger than 4 GB. As these chips, and systems based on them, come to market, there is an opportunity to agree on the programming model that will be used with them, thus avoiding unnecessary fragmentation. This is clearly best done within the environment of the Open Systems specifications, as implementations exist for all the architectures. Further, if the various implementers can agree on some of the details, application developers will be faced with a single environment from multiple vendors, easing ongoing support costs.

TECHNICAL CHOICES

The C programming language lacks a mechanism for adding new fundamental datatypes. Providing 64-bit addressing and scalar arithmetic capabilities to the C application developer therefore involves either changing the bindings or mappings of the existing datatypes or adding new datatypes to the language.

There are three basic models that can be chosen, LP64, ILP64 and LLP64, as shown in the following table. The notation describes the width assigned to the basic data types, and each alternative name gives the widths of int, long and pointer in bytes. LP64 (also known as 4/8/8) denotes long and pointer as 64-bit types; ILP64 (also known as 8/8/8) means int, long and pointer are all 64-bit types; and LLP64 (also known as 4/4/8) adds a new 64-bit type (long long) and makes pointers 64 bits wide. Most of today's 32-bit systems are ILP32 (that is, int, long and pointers are all 32 bits wide). The majority of C language programs written today for Microsoft Windows 3.1 are written for the Win-16 APIs, which use an LP32 model (int is 16 bits, while long and pointers are 32 bits). The C definitions on the Apple Macintosh are also LP32.

Widths in bits:

Datatype    LP64  ILP64  LLP64  ILP32  LP32
char           8      8      8      8     8
short         16     16     16     16    16
_int32               32
int           32     64     32     32    16
long          64     64     32     32    32
long long                   64
pointer       64     64     64     32    32

C language standards specify a set of relationships between the various datatypes (for example, long must be able to represent every int value, and int must be at least 16 bits wide) but deliberately do not define actual sizes. Ignoring the non-standard types for a moment, all three 64-bit pointer models satisfy the rules as specified.

A change in the width of one or more of the C datatypes affects programs in obvious and not-so-obvious ways. There are two basic sets of issues: (1) data objects defined with one of the 64-bit datatypes will be different in size from those declared in an identical way on a 16- or 32-bit system, and (2) assumptions about the relationships between the fundamental datatypes (not those specified within the C standard, but used anyway by the developers of particular pieces of code) may no longer be valid. Programs depending on those relationships often cease to work properly.

LLP64 preserves the relationship between int and long by leaving both as 32-bit datatypes. Objects not containing pointers will be the same size as on a 32-bit system. Support for 64-bit scalar data is provided by adding a new, non-portable scalar datatype such as __int64 or long long. LLP64 is really a 32-bit model with 64-bit addresses. Most of the runtime problems associated with assumptions about the sizes of the datatypes stem from the assumption that a pointer will fit in an int. To solve this class of problems, int or long variables are changed to long long, a non-standard datatype. This solution is optimized for the first class of problems and is dependent on the introduction of a new datatype.

An LLP64 operating system is forced either to change the datatype definitions of many of the system programming interfaces (APIs) or to introduce a new set of 64-bit-wide interfaces. Most of these interfaces have been stable for almost 25 years across all versions of the UNIX operating system. Thus, LLP64 requires extensive modifications to existing specifications to support those places which should naturally become 64 bits wide.

ILP64 attempts to maintain the relationship between int, long and pointer by making all three the same size (as is the case in ILP32). Assignment of pointers to int or long variables does not result in data loss. On the other hand, this model either ignores the portability of data or depends on the addition of a 32-bit datatype such as int32 or __int32. This is a potential conflict with existing typedefs, and is especially contrary to the spirit of C development, which has avoided embedding size descriptions into fundamental datatypes. Programs that must preserve the size and alignment of data are forced to use the non-standard datatype and may not be portable.

The world is currently dominated by 32-bit computers, a situation that is likely to persist for the foreseeable future. These computers run 16- or 32-bit programs, or some mixture of the two. Meanwhile, 64-bit CPUs will run 32-bit code, 64-bit code, or mixtures of the two (and perhaps even some 16-bit code). 64-bit applications and systems must integrate smoothly into this environment. Key issues facing the industry are the interchange of data between 64- and 32-bit systems (in some cases on the same system) and the cost of maintaining software in both environments. Such interchange is especially needed for large application suites (like database systems), where one may want to distribute most of the programs as 32-bit binaries that run across a large installed base, but be able to choose 64 bits for a few crucial programs, like server processes. ILP64 implies frequent source code changes and requires the use of non-standard datatypes to enable interoperability and maintain binary compatibility for existing data.

LP64 takes the middle road. 8-, 16- and 32-bit scalar types (char, short and int) are provided to declare objects that maintain size and alignment on 32-bit systems. A 64-bit type (long) is provided to support the full arithmetic capabilities and is available to use in conjunction with pointer arithmetic. Programs that assign addresses to scalar objects need to declare the object as long instead of int. Programs that have been made 64-bit safe can be recompiled and run on 32-bit systems without change. The datatypes are natural: each scalar type is larger than the preceding type.

As a language design issue, the purpose of having long in the language is to anticipate cases where there is an integral type longer than int. Having int and long represent datatypes of different widths is a natural, common-sense approach, and it is the standard in the PC world, where int is 16 bits and long is 32 bits.

EVALUATION CRITERIA

The programming model choice described in the last section can be made individually by each of the system vendors, or jointly through an implementers' agreement amongst multiple vendors. We argue that the Open Systems community, most particularly the application developers, is best served if a single choice is widespread in the emerging 64-bit systems. This removes a source of subtle errors in porting to a 64-bit environment, and encourages more rapid exploitation of the technology options. Also, this is an opportune moment to make such an agreement, since the early shippers (Digital and SGI) have already selected the same model (namely, LP64), while other vendors have not yet committed their shipping products to a choice.

The remainder of this paper describes the evaluation criteria we suggest using to make a selection for the industry, and assesses the LP64 and ILP64 models against these criteria. The LLP64 model is not, in our view, a satisfactory basis for widespread adoption and use, since it requires extensive modification to existing standards.

PORTABILITY

A major test for any model is the ability to support the large existing code base within the Open Systems arena. The investment in code, experience, and data surrounding these applications is the largest determiner of the rate at which new technology is adopted and spread. In addition, it must be easy for an application developer to build code which can be used in both existing and new environments.

At this point, there is experience at Digital and SGI with the realities of porting applications to an LP64-based 64-bit programming environment. The Digital UNIX product requires that ALL applications run in an LP64 runtime environment. The SGI product provides a mixed environment supporting ILP32 and LP64 models, but many applications currently shipping on the SGI systems are already using the LP64 model.

The Digital and SGI experiences demonstrate complementary facts. Digital's experience shows that it is possible for ISVs and customers to port large quantities of code to a 64-bit environment while producing, and interoperating with, 32-bit ports elsewhere. SGI's experience shows that code can be improved to be compilable for either 32 or 64 bits and still interchange data in the more tightly coupled fashion expected of processes on the same system.

Although we are beginning to see a number of applications grow to the point of requiring the larger virtual address space, there hasn't been a requirement for a 64-bit int data type. The majority of today's 64-bit applications previously ran only on 32-bit systems (usually some flavor of UNIX), and had no expectation of a greater range for the int data type. In such cases, the extra 32 bits of data in a 64-bit integer are wasted. Future applications requiring a larger scalar data type can use long.

Other language implementations will continue to support a 32-bit integer type. For example, the FORTRAN-77 standard requires that the type INTEGER be the same size as REAL, which is half the size of DOUBLE PRECISION. This, together with customer expectations, means that FORTRAN-77 implementations will generally have INTEGER as a 32-bit type, even on 64-bit hardware. A significant number of applications use C and FORTRAN together, either calling each other or sharing files. Such applications have been amongst the quickest to find reason to move to 64-bit environments. Experience has shown that it is usually easier to modify the data sizes and types on the C side than on the FORTRAN side of such applications, and we expect that such applications would require a 32-bit integer data type in C regardless of the size of the int datatype.

Nearly all applications moving from a 32-bit system require some minor modifications to handle 64-bit pointers, especially where assumptions were made about the relative sizes of the int and pointer data types. We have also noticed assumptions about the relative sizes of the int, char, short and float datatypes that do not cause problems in an LP64 model (since the sizes of those datatypes are identical to those on a 32-bit system) but do in an ILP64 model. Our experience suggests that neither an LP64 nor an ILP64 model provides a painless porting path from a 32-bit system, but that, all other things being equal, smaller datatypes enable better application performance.

INTEROPERABILITY WITH 32-BIT SYSTEMS AND OTHER 64-BIT SYSTEMS

We expect that for many years the bulk of systems shipping in the computer industry will be based on 32-bit programming models. A crucial investment for end users is the existing data built up over decades in their computer systems. Any solution must make it easy to utilize such data on a continuing basis.

Unfortunately, the ILP64 model does not provide a natural way to describe 32-bit data types, and must resort to non-portable constructs such as __int32 to describe such types. This is likely to cause practical problems in producing code which can run on both 32- and 64-bit platforms without #ifdef constructions. It has been possible to port large quantities of code to LP64 models without the need to make such changes, while maintaining the investment made in data sets, even in cases where the typing information was not made externally visible by the application.

Most ints in existing programs can remain 32 bits wide in a 64-bit environment; only a small number are expected to be the same size as pointers or longs. Under LP64, therefore, very few instances need to be changed.

With ILP64, most ints need to change to __int32. However, __int32 does NOT behave like a 32-bit int. Instead, __int32 is like short in that all operands have to be converted to int (64 bits, sign extended) and the operations performed in 64-bit arithmetic.

So __int32 in ILP64 is not the same as int in ILP32, nor the same as int in LP64. These differences will predictably cause subtle, hard-to-find bugs.

STANDARDS

The Open Systems community is technically driven by a set of API agreements embodied in specifications from groups such as X/Open, IEEE, ANSI, ISO and OMG. These documents have developed over many years to codify existing practice and define agreement on new capabilities. Thus, the specifications collectively are of major value to system developers, application developers and end users. There is a body of work on verifying that implementations correctly embody the details of the specifications and certifying that fact to various consumers. These verification suites are also part of the "glue" that keeps us a community. No 64-bit programming model can invalidate large quantities of these specifications (with their extensive detailed descriptions) and expect to achieve wide adoption.

Currently shipping LP64-based operating systems have met and passed many of the existing specifications and verification suites. It is our observation that there was no major philosophical barrier in doing so, but rather much detailed review of critical items buried within the specifications and verification suites. As a community, we know by demonstration that LP64 systems can comply with the commercially important standards; there is NO such demonstration today for LLP64 or ILP64 systems.

Most standards, and particularly language standards such as the ANSI C specification, are deliberately silent on the widths of the basic data types, since history has shown an assortment of choices made to reflect underlying architectures. This leads to deliberate ambiguity in the meaning of certain code samples, both as they move between different C compilers and as they move between optimization levels on a specific C compiler. Some of these can occasionally cause practical problems, but well-written code and tools such as lint have eased this problem significantly. Moreover, the experience of porting applications between various vendor platforms has significantly helped find problems like this. Nonetheless, such examples exist both as demonstration points and as practical problems, and they require significant intellectual energy to get right. With LP64, we have the accumulated experience of several implementers and application developers to help.

PERFORMANCE CHARACTERISTICS

We have argued that technology advances at economic levels enable 64-bit computing to become widespread. However, for some time the acceptable economic boundaries for many problems will remain a barrier. Thus, attention to inherent performance differences between the models is crucial to rapid adoption. We see two categories of differences: (1) instruction cycle costs to properly implement the defined model, and (2) memory system costs (at all levels of the memory hierarchy) to transport data to the places required.

Instruction cycle penalties will be incurred whenever additional cycles are required to properly implement the semantics of the intended programming model. However, compiler writers have extensive experience with the types of optimizations that are available.

For example, in LP64 it is only necessary to perform sign extension on an int when it appears in a mixed expression that includes a long; most integral expressions do not include longs. Compilers can be smart enough to sign extend only when necessary. Many current architectures (Alpha, MIPS, SPARC V9 and PowerPC) don't have a problem, because they have a 32-bit load that performs sign extension, although they may have the analogous problem with 32-bit unsigned int. Given that most CPUs will spend much of their time executing 32-bit operations (whether running 32-bit programs or simply doing 32-bit operations in 64-bit code), it is hard to see why many implementations would penalize 32-bit operations.

In our porting experience, these "inner loop" issues have NEVER become major performance determiners for commercial applications, and the current balancing act between memory cycle times and CPU cycle times leaves open issue slots which can often absorb the additional cycles.

A much larger practical effect in some commercially important applications comes from the consumption of additional memory and the cost of transporting that memory throughout the system. 64-bit integers require twice as much space as 32-bit integers. Further, the latency penalty can be enormous, especially to disk, where an access can cost more than 1,000,000 CPU cycles (a 3 nsec cycle versus a 3 msec disk access). The int type is by far the most frequent data type found (statically and sometimes dynamically) within C and C++ programs.

Some software vendors have experimented with an ILP64 model, which can be approximated on LP64 systems by changing all int declarations to long. In these cases, the conclusion reached after the experiments was not to use ILP64, since the applications did not benefit from the additional range of int values and the vendors did not wish to pay the performance penalty of extra memory use.

TRANSITION FROM CURRENT INDUSTRY PRACTICE

By far the largest body of existing code already modified for 64-bit environments runs on LP64-based platforms. The vendors of these platforms have worked with most of the application developers whose support is critical to any widespread adoption of 64 bits in the community. The practical problems of such ports have been resolved in a fashion that is demonstrably successful, based on years of market experience.

There have been a few examples of ILP64 systems that have shipped (Cray and ETA come to mind). These attempts have not had the broad market base that LP64 systems have had, although they do demonstrate that it is feasible to complete the implementation of an ILP64 environment.

Applications already modified for LP64 will need additional effort in code audit, changes to usage of int, and performance tuning to run on ILP64 systems. Many large ISVs would need separate code pools to support this difference. It is hard to see what value this change would provide them to compensate for the additional effort.

Applications not yet modified for any 64-bit system have the experience of others to guide them: experience expressed as tools to identify troublesome constructs, and as porting documents. These experiences become the practical guidance needed both to encourage adoption and to avoid pitfalls.

SUMMARY

Each of the evaluation criteria is subject to extensive further exploration (as Jim Gray put it, "Computing is fractal; wherever you look, there is infinite complexity"). We have argued that each issue supports a choice of LP64.

Portability, especially for combined FORTRAN and C code, is enhanced with LP64, and the most common types of problems that occur are susceptible to automatic detection.
Interoperability is improved by the ability to use a standard datatype to declare data structures that can be used in both 32- and 64-bit environments.
Standards conformance has been demonstrated both in the practical sense of porting many programs and in the formal sense of compliance with important industry standards.
Performance has been a major and effective selling theme for both vendors' LP64 systems, and the memory size penalty for unneeded 64-bit integers can be very high for some applications.
Transition from current industry practice is smooth and direct, following a path grooved by experience and demonstrated success.

All this, as well as the natural use of the native C datatypes to support all the widths needed in a 64-bit system, makes a compelling argument for the inherent advantage of LP64.



UNIX is a registered trademark of The Open Group.
