The goal of a programming language is to make it easier to build software. A programming language can help make software flexible, correct, and efficient. How good are today’s languages and what hope is there for better languages in the future?

Many programming languages are in use today, and new languages are designed every year; however, the quest for a language that fits all software projects has so far come up short. Each language has its strengths and weaknesses, and offers ways of dealing with some of the issues that confront software development today. The purpose of this chapter is to highlight and discuss some of those issues, to give examples from languages in which the issues are addressed well, and to point to ongoing research that may improve the state of the art. It is not our intention to give a history of programming languages [42, 10].

We will focus on flexibility, correctness, and efficiency. While these issues are not completely orthogonal or comprehensive, they will serve as a way of structuring the discussion. We will view a programming language as more than just the programs one can write; it will also include the implementation, the programming environment, various programming tools, and any software libraries. It is the combination of all these things that makes a programming language a powerful means for building software.

Two particular ideas are used for many purposes in programming languages: compilers and type systems. Traditionally, the purpose of a compiler is to translate a program to executable machine code, and the purpose of a type system is to check that all values in a program are used correctly. Today, compilers and type systems serve more needs than ever, as we will discuss along the way.
2 Flexibility

We say that software is flexible when it can run on a variety of platforms and is easy to understand, modify, and extend. Software flexibility is to a large extent achieved by good program structure. A programming language can help with the structuring of programs by offering language constructs for making the program structure explicit. Many such language constructs have been designed, and some of them will be highlighted in the following discussion of several aspects of software flexibility.
Goal: Write once, run everywhere.
The C programming language [27] was invented in the 1970s. Over the years, it became the “gold standard” for platform independence: most general-purpose processors, supercomputers, etc. have their own C compiler, which, in principle, enables any C program to run on any such computer. This is much more flexible than rewriting every program from scratch, perhaps in machine code, for each new computer architecture. In practice, most C programs will have some aspects that tie them closely to a particular architecture. Programmers are well aware of this and tend to design their programs such that porting software from one computer to the next entails minimal changes to the C source code. The main challenges to porting software lie partly in finding out the places in the C program where changes have to be made, and partly in understanding the differences between the two architectures in question.

In the 1990s, the Java programming language [23] popularized the use of virtual machines. Virtual machines increase platform independence by being an intermediate layer between high-level languages such as Java, and low-level languages such as machine code. A Java compiler will translate a Java program to virtual machine code, which the virtual machine executes. Each kind of computer will have its own implementation of the virtual machine, much like it will have its own C compiler. The virtual machine offers capabilities such as event handling in a platform-independent way, which in C would be platform dependent. This is a step towards the ultimate goal of “write once, run everywhere.” Intuitively, we have a spectrum:
machine code                    C                    virtual machine code
platform dependent  ------------------------------------------------>  platform independent
Virtual machines such as those for Java and the .NET Common Language Runtime [30] are slowly replacing C as “portable assembly code,” that is, the preferred target for compiling languages such as Java and C# [29]. Some virtual machines work in part by translating virtual machine code to C. Platform independence is particularly important for mobile code that can move from one computer to another while it is executing. If the computers all run the same kind of virtual machine, then such movement is greatly simplified. Platform independence via virtual machines comes with a price: it is more difficult to execute virtual machine code efficiently than it is to execute C code efficiently. Efficient implementation of virtual machines remains an active research area [2].
Goal: Never duplicate code.
When a programmer faces a programming task, it is tempting to program the needed functionality from scratch. It gives complete control over the code and minimizes the amount
Goal: Separation of concerns.
A large program tends to be big in more ways than its number of lines of code. It may have a large state space, represented by many variables, and it may have several rather independent subtasks that are handled by quite separate parts of the code. A programmer will manage the code complexity by trying to keep separate things separate. For example, a classical system design is to have three layers: a user interface, the business logic, and a database. If the code for each of the layers is kept separate and has minimal and well-defined interaction with the other layers, then the overall system will be easier to build, understand, and maintain.

A programming language can help enforce separation of concerns by offering ways of creating separate name spaces. For that purpose, Java has packages, ML [32] has modules, and C++ has so-called namespaces. In each case, the idea is, for example, to enable the name space of the user interface to be separate from that of the database. The advantage is that the programmer of the user interface will not accidentally or purposefully manipulate the internals of the database. Furthermore, the programmer of the database can choose names of variables, procedures, etc. without worrying about clashes with names used in the user interface code. Good constructs for code management greatly enhance software engineering by teams of programmers where each programmer is responsible for a module. As long as the interfaces between the modules are well defined, each programmer can work independently.

Java packages are a simple mechanism for separation of concerns. Each package has a name and a separate name space, and it can access names from other packages only by explicitly importing them. For example, in Java, if we write
import java.util.*;
then we get access to all classes and interfaces defined in the package java.util. The import statements make explicit the relationships with other modules and therefore help avoid name clashes. The collection of Java packages is flat and unstructured; a Java package cannot be nested in another package. Moreover, a Java package is not a value; it cannot be stored in a variable, passed as an argument, or returned as a result.

In ML, a module is a first-class entity which can be stored and passed around in much the same way as a basic value such as an integer. There is a layering, though: functions that take modules as arguments are functors, not functions. So, functors are functions from modules to modules, while normal functions are functions from normal values to values. ML modules and functors blend mechanisms for abstraction and for separation of concerns. Further blending of modules with mechanisms of object-oriented programming remains an active research area [20].
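To make the package mechanism concrete, here is a minimal sketch of two separate name spaces; the package names bank.db and bank.ui and the class contents are made up for illustration:

// File bank/db/Database.java: the database layer has its own name space.
package bank.db;

public class Database {
    private final java.util.Map<String, Integer> table = new java.util.HashMap<>();
    public void put(String key, int value) { table.put(key, value); }
    public Integer get(String key) { return table.get(key); }
}

// File bank/ui/Console.java: the user-interface layer imports the database explicitly.
package bank.ui;

import bank.db.Database;

public class Console {
    public static void main(String[] args) {
        Database db = new Database();
        db.put("balance", 100);
        System.out.println("balance = " + db.get("balance"));
    }
}

The explicit import makes the dependency between the two layers visible, and a name such as table inside bank.db cannot clash with names chosen in bank.ui.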
Goal: Control access to shared resources.
At the hardware level, synchronization of concurrent processes can be supported in a variety of ways. For example, there can be an atomic test-and-set operation that allows a shared register to be simultaneously read and written. Another example is an atomic operation for swapping the contents of two registers. Such operations enable a register to be a lock for a shared resource. With either of the two mentioned operations, one can ask for the lock, and possibly get it, in one computation step. If it had happened in two computation steps, then it is possible that two different concurrent processes simultaneously would ask for the lock, both find that the lock is not currently held, and then both processes take the lock. This would be a programming error.

Representing and operating on locks with atomic operations is fairly cumbersome and highly machine dependent. Programming languages tend to provide more convenient constructs for concurrency control. Those constructs can often be easily implemented using test-and-set, etc. A classical example is that of a semaphore [19], which is a data structure with just two operations: wait and signal. We can think of a semaphore as an abstraction of a lock. When a process wants access to a shared resource, it issues a wait to the semaphore that guards the resource. If no other process is currently accessing the resource, then the process is granted access right away and is now holding the lock. When the process is done with the access, it issues a signal to the semaphore, thereby releasing the lock. If the wait is issued while another process holds the lock, then the process will wait until the lock is released. In general, the semaphore may have a queue of processes waiting to access the shared resource.

A different construct for concurrency control is that of monitors [24]. A monitor is a data structure where we can think of all the operations as being guarded by a single semaphore. Whenever an operation is called, a wait to the semaphore is automatically issued, and when an operation is about to return, a signal is issued. The effect is that the data structure is accessed by at most one concurrent process at a time.

In Java, any variable can be used as a lock, and any block of code can be guarded by a lock. For example, we can write
synchronized (lock) { balance = balance + x; }
Here we can think of lock as a semaphore on which a wait is automatically issued on entry to the guarded block of code, and on which a signal is automatically issued on exit. If every method of a class is guarded by synchronized on the same lock, then each object of that class is effectively a monitor.

Much research has gone into finding efficient implementations of the various constructs for concurrency control [5]. A radical idea is to let a compiler prove that a certain lock can be held by at most one process at a time. In such a case, the lock is unnecessary and the operations on it can be eliminated [1].
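As a small illustration of the semaphore idea, the following sketch guards a bank balance with java.util.concurrent.Semaphore, whose acquire and release methods play the roles of wait and signal; the Account class and its fields are made up for illustration.

import java.util.concurrent.Semaphore;

class Account {
    private final Semaphore lock = new Semaphore(1); // one permit: the semaphore acts as a lock
    private int balance = 0;

    void deposit(int x) throws InterruptedException {
        lock.acquire();                // wait: block until the lock is free
        try {
            balance = balance + x;     // at most one thread executes this at a time
        } finally {
            lock.release();            // signal: release the lock, waking a waiting thread if any
        }
    }
}

Using the built-in synchronized construct, as above, achieves the same effect and lets the virtual machine manage the lock.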
3 Correctness
We say that software is correct when it does what it is supposed to do. Most software has complex behavior and one can rarely get a waterproof guarantee that software is fully correct. However, a programming language can help establish partial correctness by offering ways of
In an object-oriented language such as Java, a data type can be a subtype of another type. For example, Plane and Helicopter can be two subtypes of Aircraft. The idea is that every Plane is an Aircraft, but not vice versa. If we have a variable of type Aircraft, the actual object residing in the variable may be a Plane or a Helicopter. Since the type system does not know which one, it would be a type error to do a Plane-specific operation on the variable. This is where type casts come in: we can cast the variable to be a Plane before operating on it. If the variable actually contains a Plane object, then all is well; otherwise a type cast exception is thrown. If the exception is not caught, then the effect is the same as a dynamic type error in Scheme.

The quest of type systems research is to strike a balance between expressiveness, that is, the number of programs that can be type checked, and efficiency, that is, the time it takes to do the type checking. One of the current active research areas is to combine static type systems for object-oriented languages with a notion of generic types [11].
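A minimal Java sketch of the situation just described; the class bodies and the method land are made up for illustration:

class Aircraft { }
class Plane extends Aircraft { void land() { System.out.println("landing on a runway"); } }
class Helicopter extends Aircraft { }

class CastDemo {
    public static void main(String[] args) {
        Aircraft a = new Plane();      // the static type Aircraft does not reveal the subtype
        ((Plane) a).land();            // cast succeeds: a actually holds a Plane

        Aircraft b = new Helicopter();
        ((Plane) b).land();            // cast fails: a ClassCastException is thrown at run time
    }
}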
Goal: A guarantee that the program meets the specification.

Mathematical proofs of program properties are based on a formal description of program behavior. A description of program behavior is also known as the semantics of a programming language. There are several approaches to formal semantics. Operational semantics [40] models a computation as a step-by-step process in which commands are executed one after the other. The execution process is modeled on a mathematical, high-level representation of the bits and bytes in an actual computer. Denotational semantics [34] formalizes a program as simply its input-output behavior, that is, in the simplest case, how it maps an initial state to a final state. Axiomatic semantics [25] specifies relationships between program states, such as: if a predicate is true of a state, then after executing a command, a somewhat different predicate is true of the next state.

Once a formal semantics is in place, we have a foundation for mathematical reasoning about programs. The properties we want to prove must also be stated formally. In general, proving that a program has a property can be difficult. The ultimate goal of proving that a program meets its entire specification remains elusive and is getting increasingly problematic as software grows in size and complexity.

One kind of program has received particular attention when it comes to proving correctness: compilers. The motivation is that compilers are an important part of today’s computational infrastructure; if the compiler is not working correctly, then all bets are off. Moreover, many aspects of compilers are well understood, which increases the chance that good proof techniques can be found [17]. One aspect of compiler correctness can be stated as follows:
If a program p is compiled to code c, and evaluating p gives result r, then evaluating c gives a result which is a machine representation of r.
Notice that the semantics of both the source language and the target language play a crucial role in the statement of compiler correctness. One of the goals of modern programming language design is to ensure that all programs in the language have a certain correctness property. One example is type soundness [31], which states:
Well-typed programs cannot go wrong.
Here, “well-typed” simply means that the program type checks, and “go wrong” is, intuitively, an abstraction of the error “illegal address—core dump” that one can encounter at the hardware level. The standard way of proving type soundness is to first prove two lemmas [37, 43]:
Preservation: If a well-typed program state takes a step, then the new program state is also well typed.
Progress: A well-typed program state is either done with its computation, or it can take a step.
The Preservation lemma says that if we have type checked a program, then during the computation, all reachable program states will also type check. The Progress lemma says that if we have reached a program state that type checks, then we cannot go wrong in the next step of computation. Together, the two lemmas are sufficient to prove type soundness: when we have type checked a program, Preservation ensures that we will only reach typable program states, and Progress ensures that in those states, the computation cannot go wrong.

The field of program synthesis [7] is concerned with generating programs from specifications. This obviates the need for proofs altogether, provided, of course, that we can prove the correctness of the program generator!

An active research area is that of proving the correctness of optimizing compilers [28]. Some optimizations radically change the code and remain a challenge to prove correct.
Goal: Verify a certificate before executing the program.
Clicking on a link may lead to the downloading and execution of Java bytecode. The bytecode will be verified before execution, giving the same kind of guarantee as we get from type checking source code. This can be done easily because Java bytecode contains type information that guides the verifier. We can view this extra type information as a certificate which is checked by the bytecode verifier. If the raw bytecode does not match the certificate, then the bytecode will not be executed. The combination of code and certificate achieves a level of tamperproofing: one can try to tamper with either the code, the certificate, or both, but if the end result does not verify, then it will not be executed.

In Java, bytecode verification first and foremost guarantees memory safety. Even if an attacker manages to change the code and the certificate in such a way that the result verifies, the code will be memory safe. It may no longer compute factorial, but it will not crash your computer!

A weakness of the Java bytecode verification model is that after the verification is done, it may be necessary to compile the bytecode in order to achieve efficient execution. This entails that the compiler is part of the trusted computing base: if the compiler is faulty, then the guarantees obtained at the bytecode level are worthless. This raises the challenge of whether we can avoid trusting the compiler and have a certificate for machine code. The idea is that, instead of downloading Java bytecode, we would like to download machine code
If we allow certificates to be written as proofs in a full-blown logic, then the certified code is known as proof-carrying code [36]. This is vastly more powerful and flexible than typed assembly language, but it is also more difficult to produce the proofs. It remains an active research area to develop industrial-strength certifying compilers that produce proof-carrying code. Another active research area concerns producing better certified bytecode representations of high-level programs [3].
Goal: Find bugs faster than via software testing.

In most software engineering efforts, more time is spent on testing software than on writing software. The goal of software testing is to find as many bugs as possible. It is widely believed that no large piece of software can be bug free, even after extensive testing. Part of the testing process can be automated, but testing remains labor intensive and therefore costly. It is particularly time consuming to invent test cases and to figure out what the correct output should be.

Modern programming languages help with bug finding, even without running the program. Structured programming, variables instead of registers, static type checking, automatic memory management, and certifying compilers all contribute to lowering the number of bugs. Still, a program can contain errors in the program logic that won’t be caught by, say, the Java type system.

The idea of model checking [14] is to try to find bugs in a model, that is, an abstraction of the program. If the model faithfully preserves some aspects of the program and eliminates others, then it will preserve some bugs and eliminate others. Thus, if we can find bugs in the model, then those bugs will likely correspond to bugs in the program. Creating good models is difficult. On one hand, we want small models that will make powerful bug finding algorithms feasible, and on the other hand, we want models that are large enough to contain at least some of the bugs.

For example, suppose we have a program and want to check whether all read operations on a file only happen after the file has been opened. We can create a model which eliminates all operations on data other than files, and which for every conditional statement embodies the abstraction that both branches are executable. This abstraction may well eliminate some bugs in the part of the code that is not concerned with files. However, if we find a read on a file without a preceding open operation, then chances are that we have identified a bug in the original program.

A model checking problem has two components: the model and the property we want to check. To enable flexible bug finding, we want to have a property language in which we can express a variety of properties to be checked. Otherwise, we would need a separate model checking algorithm for every property. One of the popular approaches to designing a property language is to base it on regular expressions. For example, we might choose the alphabet {open, read, write, close}, and check the property:
open · read∗
We can understand the property as stating that a file must be opened exactly once, and only after that can it be read any number of times. A somewhat different property could be:
open · ( read | write )∗ · close
The same general model checking algorithm would handle both properties and possibly report bugs. While it is easiest to build the model independently of the desired property, it can lead to faster model checking if the model is property-driven. For example, the first property above has no need for a model with write and close operations.

Model checking has been immensely successful at bug finding for hardware. Software model checking has turned out to be more difficult and remains an active research area [16, 6].
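As a toy illustration of checking a recorded trace of file operations against such a property, the sketch below expresses the second property as an ordinary regular expression over event names separated by spaces; the TraceChecker class and the event strings are made up for illustration, and a real software model checker would of course explore the program model rather than a single trace.

import java.util.List;
import java.util.regex.Pattern;

class TraceChecker {
    // The property open · (read | write)* · close, over events separated by spaces.
    private static final Pattern PROPERTY = Pattern.compile("open( (read|write))* close");

    public static void main(String[] args) {
        List<String> trace = List.of("open", "read", "write", "close");
        String joined = String.join(" ", trace);
        if (PROPERTY.matcher(joined).matches()) {
            System.out.println("trace satisfies the property");
        } else {
            System.out.println("possible bug: trace violates the property");
        }
    }
}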
4 Efficiency
We say that software is efficient when it can do its job using the available resources. A variety of resources can be available to software, including time, space, power, databases, and networks. When resources are constrained, a programming language can help with using them judiciously by offering resource-aware language constructs and compilers. A variety of approaches to resource awareness have been devised, and some of them will be highlighted in the following discussion of several aspects of efficiency.
Goal: Complete the task faster.
For desktop computing, users want spreadsheets that update the fields quickly after changes, they want internet searches to complete quickly, they want games with fancy graphics and computer players that move quickly, and so on. High speed is in practice achieved by a combination of good algorithms, careful coding, good compilers, etc.

A good programming language can contribute to fast execution time in at least two ways. First, the language can make it easier to express efficient algorithms. For example, a recursive algorithm such as quicksort is easier to express in a language such as C than in assembly language because recursion is supported in C. Second, the compiler can have a major impact by doing good instruction selection, register allocation, etc. The difference between a simple compiler and a highly optimizing compiler can be several factors when measuring execution speed.

The traditional role of a compiler is to do its job “way ahead of time” (WAOT), that is, before the compiled code is run. A WAOT compiler can take as much time as it likes, as long as it generates efficient code. Usually, a highly optimizing compiler has the following structure:
A program might use a memory area after it has been deallocated, a bug known as the “dangling pointer” error. A program might also keep allocating heap space indefinitely, a bug known as a “memory leak.” Some of these problems can be avoided by dealing with the heap at a higher level of abstraction and leaving the details to a memory management system. The memory manager is a part of the run-time system which is present during the execution of a program. For example, in Java all dynamic memory allocation is done with expressions of the form
new C()
or something similar, where C is a class name. The memory manager will allocate space for a C object, that is, sufficient heap space for representing the fields and methods of a C object. When the program can no longer reach the object, then the memory manager will automatically deallocate it. The automatic deallocation, also known as garbage collection, ensures that dangling-pointer errors cannot occur.

A type system can help the compiler when it computes how much space is needed for the fields of an object. In Java, a field of type int takes four bytes, while a field of type long takes eight bytes. For a language without a static type system, it may not be easy to determine ahead of time what will be stored in a field. The usual solution is to “box” the data in the field, that is, instead of storing the data itself in the field, the program will store a pointer to the data. Given that all pointers have the same size, it is easy to determine the size of each field.

A compiler can help save data space by doing a data-flow analysis to determine, for example, whether the values stored in a field of type long really go beyond what could be stored in a field of type int. If not, then the compiler can treat the field as if it is of type int, and thereby save four bytes. Such an optimization is particularly important in embedded systems where memory may be limited [4].

The use of dynamic memory allocation can make it difficult to determine an upper bound on the need for heap space. As a consequence, some embedded software refrains entirely from dynamic memory allocation and instead allocates all data in a global area or on a stack. Even without dynamic memory allocation, it can be difficult to determine an upper bound on the need for stack space [12]. In particular, recursion makes it difficult to find such upper bounds. So, some embedded software also refrains from using recursion. A compiler can help save stack space by inlining procedure calls. If a procedure is called more than once, such inlining may in turn increase the code size, creating a trade-off between stack size and code size.

Current research includes developing memory managers that can allocate and deallocate data more efficiently and with smaller overhead [9]. Another active area is the development of methods for statically determining the need for heap space in recursive programs with dynamic memory allocation [26]. Yet another active area is resource-aware compilation [35, 39], where the compiler is told up front about resource constraints, such as memory limits, and the compiler then generates code that meets the requirements, or tells the programmer that it cannot be done.
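To make the contrast with manual memory management concrete, here is a small sketch in Java; the class C and its field are made up for illustration:

class C {
    int[] data = new int[1024];      // the memory manager allocates heap space for this array
}

class AllocationDemo {
    public static void main(String[] args) {
        for (int i = 0; i < 1_000_000; i++) {
            C c = new C();           // allocation: fresh heap space each iteration
            c.data[0] = i;           // use the object
        }                            // each object becomes unreachable after its iteration;
                                     // there is no explicit free, and the garbage collector
                                     // reclaims the space, so dangling pointers cannot occur
        System.out.println("allocated and abandoned a million objects");
    }
}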
5 Concluding Remarks
As the need for software grows, so does the need for better programming languages. New directions include power-aware compilers that can help save energy in embedded systems, and XML processing languages for programming web services.
Acknowledgments. Thanks to Mayur Naik, Vidyut Samanta, and Thomas VanDrunen for helpful comments on a draft of the chapter.
References
[1] Jonathan Aldrich, Craig Chambers, Emin Gün Sirer, and Susan J. Eggers. Static analyses for eliminating unnecessary synchronization from Java programs. In Proceedings of SAS'99, 6th International Static Analysis Symposium, pages 19–38. Springer-Verlag (LNCS 1694), 1999.
[2] B. Alpern, C. R. Attanasio, J. J. Barton, M. G. Burke, P. Cheng, J.-D. Choi, A. Cocchi, S. J. Fink, D. Grove, M. Hind, S. F. Hummel, D. Lieber, V. Litvinov, M. F. Mergen, T. Ngo, J. R. Russell, V. Sarkar, M. J. Serrano, J. C. Shepherd, S. E. Smith, V. C. Sreedhar, H. Srinivasan, and J. Whaley. The Jalapeño virtual machine. IBM Systems Journal, 39(1), February 2000.
[3] Wolfram Amme, Niall Dalton, Michael Franz, and Jeffery Von Ronne. SafeTSA: A type safe and referentially secure mobile-code representation based on static single assignment form. In Proceedings of PLDI'01, ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 137–147, 2001.
[4] C. Scott Ananian and Martin Rinard. Data size optimizations for Java programs. In LCTES’03, Languages, Compilers, and Tools for Embedded Systems, 2003.
[5] David F. Bacon, Ravi B. Konuru, Chet Murthy, and Mauricio J. Serrano. Thin locks: Featherweight synchronization for Java. In Proceedings of PLDI'98, ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 258–268, 1998.
[6] Thomas Ball, Rupak Majumdar, Todd Millstein, and Sriram Rajamani. Automatic predicate abstraction of C programs. In Proceedings of PLDI’01, ACM SIGPLAN Con- ference on Programming Language Design and Implementation, pages 203–213, 2001.
[7] David A. Basin. The next 700 synthesis calculi. In Proceedings of FME’02, International Symposium of Formal Methods Europe, page 430. Springer-Verlag (LNCS 2391), 2002.
[8] Gerald Baumgartner, Konstantin Läufer, and Vincent F. Russo. On the interaction of object-oriented design patterns and programming languages. Technical Report CSD-TR-96-020, 1998. citeseer.nj.nec.com/baumgartner96interaction.html.
[23] James Gosling, Bill Joy, and Guy Steele. The Java Language Specification. Addison-Wesley, 1996.
[24] C. A. R. Hoare. Monitors: An operating system structuring concept. Communications of the ACM, 17(10):549–557, October 1974.
[25] C. A. R. Hoare. An axiomatic basis for computer programming. Communications of the ACM, 12(10):576–580, 1969.
[26] Martin Hofmann and Steffen Jost. Static prediction of heap space usage for first-order functional programs. In Proceedings of POPL’03, SIGPLAN–SIGACT Symposium on Principles of Programming Languages, pages 185–197, 2003.
[27] Brian W. Kernighan and Dennis M. Ritchie. The C Programming Language. Prentice-Hall, 1978.
[28] David Lacey, Neil D. Jones, Eric Van Wyk, and Carl Christian Frederiksen. Proving correctness of compiler optimizations by temporal logic. In Proceedings of POPL'02, SIGPLAN–SIGACT Symposium on Principles of Programming Languages, pages 283–294, 2002.
[29] Microsoft. Microsoft Visual C#. http://msdn.microsoft.com/vcsharp.
[30] Microsoft. The .NET common language runtime. http://msdn.microsoft.com/net.
[31] Robin Milner. A theory of type polymorphism in programming. Journal of Computer and System Sciences, 17:348–375, 1978.
[32] Robin Milner, Mads Tofte, and Robert Harper. The Definition of Standard ML. MIT Press, 1990.
[33] Greg Morrisett, David Walker, Karl Crary, and Neal Glew. From System F to typed assembly language. ACM Transactions on Programming Languages and Systems, 21(3):528–569, May 1999.
[34] Peter D. Mosses. Denotational semantics. In J. van Leeuwen, A. Meyer, M. Nivat, M. Paterson, and D. Perrin, editors, Handbook of Theoretical Computer Science, volume B, chapter 11, pages 575–631. Elsevier Science Publishers, Amsterdam; and MIT Press, 1990.
[35] Mayur Naik and Jens Palsberg. Compiling with code-size constraints. In LCTES'02, Languages, Compilers, and Tools for Embedded Systems joint with SCOPES'02, Software and Compilers for Embedded Systems, pages 120–129, Berlin, Germany, June 2002.
[36] George Necula. Proof-carrying code. In Proceedings of POPL'97, 24th Annual SIGPLAN–SIGACT Symposium on Principles of Programming Languages, pages 106–119, 1997.
[37] Flemming Nielson. The typed lambda-calculus with first-class processes. In Proceedings of PARLE, pages 357–373, April 1989.
[38] Jens Palsberg. Type-based analysis and applications. In Proceedings of PASTE’01, ACM SIGPLAN/SIGSOFT Workshop on Program Analysis for Software Tools, pages 20–27, Snowbird, Utah, June 2001. Invited paper.
[39] Jens Palsberg and Di Ma. A typed interrupt calculus. In FTRTFT’02, 7th International Symposium on Formal Techniques in Real-Time and Fault Tolerant Systems, pages 291–
[40] Gordon D. Plotkin. A structural approach to operational semantics. Technical Report DAIMI FN–19, Computer Science Department, Aarhus University, September 1981.
[41] Massimiliano Poletto and Vivek Sarkar. Linear scan register allocation. ACM Transactions on Programming Languages and Systems, 21(5):895–913, 1999.
[42] Richard L. Wexelblat. History of Programming Languages (Proceedings). Academic Press, 1981.
[43] Andrew Wright and Matthias Felleisen. A syntactic approach to type soundness. Information and Computation, 115(1):38–94, 1994.