Teaching Software Engineering

-- lessons from MIT

by Hal Abelson (http://www-swiss.ai.mit.edu/~hal/) and Philip Greenspun (http://philip.greenspun.com)

Presented at the Tenth International World Wide Web Conference (Hong Kong), May 1-5, 2001.

Abstract
This is a report on what we've learned during the first four semesters of teaching a new subject at MIT: Software Engineering of Innovative Internet Applications. We present new ideas in teaching computer science students to build the kinds of applications demanded by society. We discuss methods for involving alumni as teaching assistants and coaches. We argue for the method of helping students achieve fluency by assigning five complete applications for construction in a semester rather than the traditional single problem in a software engineering semester.

Why is software engineering part of the undergraduate computer science curriculum? There are enough mathematical and theoretical aspects of computer science to occupy students through a bachelor's degree. Yet most schools have always included at least some hands-on programming. Why? Perhaps there is a belief that someone with an engineering degree ought to be able to engineer the sorts of systems that society demands. In the 1980s, users wanted desktop applications. Universities adapted by teaching students how to build a computer program that interacted with a single user at a time, processing input from the mouse and keyboard and displaying results graphically. Starting in the early 1990s, however, demand shifted toward server-based Internet applications. With 1000 users potentially attempting the same action at the same instant, the technical challenge shifts to managing concurrency and transactions. Given stateless protocols such as HTTP, software engineers must learn to develop stateful user experiences. Given the ubiquitous network and evolving standards for remote procedure calls, students can learn practical ways of implementing distributed computing.

Once we've taught students how to build Internet applications, it is gratifying to observe their enormous potential. A graduate of MIT in 1980 was, by his or her efforts alone, able to reach only a handful of users. Before graduating, a member of the MIT Class of 2001, however, is able to write a program that hundreds of thousands of people will use. For example, one student team in our course built arfdigita.org, a Web service that provides a way for animal shelters to maintain a current list of adoptable pets and for users to search across all shelters for particular kinds of animals within a specified distance of their home. Two years later, the arfdigita.org site is still running on behalf of several hundred animal shelters nationwide. Another team built a photo sharing service launched to the users of photo.net. Through March 2001, the software built by the students is holding more than 50,000 photographs on behalf of roughly 4000 users.

What deep principles do they need to learn?

To contribute to the information systems of the next 20 years, in addition to the material in the core computer science curriculum, we have to teach students:

object-oriented design where each object is a Web service (distributed computing)
about concurrency and transactions
how to build a stateful user experience on top of stateless protocols
about the relational database management system

As discussed in the introduction, it looks as though distributed computing is finally possible, partly due to the rise of standards such as SOAP and WSDL and partly due to the old adage that "The exciting thing in computer science is always whatever we tried 20 years ago and didn't work." Everyone is familiar with standalone database-backed Web services such as Amazon.com. Most programmers these days could sit down and build something like Amazon.com. But an increasingly common engineering challenge will be building a Web service that pulls data and computing resources from other servers before delivering a summarized result to the user. Students need experience building servers that talk to other servers before delivering a Web, WAP, or voice response to their user.

Universities have long taught theoretical methods for dealing with concurrency and transactions. The Internet raises new challenges in these areas. A dozen users may simultaneously ask for the same airline seat. Twenty responses to a discussion forum question may come in simultaneously. The radio or hardwired connection to a user may be interrupted halfway through an attempt to register at a site. Starting in 1994 there has been a convergence of solutions to these problems, with the fundamental element of the solution being the relational database management system (RDBMS). At a school like MIT where the RDBMS has not been taught, this gives an opportunity to introduce SQL and data modelling with tables. At a school with an existing database course, our course can be used to get students excited about using the RDBMS as a black box before they embark on a more formal course where the underpinnings are explained.

Scientists measure their results against nature. Engineers measure their results against human needs. Programmers ... don't measure their results. As a final overarching deep principle, we need to teach students to constantly measure their results against the end-user experience. Anyone can build a Web service. The services that are successful and have impact are those whose data model and page flow permit the users to accomplish their tasks with a minimum of time and confusion.

What skills do they need to learn?

At MIT we concentrate on teaching principles rather than skills. For example, there is no course in the computer science department that teaches a computer language. Students learn Lisp as a notation for the computer science concepts in our first course. Students write code in Java in the first software engineering course. But we don't go through one language feature per lecture as some schools might.

This sounds worrisome. Suppose that an MIT student spends summers as a camp counselor and graduates without having learned any practical skills. Are badly engineered systems going to be inflicted on users? In the old world, no. We didn't have to teach computer science students any skills because they'd graduate into a job at Hewlett Packard or IBM. Sitting next to an experienced engineer, our graduate would learn his or her craft over a six-year period and emerge, at age 28, to lead a project or become the chief technologist at a small company.

With the Web explosion, however, came an explosion in the number of organizations engaging in software development. Teams are smaller, deadlines are shorter, and there aren't enough qualified project leaders to go around. In the superheated job markets of 1999 and 2000 we noticed a surprising number of our graduates starting off as CTOs of startup companies or in lead engineering roles on Web projects for larger organizations. Thus we are hoping to teach some of the more important skills of an experienced engineer, notably rapid application development and dealing with extreme requirements.

We'd like our students to be able to take vague and ambitious specifications and turn them into a system design that can be built and launched within a few months, with the most important-to-users and easy-to-develop features built first and the difficult bells and whistles deferred to a second version. We'd like our students to know how to test prototypes with end-users and refine their application design once or twice within even a three-month project.

Business decision-makers are no longer shy about presenting software engineers with extreme requirements, e.g., "build a new accounting system for a mid-size company within three months" or "build a publishing system with a customized workflow within one month and, by the way, we'll change our mind mid-stream about how we want it to work." We teach two methods of dealing with extreme requirements. The first is via automatic code generation. Perhaps the system requirements can be represented in a machine-readable form. Then a portion of the software can be generated by a computer program. If the requirements change mid-stream, we need only run the code generator again. The second method of dealing with extreme requirements is via use of a toolkit. For example, in the case of the three-month accounting system project, starting with a toolkit such as SAP or Oracle Applications is probably a much better idea than trying to write all the code from scratch. But using a toolkit requires students to develop the skill of not reading all the documentation and being judicious about which parts of the toolkit they study and adopt.

Survey courses considered helpful?

Suppose that one were convinced that these are the correct topics to teach a computer science undergraduate. Should we teach them one at a time, in-depth? Or should we start with a survey course that teaches all concepts simultaneously in the context of building actual applications?

Students in a traditional computer science curriculum will

spend a term learning the syntax of a language
spend a term learning how to implement lists, stacks, hash tables
spend a term learning that sorting is O(n log n)
spend a term learning how to interpret a high-level language
spend a term learning how to build a time-sharing operating system
spend a term learning about the underpinnings of several different kinds of database management systems
spend a term learning about AI algorithms

Students in MIT course 6.001 (Structure and Interpretation of Computer Programs) learn all of the above in one semester, albeit not very thoroughly. By the end of the semester, they're either really excited about the challenges in computer science or... they've switched to biology.

Survey courses have been similarly successful on the electrical engineering side of our department. In the good old days, MIT offered 6.01, a linear networks course. Students learned RLC networks in detail. But they forgot why they'd wanted to major in electrical engineering. Today the first hardware course is 6.002, where students get to play with op-amps even though they don't yet know what a transistor is!

One of the most celebrated courses at MIT is the Aeronautics and Astronautics department's Unified Engineering. Here is the first semester's description from the course catalog:

Presents the principles and methods of engineering, as well as their interrelationships and applications, through lectures, recitations, design problems, and labs. Disciplines introduced include: statics, materials and structures, dynamics, fluid dynamics, thermodynamics, materials, propulsion, signal and system analysis, and circuits. Topics: mechanics of solids and fluids; statics and dynamics for bodies systems and networks; conservation of mass and momentum; properties of solids and fluids; temperature, conservation of energy; stability and response of static and dynamic systems. Applications include particle and rigid body dynamics; stress and deformations in truss members; airfoils and nozzles in high-speed flow; passive and active circuits. Laboratory exposure to empirical methods in engineering; illustration of principles and practice. Design of typical aircraft or spacecraft elements.

Note that this is all presented in one semester, albeit with double the standard credit hours! For almost every topic in the course description, MIT has one or more full-semester courses exclusively devoted to that topic. Is Unified Engineering a pedagogical success? One piece of evidence is MIT's number 1 ranking by U.S. News in Aero/Astro. More compelling evidence is provided by the fact that you've undoubtedly flown on an aircraft with systems designed by MIT Course 16 graduates. And you're still alive.

Experiences like these led us to develop 6.916, Software Engineering of Innovative Internet Applications, a survey course in building computer systems for collaboration. We've taught it for four semesters and are currently revamping it. In this paper we'll call the original version "6.916 Classic" and the next revision "Nouveau 6.916".

Our survey course

The fundamental characteristics of 6.916 Classic were the following:

problem sets provided the core knowledge
teaching assistants, oftentimes alumni, worked side-by-side with students in a supervised twice weekly lab to help with design issues and questions of taste
projects ran the entire length of term, starting with application design and ending with intense software development
by the end of the semester, each student had built four or five database-backed Internet applications

The students who took our course were typically juniors and seniors in computer science who had completed 6.170, the standard software engineering lab at MIT. Thus they had at least basic debugging skills. We did not assume any knowledge of SQL, HTTP, HTML, or any particular imperative programming language. We tried to limit the class size to between 30 and 45 students.

We spent three hours per week in a traditional classroom. We used this time to teach background principles, explain the inner workings of substrate systems, and encourage thinking about usability. We spent portions of some early classroom sessions on public reviews of student-authored code. But the most important use of the lecture podium was by the students themselves. We devoted approximately one quarter of the classroom time to student groups presenting their projects. Each project was presented first as an application design and development plan, second as a skeletal prototype, and finally as a completed application. The average quality of the presentations got much higher toward the end of the semester.

Much more important than the classroom time was the work that students did on problem sets. Software engineering is a craft and can only be learned by practice. Ideally any craft is learned via apprenticeship to a master. Consequently, we invited alumni who were working as professional software engineers to return to campus on Tuesday and Thursday evenings to coach students during the 6 hours of supervised laboratory time per week. There are perhaps 10 alumni out there for every current MIT undergraduate. Very few universities would find it practical to pay teaching assistants to sit next to students during the design and implementation of custom software. Students report that one of their favorite aspects of 6.916 has been that they get feedback on their work before they submit it for grading, when there is still time to improve their designs.

During the four times that we taught the course at MIT we elected to start the project phase of the class earlier every semester. It seems that the best practice is to automatically drop any student from the course who does not complete the first problem set on time. Thus, approximately two weeks into the semester we were able to assign students to teams with a reasonable expectation that all team members would be present at the end of the term. Project work ran concurrently with problem sets during the middle portion of the course.

At the end of the semester, a student in 6.916 could look back upon four or five completed Internet services. The first ones that he or she built had been done for the problem sets. They won't have been complex. They may not have been built to a very high standard of polish. But their existence enabled nearly all students to become fluent in the arts of designing a data model, specifying a page flow, and implementing the designed system in SQL and a procedural language. Our experience contrasts with typical software engineering courses in which a student builds only one application (or a piece of one application) during the entire semester. Research on simple word association tasks has demonstrated that people who learned to perform quickly but not accurately would have remarkably good recall even months later and, with a bit of practice, could always be made to perform accurately, whereas people who were slow but accurate forgot all of their skills within a month or two. An alternative data set may be collected in your local Starbucks. Ask the people behind the counter how many times they made a cappuccino before they got it right reliably. Usually the answer is at least 10. So how can we expect our graduating students to be fluent at the job of building software systems unless they've built at least 10?

Let's look at some of the elements of 6.916 Classic in more detail.

Problem Sets

The first problem set we called "basics". Students learn how to write and debug a computer program that runs inside a Web server. They learn how to write SQL queries and connect to the database management system from an operating system shell and from a Web script. Students learn a bit about Web standards, e.g., how to read and write data in XML and how to personalize Web services with HTTP magic cookies.
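The cookie-based personalization mentioned above can be sketched in a few lines. This is only an illustration in Python rather than the Tcl or JSP used in the course; the cookie name session_id, the in-memory SESSIONS table, and the handle_request function are assumptions made for the example, not part of the problem set.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical in-memory session store; a real service would keep this
# state in the RDBMS so that it survives server restarts.
SESSIONS = {}

def handle_request(cookie_header):
    """Return (greeting, Set-Cookie value or None) for one stateless request."""
    cookie = SimpleCookie(cookie_header or "")
    morsel = cookie.get("session_id")
    if morsel is not None and morsel.value in SESSIONS:
        # Known visitor: the cookie ties this request to earlier ones.
        state = SESSIONS[morsel.value]
        state["visits"] += 1
        return f"Welcome back, visit {state['visits']}", None
    # Unknown or forged cookie: start a fresh session with a random id.
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = {"visits": 1}
    reply = SimpleCookie()
    reply["session_id"] = session_id
    reply["session_id"]["path"] = "/"
    return "Hello, new visitor", reply["session_id"].OutputString()
```

Each HTTP request arrives with no memory of the previous one; the only thing linking them is the opaque identifier that the browser sends back in its Cookie header.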

At MIT all of the students are in one room during their work on the first problem set. Each student has a computer running an ssh client, an X server, and a Web browser. As it happens all of the desktop computers are MIT Athena workstations (built on low-end commercial Unix machines, such as the Sun Ultra 10). They use this workstation to connect to one of four servers in a machine room. Each server is a pizza-box Sun server running Solaris, the Oracle RDBMS, one Web server process per student, and one GNU Emacs text editor process per student. Typically we can comfortably support 1 student for every 128 MB of RAM on the server.

The course materials have been translated by teachers at other universities to work in Microsoft SQL Server. Most of the important thinking in the course is done during data modelling, page flow design, and SQL transaction design. Development of abstractions may be done in procedural languages that run inside the RDBMS, e.g., the Oracle system can execute PL/SQL and Java code. However, there is also a mechanical step of building code for presentation, i.e., wrapping the results of an SQL query in an HTML template. The course materials supported presentation layer development in either Java Server Pages (JSP) or the Tcl scripting language. Portions of the course materials were also translated to work with the Microsoft .NET system, which favors a presentation layer using Visual Basic. Of the three options, students report that they are able to develop pages fastest in Tcl and this is what we've used primarily at MIT (though, as noted below, in Nouveau 6.916 we're not going to mandate any particular set of tools). See http://philip.greenspun.com/teaching/psets/ps1/ps1 for the full problem set (Oracle/AOLserver version).
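As a rough illustration of the "mechanical step" of wrapping query results in an HTML template, here is a minimal Python sketch. It uses SQLite in place of Oracle and Python in place of Tcl or JSP; the PAGE template, the render_list function, and any table it is pointed at are assumptions for the example only.

```python
import sqlite3
from html import escape
from string import Template

# Hypothetical page template; $title and $items are filled in per request.
PAGE = Template("<html><body><h2>$title</h2><ul>\n$items</ul></body></html>")

def render_list(conn, title, query, params=()):
    """Run a query and wrap the first column of each row in an HTML list."""
    rows = conn.execute(query, params).fetchall()
    # Escape every value so that data from the database cannot inject markup.
    items = "".join(f"<li>{escape(str(row[0]))}</li>\n" for row in rows)
    return PAGE.substitute(title=escape(title), items=items)
```

The interesting decisions (what the query returns and in what order) live in the SQL; the presentation code only formats whatever comes back.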

In the second problem set ("reservation system"), students built a collaborative conference room scheduling system. This raises the problem of concurrency in a natural manner. Every student can understand that you don't want to book two people into a room at overlapping times. The problem set is designed so that nearly every student can get the basic scheduler working quickly. That's V1.0. Exercise 5 in the pset asks the students to build version 2.0, in which powerful "uber users" can bump regular users out of their reserved slots and in which for some rooms the reservations must be approved by a room administrator, who is notified by the server sending email upon someone requesting a room. In the final exercise of the problem set, we ask the students to mark certain rooms as requiring fees. Users who wish to book those rooms must supply a credit card number. At MIT we hook up the servers to a live merchant account at CyberCash. Thus our better students will be able to open their credit card statements in the middle of the semester and discover a few dollars in charges made by their own Web server. For students who can't get this far during the 10 days that we allot for the problem set, they are still left with a working V1.0 or V2.0 reservation system. Thus do the students learn the virtues and satisfactions of incremental development and rapid prototyping. See http://philip.greenspun.com/teaching/psets/ps2/ps2 for the complete problem set.
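The core concurrency issue in a V1.0 scheduler can be sketched as follows. This is a simplified Python illustration, not the problem set's solution: it uses SQLite instead of Oracle, whole-hour slots, and a hypothetical reservations table and reserve function. The point is that the overlap check and the insert run inside one transaction.

```python
import sqlite3

def open_db():
    # In-memory SQLite stands in for the Oracle RDBMS used in the course.
    # isolation_level=None lets us issue BEGIN/COMMIT explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE reservations ("
        " room TEXT NOT NULL, username TEXT NOT NULL,"
        " start_hour INTEGER NOT NULL, end_hour INTEGER NOT NULL,"
        " CHECK (start_hour < end_hour))"
    )
    return conn

def reserve(conn, room, username, start_hour, end_hour):
    """Book [start_hour, end_hour) unless an existing reservation overlaps."""
    # BEGIN IMMEDIATE takes the write lock up front, so no other connection
    # can insert a conflicting row between our check and our insert.
    conn.execute("BEGIN IMMEDIATE")
    try:
        clash = conn.execute(
            "SELECT 1 FROM reservations"
            " WHERE room = ? AND start_hour < ? AND end_hour > ?",
            (room, end_hour, start_hour),
        ).fetchone()
        if clash is not None:
            conn.execute("ROLLBACK")
            return False
        conn.execute(
            "INSERT INTO reservations VALUES (?, ?, ?, ?)",
            (room, username, start_hour, end_hour),
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Two intervals overlap exactly when each starts before the other ends, which is what the WHERE clause tests; back-to-back bookings (one ending at 11, the next starting at 11) are allowed.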

During the second problem set we introduced students to the idea of using a toolkit when developing Internet services. Instead of building their own registration, user management, and user group system for the conference room scheduler, the students installed the free open-source ArsDigita Community System. This toolkit was originally developed at the MIT Laboratory for Computer Science and thus the authors had some personal affection for it. As noted below, however, we've concluded that using a toolkit was a mistake and have designed Nouveau 6.916 differently.

After the first two problem sets, instructors are free to mix and match problem sets. We have a "family tree" problem set that teaches students to build a collaborative authoring environment for structured multi-media data. They learn the unfortunate intricacies of storing and indexing large objects (photographs and stories) in the Oracle database while building an application that all of their relatives can use to upload facts about their family tree. If you pay Federal Express $6 to deliver a letter, you get visibility. Using a Web browser, you can find out where in the system your package is. If you pay MIT $120,000 to educate your child, why shouldn't you be able to see what is happening in the classroom? Well, in 6.916 Classic the students' parents could!

Students said that the "metadata" problem set was very valuable for speeding work on their projects. Students were asked to build a knowledge management system by writing a computer program to write all of the computer programs. I.e., we gave them a machine-readable language for representing the system capabilities and user experience and asked them to write a program to generate the SQL data model and then the scripts to support the user experience. See http://philip.greenspun.com/teaching/psets/ps4/ps4 for details.
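To make the idea of "a program to write the programs" concrete, here is a small Python sketch that generates an SQL data model from a machine-readable description. The dictionary format, the SQL_TYPES mapping, and the PUBLICATION example are invented for illustration; the actual problem set defines its own representation.

```python
# Hypothetical mapping from abstract field types to SQL column types.
SQL_TYPES = {"string": "VARCHAR(4000)", "integer": "INTEGER", "date": "DATE"}

def generate_create_table(spec):
    """Turn one object-type description into a CREATE TABLE statement."""
    # Every generated table gets a surrogate primary key named <table>_id.
    columns = [f"    {spec['name']}_id INTEGER PRIMARY KEY"]
    for field in spec["fields"]:
        column = f"    {field['name']} {SQL_TYPES[field['type']]}"
        if field.get("required"):
            column += " NOT NULL"
        columns.append(column)
    return f"CREATE TABLE {spec['name']} (\n" + ",\n".join(columns) + "\n);"

def generate_schema(specs):
    """Generate the whole data model, one statement per object type."""
    return "\n\n".join(generate_create_table(spec) for spec in specs)

# Example description of one object type in the knowledge management system.
PUBLICATION = {
    "name": "publication",
    "fields": [
        {"name": "title", "type": "string", "required": True},
        {"name": "published_on", "type": "date"},
    ],
}
```

If the client changes the requirements mid-stream, only the description changes; rerunning generate_schema([PUBLICATION]) produces the updated data model, and the same description could drive generation of the page scripts.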

The "SOAP/SDL" problem set teaches students about using emerging Web standards for services that invoke procedures on other services. Students write programs that read Service Description Language (SDL) contracts and generate SOAP requests. Students learn to describe their own services in SDL. During the Fall 2000 semester Microsoft Research provided a SDL/SOAP-compliant version of the Terraserver satellite imagery service and our students were able to build systems that seamlessly integrated Terraserver data. See http://philip.greenspun.com/teaching/psets/services/tcl/.
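For readers unfamiliar with SOAP, the following Python sketch shows roughly what "generating a SOAP request" involves: wrapping a procedure name and its arguments in a SOAP 1.1 envelope. The build_soap_request function, the service namespace, and the method and parameter names are hypothetical; they are not the Terraserver interface, and a real client would derive them from the service description and then POST the result over HTTP.

```python
import xml.etree.ElementTree as ET

# Namespace of the SOAP 1.1 envelope.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"

def build_soap_request(service_ns, method, params):
    """Serialize a remote procedure call as a SOAP 1.1 envelope string."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The call element is named after the method and lives in the
    # service's own namespace; each parameter becomes a child element.
    call = ET.SubElement(body, f"{{{service_ns}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")
```

For example, build_soap_request("urn:example:imagery", "GetTile", {"lat": 42.36, "lon": -71.09}) yields an envelope whose body contains a single GetTile element with lat and lon children.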

Content management is a serious problem at most large Web sites. Fundamentally the problem is one of computer support for a collaborative workflow. We specify a workflow and approval process and ask the students to implement a system that supports them. This is a difficult problem set and should be assigned last. See http://philip.greenspun.com/teaching/psets/ps3/ps3.

Projects

For every project in 6.916 Classic, we insisted on having a client. This is a person who can describe desired capabilities for an information system but offers no hint as to how to build it. The best clients are people who are in fact passionate about some sort of Internet service and completely clueless about all matters technical. Good sources of clients are dotcom CEOs, MBA students, non-profit organization directors, and university administrators. Students have the best opportunity for success when the client has a clear idea of what features are essential and when the client responds quickly to email announcing the availability of a revised prototype. But clients who are fickle and change their minds about the application upon seeing a first prototype are instructive for the entire class. Welcome to the real world!

For making sure that projects stay on track, one valuable technique was to have each student group, with its TA, present privately to the lecturers once per week during the evening lab hours.

The best projects were ones with clients who had the wherewithal to extend and maintain the service after the course was over, possibly by hiring the students who built it. For example, we've had students build a volunteer matching and event coordination system for a group within MIT. The group was already up and running, doing a dozen or more events every year and managing thousands of volunteer-days. So they had a real interest in better computer support for their work and the ability to launch the system within the MIT community.

At the end of the semester we drill into the students' heads the cold hard facts of the world: nobody owes them attention. We have each student group prepare an overview page that is a single HTML document, with a few screen shots, that demonstrates the major functions of the Internet service that they've built. Visit http://philip.greenspun.com/seia/gallery/ to see these pages.

Exams

Students do the problem sets. They work on their projects. Do they come to lecture? Read the various journal articles, handouts, and textbooks assigned? Develop the ability to think critically about the system architecture, data model, and user experience? Not all of them used to. So we started giving exams!

The 6.916 Classic mid-term is a one-week take-home exam. A student who is really up-to-date with all of the reading might be able to complete it in one or two hours. We test a student's ability to go from vague business requirements to a reasonably specific set of Web service requirements and explain why the resulting service is an effective use of the Internet.

The final exam was given ex-camera. We said:

You can take it where you want to and use any resources you like, except that you must do the exam by yourself. In order to do the exam, you will need access to a Web browser. We suggest that you find a place to take the exam where you have quiet, private use of a web browser, a text editor, and email.

We gave the students a six-hour block of time in which to complete the exam, though we expected them to spend no more than two hours. On the final, we asked students to normalize an SQL data model, critique the usability of a few public Web services, think about concurrency control and transaction management with the RDBMS, and address the issues of community building online.

Running a final exam with public Internet services yields surprising results. For example, in one question we asked the students to visit carnival.com and try to figure out if they offered gay-friendly cruises (they do). But the carnival.com site went down for many hours during the afternoon that we gave the exam. A follow-up question asked the class to

Visit http://boards.gay.com and see if you can find any information on gay-friendly Caribbean cruises (hint: start in the "travel" category).

One student, not a native English speaker, ignored our advice to start in the "travel" category on the gay.com bboards. Instead, he wrote that he typed "cruising" into the search engine on gay.com...

Adoption by other universities

We did not set out to export our course to other universities. However, because of (1) effective use of the Web, and (2) a clear policy permitting reuse, the course exported itself.

We wrote three textbooks for the course, plus lecture notes, handouts, and problem sets. Our preferred word processor is the Emacs text editor and we like writing directly in HTML. Thus the natural place to author course content was on a Web server and we simply left the materials in place for student access. The two authors having been steeped in the open-source software tradition for a combined total of five decades, it seemed natural to include explicit statements permitting reuse.

The result? Students at about 15 universities worldwide have substantially taken 6.916, with students at another few hundred schools having been pointed to our textbooks. A lot of the students who've taken an exported 6.916 are studying at a "usual suspects" school, e.g., Caltech, University of California Berkeley, UCLA, University of Massachusetts, University of Michigan, University of Pennsylvania. But we've had surprise interest and success at schools in unlikely spots such as Australia, Guatemala, Israel, and Spain.

Use in Industry

The problem sets from 6.916 have proven very popular in industry. Professional software developers are able to do three or four problem sets in two- or three-week "boot camps". When the boot camp starts you have a group of programmers. Two or three weeks later you have a group of Web developers. If they did not know SQL, HTTP, or how to design an HTML experience before the boot camp, they are facile with these systems after a week or two. A working programmer already has experience turning client specifications into a prototype. Thus the project aspect of 6.916 is not essential for an industrial version of the course.

More than 5,000 industrial programmers have trained themselves using the problem sets, either in boot camps on-site at companies such as Hewlett-Packard, at home with their personal computer, or at free boot camps run by ArsDigita Corporation.

Lessons from MIT labs and ArsDigita University

We have some related anecdotal experience with students learning computer science and software engineering in Cambridge, Massachusetts. The first observation is that the best programmers among the undergraduates at MIT are those who've worked on research projects (UROPs) at an MIT laboratory. For example, the Artificial Intelligence Laboratory, the Laboratory for Computer Science, and the Media Lab are renowned for inspiring undergraduates to build ambitious software systems. There are some similarities between a successful UROP and a 6.916 project. In both cases there is a client with a rough sketch of what is to be built. In both cases the student will meet periodically with the client to figure out how to improve and extend what has been built so far.

Another data point comes from ArsDigita University. This is a one-year post-baccalaureate program in computer science. Instead of dispersing after class, the 36 students do all of their work together in one room full of desks. Instructors who've taught the same material at MIT and ArsDigita University report that the ADU students perform better on examinations and seem to have learned more, despite having generally come from a non-technical background. We attribute the success of ADU in large part to the fact that the students are constantly available to each other and also that teaching assistants are available most of the time in the same room. In 6.916 we think that the shared supervised laboratory is the most valuable part of the course at the beginning of the semester. Students might otherwise have gotten stuck because they didn't know where to find the error log or some other such simple mechanical fact.

Warts in 6.916 Classic

In March 2001 we stepped back once more to ask "What hasn't worked about 6.916?"

The project aspect of 6.916 Classic has been both its greatest glory and its greatest failing. Learning on projects was not very uniform. Some students ended up with projects that exposed them to a wide range of challenges but others skated by doing something trivial. Because each project had different goals that were client-dependent, the students weren't able to have very meaningful exchanges concerning their projects amongst themselves. Suppose student group A was presenting something with a clumsy interaction design. The other students couldn't be absolutely sure that it could be done in a more user-friendly fashion because they'd not had to build anything similar. Furthermore, the students attending the presentation did not have full information about what the client might have specified or requested. Finally, professors at other universities complained that, while they were attracted to the 6.916 course materials, they couldn't see how the project aspect was manageable in their 300-person classes. We sympathized because even with 30 students it was a bit of a challenge every semester to make sure that every student team had a reasonable client and every client had a reasonable student team.

Though as noted above the authors had some personal affection for ArsDigita Community System (ACS), it turns out that its use in 6.916 Classic has been problematic. On the plus side, the use of ACS on projects often meant that students were able to build sophisticated systems very quickly. ACS 3.4, for example, provides out-of-the-box capabilities that are a superset of what you see at the 100,000-user online community photo.net. However, the downside was that many students didn't reflect carefully on what the user experience ought to be for, say, discussion forums, or user registration. They just accepted the ACS-imposed data model and page flow uncritically. This was a serious problem because the ability to think critically about data model and page flow is the most important skill for an Internet service developer.

We've never been happy with the amount of distributed computing that we've been able to include in 6.916 Classic. We touch on this subject in the first problem set ("basics") and return to it in the SOAP/WSDL problem set but the core of the course was building standalone database-backed Internet services. This might not be such a bad thing. Students need to crawl before they can walk and run. However, many of the most interesting and challenging problems of the coming decades will center around the Internet and Web as a distributed computing environment. An "island" site such as amazon.com or photo.net is probably less representative of the future than http://philip.greenspun.com/WealthClock (Bill Gates Personal Wealth Clock).

On a purely practical level, we have to consider that five years from now people will laugh if a student shows someone an application that can only be used from a Web browser. An engineer should know how to build an application that is useful from the Web, from a mobile phone (WAP), and to a human who wishes only to speak and listen. Many of our students have already built projects with WAP interfaces and the adaptation to WAP usually only takes an evening or two. We've as yet only done one student project with a voice interface. All the students need to become, uh, conversant with VoiceXML and more advanced speech processing systems (see http://philip.greenspun.com/seia/voice/).

Finally, we've not been 100 percent happy with our information systems for course management. In partnership with the MIT Sloan School, we've been improving our systems that keep track of (1) who is teaching which courses, (2) who is taking which courses, (3) what are the assignments for a course and when are they due, (4) what grades and comments has a student received, etc. One nice feature of the systems that we've launched so far is that students can go to a single Web page and see a unified calendar of all their obligations. But much more important is that these kinds of systems theoretically enable alumni tutors to see what is going on in an MIT course and offer useful assistance to current students. One of the most important aspects of 6.916 Classic is that we've been able to bring alumni back to campus to share their industrial software engineering expertise. But the process of tutoring hasn't been very structured. So we've not been able to incorporate TAs that aren't local to the students and able to meet face-to-face.

Nouveau 6.916

In an attempt to fix the problems mentioned in the previous section we're developing a new textbook and problem sets, banning the use of toolkits, and limiting the range of projects.

First, let's discuss the projects. We want all the students to be working on projects that are vaguely similar. We've settled on the challenge of online learning communities. An online learning community is any system where users are attracted by a body of magnet tutorial content, can post their comments on that content, and can ask and answer questions of each other. Examples of these communities are public sites such as photo.net and corporate knowledge sharing systems. The most successful ecommerce sites, e.g., Amazon, have strong similarities to the best online learning communities, with users able to collaborate via reader reviews, lists of favored items, uploaded portraits and self-descriptions, and rankings of users according to magnitude of contribution to the community. For Nouveau 6.916 the attractions of settling on online learning communities are (1) the problem area is infinitely rich and challenging due to the idiosyncrasies of human beings and the learning process, (2) it is easy to find small non-profit organizations that want to operate public online learning communities, (3) it is easy to find medium-sized enterprises that want to operate knowledge sharing systems, and (4) in a class of 300 or 400 at a public university, where it isn't practical to find 100+ clients, the student teams can build sites related to personal interests and passions.

Second, Nouveau 6.916 does not mandate the use of a toolkit. We still think that students should learn how to engineer using a toolkit, and a portion of the lecture time and student presentations will be devoted to how widely used toolkits solve some of the same problems that students are attacking. But if a student wants to build something idiosyncratic using a toolkit, that should be done in a second-semester special projects course or as a senior thesis project.

Because we're no longer using a toolkit the student projects will proceed more slowly. Each problem set will treat the construction of a module of their project in a deliberate and careful fashion (see syllabus below).

Third, because all of the projects have a predictable shape we'll be able to introduce distributed computing challenges merely by having students offer services to each other.

Fourth, introduction into the problem sets/projects of WAP and speech interfaces will be straightforward. Students without mobile phones can use WAP emulators to debug systems that they've developed. Free services such as tellme.com will allow students to debug voice applications written in VoiceXML. It would be nice to provide students with practical exposure to superior conversational speech interfaces but these are research systems (see JUPITER at http://groups.csail.mit.edu/sls/applications/jupiter.shtml) and right now are probably impractical for use in a tight survey course.

Finally, we will no longer mandate that all students in a given class use the same tools. If two students have used PostgreSQL, Apache, and mod_perl during their summer job and want to do their coursework using the same infrastructure, fine. If a student wants to use Windows, SQL Server, Microsoft .NET, and C#, fine. If a student wants to use Linux, Oracle, and AOLserver/Tcl, fine. If a student wants to use Solaris, Oracle, and Tomcat/Java, fine. Students who want to use course-provided infrastructure rather than their own machines will have a more limited choice (at MIT we'll probably offer them Unix/Oracle/AOLserver and Microsoft .NET). And there is no guarantee that the teaching assistants will be able to understand student-authored Perl. But except for the stipulation that the source of persistence must be an ACID-compliant RDBMS, we will impose no technology restrictions on students.
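
One way to see why an ACID-compliant RDBMS is the single stipulation: a multi-statement update either happens completely or not at all. The following is a minimal sketch only. It uses Python's built-in sqlite3 module purely so that it is self-contained (the infrastructures named above are Oracle, SQL Server, and PostgreSQL), and the table names and the register_user function are hypothetical, not taken from the course materials.

```python
import sqlite3

def make_db():
    """Create an in-memory database with two related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE memberships ("
        " user_id INTEGER NOT NULL REFERENCES users(id),"
        " group_name TEXT NOT NULL CHECK (group_name <> ''))"
    )
    conn.commit()
    return conn

def register_user(conn, email, group_name):
    """Insert a user and a group membership as one atomic unit."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if any statement raises.
    with conn:
        cur = conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        conn.execute(
            "INSERT INTO memberships (user_id, group_name) VALUES (?, ?)",
            (cur.lastrowid, group_name),
        )

def count(conn, table):
    """Return the number of rows in one of the two tables above."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

If the second INSERT violates the CHECK constraint, the first INSERT is rolled back as well, so a half-registered user never becomes visible to the 1000 other users hitting the site at the same instant.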

The new syllabus

Basics: more or less the old pset 1 but remove references to specific
technologies and put those into a supplement.  On MIT calendar: 1.5
weeks.

Global Competitive Analysis: write down some usage scenarios and work
through them on some other sites on the Internet that have similar
objectives.  On MIT calendar:  0.5 weeks.

User registration and management.  Support grouping of users.  Include
the issue of whether or not to implement a party system.  On MIT
calendar: 1 week.

Content management.  Need to capture authorship, approval, moderation,
reference ("X is a comment on Y" or "A is a response to B").  On MIT
calendar: 2 weeks.

Software Modularity.  Need a way to group all the code for a module,
record the docs, publish APIs to other parts of the system, read
configuration parameters.  On MIT calendar: 1 week.

Discussion.  The most basic Web service, built on top of the content
management system.  Support categorization, moderation, breakout and
reassemble.  Include a user test.  On MIT calendar: 2 weeks.

WAP and voice interfaces.  Build a WAP interface that lets someone on
a mobile phone participate in the community.  Build a voice interface
that lets someone on a regular phone participate in the community.  In
2001 this will most likely use VoiceXML and the tellme.com
infrastructure, though it would be nice to use a more sophisticated
conversational system such as LCS JUPITER. On MIT calendar: 1 week.

Distributed Computing.  The world of Web services.  Introduces protocols
such as SOAP, WSDL, and UDDI.  On MIT calendar: 1 week.

Scaling gracefully.  Geospatialization.  An "interesting person"
system.  On MIT calendar: 1 week.

Full-text search.  On MIT calendar: 1 week.

Personalization.  Users picking subject areas of interest,
categorization of content.  Then getting more advanced with full-text
comparison tools to measure similarity and dissimilarity to items
previously marked by a user as good or bad.  On MIT calendar: 1 week.

Goodbye to all that (manual coding).  Autogeneration of data model and
user experience scripts.  Either building a structured knowledge
management system within the subject area of the site (a la problem
set 4) or finding some similar challenge that is idiosyncratic to the
service being developed.  On MIT calendar: 1 week (make it a bit
easier than the old pset 4).

Writeup.  A final overview paper with screen shots illustrating the
most important aspects of the Web service built.  Designed to be no
more than 5 pages; one continuous Web document that can be entirely
consumed by scrolling.  

Exams: easy mid-term and harder final.  Concentrate on ability to
think critically about the user experience plus some RDBMS
fundamentals.
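
As an illustration of the kind of data model the Content management unit above asks students to think through, here is a minimal sketch that captures authorship, an approval flag for moderation, and a self-referencing link for "X is a comment on Y". It is one possible design under stated assumptions, not the course's reference solution: the table, column, and function names are hypothetical, and sqlite3 stands in for whichever RDBMS a student chooses.

```python
import sqlite3

# Hypothetical single-table content model: every article, comment, and
# response is a row; refers_to points at the row it comments on.
SCHEMA = """
CREATE TABLE content (
    id         INTEGER PRIMARY KEY,
    author     TEXT NOT NULL,               -- authorship
    body       TEXT NOT NULL,
    approved   INTEGER NOT NULL DEFAULT 0,  -- 0 = awaiting moderation, 1 = approved
    refers_to  INTEGER REFERENCES content(id),  -- NULL for top-level content
    ref_type   TEXT CHECK (ref_type IN ('comment', 'response'))
);
"""

def open_db():
    """Create an in-memory database holding the content table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def post(conn, author, body, refers_to=None, ref_type=None):
    """Store a new item (unapproved by default) and return its id."""
    with conn:
        cur = conn.execute(
            "INSERT INTO content (author, body, refers_to, ref_type)"
            " VALUES (?, ?, ?, ?)",
            (author, body, refers_to, ref_type),
        )
    return cur.lastrowid

def approve(conn, content_id):
    """Moderator action: mark an item as approved."""
    with conn:
        conn.execute("UPDATE content SET approved = 1 WHERE id = ?", (content_id,))

def visible_replies(conn, content_id):
    """Return (author, body) for approved items that refer to content_id."""
    rows = conn.execute(
        "SELECT author, body FROM content"
        " WHERE refers_to = ? AND approved = 1 ORDER BY id",
        (content_id,),
    )
    return rows.fetchall()
```

Keeping comments and responses in the same table as the content they refer to is a design choice that lets a discussion forum be layered on top later; students might reasonably decide on separate tables instead, which is exactly the kind of trade-off the unit is meant to surface.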

The new textbook to support this syllabus is available at http://philip.greenspun.com/seia/.

Early results with the new course

As this article goes to press, we're halfway through teaching Nouveau 6.916 at ArsDigita University (ADU). The ADU calendar is one month per course and hence it may be premature to draw conclusions about how things will work at standard universities. The new material is definitely forcing students to spend more time thinking about data modeling. The new syllabus sharply separates students with strong debugging skills from those with weak ones, i.e., instructors will want to make sure that they enforce completion of a standard software engineering course as a prerequisite.

Students at ADU chose the following infrastructures, in declining order of popularity:

Microsoft .NET (beta 1), usually with Microsoft SQL Server underneath
AOLserver Tcl pages querying from Oracle 8.1.7 on Linux
Apache plus Java Server Pages, Python, or Ruby (an object-oriented scripting language) querying from PostgreSQL on Linux

The .NET and AOLserver students are way ahead of the Apache users. Students using Apache sometimes spent an entire week, 12 hours per day, getting various modules and database connectivity tools to compile, only to end up where the AOLserver or .NET users had been after 30 minutes. Students had no trouble installing PostgreSQL but were unable to get the answers to simple questions from the available books and documentation. No students discovered the table inheritance features of PostgreSQL, for example, and some built data models that could have used these features to great advantage.

Conclusions

Our experience with 6.916 leads us to believe that a significant improvement in students' software engineering skills can be achieved via the following elements:

challenging students to build four or five applications over a 13-week semester (note that these applications can be submodules in a single online learning community)
drawing on the alumni to bring professional software engineers onto the campus to coach students
a terminal room where students can work together on a scheduled basis
projects with real clients
an emphasis on oral and written presentation of results

Finally, it has been fun to watch our students graduate and go onto the job market. During job interviews they are able to point their interviewer to the URL of the running Web service that they developed during 6.916. Oftentimes, the student-built service is more sophisticated and is running on a more reliable infrastructure than most of the Internet applications launched on the public Internet by the interviewer's company!

Acknowledgments

Our first thanks must go to our students, who taught us what worked and what didn't work. It is a privilege to teach at MIT and every instructor should have the opportunity once in a lifetime.

We did not teach those four semesters alone. The students' most valuable partners were the teaching assistants, most of whom were MIT alumni volunteering their time: David Abercrombie, Tracy Adams, Ben Adida, Eve Andersson, Mike Bonnet, Christian Brechbühler, James Buszard-Welcher, Bryan Che, Jack Chung, Randy Graebner, Bruce Keilin, Jesse Koontz, Chris McEniry, Henry Minsky, Neil Mayle, Dan Parker, Richard Perng, Jon Salz, Lydia Sandon, Mike Shurpik, Steve Strassman, and Jessica Wong.

Michael Dertouzos and Andrew Grumet were our partners on the lecture podium. Michael gave us an early push into voice applications. Andrew provided leadership in the mobile browser (WAP) arena.

We've gotten valuable feedback from instructors at other universities using these materials, notably Aurelius Prochazka at Caltech.

philg@mit.edu

Reader's Comments

If the Nouveau 6.916 is taught standalone as AD University did, "completion of a standard software engineering course as a prerequisite" will keep out a number of people who would otherwise benefit.
In that case, it would be helpful to provide a preliminary some-weeks 'bootcamp' style software engineering course before the one-year ordeal began. Such a course would also serve to weed out those from other fields with weaker tree building and debugging skills.

-- David R. Levitt, August 27, 2001

I worry about the RDBMS side here; it looks to me a bit as if it's being treated as a dumb data store, and is ignoring the integrity and logical aids that the relational model gives you. In particular, comments such as "No students discovered the table inheritance features of PostgreSQL, for example, and some built data models that could have used these features to great advantage," are very disturbing; that particular feature is not only a failed experiment in a non-relational data management style that's simply never been removed from the source, but is actively broken, allowing things such as duplicate keys.
In doing "enterprise" applications, many of them with large web-based components, I've found the hardest part is producing a logical model of the business or service, yet that model is one of the most valuable things for both technical and business development. It's the relational model of data management that's given me the ability to work out and specify these models.
Building a relational model compares to developing in a non-relational way on an SQL DBMS in the same way that working in LISP compares to working in PHP3. All code in PHP3 is going to be bad anyway, due to the lack of power in the language, but the LISP programmers will at least do all that the language is capable of, whereas the PHP3 hackers will just produce a sad, illogical mess.

-- Curt Sampson, September 27, 2005
