Thursday, June 3, 2010

The Future of High Computation Software

Why have I, at best, used only about 20% of my computer's compute power?

Here I am running a big build job on my computer. I rarely see it get above 20% CPU usage. So why am I using so little of my computer's potential?

The two main reasons are poor software and slow disks. I could improve the disk, but I want to talk about the future of software on the multicore, hyperthreaded machine.

I use Windows, and currently when I run a single application (most applications live in a single process), it can run on only one core. This is a limitation of the OS that will need to end soon. It will not be long (within the next five years) before 128-core systems are on the market, so I hope Microsoft is working on the problem. I would hate to put down $1,000 for a new 128-core machine only to find that my application still runs slowly. How disappointing it would be to find that it can't go any faster and is using less than 1% of the computer's potential.

Ideally there will be operating system changes that allow a process to run across more than one processor. I am not sure who will be the first to make this happen. Apple is very forward-thinking, but it focuses on the user experience and is not likely to lead here. I suspect the Linux community will be the first to cross this boundary; it has all the ingredients of a great master's thesis for some university student. I suspect that Microsoft will come late to this game. But I don't think this operating system change is required for managed runtime environments to fully leverage the massive parallelism of future hardware.

The future: Speed through high-level languages and managed runtimes.

In order to use all the parallel capability of computer hardware, software will need to run in a managed environment.

Software written as data objects with methods (actions the objects can be told to perform) has a fairly natural structure. If my object is a file and I want its size, I call the get-size method (action) on the object. This makes the software easy to understand and manage, because it mimics the world as humans perceive it. Yet even when software is modeled as real-world objects, almost all of its use is to achieve functional results: it follows a single path to a single result, which we call a use case.
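The file example maps directly onto code. Below is a minimal Python sketch; the `File` class and its `get_size` method are hypothetical names chosen to mirror the text, not part of any particular library.

```python
import os
import tempfile


class File:
    """Hypothetical data object that wraps a path on disk."""

    def __init__(self, path):
        self.path = path

    def get_size(self):
        """The action (method) we tell the object to perform: report its size in bytes."""
        return os.path.getsize(self.path)


# Create a small file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")

size = File(tmp.name).get_size()
os.remove(tmp.name)
print(size)  # 5
```

The caller never deals with file handles or system calls; it simply asks the object for its size, which is the readability the paragraph describes.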

In the past, if you wanted software to run fast, you would get closer to the hardware and write in a more primitive language. Many people are still stuck in the mindset that C and C++ are a good compromise between speed and reasonably readable code. The approach has proven to be an easy way to turn use cases into speedy results on a single processor with a single core, and often a single thread. This is why many of today's applications will use less than 1% of the power of my future 128-core computer.

Currently it is possible to make this functional path (use case) run in parallel, but only with a lot of non-intuitive, highly specialized code. That code departs from the human view of the world and is difficult to manage. This translates into high cost, which is the main reason few companies currently support the technology.

Soon, managed-code runtime environments will bridge the gap. These environments already exist as Microsoft's .NET CLR and Sun's Java JVM. Currently they still run as a single process on one core. I expect them to start running as process pools with shared-memory connections, to get around the current operating system limitation of one processor core per process. This will allow threads within the application to run across multiple processes.

The second change I see coming to managed-code runtime environments is the automatic generation of massively parallel threads to take advantage of all cores and processor chips. The human-intuitive model coded by the programmer will automatically be converted into the non-intuitive parallel architecture. (Methods will automatically become asynchronous callables, and their return values will be delivered through blocking queues.) As a result, almost all object actions (methods) will run on separate threads, and these threads will span processor cores. This will turn what appears in code to be simple functional objects into massively parallel computing.
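The conversion described in the parenthetical can be sketched by hand. This is a minimal Python sketch, assuming a hypothetical `auto_async` decorator and `wait_for` helper that stand in for what a runtime would generate automatically; `Sensor` is an invented example class. In CPython the global interpreter lock keeps these threads from executing Python code on several cores at once, so the sketch shows the shape of the transformation rather than the speedup.

```python
import queue
import threading
from functools import wraps


def auto_async(method):
    """Hypothetical runtime transformation (not a real CLR/JVM feature):
    run `method` on its own thread and return a blocking queue that will
    receive the outcome instead of returning the value directly."""
    @wraps(method)
    def wrapper(*args, **kwargs):
        outcome = queue.Queue(maxsize=1)

        def run():
            try:
                outcome.put((True, method(*args, **kwargs)))
            except Exception as exc:  # pass the failure to the caller rather than hang it
                outcome.put((False, exc))

        threading.Thread(target=run, daemon=True).start()
        return outcome  # the caller continues immediately
    return wrapper


def wait_for(outcome):
    """Block until the asynchronous method finishes, then return or re-raise."""
    ok, value = outcome.get()
    if not ok:
        raise value
    return value


class Sensor:
    """An ordinary-looking data object; only the decorator makes it concurrent."""

    def __init__(self, readings):
        self.readings = readings

    @auto_async
    def average(self):
        return sum(self.readings) / len(self.readings)


sensors = [Sensor([1, 2, 3]), Sensor([10, 20, 30])]
pending = [s.average() for s in sensors]  # both calls start right away
results = [wait_for(q) for q in pending]  # block only when the values are needed
print(results)  # [2.0, 20.0]
```

The calling code still reads like ordinary method calls; the threading lives entirely in the decorator, which is the division of labor the post expects a managed runtime to take over.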

Software architects have been slow to see this coming. I suspect that in about three years (when 32-core processors hit the market) there will be a sudden race to build this type of functionality into the managed runtime environments of the CLR, JVM, Python, and Ruby. Those who have written code in these high-level languages will quickly see the advantages. Those who optimized code to run low-level and "fast" (on one processor or one thread) will suddenly find themselves obsolete and uncompetitive.

There will be a massive disruption in the software market as current leaders in computational software (much of it optimized for the single-core architecture using low-level languages) are replaced by forward-thinking new players. These new players will leverage high-level languages to build massively parallel computational software. This disruption will have its greatest impact in the GIS, CAD, simulation, and predictive-systems markets.


Unknown said...

I received this comment in email after Steve was unable to post it.
Steve Kommrusch writes:

In addition to multiprocessor computing, HW is evolving toward making modern programmable graphics processors available to application SW. AMD's support of OpenCL and Nvidia's CUDA allow programs to perform floating-point SIMD operations with 10X the floating-point performance of CPUs. In short, SW languages need to evolve, and SW coding styles need to support multiple code streams running in parallel while being managed in a way that is usable for complex SW development.

aap said...

If the input of one operation depends on the output of another operation, using two different cores for those operations will not help -- the second operation must wait for the first to complete. So "almost all operations" might be slightly too optimistic.

In cases where the algorithm itself could be parallelized, I don't see why managed code should be inherently more adaptable than C.

Maggie said...

John Underkoffler points to the future of UI

John Underkoffler's future UI is a bit optimistic about how soon it will reach end users. The UI may be there, but the apps will be slow to follow. I suspect the types of apps this UI will drive will need the software revolution required to harness all of the new parallel multi-core computers.