Stacking on multiple processors

A forum to ask questions, post setups, and generally discuss anything having to do with photomacrography and photomicroscopy.

Moderators: ChrisR, Chris S., Pau, rjlittlefield

rjlittlefield
Site Admin
Posts: 20777
Joined: Tue Aug 01, 2006 8:34 am
Location: Richland, Washington State, USA
Contact:

Stacking on multiple processors

Post by rjlittlefield »

A recent topic in the Technical Macro gallery touched on the speed of stacking software, and whether it does or does not help to have a multi-core or multi-processor machine.

Here's the quick summary of my experience:
  • Helicon Focus Pro Multiprocessor does use multiple cores/processors, but with diminishing returns. Their web page quotes 1.7 times faster with 2 cores, but only 2.3 times faster with 4.
  • TuFuse uses only one.
  • CombineZM uses only one.
  • ImageJ uses multiple cores/processors for many built-in functions, but the registration and stacking plugins from BIG use only one (and those plugins are agonizingly slow in addition).
I've tickled various developers about this issue, but it's hard to say when things will change. Writing software to exploit multiple and/or specialized processors is harder than you'd think, especially if that wasn't in the design to start with.

Done really well, there are huge gains to be had. One research group in the stitched panorama community has been developing a new stitching code that is specialized to use the SIMD instructions of the IBM/Sony/Toshiba Cell-Processor. They report that it runs twice as fast on a Playstation 3, compared to the fastest available competing code running on an 8-processor AMD PC. The cost-effectiveness is even more striking, bearing in mind that a properly configured Playstation 3 costs only around $450.

There is, however, one huge difficulty: the software has to be redesigned from the ground up to get this sort of performance improvement. Most likely it'll happen, but just when is anybody's guess.

--Rik

Charles Krebs
Posts: 5858
Joined: Tue Aug 01, 2006 8:02 pm
Location: Issaquah, WA USA
Contact:

Post by Charles Krebs »

OK... here's another question for the computer gurus...

Right now, something like Tufuse really monopolizes my computer (single processor) while it's working. I can do certain other things, but it's pretty crippled. So even if the software can't make use of multiple processors, would a new quad core processor allow me to do real work with another "heavyweight" program simultaneously?

(I must say, not only is Helicon much faster, but it also "plays well" with other programs!)

rjlittlefield
Site Admin
Posts: 20777
Joined: Tue Aug 01, 2006 8:34 am
Location: Richland, Washington State, USA
Contact:

Post by rjlittlefield »

Yes, having a dual- or quad-core cpu would help a lot at keeping things responsive even while heavy number crunching is going on. (There can still be an issue if the offending program is hogging your disk.)

But there are a couple of things that may help even with a single processor.

First is to be sure that "hyperthreading" is turned on, if your processor supports it. This will be in the boot rom configuration. When you select hyperthreading, you kinda sorta get two processors. They share so much hardware that you don't actually get much more work done (a few percent), but the two instruction streams are completely separate and are interleaved at a very fine level, so one stream never has to wait for the other to get to a convenient stopping point.

The other possibility is to manually drop the priority of a computational task to "Below Normal" or even to "Low". To do this, get yourself a Task Manager (right-click on any blank space in the task bar), select the Processes tab, highlight whatever task is hogging the cpu, right-click and Set Priority as appropriate. (You'll have to Yes through the warning.) I just now tried this from TuFuse Pro and discovered that the GUI is already launching the tufuse.exe cruncher at Low priority, so manual intervention would not help in this case. But if you're launching tufuse.exe directly from a command line, it will probably help a bunch.
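If you are scripting tufuse.exe yourself, the same effect can be had at launch time instead of fiddling with Task Manager afterwards. Here's a minimal Python sketch; the tufuse command line shown in the comment is hypothetical (check the real syntax), but BELOW_NORMAL_PRIORITY_CLASS is a genuine Windows constant exposed by Python's subprocess module:

```python
import subprocess

# BELOW_NORMAL_PRIORITY_CLASS exists only on Windows builds of Python;
# fall back to 0 (default priority) elsewhere.
flags = getattr(subprocess, "BELOW_NORMAL_PRIORITY_CLASS", 0)

# Hypothetical invocation -- substitute tufuse's actual arguments:
# subprocess.Popen(["tufuse.exe", "-o", "out.tif", "in1.tif", "in2.tif"],
#                  creationflags=flags)
print(type(flags).__name__)  # int
```

The child process then starts below normal priority from its first instruction, rather than hogging the CPU until you remember to demote it.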

--Rik

augusthouse
Posts: 1195
Joined: Sat Sep 16, 2006 1:39 am
Location: New South Wales Australia

Post by augusthouse »

I'm running a Pentium D (Dual Core) 3.4 GHz (2x 3.4). These are the CPUs that were released in between the Hyper-Threading and Core 2 Duo CPUs.

From my understanding it is the OS that directs the traffic, but as Rik mentioned, you can override that to some extent.

I'm running Vista 32 bit.

I just ran a stack via TuFuse (my first - all settings at Default). There were only 22 images (tiff) in the stack (from the D100 6MP converted from RAW in Capture One 4).

It took less than 6 mins to complete the 'fusing'. Not sure how this rates. I only have 2GB of DDR2 RAM and it certainly wasn't having any problems with the process. I had Photoshop open at the same time and it wasn't laboring. I was browsing the forum. The CPU cores were each dancing at about 30% capacity.

These are just my initial observations.

Craig
To use a classic quote from 'Antz' - "I almost know exactly what I'm doing!"

DaveW
Posts: 1702
Joined: Fri Aug 04, 2006 4:29 am
Location: Nottingham, UK

Post by DaveW »

They always said the problem with multicore processors would be that the software to exploit them would lag years behind their introduction, so you would seldom gain the full benefit before the next higher multicore processor came out.

See:-

http://techfreep.com/intel-80-cores-by-2011.htm

In the "old days" each new processor got faster by adding more transistors, but they ran into heating troubles with the Pentium 4, I believe. I have read that Intel will therefore not make individual cores any faster in future, due to those heating problems, but will simply keep ganging them together as multicores.

I gather that the Pentium 4 has now been dropped as the way forward, and Intel's multicore processors are based on an Israeli development of the cooler-running Pentium 3, so you could say a step back out of the blind alley in order to go forwards again.

It looks like adding cores will be the future's equivalent of adding transistors: the core count keeps growing while the individual cores get no faster.

Who is going to write programs for an 80-core processor and have them on the stocks, ready to go when it arrives? Most present software cannot really exploit twin-core processors, let alone quad core.

DaveW

AndrewC
Posts: 1436
Joined: Thu Feb 14, 2008 10:05 am
Location: Belgium
Contact:

Post by AndrewC »

There is some really exciting work going on using the computational cores embedded in high-end graphics cards. Nvidia (and perhaps others) have made available a command interface to access their cores. Why is this important? A high-end card can have 16 or more fully parallel cores with supporting cache etc. Think of the computing power needed to perform realtime "live action" rendering and ray tracing: it's a lot. Nvidia are now working with coders to use their graphics engines to perform heavy-lift computation. The results are impressive - I saw a live demo last week where a low-power laptop with a high-end GPU blew away a multicore/multiprocessor server. It would be nice to see this applied to some benign applications rather than simulating death and destruction :(

lauriek
Posts: 2402
Joined: Sun Nov 25, 2007 6:57 am
Location: South East UK
Contact:

Post by lauriek »

On a connected theme, does anyone have experience of running stacking software under a 64-bit OS, whether XP, Vista or some form of Linux?

I would imagine the extra potential memory could help; XP 32-bit seems to have an absolute 3GB RAM limit. I don't know if there could be any advantage. Do the various stacking programs actually work on 64-bit machines/OSes?

rjlittlefield
Site Admin
Posts: 20777
Joined: Tue Aug 01, 2006 8:34 am
Location: Richland, Washington State, USA
Contact:

Post by rjlittlefield »

I don't have a 64-bit O/S to test stacking on, so these comments are from inference and experience with 64-bit apps at work.

Most of the current stacking packages seem to be built as 32-bit applications. They will probably run under 64-bit operating systems, but will be subject to essentially the same memory limitations because internally they are using 32-bit addresses. (32 bits = 4 GB max.)
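That 4 GB ceiling is just arithmetic on the pointer width, as this quick sanity check shows:

```python
# A 32-bit address can name at most 2**32 distinct bytes.
addressable = 2 ** 32
print(addressable)             # 4294967296 bytes
print(addressable // 2 ** 30)  # 4 (GiB)
```

In practice a 32-bit Windows process usually gets even less than that, since the OS reserves part of the address space for itself.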

ImageJ, being a Java application, can exploit a 64-bit JVM and use however much physical memory is available. It's likely to also run quite a bit faster in that mode. Unfortunately the existing stacking plugins for ImageJ are reported to be quite slow, so there's probably no advantage in going that route, even with the speedup.

If anybody has experience to the contrary, I'd be very interested to hear about it.

--Rik

Graham Stabler
Posts: 209
Joined: Thu Dec 20, 2007 11:22 am
Location: Swindon, UK

Post by Graham Stabler »

Is it possible to divide and conquer using tufuse?

You could combine images 1-5 into one image while combining 6-10 into another, etc.; you can have as many subsets as you have processors. Write a batch file to do this - perhaps you need multiple copies of the exe file, I'm not sure. Then process the processed images.

I can see many reasons why this might not work but thought I would throw it out there.
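A rough Python sketch of that two-pass idea, for what it's worth. The tufuse.exe command line is an assumption (I haven't checked its real syntax), and the commands are only printed here, not executed; in practice the pass-1 commands would be launched in parallel, one per core:

```python
from itertools import islice

def chunks(items, size):
    """Yield consecutive subsets of `size` frames (one subset per core)."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch

frames = [f"img{i:02d}.tif" for i in range(1, 11)]  # ten source frames

# Pass 1: fuse each subset (run these simultaneously, one per core).
partials = []
for n, batch in enumerate(chunks(frames, 5)):
    partials.append(f"partial{n}.tif")
    print("tufuse.exe -o", partials[-1], " ".join(batch))

# Pass 2: fuse the partial results into the final image.
print("tufuse.exe -o final.tif", " ".join(partials))
```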

Graham

lauriek
Posts: 2402
Joined: Sun Nov 25, 2007 6:57 am
Location: South East UK
Contact:

Post by lauriek »

I just got a new computer setup in the last couple of weeks and wanted to post my experience with stacking software speed on my old and new machines.

My old machine was:-

AMD Athlon (Barton core) 2.5GHz
1.5GB RAM

New machine is

Intel Core 2 Quad 2.4GHz overclocked to 3.2GHz
4GB DDR2 RAM, of which Windows XP 32-bit uses either 3.5 or 3.25GB - it seems to vary!

I ran some test stacks last night in CZP and TuFuse. The first stack was around 45 images. (All my input images are uncompressed 8-bit 7.5MP TIFFs, around 23MB each.) It took about 2 minutes in CZP. On my old system this would have taken 15-20 minutes at least. (I didn't have CZP on my old system; the time estimate is for TuFuse.)

Second stack was almost exactly 150 images. This took 12.5 minutes on the new system. (There wasn't a lot of difference between TuFuse and CZP on this; I wasn't measuring to the second and couldn't say which was faster - really not a lot in it!) This would have taken so long on my old machine that I would have left it running overnight - probably at least 1.5-2 hours.

Also, my old machine became quite unusable when stacking (although I realise I could have put the priority of the stacking process down, I didn't want to!) - the new machine is still very responsive while processing stacks.

On a side note, RAW processing is amazing on this machine, I use RawShooter Premium for this and I didn't realise but it does use all cores - it processes each RAW file in about 3 seconds vs about 30 seconds on the old machine!

Now I personally don't like Helicon that much - I much prefer the output from the pyramid algorithms (TuFuse and CZP) - so I don't yet have any option for multi-core stacking, but hey, it's pretty much fast enough on one decent core now!! Having said that, I won't complain if someone makes a pyramid stacker that can use multiple cores, although I hope we will be able to choose how many processors to use!! :D

I am slightly confused as to why the new machine is so much faster than the old at stacking, considering it's running on one core - 3.2GHz vs 2.5GHz doesn't suggest that much improvement. I guess the architectural differences between the old AMD Athlon and the new Intel must make a considerable difference. The new processor also has a lot more on-chip cache, 6MB iirc; I suppose this could help as well...
