Why do I get image corruption?

JKT
Posts: 157
Joined: Fri Oct 28, 2011 9:29 am
Location: Finland

Post by JKT »

...or between RAM/CPU and SSD as happened in my case. Bad memory or bad SSD are also possible, but those you already included.

Macro_Cosmos
Posts: 900
Joined: Mon Jan 15, 2018 9:23 pm
Location: Sydney

Post by Macro_Cosmos »

I need to learn how to read. :(

Right, the PC is used as the medium to fetch images, while the Mac is for editing. Got it.

I doubt the CPU or overclocking would cause something like this.

As we've said, there's lots of stuff that could cause such corruption. I brought up one case that I've experienced and fixed. I got quite identical corruption compared to your examples, however the entire batch was corrupted, not just 2 out of 1600. You simply can't just rule out software entirely. Just because all drivers are always up to date, doesn't make it problem free. New updates screw stuff up all the time, sometimes it's not a good idea to always keep everything up to date.

How long have you been using the SSD for? Samsung? You can get a freeware called samsung magician to check any SSD's status.

My approach would be to take the motherboard out, repaste the CPU after cleaning of course, make sure the SSDs and RAMs are securely sitting in their slots, then get a new USB cable. I would also reinstall all drivers.

After this kind of soft reset, if the problem still exists, I'd go deeper: a fresh Windows installation, trying a different PC (maybe a friend's laptop), then worrying about the motherboard being whack, and eventually arriving at the camera.

Perhaps a stupid question, capturing images to a computer would surely work on mac too, right? Capture One offers tethered shooting on both platforms. Why not try setting something up with a trial version to see if it's the camera or the PC being whack first?

pwnell
Posts: 2026
Joined: Fri Dec 18, 2009 4:59 pm
Location: Tsawwassen, Canada

Post by pwnell »

Macro_Cosmos wrote:How long have you been using the SSD for? Samsung? You can get a freeware called samsung magician to check any SSD's status.
I have a Samsung EVO 950 Pro SSD, had it for about 3 years, and have magician installed - status is Good and no SMART errors.
Macro_Cosmos wrote:My approach would be to take the motherboard out, repaste the CPU after cleaning of course, make sure the SSDs and RAMs are securely sitting in their slots, then get a new USB cable. I would also reinstall all drivers.
Yeah that will be a big job. Custom watercooled loop with two GPUs all connected via tubing. But I will have to start somewhere I guess :(
Macro_Cosmos wrote:Perhaps a stupid question, capturing images to a computer would surely work on mac too, right? Capture One offers tethered shooting on both platforms. Why not try setting something up with a trial version to see if it's the camera or the PC being whack first?
I have spent 5 years trying $6000 Olympus capture software, Capture One Pro version 7 - current and many others - none come close to DSLR Remote Pro. I know for a fact it is not the app as I have shot over 100000 images with it (and about 20000 on the current version) with no issues. It is unfortunately Windows only.

I HATE Capture One Pro - it is way too slow, and extremely annoying to enter comments per photo.

pwnell
Posts: 2026
Joined: Fri Dec 18, 2009 4:59 pm
Location: Tsawwassen, Canada

Post by pwnell »

Macro_Cosmos wrote:I doubt the CPU or overclocking would cause something like this.
Well, it could: if the machine becomes unstable it could introduce errors. But I agree, it is a bit of a stretch, just looking at the way the corruption occurs. I did remove the overclock and will be testing that.

Also - if I come across as not wanting to change things, it is simply because I already know that is definitely the most systematic way to proceed but I have limited access to other (windows) pcs and other cameras and USB3 cables that will reach from my microscope to the PC. So I was hoping to lean on other people's experiences... That said I will probably have to go the route you suggested.

rjlittlefield
Site Admin
Posts: 20976
Joined: Tue Aug 01, 2006 8:34 am
Location: Richland, Washington State, USA

Post by rjlittlefield »

Maybe I've missed it, but I don't see where you've said that you tested RAM.

Once upon a time I had one row of memory go bad, in a 32 GB system. The only symptom was that file system backups would not validate. Everything else ran fine. It was a hard failure so Windows memory test picked it up quickly.

--Rik

pwnell
Posts: 2026
Joined: Fri Dec 18, 2009 4:59 pm
Location: Tsawwassen, Canada

Post by pwnell »

rjlittlefield wrote:Maybe I've missed it, but I don't see where you've said that you tested RAM.
I have not yet tested RAM, but I did run Prime95 for an hour or so without errors. I will check RAM now. Good idea - thanks.

Chris S.
Site Admin
Posts: 3506
Joined: Sun Apr 05, 2009 9:55 pm
Location: Ohio, USA

Post by Chris S. »

Ugh, intermittent issues are the worst. A few thoughts (Disclosure: I've built a lot of Windows PCs, and troubleshot many more):
    1) Do you have the option to save images to both PC and memory card? If so, I’d be inclined to try this for a few stacks, to see if the corruption occurs on the same images in both places. Whichever way it goes, you get useful information.

    2) Strange as it sounds, I’d be tempted to try wrapping the USB cord loosely in aluminum foil to reduce the chance of RF interference. Given the intermittent nature of the corruption, I wonder about occasional RF sources such as garage door openers, cordless phone ringers, and the like being picked up by this cord. I know from painful experience that any cord can act as an antenna, and that aluminum foil shielding will sometimes attenuate this.

    3)
    pwnell wrote:I have not yet tested RAM, but I did run Prime95 for an hour or so without errors. I will check RAM now.
    Testing RAM is a good next step. If that comes up clean, I’d suggest running Prime95 for 24 hours—the standard for declaring a PC “Prime95 stable.” Agreed that most problems do show up in the first hour of stress testing; but some intermittent issues take longer to manifest.

    4)
    pwnell wrote:. . . DSLR Remote Pro. I know for a fact it is not the app as I have shot over 100000 images with it (and about 20000 on the current version) with no issues.
    You have surely thought of this, but is there any chance that the corruption issue started with the current version of DSLR Remote Pro? After all, 20,000 images is just over a dozen stacks at 1600 images per.

    5) If none of the above help, I’d be tempted to examine some non-corrupt raw files with a hexadecimal editor to get a sense of what a good raw file looks like in hex. Then examine some corrupt raw files in the hex editor. Do you see any patterns? Granted, this is a vague suggestion, but in the past, doing this has given me clues that solved some vexing corruption issues.

    6) Regarding the discussion to disassemble the PC, clean and reseat the CPU, RAM, etc., I’ve been down that road a number of times, and mostly found it to be a huge time sink with little benefit.

    If you get to this point with nothing else working, I’d suggest you try getting a new SSD, cloning the old one over to it (my preferred tool for this is Acronis) and seeing if the problem goes away. If so, you had a corrupt SSD.

    If this doesn't work, consider that your motherboard may be going bad. I've had this happen. And if this is the case, in my bitter experience, you might as well throw out most of your system components and rebuild from scratch, if you value your time. Sourcing a mobo even two or three years-old is difficult and fraught with risk to your time. And newer motherboards work best with components of newer vintage.
Good luck!

--Chris S.

pwnell
Posts: 2026
Joined: Fri Dec 18, 2009 4:59 pm
Location: Tsawwassen, Canada

Post by pwnell »

Chris S. wrote:Do you have the option to save images to both PC and memory card? If so, I’d be inclined to try this for a few stacks, to see if the corruption occurs on the same images in both places. Whichever way it goes, you get useful information.
Good idea - this will determine if the camera is to blame or the PC / USB subsystem.
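
A minimal sketch of how the card and PC copies could be checked against each other once a stack has been saved to both places. It assumes the camera writes byte-identical files to both destinations and that the file names match, which is worth confirming on a known-good frame first. The function names are made up for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def compare_copies(card_dir: Path, pc_dir: Path) -> dict[str, list[str]]:
    """Sort same-named files from two folders into match / differ / missing."""
    result: dict[str, list[str]] = {"match": [], "differ": [], "missing_on_pc": []}
    for card_file in sorted(p for p in card_dir.iterdir() if p.is_file()):
        pc_file = pc_dir / card_file.name
        if not pc_file.is_file():
            result["missing_on_pc"].append(card_file.name)
        elif sha256_of(card_file) == sha256_of(pc_file):
            result["match"].append(card_file.name)
        else:
            result["differ"].append(card_file.name)
    return result
```

Calling `compare_copies(Path(card_folder), Path(pc_folder))` with the two real folders returns the three lists. A mismatch only says the copies differ, not which one is damaged, so the files in `differ` still need to be opened. If the card copy is clean and the PC copy is not, that points at the transfer or the PC; if both are corrupt and identical, the camera becomes the suspect.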
Chris S. wrote:Testing RAM is a good next step.
So this is the interesting one. As I mentioned, I had my watercooled PC overclocked for the past 3 years with no issues after extensive stability tests (Prime95, AIDA64, FurMark, RAM memtest86 etc.). The issue only started happening at the beginning of May. I did not really suspect RAM because, in my personal experience having built more servers and desktops than I care to count, (modern) RAM usually either comes broken from the factory or works and stays working until a hard drive, motherboard or PSU fails first. So I did not expect this.

I performed a memtest86 and it picked up many errors. I then started to move RAM sticks around (there are 4 in quad channel configuration), I removed the overclock etc. Eventually it came down to this: If I remove the XMP profile (for which these DIMMs are certified to work) then I have a stable RAM test (memtest86 after 4 passes). If I add the XMP profile back I get tons of errors. So clearly these RAM sticks together with the motherboard and CPU cannot handle XMP speeds and timings any longer. Right now XMP is disabled and I will see how the next session goes (hopefully this weekend).
Chris S. wrote:You have surely thought of this, but is there any chance that the corruption issue started with the current version of DSLR Remote Pro? After all, 20,000 images is just over a dozen stacks at 1600 images per.
In my experience, in this case, it is not the software. You have to write real crappy code to generate a series of 3-4 bad images out of a batch of 1600 or so and the rest all being fine. It just does not seem like a software error (I have been writing software for more than 2 decades).
Chris S. wrote:If none of the above help, I’d be tempted to examine some non-corrupt raw files with a hexadecimal editor to get a sense of what a good raw file looks like in hex. Then examine some corrupt raw files in the hex editor. Do you see any patterns? Granted, this is a vague suggestion, but in the past, doing this has given me clues that solved some vexing corruption issues.
This was actually the first thing I did. However I did not see anything obviously different between two images that were of the same subject, slightly different depths of focus, one corrupt and the other not. That said, performing a compare between two 23MB RAW images is not easy. Whatever corruption crept in looks like random data and not something obvious like a section of zeros or 0xFFFF etc.
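
A hex comparison gets more tractable when both files are supposed to be the same frame, for example a card copy and a PC copy. Two different frames differ almost everywhere, so eyeballing them shows little. The sketch below is only an illustration under that assumption: it lists each run of differing bytes with its offset and length. The names and the `merge_gap` and `block` defaults are arbitrary choices, not anything from a particular tool.

```python
from pathlib import Path


def differing_runs(a: bytes, b: bytes, merge_gap: int = 16,
                   block: int = 4096) -> list[tuple[int, int]]:
    """Return (offset, length) for each run of differing bytes.

    Runs separated by fewer than merge_gap identical bytes are merged, so a
    damaged region that happens to contain a few matching bytes reports once.
    Only the common prefix length of the two inputs is compared.
    """
    runs: list[tuple[int, int]] = []
    start = None
    last_diff = -1
    limit = min(len(a), len(b))
    for base in range(0, limit, block):
        end = min(base + block, limit)
        if a[base:end] == b[base:end]:
            continue  # fast path: the whole block is identical
        for i in range(base, end):
            if a[i] != b[i]:
                if start is None:
                    start = i
                elif i - last_diff > merge_gap:
                    runs.append((start, last_diff - start + 1))
                    start = i
                last_diff = i
    if start is not None:
        runs.append((start, last_diff - start + 1))
    return runs


def describe(path_a: Path, path_b: Path) -> None:
    """Print every differing run between two copies of the same file."""
    a, b = path_a.read_bytes(), path_b.read_bytes()
    if len(a) != len(b):
        print(f"sizes differ: {len(a)} vs {len(b)} bytes")
    for offset, length in differing_runs(a, b):
        print(f"offset 0x{offset:08x}  length {length:>8}  "
              f"offset % 4096 = {offset % 4096}")
```

If the damaged runs tend to start on round offsets or have round lengths, that might point toward a buffer- or sector-sized unit rather than single flipped bits. That is only a heuristic, not a diagnosis.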
Chris S. wrote:Regarding the discussion to disassemble the PC, clean and reseat the CPU, RAM, etc., I’ve been down that road a number of times, and mostly found it to be a huge time sink with little benefit.
Agreed, especially since this PC weighs 30 kg and stands on a desk undisturbed, so the chances of something becoming unseated are very slim. It is, however, very useful to check after a PC has been shipped.

Thanks for all the welcome suggestions and tips - much appreciated.

rjlittlefield
Site Admin
Posts: 20976
Joined: Tue Aug 01, 2006 8:34 am
Location: Richland, Washington State, USA

Post by rjlittlefield »

Thanks for all the further discussion. This is very helpful info to have "in the files" for the next time I see something intermittent.

--Rik

Macro_Cosmos
Posts: 900
Joined: Mon Jan 15, 2018 9:23 pm
Location: Sydney

Post by Macro_Cosmos »

Some of these PC issues are just weird and esoteric.

I had an issue at the beginning of February where mouse clicks and keyboard input would be delayed, then spaz out. The scroll wheel would register as a left click, and such. I did find people with the same problem, but none of their fixes worked. I tried reinstalling drivers and stuff, which didn't work. The laptop keyboard worked just fine. I decided to reset, which failed because reasons, and then the problem disappeared.

After restarting, it came back. I then reset again, only to terminate the process when Windows prompted that stuff was ready, and the issue disappeared.

Now it's gone regardless, I have absolutely no idea why it would happen in the first place.

It has to be said though, just because 20k shots incurred no issues, doesn't mean another 10 will be problem free. I'm not trying to be smug here, maybe that's the stats side of me feeling uncomfortable. That said, I don't think it's the software either.
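
To put a rough number on that, here is a small sketch. It assumes every shot fails independently with the same probability, which is a simplification since intermittent hardware faults tend to cluster. It asks how high the per-shot failure rate could still be after 20,000 clean shots, and what that rate would mean for one 1600-shot stack. The function names are made up for illustration.

```python
def upper_bound_failure_rate(clean_trials: int, confidence: float = 0.95) -> float:
    """Largest per-shot failure rate still consistent with zero failures.

    Solves (1 - p) ** clean_trials = 1 - confidence for p, assuming every
    shot fails independently with the same probability p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / clean_trials)


def prob_at_least_one_failure(rate: float, shots: int) -> float:
    """Probability of one or more failures in `shots` independent shots."""
    return 1.0 - (1.0 - rate) ** shots


p_max = upper_bound_failure_rate(20_000)
stack_risk = prob_at_least_one_failure(p_max, 1600)
print(f"95% upper bound on per-shot failure rate: {p_max:.2e}")
print(f"chance of at least one bad frame in a 1600-shot stack: {stack_risk:.1%}")
```

The bound comes out close to the familiar "rule of three" estimate of 3/n. So 20,000 clean shots cannot rule out a failure rate that would still spoil a noticeable fraction of stacks. What a clean history does make less likely is several bad frames in one stack.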

Regarding Capture One Pro, the current version is 20, and I love it. 7 would be years ago I imagine, and it likely would have sucked. 10 wasn't that great for me but 12 nailed it. Just my personal opinion.

A custom water cooling loop... disassembly will be an utter nightmare. I assume you have examined the stability of your overclock, you even turned it off and the problem was still there obviously.

This is odd, but you have to start somewhere I'm afraid. Chris S. is a lot more experienced than someone like myself whose only experience with PC building is from a keyboard. Good advice there!

Hope you squash those little gremlins in the system.

Chris S.
Site Admin
Posts: 3506
Joined: Sun Apr 05, 2009 9:55 pm
Location: Ohio, USA

Post by Chris S. »

pwnell wrote:
Chris S. wrote:You have surely thought of this, but is there any chance that the corruption issue started with the current version of DSLR Remote Pro? After all, 20,000 images is just over a dozen stacks at 1600 images per.
In my experience, in this case, it is not the software. You have to write real crappy code to generate a series of 3-4 bad images out of a batch of 1600 or so and the rest all being fine. It just does not seem like a software error (I have been writing software for more than 2 decades).
Be that as it may, you might contact the software company and tell them what is going on. In particular, double-check that your product key is on their "authentic list." Some years back, a photographic software company apparently created a feature to randomly and irretrievably delete images from memory cards, if the software was thought to be pirated. I was skeptical of this story, until I sought out an opportunity to test and confirm the behavior for myself. Though I liked the company's software, I decided not to buy it: What if, for some reason, they made a mistake and thought I had a bootleg copy? "Sorry, we'll return your product key to our 'authentic' list" would not have brought back missing images.

Unlikely as this is, it might be worth asking the software provider about intermittent image corruption.
pwnell wrote:
Chris S. wrote:Testing RAM is a good next step.
So this is the interesting one. As I mentioned, I had my watercooled PC overclocked for the past 3 years with no issues after extensive stability tests (Prime95, AIDA64, FurMark, RAM memtest86 etc.). The issue only started happening at the beginning of May. I did not really suspect RAM because, in my personal experience having built more servers and desktops than I care to count, (modern) RAM usually either comes broken from the factory or works and stays working until a hard drive, motherboard or PSU fails first. So I did not expect this.

I performed a memtest86 and it picked up many errors. I then started to move RAM sticks around (there are 4 in quad channel configuration), I removed the overclock etc. Eventually it came down to this: If I remove the XMP profile (for which these DIMMs are certified to work) then I have a stable RAM test (memtest86 after 4 passes). If I add the XMP profile back I get tons of errors. So clearly these RAM sticks together with the motherboard and CPU cannot handle XMP speeds and timings any longer. Right now XMP is disabled and I will see how the next session goes (hopefully this weekend).
This is troubling. Three years ago, I built two identical, reasonably high-end machines--one for myself, one for a client. Mine is still going strong. But last fall, the client's machine started throwing up weird errors that involved memory. I took the machine back and started troubleshooting. It helped that I had an identical machine to test things against. After an awful lot of work, I found that the problem was indeed with memory, but not the memory sticks themselves--these were all fine. The issue was apparently in a portion of the motherboard that dealt with RAM. This issue was progressive--it had started small and gotten worse. By the time the machine was in my hands, it wouldn't even boot.

I at first tried to source a replacement motherboard, but the provenance of the three-year-old motherboards I looked at was invariably suspect. And time is money, for the client and myself. So we junked everything but his case, hot-swap bays, hard drives, and water cooler, and I built him another system.

--Chris S.

pwnell
Posts: 2026
Joined: Fri Dec 18, 2009 4:59 pm
Location: Tsawwassen, Canada

Post by pwnell »

Chris S. wrote:Some years back, a photographic software company apparently created a feature to randomly and irretrievably delete images from memory cards, if the software was thought to be pirated.
That is about as unethical as a lawyer sharing your case details with the opposition. I was hoping that such behaviour was not real but seems like I was mistaken.
Chris S. wrote:But last fall, the client's machine started throwing up weird errors that involved memory.
I think some of the components on the motherboard fail under the extra stress in due time, even with good water cooling. A transistor switching faster than it was designed for, for extended periods of time, can cause intermittent issues due to gate leaks. I would not be overclocking my next rig - water cool, sure - but overclocking is just not worth it if long term stability is what you are after.

Chris S.
Site Admin
Posts: 3506
Joined: Sun Apr 05, 2009 9:55 pm
Location: Ohio, USA

Post by Chris S. »

pwnell wrote:That is about as unethical as a lawyer sharing your case details with the opposition. I was hoping that such behaviour was not real but seems like I was mistaken.
I agree--totally uncool. I do sympathize with the difficulty that small software vendors have in getting paid for their work, but this particular defense did not sit well with me.
pwnell wrote:I think some of the components on the motherboard fail under the extra stress in due time, even with good water cooling. A transistor switching faster than it was designed for, for extended periods of time, can cause intermittent issues due to gate leaks. I would not be overclocking my next rig - water cool, sure - but overclocking is just not worth it if long term stability is what you are after.
I'm on the same page as you. I and the people I build computers for prefer stability and trouble-free functioning over tiny gains in speed. For the record, the machine I described that developed motherboard issues was never overclocked.

Best of luck solving your particular issues!

--Chris S.

ChrisR
Site Admin
Posts: 8557
Joined: Sat Mar 14, 2009 3:58 am
Location: Near London, UK

Post by ChrisR »

FWIW - my local computer shop would tell you to put your memory into single channel mode rather than multichannel, to avoid nonspecific memory problems.
I understand that's only an option if you have spare slots.
I can't add any further comment!
Chris R

pwnell
Posts: 2026
Joined: Fri Dec 18, 2009 4:59 pm
Location: Tsawwassen, Canada

Post by pwnell »

ChrisR wrote:FWIW - my local computer shop would tell you to put your memory into single channel mode rather than multichannel, to avoid nonspecific memory problems.
I understand that's only an option if you have spare slots.
I can't add any further comment!
You mean not quad channel? What is the point of fast DDR4 quad channel RAM if you do not use that configuration? The motherboard and the chips were designed for quad channel configuration...

Or am I misunderstanding you?
