This shows you the differences between two versions of the page.
groups:bes3:users:howell:farm:howell-t1 [2011/05/23 13:01] – howell | groups:bes3:users:howell:farm:howell-t1 [2011/05/27 11:57] (current) – howell | ||
---|---|---|---|
Line 21: | Line 21: | ||
=== Log === | === Log === | ||
- | | + | == 2011-05-27 == |
- | * even after sed script, farm still broken | + | |
- | * why? cleared out /dev/shm by rebooting machines (it is not persisted) | + | * Also added commit d147245991fe89f8cdfb3e3846d98e5145bec371, |
- | * decided to test manually | + | == 2011-05-23 == |
+ | * even after sed script, farm still broken | ||
+ | * why? cleared out /dev/shm by rebooting machines (it is not persisted) | ||
+ | * decided to test manually | ||
$ cd / | $ cd / | ||
Line 31: | Line 34: | ||
boss.exe / | boss.exe / | ||
- | | + | |
- | * presumably it succeeded? | + | * presumably it succeeded? |
- | * checked / | + | * checked / |
- | * check job script to see if the command to move it was broken (remember we changed mv -> cp) | + | * check job script to see if the command to move it was broken (remember we changed mv -> cp) |
$ cd / | $ cd / | ||
$ grep cp execute | $ grep cp execute | ||
Line 41: | Line 45: | ||
cp "/ | cp "/ | ||
- | | + | |
+ | |||
+ | $ cd / | ||
+ | $ rm */execute | ||
+ | |||
+ | * Now re-run job 0 | ||
- | * == before 2011-05-23 == | + | == before 2011-05-23 == |
- | * Previous cascading failures: | + | * Previous cascading failures: |
- | * move to hadoop (mv < | + | * move to hadoop (mv < |
- | * causes /dev/shm (our staging area) to fill up (only 24G in size) | + | * causes /dev/shm (our staging area) to fill up (only 24G in size) |
- | * causes farm to die (no staging space) | + | * causes farm to die (no staging space) |
- | * So we can't trust mv to hadoop; instead use cp to hadoop and then rm -f the file (rm -f / | + | * So we can't trust mv to hadoop; instead use cp to hadoop and then rm -f the file (rm -f / |
- | * mc updated correctly, but we still have this old farm which has so much progress; a shame to waste it | + | * mc updated correctly, but we still have this old farm which has so much progress; a shame to waste it |
- | * construct a sed command to update the howell-T1 job scripts | + | * construct a sed command to update the howell-T1 job scripts |
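
The "cp to hadoop, then rm -f" workaround noted in the log above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name `stage_out` and its two arguments are hypothetical and do not come from the howell-T1 job scripts, and the log does not say whether the real scripts remove the staged file unconditionally. The sketch removes the source only when the copy succeeds, which protects the output but leaves the file in staging if the copy fails.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the actual job scripts):
#   stage_out SRC DEST
# Copies SRC to DEST, then removes SRC only if the copy succeeded.
# Assumption: keeping the staged file on a failed copy is preferred
# over freeing staging space at the cost of losing the output.
stage_out() {
    src=$1
    dest=$2
    if cp "$src" "$dest"; then
        # Copy succeeded, so it is safe to free the staging space.
        rm -f "$src"
    else
        # Copy failed: keep the source and report the failure to the caller.
        echo "stage_out: copy of $src failed; keeping source" >&2
        return 1
    fi
}
```

A job script could call this as `stage_out "$staged_file" "$hadoop_dest"` and check the exit status. Because a failed copy leaves the file in the staging area, the staging area can still fill up if copies keep failing, so the exit status should be acted on rather than ignored.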