School of Physics and Astronomy Wiki






howell-T1 farm

Configuration

(Lines starting with $ are the commands I entered to recover this information.)

Created by:

  $ mc list howell-T1 -c
  mc new "howell-T1" --seed 696189625 --decay ecm3770 --number 4.915113e8 --actions simulate dst d-skim monitor --drop-actions simulate --output-directory /hdfs/bes3/users/howell/simulations/howell-T1

Farm environment:

  $ grep export /data/bes3d2/bes3/users/howell/simulations/howell-T1/work/process
  export WorkArea="/data/bes3d2/bes3/users/howell/simulations/howell-T1/work/boss-workarea"
  export BesFarmArea=/data/bes3d1/bes3/farm
  export McSimulationsDirectory=/data/bes3d2/bes3/users/howell/simulations/
  export McTemporaryOutputDirectory=/dev/shm
  export McOutputDirectory=/hdfs/bes3/users/howell/simulations
  export McStagingDirectory=/dev/shm/staging-howell
  export McTemplatesDirectory=/data/bes3d1/bes3/farm/templates
  export McDecaysDirectory=/data/bes3d1/howell/farm/decays
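The farm stages job output in McStagingDirectory, which lives on /dev/shm; the log below notes that this area is only 24G and that the farm dies once it fills up. A quick pre-flight check on free space can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names `staging_free_kb` and `check_staging_space` and whatever threshold you pass are hypothetical (not part of mc or the farm scripts), and the free-space figure comes from POSIX `df -Pk`.

```shell
#!/bin/bash
# Hypothetical helpers (not part of mc or the farm scripts): check free space
# in the staging area before launching jobs.

# Print the available space, in KiB, of the filesystem holding directory $1.
staging_free_kb() {
    # -P: POSIX output format (no line wrapping); -k: 1024-byte blocks.
    # Field 4 of the data line (line 2) is the available space.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if directory $1 exists and has at least $2 KiB available;
# otherwise print a warning to stderr and fail.
check_staging_space() {
    local dir=$1 min_kb=$2 free_kb
    if [ ! -d "$dir" ]; then
        echo "WARNING: staging directory $dir does not exist" >&2
        return 1
    fi
    free_kb=$(staging_free_kb "$dir")
    if [ -z "$free_kb" ] || [ "$free_kb" -lt "$min_kb" ]; then
        echo "WARNING: only ${free_kb:-unknown} KiB available under $dir" >&2
        return 1
    fi
    return 0
}
```

For example, `check_staging_space "$McStagingDirectory" 1048576` fails with a warning when less than 1 GiB is left. Because /dev/shm is not persisted across reboots, the staging directory may not exist right after a reboot; this sketch treats that as a failure too, so a missing directory is noticed rather than silently skipped.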

Log

  • 2011-05-23
    • Even after the sed script, the farm is still broken.
    • Why? Cleared out /dev/shm by rebooting the machines (its contents are not persisted across reboots).
    • Decided to test a job manually:
  $ cd /data/bes3d2/bes3/users/howell/simulations/howell-T1/work
  $ ./process 0
  ...
  boss.exe /.../bhwide.boss.txt ...
    • This created bhwide.end (as could be seen from the rsync output).
    • Presumably it succeeded?
    • Checked /hdfs/bes3/…/mc, but howell-T1-0-…mn.rtraw wasn't there!
    • Checked the job script to see whether the command that moves it was broken (remember we changed mv → cp).

  $ cd /dev/shm/staging-howell/howell-T1/howell-T1-0-R11517-E185
  $ grep cp execute
      cp "/dev/shm/staging-howell/howell-T1/howell-T1-0-R11517-E185/howell-T1-0-R11517-E185.mn.dst" "/hdfs/bes3/users/howell/simulations/howell-T1/dst/howell-T1-0-R11517-E185.mn.dst"
      cp "/dev/shm/staging-howell/howell-T1/howell-T1-0-R11517-E185/howell-T1-0-R11517-E185.mn.root" "/hdfs/bes3/users/howell/simulations/howell-T1/root/howell-T1-0-R11517-E185.mn.root"
      cp "/dev/shm/staging-howell/howell-T1/howell-T1-0-R11517-E185/howell-T1-0-R11517-E185.mn.skim" "/hdfs/bes3/users/howell/simulations/howell-T1/skim/howell-T1-0-R11517-E185.mn.skim"
    • No cp for the rtraw file!
  • Before 2011-05-23
    • Previous cascading failures:
      • Moving output files to Hadoop (mv <file> /hdfs/bes3/…/<file>) was not removing the originals.
      • This caused /dev/shm (our staging area, only 24G in size) to fill up.
      • That, in turn, killed the farm (no staging space left).
    • So we can't trust mv to Hadoop; instead, cp to Hadoop and then rm -f the file (rm -f /definitely/ removes the file).
    • mc was updated correctly, but we still have this old farm, which has made so much progress; it would be a shame to waste it.
    • Constructed a sed command to update the howell-T1 job scripts.
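The sed command actually used on the howell-T1 job scripts is not recorded on this page. As an illustration only, here is a hypothetical sketch of that kind of rewrite: each `mv "SRC" "DST"` line becomes `cp "SRC" "DST" && rm -f "SRC"`. It assumes every move sits on its own line with both paths in double quotes, like the cp lines in the execute script above; the function name `mv_to_cp_rm` is made up for this example.

```shell
#!/bin/bash
# Hypothetical sketch (not the sed command actually used on howell-T1).
# Rewrites lines of the form
#     mv "SRC" "DST"
# into
#     cp "SRC" "DST" && rm -f "SRC"
# keeping the line's leading indentation. Lines that do not match are
# printed unchanged. Reads the files given as arguments, or stdin if none.
mv_to_cp_rm() {
    # \1 = indentation, \2 = source path, \3 = destination path.
    # \& is a literal "&" in a sed replacement (a bare & means "whole match").
    sed -E 's/^([[:space:]]*)mv "([^"]*)" "([^"]*)"[[:space:]]*$/\1cp "\2" "\3" \&\& rm -f "\2"/' "$@"
}
```

Running `mv_to_cp_rm execute` prints the rewritten script for review without modifying the file. Chaining with `&&` is a choice made in this sketch: the staging copy is removed only if the copy to Hadoop succeeded, so a failed cp cannot lose data, at the cost of possibly leaving that file in /dev/shm.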
Last modified: 2011/05/23 13:01 by howell