Overview of a basic 'Tier 3' installation configured for use by CMS. Initial planning was based on the OSG Tier3 Twiki; the actual installation followed the OSG Software 3 installation documents. Most configuration is handled in cfengine/cf.osg.
Hostname: gc1-ce.spa.umn.edu
Services: GridFTP server, GRAM, CEMon
Primary packages: osg-ca-certs osg-gridftp-hdfs osg-ce-condor globus-gram-job-manager-managedfork
Installation docs: Installing the Compute Element
Hostname: gc1-se.spa.umn.edu
Services: GridFTP server, BeStMan SRM
Primary packages: osg-ca-certs osg-gridftp-hdfs bestman2-server
Installation docs: Hadoop 20
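To allow BeStMan to direct requests to specific GridFTP servers based on pathnames, a protocol selection plugin is used. The plugin is based on the Bestman Gridftp Path Plugin. It reads a pathMapping property from the loaded BeStMan configuration file, formatted as semicolon-separated path=host pairs, and returns the host whose path is a prefix of the requested path. If no path matches, or a mapped host is not among the configured transfer servers, the first configured server is used.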
package policy;

import gov.lbl.srm.util.TSRMLog;

import java.io.IOException;
import java.util.*;

public class TPolicyPathBased implements gov.lbl.srm.policy.ISRMSelectionPolicy {
    Object[] _itemArray = null;
    Boolean _configProcessed = false;
    HashMap<String, Object> _pathMaps = null;

    // Load the BeStMan configuration file as Java properties.
    private static Properties getConfigProperties(String configFileName) throws java.io.IOException {
        Properties prop = new Properties();
        java.io.FileInputStream configFile = null;
        try {
            configFile = new java.io.FileInputStream(configFileName);
            prop.load(configFile);
            return prop;
        } catch (java.io.IOException e) {
            System.err.println("Error reading config file: " + e.getMessage());
            // Rethrow as IOException so the handler in getNext(hint) actually runs.
            throw e;
        } finally {
            if (configFile != null) {
                configFile.close();
            }
        }
    }

    // Select a host for the given path hint; the path maps are built on first use.
    public Object getNext(Object hint) {
        TSRMLog.debug(this.getClass(), null, "event=getNext(hint)", "hint: " + hint.toString());
        if (_itemArray != null && _itemArray.length > 0 && _configProcessed == false) {
            TSRMLog.debug(this.getClass(), null, "event=getNext(hint)", "Creating path maps");
            TSRMLog.debug(this.getClass(), null, "event=getNext(hint)", "Using " + _itemArray[0] + " as default host");
            try {
                Properties prop = getConfigProperties(gov.lbl.srm.server.Config._configFileNameLoaded);
                TSRMLog.debug(this.getClass(), null, "event=getNext(hint)", "Config loaded");
                String mappingPolicy = prop.getProperty("pathMapping");
                // A missing pathMapping property means no maps, not a NullPointerException.
                String[] pathMaps = (mappingPolicy == null) ? new String[0] : mappingPolicy.split(";");
                _pathMaps = new HashMap<String, Object>();
                for (String pathMap : pathMaps) {
                    TSRMLog.debug(this.getClass(), null, "event=getNext(hint)", "pathMap: " + pathMap);
                    Object mapHost = null;
                    String[] mapParts = pathMap.split("=");
                    if (mapParts.length != 2) {
                        System.err.println("Error: Invalid path map " + pathMap);
                    } else {
                        TSRMLog.debug(this.getClass(), null, "event=getNext(hint)", "Path: " + mapParts[0] + ", Host: " + mapParts[1]);
                        // Accept the mapped host only if it is one of the configured transfer servers.
                        for (Object host : _itemArray) {
                            if (host.toString().equals(mapParts[1])) {
                                mapHost = mapParts[1];
                            }
                        }
                        if (mapHost == null) {
                            TSRMLog.debug(this.getClass(), null, "event=getNext(hint)", "No match found for " + mapParts[1] + ", using default " + _itemArray[0]);
                            mapHost = _itemArray[0];
                        }
                        TSRMLog.debug(this.getClass(), null, "event=getNext(hint)", "Adding map from path " + mapParts[0] + " to host " + mapHost);
                        _pathMaps.put(mapParts[0], mapHost);
                    }
                }
                _configProcessed = true;
            } catch (IOException e) {
                System.err.println("Failed to get config: " + e.getMessage());
            }
        }
        if (_pathMaps != null) {
            // Return the host of the first map whose path is a prefix of the hint.
            for (Map.Entry<String, Object> pathMap : _pathMaps.entrySet()) {
                TSRMLog.debug(this.getClass(), null, "event=getNext(hint)", "Path: " + pathMap.getKey() + ", Host: " + pathMap.getValue());
                if (hint.toString().startsWith(pathMap.getKey())) {
                    TSRMLog.debug(this.getClass(), null, "event=getNext(hint)", "Found match: Setting host to " + pathMap.getValue());
                    return pathMap.getValue();
                }
            }
        }
        TSRMLog.debug(this.getClass(), null, "event=getNext(hint)", "No match found: Returning default host");
        return getNext();
    }

    // Default selection: the first configured host, or null if none are set.
    public Object getNext() {
        Object result = null;
        if (_itemArray != null) {
            result = _itemArray[0];
        }
        return result;
    }

    // Called with the configured transfer servers; forces the path maps to be rebuilt.
    public void setItems(Object[] col) {
        _itemArray = col;
        _configProcessed = false;
        TSRMLog.debug(this.getClass(), null, "event=setItems", "Got host(s) " + Arrays.toString(_itemArray));
    }

    public String[] displayContents() {
        String[] contents = new String[_itemArray.length];
        for (int i = 0; i < _itemArray.length; i++) {
            contents[i] = _itemArray[i].toString();
        }
        TSRMLog.debug(this.getClass(), null, "event=displayContents", "Returning host(s) " + Arrays.toString(_itemArray));
        return contents;
    }
}
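The parsing and prefix matching done by the plugin can be tried outside BeStMan with a small stand-alone sketch. This is not the plugin itself: the class name PathMappingSketch, the second host gc1-se2 and the example pathMapping value are made up for illustration, and the exact string form of each host (assumed here to be a gsiftp:// URL, as in the srm-ping output) is whatever BeStMan hands to setItems. Unlike the plugin, the sketch trims whitespace, does not check hosts against the configured server list, and uses a LinkedHashMap so that overlapping prefixes are tested in the order they were written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-alone sketch of the plugin's parsing and prefix matching. */
final class PathMappingSketch {

    /** Parse "path=host;path=host" into an ordered map; malformed entries are skipped. */
    static Map<String, String> parse(String pathMapping) {
        Map<String, String> maps = new LinkedHashMap<>();
        if (pathMapping == null) {
            return maps;
        }
        for (String entry : pathMapping.split(";")) {
            String[] parts = entry.split("=");
            if (parts.length == 2) {
                maps.put(parts[0].trim(), parts[1].trim());
            }
        }
        return maps;
    }

    /** Return the host of the first prefix that matches the path, else the default host. */
    static String select(Map<String, String> maps, String path, String defaultHost) {
        for (Map.Entry<String, String> e : maps.entrySet()) {
            if (path.startsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return defaultHost;
    }

    public static void main(String[] args) {
        // Example value only; gc1-se2 is a made-up second GridFTP host.
        Map<String, String> maps = parse(
                "/hdfs/cms=gsiftp://gc1-se.spa.umn.edu;/hdfs/other=gsiftp://gc1-se2.spa.umn.edu");
        String defaultHost = "gsiftp://gc1-se.spa.umn.edu";
        System.out.println(select(maps, "/hdfs/cms/user/cmsuser/test.txt", defaultHost));
        System.out.println(select(maps, "/hdfs/other/file.root", defaultHost));
        System.out.println(select(maps, "/tmp/elsewhere", defaultHost));
    }
}
```

Because matching is by prefix, a more specific path should be listed before a shorter path that contains it if the two are meant to go to different hosts; the plugin's HashMap gives no such ordering guarantee.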
[1051]nick@gc1-se:/data/malarkey/bestman2> javac -Xlint:unchecked -cp /usr/share/java/bestman2/bestman2.jar policy/TPolicyPathBased.java && jar cf policy.jar policy/*.class && cp -p policy.jar ~/git/cfengine/build/linux/usr/share/java/bestman2/plugin/
Obtain VOMS proxy
[1005]nick@gc1-ce:~> voms-proxy-init -voms cms:/cms/Role=cmsuser
Enter GRID pass phrase:
Your identity: /DC=org/DC=doegrids/OU=People/CN=Nick Bertrand 25750
Creating temporary proxy ........................................ Done
Contacting lcg-voms.cern.ch:15002 [/DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch] "cms" Done
Creating proxy .......................................... Done
Your proxy is valid until Fri Oct 26 05:30:29 2012
Validate proxy info
[1006]nick@gc1-ce:~> voms-proxy-info -all
subject   : /DC=org/DC=doegrids/OU=People/CN=Nick Bertrand 25750/CN=proxy
issuer    : /DC=org/DC=doegrids/OU=People/CN=Nick Bertrand 25750
identity  : /DC=org/DC=doegrids/OU=People/CN=Nick Bertrand 25750
type      : proxy
strength  : 1024 bits
path      : /tmp/x509up_u5572
timeleft  : 11:59:56
key usage : Digital Signature, Key Encipherment, Data Encipherment
=== VO cms extension information ===
VO        : cms
subject   : /DC=org/DC=doegrids/OU=People/CN=Nick Bertrand 25750
issuer    : /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch
attribute : /cms/Role=cmsuser/Capability=NULL
attribute : /cms/Role=NULL/Capability=NULL
attribute : /cms/uscms/Role=NULL/Capability=NULL
timeleft  : 11:59:56
uri       : lcg-voms.cern.ch:15002
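A script that runs the transfer tests unattended may want to check the timeleft value before starting. The following is a minimal sketch under that assumption; the class name ProxyTimeLeft and the one-hour threshold are illustrative, and the input is assumed to be the HH:MM:SS value copied from the timeleft line above.

```java
import java.time.Duration;

/** Hypothetical helper for the HH:MM:SS "timeleft" value shown by voms-proxy-info. */
final class ProxyTimeLeft {

    /** Convert an "HH:MM:SS" string into a Duration. */
    static Duration parse(String timeleft) {
        String[] p = timeleft.trim().split(":");
        if (p.length != 3) {
            throw new IllegalArgumentException("Expected HH:MM:SS, got: " + timeleft);
        }
        return Duration.ofHours(Long.parseLong(p[0]))
                .plusMinutes(Long.parseLong(p[1]))
                .plusSeconds(Long.parseLong(p[2]));
    }

    /** True if the remaining lifetime is at least the given minimum. */
    static boolean hasAtLeast(String timeleft, Duration minimum) {
        return parse(timeleft).compareTo(minimum) >= 0;
    }

    public static void main(String[] args) {
        String timeleft = args.length > 0 ? args[0] : "11:59:56";
        // The one-hour threshold is an arbitrary example.
        System.out.println(parse(timeleft).getSeconds() + " seconds left, enough for one hour: "
                + hasAtLeast(timeleft, Duration.ofHours(1)));
    }
}
```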
Transfer a file via GridFTP
[1018]nick@gc1-ce:/hdfs/cms> echo test > /tmp/test.txt
[1019]nick@gc1-ce:/hdfs/cms> cat /tmp/test.txt
test
[1020]nick@gc1-ce:/hdfs/cms> globus-url-copy file:///tmp/test.txt gsiftp://gc1-se.spa.umn.edu:2811/hdfs/cms/user/cmsuser/test.txt
[1021]nick@gc1-ce:/hdfs/cms> globus-url-copy gsiftp://gc1-se.spa.umn.edu:2811/hdfs/cms/user/cmsuser/test.txt file:///tmp/test-received.txt
[1022]nick@gc1-ce:/hdfs/cms> cat /tmp/test-received.txt
test
[1023]nick@gc1-ce:/hdfs/cms> hadoop fs -ls /user/cmsuser/test.txt
Found 1 items
-rw-r--r-- 3 cmsuser hep 5 2012-10-25 17:59 /user/cmsuser/test.txt
Ping BeStMan SRM
[1013]nick@gc1-ce:/hdfs/cms> srm-ping -serviceurl srm://gc1-se.spa.umn.edu:8443/srm/v2/server
srm-ping 2.2.2.2.1 Wed May 9 09:46:08 PDT 2012
BeStMan and SRM-Clients Copyright(c) 2007-2012, Lawrence Berkeley National Laboratory. All rights reserved.
Support at SRM@LBL.GOV and documents at http://sdm.lbl.gov/bestman
Built on dm.lbl.gov 128.3.30.104 at 05/09/2012 09:49:24 PDT
Built on ${myhost.NAME}.${myhost.DOMAIN} ${myhost.ADDR4} at 07/25/2012 16:05:58 CDT
SRM-CLIENT: Connecting to serviceurl httpg://gc1-se.spa.umn.edu:8443/srm/v2/server
SRM-PING: Thu Oct 25 17:55:00 CDT 2012 Calling SrmPing Request...
versionInfo=v2.2
Extra information (Key=Value)
backend_type=BeStMan
backend_version=2.2.2.2.0
backend_build_date=2012-07-25T21:05:58.000Z
gsiftpTxfServers[0]=gsiftp://gc1-se.spa.umn.edu
GatewayMode=Enabled
clientDN=/DC=org/DC=doegrids/OU=People/CN=Nick Bertrand 25750
gumsIDMapped=cmsuser
Transfer a file via BeStMan SRM
[1025]nick@gc1-ce:/hdfs/cms> srm-copy file:///tmp/test.txt srm://gc1-se.spa.umn.edu:8443/srm/v2/server\?SFN=/hdfs/cms/user/cmsuser/test_srm.txt
srm-copy 2.2.2.2.1 Wed May 9 09:46:08 PDT 2012
BeStMan and SRM-Clients Copyright(c) 2007-2012, Lawrence Berkeley National Laboratory. All rights reserved.
Support at SRM@LBL.GOV and documents at http://sdm.lbl.gov/bestman
Built on dm.lbl.gov 128.3.30.104 at 05/09/2012 09:49:24 PDT
Built on ${myhost.NAME}.${myhost.DOMAIN} ${myhost.ADDR4} at 07/25/2012 16:05:58 CDT
SRM-CLIENT: Thu Oct 25 18:07:00 CDT 2012 Connecting to httpg://gc1-se.spa.umn.edu:8443/srm/v2/server
SRM-CLIENT: Thu Oct 25 18:07:01 CDT 2012 Calling SrmPrepareToPutRequest now ...
request.token= put:1
Request.status=SRM_SUCCESS
explanation=null
SRM-CLIENT: RequestFileStatus for SURL=file:///tmp/test.txt is Ready.
SRM-CLIENT: received TURL=gsiftp://gc1-se.spa.umn.edu//hdfs/cms/user/cmsuser/test_srm.txt
SRM-CLIENT: Thu Oct 25 18:07:09 CDT 2012 start file transfer
SRM-CLIENT:Source=file:////tmp/test.txt
SRM-CLIENT:Target=gsiftp://gc1-se.spa.umn.edu//hdfs/cms/user/cmsuser/test_srm.txt
SRM-CLIENT: Thu Oct 25 18:07:13 CDT 2012 end file transfer for file:///tmp/test.txt
SRM-CLIENT: Thu Oct 25 18:07:13 CDT 2012 Calling putDone for srm://gc1-se.spa.umn.edu:8443/srm/v2/server?SFN=/hdfs/cms/user/cmsuser/test_srm.txt
Result.status=SRM_SUCCESS
Result.Explanation=null
SRM-CLIENT: Request completed with success
SRM-CLIENT: Printing text report now ...
SRM-CLIENT*REQUESTTYPE=put
SRM-CLIENT*TOTALFILES=1
SRM-CLIENT*TOTAL_SUCCESS=1
SRM-CLIENT*TOTAL_FAILED=0
SRM-CLIENT*REQUEST_TOKEN=put:1
SRM-CLIENT*REQUEST_STATUS=SRM_SUCCESS
SRM-CLIENT*SOURCEURL[0]=file:///tmp/test.txt
SRM-CLIENT*TARGETURL[0]=srm://gc1-se.spa.umn.edu:8443/srm/v2/server?SFN=/hdfs/cms/user/cmsuser/test_srm.txt
SRM-CLIENT*TRANSFERURL[0]=gsiftp://gc1-se.spa.umn.edu//hdfs/cms/user/cmsuser/test_srm.txt
SRM-CLIENT*ACTUALSIZE[0]=5
SRM-CLIENT*FILE_STATUS[0]=SRM_SPACE_AVAILABLE
SRM-CLIENT*EXPLANATION[0]=SRM-CLIENT: PutDone is called successfully
[1026]nick@gc1-ce:/hdfs/cms> srm-copy srm://gc1-se.spa.umn.edu:8443/srm/v2/server\?SFN=/hdfs/cms/user/cmsuser/test_srm.txt file:///tmp/test-srm.txt
srm-copy 2.2.2.2.1 Wed May 9 09:46:08 PDT 2012
BeStMan and SRM-Clients Copyright(c) 2007-2012, Lawrence Berkeley National Laboratory. All rights reserved.
Support at SRM@LBL.GOV and documents at http://sdm.lbl.gov/bestman
Built on dm.lbl.gov 128.3.30.104 at 05/09/2012 09:49:24 PDT
Built on ${myhost.NAME}.${myhost.DOMAIN} ${myhost.ADDR4} at 07/25/2012 16:05:58 CDT
SRM-CLIENT: Thu Oct 25 18:07:34 CDT 2012 Connecting to httpg://gc1-se.spa.umn.edu:8443/srm/v2/server
SRM-CLIENT: Thu Oct 25 18:07:35 CDT 2012 Calling SrmPrepareToGet Request now ...
request.token= get:2
Request.status=SRM_SUCCESS
Request.explanation=null
SRM-CLIENT: RequestFileStatus for SURL=srm://gc1-se.spa.umn.edu:8443/srm/v2/server?SFN=/hdfs/cms/user/cmsuser/test_srm.txt is Ready.
SRM-CLIENT: received TURL=gsiftp://gc1-se.spa.umn.edu//hdfs/cms/user/cmsuser/test_srm.txt
SRM-CLIENT: Thu Oct 25 18:07:44 CDT 2012 start file transfer
SRM-CLIENT:Source=gsiftp://gc1-se.spa.umn.edu//hdfs/cms/user/cmsuser/test_srm.txt
SRM-CLIENT:Target=file:////tmp/test-srm.txt
SRM-CLIENT: Thu Oct 25 18:07:48 CDT 2012 end file transfer for srm://gc1-se.spa.umn.edu:8443/srm/v2/server?SFN=/hdfs/cms/user/cmsuser/test_srm.txt
SRM-CLIENT: Thu Oct 25 18:07:48 CDT 2012 Calling releaseFile
SRM-CLIENT: ...Calling srmReleaseFiles...
status=SRM_SUCCESS
explanation=null
status=SRM_SUCCESS
explanation=null
SRM-CLIENT: Request completed with success
SRM-CLIENT: Printing text report now ...
SRM-CLIENT*REQUESTTYPE=get
SRM-CLIENT*TOTALFILES=1
SRM-CLIENT*TOTAL_SUCCESS=1
SRM-CLIENT*TOTAL_FAILED=0
SRM-CLIENT*REQUEST_TOKEN=get:2
SRM-CLIENT*REQUEST_STATUS=SRM_SUCCESS
SRM-CLIENT*SOURCEURL[0]=srm://gc1-se.spa.umn.edu:8443/srm/v2/server?SFN=/hdfs/cms/user/cmsuser/test_srm.txt
SRM-CLIENT*TARGETURL[0]=file:///tmp/test-srm.txt
SRM-CLIENT*TRANSFERURL[0]=gsiftp://gc1-se.spa.umn.edu//hdfs/cms/user/cmsuser/test_srm.txt
SRM-CLIENT*ACTUALSIZE[0]=5
SRM-CLIENT*FILE_STATUS[0]=SRM_FILE_PINNED
[1027]nick@gc1-ce:/hdfs/cms> cat /tmp/test-srm.txt
test
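Each srm-copy run above ends with a text report made of SRM-CLIENT*KEY=VALUE fields. For scripted checks, those fields can be pulled out of the captured output and tested for success. The sketch below is one way to do that, assuming the output is captured with one field per line; the class name SrmReportSketch and the success rule (REQUEST_STATUS is SRM_SUCCESS and TOTAL_FAILED is 0) are choices made for this example, not part of the SRM clients.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for the "SRM-CLIENT*KEY=VALUE" text report printed by srm-copy. */
final class SrmReportSketch {
    private static final String PREFIX = "SRM-CLIENT*";

    /** Collect report fields; lines that are not report fields are ignored. */
    static Map<String, String> parse(String output) {
        Map<String, String> report = new LinkedHashMap<>();
        for (String raw : output.split("\\R")) {
            String line = raw.trim();
            if (!line.startsWith(PREFIX)) {
                continue;
            }
            // Split on the first '=' only, since URLs in the value contain "?SFN=".
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue;
            }
            report.put(line.substring(PREFIX.length(), eq), line.substring(eq + 1));
        }
        return report;
    }

    /** Success rule assumed for this sketch: overall status OK and no failed files. */
    static boolean succeeded(Map<String, String> report) {
        return "SRM_SUCCESS".equals(report.get("REQUEST_STATUS"))
                && "0".equals(report.get("TOTAL_FAILED"));
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "SRM-CLIENT: Printing text report now ...",
                "SRM-CLIENT*REQUESTTYPE=put",
                "SRM-CLIENT*TOTAL_FAILED=0",
                "SRM-CLIENT*REQUEST_STATUS=SRM_SUCCESS");
        System.out.println("transfer succeeded: " + succeeded(parse(sample)));
    }
}
```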
BeStMan requires the host certificate private key to be in RSA format (the key should start with -----BEGIN RSA PRIVATE KEY-----, not -----BEGIN PRIVATE KEY-----). To convert the key generated by the cert-request utility to RSA format:
openssl rsa -in hostkey.pem -out hostkey.pem
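Whether a key needs converting can be checked by looking at its first line. The following sketch does only that header comparison; the class name HostKeyFormatCheck, the result labels and the default file name hostkey.pem are illustrative, and it does not validate the key itself.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical check of the PEM header on a host key file. */
final class HostKeyFormatCheck {
    static final String RSA_HEADER = "-----BEGIN RSA PRIVATE KEY-----";
    static final String PKCS8_HEADER = "-----BEGIN PRIVATE KEY-----";

    /** Classify PEM text by its first line: "rsa", "needs-conversion" or "unknown". */
    static String classify(String pemText) {
        String firstLine = pemText.trim().split("\\R", 2)[0].trim();
        if (firstLine.equals(RSA_HEADER)) {
            return "rsa";
        }
        if (firstLine.equals(PKCS8_HEADER)) {
            return "needs-conversion";
        }
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        // The default path is an example; pass the real key path as the first argument.
        String path = args.length > 0 ? args[0] : "hostkey.pem";
        String pem = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.US_ASCII);
        System.out.println(path + ": " + classify(pem));
    }
}
```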
globus-url-copy will fail with permission denied if the remote filesystem doesn't have a /cksums directory writable by root. The client issuing the globus-url-copy command will see the following:
[1048]nick@gc1-se:~> globus-url-copy file:///tmp/test.txt gsiftp://gc1-se.spa.umn.edu:2811/hdfs/cms/user/cmsuser/test.txt
error: globus_ftp_client: the server responded with an error
500 500-Command failed. : System error in Failed to open checksum file (host=gc1-se.spa.umn.edu, user=cmsuser, path=/user/cmsuser/test.txt): Permission denied
500-A system call failed: Permission denied
500 End.
The destination GridFTP server will log the following in /var/log/gridftp-auth.log:
[25293] Thu Oct 25 17:39:51 2012 :: Configuration read from /etc/gridftp-hdfs/gridftp.conf.
[25293] Thu Oct 25 17:39:51 2012 :: Server started in inetd mode.
[25293] Thu Oct 25 17:39:51 2012 :: New connection from: gc1-se.spa.umn.edu:46784
[25293] Thu Oct 25 17:39:52 2012 :: Max memory buffer count: 200.
[25293] Thu Oct 25 17:39:52 2012 :: Max file buffer count: 1500.
[25293] Thu Oct 25 17:39:52 2012 :: Checking current load on the server.
[25293] Thu Oct 25 17:39:52 2012 :: Start gridftp server; hadoop nameserver hadoop-name, port 9000, replicas 3.
[25293] Thu Oct 25 17:39:53 2012 :: Checksum algorithms in use: MD5,ADLER32,CRC32,CKSUM.
[25293] Thu Oct 25 17:39:53 2012 :: Cannot set rlimits due to Unknown error 18446744073709551615.
[25293] Thu Oct 25 17:39:53 2012 :: DN /DC=org/DC=doegrids/OU=People/CN=Nick Bertrand 25750 successfully authorized.
[25293] Thu Oct 25 17:39:53 2012 :: User cmsuser successfully authorized.
[25293] Thu Oct 25 17:39:53 2012 :: Going to do stat on file /user/cmsuser/test.txt.
[25293] Thu Oct 25 17:39:53 2012 :: We are going to open file /user/cmsuser/test.txt.
[25293] Thu Oct 25 17:39:53 2012 :: Open file /user/cmsuser/test.txt with 3 replicas.
[25293] Thu Oct 25 17:39:53 2012 :: Successfully opened file /user/cmsuser/test.txt for user cmsuser.
[25293] Thu Oct 25 17:39:53 2012 :: Starting to transfer "/hdfs/cms/user/cmsuser/test.txt".
[25293] Thu Oct 25 17:39:53 2012 :: receive 1 blocks of size 5 bytes
[25293] Thu Oct 25 17:39:53 2012 :: Trying to close file in HDFS; zero outstanding blocks.
[25293] Thu Oct 25 17:39:55 2012 :: receive 1 blocks of size 0 bytes
[25293] Thu Oct 25 17:39:55 2012 :: Checksum CKSUM: 935282863
[25293] Thu Oct 25 17:39:55 2012 :: Checksum ADLER32: 062801cb
[25293] Thu Oct 25 17:39:55 2012 :: Checksum MD5: d8e8fca2dc0f896fd7cb4cb0031ba249
[25293] Thu Oct 25 17:39:55 2012 :: Checksum CRC32: 1001993670
[25293] Thu Oct 25 17:39:55 2012 :: Failed to open checksum file (host=gc1-se.spa.umn.edu, user=cmsuser, path=/user/cmsuser/test.txt)
[25293] Thu Oct 25 17:39:55 2012 :: Failure attempting to transfer "/hdfs/cms/user/cmsuser/test.txt".
[25293] Thu Oct 25 17:39:55 2012 :: Transfer failure: System error in Failed to open checksum file (host=gc1-se.spa.umn.edu, user=cmsuser, path=/user/cmsuser/test.txt): Permission denied
A system call failed: Permission denied
In the case of a remote hadoop filesystem, the following commands will remedy the problem:
hadoop fs -mkdir /cksums
hadoop fs -chown root /cksums
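Once transfers succeed, the checksums the GridFTP server logs for a file can be compared against values computed locally. As a sketch, the class below computes MD5, Adler-32 and CRC-32 for the five-byte test file (the text "test" plus a newline, as written by echo) using only the Java standard library; these should correspond to the MD5, ADLER32 and CRC32 lines in the server log above. The class name ChecksumSketch is made up, and the CKSUM value (the POSIX cksum algorithm) is not covered.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.Adler32;
import java.util.zip.CRC32;

/** Hypothetical local recomputation of three of the checksums logged by the server. */
final class ChecksumSketch {

    /** MD5 digest as lowercase hex. */
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Adler-32 as eight lowercase hex digits, the format used in the log. */
    static String adler32Hex(byte[] data) {
        Adler32 adler = new Adler32();
        adler.update(data, 0, data.length);
        return String.format("%08x", adler.getValue());
    }

    /** CRC-32 as an unsigned decimal value, the format used in the log. */
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        // Same content as /tmp/test.txt created by "echo test".
        byte[] data = "test\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println("MD5:     " + md5Hex(data));
        System.out.println("ADLER32: " + adler32Hex(data));
        System.out.println("CRC32:   " + crc32(data));
    }
}
```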
Condor-G jobs fail with the error 'Reason: 73 the job manager failed to open stdout' if the job's output directory is located on HDFS. Moving the rsv user's home directory off of HDFS resolves this issue. Full error details:
Running metric org.osg.gratia.hadoop-transfer (9 of 18)
metricName: org.osg.gratia.hadoop-transfer
metricType: status
timestamp: 2012-10-26 19:36:15 CDT
metricStatus: CRITICAL
serviceType: OSG-CE
serviceURI: gc1-ce.spa.umn.edu
gatheredAt: gc1-hn.spa.umn.edu
summaryData: CRITICAL
detailsData: Condor-G submission failed to remote host
Condor log file:
000 (144.000.000) 10/26 19:36:05 Job submitted from host: <127.0.0.1:53796>
...
018 (144.000.000) 10/26 19:36:10 Globus job submission failed!
Reason: 73 the job manager failed to open stdout
...
EOT
Hostname: gc1-hn.spa.umn.edu
Services: Condor master, GUMS, RSV
Primary packages: osg-ca-certs condor osg-gums fetch-crl rsv
Installation docs: Install RSV, GUMS (Grid User Mapping Service) Install