OSG Installation Notes

Overview of a basic 'Tier 3' installation configured for use by CMS. Initial planning was based on the OSG Tier3 Twiki; the actual installation followed the OSG Software 3 installation documents. Most configuration is handled in cfengine/cf.osg.

Compute Element (CE)

Hostname: gc1-ce.spa.umn.edu

Services: GridFTP server, GRAM, CEMon

Primary packages: osg-ca-certs osg-gridftp-hdfs osg-ce-condor globus-gram-job-manager-managedfork

Installation docs: Installing the Compute Element

Storage Element (SE)

Hostname: gc1-se.spa.umn.edu

Services: GridFTP server, BeStMan SRM

Primary packages: osg-ca-certs osg-gridftp-hdfs bestman2-server

Installation docs: Hadoop 20

BeStMan GridFTP Plugin

To allow BeStMan to direct requests to specific GridFTP servers based on pathnames, a protocol selection plugin is used. The plugin is based on the Bestman Gridftp Path Plugin.

Source

TPolicyPathBased.java
package policy;
 
import gov.lbl.srm.util.TSRMLog;
import java.io.IOException;
import java.util.*;
 
public class TPolicyPathBased implements gov.lbl.srm.policy.ISRMSelectionPolicy {
    Object[] _itemArray = null;
    Boolean _configProcessed = false;
    HashMap<String, Object> _pathMaps = null;
 
    // Loads the BeStMan configuration file as Java properties.
    // Read errors propagate as IOException so the caller's catch block can fall back to the default host.
    private static Properties getConfigProperties(String configFileName) throws IOException {
        Properties prop = new Properties();
        java.io.FileInputStream configFile = null;
        try {
            configFile = new java.io.FileInputStream(configFileName);
            prop.load(configFile);
            return prop;
        } finally {
            if (configFile != null) {
                configFile.close();
            }
        }
    }
 
    public Object getNext(Object hint) {
        TSRMLog.debug(this.getClass(), null, "event=getNext(hint)", "hint: " + hint.toString());
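        // Build the path maps lazily on the first call after setItems(); skip if no hosts were supplied.
        if (_itemArray != null && _itemArray.length > 0 && !_configProcessed) {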
            TSRMLog.debug(this.getClass(), null, "event=getNext(hint)", "Creating path maps");
            TSRMLog.debug(this.getClass(), null, "event=getNext(hint)", "Using " + _itemArray[0] + " as default host");
            try {
                Properties prop = getConfigProperties(gov.lbl.srm.server.Config._configFileNameLoaded);
                TSRMLog.debug(this.getClass(), null, "event=getNext(hint)", "Config loaded");
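                String mappingPolicy = prop.getProperty("pathMapping");
                // A missing pathMapping property yields no maps, so every request falls through to the default host.
                String[] pathMaps = (mappingPolicy == null) ? new String[0] : mappingPolicy.split(";");
                // LinkedHashMap (a HashMap subclass) keeps the configured order, so overlapping prefixes match predictably.
                _pathMaps = new LinkedHashMap<String, Object>();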
                for (String pathMap : pathMaps) {
                    TSRMLog.debug(this.getClass(), null, "event=getNext(hint)", "pathMap: " + pathMap);
                    Object mapHost = null;
                    String[] mapParts = pathMap.split("=");
                    if (mapParts.length != 2) {
                        System.err.println("Error: Invalid path map " + pathMap);
                    }
                    else {
                        TSRMLog.debug(this.getClass(), null, "event=getNext(hint)", "Path: " + mapParts[0] + ", Host: " + mapParts[1]);
                        for (Object host : _itemArray) {
                            if (host.toString().equals(mapParts[1])) {
                                // Store the configured item itself so a match returns the same type as the default host.
                                mapHost = host;
                            }
                        }
                        if (mapHost == null) {
                            TSRMLog.debug(this.getClass(), null, "event=getNext(hint)", "No match found for " + mapParts[1] + ", using default " + _itemArray[0]);
                            mapHost = _itemArray[0];
                        }
                        TSRMLog.debug(this.getClass(), null, "event=getNext(hint)", "Adding map from path " + mapParts[0] + " to host " + mapHost);
                        _pathMaps.put(mapParts[0], mapHost);
                    }
                }
                _configProcessed = true;
            }
            catch (IOException e) {
                System.err.println("Failed to get config: " + e.getMessage());
            }
        }
        if (_pathMaps != null) {
            for (Map.Entry<String, Object> pathMap : _pathMaps.entrySet()) {
                TSRMLog.debug(this.getClass(), null, "event=getNext(hint)", "Path: " + pathMap.getKey() + ", Host: " + pathMap.getValue());
                if (hint.toString().startsWith(pathMap.getKey())) {
                    TSRMLog.debug(this.getClass(), null, "event=getNext(hint)", "Found match: Setting host to " + pathMap.getValue());
                    return pathMap.getValue();
                }
            }
        }
        TSRMLog.debug(this.getClass(), null, "event=getNext(hint)", "No match found: Returning default host");
        return getNext();
    }
 
    public Object getNext() {
        Object result = null;
        if (_itemArray != null) {
            result = _itemArray[0];
        }
        return result;
    }
 
    public void setItems(Object[] col) {
        _itemArray = col;
        _configProcessed = false;
        TSRMLog.debug(this.getClass(), null, "event=setItems", "Got host(s) " + Arrays.toString(_itemArray));
    }
 
    public String[] displayContents() {
        String[] contents = new String[_itemArray.length];
        for (int i = 0; i < _itemArray.length; i++) {
            contents[i] = _itemArray[i].toString();
        }
        TSRMLog.debug(this.getClass(), null, "event=displayContents", "Returning host(s) " + Arrays.toString(_itemArray));
        return contents;
    }
}
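
The prefix matching can also be tried outside BeStMan. The following is a minimal standalone sketch, not the plugin itself. It assumes the pathMapping property holds semicolon-separated path=host pairs, as the split calls in the source suggest. The class name PathMapSketch, the second server gc1-se2.example.org, and the sample mapping value are invented for illustration. Unlike the plugin, the sketch does not check hosts against the list passed to setItems.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Standalone sketch of path-based host selection; not part of BeStMan.
class PathMapSketch {

    // Parses "path=host;path=host" into an ordered map, skipping malformed entries.
    static Map<String, String> parse(String pathMapping) {
        Map<String, String> maps = new LinkedHashMap<>();
        if (pathMapping == null) {
            return maps;
        }
        for (String entry : pathMapping.split(";")) {
            String[] parts = entry.split("=");
            if (parts.length == 2) {
                maps.put(parts[0], parts[1]);
            }
        }
        return maps;
    }

    // Returns the host of the first configured prefix that matches, else the default host.
    static String select(Map<String, String> maps, String path, String defaultHost) {
        for (Map.Entry<String, String> e : maps.entrySet()) {
            if (path.startsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return defaultHost;
    }

    public static void main(String[] args) {
        // Hypothetical mapping value; gc1-se2.example.org is an invented second server.
        String cfg = "/hdfs/cms=gsiftp://gc1-se.spa.umn.edu;/hdfs/other=gsiftp://gc1-se2.example.org";
        String defaultHost = "gsiftp://gc1-se.spa.umn.edu";
        Map<String, String> maps = parse(cfg);
        System.out.println(select(maps, "/hdfs/cms/user/cmsuser/test.txt", defaultHost));
        System.out.println(select(maps, "/hdfs/other/data.root", defaultHost));
        System.out.println(select(maps, "/tmp/elsewhere", defaultHost));
    }
}
```

Keeping the entries in configuration order means that when one configured path is a prefix of another, the entry listed first wins.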

Compiling and packaging

[1051]nick@gc1-se:/data/malarkey/bestman2> javac -Xlint:unchecked -cp /usr/share/java/bestman2/bestman2.jar policy/TPolicyPathBased.java && jar cf policy.jar policy/*.class && cp -p policy.jar ~/git/cfengine/build/linux/usr/share/java/bestman2/plugin/

Validation

Obtain VOMS proxy

[1005]nick@gc1-ce:~> voms-proxy-init -voms cms:/cms/Role=cmsuser
Enter GRID pass phrase:
Your identity: /DC=org/DC=doegrids/OU=People/CN=Nick Bertrand 25750
Creating temporary proxy ........................................ Done
Contacting  lcg-voms.cern.ch:15002 [/DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch] "cms" Done
Creating proxy .......................................... Done

Your proxy is valid until Fri Oct 26 05:30:29 2012

Validate proxy info

[1006]nick@gc1-ce:~> voms-proxy-info -all
subject   : /DC=org/DC=doegrids/OU=People/CN=Nick Bertrand 25750/CN=proxy
issuer    : /DC=org/DC=doegrids/OU=People/CN=Nick Bertrand 25750
identity  : /DC=org/DC=doegrids/OU=People/CN=Nick Bertrand 25750
type      : proxy
strength  : 1024 bits
path      : /tmp/x509up_u5572
timeleft  : 11:59:56
key usage : Digital Signature, Key Encipherment, Data Encipherment
=== VO cms extension information ===
VO        : cms
subject   : /DC=org/DC=doegrids/OU=People/CN=Nick Bertrand 25750
issuer    : /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch
attribute : /cms/Role=cmsuser/Capability=NULL
attribute : /cms/Role=NULL/Capability=NULL
attribute : /cms/uscms/Role=NULL/Capability=NULL
timeleft  : 11:59:56
uri       : lcg-voms.cern.ch:15002
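
When this check is scripted, the timeleft lines are the part worth testing. The sketch below is one possible way to do that, assuming the output format shown above (a key, a colon, then HH:MM:SS). The class name ProxyTimeLeft and the one-hour threshold are assumptions made for illustration, not part of any VOMS tooling.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.List;

// Sketch: find the shortest remaining lifetime reported by voms-proxy-info -all.
class ProxyTimeLeft {

    // Returns the smallest "timeleft" value found, or null if no such line is present.
    static Duration minTimeLeft(List<String> lines) {
        Duration min = null;
        for (String line : lines) {
            String[] kv = line.split(":", 2);
            if (kv.length != 2 || !kv[0].trim().equals("timeleft")) {
                continue;
            }
            String[] hms = kv[1].trim().split(":");
            if (hms.length != 3) {
                continue;
            }
            Duration d = Duration.ofHours(Long.parseLong(hms[0]))
                    .plusMinutes(Long.parseLong(hms[1]))
                    .plusSeconds(Long.parseLong(hms[2]));
            if (min == null || d.compareTo(min) < 0) {
                min = d;
            }
        }
        return min;
    }

    public static void main(String[] args) {
        // Lines copied from the output above.
        List<String> sample = Arrays.asList(
                "type      : proxy",
                "timeleft  : 11:59:56",
                "VO        : cms",
                "timeleft  : 11:59:56",
                "uri       : lcg-voms.cern.ch:15002");
        Duration left = minTimeLeft(sample);
        // The one-hour minimum is an arbitrary threshold chosen for this sketch.
        boolean ok = left != null && left.compareTo(Duration.ofHours(1)) >= 0;
        System.out.println("shortest timeleft: " + left + ", usable: " + ok);
    }
}
```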

Transfer a file via GridFTP

[1018]nick@gc1-ce:/hdfs/cms> echo test > /tmp/test.txt
[1019]nick@gc1-ce:/hdfs/cms> cat /tmp/test.txt
test
[1020]nick@gc1-ce:/hdfs/cms> globus-url-copy file:///tmp/test.txt gsiftp://gc1-se.spa.umn.edu:2811/hdfs/cms/user/cmsuser/test.txt
[1021]nick@gc1-ce:/hdfs/cms> globus-url-copy gsiftp://gc1-se.spa.umn.edu:2811/hdfs/cms/user/cmsuser/test.txt file:///tmp/test-received.txt
[1022]nick@gc1-ce:/hdfs/cms> cat /tmp/test-received.txt
test
[1023]nick@gc1-ce:/hdfs/cms> hadoop fs -ls /user/cmsuser/test.txt
Found 1 items
-rw-r--r--   3 cmsuser hep          5 2012-10-25 17:59 /user/cmsuser/test.txt

Ping BeStMan SRM

[1013]nick@gc1-ce:/hdfs/cms> srm-ping -serviceurl srm://gc1-se.spa.umn.edu:8443/srm/v2/server
srm-ping   2.2.2.2.1  Wed May  9 09:46:08 PDT 2012
BeStMan and SRM-Clients Copyright(c) 2007-2012,
Lawrence Berkeley National Laboratory. All rights reserved.
Support at SRM@LBL.GOV and documents at http://sdm.lbl.gov/bestman

  
Built on dm.lbl.gov 128.3.30.104 at 05/09/2012 09:49:24 PDT
       
Built on ${myhost.NAME}.${myhost.DOMAIN} ${myhost.ADDR4} at 07/25/2012 16:05:58 CDT
     
SRM-CLIENT: Connecting to serviceurl httpg://gc1-se.spa.umn.edu:8443/srm/v2/server

SRM-PING: Thu Oct 25 17:55:00 CDT 2012  Calling SrmPing Request...
versionInfo=v2.2

Extra information (Key=Value)
backend_type=BeStMan
backend_version=2.2.2.2.0
backend_build_date=2012-07-25T21:05:58.000Z 
gsiftpTxfServers[0]=gsiftp://gc1-se.spa.umn.edu
GatewayMode=Enabled
clientDN=/DC=org/DC=doegrids/OU=People/CN=Nick Bertrand 25750
gumsIDMapped=cmsuser

Transfer a file via BeStMan SRM

[1025]nick@gc1-ce:/hdfs/cms> srm-copy file:///tmp/test.txt srm://gc1-se.spa.umn.edu:8443/srm/v2/server\?SFN=/hdfs/cms/user/cmsuser/test_srm.txt
srm-copy   2.2.2.2.1  Wed May  9 09:46:08 PDT 2012
BeStMan and SRM-Clients Copyright(c) 2007-2012,
Lawrence Berkeley National Laboratory. All rights reserved.
Support at SRM@LBL.GOV and documents at http://sdm.lbl.gov/bestman

  
Built on dm.lbl.gov 128.3.30.104 at 05/09/2012 09:49:24 PDT
       
Built on ${myhost.NAME}.${myhost.DOMAIN} ${myhost.ADDR4} at 07/25/2012 16:05:58 CDT
     
SRM-CLIENT: Thu Oct 25 18:07:00 CDT 2012 Connecting to httpg://gc1-se.spa.umn.edu:8443/srm/v2/server

SRM-CLIENT: Thu Oct 25 18:07:01 CDT 2012 Calling SrmPrepareToPutRequest now ...
request.token= put:1
Request.status=SRM_SUCCESS
explanation=null

SRM-CLIENT: RequestFileStatus for SURL=file:///tmp/test.txt is Ready.
SRM-CLIENT: received TURL=gsiftp://gc1-se.spa.umn.edu//hdfs/cms/user/cmsuser/test_srm.txt

SRM-CLIENT: Thu Oct 25 18:07:09 CDT 2012 start file transfer
SRM-CLIENT:Source=file:////tmp/test.txt
SRM-CLIENT:Target=gsiftp://gc1-se.spa.umn.edu//hdfs/cms/user/cmsuser/test_srm.txt

SRM-CLIENT: Thu Oct 25 18:07:13 CDT 2012 end file transfer for file:///tmp/test.txt

SRM-CLIENT: Thu Oct 25 18:07:13 CDT 2012 Calling putDone for srm://gc1-se.spa.umn.edu:8443/srm/v2/server?SFN=/hdfs/cms/user/cmsuser/test_srm.txt
Result.status=SRM_SUCCESS
Result.Explanation=null

SRM-CLIENT: Request completed with success

SRM-CLIENT: Printing text report now ...

SRM-CLIENT*REQUESTTYPE=put
SRM-CLIENT*TOTALFILES=1
SRM-CLIENT*TOTAL_SUCCESS=1
SRM-CLIENT*TOTAL_FAILED=0
SRM-CLIENT*REQUEST_TOKEN=put:1
SRM-CLIENT*REQUEST_STATUS=SRM_SUCCESS
SRM-CLIENT*SOURCEURL[0]=file:///tmp/test.txt
SRM-CLIENT*TARGETURL[0]=srm://gc1-se.spa.umn.edu:8443/srm/v2/server?SFN=/hdfs/cms/user/cmsuser/test_srm.txt
SRM-CLIENT*TRANSFERURL[0]=gsiftp://gc1-se.spa.umn.edu//hdfs/cms/user/cmsuser/test_srm.txt
SRM-CLIENT*ACTUALSIZE[0]=5
SRM-CLIENT*FILE_STATUS[0]=SRM_SPACE_AVAILABLE
SRM-CLIENT*EXPLANATION[0]=SRM-CLIENT: PutDone is called successfully
[1026]nick@gc1-ce:/hdfs/cms> srm-copy srm://gc1-se.spa.umn.edu:8443/srm/v2/server\?SFN=/hdfs/cms/user/cmsuser/test_srm.txt file:///tmp/test-srm.txt
srm-copy   2.2.2.2.1  Wed May  9 09:46:08 PDT 2012
BeStMan and SRM-Clients Copyright(c) 2007-2012,
Lawrence Berkeley National Laboratory. All rights reserved.
Support at SRM@LBL.GOV and documents at http://sdm.lbl.gov/bestman

  
Built on dm.lbl.gov 128.3.30.104 at 05/09/2012 09:49:24 PDT
       
Built on ${myhost.NAME}.${myhost.DOMAIN} ${myhost.ADDR4} at 07/25/2012 16:05:58 CDT
     
SRM-CLIENT: Thu Oct 25 18:07:34 CDT 2012 Connecting to httpg://gc1-se.spa.umn.edu:8443/srm/v2/server

SRM-CLIENT: Thu Oct 25 18:07:35 CDT 2012 Calling SrmPrepareToGet Request now ...
request.token= get:2

Request.status=SRM_SUCCESS
Request.explanation=null

SRM-CLIENT: RequestFileStatus for SURL=srm://gc1-se.spa.umn.edu:8443/srm/v2/server?SFN=/hdfs/cms/user/cmsuser/test_srm.txt is Ready.
SRM-CLIENT: received TURL=gsiftp://gc1-se.spa.umn.edu//hdfs/cms/user/cmsuser/test_srm.txt

SRM-CLIENT: Thu Oct 25 18:07:44 CDT 2012 start file transfer
SRM-CLIENT:Source=gsiftp://gc1-se.spa.umn.edu//hdfs/cms/user/cmsuser/test_srm.txt
SRM-CLIENT:Target=file:////tmp/test-srm.txt

SRM-CLIENT: Thu Oct 25 18:07:48 CDT 2012 end file transfer for srm://gc1-se.spa.umn.edu:8443/srm/v2/server?SFN=/hdfs/cms/user/cmsuser/test_srm.txt

SRM-CLIENT: Thu Oct 25 18:07:48 CDT 2012 Calling releaseFile

SRM-CLIENT:  ...Calling srmReleaseFiles...
        status=SRM_SUCCESS
        explanation=null
        status=SRM_SUCCESS
        explanation=null

SRM-CLIENT: Request completed with success

SRM-CLIENT: Printing text report now ...

SRM-CLIENT*REQUESTTYPE=get
SRM-CLIENT*TOTALFILES=1
SRM-CLIENT*TOTAL_SUCCESS=1
SRM-CLIENT*TOTAL_FAILED=0
SRM-CLIENT*REQUEST_TOKEN=get:2
SRM-CLIENT*REQUEST_STATUS=SRM_SUCCESS
SRM-CLIENT*SOURCEURL[0]=srm://gc1-se.spa.umn.edu:8443/srm/v2/server?SFN=/hdfs/cms/user/cmsuser/test_srm.txt
SRM-CLIENT*TARGETURL[0]=file:///tmp/test-srm.txt
SRM-CLIENT*TRANSFERURL[0]=gsiftp://gc1-se.spa.umn.edu//hdfs/cms/user/cmsuser/test_srm.txt
SRM-CLIENT*ACTUALSIZE[0]=5
SRM-CLIENT*FILE_STATUS[0]=SRM_FILE_PINNED
[1027]nick@gc1-ce:/hdfs/cms> cat /tmp/test-srm.txt 
test
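
The text report at the end of each srm-copy run is easy to check mechanically. The following sketch assumes the report lines keep the SRM-CLIENT*KEY=VALUE shape shown above; the class name SrmReportCheck and the choice of REQUEST_STATUS and TOTAL_FAILED as the success criteria are assumptions for illustration, not a documented BeStMan interface.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: extract the SRM-CLIENT*KEY=VALUE report lines from srm-copy output.
class SrmReportCheck {
    static final String PREFIX = "SRM-CLIENT*";

    // Collects report keys and values; lines without the prefix are ignored.
    static Map<String, String> parseReport(List<String> lines) {
        Map<String, String> report = new HashMap<>();
        for (String line : lines) {
            if (!line.startsWith(PREFIX)) {
                continue;
            }
            String body = line.substring(PREFIX.length());
            // Split on the first '=' only, since URLs in the value may contain '=' (for example ?SFN=).
            int eq = body.indexOf('=');
            if (eq > 0) {
                report.put(body.substring(0, eq), body.substring(eq + 1));
            }
        }
        return report;
    }

    // Treats a run as successful when the request succeeded and no file failed.
    static boolean succeeded(Map<String, String> report) {
        return "SRM_SUCCESS".equals(report.get("REQUEST_STATUS"))
                && "0".equals(report.get("TOTAL_FAILED"));
    }

    public static void main(String[] args) {
        // Lines copied from the put report above.
        List<String> sample = Arrays.asList(
                "SRM-CLIENT: Printing text report now ...",
                "SRM-CLIENT*REQUESTTYPE=put",
                "SRM-CLIENT*TOTALFILES=1",
                "SRM-CLIENT*TOTAL_SUCCESS=1",
                "SRM-CLIENT*TOTAL_FAILED=0",
                "SRM-CLIENT*REQUEST_STATUS=SRM_SUCCESS",
                "SRM-CLIENT*TARGETURL[0]=srm://gc1-se.spa.umn.edu:8443/srm/v2/server?SFN=/hdfs/cms/user/cmsuser/test_srm.txt");
        Map<String, String> report = parseReport(sample);
        System.out.println("target: " + report.get("TARGETURL[0]"));
        System.out.println("succeeded: " + succeeded(report));
    }
}
```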

Troubleshooting

BeStMan fails to start

BeStMan requires the host certificate private key to be in RSA format (the key should start with -----BEGIN RSA PRIVATE KEY-----, not -----BEGIN PRIVATE KEY-----). To convert the key generated by the cert-request utility to RSA format:

openssl rsa -in hostkey.pem -out hostkey.pem
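
To see which format a key is in before converting it, the first line of the PEM file is enough. The sketch below classifies that header line using the two markers quoted above; the class name KeyFormatCheck and its messages are invented for illustration, and it only inspects the header without validating the key itself.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch: report whether a PEM private key uses the RSA header BeStMan expects.
class KeyFormatCheck {

    // Classifies a key by the first line of its PEM file.
    static String classify(String firstLine) {
        if (firstLine == null) {
            return "empty file";
        }
        String header = firstLine.trim();
        if (header.equals("-----BEGIN RSA PRIVATE KEY-----")) {
            return "RSA format, no conversion needed";
        }
        if (header.equals("-----BEGIN PRIVATE KEY-----")) {
            return "PKCS#8 format, convert with openssl rsa";
        }
        return "unrecognized header";
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java KeyFormatCheck <hostkey.pem>");
            return;
        }
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.US_ASCII)) {
            System.out.println(classify(in.readLine()));
        }
    }
}
```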

globus-url-copy fails with permission denied

globus-url-copy will fail with permission denied if the remote filesystem doesn't have a /cksums directory writable by root.

The client issuing the globus-url-copy command will see the following:

[1048]nick@gc1-se:~> globus-url-copy file:///tmp/test.txt gsiftp://gc1-se.spa.umn.edu:2811/hdfs/cms/user/cmsuser/test.txt

error: globus_ftp_client: the server responded with an error
500 500-Command failed. : System error in Failed to open checksum file (host=gc1-se.spa.umn.edu, user=cmsuser, path=/user/cmsuser/test.txt): Permission denied
500-A system call failed: Permission denied
500 End.

The destination GridFTP server logs the following in /var/log/gridftp-auth.log:

[25293] Thu Oct 25 17:39:51 2012 :: Configuration read from /etc/gridftp-hdfs/gridftp.conf.
[25293] Thu Oct 25 17:39:51 2012 :: Server started in inetd mode.
[25293] Thu Oct 25 17:39:51 2012 :: New connection from: gc1-se.spa.umn.edu:46784
[25293] Thu Oct 25 17:39:52 2012 :: Max memory buffer count: 200.
[25293] Thu Oct 25 17:39:52 2012 :: Max file buffer count: 1500.
[25293] Thu Oct 25 17:39:52 2012 :: Checking current load on the server.
[25293] Thu Oct 25 17:39:52 2012 :: Start gridftp server; hadoop nameserver hadoop-name, port 9000, replicas 3.
[25293] Thu Oct 25 17:39:53 2012 :: Checksum algorithms in use: MD5,ADLER32,CRC32,CKSUM.
[25293] Thu Oct 25 17:39:53 2012 :: Cannot set rlimits due to Unknown error 18446744073709551615.
[25293] Thu Oct 25 17:39:53 2012 :: DN /DC=org/DC=doegrids/OU=People/CN=Nick Bertrand 25750 successfully authorized.
[25293] Thu Oct 25 17:39:53 2012 :: User cmsuser successfully authorized.
[25293] Thu Oct 25 17:39:53 2012 :: Going to do stat on file /user/cmsuser/test.txt.
[25293] Thu Oct 25 17:39:53 2012 :: We are going to open file /user/cmsuser/test.txt.
[25293] Thu Oct 25 17:39:53 2012 :: Open file /user/cmsuser/test.txt with 3 replicas.
[25293] Thu Oct 25 17:39:53 2012 :: Successfully opened file /user/cmsuser/test.txt for user cmsuser.
[25293] Thu Oct 25 17:39:53 2012 :: Starting to transfer "/hdfs/cms/user/cmsuser/test.txt".
[25293] Thu Oct 25 17:39:53 2012 :: receive 1 blocks of size 5 bytes
[25293] Thu Oct 25 17:39:53 2012 :: Trying to close file in HDFS; zero outstanding blocks.
[25293] Thu Oct 25 17:39:55 2012 :: receive 1 blocks of size 0 bytes
[25293] Thu Oct 25 17:39:55 2012 :: Checksum CKSUM: 935282863
[25293] Thu Oct 25 17:39:55 2012 :: Checksum ADLER32: 062801cb
[25293] Thu Oct 25 17:39:55 2012 :: Checksum MD5: d8e8fca2dc0f896fd7cb4cb0031ba249
[25293] Thu Oct 25 17:39:55 2012 :: Checksum CRC32: 1001993670
[25293] Thu Oct 25 17:39:55 2012 :: Failed to open checksum file (host=gc1-se.spa.umn.edu, user=cmsuser, path=/user/cmsuser/test.txt)
[25293] Thu Oct 25 17:39:55 2012 :: Failure attempting to transfer "/hdfs/cms/user/cmsuser/test.txt".
[25293] Thu Oct 25 17:39:55 2012 :: Transfer failure:
System error in Failed to open checksum file (host=gc1-se.spa.umn.edu, user=cmsuser, path=/user/cmsuser/test.txt): Permission denied
A system call failed: Permission denied

If the remote filesystem is Hadoop (HDFS), the following commands remedy the problem:

hadoop fs -mkdir /cksums
hadoop fs -chown root /cksums
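
The MD5, ADLER32 and CRC32 values in the log above can be reproduced locally, which helps confirm that the server received the file intact even though writing the checksum file failed. This is a sketch using only the Java standard library, assuming the payload is the 5-byte file created by echo test (the text "test" plus a newline). The class name ChecksumSketch is invented for illustration, and the CKSUM value is not covered because the standard library has no POSIX cksum implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.Adler32;
import java.util.zip.CRC32;

// Sketch: recompute three of the checksums that gridftp-hdfs logs for a transfer.
class ChecksumSketch {

    // MD5 digest as lowercase hex, the form used in the log.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Adler-32 as 8 hex digits, the form used in the log.
    static String adler32Hex(byte[] data) {
        Adler32 adler = new Adler32();
        adler.update(data, 0, data.length);
        return String.format("%08x", adler.getValue());
    }

    // CRC-32 as an unsigned decimal value, the form used in the log.
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        // Same content as /tmp/test.txt: "test" followed by a newline.
        byte[] payload = "test\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println("MD5:     " + md5Hex(payload));
        System.out.println("ADLER32: " + adler32Hex(payload));
        System.out.println("CRC32:   " + crc32(payload));
    }
}
```

For the test payload the three values should agree with the corresponding Checksum lines in gridftp-auth.log.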

RSV probes fail with 'Condor-G submission failed to remote host'

Condor-G jobs fail with the error 'Reason: 73 the job manager failed to open stdout' if the job's output directory is located on HDFS. Moving the rsv user's home directory off of HDFS resolves this issue. Full error details:

Running metric org.osg.gratia.hadoop-transfer (9 of 18)

metricName: org.osg.gratia.hadoop-transfer
metricType: status
timestamp: 2012-10-26 19:36:15 CDT
metricStatus: CRITICAL
serviceType: OSG-CE
serviceURI: gc1-ce.spa.umn.edu
gatheredAt: gc1-hn.spa.umn.edu
summaryData: CRITICAL
detailsData: Condor-G submission failed to remote host

Condor log file:
000 (144.000.000) 10/26 19:36:05 Job submitted from host: <127.0.0.1:53796>
...
018 (144.000.000) 10/26 19:36:10 Globus job submission failed!
    Reason: 73 the job manager failed to open stdout
...

EOT

Head Node (HN)

Hostname: gc1-hn.spa.umn.edu

Services: Condor master, GUMS, RSV

Primary packages: osg-ca-certs condor osg-gums fetch-crl rsv

Installation docs: Install RSV, GUMS (Grid User Mapping Service) Install

groups/osg/cms/install_notes.1364333242.txt.gz · Last modified: 2013/03/26 16:27 by nick