Getting files (roottuples, reconstructed, etc) from SAM:
> setup sam
> sam run project get_file.py
See get_file.py.

cab info:
http://www.nuhep.nwu.edu/~schellma/cab/cab_doc_v2.html
How to run a project on cab from d0mino:
> setup pbs
> sam run project GetSamFiles.py
(GetSamFiles.py has: sam_station = "cab")
> sam dump station --projects --station=cab

If an error occurs look at the error code:
SAM script: calling user script
/usr/products/sam_user/NULL/v4_0_18/src/python/run_project.py getroot.py
<<<<<< SAM Exception caught, status:
Simple Status:
Code: SEC_CannotGetNextFile (Category SAM Internal)
Severity level: ERROR
....
(Can't get next file...)

Simple Status:
Code: SEC_CannotEstablishConsumer (Category User)
(DBserver server error)
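
When the output of a project has been saved to a log, the status codes can be pulled out automatically. The following is a minimal Python sketch, assuming the "Code: ... (Category ...)" layout shown above; the function name extract_status_codes is made up for illustration and is not part of SAM.

```python
import re

# Matches lines such as:
#   Code: SEC_CannotGetNextFile (Category SAM Internal)
# The "(Category ...)" part is optional, since some codes are printed without it.
CODE_LINE = re.compile(r"^\s*Code:\s*(.+?)(?:\s*\(Category\s+([^)]+)\))?\s*$")


def extract_status_codes(output):
    """Return (code, category) pairs found in captured SAM output.

    category is None when the line carries no "(Category ...)" part.
    """
    found = []
    for line in output.splitlines():
        match = CODE_LINE.match(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


sample = """\
<<<<<< SAM Exception caught, status:
Simple Status:
Code: SEC_CannotGetNextFile (Category SAM Internal)
Severity level: ERROR
"""

for code, category in extract_status_codes(sample):
    print(code, "|", category)
```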

Simple commands to check "sam projects" and overall status:
> bjobs -u all | grep sam
> sam dump station --groups
> sam dump station --projects
> sam dump project --project=55337_sam_
To look at files being transferred: > sam dump fss
To see the station's status: > ps -fu sam | grep smaster
Look into the log files (to find out which log file to check, look up
the station name in d0mino_server_list.txt).
To check the web server, go to the following address; if it loads OK then there shouldn't be any problems:
http://d0db.fnal.gov/sam_project_editor/DatasetEditor.html
Check sam basic: > sam locate fff


Storing files in SAM
See http://d0db.fnal.gov/sam/doc/userdocs/SamStore.html
See http://d0db.fnal.gov/sam/restart_how-tos/SamTrouble.html#sam_store
To see state of the server, including its known stagers and file store requests:
> sam dump fss

> sam get file store status reco_recocert_p10.03.00maxopt_lancs_pythia_hitv00+02+07+ttbar-incl_mb-poisson-2.5_264191315_2001
Simple Status:
Code: file not found
...
Storing web instructions:
http://www-d0.fnal.gov/~lueking/sam/UserFileStoreDetails.html

Instructions from Lee:
There are 39 files stuck in the fss queue. One of today's shifters needs to look into why they are stuck there. On D0mino, do "setup sam" then "sam dump fss" to get a list of files (this is also available on the sam at a glance page), then "sam get file store status " to get the full history. I happen to know that some have file name too long. Please make a list of these and send to the originator(s), or to d0-mcc mailing list. For file names too long, we will have to remove them from the queue with "sam cancel file store request " and the originator will need to "undeclare" them, work this out with the originator. For other files "stuck" there try to figure out what the problem might be.

To restart fss on clued0:

 1) Log into flotsam as sam
 2) Do "setup sam -q station_prd"
 3) Edit the file "flotsam-clued0_server_list.txt"
 4) Comment out the "fss" line by putting a # in the first column
 5) Do "ups update sam_bootstrap" -- this stops fss
 6) Now uncomment that line by removing the "#"
 7) Do "ups update sam_bootstrap" again. This starts fss.
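
The comment/uncomment edits in steps 4 and 6 can be sketched in Python as below. This is only an illustration: it assumes the server list has one server per line and that the line mentions the server name as a whole word, and the sample file content is made up. The function set_server_enabled is hypothetical, and "ups update sam_bootstrap" still has to be run by hand after each edit.

```python
import re


def set_server_enabled(text, server, enabled):
    """Comment out (enabled=False) or uncomment (enabled=True) every line
    of a server list that mentions `server` as a whole word."""
    word = re.compile(r"\b" + re.escape(server) + r"\b")
    result = []
    for line in text.splitlines():
        bare = line.lstrip("#")  # drop any existing comment marker
        if word.search(bare):
            line = bare if enabled else "#" + bare
        result.append(line)
    return "\n".join(result) + "\n"


# Made-up file content; the real layout of the server list may differ.
sample = "station central\nfss some-options\nstager other-options\n"

disabled = set_server_enabled(sample, "fss", enabled=False)
print(disabled)   # the fss line now starts with "#"

restored = set_server_enabled(disabled, "fss", enabled=True)
print(restored)   # identical to the starting text
```

Working on the text rather than on the file keeps the sketch free of side effects; reading and writing the actual file is left to the caller.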
File store problems:
If file store shows "being imported" forever (in sam_data_browsing Data Files) it means that the file is declared but the store failed. If the file is at /pnfs/..., with sam privileges one can delete it and then do the resubmit: sam store --resubmit --descrip=... --source=...

To mark file as unavailable:
sam_admin mark file availability status --name=WZskim-emStream-20021227-145034.raw_p13.05.00 --status=unavailable --connect=gris/****@d0ofprd1 --comment="badfile"
See also Lee's script in sam folder mail.

To add file location:
> setup sam_admin
> samadmin add disk location --fullpath=cchpssd0.in2p3.fr:/hpss/in2p3.fr/group/d0/mc_prod/mcp11 --connect=gris/******@d0ofprd1
Disk location cchpssd0.in2p3.fr:/hpss/in2p3.fr/group/d0/mc_prod/mcp11 has been registered: id = 789, type = 'disk'


To add application family:
> samadmin add application family --appname=d0trigsim --family=triggersimulator --version=p13.06.01 --connect=gris/***@d0ofprd1
Application Family has been registered: id = 1487, family = triggersimulator, appname = d0trigsim, version = p13.06.01


Becoming SAM user:
[Online machines no longer in ~sam/.k5login]
> kinit -F gris/root (this may not be necessary, ksu sam may work directly)
> ksu sam
> telnet d0olb
> groups
D0 d0_prod CompDiv D0ods users products d0run
> ups list -aK+ sam_config
"sam_config" "v2_3_0" "NULL" "" ""
"sam_config" "v2_3_0" "NULL" "prd" ""
...
> ups start sam_bootstrap > XMAS-EVE-RESTART-d0olb.log 2>&1 &
> tail -f XMAS-EVE-RESTART-d0olb.log


Check files in SAM:
sam translate constraints --dim="__set__ bphysics_tmb_p1302"
d0lxac1:/home/gris/pro1/vtx/t01.65.00> sam translate constraints --dim='data_tier reconstructed and version p10.07.01 and (run_number 132987 or run_number 132988)'
Files:
reco_all_0000132987_034.raw_p10.07.01_001
reco_all_0000132987_034.raw_p10.07.01_000
reco_all_0000132987_006.raw_p10.07.01_001
....
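
To group a returned file list by run, the names can be split with a regular expression. The sketch below is in Python and the pattern is inferred only from the sample names above; it is not an official naming rule, and the field labels "part" and "seq" are guesses chosen for illustration.

```python
import re

# Pattern inferred from the sample names only, e.g.
#   reco_all_0000132987_034.raw_p10.07.01_001
# "part" and "seq" are illustrative labels, not SAM terminology.
NAME = re.compile(
    r"^(?P<prefix>.+)_(?P<run>\d{10})_(?P<part>\d{3})\.raw"
    r"_(?P<version>p\d+(?:\.\d+)*)_(?P<seq>\d{3})$"
)


def parse_reco_name(name):
    """Split a file name into its fields, or return None if it does not fit."""
    match = NAME.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    fields["run"] = int(fields["run"])  # drop the zero padding
    return fields


names = [
    "reco_all_0000132987_034.raw_p10.07.01_001",
    "reco_all_0000132987_034.raw_p10.07.01_000",
    "reco_all_0000132987_006.raw_p10.07.01_001",
]

for name in names:
    info = parse_reco_name(name)
    print(info["run"], info["part"], info["version"], info["seq"])
```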

Range of run numbers:
sam create dataset definition --dim="run_number 140000-141000 and data_tier root-tuple and file_name recoA%"
Dates:
> sam translate constraints --dim="create_date >= 03/18/2002 and data_tier root-tuple and FILE_NAME %recoA_reco_all_00001% and create_date <= 03/24/2002"
Request ID:
According to http://www-d0.fnal.gov/computing/mcprod/mcc.html requestid 4040 does not exist.
It is not guaranteed that the number quoted in the filename corresponds to the requestid. Therefore one should always use the requestid and not the filename in a dataset definition.
> sam translate constraints --dim='data_tier thumbnail and version p13.05.00 and global.requestid 4024'
May return filenames with wrong request id numbers in them.
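
Long --dim strings are easy to mistype, so it can help to assemble them from their pieces. A small Python sketch follows, assuming only the "and"/"or" syntax visible in the examples above; the helper names run_constraint and build_dimensions are invented here.

```python
def run_constraint(runs):
    """Join run numbers into '(run_number A or run_number B ...)'."""
    terms = ["run_number %d" % run for run in runs]
    if not terms:
        raise ValueError("at least one run number is required")
    if len(terms) == 1:
        return terms[0]
    return "(" + " or ".join(terms) + ")"


def build_dimensions(data_tier, version, runs):
    """Build a dimension string in the style of the examples above."""
    parts = [
        "data_tier " + data_tier,
        "version " + version,
        run_constraint(runs),
    ]
    return " and ".join(parts)


dims = build_dimensions("reconstructed", "p10.07.01", [132987, 132988])
print(dims)
```

The resulting string is meant to be passed in quotes to --dim, as in the commands above.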


Define dataset from prompt line:
d0lxac1:/home/gris/pro1/vtx/t01.65.00> sam define dataset --defname="vtx-cft-smt" --group=algo --dim='data_tier reconstructed and version p10.07.01 and (run_number 132987 or run_number 132988)'
Dataset definition created with Id: 9829

Dataset and Dataset Definition:
The confusion I assume is between datasets and definitions. The first person to create the definition gets to be there. After that, whenever someone uses the definition they get a new dataset version (we used to call these snapshots, because they may depend on when they are done). The definition remains the same; the dataset is versioned and can change. Whoever used the definition last, and specifies "new", makes a new dataset.


Dump dataset from prompt line:
> sam translate constraints --dim='dataset_def_name vtx-cft-smt'
Files:
reco_all_0000132987_034.raw_p10.07.01_001
reco_all_0000132987_034.raw_p10.07.01_000
...
File Count: 164
Average File Size: 395014
Total File Size: 64782422
Total Event Count: 211267
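
The summary lines at the end of the dump can be read back and cross-checked. The Python sketch below assumes the "Name: number" layout shown above; parse_summary is a made-up helper. For the numbers above, the total size divided by the file count, rounded down, reproduces the printed average.

```python
def parse_summary(output):
    """Collect the 'Name: number' lines of a dump into a dictionary."""
    summary = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            # strip(" .") also removes a leading "..." left on the same line
            summary[key.strip(" .")] = int(value)
    return summary


sample = """\
Files:
reco_all_0000132987_034.raw_p10.07.01_001
reco_all_0000132987_034.raw_p10.07.01_000
File Count: 164
Average File Size: 395014
Total File Size: 64782422
Total Event Count: 211267
"""

summary = parse_summary(sample)
average = summary["Total File Size"] // summary["File Count"]
print(summary["File Count"], average, summary["Average File Size"])
```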


How to truncate a project to the first n files:
sam submit --file-cut=n ...

How to restart a project after it aborted for whatever reason:
sam submit --restart ...


SAM admin:
> ps -fu sam
> cat ~sam/private/d0lxac1_server_list.txt
> http://d0db.fnal.gov/sam/documents.html (Station Administrator Documents)
> ups start sam_bootstrap
> ups stop sam_bootstrap
> ups depend sam
> upd list -aK+ sam
> setup sam_station (setup "product")
> cd $SAM_STATION -> and look at code...
~sam/config -> record of configuration
~sam/log -> what sam_bootstrap did
> sam configure station ...

> ups list -aK+ sam_config
"sam_config" "v1_3_15" "NULL" "prd" ""
...

Sam Groups:
> sam set group --group=ttk1 --admin=gaston,lueking,gris
> sam configure group --admin=tomw,sosebee --group=np --station=uta-hep

Sam Lock files:
> sam lock --file=all_0000140149_001.raw --node=d0mino.fnal.gov --group=ttk1
Lock successful

On d0ora1:
> setup sam_bootstrap
> ups stop sam_bootstrap
> ups start sam_bootstrap

Cache disks:
> setup sam_admin
> samadmin uncache unused cache files
> samadmin uncache file --dir=/sam/cache49/boo/ --node=d0mino.fnal.gov --connect=gris/passwd@d0ofprd1 reco_all_0000132431_008.raw_p10.04.00_000
File d0mino.fnal.gov:/sam/cache49/boo//reco_all_0000132431_008.raw_p10.04.00_000 has been marked as 'uncached'.
And then (as sam):
> rm /sam/cache49/boo/reco_all_0000132431_008.raw_p10.04.00_000
Get rid of zombie projects:
samadmin purge zombie projects --started-before=1-jun-2002 --station=central-analysis

Start bbftp (from Lauri):
1) edit ~sam/private/REMOTE-STAGER_server_list.txt, comment OUT bbftp line;
2) ups update sam_bootstrap
3) re-edit, uncomment the bbftp line;
4) ups update sam_bootstrap

Kin Yip:
setup sam
setup -q remote-stager sam
edit that REMOTE....txt and comment out the line of d0karlsruhe
ups update sam_bootstrap
edit the same file again and uncomment the above line
ups update sam_bootstrap

To check remote stagers in d0karlsruhe:
Go to SAM At A Glance: d0karlsruhe and look for the two:
Stager@d0:Stager ---> because host is d0.fzk.de
Stager@d0mino:Stager

WEB and DB servers:
Restarting the WEB server will not help the DB server. If the DB server is dead or unreachable, you will need to restart THAT server. Make sure that it is actually dead, not just slow, before you restart it; check the log file to see when the last output was, etc. If you then decide that you need to restart the DB server, follow the standard practice of:
a) EDIT the server_list.txt file and comment OUT the server you want to restart;
ups update sam_bootstrap
b) EDIT the server list file and UNcomment your comment;
ups update sam_bootstrap
Calibration server exercise
Log into the server (d0dbsrv4/d0dbsrv5, for production) as "d0db" after "kinit -F username/root@FNAL.GOV", eg: ssh d0db@d0dbsrv4.fnal.gov. Then follow these instructions.

Adding SAM station:
> samadmin add station --admin=sosebee,tomw --desc="CSE department cluster" --name=uta-cse --connect=gris/*****@d0ofprd1
Station uta-cse has been registered: id = 281
Monitor Level: normal
Life Cycle: active
Admins: tomw, sosebee
Desc: CSE department cluster
> samadmin add node --hw=pc --name=cse000.uta.edu --os=linux --connect=gris/*****@d0ofprd1
Node cse000.uta.edu has been registered: id = 1191, os_type = linux, hardware_name = pc


Enstore:
> setup encp
WARNING: based on your node, d0mino, ENSTORE_CONFIG_HOST has been set to d0ensrv2.fnal.gov
WARNING: If this is not correct, either set ENSTORE_CONFIG_HOST by hand or use a qualifier in your setup command!
> enstore --help
enstore file [ --bfid= --help --list= --recursive --restore= --retries= --timeout=]
enstore monitor []
enstore volume [ --add= --delete= --help --new-library= --no-access= --read-only= --restore= --retries= --timeout= --update= --vol=]

> encp --data /pnfs/sam/beagle/copy1/datalogger/initial_runs/datalogger/all/all/daq_test_0000136161_008.raw /dev/null

INFILE=/pnfs/sam/beagle/copy1/datalogger/initial_runs/datalogger/all/all/daq_test_0000136161_008.raw
OUTFILE=/dev/null
FILESIZE=153736628
LABEL=PRJ006
LOCATION=0000_000000000_0000016
DRIVE=d0enmvr12a:/dev/rmt/tps2d1n
DRIVE_SN=4560020042
TRANSFER_TIME=14.78
SEEK_TIME=74.13
MOUNT_TIME=25.09
QWAIT_TIME=1.58
TIME2NOW=121.61
STATUS=ok

Completed transferring 153736628 bytes in 1 files in 121.55932498 sec.
Overall rate = 1.21 MB/sec. Drive rate = 9.92 MB/sec.
Network rate = 9.95 MB/sec. Exit status = 0.
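
The KEY=VALUE lines of the encp report are easy to parse, and the printed rates can be recomputed from them. The Python sketch below is an illustration only: the helper names are made up, and it assumes that "MB" means 2**20 bytes and that the drive rate is FILESIZE divided by TRANSFER_TIME. Both assumptions appear consistent with the numbers printed above.

```python
MIB = 1024 * 1024  # assumption: encp's "MB" means 2**20 bytes


def parse_encp_fields(output):
    """Collect the upper-case KEY=VALUE lines of an encp report."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.isupper() and " " not in key:
            fields[key] = value.strip()
    return fields


def rate_mib_per_sec(size_bytes, seconds):
    """Transfer rate in units of 2**20 bytes per second."""
    return size_bytes / MIB / seconds


sample = """\
FILESIZE=153736628
LABEL=PRJ006
TRANSFER_TIME=14.78
SEEK_TIME=74.13
MOUNT_TIME=25.09
STATUS=ok
Completed transferring 153736628 bytes in 1 files in 121.55932498 sec.
Overall rate = 1.21 MB/sec. Drive rate = 9.92 MB/sec.
"""

fields = parse_encp_fields(sample)
size = int(fields["FILESIZE"])
drive_rate = rate_mib_per_sec(size, float(fields["TRANSFER_TIME"]))
# Elapsed time copied by hand from the "Completed transferring" line.
overall_rate = rate_mib_per_sec(size, 121.55932498)
print("overall %.2f, drive %.2f" % (overall_rate, drive_rate))
```

Comparing SEEK_TIME and MOUNT_TIME with TRANSFER_TIME shows where the time went: in this transfer most of the elapsed time was spent seeking and mounting rather than reading.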

Where to look for the status of a given tape?
From http://www-d0en.fnal.gov/enstore/ (or stken): Tape Inventory -> VOLUMES

d0mino> encp --data_access_layer --verbose=1 /pnfs/sam/lto/copy1/physics_data_taking/group-phase1/wz/raw/all/pick_diem45.dat /scratch/1/veseli
Start time: Wed Aug 21 12:59:36 2002
User: veseli
Command line: encp --data_access_layer --verbose=1 /pnfs/sam/lto/copy1/physics_data_taking/group-phase1/wz/raw/all/pick_diem45.dat /scratch/1/veseli
Version: v2_17 CVS $Revision: 1.494 $
INFILE=/pnfs/sam/lto/copy1/physics_data_taking/group-phase1/wz/raw/all/pick_diem45.dat
OUTFILE=/scratch/1/veseli/pick_diem45.dat
FILESIZE=0
...
STATUS=USERERROR

> enstore file --bfid=D0MS101905781800000
{'bfid': 'D0MS101905781800000',
'complete_crc': 3524246669L,
'deleted': 'no',
'drive': 'd0enmvr9a:/dev/rmt/tps0d1n:4560020042',
'external_label': 'PRL683',
'location_cookie': '0000_000000000_0000139',
'pnfs_mapname': '',
'pnfs_name0':
'/pnfs/sam/dzero/copy1/datalogger/test/Stream_1_0065409843_001.raw',
'pnfsid': '000F000000000000001F7470',
'pnfsvid': '',
'sanity_cookie': (65536L, 1800353511L),
'size': 411325632L}
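
The record printed by "enstore file --bfid=..." looks like a Python dictionary with old-style long integers (the trailing "L"). A hedged Python 3 sketch for reading such a record from saved output is given below; parse_file_info is a made-up helper, and the regular expression only strips an "L" that directly follows a digit and precedes a comma or closing bracket.

```python
import ast
import re

# Old Python printed long integers as e.g. 411325632L; Python 3 rejects
# that suffix, so remove it before evaluating the literal.
LONG_SUFFIX = re.compile(r"(?<=\d)L(?=\s*[,)}\]])")


def parse_file_info(text):
    """Turn the printed record into a dictionary without executing code."""
    return ast.literal_eval(LONG_SUFFIX.sub("", text))


sample = """\
{'bfid': 'D0MS101905781800000',
'complete_crc': 3524246669L,
'deleted': 'no',
'drive': 'd0enmvr9a:/dev/rmt/tps0d1n:4560020042',
'external_label': 'PRL683',
'location_cookie': '0000_000000000_0000139',
'pnfs_mapname': '',
'pnfs_name0':
'/pnfs/sam/dzero/copy1/datalogger/test/Stream_1_0065409843_001.raw',
'pnfsid': '000F000000000000001F7470',
'pnfsvid': '',
'sanity_cookie': (65536L, 1800353511L),
'size': 411325632L}
"""

info = parse_file_info(sample)
print(info["external_label"], info["size"], info["pnfs_name0"])
```

ast.literal_eval is used instead of eval so that only plain literals are accepted.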


On SAM shift:
http://d0db-dev.fnal.gov/sam/backdoor/ Useful information for sam shifters.
http://d0db-dev.fnal.gov/sam/diagnostics.html Where to check overall sam status.
For SAM status check both of the following:
http://d0ora2.fnal.gov/sam_local/SamAtAGlance/
http://d0db-prd.fnal.gov/sam_local/SamAtAGlance/

Where to search for the nasty queries that may get the DB server stuck?
Go to the diagnostics web page and follow the links of the "DbServer" entry.
Also you can log into the d0db or d0db-dev (d0ora1.fnal.gov or d0ora3.fnal.gov) following http://d0db-dev.fnal.gov/sam/sam-shift-guide.html and look around.

Where to check if sam web servers (CORBA Name Services) are ok?
Diagnostic: "SAM Web Server" ("CORBA Name Services").
To restart Web server:
http://d0db-dev.fnal.gov/sam/sam-shift-guide.html#webserver

Looking at web server logs:
General web logs are in
/fnal/ups/prd/webapache/d0db/html/logs/raw/error.log
aka: http://d0db.fnal.gov/logs/raw/error.log
The java/tomcat specific files are in
/fnal/ups/prd/webapache/d0db/html/logs/raw/tomcat_context/jvm***/logs
(where *** is the port number of the tomcat server...)



Sebastian Grinstein
Last modified: Thu May 15 16:21:16 ART 2003