You are viewing: August 2015

Calling Shell Commands and Script Files from a Java Program

Overview

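Recently I needed to use several Python programs to process protein sequences and output feature values. These Python files take a text file (which records a number of protein sequences) as a parameter passed in from a Shell script, which then calls them one after another. To run that Shell script from a Java program we rely on the java.lang.Runtime class, which gives access to the runtime environment. This post explores that class and records what I learned.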

1. Running a Shell Command Directly

The java.lang.Runtime class has one notable property, which the official documentation describes as follows:

Every Java application has a single instance of class Runtime that allows the application to interface with the environment in which the application is running. The current runtime can be obtained from the getRuntime method.
An application cannot create its own instance of this class.

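In short: an application cannot instantiate Runtime itself (the class exposes no public constructor). The only way to obtain the current runtime is the no-argument getRuntime() method, which returns a Runtime. You can then call its exec(String command) or exec(String[] cmdarray) method, each of which returns a Process. The code is as follows: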

Process process;
List<String> processList = new ArrayList<String>();  
try {  
    process = Runtime.getRuntime().exec("pwd");
    BufferedReader input = new BufferedReader(new InputStreamReader(process.getInputStream()));  
    String line = "";  
    while ((line = input.readLine()) != null) {  
        processList.add(line);  
    }  
    input.close();  
} catch (IOException e) {  
    e.printStackTrace();  
}  
for (String line : processList) {  
    System.out.println(line);  
}

Running the "pwd" command tells us that the current working directory is /root, which is useful for the next step, executing a Shell script file.

2. Executing a Shell Script File

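Runtime's exec() method has further overloads that also accept an environment and a working directory: exec(String[] cmdarray, String[] envp, File dir) and its single-string counterpart exec(String command, String[] envp, File dir), which the code below uses. These overloads are more powerful: developers can put all of their commands into a script file so that they run in sequence, and can also specify the working directory in which the script runs.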

public void read(String str) throws NullPointerException{  
                            
    String command = "/bin/bash" + " " + ServletActionContext.getServletContext().getRealPath("/") + "useful_scripts/feature_calc.sh"+" "+str;
    File dir = new File(ServletActionContext.getServletContext().getRealPath("/") + "useful_scripts");
    System.out.println(command); // print the command for debugging
    Process process;
    List<String> processList = new ArrayList<String>();  
    try {  
        process = Runtime.getRuntime().exec(command, null, dir); // envp is null, so the subprocess inherits the environment of the current process
        BufferedReader input = new BufferedReader(new InputStreamReader(process.getInputStream()));  
        String line = "";  
        while ((line = input.readLine()) != null) {  
            processList.add(line);  
        }  
        input.close();  
    } catch (IOException e) {  
        e.printStackTrace();  
    }  
  
    for (String line : processList) {  
        System.out.println(line);  
    }  
}

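One point deserves attention: the dir parameter has type File, but the official documentation describes it as the working directory of the subprocess, so it is used as follows: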

File dir = new File("/path/to/your/working/directory");
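
As a hedged illustration of the dir parameter, the following minimal sketch runs pwd through exec(String[] cmdarray, String[] envp, File dir) and collects what the subprocess reports as its working directory. The class name ExecInDirSketch, the helper method run, and the choice of the system temporary directory are assumptions made for this example and are not part of the original project. It also assumes a Unix-like system where pwd is on the PATH.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class ExecInDirSketch {

    // Hypothetical helper: runs cmdarray with dir as the working directory
    // and returns the lines the subprocess wrote to standard output.
    public static List<String> run(String[] cmdarray, File dir)
            throws IOException, InterruptedException {
        // envp is null, so the subprocess inherits the current environment.
        Process process = Runtime.getRuntime().exec(cmdarray, null, dir);
        List<String> lines = new ArrayList<String>();
        try (BufferedReader input = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = input.readLine()) != null) {
                lines.add(line);
            }
        }
        // Wait for the subprocess to finish before returning its output.
        process.waitFor();
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Assumption for illustration: use the system temporary directory.
        File dir = new File(System.getProperty("java.io.tmpdir"));
        for (String line : run(new String[] {"pwd"}, dir)) {
            System.out.println(line);
        }
    }
}
```

Passing the command as an array avoids relying on how exec(String command) splits a single string on whitespace, which matters when a path or argument contains spaces.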

The First Step To Use Weka

Overview

Weka is easy to use for feature selection and classification. Through its friendly graphical user interface, we can try and validate multiple feature selection methods and classifiers that are already implemented in Weka. This article is a quick start guide to this wonderful tool.

1. Download Weka

1.1 About different versions

You can download different versions of Weka from Weka's Download page. Two are available: the Stable book 3rd ed. version (3.6.12) and the Developer version (3.7.12+).
The Stable book 3rd ed. version is the stable release. If it meets all your needs, or if you do not yet know which functions you need, use this one.
Compared with earlier versions of Weka, the Developer version (3.7.12+) adopts a new architecture that adds a package manager to the Tools menu. Whereas the Stable book 3rd ed. version (3.6.12) bundles all of its methods and classifiers, the Developer version ships only a basic set of methods and classifiers by default and relies on the newly developed package manager for the rest. If you want a particular classifier, open the package manager and download it; you can then use it in Weka.
Take feature selection as an example: suppose we want to use the mRMR method to select features. Fortunately, mRMR is included in Weka at version 3.5.15. Since this function is all we need, there is no need to upgrade to a newer version; we can open the package manager in the existing 3.7.12 installation and easily download and use it.

1.2 For users in different platforms

Weka provides builds for Windows, Mac, and other platforms (as a zip file). Windows and Mac users can directly download the corresponding version, install it, and use it. However, I recommend downloading the zip version: if you want to use the SVM classifier in Weka 3.6.12, you will need it. (How to integrate libSVM into Weka 3.6.12 will be described in another article.)

2. Use Weka

If you downloaded the non-zip version, you are presumably already using Weka, so this section only covers how to use the zip version.

2.1 Run Weka

Enter the weka folder, and use the following command to start it:

java -Xmx1000M -jar weka.jar

Notice the parameter -Xmx1000M. It sets the maximum Java heap size, so Weka can use at most 1000 MB of memory. When you want to train on a large data set, setting a larger value helps prevent Weka from crashing (I use 4000M).

2.2 Run Weka anywhere

We have now run Weka, but one problem remains: every time we want to use Weka, we have to enter the Weka folder and run the start command.
Setting an environment variable solves this problem. The setting on my computer is as follows:

CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:/Users/wangjiawei/weka/weka-3-6-12/weka.jar

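/Users/wangjiawei/weka/weka-3-6-12/ is the Weka folder. After adding the path of weka.jar to CLASSPATH, Weka can be started from any directory. Note that java -jar ignores CLASSPATH, so from another directory the main class is named instead, for example: java -Xmx1000M weka.gui.GUIChooser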

Adding a Timestamp to File Names in Java

Overview

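In my first project, the sequence typed into the input box is passed to the back end, received in the Action as a String named seq, and then saved as a txt file. Since later projects will also make heavy use of I/O and this kind of intermediate processing, I studied similar approaches by others and extracted a Java class that fits this project.
The FileTimeStamp class names text files in the form "sequence_input_<timestamp>". The code of this class follows, and the references are given at the end to credit the original author.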

1. FileTimeStamp: Getting a File Name That Contains a Timestamp

import java.text.SimpleDateFormat;
import java.util.Date;

public class FileTimeStamp {
    private SimpleDateFormat sdf = null;
    
    // Get the timestamp
    public String getTimeStamp(){
        sdf = new SimpleDateFormat("yyyyMMddHHmmssZ");
        String timeStamp = sdf.format(new Date());
        return timeStamp;
    }
    
    // Build the file name with the timestamp and the extension appended,
    // then convert the StringBuffer buf to a String
    // to obtain the complete file name
    public String getTimeName(){
        StringBuffer buf = new StringBuffer("sequence_input_");
        buf.append(this.getTimeStamp()).append(".txt");
        return buf.toString();
    }
}

2. Creating a FileFullName Class That Calls FileTimeStamp

This class produces the full file name, which is used later.

package edu.monash.file;

public class FileFullName {
    public String getFullName() {
        FileTimeStamp fts = new FileTimeStamp();
        String fullName = fts.getTimeName();
        return fullName;
    }
}
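
One possible way to use such a name is sketched below: the sequence string is written to a file whose name follows the same "sequence_input_<timestamp>.txt" pattern. This is only an illustration under stated assumptions. The class SequenceFileSketch and its methods timeName and save are hypothetical names introduced here, the naming logic is inlined so that the sketch compiles on its own, and the target directory is a temporary directory rather than the project's real location.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.text.SimpleDateFormat;
import java.util.Date;

public class SequenceFileSketch {

    // Same naming scheme as FileTimeStamp.getTimeName(), inlined here
    // so that this sketch does not depend on the other classes.
    public static String timeName(Date now) {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMddHHmmssZ");
        return "sequence_input_" + sdf.format(now) + ".txt";
    }

    // Hypothetical helper: writes seq into a timestamp-named file
    // inside dir and returns the path of the new file.
    public static Path save(Path dir, String seq) throws IOException {
        Path target = dir.resolve(timeName(new Date()));
        Files.write(target, seq.getBytes(StandardCharsets.UTF_8));
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Assumption for illustration: store the file in a temporary directory.
        Path dir = Files.createTempDirectory("sequences");
        Path saved = save(dir, "MDTRNKAQLLVLLTLLSVLF");
        System.out.println(saved);
    }
}
```

Because the timestamp has one-second resolution, two requests arriving within the same second would produce the same file name, so a real application may want an additional unique suffix.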

References

Perl File-Processing Tips

Overview

Recently I have often needed Perl to process files, so I am recording here the Perl knowledge I use most frequently.

1. Accepting Command-Line Arguments in a Perl Script

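Very often we write a Perl script that processes one file and outputs another. For example, if the script file_converter.pl converts the content of input.txt into another format and stores it in output.txt, we would normally invoke it as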

./file_converter.pl input.txt output.txt

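Inside a Perl script, @ARGV holds the command-line arguments. In the example above, @ARGV evaluates to 2 in scalar context (the number of arguments), and $ARGV[0] and $ARGV[1] give the two file names, as shown below: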

if (@ARGV != 2) 
{
    print "==Error: Please input the file you need to convert and the file name of the result\n";
    exit 1; # stop here, since the file names read below would be missing
}
my $original_file_name = $ARGV[0];
my $output_file = $ARGV[1];

2. Checking Whether a Variable Is Defined or Assigned

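Before using a variable, we sometimes need to know whether this is the first time it is being used. A common approach is to declare the variable without assigning a value and then, at the point of use, test whether it has been assigned. There are two ways to perform this test. The code is as follows: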

my $name_not_init;
my $name="Gly";

if(!$name_not_init)
{
    print "null\n"
}
else
{
    print "not null\n";
}
if(defined($name_not_init))
{
    print "defined\n";
}
else
{
    print "not defined\n";
}

if(!$name)
{
    print "null\n"
}
else
{
    print "not null\n";
}
if(defined($name))
{
    print "defined\n";
}
else
{
    print "not defined\n";
}

The output is:

null
not defined
not null
defined

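That is, a variable that has only been declared and not yet assigned is undefined, so both $name_not_init and defined($name_not_init) are false. Note that !$var is also true for defined but false values such as 0 and the empty string, so defined is the more reliable test for whether a value has been assigned.
However, using defined on an array variable produces the following diagnostic: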

defined(@array) is deprecated

(D deprecated) defined() is not usually useful on arrays, because it checks for an undefined scalar value. Use the other test described above instead:

if (!@array)
{
    # the array is empty
}

3. Getting the Length of an Array

This is very simple: assign the array to a scalar variable, or use scalar explicitly, as shown below:

my $count_array = @line_texts; #method 1: assignment to a scalar
my $count_array_2 = scalar @line_texts; #method 2: explicit scalar context

4. Fetching Data from a Web Page

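Here is a piece of code that requests content from http://www.uniprot.org/uniprot/P01282.fasta. This site is a protein sequence database; if the queried ID exists, it returns the information for that sequence in the following format: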

>sp|P01282|VIP_HUMAN VIP peptides OS=Homo sapiens GN=VIP PE=1 SV=1
MDTRNKAQLLVLLTLLSVLFSQTSAWPLYRAPSALRLGDRIPFEGANEPDQVSLKEDIDM
LQNALAENDTPYYDVSRNARHADGVFTSDFSKLLGQLSAKKYLESLMGKRVSSNISEDPV
PVKRHSDAVFTDNYTRLRKQMAVKKYLNSILNGKRSSEGESPDFPEELEK

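Each line of the data above ends with trailing whitespace (a line break). The first line is the sequence description. What we need is the sequence itself, that is, everything after the first line, with the whitespace at the end of each line removed, giving the following result: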

   MDTRNKAQLLVLLTLLSVLFSQTSAWPLYRAPSALRLGDRIPFEGANEPDQVSLKEDIDMLQNALAENDTPYYDVSRNARHADGVFTSDFSKLLGQLSAKKYLESLMGKRVSSNISEDPVPVKRHSDAVFTDNYTRLRKQMAVKKYLNSILNGKRSSEGESPDFPEELEK

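If the queried ID does not exist, a web page containing Not found is returned; in that case we use - in place of the sequence content.
The complete code is as follows: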

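#!/usr/bin/perl -w

# Load the modules used for the web request and for HTML parsing
use LWP::UserAgent;  
use HTML::Element;
use HTML::TreeBuilder; 

# The URL to request
my $url = "http://www.uniprot.org/uniprot/P01282.fasta";
my $root = new HTML::TreeBuilder;
my $ua = LWP::UserAgent->new;  
# Build the request. This is a GET request; a POST request would need extra parameters set before calling $ua->request($req), which is not covered here
my $req = HTTP::Request->new('GET' => $url);  
# Send the request
my $res = $ua->request($req); 
# Get the content of the response. This is the raw content, HTML included, so the information has to be extracted by hand (this URL happens to return clean text, which makes that easy)
my $res_content= $res->content();
# $solved_web_result stores the final result
my $solved_web_result="";

# Split the content on line breaks and store the lines in an array
my @web_results=split(/\n/,$res_content);

# Skip the first line and concatenate the remaining lines
my $first_loop_count=0;
foreach my $web_line (@web_results)
{
    if($first_loop_count!=0)
    {
        $solved_web_result=$solved_web_result.$web_line;
    }
    $first_loop_count++;        
}

# If the queried ID does not exist, use - instead. The match is case-insensitive because the page says "Not found"
if ($solved_web_result=~m/not found/i)
{
    $solved_web_result="-";
}

# Finally, append a line break
$solved_web_result=$solved_web_result."\n";
print $solved_web_result;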