
Hadoop single node setup on Ubuntu

I spent a while trying to figure out how to use Hadoop and HDFS, but the information on the official site is scattered and out of date. Here are my notes.

1. Start from a clean Ubuntu 10.04 LTS build.
2. Download the Hadoop package from here . Hadoop's versioning scheme is very confusing:
     1.0.x is the stable version
     1.1.x is the beta version
     2.x.x is the alpha version
     0.23.x is similar to 2.x.x but is missing Name Node HA
   I ignored all other versions starting with 0.2x and used 1.0.4 directly.
3. Download the KEYS file from the root directory.
4. Download hadoop_1.0.4-1_i386.deb (or its x64 version) and its asc file from the hadoop-1.0.4 folder.
5. Check the integrity:
     run `gpg --import KEYS`
     run `gpg --verify hadoop_1.0.4-1_i386.deb.asc`
   You should see:
     mac@mac-ubuntu:~/projects/hadoop$ gpg --verify hadoop_1.0.4-1_i386.deb.asc
     gpg: Signature made Thu 04 Oct 2012 01:04:55 PM PDT using RSA key ID ECB31663
     gpg: Good signature from "Matthew Foley (CODE SIGNING KEY) <m...

TrueCrypt

TrueCrypt is a handy tool that encrypts your data into a virtual disk, which is actually a file residing in your regular file system. That file can be put in your Dropbox folder, so your data can be stored in the "cloud" securely.

AES encryption/decryption

Encryption
    openssl enc -e -in original_file -out original_file.aes -aes256 -k password
Decryption
    openssl enc -d -in original_file.aes -out original_file.out -aes256 -k password

AES output size: take the original file size + 1, round up to a multiple of 16 bytes, then add 16.

e.g. 1: 117 bytes
    117 + 1, padded to 16 bytes => 128 bytes
    128 bytes + 16 = 144 bytes
e.g. 2: 127 bytes
    127 + 1, padded to 16 bytes => 128 bytes
    128 bytes + 16 = 144 bytes
e.g. 3: 128 bytes
    128 + 1, padded to 16 bytes => 144 bytes
    144 bytes + 16 = 160 bytes

The output size does not depend on the length of the password.
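The size rule above can be checked with a small calculation. The sketch below reflects my reading of where the numbers come from, which the notes do not state: the "+1, then pad to 16" step matches block padding that always adds at least one byte, and the extra 16 bytes match the header that openssl enc writes when the key is derived from a password (the 8-byte string "Salted__" followed by an 8-byte salt). The function name openssl_enc_size is made up for illustration.

```python
BLOCK_SIZE = 16   # AES block size in bytes
SALT_HEADER = 16  # assumed: "Salted__" magic (8 bytes) + salt (8 bytes)


def openssl_enc_size(plain_size):
    """Estimate the output size of `openssl enc -aes256 -k password`."""
    # Padding always adds between 1 and 16 bytes, so a file that is
    # already a multiple of 16 bytes still grows by one full block.
    padded = (plain_size // BLOCK_SIZE + 1) * BLOCK_SIZE
    return padded + SALT_HEADER


for size in (117, 127, 128):
    print(size, "->", openssl_enc_size(size))
```

The three sizes in the loop are the same ones used in the worked examples, so the printed values can be compared against them directly.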

Lucene

http://lucene.apache.org/core/3_6_1/demo.html

CLASSPATH
OK
    export CLASSPATH=/home/mac/xxxx/xxx/xxx.jar:/home/mac/yyyy/yyy/yyy.jar
    export CLASSPATH=/home/mac/xxxx/xxx/*:/home/mac/yyyy/yyy/*
Not OK
    export CLASSPATH=/home/mac/xxxx/xxx/*.jar:/home/mac/yyyy/yyy/*.jar
    export CLASSPATH=/home/mac/xxxx/xxx/:/home/mac/yyyy/yyy/

http://lucene.apache.org/core/3_6_1/demo2.html

Need to detect the document language and switch to the correct analyzer.

Create Index
1. Open a directory to hold the index files (dir)
2. Create an Analyzer (analyzer)
3. Create an IndexWriterConfig (iwc)
4. Apply some settings to the IndexWriterConfig
5. Use dir and iwc to create an IndexWriter (writer)
6. Add documents
     Create a Document (doc)
     Add several fields
       Create a Field (pathField)
           Field pathField = new Field("path", file.getPath(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS);
           pathField.setIndexOptions(IndexOptions.DOCS_ONLY);
           doc.add(pathField);
       Create a NumericField (modifiedField)
           NumericField modifiedField =...

Change font in web pages

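I have attended Google I/O three times but never went to its codelabs. Looking back at the course material now, it is really good stuff... One codelab uses App Engine to tie together pipeline, logs, Google Storage, BigQuery, Bootstrap, and the Channel API into a log analysis program. That one is fun; I will write up my notes on it separately. The other one teaches CSS: http://io12-css-codelab.appspot.com/lessons/index.html . It is not especially hard material, but being taught step by step like this feels great. It turns out that changing fonts is this easy.

1. Go to http://www.google.com/webfonts/ and pick a font. It already has plenty of tools for previewing a font at different sizes, or for seeing how the font feels as a heading or as body text.
2. Once you have chosen, click Quick-use under that font. It also tells you how much page loading overhead using this font adds. If it is green, the impact is probably small...
3. Further down there are two short pieces of code to copy & paste. Just put <link href='http://fonts.googleapis.com/css?family=Quando' rel='stylesheet' type='text/css'> in the head of the HTML. Then you can use font-family: 'Quando', serif; in your later CSS styles to select that font... Surprisingly simple.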

OAuth in GAE

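Because I needed to use a certain Google service, and that service requires OAuth authentication, I recently looked into how to use OAuth2 on GAE. As one of the driving forces behind OAuth2, Google has put a lot of effort into promoting it and providing the corresponding tools. For Python, it provides the google-api-python-client library, which makes using OAuth easy. I followed this document and got OAuth2 working, so I am recording the steps here.

Installation
On Linux and Mac OS it is very simple: 'sudo easy_install --upgrade google-api-python-client' completes the installation. On Windows you are out of luck, but the end result does run under dev_appserver.py on Windows.

Register your app in the Google APIs Console
1. Add a Project from the drop-down menu on the left.
2. In the Services column, find the API you need and switch it to On.
3. Click API Access.
4. Click Create an OAuth 2.0 client ID.
5. Because this is GAE, choose Web application. For Hostname you can enter localhost for now.
6. After you click Create, it generates three password-like values and two links. Copy them into a file called settings.py; they will be needed shortly.
     CLIENT_ID='put the Client ID from those values here'
     CLIENT_SECRET='put the Client secret from those values here'
     SCOPE='' # This should be provided by the API you want to use; get it from the API's documentation.
7. After google-api-python-client is installed, a script called enable-app-engine-project is placed in /usr/local/bin. If /usr/local/bin is already on your PATH, just run it followed by the project directory, e.g.
     enable-app-engine-project ./
   That command copies a whole pile of directories and files over and makes a mess of your directory, but just put up with it.
...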
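To show what the settings.py values are for, here is a minimal sketch of the first step of the OAuth 2.0 authorization-code flow: building the URL the user is sent to in order to grant access. This is not the google-api-python-client approach the post follows; it uses only the Python 3 standard library. The endpoint URL, the redirect URI, the placeholder scope, and the function name build_auth_url are all assumptions made for illustration.

```python
from urllib.parse import urlencode

# Placeholders standing in for the values kept in settings.py.
CLIENT_ID = "your-client-id"
SCOPE = "https://example.com/auth/some.scope"  # hypothetical scope

# Assumption: Google's OAuth 2.0 authorization endpoint.
AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"


def build_auth_url(client_id, scope, redirect_uri):
    """Return the URL the user visits to approve access."""
    params = {
        "response_type": "code",       # ask for an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,  # must match what was registered
        "scope": scope,
    }
    return AUTH_ENDPOINT + "?" + urlencode(params)


# Hypothetical callback path on the local dev server.
print(build_auth_url(CLIENT_ID, SCOPE, "http://localhost:8080/oauth2callback"))
```

CLIENT_SECRET does not appear in this URL; in the authorization-code flow it is only used later, when the server exchanges the returned code for a token.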

Unix scripts (sed & bash for)

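I had a small task at work today: a directory held 47 images that, by coincidence, came in only two resolutions, 480x84 and 480x720. They needed to go into two separate directories. Here is a rough record of how I did it, otherwise I will never remember...

Use file to get each image's resolution, leaving out the 480x84 ones first.
    file *.png | grep -v "x 84" | sed 's/\(.*\):.*/\1/' > list
Then write a one-line script on the bash command line.
    for i in $(cat list); do mv "$i" 720; done
Then move the rest into the 84 directory.
    mv *.png 84
Done.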
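The same sorting can be sketched in Python by reading the width and height straight from each PNG header instead of parsing the output of file. This is an alternative illustration, not what I ran at work; the function names png_size and sort_pngs are made up, and it names each target directory after the image height (84 or 720 for the files above).

```python
import os
import shutil
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(header):
    """Return (width, height) from the first 24 bytes of a PNG file."""
    if (len(header) < 24 or header[:8] != PNG_SIGNATURE
            or header[12:16] != b"IHDR"):
        raise ValueError("not a PNG file")
    # Width and height are big-endian 32-bit integers right after "IHDR".
    return struct.unpack(">II", header[16:24])


def sort_pngs(src_dir="."):
    """Move each PNG in src_dir into a sub-directory named after its height."""
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        if not (name.lower().endswith(".png") and os.path.isfile(path)):
            continue
        with open(path, "rb") as f:
            _, height = png_size(f.read(24))
        dest = os.path.join(src_dir, str(height))
        os.makedirs(dest, exist_ok=True)
        shutil.move(path, os.path.join(dest, name))
```

Calling sort_pngs("some_dir") moves the files in place, so try it on a copy of the directory first.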