Thursday, October 04, 2012

PDFs on the Linux command line

I was recently faced with the task of scanning a lot of documents and wanted to preserve them as multi-page PDFs. Converting the raw JPEGs to PDFs proved quite easy with ImageMagick's convert, a tip I got from http://stackoverflow.com/questions/8955425/how-can-i-convert-a-series-of-images-to-a-pdf-from-the-command-line-on-linux. But the PDFs were really large when I used convert to create the multi-page documents directly. After a little googling I found pdfunite, which was perfect: I was able to create reasonably sized multi-page PDFs from a set of single-page PDFs :)
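For the record, here is a rough sketch of how the two steps could be scripted. It assumes ImageMagick's convert and pdfunite (from poppler-utils) are on the PATH; the function names and file names are made up for illustration and are not the exact commands I ran.

```python
import subprocess
from pathlib import Path


def build_commands(image_paths, output_pdf):
    """Return the convert commands (one per scan) and the final pdfunite command."""
    # Each scan gets a single-page PDF next to it: page.jpg -> page.pdf
    page_pdfs = [str(Path(p).with_suffix(".pdf")) for p in image_paths]
    convert_cmds = [["convert", str(img), pdf]
                    for img, pdf in zip(image_paths, page_pdfs)]
    # pdfunite takes the input PDFs in page order, with the output file last
    unite_cmd = ["pdfunite"] + page_pdfs + [str(output_pdf)]
    return convert_cmds, unite_cmd


def scans_to_pdf(image_paths, output_pdf):
    """Run the commands; fails loudly if either tool reports an error."""
    convert_cmds, unite_cmd = build_commands(image_paths, output_pdf)
    for cmd in convert_cmds:
        subprocess.run(cmd, check=True)
    subprocess.run(unite_cmd, check=True)
```

Something like scans_to_pdf(["scan-01.jpg", "scan-02.jpg"], "document.pdf") would then produce the combined file, with the scans passed in page order.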

Wednesday, January 18, 2012

Naruto downloader from mangareader.net

After reading Naruto up to chapter 520 (courtesy naruto with elisp) I was eager to read the rest. As of today the latest chapter is 570. I found mangareader.net after a bit of googling and was busy reading my way through the chapters. But, like before, reading in the browser was not to my taste. Currently mcomix is my comic reader of choice. So, I set about writing a script to automatically download the remaining chapters from mangareader.net. I do not know if it's wrong to do so; the site does not have any terms of use :|
import re
from urllib2 import urlopen
from zipfile import ZipFile, ZIP_DEFLATED
from xml.dom.minidom import parseString

def get_info(line, alt_regex):
    try:
        line = line[:line.index('</a>')] + '</a>'
        line = line[line.index('<a href'):]
        dom = parseString(line)
        info = {}
        a = dom.getElementsByTagName('a')[0]
        info['next'] = a.getAttribute('href')
        img = a.getElementsByTagName('img')[0]
        info['img_url'] = img.getAttribute('src')
        info['img_ext'] = info['img_url'][info['img_url'].rindex('.') + 1:]
        alt = img.getAttribute('alt')
        m = re.search(alt_regex, alt)
        info['chapter'] = int(m.group(1))
        info['page'] = int(m.group(2))
        dom.unlink()
        return info
    except Exception as e:
        print('[ERROR] %s' % line)
        print('[ERROR] %s' % e)  # format the exception; str + Exception raises TypeError

def get_image(url):
    try:
        f = urlopen(url)
        b = f.read()
        f.close()
        return b
    except Exception as e:
        print('[ERROR] %s' % e)  # format the exception; str + Exception raises TypeError

def get_chapter(url_prefix, url_suffix, title, chapter=1):
    need_more = True
    alt_regex = re.compile(r'%s (\d+) - Page (\d+)' % title)
    cbz = ZipFile('%03d.cbz' % chapter, "w", ZIP_DEFLATED)
    url = '%s%s' % (url_prefix, url_suffix)
    
    try:
        while ( need_more ):
            f = urlopen(url)
            lines = f.readlines()
            f.close()
            line = filter(lambda x: x.find('id="img"') != -1, lines)[0]
            info = get_info(line, alt_regex)
            need_more = info['chapter'] == chapter
            if ( need_more ):
                cbz.writestr('%02d.%s' % (info['page'], info['img_ext']), get_image(info['img_url']))
                url = '%s%s' % (url_prefix, info['next'])
            else:
                # new chapter
                cbz.close()
                chapter = info['chapter']
                need_more = True
                cbz = ZipFile('%03d.cbz' % chapter, "w", ZIP_DEFLATED)
    except IndexError:
        pass # image not found so end of chapter
    finally:
        cbz.close() # make sure the last archive is finalized on disk

get_chapter('http://www.mangareader.net', '/naruto/521', 'Naruto')
So, I let it rip and now I'm busy reading... :)
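The part of the script that decides file names can be tried without touching the network. Below is a small Python 3 sketch of just that step. The alt text format ("Naruto 521 - Page 3") is the one the regex above expects, while the helper names and the example URL are made up for illustration.

```python
import re


def parse_alt(alt, title):
    """Extract (chapter, page) from alt text such as 'Naruto 521 - Page 3'."""
    m = re.search(r'%s (\d+) - Page (\d+)' % re.escape(title), alt)
    if m is None:
        return None  # the alt text did not match the expected pattern
    return int(m.group(1)), int(m.group(2))


def archive_names(alt, img_url, title):
    """Return (cbz file name, page file name inside the cbz), or None."""
    parsed = parse_alt(alt, title)
    if parsed is None:
        return None
    chapter, page = parsed
    ext = img_url[img_url.rindex('.') + 1:]  # text after the last dot
    # Zero-padded names keep chapters and pages sorted in the reader
    return '%03d.cbz' % chapter, '%02d.%s' % (page, ext)
```

The zero padding is the bit that matters: without it a comic reader would sort page 10 before page 2.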

Friday, January 13, 2012

Project Euler Problem 1 functionally

I'm trying to learn a bit of elisp (see Luhn algorithm in elisp). So, to try out my elisp some more I picked Project Euler's simplest problem, Problem 1. I cracked open emacs and tried out a few things, gave up and went into hibernation. A few weeks later I remembered this problem and had a go at it again. I thought I nailed it...
(defun num_check (x) (if (or (eq (mod x 3) 0) (eq (mod x 5) 0)) x 0))
(defun sum(x) (if (eq x 3) 3 (+ (num_check x) (sum (- x 1)))))
(sum 3)
3
(sum 5)
8
(sum 10)
33
(sum 1000)
Debugger entered--Lisp error: (error "Lisp nesting exceeds `max-lisp-eval-depth'")
After a bit of googling I found out that elisp caps recursion through max-lisp-eval-depth, which apparently allows only about 300 nested calls :(. So, I decided to take the plunge into clisp and started downloading that. Meanwhile I had an old version of scala available and tried this
Welcome to Scala version 2.9.0.1 (Java HotSpot(TM) Server VM, Java 1.6.0_20).
Type in expressions to have them evaluated.
Type :help for more information.

scala> def num_check(n: Int):Int = if ( (n % 3) == 0 || (n % 5) == 0 ) n else 0
num_check: (n: Int)Int

scala> def do_sum(x: Int):Int = if ( x == 3 ) 3 else num_check(x) + do_sum(x - 1)
do_sum: (x: Int)Int

scala> do_sum(10)
res0: Int = 33

scala> do_sum(100)
res1: Int = 2418

scala> do_sum(1000)
res2: Int = 234168
Wow, that went well. By the time I had googled the scala syntax, clisp had finished downloading, and here goes
  i i i i i i i       ooooo    o        ooooooo   ooooo   ooooo
  I I I I I I I      8     8   8           8     8     o  8    8
  I  \ `+' /  I      8         8           8     8        8    8
   \  `-+-'  /       8         8           8      ooooo   8oooo
    `-__|__-'        8         8           8           8  8
        |            8     o   8           8     o     8  8
  ------+------       ooooo    8oooooo  ooo8ooo   ooooo   8

Welcome to GNU CLISP 2.44.1 (2008-02-23) <http://clisp.cons.org/>

Copyright (c) Bruno Haible, Michael Stoll 1992, 1993
Copyright (c) Bruno Haible, Marcus Daniels 1994-1997
Copyright (c) Bruno Haible, Pierpaolo Bernardi, Sam Steingold 1998
Copyright (c) Bruno Haible, Sam Steingold 1999-2000
Copyright (c) Sam Steingold, Bruno Haible 2001-2008

Type :h and hit Enter for context help.

[1]> (defun num_check (x) (if (or (eq (mod x 3) 0) (eq (mod x 5) 0)) x 0))
NUM_CHECK
[2]> (defun sum(x) (if (eq x 3) 3 (+ (num_check x) (sum (- x 1)))))
SUM
[3]> (sum 10)
33
[4]> (sum 1000)
234168
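For comparison, here is a sketch of the same sum without any recursion, so nesting depth never comes into it. The function names are mine. Like the functions above, both versions include the upper bound itself. Project Euler asks for the multiples below 1000, so for that exact reading the bound to pass is 999.

```python
def sum_multiples_loop(limit):
    """Sum every n in 3..limit (inclusive) divisible by 3 or 5."""
    return sum(n for n in range(3, limit + 1) if n % 3 == 0 or n % 5 == 0)


def sum_multiples_closed(limit):
    """Same sum via inclusion-exclusion on arithmetic series."""
    def multiples_sum(k):
        count = limit // k                   # how many multiples of k are <= limit
        return k * count * (count + 1) // 2  # k * (1 + 2 + ... + count)
    # Multiples of 15 are counted in both the 3s and the 5s, so remove them once
    return multiples_sum(3) + multiples_sum(5) - multiples_sum(15)
```

sum_multiples_loop(1000) gives 234168, matching the scala and clisp sessions, while sum_multiples_closed(999) gives 233168.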

Friday, December 02, 2011

Naruto with elisp

I just downloaded a huge Naruto (Manga) torrent which was organized like below
|- Chapter 001 to 010
|---- c001-p01.jpg
|---- c001-p02.jpg
|---- ...
|---- c010-p21.jpg
|---- c010-p22.jpg
|- Chapter 011 to 020
|- ...
|- Chapter 511 to 520
|- ...
I prefer to read my comics in cdisplay/comix, which can read cbz/cbr formats. I was looking for an easy way of automating the process in elisp (buoyed by my previous success with the Luhn algorithm). I was also itching to use this. My attempt at creating a script in one go was a failure, and since I wanted to do it quickly I used this
M-: (dotimes (i 52) (insert (format "cd \"Chapter %03d to %03d\"\n(dotimes (i 10) (insert (format \"\\\"C:\\\\Program Files\\\\7-Zip\\\\7z.exe\\\" a -tzip \\\"E:\\\\comics\\\\Naruto\\\\out\\\\%s.cbz\\\" c%s-*\\n\" (+ %d i) (+ %d i))))\ncd ..\n" (1+ (* i 10)) (* (1+ i) 10) "%03d" "%03d" (1+ (* i 10)) (1+ (* i 10)))))
On running which I got this
cd "Chapter 001 to 010"
(dotimes (i 10) (insert (format "\"C:\\Program Files\\7-Zip\\7z.exe\" a -tzip \"E:\\comics\\Naruto\\out\\%03d.cbz\" c%03d-*\n" (+ 1 i) (+ 1 i))))
cd ..
cd "Chapter 011 to 020"
(dotimes (i 10) (insert (format "\"C:\\Program Files\\7-Zip\\7z.exe\" a -tzip \"E:\\comics\\Naruto\\out\\%03d.cbz\" c%03d-*\n" (+ 11 i) (+ 11 i))))
cd ..
cd "Chapter 021 to 030"
(dotimes (i 10) (insert (format "\"C:\\Program Files\\7-Zip\\7z.exe\" a -tzip \"E:\\comics\\Naruto\\out\\%03d.cbz\" c%03d-*\n" (+ 21 i) (+ 21 i))))
cd ..
Then, with point on the first character of the first dotimes line, I recorded the following macro: C-x ( C-k M-: C-y C-j C-k C-n C-n C-x ). That done, I ran the macro with C-x e and held down the e till all occurrences were covered, and I got this.
cd "Chapter 001 to 010"
"C:\Program Files\7-Zip\7z.exe" a -tzip "E:\comics\Naruto\out\001.cbz" c001-*
"C:\Program Files\7-Zip\7z.exe" a -tzip "E:\comics\Naruto\out\002.cbz" c002-*
...
"C:\Program Files\7-Zip\7z.exe" a -tzip "E:\comics\Naruto\out\010.cbz" c010-*
cd ..
...
I saved it as a bat file and let it loose, and I had a neat little directory with all the cbz files in it :). I'm sure there are cleverer ways to accomplish this, but this was the easiest approach that I could think of (after a few failed attempts at nested inserts and let).
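One of those other ways, sketched in Python rather than elisp plus 7-Zip: walk the chapter folders and zip the pages of each chapter directly. This assumes the layout shown above (files named cNNN-pXX.jpg one level below the root); the function name and paths are placeholders, and this is not what I actually ran.

```python
import glob
import os
import zipfile


def pack_chapters(source_root, out_dir):
    """Create one NNN.cbz per chapter from files named cNNN-pXX.jpg."""
    os.makedirs(out_dir, exist_ok=True)
    chapters = {}
    # Look one level down, inside the "Chapter 001 to 010" style folders
    for path in sorted(glob.glob(os.path.join(source_root, "*", "c*-p*.jpg"))):
        chapter = os.path.basename(path).split("-")[0][1:]  # "c001-p01.jpg" -> "001"
        chapters.setdefault(chapter, []).append(path)
    created = []
    for chapter, pages in sorted(chapters.items()):
        cbz_path = os.path.join(out_dir, chapter + ".cbz")
        # A cbz is just a zip archive of the page images
        with zipfile.ZipFile(cbz_path, "w", zipfile.ZIP_DEFLATED) as cbz:
            for page in pages:
                cbz.write(page, arcname=os.path.basename(page))
        created.append(cbz_path)
    return created
```

Calling pack_chapters with the torrent folder and an output folder returns the list of cbz files it wrote.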

Friday, October 07, 2011

Clean sysouts from java project

Recently I was working on a project that had a lot of sysouts (System.out.println) peppered all over the code. I tried the usual sed approach, find . -name '*.java' | xargs sed -i '/System.out.println/d', and that didn't work out too well because of multi-line sysouts: sed removes only the line containing the call and leaves the rest of the statement behind. So, I decided to try the AST approach using the java parser from http://code.google.com/p/javaparser/. First, create a maven project and add the javaparser repository and dependency to the pom.xml as given below

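<project>
  <modelVersion>4.0.0</modelVersion>

  <groupId>org.fc</groupId>
  <artifactId>sysout-cleanup</artifactId>
  <version>1.0</version>

  <repositories>
    <repository>
      <id>javaparser</id>
      <name>JavaParser Repository</name>
      <url>http://javaparser.googlecode.com/svn/maven2</url>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
  </repositories>

  <properties>
    <commons-io-version>2.0.1</commons-io-version>
    <java-parser-version>1.0.8</java-parser-version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>commons-io</groupId>
      <artifactId>commons-io</artifactId>
      <version>${commons-io-version}</version>
    </dependency>
    <dependency>
      <groupId>com.google.code.javaparser</groupId>
      <artifactId>javaparser</artifactId>
      <version>${java-parser-version}</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.6</source>
          <target>1.6</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>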
And the java code that creates the AST and looks for the System.out.println method call
package org.fc;

import japa.parser.JavaParser;
import japa.parser.ast.CompilationUnit;
import japa.parser.ast.expr.FieldAccessExpr;
import japa.parser.ast.expr.MethodCallExpr;
import japa.parser.ast.expr.NameExpr;
import japa.parser.ast.stmt.BlockStmt;
import japa.parser.ast.stmt.ExpressionStmt;
import japa.parser.ast.stmt.Statement;
import japa.parser.ast.visitor.VoidVisitorAdapter;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintStream;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;


public class CleanupSysOut {
  File inputFile;
  File outputFile;
  Map<String, String> namespaceMap = new HashMap<String, String>();
  public CleanupSysOut(File file) {
    inputFile = file;
  }
  public CleanupSysOut(File in, File out) {
    inputFile = in;
    outputFile = out;
  }
  
  protected CompilationUnit getCompilationUnit() {
    CompilationUnit cu = null;
    InputStream in = null;
    try {
      in = new FileInputStream(inputFile);
      cu = JavaParser.parse(in);
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      IOUtils.closeQuietly(in);
    }
    return cu;
  }
  
  protected void writeOutput(CompilationUnit cu) throws IOException {
    PrintStream out = new PrintStream(new FileOutputStream(outputFile));
    try {
      out.println(cu.toString());
    } finally {
      // Close the stream even if printing fails
      IOUtils.closeQuietly(out);
    }
  }
  
  protected String debugPrint(CompilationUnit cu) throws IOException {
    return cu.toString();
  }
  
  // MethodVisitor
  private class MethodVisitor extends VoidVisitorAdapter<Object> {
    private boolean isSysout(MethodCallExpr methodCall) {
      if ( "println".equals(methodCall.getName()) ) {
        if ( methodCall.getScope() instanceof FieldAccessExpr ) {
          FieldAccessExpr fieldExpr = (FieldAccessExpr) methodCall.getScope();
          if ( "out".equals(fieldExpr.getField()) ) {
            if ( fieldExpr.getScope() instanceof NameExpr ) {
              NameExpr clazz = (NameExpr) fieldExpr.getScope();
              if ( "System".equals(clazz.getName()) ) {
                return true;
              }
            }
          }
        }
      }
      return false;
    }
    @Override
    public void visit(BlockStmt blockStmt, Object arg) {
      List<Statement> stmts = blockStmt.getStmts();
      if ( null != stmts ) {
        Iterator<Statement> itr = stmts.iterator();
        while ( itr.hasNext() ) {
          Statement stmt = itr.next();
          if ( stmt instanceof ExpressionStmt ) {
            if ( ((ExpressionStmt) stmt).getExpression() instanceof MethodCallExpr ) {
              MethodCallExpr methodCall = (MethodCallExpr) ((ExpressionStmt) stmt).getExpression();
              if ( isSysout(methodCall) ) {
                itr.remove();
              }
            }
          } else {
            stmt.accept(this, arg);
          }
        }
      }
    }
  }
  // MethodVisitor ends
  
  public void doCleanup() {
    try {
      CompilationUnit cu = getCompilationUnit();
      new MethodVisitor().visit(cu, null);
      if ( null != outputFile ) {
        writeOutput(cu);
      }
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
  static void cleanupDir(File inputDir, File  outputDir) {
    for ( File java : FileUtils.listFiles(inputDir, new String[] { "java" }, true) ) {
      String relativePath = java.getAbsolutePath().replace(inputDir.getAbsolutePath(), "").substring(1);
      File outputFile = new File(outputDir, relativePath);
      outputFile.getParentFile().mkdirs();
      new CleanupSysOut(java, outputFile).doCleanup();
    }
  }
  public static void main(String[] args) {
    if ( args.length == 2 ) {
      cleanupDir(new File(args[0]), new File(args[1]));
    } else {
      System.err.println("Usage: java " + CleanupSysOut.class.getName() + " <input-dir> <output-dir>");
    }
  }
}
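For small jobs where pulling in a parser feels like overkill, a cruder middle ground is to scan for the call and skip ahead to the semicolon that ends the statement, tracking parentheses and string literals on the way. The sketch below is only that, a character scanner and not an AST: it does not understand comments or char literals such as '(' and it leaves the statement's indentation and line break in place. The function name is my own.

```python
def strip_sysouts(source):
    """Remove System.out.println(...) statements, including multi-line ones."""
    marker = "System.out.println"
    out = []
    pos = 0
    while True:
        start = source.find(marker, pos)
        if start == -1:
            out.append(source[pos:])      # nothing left to remove
            return "".join(out)
        out.append(source[pos:start])     # keep everything before the call
        i = start + len(marker)
        depth = 0
        in_string = False
        while i < len(source):
            ch = source[i]
            if in_string:
                if ch == "\\":
                    i += 1                # skip the escaped character
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == ";" and depth == 0:
                break                     # end of the println statement
            i += 1
        pos = i + 1                       # resume right after the semicolon
```

Unlike the sed one-liner, this copes with a println whose arguments are spread over several lines, but the AST version above remains the safer choice for real code.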

Luhn algorithm in elisp

I was looking for the checksum validation of credit card numbers and came across the Luhn algorithm. I had emacs open and on a whim decided I'd try to implement it in elisp. I'm no lisp programmer, but I have in the past managed to write some basic elisp in my .emacs, so I guessed it would take me about half an hour at most.

I guessed wrong. Wrapping my head around even simple elisp constructs like let and lambda took quite a while, and the whole thing took a lot longer than I anticipated. Here, I present to you the fruit of my labors :)
(defun luhn-sum (list n)
  (if (null list)
      0
    (+ (let ((x (car list)))
         (if (= 1 (mod n 2))
             (let ((y (* 2 x)))
               (if (> y 9)
                   (+ 1 (mod y 10))
                 y))
           x))
       (luhn-sum (cdr list) (+ 1 n)))))

(defun luhn-check (card-no)
  (eq 0 (mod (luhn-sum (mapcar (lambda (x) (string-to-number x 10))
                               (cdr (reverse (cdr (split-string card-no "")))))
                       0)
             10)))

(luhn-check "49927398716")
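For anyone who finds the nested lets hard to follow, here is roughly the same logic as a Python sketch (my own transliteration, with a made-up function name): walk the digits from the right, double every second one, fold any two-digit result back into a single digit, and check that the total is a multiple of 10.

```python
def luhn_check(card_no):
    """Return True if the digit string passes the Luhn checksum."""
    digits = [int(ch) for ch in reversed(card_no)]  # rightmost digit first
    total = 0
    for n, x in enumerate(digits):
        if n % 2 == 1:
            y = 2 * x
            # For 10..18, y - 9 equals the sum of the two digits of y
            total += y - 9 if y > 9 else y
        else:
            total += x
    return total % 10 == 0
```

With the number used above, luhn_check("49927398716") returns True, and changing the last digit to 7 makes it return False.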

Wednesday, June 22, 2011

Hindi in eclipse docs

I was waiting for the Indigo release and was going through the new features when I spotted some Hindi in the Eclipse docs.



There you have it: "Hello World!" in Hindi in the Eclipse project :) Kudos!

Wednesday, February 23, 2011

Mercurial and subversion

I've been using mercurial fairly regularly for most of my hobby projects. Subversion happens to be the VCS of choice at work. Recently I wanted to do some major refactoring in a work project, and instead of the usual approach of creating a branch in subversion I decided to try something new: using mercurial and subversion together :). Branching in mercurial is vastly easier than in subversion, and the ability to quickly see a graphical branch tree using hg serve is simply priceless. With minimal .hgignore and svn:ignore changes, mercurial and subversion can be made to mutually ignore each other :D. Here's a simple example
$ mvn archetype:create -DgroupId=test -DartifactId=test -Dversion=1.0 -Dpackaging=jar -Dpackage=test
[INFO] Scanning for projects...
[INFO] Searching repository for plugin with prefix: 'archetype'.
[INFO] ------------------------------------------------------------------------
[INFO] Building Maven Default Project
[INFO]    task-segment: [archetype:create] (aggregator-style)
[INFO] ------------------------------------------------------------------------
[INFO] Setting property: classpath.resource.loader.class => 'org.codehaus.plexus.velocity.ContextClassLoaderResourceLoader'.
[INFO] Setting property: velocimacro.messages.on => 'false'.
[INFO] Setting property: resource.loader => 'classpath'.
[INFO] Setting property: resource.manager.logwhenfound => 'false'.
[INFO] [archetype:create {execution: default-cli}]
[WARNING] This goal is deprecated. Please use mvn archetype:generate instead
[INFO] Defaulting package to group ID: test
[INFO] ----------------------------------------------------------------------------
[INFO] Using following parameters for creating OldArchetype: maven-archetype-quickstart:RELEASE
[INFO] ----------------------------------------------------------------------------
[INFO] Parameter: groupId, Value: test
[INFO] Parameter: packageName, Value: test
[INFO] Parameter: package, Value: test
[INFO] Parameter: artifactId, Value: test
[INFO] Parameter: basedir, Value: f:\shyam\emacs-23\bin
[INFO] Parameter: version, Value: 1.0
[INFO] ********************* End of debug info from resources from generated POM ***********************
[INFO] OldArchetype created in dir: f:\shyam\emacs-23\bin\test
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2 seconds
[INFO] Finished at: Thu Feb 24 03:15:08 IST 2011
[INFO] Final Memory: 8M/20M
[INFO] ------------------------------------------------------------------------
$ svn add test
A  test/pom.xml
A  test/src
A  test/src/main
A  test/src/main/java
A  test/src/main/java/test
A  test/src/main/java/test/App.java
A  test/src/test
A  test/src/test/java
A  test/src/test/java/test
A  test/src/test/java/test/AppTest.java
$ svn commit -m "adding test"
Adding  test/pom.xml
Adding  test/src
Adding  test/src/main
Adding  test/src/main/java
Adding  test/src/main/java/test
Adding  test/src/main/java/test/App.java
Adding  test/src/test
Adding  test/src/test/java
Adding  test/src/test/java/test
Adding  test/src/test/java/test/AppTest.java
Transmitting file data .
Committed revision 34.
$ cd test
$ svn propset svn:ignore '.hg
.hgignore
.project
.classpath
.settings
target' .
property 'svn:ignore' set on .
$ svn commit -m "setting svn:ignore"
Sending  .
Committed revision 34.
$ hg init
$ cat > .hgignore
syntax: regexp

^target
^\.project
^\.classpath
^\.settings

syntax: glob
.svn
^D
$ hg add
adding .hgignore
adding pom.xml
adding src/main/java/test/App.java
adding src/test/java/test/AppTest.java
$ hg commit -m "initial import"
And, we are all set to go :)