Thursday, October 04, 2012

PDFs on the Linux command line

I was recently faced with the task of scanning a lot of documents and wanted to preserve them as multi-page PDFs. Converting the raw JPEGs to PDFs proved quite easy with ImageMagick's convert, which I learned about from http://stackoverflow.com/questions/8955425/how-can-i-convert-a-series-of-images-to-a-pdf-from-the-command-line-on-linux. But the PDFs were really large when I used convert to create the multi-page documents directly. A little googling turned up pdfunite, which was perfect: I was able to create reasonably sized multi-page PDFs from a set of single-page PDFs :)
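A rough sketch of that two-step workflow, written in Python 3 as a thin wrapper around the two command-line tools. The function names, the scans directory layout and the *.jpg naming are my own assumptions for illustration; only the plain `convert input output` and `pdfunite in1 in2 ... out` invocations are relied on.

```python
import subprocess
from pathlib import Path


def convert_cmd(jpeg):
    """Command that turns one scanned JPEG into a single-page PDF (ImageMagick)."""
    return ["convert", str(jpeg), str(Path(jpeg).with_suffix(".pdf"))]


def unite_cmd(pdfs, output):
    """Command that merges single-page PDFs, in the given order, into one file."""
    return ["pdfunite"] + [str(p) for p in pdfs] + [str(output)]


def build_document(scan_dir, output):
    """Convert every *.jpg in scan_dir (hypothetical layout), then merge the PDFs."""
    jpegs = sorted(Path(scan_dir).glob("*.jpg"))  # sorted so pages keep their order
    for jpeg in jpegs:
        subprocess.check_call(convert_cmd(jpeg))
    pdfs = [jpeg.with_suffix(".pdf") for jpeg in jpegs]
    subprocess.check_call(unite_cmd(pdfs, output))
```

Calling build_document('scans', 'document.pdf') would need both tools installed; the two command builders can be inspected without running anything.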

Wednesday, January 18, 2012

Naruto downloader for mangareader.net

After reading Naruto up to chapter 520 (courtesy of naruto with elisp), I was eager to read the rest. As of today the latest chapter is 570. I found mangareader.net after a bit of googling and started reading my way through the chapters. But, as before, reading in the browser was not to my taste; mcomix is currently my comic reader of choice. So I set about writing a script to automatically download the remaining chapters from mangareader.net, one .cbz archive per chapter. I do not know if it's wrong to do so; the site does not have any terms of use :|
import re
from urllib2 import urlopen
from zipfile import ZipFile, ZIP_DEFLATED
from xml.dom.minidom import parseString

def get_info(line, alt_regex):
    try:
        line = line[:line.index('</a>')] + '</a>'
        line = line[line.index('<a href'):]
        dom = parseString(line)
        info = {}
        a = dom.getElementsByTagName('a')[0]
        info['next'] = a.getAttribute('href')
        img = a.getElementsByTagName('img')[0]
        info['img_url'] = img.getAttribute('src')
        info['img_ext'] = info['img_url'][info['img_url'].rindex('.') + 1:]
        alt = img.getAttribute('alt')
        m = re.search(alt_regex, alt)
        info['chapter'] = int(m.group(1))
        info['page'] = int(m.group(2))
        dom.unlink()
        return info
    except Exception as e:
        print('[ERROR] %s' % line)
        print('[ERROR] %s' % e)  # format the exception; '[ERROR] ' + e raises TypeError

def get_image(url):
    try:
        f = urlopen(url)
        b = f.read()
        f.close()
        return b
    except Exception as e:
        print('[ERROR] %s' % e)  # format the exception; '[ERROR] ' + e raises TypeError

def get_chapter(url_prefix, url_suffix, title, chapter=1):
    need_more = True
    alt_regex = re.compile(r'%s (\d+) - Page (\d+)' % title)
    cbz = ZipFile('%03d.cbz' % chapter, "w", ZIP_DEFLATED)
    url = '%s%s' % (url_prefix, url_suffix)
    
    try:
        while ( need_more ):
            f = urlopen(url)
            lines = f.readlines()
            f.close()
            line = filter(lambda x: x.find('id="img"') != -1, lines)[0]
            info = get_info(line, alt_regex)
            need_more = info['chapter'] == chapter
            if ( need_more ):
                cbz.writestr('%02d.%s' % (info['page'], info['img_ext']), get_image(info['img_url']))
                url = '%s%s' % (url_prefix, info['next'])
            else:
                # new chapter
                cbz.close()
                chapter = info['chapter']
                need_more = True
                cbz = ZipFile('%03d.cbz' % chapter, "w", ZIP_DEFLATED)
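    except IndexError:
        pass # no image on the page, so we ran past the last available chapter
    finally:
        cbz.close() # finish the last archive; an unclosed zip is missing its directory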

get_chapter('http://www.mangareader.net', '/naruto/521', 'Naruto')
So, I let it rip and now I'm busy reading... :)
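The parsing and naming steps can be tried without touching the network. Here is a minimal sketch: the sample alt text and image URL are made up to match the pattern the script expects, parse_alt and page_name are illustrative helpers rather than part of the script, and re.escape is an extra precaution the original does not take.

```python
import io
import re
from zipfile import ZipFile, ZIP_DEFLATED


def parse_alt(title, alt):
    """Return (chapter, page) from alt text shaped like '<title> <chapter> - Page <page>'."""
    m = re.search(r'%s (\d+) - Page (\d+)' % re.escape(title), alt)
    if m is None:
        raise ValueError('unexpected alt text: %r' % alt)
    return int(m.group(1)), int(m.group(2))


def page_name(page, img_url):
    """Name of the archive entry: zero-padded page number plus the image's extension."""
    ext = img_url[img_url.rindex('.') + 1:]
    return '%02d.%s' % (page, ext)


# Hypothetical sample values, not fetched from the site.
chapter, page = parse_alt('Naruto', 'Naruto 521 - Page 3')
name = page_name(page, 'http://example.invalid/naruto/521/page-3.jpg')

# Write one fake page into an in-memory archive instead of a file on disk.
buf = io.BytesIO()
with ZipFile(buf, 'w', ZIP_DEFLATED) as cbz:
    cbz.writestr(name, b'not a real image')

print('%03d.cbz' % chapter, ZipFile(buf).namelist())
```

Zero-padding the chapter and page numbers keeps the files in reading order when the reader sorts them by name.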

Friday, January 13, 2012

Project Euler Problem 1 functionally

I'm trying to learn a bit of elisp (see Luhns Algorithm in elisp). To try out my elisp some more, I attempted Project Euler's simplest problem, Problem 1. I cracked open emacs, tried out a few things, gave up and went into hibernation. A few weeks later I remembered the problem and had another go at it. I thought I had nailed it...
(defun num_check (x) (if (or (eq (mod x 3) 0) (eq (mod x 5) 0)) x 0))
(defun sum(x) (if (eq x 3) 3 (+ (num_check x) (sum (- x 1)))))
(sum 3)
3
(sum 5)
8
(sum 10)
33
(sum 1000)
Debugger entered--Lisp error: (error "Lisp nesting exceeds `max-lisp-eval-depth'")
After a bit of googling I found out that elisp only supports 300 recursive calls (the max-lisp-eval-depth limit named in the error) :(. So I decided to take the plunge into CLISP and started downloading it. Meanwhile, I had an old version of Scala available and tried this:
Welcome to Scala version 2.9.0.1 (Java HotSpot(TM) Server VM, Java 1.6.0_20).
Type in expressions to have them evaluated.
Type :help for more information.

scala> def num_check(n: Int):Int = if ( (n % 3) == 0 || (n % 5) == 0 ) n else 0
num_check: (n: Int)Int

scala> def do_sum(x: Int):Int = if ( x == 3 ) 3 else num_check(x) + do_sum(x - 1)
do_sum: (x: Int)Int

scala> do_sum(10)
res0: Int = 33

scala> do_sum(100)
res1: Int = 2418

scala> do_sum(1000)
res2: Int = 234168
Wow, that went well. By the time I had googled my way through the Scala syntax, CLISP had finished downloading, so here goes:
  i i i i i i i       ooooo    o        ooooooo   ooooo   ooooo
  I I I I I I I      8     8   8           8     8     o  8    8
  I  \ `+' /  I      8         8           8     8        8    8
   \  `-+-'  /       8         8           8      ooooo   8oooo
    `-__|__-'        8         8           8           8  8
        |            8     o   8           8     o     8  8
  ------+------       ooooo    8oooooo  ooo8ooo   ooooo   8

Welcome to GNU CLISP 2.44.1 (2008-02-23) <http://clisp.cons.org/>

Copyright (c) Bruno Haible, Michael Stoll 1992, 1993
Copyright (c) Bruno Haible, Marcus Daniels 1994-1997
Copyright (c) Bruno Haible, Pierpaolo Bernardi, Sam Steingold 1998
Copyright (c) Bruno Haible, Sam Steingold 1999-2000
Copyright (c) Sam Steingold, Bruno Haible 2001-2008

Type :h and hit Enter for context help.

[1]> (defun num_check (x) (if (or (eq (mod x 3) 0) (eq (mod x 5) 0)) x 0))
NUM_CHECK
[2]> (defun sum(x) (if (eq x 3) 3 (+ (num_check x) (sum (- x 1)))))
SUM
[3]> (sum 10)
33
[4]> (sum 1000)
234168
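For comparison, here is a sketch of the same sum in Python 3 (not one of the languages tried above), using a loop instead of recursion so that no nesting limit applies. One detail worth noting: Project Euler Problem 1 asks for the multiples of 3 or 5 below 1000, while the calls above include the bound itself, so 234168 is 1000 more than the answer the problem expects. The name sum_upto is mine.

```python
def num_check(n):
    """Return n if it is a multiple of 3 or 5, else 0 (same role as num_check above)."""
    return n if n % 3 == 0 or n % 5 == 0 else 0


def sum_upto(limit):
    """Sum the multiples of 3 or 5 from 3 up to and including limit."""
    return sum(num_check(n) for n in range(3, limit + 1))


print(sum_upto(10))    # 33, as in the transcripts
print(sum_upto(1000))  # 234168, counts 1000 itself
print(sum_upto(999))   # 233168, multiples strictly below 1000
```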