Thursday, October 04, 2012

PDFs on the Linux command line

I was recently faced with the task of scanning a lot of documents and wanted to preserve them as multi-page PDFs. Converting the raw JPEGs to PDFs proved to be quite easy using ImageMagick's convert, which I learned about from http://stackoverflow.com/questions/8955425/how-can-i-convert-a-series-of-images-to-a-pdf-from-the-command-line-on-linux. But the files were really large when I used convert to create the multi-page documents directly. After a little bit of googling I found pdfunite, which was perfect: I was able to create reasonably sized multi-page PDFs from a set of single-page PDFs :)
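The post does not give the exact commands, so here is a minimal sketch of the two-step workflow described above, assuming ImageMagick's convert and Poppler's pdfunite are both on the PATH. The file names and the helper names build_commands and run_all are hypothetical; the sketch only relies on the basic `convert input output` and `pdfunite source1 ... sourceN destination` call forms.

```python
import subprocess
from pathlib import Path


def build_commands(images, output_pdf):
    """Return argv lists: one `convert` per image, then a single `pdfunite`."""
    # Each scan becomes a single-page PDF with the same base name.
    page_pdfs = [str(Path(img).with_suffix('.pdf')) for img in images]
    commands = [['convert', img, pdf] for img, pdf in zip(images, page_pdfs)]
    # pdfunite takes the source PDFs in order, followed by the destination.
    commands.append(['pdfunite'] + page_pdfs + [output_pdf])
    return commands


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling run_all(build_commands(['scan1.jpg', 'scan2.jpg'], 'document.pdf')) would convert each scan and then merge the pages. Building the argument lists separately from running them makes it easy to print and inspect the commands first.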

Wednesday, January 18, 2012

Naruto downloader from mangareader.net

After reading Naruto up to chapter 520 (courtesy of naruto with elisp), I was eager to read the rest. As of today the latest chapter is 570. I found mangareader.net after a bit of googling and was busy reading my way through the chapters. But, like before, reading in the browser was not to my taste. Currently mcomix is my comic reader of choice. So, I set about writing a script to automatically download the remaining chapters from mangareader.net. I do not know if it's wrong to do so; the site does not have any terms of use :|
import re
from urllib2 import urlopen  # Python 2 module
from zipfile import ZipFile, ZIP_DEFLATED
from xml.dom.minidom import parseString


def get_info(line, alt_regex):
    # Pull the next-page link, image URL, chapter and page number
    # out of the <a href=...><img .../></a> snippet on this line.
    try:
        line = line[:line.index('</a>')] + '</a>'
        line = line[line.index('<a href'):]
        dom = parseString(line)
        info = {}
        a = dom.getElementsByTagName('a')[0]
        info['next'] = a.getAttribute('href')
        img = a.getElementsByTagName('img')[0]
        info['img_url'] = img.getAttribute('src')
        info['img_ext'] = info['img_url'][info['img_url'].rindex('.') + 1:]
        alt = img.getAttribute('alt')
        m = re.search(alt_regex, alt)
        info['chapter'] = int(m.group(1))
        info['page'] = int(m.group(2))
        dom.unlink()
        return info
    except Exception as e:
        print('[ERROR] %s' % line)
        print('[ERROR] %s' % e)  # format the exception; str + Exception raises TypeError


def get_image(url):
    # Download the raw image bytes.
    try:
        f = urlopen(url)
        b = f.read()
        f.close()
        return b
    except Exception as e:
        print('[ERROR] %s' % e)


def get_chapter(url_prefix, url_suffix, title, chapter=1):
    need_more = True
    alt_regex = re.compile(r'%s (\d+) - Page (\d+)' % title)
    cbz = ZipFile('%03d.cbz' % chapter, "w", ZIP_DEFLATED)
    url = '%s%s' % (url_prefix, url_suffix)
    try:
        while need_more:
            f = urlopen(url)
            lines = f.readlines()
            f.close()
            # the line holding the page image carries id="img"
            line = filter(lambda x: x.find('id="img"') != -1, lines)[0]
            info = get_info(line, alt_regex)
            need_more = info['chapter'] == chapter
            if need_more:
                cbz.writestr('%02d.%s' % (info['page'], info['img_ext']), get_image(info['img_url']))
                url = '%s%s' % (url_prefix, info['next'])
            else:
                # new chapter: close this archive and open the next one;
                # url is unchanged, so this page is fetched again and stored there
                cbz.close()
                chapter = info['chapter']
                need_more = True
                cbz = ZipFile('%03d.cbz' % chapter, "w", ZIP_DEFLATED)
    except IndexError:
        pass  # image not found, so there are no more pages
    finally:
        cbz.close()  # make sure the last archive is written out


# pass the starting chapter so that no empty 001.cbz is created first
get_chapter('http://www.mangareader.net', '/naruto/521', 'Naruto', 521)
So, I let it rip and now I'm busy reading... :)
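The two pieces of the script that do not need the network are the alt-text parsing and the .cbz packaging, so they can be tried offline. The following is only a sketch in Python 3, not the script above: parse_alt and build_cbz are hypothetical helper names, the alt-text format "Naruto 521 - Page 3" is assumed from the regex in the script, and re.escape is added so that titles containing regex metacharacters still match.

```python
import io
import re
from zipfile import ZipFile, ZIP_DEFLATED


def parse_alt(title, alt):
    """Return (chapter, page) from alt text like 'Naruto 521 - Page 3', else None."""
    m = re.search(r'%s (\d+) - Page (\d+)' % re.escape(title), alt)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def build_cbz(pages):
    """Pack {(page_number, extension): image_bytes} into an in-memory .cbz.

    A .cbz is an ordinary zip archive; zero-padded names keep pages in order.
    """
    buf = io.BytesIO()
    with ZipFile(buf, 'w', ZIP_DEFLATED) as cbz:
        for (page, ext), data in sorted(pages.items()):
            cbz.writestr('%02d.%s' % (page, ext), data)
    return buf.getvalue()
```

Returning None for unexpected alt text lets the caller decide what to do, instead of failing on m.group when the regex does not match.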

Friday, January 13, 2012

Project Euler Problem 1 functionally

I'm trying to learn a bit of elisp (see Luhns Algorithm in elisp). So, to try out my elisp some more, I had a go at Project Euler's simplest problem, Problem 1. I cracked open emacs and tried out a few things, gave up and went into hibernation. A few weeks later I remembered this problem and had a go at it again. I thought I had nailed it...
(defun num_check (x) (if (or (eq (mod x 3) 0) (eq (mod x 5) 0)) x 0))
(defun sum(x) (if (eq x 3) 3 (+ (num_check x) (sum (- x 1)))))
(sum 3)
3
(sum 5)
8
(sum 10)
33
(sum 1000)
Debugger entered--Lisp error: (error "Lisp nesting exceeds `max-lisp-eval-depth'")
After a bit of googling I found out that elisp only supports about 300 recursive calls (the max-lisp-eval-depth limit named in the error) :(. So, I decided to take the plunge into clisp and started downloading it. Meanwhile, I had an old version of Scala available and tried this:
Welcome to Scala version 2.9.0.1 (Java HotSpot(TM) Server VM, Java 1.6.0_20).
Type in expressions to have them evaluated.
Type :help for more information.
scala> def num_check(n: Int):Int = if ( (n % 3) == 0 || (n % 5) == 0 ) n else 0
num_check: (n: Int)Int
scala> def do_sum(x: Int):Int = if ( x == 3 ) 3 else num_check(x) + do_sum(x - 1)
do_sum: (x: Int)Int
scala> do_sum(10)
res0: Int = 33
scala> do_sum(100)
res1: Int = 2418
scala> do_sum(1000)
res2: Int = 234168
res2: Int = 234168
Wow, that went well, and by the time I had googled the Scala syntax etc., clisp had finished downloading, so here goes: https://gist.github.com/fc-unleashed/86ecd9aee76dcd8ec1decb3f15b84d6d
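The recursion-depth limit that stopped the elisp version can also be sidestepped by not recursing at all. The sketch below is in Python 3 rather than any of the languages used above, and the function names are hypothetical. It shows a loop-based sum and a closed form built from arithmetic series with inclusion-exclusion (multiples of 15 are counted once, not twice). Both treat the limit as inclusive, the same way sum and do_sum do. Note that Project Euler's Problem 1 asks for the multiples below 1000, so the number to submit corresponds to a limit of 999.

```python
def sum_multiples_upto(limit):
    """Sum of all multiples of 3 or 5 in 1..limit (inclusive), using a plain loop."""
    return sum(n for n in range(1, limit + 1) if n % 3 == 0 or n % 5 == 0)


def sum_multiples_closed_form(limit):
    """Same sum without iterating over every number."""
    def series(k):
        # k + 2k + ... + count*k = k * count * (count + 1) / 2
        count = limit // k
        return k * count * (count + 1) // 2

    # Multiples of 15 appear in both series, so subtract them once.
    return series(3) + series(5) - series(15)
```

For example, sum_multiples_upto(1000) gives 234168, matching the Scala session above, while sum_multiples_upto(999) gives 233168, the sum of the multiples below 1000.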