I was recently faced with the task of scanning a lot of documents and wanted to preserve them as multi-page PDFs. Converting the raw JPEGs to PDFs proved quite easy using ImageMagick's convert, which I found via http://stackoverflow.com/questions/8955425/how-can-i-convert-a-series-of-images-to-a-pdf-from-the-command-line-on-linux. However, the PDFs were really large when I used convert to create the multi-page documents. A little bit of googling turned up pdfunite, which was perfect: I was able to create reasonably sized multi-page PDFs from a set of single-page PDFs :)
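The two steps above (one PDF per scan, then a merge) might be scripted roughly like this. It is only a sketch, assuming ImageMagick's `convert` and poppler's `pdfunite` are on the PATH; the function names and the `run` parameter are made up for illustration.

```python
import subprocess


def convert_command(jpeg_path, pdf_path):
    """Argument list for turning one scanned JPEG into a single-page PDF."""
    return ["convert", jpeg_path, pdf_path]


def pdfunite_command(page_pdfs, output_pdf):
    """Argument list for merging single-page PDFs; pdfunite takes the output file last."""
    return ["pdfunite"] + list(page_pdfs) + [output_pdf]


def scans_to_pdf(jpeg_paths, output_pdf, run=subprocess.check_call):
    """Convert each JPEG to its own PDF, then merge them in the given order."""
    page_pdfs = []
    for jpeg_path in jpeg_paths:
        # scan01.jpg -> scan01.pdf (hypothetical naming scheme)
        pdf_path = jpeg_path.rsplit(".", 1)[0] + ".pdf"
        run(convert_command(jpeg_path, pdf_path))
        page_pdfs.append(pdf_path)
    run(pdfunite_command(page_pdfs, output_pdf))
    return page_pdfs
```

Passing a different `run` callable (for example `print`) shows the commands without executing them, which is handy for a dry run.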
After reading Naruto up to chapter 520 (courtesy naruto with elisp) I was eager to read the rest. As of today the latest chapter is 570. I found mangareader.net after a bit of googling and was busy reading my way through the chapters. But, like before, reading in the browser was not to my taste. Currently mcomix is my comic reader of choice. So, I set about writing a script to automatically download the remaining chapters from mangareader.net. I do not know if it's wrong to do so; the site does not have any terms of use :|
import re
from urllib2 import urlopen  # Python 2; in Python 3 this lives in urllib.request
from zipfile import ZipFile, ZIP_DEFLATED
from xml.dom.minidom import parseString


def get_info(line, alt_regex):
    """Parse the <a href=...><img .../></a> snippet that holds the page image."""
    try:
        # keep only the anchor element so that minidom can parse it
        line = line[:line.index('</a>')] + '</a>'
        line = line[line.index('<a href'):]
        dom = parseString(line)
        info = {}
        a = dom.getElementsByTagName('a')[0]
        info['next'] = a.getAttribute('href')
        img = a.getElementsByTagName('img')[0]
        info['img_url'] = img.getAttribute('src')
        info['img_ext'] = info['img_url'][info['img_url'].rindex('.') + 1:]
        alt = img.getAttribute('alt')
        m = re.search(alt_regex, alt)
        info['chapter'] = int(m.group(1))
        info['page'] = int(m.group(2))
        dom.unlink()
        return info
    except Exception as e:
        print('[ERROR] %s' % line)
        print('[ERROR] %s' % e)
        return None


def get_image(url):
    """Download the image bytes, or return None on failure."""
    try:
        f = urlopen(url)
        b = f.read()
        f.close()
        return b
    except Exception as e:
        print('[ERROR] %s' % e)
        return None


def get_chapter(url_prefix, url_suffix, title, chapter=1):
    alt_regex = re.compile(r'%s (\d+) - Page (\d+)' % title)
    cbz = ZipFile('%03d.cbz' % chapter, "w", ZIP_DEFLATED)
    url = '%s%s' % (url_prefix, url_suffix)
    try:
        while True:
            f = urlopen(url)
            lines = f.readlines()
            f.close()
            # the line carrying id="img" holds the image and the next-page link
            line = [x for x in lines if 'id="img"' in x][0]
            info = get_info(line, alt_regex)
            if info is None:
                break  # the page could not be parsed, stop here
            if info['chapter'] == chapter:
                image = get_image(info['img_url'])
                if image is not None:
                    cbz.writestr('%02d.%s' % (info['page'], info['img_ext']), image)
                url = '%s%s' % (url_prefix, info['next'])
            else:
                # new chapter: start a fresh archive; the same page is
                # fetched again on the next pass and written to it
                cbz.close()
                chapter = info['chapter']
                cbz = ZipFile('%03d.cbz' % chapter, "w", ZIP_DEFLATED)
    except IndexError:
        pass  # image not found so end of chapter
    finally:
        cbz.close()


get_chapter('http://www.mangareader.net', '/naruto/521', 'Naruto')
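The script decides when a chapter ends by reading the chapter and page numbers out of the image's alt text. That one step can be tried in isolation with a small sketch like the one below, written for Python 3. The helper name `parse_alt` and the sample alt string are made up for illustration; only the pattern comes from the script.

```python
import re


def parse_alt(alt, title):
    """Return (chapter, page) from alt text such as 'Naruto 521 - Page 3', or None."""
    pattern = re.compile(r"%s (\d+) - Page (\d+)" % re.escape(title))
    match = pattern.search(alt)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_alt("Naruto 521 - Page 3", "Naruto"))  # (521, 3)
print(parse_alt("something else", "Naruto"))       # None
```

Returning None instead of raising makes it easy for the caller to stop cleanly when the page layout changes.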
I'm trying to learn a bit of elisp (see Luhn's Algorithm in elisp). So, to try out my elisp some more I tried Project Euler's simplest problem, Problem 1. I cracked open emacs and tried out a few things, gave up and went into hibernation. A few weeks later I remembered this problem and had a go at it again. I thought I nailed it...
(defun num_check (x) (if (or (eq (mod x 3) 0) (eq (mod x 5) 0)) x 0))
(defun sum(x) (if (eq x 3) 3 (+ (num_check x) (sum (- x 1)))))
(sum 3)
3
(sum 5)
8
(sum 10)
33
(sum 1000)
Debugger entered--Lisp error: (error "Lisp nesting exceeds `max-lisp-eval-depth'")
After a bit of googling I found out that elisp supports only about 300 nested calls unless max-lisp-eval-depth (the limit named in the error) is raised :(. So, I decided to take the plunge into clisp and started downloading that. Meanwhile, I had an old version of Scala available and tried this
Welcome to Scala version 2.9.0.1 (Java HotSpot(TM) Server VM, Java 1.6.0_20).
Type in expressions to have them evaluated.
Type :help for more information.
scala> def num_check(n: Int):Int = if ( (n % 3) == 0 || (n % 5) == 0 ) n else 0
num_check: (n: Int)Int
scala> def do_sum(x: Int):Int = if ( x == 3 ) 3 else num_check(x) + do_sum(x - 1)
do_sum: (x: Int)Int
scala> do_sum(10)
res0: Int = 33
scala> do_sum(100)
res1: Int = 2418
scala> do_sum(1000)
res2: Int = 234168
Wow, that went well and by the time I had googled to find the syntax for scala etc., clisp was downloaded and here goes
i i i i i i i ooooo o ooooooo ooooo ooooo
I I I I I I I 8 8 8 8 8 o 8 8
I \ `+' / I 8 8 8 8 8 8
\ `-+-' / 8 8 8 ooooo 8oooo
`-__|__-' 8 8 8 8 8
| 8 o 8 8 o 8 8
------+------ ooooo 8oooooo ooo8ooo ooooo 8
Welcome to GNU CLISP 2.44.1 (2008-02-23) <http://clisp.cons.org/>
Copyright (c) Bruno Haible, Michael Stoll 1992, 1993
Copyright (c) Bruno Haible, Marcus Daniels 1994-1997
Copyright (c) Bruno Haible, Pierpaolo Bernardi, Sam Steingold 1998
Copyright (c) Bruno Haible, Sam Steingold 1999-2000
Copyright (c) Sam Steingold, Bruno Haible 2001-2008
Type :h and hit Enter for context help.
[1]> (defun num_check (x) (if (or (eq (mod x 3) 0) (eq (mod x 5) 0)) x 0))
NUM_CHECK
[2]> (defun sum(x) (if (eq x 3) 3 (+ (num_check x) (sum (- x 1)))))
SUM
[3]> (sum 10)
33
[4]> (sum 1000)
234168
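The recursive definitions above can also be written as a plain loop, which sidesteps any nesting limit. The Python sketch below is an illustration, not part of the original experiments, and the name `sum_multiples` is made up. One thing it makes visible: the functions above include the upper bound itself, whereas Problem 1 as published asks for the multiples below 1000, so the two totals differ by exactly 1000.

```python
def sum_multiples(limit, inclusive=True):
    """Sum the numbers from 3 up to limit that are divisible by 3 or 5."""
    stop = limit + 1 if inclusive else limit
    return sum(n for n in range(3, stop) if n % 3 == 0 or n % 5 == 0)


print(sum_multiples(10))                     # 33, same as (sum 10) above
print(sum_multiples(1000))                   # 234168, same as (sum 1000) above
print(sum_multiples(1000, inclusive=False))  # 233168, only the numbers below 1000
```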
I just downloaded a huge Naruto (Manga) torrent which was organized like below
|- Chapter 001 to 010
|---- c001-p01.jpg
|---- c001-p02.jpg
|---- ...
|---- c010-p21.jpg
|---- c010-p22.jpg
|- Chapter 011 to 020
|- ...
|- Chapter 511 to 520
|- ...
I prefer to read my comics in cdisplay/comix, which can read the cbz/cbr formats. I was looking for an easy way of automating the process in elisp (buoyed by my previous success with the Luhn algorithm). I was also itching to use this. My attempt at creating a script in one go was a failure and, since I wanted to do it quickly, I used this
M-: (dotimes (i 52) (insert (format "cd \"Chapter %03d to %03d\"\n(dotimes (i 10) (insert (format \"\\\"C:\\\\Program Files\\\\7-Zip\\\\7z.exe\\\" a -tzip \\\"E:\\\\comics\\\\Naruto\\\\out\\\\%s.cbz\\\" c%s-*\\n\" (+ %d i) (+ %d i))))\ncd ..\n" (1+ (* i 10)) (* (1+ i) 10) "%03d" "%03d" (1+ (* i 10)) (1+ (* i 10)))))
On running which I got this
cd "Chapter 001 to 010"
(dotimes (i 10) (insert (format "\"C:\\Program Files\\7-Zip\\7z.exe\" a -tzip \"E:\\comics\\Naruto\\out\\%03d.cbz\" c%03d-*\n" (+ 1 i) (+ 1 i))))
cd ..
cd "Chapter 011 to 020"
(dotimes (i 10) (insert (format "\"C:\\Program Files\\7-Zip\\7z.exe\" a -tzip \"E:\\comics\\Naruto\\out\\%03d.cbz\" c%03d-*\n" (+ 11 i) (+ 11 i))))
cd ..
cd "Chapter 021 to 030"
(dotimes (i 10) (insert (format "\"C:\\Program Files\\7-Zip\\7z.exe\" a -tzip \"E:\\comics\\Naruto\\out\\%03d.cbz\" c%03d-*\n" (+ 21 i) (+ 21 i))))
cd ..
Then, positioning the caret at the first character of the dotimes line, I recorded the following macro: C-x ( C-k M-: C-y C-j C-k C-n C-n C-x ). This done, I ran the macro with C-x e and held down the e till all occurrences were covered and I got this.
cd "Chapter 001 to 010"
"C:\Program Files\7-Zip\7z.exe" a -tzip "E:\comics\Naruto\out\001.cbz" c001-*
"C:\Program Files\7-Zip\7z.exe" a -tzip "E:\comics\Naruto\out\002.cbz" c002-*
...
"C:\Program Files\7-Zip\7z.exe" a -tzip "E:\comics\Naruto\out\010.cbz" c010-*
cd ..
...
I saved it as a bat file, let it loose, and ended up with a neat little directory containing all the cbz files :). I'm sure there are cleverer ways to accomplish this but this was the easiest approach that I could think of (after a few failed attempts at nested inserts and let).
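For comparison, the same packing job can be sketched without 7-Zip or a generated bat file. This is only an illustration in Python 3, assuming the `cNNN-pNN.jpg` naming shown in the listing above; the function name and its parameters are hypothetical.

```python
import os
import re
import zipfile

PAGE_NAME = re.compile(r"^c(\d{3})-p\d+\.jpg$")  # e.g. c001-p01.jpg


def pack_chapters(source_root, out_dir):
    """Write one NNN.cbz per chapter found under source_root; return the names written."""
    pages_by_chapter = {}
    for folder, _subdirs, files in os.walk(source_root):
        for name in files:
            match = PAGE_NAME.match(name)
            if match:
                path = os.path.join(folder, name)
                pages_by_chapter.setdefault(match.group(1), []).append(path)
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for chapter in sorted(pages_by_chapter):
        cbz_name = chapter + ".cbz"
        with zipfile.ZipFile(os.path.join(out_dir, cbz_name), "w") as cbz:
            # store pages in name order so readers show them in sequence
            for page in sorted(pages_by_chapter[chapter]):
                cbz.write(page, arcname=os.path.basename(page))
        written.append(cbz_name)
    return written
```

Grouping by the chapter number in the file name means the "Chapter 001 to 010" folder names never have to be generated at all.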
Recently I was working on a project that had a lot of sysouts (System.out.println) peppered all over the code. I tried the usual sed approach, find . -name '*.java' | xargs sed -i '/System.out.println/d', and that didn't work out too well because of multi-line sysouts. So, I decided to try the AST approach using the Java parser from http://code.google.com/p/javaparser/. So, create a maven project and add the javaparser repository and dependency to the pom.xml as given below
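The trouble with the line-based sed command can be seen on a small made-up Java snippet. The Python sketch below only mimics what `sed '/pattern/d'` does (it uses a plain substring test rather than a regular expression) and is not the javaparser solution; the snippet and the function name are invented for illustration.

```python
JAVA_SNIPPET = (
    "int x = 1;\n"
    'System.out.println("x is " +\n'
    "    x);\n"
    "x++;\n"
)


def delete_matching_lines(source, pattern="System.out.println"):
    """Drop every line that contains the pattern, the way sed '/pattern/d' would."""
    kept = [line for line in source.splitlines(True) if pattern not in line]
    return "".join(kept)


print(delete_matching_lines(JAVA_SNIPPET))
# int x = 1;
#     x);
# x++;
```

The continuation line `x);` survives, so the file no longer compiles. A parser-based approach avoids this because it removes the whole statement regardless of how it is wrapped.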
I was looking into the checksum validation of credit card numbers and came across the Luhn Algorithm. I had emacs open and on a whim decided to try and implement it using elisp. I'm no lisp programmer but have in the past managed to write some basic elisp in .emacs, so I guessed it would take me about half an hour at most.
I guessed wrong. It took me quite a while to wrap my head around even some of the simple elisp constructs like let and lambda, and the whole thing took a lot longer than I anticipated. Here, I present to you the fruit of my labors :)
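For reference, the Luhn check itself fits in a few lines. The sketch below is in Python and is an illustration of the algorithm only, not the elisp version mentioned above; the function name is made up, and the sample number is the commonly quoted test value 79927398713.

```python
def luhn_valid(number):
    """Return True if the digits in the string pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # walk from the rightmost digit, doubling every second one
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("79927398713"))  # True
print(luhn_valid("79927398710"))  # False
```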
I've been using mercurial fairly regularly for most of my hobby projects. Subversion happens to be the VCS of choice at work. Recently I wanted to do some major refactoring in a work-related project and, instead of the usual approach of creating a branch in subversion, I decided to try a new approach: using mercurial and subversion together :). Branching in mercurial is vastly easier than in subversion and the ability to quickly see a graphical branch tree using hg serve is simply priceless. With minimal hgignore and svn:ignore changes, mercurial and subversion can be made to mutually ignore each other :D.
Here's a simple example
$ mvn archetype:create -DgroupId=test -DartifactId=test -Dversion=1.0 -Dpackaging=jar -Dpackage=test
[INFO] Scanning for projects...
[INFO] Searching repository for plugin with prefix: 'archetype'.
[INFO] ------------------------------------------------------------------------
[INFO] Building Maven Default Project
[INFO] task-segment: [archetype:create] (aggregator-style)
[INFO] ------------------------------------------------------------------------
[INFO] Setting property: classpath.resource.loader.class => 'org.codehaus.plexus.velocity.ContextClassLoaderResourceLoader'.
[INFO] Setting property: velocimacro.messages.on => 'false'.
[INFO] Setting property: resource.loader => 'classpath'.
[INFO] Setting property: resource.manager.logwhenfound => 'false'.
[INFO] [archetype:create {execution: default-cli}]
[WARNING] This goal is deprecated. Please use mvn archetype:generate instead
[INFO] Defaulting package to group ID: test
[INFO] ----------------------------------------------------------------------------
[INFO] Using following parameters for creating OldArchetype: maven-archetype-quickstart:RELEASE
[INFO] ----------------------------------------------------------------------------
[INFO] Parameter: groupId, Value: test
[INFO] Parameter: packageName, Value: test
[INFO] Parameter: package, Value: test
[INFO] Parameter: artifactId, Value: test
[INFO] Parameter: basedir, Value: f:\shyam\emacs-23\bin
[INFO] Parameter: version, Value: 1.0
[INFO] ********************* End of debug info from resources from generated POM ***********************
[INFO] OldArchetype created in dir: f:\shyam\emacs-23\bin\test
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2 seconds
[INFO] Finished at: Thu Feb 24 03:15:08 IST 2011
[INFO] Final Memory: 8M/20M
[INFO] ------------------------------------------------------------------------
$ svn add test
A test/pom.xml
A test/src
A test/src/main
A test/src/main/java
A test/src/main/java/test
A test/src/main/java/test/App.java
A test/src/test
A test/src/test/java
A test/src/test/java/test
A test/src/test/java/test/AppTest.java
$ svn commit -m "adding test"
Adding test/pom.xml
Adding test/src
Adding test/src/main
Adding test/src/main/java
Adding test/src/main/java/test
Adding test/src/main/java/test/App.java
Adding test/src/test
Adding test/src/test/java
Adding test/src/test/java/test
Adding test/src/test/java/test/AppTest.java
Transmitting file data .
Committed revision 34.
$ cd test
$ svn propset svn:ignore '.hg
.hgignore
.project
.classpath
.settings
target' .
property 'svn:ignore' set on .
$ svn commit -m "setting svn:ignore"
Sending .
Committed revision 34.
$ hg init
$ cat > .hgignore
syntax: regexp
^target
^\.project
^\.classpath
^\.settings
syntax: glob
.svn
^D
$ hg add
adding .hgignore
adding pom.xml
adding src/main/java/test/App.java
adding src/test/java/test/AppTest.java
$ hg commit -m "initial import"
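Since the two ignore lists mostly mirror each other, they could be generated from one shared list. The Python sketch below is an illustration under a simplifying assumption: it writes every .hgignore entry in glob syntax (which matches the name at any depth) instead of the mixed regexp/glob file typed above. The function and constant names are made up.

```python
SHARED_IGNORES = [".project", ".classpath", ".settings", "target"]


def svn_ignore_value(shared=SHARED_IGNORES):
    """Value for `svn propset svn:ignore`: Mercurial's own files plus the shared entries."""
    return "\n".join([".hg", ".hgignore"] + list(shared))


def hgignore_text(shared=SHARED_IGNORES):
    """Contents for .hgignore: the shared entries as glob patterns, plus .svn."""
    return "\n".join(["syntax: glob"] + list(shared) + [".svn"]) + "\n"


print(svn_ignore_value())
print(hgignore_text())
```

Each tool ignores the other's metadata (.hg and .hgignore on the subversion side, .svn on the mercurial side) while both skip the IDE files and the build output.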