Friday, June 23, 2006

comp 491: report | ek$iDump

ek$iDump

The Problem:
To build a portable application that acquires Ekşi Sözlük entries and stores them in a database.

Design:
Although the final product of the project will be a graph depicting connections between Ekşi Sözlük titles, those connections arise from the content under the titles, that is, the entries. This is one reason why ek$iDump acquires entries rather than titles. Another, and perhaps the main, reason is that entries have unique integer IDs, which allow them to be acquired one by one in a for-loop or a while-loop.

Because Ekşi Sözlük entries can be enumerated by ID, ek$iDump is about as complex as a “Hello World” program. It is a console application that takes two integers as input: the starting ID and the terminal ID. In a while-loop that begins at the starting ID, if the entry with the current ID exists, the program gets it from Ekşi Sözlük by calling Entry.GetFromEksi(ID) and writes the details of the acquired Entry to the database.

The database of choice is a Microsoft Access file, because using it does not require setting up a server. Even without Microsoft Access installed, one can obtain and install a package named Office 2003 Redistributable Primary Interop Assemblies and get on with using the database. Also, the data accumulated in the database is easily exportable to Microsoft SQL Server, which the other two parts of the project, ek$iEdgeDump and ek$iVista, use. The move from Microsoft Access to a full-scale RDBMS has to be made at some point, as Microsoft Access imposes a size limit of 2 GB on a database file. Note that the size of the database file generated in the course of the project, although by no means final, exceeds 5 GB. The figure below shows ek$iDump in action:


Fig. 4: ek$iDump in action, dumping entries
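
The report does not include ek$iDump's source, so the fetch-and-store loop described above is only sketched here, in Python, under several assumptions. The `fetch` callable stands in for Entry.GetFromEksi(ID); the fields of `Entry` are hypothetical; SQLite replaces the Microsoft Access file so that the example runs without extra setup; and the terminal ID is assumed to be inclusive. `fake_fetch` is an offline placeholder, not a real Ekşi Sözlük client.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Entry:
    """Hypothetical entry shape; the report does not list the real fields."""
    entry_id: int
    title: str
    author: str
    body: str


def dump_entries(
    conn: sqlite3.Connection,
    fetch: Callable[[int], Optional[Entry]],
    start_id: int,
    end_id: int,
) -> int:
    """Fetch entries start_id..end_id (inclusive, by assumption) and store
    the ones that exist. Returns the number of entries written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        "entry_id INTEGER PRIMARY KEY, title TEXT, author TEXT, body TEXT)"
    )
    written = 0
    current = start_id
    while current <= end_id:
        entry = fetch(current)  # stand-in for Entry.GetFromEksi(ID)
        if entry is not None:   # IDs with no entry are simply skipped
            conn.execute(
                "INSERT OR REPLACE INTO entries VALUES (?, ?, ?, ?)",
                (entry.entry_id, entry.title, entry.author, entry.body),
            )
            written += 1
        current += 1
    conn.commit()
    return written


def fake_fetch(entry_id: int) -> Optional[Entry]:
    """Offline placeholder for the network call: pretends even IDs do not exist."""
    if entry_id % 2 == 0:
        return None
    return Entry(entry_id, f"title {entry_id}", "someone", f"body of entry {entry_id}")


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    count = dump_entries(connection, fake_fetch, 1, 10)
    print(f"stored {count} entries")
```

Passing the fetch function in as a parameter keeps the loop itself testable without network access; a real run would supply a function that downloads and parses the entry page instead of `fake_fetch`.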
