Commit Graph

16 Commits

Author SHA1 Message Date
Julien Lengrand-Lambert
01e128f494 Adds a new column to tweets in the db: invalid
Will be used to mark problematic tweets as crawled so they can be corrected later on. This avoids having to process them over and over.

I might have to think about a better way to perform database updates if I do it often.
2013-01-10 10:35:41 +01:00
Julien Lengrand-Lambert
c3bbb84edd Tests whether a manual flush is useful.
Apparently not!

Keep that in mind, though.
2013-01-04 20:40:01 +01:00
Julien Lengrand-Lambert
4cebc33e50 Works on the Unicode problem.
No Unicode errors are displayed anymore.
Let's search for exceptions now.
2013-01-04 20:18:31 +01:00
Julien Lengrand-Lambert
b604a12bc3 All current tweets have been processed.
Sets counter to 1 when a new Member is created.

Unicode problems temporarily solved, but some work remains to be done there.

Also need to create a way to store problematic tweets and try to correct them later (and search for patterns).
2013-01-04 15:42:24 +01:00
Julien Lengrand-Lambert
3c1fceb82c Starts handling exceptions in tweets.
Creates a dedicated exception for it.

Problems printing Unicode!
2013-01-04 15:33:30 +01:00
Julien Lengrand-Lambert
ab2351afb6 Finishes handling of creation and update of members.
Non-crawled tweets can now be processed in a single run.

Next step is to handle all exceptions correctly.
2013-01-04 15:12:39 +01:00
Julien Lengrand-Lambert
edd02a4c13 Starts implementing the create method in counter.
Aims to create a new Member when the hashtag/author pair is not found.
2013-01-04 13:09:31 +01:00
e8b768290e Starts creating the counter thread, which will create author/hashtag pairs.
Creates the ORM mapping for Member.

Next step: connect to the db and crawl the tweets db.
2012-12-20 17:39:54 +01:00
3c2aceedf1 Hashtags now keep their #, and duplicates are removed.
Main hashtag detection is complete.

Let's save to the db correctly now.
2012-12-20 16:59:02 +01:00
10a79e1999 Adds main hashtag extraction.
Later on, we may want to keep the # in the string.
2012-12-20 16:15:29 +01:00
4085cd582c Removes the part about unicode errors.
I don't want any in my code!
2012-12-20 16:05:39 +01:00
fe379f536d First version of StreamSaverListener, to be tested.
Tweets should soon be saved in the db.
2012-12-18 15:57:09 +01:00
e4def40982 Moves the streamer and authentication into a separate file.
The script now has its own file.

Adds .pyc to the list of ignored files.
2012-12-18 15:25:34 +01:00
fbe4b6c801 Finishes the first definition of the tweet element in the database.
2012-12-18 15:17:41 +01:00
691ccea17d Starts creating the data model for tweets.
2012-12-17 18:37:32 +01:00
8c487e337c Updates the name of the project and corrects the folder structure.
Code, tests, and config are now properly separated.
2012-12-17 18:00:36 +01:00