Julien Lengrand-Lambert
01e128f494
Adds a new column to tweets in db : invalid
...
Will be used to mark problematic tweets as crawled and try to correct them later on. This avoids having to process them over and over.
I might have to think about a better way to perform database updates if I do it often.
2013-01-10 10:35:41 +01:00
Julien Lengrand-Lambert
c3bbb84edd
Tests to see if manual flush is useful.
...
Apparently not!
Keep that in mind though.
2013-01-04 20:40:01 +01:00
Julien Lengrand-Lambert
4cebc33e50
Works on Unicode problem.
...
No unicode error displayed anymore.
Let's search for Exception now.
2013-01-04 20:18:31 +01:00
Julien Lengrand-Lambert
b604a12bc3
All current tweets have been processed.
...
Sets counter to 1 when new Member is created
Unicode problems temporarily solved, but some work to be done there.
Also have to create something to store problematic tweets and try to correct them later (and search for patterns).
2013-01-04 15:42:24 +01:00
Julien Lengrand-Lambert
3c1fceb82c
Starts taking care of exceptions in tweets.
...
Creates dedicated exception for it.
Problems printing unicode!!
2013-01-04 15:33:30 +01:00
Julien Lengrand-Lambert
ab2351afb6
Finishes handling of creation and update of members.
...
Non-crawled tweets can now be processed in a row.
Next step is to take care of all Exceptions correctly.
2013-01-04 15:12:39 +01:00
Julien Lengrand-Lambert
edd02a4c13
Starts implementing create method in counter.
...
Aims at creating a new Member when the couple hashtag/author is not found.
2013-01-04 13:09:31 +01:00
e8b768290e
Starts creating the counter thread, that will create author/hashtag couples
...
Creates the ORM for member.
Next step : Connect to db and crawl tweets db
2012-12-20 17:39:54 +01:00
3c2aceedf1
Hashtags now keep their #, and duplicates are removed.
...
Main hashtag detection is complete.
Let's save in db correctly now.
2012-12-20 16:59:02 +01:00
10a79e1999
Adds main hashtag extraction.
...
Later on we may want to keep # in the string
2012-12-20 16:15:29 +01:00
4085cd582c
Removes the part about unicode errors.
...
I don't want any in my code!
2012-12-20 16:05:39 +01:00
fe379f536d
First version of StreamSaverListener to be tested.
...
Tweets should soon be saved in db
2012-12-18 15:57:09 +01:00
e4def40982
Places streamer and authentication in a separate file.
...
The script now has its own file.
Adds .pyc to list of files to be ignored
2012-12-18 15:25:34 +01:00
fbe4b6c801
Finishes first definition of tweet element in database
2012-12-18 15:17:41 +01:00
691ccea17d
Starts creating datamodel for tweets
2012-12-17 18:37:32 +01:00
8c487e337c
Updates name of the project and corrects folder structure.
...
Code, tests and config are now well separated
2012-12-17 18:00:36 +01:00