Data Mining Scenarios and MySQL

Today we continued to look into how to store the data we will be mining. We looked at our schema and decided it would be a good idea to abstract out our “visit” entity. A visit is going to be a series of hits (page views). They will be related to each other because they’re from the same visitor. We can link them together and to a user either by using the user’s login session or with cookies if they aren’t logged in. Instead of worrying at this phase about what constitutes a visit (Is it when they haven’t returned for another hit in over a half hour? Is it that a visit ends when they close their browser window?), we can just calculate the visits later on with parameters. We can link the hits together with these parameters to present the visits given those parameters. This may for example be an SQL view instead of the results of querying an SQL table.

After learning that the client would prefer we not use Postgres and instead use MySQL or SQL Server, we decided to benchmark MySQL, specifically how quickly it can read JSON data types. It turns out modern MySQL also supports JSON as a native type. It can even do indexes on JSON columns to speed up read queries. Coupled with the fact it is also free and open source, we’re mainly looking at it at this point for our back end. Given the log data we got from the client, where we learned how many unique visitors they get per month, we ran some scenarios to find out how many queries per second our system would need to handle. We’re still well within doable range. We’re confident we’ll be able to get a system ready that’s capable of handling the queries, with room to grow for the future.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s