Monday, February 27, 2012

Handling large amounts of data for reporting?

Hi all,
I have an application which logs a considerable amount of data. Each day,
we log about 50,000 to 100,000 rows of data.
We like to report on this data... currently I'm using a stored procedure to
calculate the statistics, however since this is an ad hoc, reports take a
while to generate.
So how do you guys handle large amounts of data? Is there a good way to
precalculate a a set of statistics to handle ad hoc queries (i.e. By Hour,
By Day, By W, By Month). Our application also provides near realtime
statistics... so precalculation has to be done on a continual basis. Does
.NET have any statistics classes that might help out with this sort of
thing? I don't think .NET performance counters will work since they don't
log persistent data.
Any ideas?
Thanks!
Lucas Tam (REMOVEnntp@.rogers.com)
Please delete "REMOVE" from the e-mail address when replying.
[url]http://members.ebay.com/aboutme/spot18/[/url]> So how do you guys handle large amounts of data?
Usually, you can run queries in the background that will aggregate to the
various dimensions you want to measure (e.g. ).
In one current system, we aggregate data by hour, then when someone wants to
do a "this w" or "this month" type of query, we simply union the
aggregated data (super fast) from before the last hour where stats were
rolled up, against the raw data (which is also relatively fast, using the
same kind of query that would "roll it up") and then apply group by to the
derived table. This way the majority of the crunching is already done, and
you are applying operations on raw data to only the last hour or less.
You might also look at Analysis Services or other BI products, but I think
the lag of rolling data into cubes/dimensions will fail your realtime
requirement.|||Do you need to report against a big table that grows 100K rows daily or
do you need to report against a table that has 100K rows?
100,000 rows of data is usually NOT a large amount on decent hardware.|||"AK" <AK_TIREDOFSPAM@.hotmail.COM> wrote in news:1125092267.248510.101290
@.g49g2000cwa.googlegroups.com:

> Do you need to report against a big table that grows 100K rows daily or
> do you need to report against a table that has 100K rows?
> 100,000 rows of data is usually NOT a large amount on decent hardware.
I need to report against a table that grows by ~100K row daily.
Lucas Tam (REMOVEnntp@.rogers.com)
Please delete "REMOVE" from the e-mail address when replying.
[url]http://members.ebay.com/aboutme/spot18/[/url]|||You may also want to explore partitioned views depending on the types
of queries you're looking at; we partion our data daily (about 800k
rows a day), and the optimizer knows wehre to look for the data we want
to retrieve.
Stu|||Summarized datasets, federated views, and OLAP cubes some of the basic
components, but the general solution is called data warehousing. Basically,
the concept is to export a pre-calculated and pre-summarized subset of data
in the OLTP system another database or server for the purpose of reporting
or analysis. The goal is to design a system that is optimized to suit your
own specific analytic needs, so the appropriate implementation details can
very from one warehouse to the next.
This document provides a good overview:
http://userfs.cec.wustl.edu/~cse530...ng-Combined.ppt
"Lucas Tam" <REMOVEnntp@.rogers.com> wrote in message
news:Xns96BEAE3D2946Bnntprogerscom@.127.0.0.1...
> Hi all,
>
> I have an application which logs a considerable amount of data. Each day,
> we log about 50,000 to 100,000 rows of data.
> We like to report on this data... currently I'm using a stored procedure
> to
> calculate the statistics, however since this is an ad hoc, reports take a
> while to generate.
> So how do you guys handle large amounts of data? Is there a good way to
> precalculate a a set of statistics to handle ad hoc queries (i.e. By Hour,
> By Day, By W, By Month). Our application also provides near realtime
> statistics... so precalculation has to be done on a continual basis. Does
> .NET have any statistics classes that might help out with this sort of
> thing? I don't think .NET performance counters will work since they don't
> log persistent data.
> Any ideas?
> Thanks!
> --
> Lucas Tam (REMOVEnntp@.rogers.com)
> Please delete "REMOVE" from the e-mail address when replying.
> [url]http://members.ebay.com/aboutme/spot18/[/url]|||"JT" <someone@.microsoft.com> wrote in
news:Oq9egaKrFHA.3452@.TK2MSFTNGP14.phx.gbl:

> The goal is to design a system that is optimized to suit your
> own specific analytic needs, so the appropriate implementation details
> can very from one warehouse to the next.
Hmmm... that's the problem.
Alot of the data our application generates is being used by external
clients, and they all have differing reporting requirements.
Not to mention, management seems to change the way queries are calculated
every so often.
Any ideas?
Would the Star/Snowflake structures be my best solution?
Thanks!
Lucas Tam (REMOVEnntp@.rogers.com)
Please delete "REMOVE" from the e-mail address when replying.
Newmarket Volvo Sucks! http://newmarketvolvo.tripod.com

No comments:

Post a Comment