Handle Error: Have 100+ columns, or reduce it down to about 3?

Wednesday, March 21, 2012

Have 100+ columns, or reduce it down to about 3?

Basically, what I'm doing is storing answers to questions in a survey. I have two ways I can organize the table:

1. Just having a table with lots of columns - one for each question

2. A table with only about maybe 3 columns:

SurveyID
QuestionID
QuestionAnswer

In this second case, the the primary key would be both SurveyID and QuestionID combined, of course.

I don't fully know the pros and cons of the two approaches, and both look like they would work. Right now, I'm using option 1, but I keep wondering if option 2 might be better. Whenever I change the questions, I currently have to drop and recreate the table (altering it is too much effort), and I know option 2 would be a way of avoiding that. By the way, the questions themselves are stored in an xml file, if it means anything. Anyhow, once the survey is being used, there shouldn't be any further changing of questions. And there's also just too much I don't know about (how is performance affected, for example?).

Any ideas which is better and why?if you're currently doing it like option 1, then presumably you are able to write some sql for it?

okay, a couple of sample problems, may we see your sql please

1) which surveys had more than half the questions answered correctly
2) which surveys had the same number of answers as survey 23|||None of the surveys have been answered already. I'm still writing the code that will store the answers given. The questions themselves aren't final yet, but the xml file is getting constantly updated as we decided what to ask and not to ask. So when the xml file is changed, I generate a new (rather large) Create Table statement, drop the old table, and create the new one so things can be tested out. It's just extra work while developing the survey.

When a new survey is started, a new record is created for it right from the beginning, with all columns set to null. As questions get answered, values get inserted. This way, it's possible to come back later for whatever reason.

I don't see how posting the entire Create Table statement helps anything. The questions' answers are stored as bits, ints, reals, datetimes, or varchars, depending on the kinds of questions. Unanswered questions remain null, of course. It looks like there are about 160 columns in the table right now that are specifically for answers to the questions.

Really, my question simply is am I going to be ok with having a large number of columns in the table, or am I better off using what I mentioned above as "option 2", reducing the number of columns in the table.|||Think about R937's question a little more. Suppose you inherited a system like your option 1. Further, suppose you were given requirements to write reports based on R937's questions. How would you go about it?

Design questions need to take into account how you store the data AND how you get it back out. Unless of course, you are designing a black hole, in which case, it does not matter how to get data out.|||didn't want to see the CREATE TABLE statement

wanted you to think about the SQL that you would have to write against the humungous table to get meaningful stats out of it

my advice: use whichever structure you feel most comfortab le with|||oh...so in your opinion, it doesn't really matter? I was wondering if maybe one way was technically superior and/or more efficient for some reason that I may not have already seen.|||oh...so in your opinion, it doesn't really matter? no, i didn't really say that :)

i think 1NF might be applicable here

actually, i would almost always implement option 2

but since you were asking for advice, and since you seem to have settled on option 1 already, i figured it was important to let you know that being comfortable with a technique also has some importance|||Whenever I change the questions, I currently have to drop and recreate the table (altering it is too much effort)Learn ALTER TABLE syntax - it is substantially less effort than the method you describe - especially once you have data in your tables.|||You have to go option 2. There is no other way...Don't even consider option 1 - total train wreck in the making.

(sorry to diffuse the suspense created by r937 but it was driving me nuts!)|||my current project is both 1 and 2 because we are processing gigs of claims every week. 1 for OLAP and 2 for OLTP. So my answer is both.sql

Wednesday, March 21, 2012

Have 100+ columns, or reduce it down to about 3?

No comments:

Post a Comment

Handle Error

Blog Archive

About Me