Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
722 views
in Technique[技术] by (71.8m points)

database design - is my data normalized?

I'm making a simple quiz database, where a question has an answer and one or more image files, and belong to a subtopic which in turn belongs to a topic. Also each group may belong to one of three levels.

This is how I set up my database:

QUESTION
-------------------
question_id      pk
question    varchar
answer      varchar
subtopic_id foreign_key

MEDIA
-------------------
media_id         pk
file_name   varchar
question_id foreign_key


SUBTOPIC
-------------------
subtopic_id      pk
subtopic    varchar
topic_id    foreign_key


TOPIC
-------------------
topic_id         pk
topic       varchar
level       choices(1,2,3)

Is my database properly designed? If not, how can I make it better?

Edit: I expect to show a question (and an image, if one exists), and compare the user's answer to the correct answer. I need topics and subtopics to in order to display proper questions.

Edit2: added your suggestions.

Edit3: Updated database description.

Please take a look at this diagram: diagram

Some notes:

  1. Levels can have both unique (Topics 1 and 3) and shared (Topic 2) topics.
  2. A topic can have one or more subtopics, but a subtopic cannot belong to more than one topic.
  3. A subtopic can have one or more questions, but a questions belongs to only one subtopic. Two questions (Questions 2 and 3) may have the same subtopic and topic, but their levels may differ. Although, topics (Topic 2) and subtopics (Subtopics 2 and 3) may belong to many levels (Levels 1 and 2), some subtopics hold exclusively questions of level 1 and others hold questions (Question 4) of level 2 (Subtopic 3).
  4. A question can have only one answer and that answer cannot belong to another question.
  5. An answer (or it's question) can have one optional image file, and that image file can be used in other questions.

Thanks.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Great question, and that you ask it before building the app speaks volumes for your ability to learn and improve your data modelling skills.

Limitations of Your Model

This is normalised in the simple sense: you've already decided what the entities are; what relations you need; you've even decided that they are meaningless things that have no identifying characteristics and you've given them IDENTITY columns as physical keys. Then you've normalised the tables.

That has nothing to do with the Relational Model or Normalisation.

It is typical of thinking in terms of the app (understandable, because that's all you have, and you know the functional needs). But that has nothing to do with the data. if you create it from that perspective, you will not get a database, you will get a storage location for the app. Now if the app was to be written for one quiz and thrown away, that would be acceptable. But if the quiz is going to be run for a month or two; if the company is still going to exist in ten years; if the quizzes are going to grow; if the stats that are going to be run, are going to grow, improve and mature ... well you will have quite a few problems. Every times you enhance the app, you will have to replace its "database". At some point they will decide that the app (and the "database" contained within it) is too expensive to change, and ditch both of them.

Even for the short duration that the app exists, because the collection of data is not normalised, it is very limited, the "power" of the RM, that which is implemented in every commercial RDBMS, is lost to you. Navigation (which is cumbersome anyway, due to SQL joins being cumbersome) is even more cumbersome. Eg. to get from a child to a grandparent, you are forced to get the parent, even though to are not getting any columns from it. User can only access the data via the app, it is "closed". These days there are thousands of report tools that connect to the database and allow the users to perform all kinds of queries, which have not been identified to you yet, without having to wait for a change request to be implemented by the app team: that capability is lost. And you will have to "re-factor" the data heap every year or so.

Nomalisation

Nomalisation produces Open databases, that allow such access without limitation.

Data lives much longer than the app. It survives the app, and has a value to the org. When the app gets replaced, they will extract the data and want the replacement app to be ready with the data loaded. If they did not use a data modeller, new app designer did not learn any lessons from the old app designers mistakes, and the whole repeats itself.

Ok, so the org expects to exist next year. Ok, so you are going to run more than one quiz in production. Ok, so you've hired a data modeller to run his eye past your collection of data, so that you don't make those mistakes. Excellent. Thankyou. I accept.

Relational Modelling 203

Data is modelled (a) completely independent of the app and any functions that you might know about and (b) using completely different techniques, that app developers do not know about. So let's get started. This is not a formal lesson re Normalisation (that will take too long, and the theory will bore you to death); this is just looking over the modellers shoulder while he works. All you need to do is answer questions [every time I post a question mark, please, stop and answer, before moving on], honestly (not suggesting you are dishonest; just reinforcing that "I don't know" is quite acceptable because it identifies an area we have to work on; whereas the clear affirmative and negative allows us to avoid discussion). Also, forgive me, but I will assume the answers, just to avoid the otherwise back-and-forth delays; please point out any of my mistakes, and I will correct the model.

Ok, you have some Entities that you are clear about, so let's start with them, I think they look like this Scribble in the Sand. I am purposely not using standard symbols because I do not want those meanings drawn into it at this stage, because we have not achieved those meanings, and we don't want to convey false expectations to people (including ourselves) who understand standard symbols.

  1. How are the rows in each Entity identified uniquely from every other row ? (This is important because it helps us verify that the Entity is in fact an Entity, and not Donald Duck, something not real.) From the columns provided, you would have (at least):
    Question.Question
    .
    Media ? how exactly, am I going to ensure that there are not 42 images which are identical ? If I allow that, you will kick my rear end when our boss screams at you, and then he will kick it as well. IDENTITY won't help me, it will happily let me insert duplicates. FileName is a good identifier. Media.FileName
    .
    Topic ? would you like 101 identical Topics ? I thought not.
    Topic.Topic
    .
    Subtopic.Subtopic ? No. IDENTITY ? No. You are pretty clear it belongs under Topic, and I am already carrying Topic as a Foreignkey, which has meaning; if it is a dependent child, not an independent orphan, then the FK is an Identifying Relation:
    Subtopic: (Topic, Subtopic).
    .
    Ok, we have fuselage.

  2. How are the Entities related to each other ?
    You are clear that Subtopic is a child of Topic, fine.
    .
    Question is a child of Subtopic? I think not. Do you really want the same Question "how many cars does the household own" which may exist in 42 quizzes, inserted in 42 rows ? "there will never be the same question in two quizzes" is suspicious, so please don't say it, unless you are going to put it in writing. What's worse is, let's say the question is a very picturesque one, then we will have 42 Media entries as well. If the boss gets upset at having to change the same question in 42 places, and when he does that he finds some of the images were ancient, they were not updated when the major image update took place last year ... it's your rear end.
    .
    How about we allow for the possibility of a Question existing in more than one Quiz ?
    .
    So Question is Independent.

  3. Back to Entities. In our discussions, starting with the fifth word, you keep saying "Quiz", and I keep saying it, but we don't have an Entity for it. I can't imagine a loose and duplicated set of Questions, that I've just de-duplicated, without identifying said collection of Questions. Now you may say "actually the Quiz is the Topic", but that is very restrictive: every time the boss needs to add a quiz for some new customer, he has to add the whole Topic/Subtopic/Question set all over again, even when he knows he has entered those exact Questions before, in existing Quizzes; which is why he won the new business, and why he was going to finally make a profit, which you've just reduced to zero. My rear end is still hurting from his last upset, so let's make sure he does not have another excuse. This let's him grow, nurture, and change Quizzes and Topic/Subtopics independently, without duplication of Questions: .
    Entity. Quiz
    Key ? Well, it better have an index on it to ensure we do not supply him with duplicate Quizzes. Let's say that we don't want to carry CHAR(80) keys into the children, and the user think a ShortName CHAR(12) that they get to make up is a nice identifier, better than a 10 digit number. Quiz: QuizCode . ok, we have fuselage big enough for payload. So far we have this Improved Scribble in the Sand.

  4. What Entities are truly Independent, that can exist without the existence of other Entities; and the remainder, which exists only in the context of another Entity ?
    Media is still a bit up in the air. While we have excluded duplicate FileNames, we have not allowed for the same Image being used in more than one Question. Let's allow that. We do not want an IDENTITY, we are already carrying the Question PK, so it is (a) Dependent and (b) Identified, by the parent.
    .
    Question PK as CHAR(255) was fine before, but carrying it as the FK in the child becomes silly. You like IDENTITY, and there are going to be millions of Questions. Fine. Question.QuestionId
    .
    Let's give these Independent Entities square corners, and leave the children Dependent Entities round. Now we have this almost, but not quite, Entity Relation Diagram.

  5. Let's finish the Relations. So far we have:
    A Quiz can have many Questions, and a Question can exist in many Quizzes.
    Question is a child of Subtopic, but it is Independent.
    Media exists only in the context of Question, as an option.
    .
    Since we live in an Universe that has some order, and our endeavour is a scientific one, let's give a child of it some order. I think we now qualify for a Logical Entity-Relation Diagram.
    The solid lines are Identifying Relations; the broken lines aren't. Now, because we have applied some standard requirements to the Entities and Relations, we can use standard symbols. Feel free to point out my mistakes.

That's it, we are done. I hope I have conveyed that the exercise is back-and-forth, that's why it is called modelling. The astronauts have separate, private quarters, and they can communicate with each other.

------------------------------------------------------------------------------------------

What's that ? You want the Data Model as well ? Ok, give me five minutes. Please grab a coffee for yourself.

------------------------------------------------------------------------------------------

It is small enough that I may as well give you the physical Data Model. I use the IDEF1X methodology for modelling Relational Databases, which is a standard, the Data Model is therefore rendered in IDEF1X Notation, please check that link if you need a quick review of what the symbols mean. .

  1. Many-to-many Relations are a logical concept, and drawn as such in the Logical Model. They are implemented as Associative tables in the physical model. I have supplied that. A Normalised database has more tables (nothing to be afraid of) but fewer columns per table, and no duplicate columns (no Update Anomalies).

  2. Oh my God. The Topic and SubTopic columns are huge! We can't migrate those fat foreign keys into Question. [Discuss with The Business Users.] Ok, they say there will be only one hundred Topics and a few thousand Subtopics. No need for a NUMERIC(10,0). They want the full Topic and Subtopic in the drop-downs, and they agree it has to be unique, but an additional


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share
...