Wednesday, November 29, 2006
DBA Job Interview Question #5: Explain the Difference Between VARCHAR and NVARCHAR
What is the difference between VARCHAR and NVARCHAR?This is somewhat of a tricky question in the respect that the typical answer people will give sounds like they read it from a book - and that's bad. Here's a typical answer: "NVARCHAR is for Unicode data and VARCHAR is for non-Unicode data."
That's certainly correct but, as an interviewer, I would start digging deeper by asking further detail. Yes, you've provided a technically correct answer, but your answer shows no real knowledge other than your having memorized a definition.
What I would like to hear, as the interviewer, is an answer that shows me you know what Unicode data is, when it's used and what scenarios that it is appropriate to use Unicode. If you knew a few "gotchas" and "Best Practices" then I would be even more impressed. I would expect that, after watching my video on COLLATE and How Queries and Table Data Work with Case-Sensitive and Accent-Sensitive Data you'll be very well versed in using Unicode and Non-Unicode data.
The main point of asking this question is to see how "deep" your knowledge of SQL Server's internals is. For me, this question is really four questions:
- Explain character sets/code pages in relation to using VARCHAR/NVARCHAR
- How do you decide between using VARCHAR or NVARCHAR?
- What is Unicode?
- Why don't you just use Unicode for everything?
- Explain character sets/code pages in relation to using VARCHAR/NVARCHAR
- Character Set and Code Page are interchangeable words
- They define which characters are used
- Common to use Latin1_General in Latin-language based countries such as those that speak English, Italian, French, and Spanish
- Determine how the data is sorted in an ORDER BY and compared in WHERE/JOIN clauses
- How do you decide between using VARCHAR or NVARCHAR?
- This is generally a pretty simple decision: will you store data in multiple languages or that contains characters from multiple languages? If "Yes" then it should most likely be NVARCHAR otherwise you're probably safe using VARCHAR.
- Columns like FirstName and LastName for a website like ours that has users all around the world requires NVARCHAR
- Columns like CountryCode and EmailAddress are English-only thus they are VARCHAR
- What is Unicode?
- Too much for this post - read the Wikipedia entry
- Why don't you just use Unicode for everything?
- Because, if you just need to store US English (for example) then you would double the storage by using NVARCHAR vs. VARCHAR
Technorati Tags: Scott Whigham, Whigham, SQL Server, SQL Server 2005, SQL Server, Microsoft SQL Server
Monday, November 27, 2006
DBA Job Interview Question #4: Explain the Difference Between VARCHAR and CHAR
Explain the Difference Between VARCHAR and CHARThis is about as much of a softball question as you could expect on a junior-level interview but, nonetheless, it is still an important one to understand. A lot of times interviewers will ask softball questions so that they can grill you on your answers:
Interviewer: "Why don't you tell me the difference between using VARCHAR and CHAR?"Now, if I'm the interviewer, that's exactly the type of answer that sounds like it was memorized from a book. If you gave me that answer in a job interview, I would immediately start hammering you to see if you really knew what you just said! If, on the other hand, you gave a more meaningful answer (something that sounds like you have experience with the topic), I'm more likely to be impressed:
Interviewee: "Oh sure - VARCHAR is for variable length data and CHAR is for fixed data."
Interviewer: "Why don't you tell me the difference between using VARCHAR and CHAR?"You haven't really given a definition or memorized-from-a-book type of answer; what you've really done is explain the real-world usage of the two datatypes in a way that shows that you are THE MAN.
Interviewee: "Oh sure - VARCHAR is used for string data and is really used for storing strings that have an indefinite length; i.e. someone's last name. We don't know, in advance, what everyone's last name will be so we use something like VARCHAR(128) so that, in case they have a really long last name, we have the storage for it but, if they have a very short last name, we can store it as well. Now with the CHAR datatype, that's something that I really don't use often since it's for a fixed length string. Something like Country codes, state codes or even product codes that a fixed number of characters - that's when I use the CHAR datatype."
BTW - In case you haven't heard or don't know, there are multiple US English pronunciations of VARCHAR and CHAR. I say var-char (pronouncing the "ch") but others say var-car (a hard C like the word "car"). They're both right - it's just a preference. I've also heard var-chair and var-care (like the word "care"). I've even heard "vair-" (rhymes with "care")... Don't be intimidated to pronounce it differently from the interviewer or to mention different pronunciations. Because I work with people from all parts of the world, whenever I talk about VARCHAR/CHAR datatypes, I make sure to use both of the most likely pronunciations when I introduce them into converstation. I say something like, "Now I notice your var-char/var-car usage is a bit suspect. By the way, some people say var-char and others say var-car. I just say var-car." That way it's out in the open, it helps everyone understand my accent/preferences, and it sort of lets them know that it's totally fine for them to say it a different way.
I guess "data" (like a sheep - "daaaaaaahhhhhta") and "data" (like "day-tah") are the same way. Sometimes I say database, other times I say "database". I don't know why.
Technorati Tags: Scott Whigham, Whigham, SQL Server, SQL Server 2005, SQL Server, Microsoft SQL Server
Sunday, November 26, 2006
DBA Job Interview Question #3: When Should You Use CHAR/NCHAR?
When is it appropriate to use the CHAR/NCHAR data types?As an interviewer, I love to ask questions that assume knowledge (as this one does). To answer this question you must already know what the CHAR and NCHAR data types are as well as what the difference is between them.
Let's start with the basic answer that most interviewers would want to hear: "You would use CHAR/NCHAR anytime you needed to store a fixed amount of text/string data. Deciding between CHAR/NCHAR is just a matter of determining which types of and/or how many different characters need to be stored."
That's the right answer technically. However I think you'll find that these fixed datatypes are used a lot more in college classes on databases than they are in real life, at least that's been my experience. I don't think I've ever seen a fixed datatype column of more than 10 or 12 characters used correctly. It just isn't all that often that you have fixed data that is truly always "x" characters and "x" is larger than 10 or 12.
"What about social security numbers - those are fixed 12-character values?" you ask. If you don't know, Social Security Numbers (SSNs) are a United States government-issued way to identify people in the US and are usually in the form of nnn-nn-nnnn (like 123-45-6789). The problem is that, in the 20s, 30s, and 40s, the government was known to reuse numbers and assign them in the form of nnn-nn-nnnnA, nnn-nn-nnnnB, etc therefore we cannot guarantee that an SSN will always be 12 characters - it might be 13 or 12 but we know it's one of those two. Some people think that, because it is "kind of" fixed, that it should be CHAR/NCHAR but I disagree. Since we cannot guarantee a fixed width, I would say that this should be a VARCHAR column but that's my opinion...
Let's look at an example using social security numbers (SSNs):
CREATE TABLE #Employees (Name VARCHAR(128), SSN CHAR(13))Which of the two queries will return the row for the '123-45-6789'? I don't know if the font makes it clear but the first query is testing for "123-45-6789" and then a space whereas the second query is just testing for the 11 characters "123-45-6789". Since SSN is a CHAR(13) datatype, which of these queries will return the right row?
INSERT #Employees
SELECT 'Scott', '123-45-6789'
SELECT * FROM #Employees WHERE SSN = '123-45-6789 '
SELECT * FROM #Employees WHERE SSN = '123-45-6789'
Let's look at the first query and answer the question, "Will the space at the end of the first query cause it not to find the row?" Remember: we're talking about 12 characters being stored inside a CHAR(13) datatype therefore there are 13 characters stored: 12 "real" characters and one "blank space" character. Therefore, since the space is stored inline with the data, this query returns the correct row.
Let's look at the second query and answer this question: "Will not having a space at the end of our literal string '123-45-6789' cause it to fail to find a matching row?" Now we are comparing 12 characters against 13 so, logically, you would not expect a match, right? After all, "123-45-6789 " is not the same as "123-45-6789" but, in SQL, this query is correct as well.
So both queries "work" and give us the correct answer but why? The reason is that, since the underlying column is a CHAR(13), all string data that is compared against the column is implicitly converted to CHAR(13) before the comparison takes place. Therefore, even though you entered "123-45-6789" into your expression, SQL Server converts it to "123-45-6789 " before testing it against the actual column.
What would happen if you changed the CHAR(13) to VARCHAR(13)? You get the same rows and everything is the same way except now, "123-45-6789 " is converted to "123-45-6789" since the variable length datatypes "truncate" any trailing spaces.
Well, I suppose this post went far, far deeper than just the basics of when you should use the fixed-length datatypes but I hope it left you with a better understanding of how they work.
Technorati Tags: Scott Whigham, Whigham, SQL Server, SQL Server 2005, SQL Server, Microsoft SQL Server
Friday, November 24, 2006
DBA Job Interview Question #2 - User-Defined Datatypes
Have you used user-defined datatypes before? If so, what are your thoughts?If I were the interviewer, I would ask this question in an earnest manner; I really want to know your thoughts on using UDTs. This is one of those questions that does have a wrong answer IMO but you would have to really blunder into it. I would ask this question because I'm looking to see what your experience level is and, of course, to see if you make the mistake of answering it poorly.
THE WRONG ANSWER: This would involve something along the lines of, "You should always use them whenever..." That's just me though (my personality) - I just don't tend to like/get-along with people who speak about complex topics in absolutes; i.e., "I always...", "I never...", etc.
THE RIGHT ANSWER: Just about anything else that shows me that you (a) understand what they are, and (b) can describe when they might be used, and (c) describes any experience that you've had using them. It would be totally fine to give an answer that involves defining the UDT, describing a scenario that could benefit from it, and then saying, "I've actually never worked on a database that used UDTs so I don't really have an opinion of them." I would much rather hear an honest answer than for someone to try to B.S. me on using UDTs.
UDTs have their place. Personally I don't like to use them but, if someone makes a good case for why they chose a UDT, I can totally understand the seduction of using a UDT. The worst thing someone could do when answering this question would be to lie and explain a scenario in which they used a UDT because I would, of course, ask them about it in detail. I'm a curious guy - I want to know why you chose the UDT and if there were any problems with it. There are some pretty significant problems with using UDTs when it comes to change management and, if you could not identify at least one major pitfall of UDTs, then I'd have to assume that you were don't have that much experience with them or that your experience is "surface-level." That would take you down a peg in my book.
It would be totally fine to answer something like, "Yes, I have used them but it was very surface-level. I really wasn't in charge/didn't-have-to work with the day-to-day management of the UDT." Then I wouldn't really question you further.
Technorati Tags: Scott Whigham, Whigham, SQL Server, SQL Server 2005, SQL Server, Microsoft SQL Server
Thursday, November 23, 2006
DBA Job Interview Question #1 - DATETIME vs. SMALLDATETIME
Describe the difference between DATETIME and SMALLDATE and describe when you would use eachMy answer would first start out with an overview of what the basic differences are - I would keep it high level and then, if they want more detail, I would explore deeper. The minor differences are the DATETIME is 8 bytes and SMALLDATETIME is 4 bytes as well as the fact that the date ranges in DATETIME are larger (1753 - 9999) than SMALLDATETIME (1900-2079).
Another more subtle difference is that SMALLDATETIME only stores/works-with minutes; it does not handle seconds or milliseconds whereas DATETIME uses milliseconds. The documentation talks about the accuracy of these but, to me, it's more important to understand that if I need to compare/store seconds, I am forced in to using DATETIME.
Lastly, another even-more-subtle difference is that SMALLDATETIME will round up/down to the nearest minute whereas DATETIME will round up/down to the nearest 3/1000th of a second.
The second part of this, "describe when you would use each", is sort of answered by the explanations above, isn't it? I would tell the interviewer that, if I needed data before 1900 or after 2079, the of course I'mn using DATETIME. If I need to store/compare the seconds, then I'll use DATETIME.
This post assumes that you know a little bit about dates and times in SQL Server - that's sort of beyond this post to teach you what they are - so this post just focuses on answering the interview question.
Hope this helps you!
Technorati Tags: Scott Whigham, Whigham, SQL Server, SQL Server 2005, SQL Server, Microsoft SQL Server
SQL Server DBA Job Interview Questions with Answers
I recently was asked to help someone prepare for a job interview and, during our discussions, I came up with a list of a few job interview questions that might be asked of a junior-level database administrator (DBA). Instead of posting them all at once and not being able to answer them(it's a mighty long list), I'm going to post them individually so that I can give the answer(s) that I would like to hear if I were the interviewer. It's going to take a while to go through all of these - there are over 30 questions total - so to read them all, you'll have to go through each post.
Now, these are *my* answers and this is what *I* would look for so I can't say at all whether or not these are the answers that someone else would want to hear. The more you understand the nuances of the question, the better you can tailor your answers to what the interviewer wants to hear.
What? You didn't like that last part? Well that's what it's all about, isn't it - giving the interviewer the answer they want to hear? We're talking about a junior-level DBA position here - not a senior-level person who has 10 years experience and whose skills are extremely desirable by just about anyone - they can pick and choose the job they want. Someone wanting to break into database administration at the junior-level cannot be so picky all the time...
I'm going to break these into different sections:
- Datatypes
- Normalization and Relational Theory
- Table Design
- Disaster Recovery
- Importing and Exporting Data
- Queries and T-SQL Programming
- Database Administration
- and more.
Stay tuned and we'll get started with our first post!
Thursday, November 09, 2006
SQL Server 2005 Service Pack 2 - Download the CTP
Lots of changes here - and we get our 2nd service pack before the first year anniversary :)
Here's a direct link to the SP2 CTP: Download SQL Server 2005 SP2 CTP
Here's a link to the SQL Server CTP page - http://www.microsoft.com/sql/ctp.mspx
Friday, November 03, 2006
(OT) Amazing Pictures from Hubble Telescope
I was browsing reddit.com today and someone thankfully posted this link to the Hubble Telescope's Top 100 pictures. A lot (all?) of these pics have been "digitally remastered" to enhance the color but even so there are some just breathtaking images.
Nothing related to SQL but I'm a geek, you're a geek, so I thought I would share!
-------------------------------------------------
EDIT: I also saw this on digg.com today: http://www.enchgallery.com/fractals/fracthumbs.htm which is an amazing collection of fractals
Thursday, November 02, 2006
Cannot Create Index on View with MAX, MIN or AVG
Last month I was doing a few SQL-based optimizations for the database behind our sites and I ran into a strange restriction that I can find no logic behind. Here is the error message:
Cannot create index on view "dbo.MyView" because it uses aggregate "MAX". Consider eliminating the aggregate, not indexing the view, or using alternate aggregates. For example, for AVG substitute SUM and COUNT_BIG, or for COUNT, substitute COUNT_BIG.
The source of the view was something like this:
CREATE VIEW MyView
WITH SCHEMABINDING
AS
SELECT IdColumn, COUNT_BIG(*) AS NumberOfSales
, ISNULL ( MAX(wv.DateAdded), GETDATE() ) AS DateOfLastSale
, ISNULL ( SUM ( v.SalesPrice ) ) 0 )
AS TotalPrices
FROM dbo.StoreSales v JOIN dbo.WebsiteSales wv
ON v.SaleId = wv.SaleId
WHERE wv.IsEnabled = 1
GROUP BY StoreId
GO
CREATE UNIQUE CLUSTERED INDEX ci_MyView
ON MyView(StoreId)
GO
I have scoured the documentation and I can see that MAX, MIN and AVG (among other things) are not allowed but I can find no mention of *why* they aren't allowed. There's a chapter in the SQL Server 2000 Resource Kit (here) that talks about the restrictions but again, it makes no mention of the "why". Maybe Kalen's new book, Inside SQL Server 2005: The Storage Engine, can offer something but I'm not holding my breath.
Anyone know the logic behind this?
Wednesday, November 01, 2006
Speed Up SQL Server Management Studio (SSMS) - SSMS Is Slow!
A friend of mine, Chris Radcliffe, shared this tip with me last week and lo and behold, it fixed a SQL Server installation that took up to 2 minutes to load Management Studio and made it insta-load! Here is what Chris says to do and it worked for me -
- Go to Internet Explorer ( yes IE!)
- Browse to Tools>Internet options>Advanced tab
- Uncheck the check box under Security called "Check for publisher's certificate revocation" - This stops MS from checking the Internet every 15 seconds.........
Good one Bill Gates..
regards to all Chris
Thanks, Chris! One caveat - there are ramifications of not checking for the certificate information so know what they are before you set this...