Recently, I posted “Interview with a Data Scientist” at my company’s blog site. In it, my friend and colleague Yan Li answers four questions about being a data scientist and what it takes to become one. In my view Yan’s responses provide a bracing reminder that data science is something truly new, but that it rests on universal principles of application development. Continue reading
Recently there was a great post at Dzone recounting how one “tech savvy startup” moved away from its NoSQL database management system to a relational one. The writer, Matt Butcher, plays out the reasons under these main points:
- Our data is relational
- We need better querying
- We have access to better resources
Summing up: “The bottom line: choose the right tool.” Continue reading
Application developers and business people accessing relational databases need data dictionaries in order to properly load or query a database. The data dictionary provides a source of information about the model for those without model access, including entity/table and attribute/column definitions, datatypes, primary keys, relationships among tables, and so on. The data dictionary also provides data modelers with a useful cross reference that improves modeling productivity.
It is particularly useful for the dictionary to be a filterable/sortable Excel document, but out of the box ERwin, one of the leading data modeling tools, includes a notably inflexible reporting capability. Luckily, it is possible to directly query the ERwin “metamodel”. However, I found the ERwin documentation a bit hard to decipher and not quite accurate. Hopefully this post will save modelers some steps in figuring out how to query the metamodel.
Here are the topics covered:
- ODBC drivers in the ERwin install
- Reporting experience in MS Access, WinSQL, and MS Excel Continue reading
Asking fact questions in technical interviews is like eating a donut: it feels great at the time but is not so satisfying later.
Let’s say the interview consists of facts like this “softball question”: “What is the default port number for SQL Server?” The linked list of questions is a really good high level study guide for a SQL Server student. If a SQL Server developer candidate answers all correctly, then the interviewer can be confident that the candidate knows a lot about SQL Server.
However, few development jobs require only technical fact knowledge. Typically, developers must apply creativity when working with unclear or poorly expressed requirements under tight schedules. They must be versatile so that they can take on unforeseen roles in case of resignations or transfers of team members. If you make an investment in an individual by hiring her or him, you’ll look for a return in the form of professional development as the individual grows their skills.
So how do you test creativity, versatility, and ability to learn, while still gauging raw technical talent? My method is to ask opinion rather than fact questions. Continue reading
Recently I read an editorial about job interviews. It was breezy and funny, but not very helpful. Given that millions are out there looking for work, I want to help by giving my perspective on how to “win” the interview.
I do a lot of interviewing, from both sides of the desk. As a consultant I am interviewed by clients. As one of many technical and behavioral interviewers for my employer, I talk with candidates about their skills, goals, and fit with our business.
Of course, winning the interview may not get you the job. An interview is just one part of a many-step process. Getting a job involves showing you have the skills, establishing mutual fit, coming to terms on salary, and standing out versus the competition. This post is only about how to do well in the interview.
Assuming you’re qualified for the job, you can set up a good interview experience by applying the right mental model, preparing well, and interacting effectively during the conversation. Continue reading
The data integration process is traditionally thought of in three steps: extract, transform, and load (ETL). Putting aside the often-discussed order of their execution, “extract” means pulling data out of a source system, “transform” means validating the source data and converting it to the desired standard (e.g. yards to meters), and “load” means storing the data at the destination.
An additional step, data “enrichment”, has recently emerged, offering significant improvement in business value of integrated data. Applying it effectively requires a foundation of sound data management practices. Continue reading
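As a rough illustration of the three traditional steps, here is a minimal Python sketch. The record layout (`id`, `length_yd`), the function names, and the in-memory list standing in for the destination are all assumptions made for the example, not part of any particular ETL tool.

```python
YARDS_TO_METERS = 0.9144  # one international yard is exactly 0.9144 m


def extract(source_rows):
    """Extract: pull raw records out of the source (here, any iterable)."""
    return list(source_rows)


def transform(rows):
    """Transform: validate each record, then convert yards to meters."""
    clean = []
    for row in rows:
        yards = row.get("length_yd")
        # Validation: skip records with a missing or negative length.
        if not isinstance(yards, (int, float)) or yards < 0:
            continue
        clean.append({"id": row["id"],
                      "length_m": round(yards * YARDS_TO_METERS, 4)})
    return clean


def load(rows, destination):
    """Load: store the converted records at the destination (a list here)."""
    destination.extend(rows)
    return len(rows)


def run_etl(source_rows, destination):
    """Run the three steps in the traditional order."""
    return load(transform(extract(source_rows)), destination)
```

Calling `run_etl` with a few source rows and an empty list fills the list with the validated, converted records and returns how many were loaded. A real pipeline would swap the list for database reads and writes.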
I believe that early, effective big picture diagrams are key to application development project success. According to the old saw, no project succeeds without a catchy acronym. Maybe so, but I’d say no project succeeds without a good big picture diagram. The question: what constitutes a good one? To me good high-level diagrams have four key characteristics: they are simple, precise, expressive, and correct.
The well-publicized problems with healthcare.gov are disturbing, especially when we remember they might result in many continuing without health insurance. But it seemed a step in the right direction when a recent news report differentiated between “front end” and “back end” problems. The back end problems were data issues, like a married applicant with two kids being sent to an insurer’s systems as a man with three wives.
Coincidentally, I recently responded to a questionnaire about health care data. I’ve paraphrased the questions and my responses below. Perhaps the views of someone who’s spent a lot of time in the health care engine room might provide some useful perspective. Continue reading
A technique for reporting requirements has emerged as the de facto standard in the business intelligence community. The technique, which dates from the mid-2000s, is new enough to be as yet unacknowledged by the requirements analysis powers that be. David Loshin describes how it works in this 2007 post:
- Start with a business question about how to monitor a business process using a metric, like “How many widgets have been shipped by size each week by warehouse?” Continue reading
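To make the example question concrete, here is a small Python sketch of the metric it implies: a count of widgets shipped, broken out by warehouse, size, and week. The shipment record fields (`warehouse`, `size`, `ship_date`, `quantity`) and the choice of ISO weeks are assumptions for illustration, not something taken from Loshin's post.

```python
from collections import Counter
from datetime import date


def widgets_shipped(shipments):
    """Sum shipped quantity per (warehouse, size, ISO year, ISO week)."""
    totals = Counter()
    for s in shipments:
        iso = s["ship_date"].isocalendar()
        iso_year, iso_week = iso[0], iso[1]
        # The metric is the quantity; everything in the key is a
        # dimension named after "by" in the business question.
        totals[(s["warehouse"], s["size"], iso_year, iso_week)] += s["quantity"]
    return totals


# Hypothetical sample data
sample = [
    {"warehouse": "East", "size": "L", "ship_date": date(2024, 1, 1), "quantity": 10},
    {"warehouse": "East", "size": "L", "ship_date": date(2024, 1, 3), "quantity": 5},
    {"warehouse": "East", "size": "L", "ship_date": date(2024, 1, 8), "quantity": 7},
    {"warehouse": "West", "size": "S", "ship_date": date(2024, 1, 2), "quantity": 3},
]
weekly = widgets_shipped(sample)
```

The question splits cleanly into one measure ("how many") and a set of grouping attributes (each "by" phrase), which is why it translates so directly into a grouped sum.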
On two successive client assignments as a data modeler I’ve waited while client technicians wrestled with getting access to the ERwin Model Mart. In short, clicking on File, Mart, Connection, and logging in to the Model Mart failed every time, with various error messages. In both cases the teams lost literally months, in spite of active assistance from the CA help desk. No one involved, including me, could find on the web a list of actions to take to try and solve the problem, although there were a few hints scattered around (if we missed it, please add the link in a comment). Continue reading