December 04, 2006

The Thick and Thin of a Paradigm Shift

If I had a time machine :-p, the first thing I'd do would be to go check out the dinosaurs, alive and in all their spectacular glory. Then, certainly, I'd jump right into the future to see where we're heading with our development tools, platforms and standards :)

A change has already begun. Change is continual in technology, of course, but change that shifts entire UI development paradigms is rarer. Microsoft's .NET initiative [1] is part of a change occurring in current software development paradigms. This change, I believe, supplements the slow, progressive changes that software development undergoes every few years, but the one to come may have a more profound impact than some of the major shifts of the past. We are, perhaps, on the threshold of the next large step forward in software development.

Lines are continually being blurred:

Blurred line | How?
Windows OS ↔ Other OS
  • Microsoft's .NET initiative¹
    (like Java before it, but with support for multiple languages [2])
  • XAML (assuming eventual standardization)
Information ↔ Hypertext markup
  • XML Islands in HTML
  • Stand-alone XML
Data storage ↔ Middleware
  • SQL Server 2005 hosts the CLR
  • SQL Server 2005 Express²
  • Conceptual programming techniques such as LINQ and the ADO.NET Entity Framework [3]
Client/Server ↔ Peer-to-Peer
  • WCF PeerChannel [4]
  • Windows Vista support for PNRP and PNM [4]
Rich-client applications ↔ Web applications
  • XAML
  • ClickOnce (with CAS and Isolated Storage as protection measures)
Rich-client applications ↔ Services
  • Distributed programming with .NET Remoting and Web Services
  • More recently, general SOA and WCF
  • ASP.NET (e.g., Cassini [citation desired])

.NET applications
  • can execute on heterogeneous systems that support CLI standards [1]
  • will be able to execute with a GUI on heterogeneous systems (assuming XAML will be standardized)
Information
  • can be formatted and packaged easily (see the sketch after this list)
  • can be structured; structural information can be shared easily
  • can be aggregated with heterogeneous data easily
  • can be distributed between heterogeneous systems easily and in a standardized manner
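
As a sketch of the point about packaging (promised in the list above), here's a minimal C# example using XmlSerializer from the FCL; the Customer type and its values are hypothetical, just for illustration:

    using System;
    using System.IO;
    using System.Xml.Serialization;

    // Hypothetical data type; any public type with public members will do.
    public class Customer
    {
        public string Name;
        public string City;
    }

    class Program
    {
        static void Main()
        {
            // Package structured information as standard XML that any
            // XML-aware system can consume, regardless of platform.
            XmlSerializer serializer = new XmlSerializer(typeof(Customer));
            Customer customer = new Customer();
            customer.Name = "Contoso";
            customer.City = "Rome";

            using (StringWriter writer = new StringWriter())
            {
                serializer.Serialize(writer, customer);
                Console.WriteLine(writer.ToString());  // well-formed, sharable XML
            }
        }
    }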

Database management systems

  • can handle the workloads of application servers using the same tools and languages
  • are becoming more portable and efficient as hardware capability scales
  • can be programmed against conceptually, in terms of the entities they model (see the LINQ sketch after this list)
  • can import and export data from heterogeneous systems based on XML standards
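
Here's the LINQ sketch promised in the list above, showing the conceptual style of query (C# 3.0 syntax, still in preview as of this writing); the data is an in-memory stand-in for a table of entities:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Program
    {
        static void Main()
        {
            // In-memory stand-in for a table of entities; with LINQ to SQL or
            // the ADO.NET Entity Framework the same query shape targets an RDBMS.
            List<string> cities = new List<string> { "Rome", "Paris", "Rome" };

            // Query the data conceptually, in terms of the entities themselves,
            // rather than composing SQL strings by hand.
            var romans = from city in cities
                         where city == "Rome"
                         select city;

            foreach (string city in romans)
                Console.WriteLine(city);
        }
    }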

Rich-client applications

  • can consume remote services
  • can provide public and local web services, and remoting services for distributed communication and collaboration
  • can host ASP.NET and a WebBrowser (see the sketch after this list)
  • can be deployed easily over the web
  • can be hosted in an RDBMS, adjacent to the data being used.  Data itself can be pure XML markup, which is natively supported by the RDBMS
  • can be presented using markup (WPF, XAML)
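
Here's the sketch promised in the list above: a minimal WinForms shell hosting the WebBrowser control, blending desktop UI and web content in one window. The URL is a hypothetical local ASP.NET endpoint:

    using System;
    using System.Windows.Forms;

    // Minimal sketch: a rich-client form hosting the WebBrowser control.
    public class HybridForm : Form
    {
        public HybridForm()
        {
            WebBrowser browser = new WebBrowser();
            browser.Dock = DockStyle.Fill;
            browser.Navigate("http://localhost/portal");  // hypothetical local ASP.NET app
            Controls.Add(browser);
        }

        [STAThread]
        static void Main()
        {
            Application.Run(new HybridForm());
        }
    }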

I predict that as hardware capability scales, rich-client applications, in their entirety, will be service-oriented, data-driven, data-encapsulating, portable, secure, and interactive components that are part of a distributed framework of other rich-client component applications found on LANs and WANs, forming a network of peer-to-peer business intelligence systems.  They'll fit nicely into Microsoft's OS and possibly other third-party OSs (Vista, with built-in support for the Peer Name Resolution Protocol (PNRP) and People Near Me (PNM), PeerChannel in WCF [4], and .NET FCL support such as system networking [5] and ClickOnce [6], is only the beginning).
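
To ground that PeerChannel reference, here's a minimal sketch of a one-way mesh contract over WCF's NetPeerTcpBinding. The contract name, the mesh address and the disabled security mode are illustrative assumptions on my part, not production guidance:

    using System;
    using System.ServiceModel;

    // Illustrative mesh contract; every node in the mesh implements and calls it.
    [ServiceContract]
    public interface IMeshChat
    {
        [OperationContract(IsOneWay = true)]
        void Say(string user, string message);
    }

    class Program
    {
        static void Main()
        {
            // PeerChannel transport; name resolution can fall back to PNRP on Vista.
            NetPeerTcpBinding binding = new NetPeerTcpBinding();
            binding.Security.Mode = SecurityMode.None;  // simplification for this sketch

            EndpointAddress mesh = new EndpointAddress("net.p2p://demoMesh/chat");
            ChannelFactory<IMeshChat> factory = new ChannelFactory<IMeshChat>(binding, mesh);

            IMeshChat channel = factory.CreateChannel();
            channel.Say("dave", "Hello, mesh!");  // flooded to every peer in the mesh
        }
    }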

From Thin to Thick

I believe that Microsoft may be trying to secure their place as providers of a rich-client OS and software development platform by slowly spinning the current trends in web development back into rich-client development. This is a good thing for developers and end-users, IMO.  But maybe I'm giving too much credit to Microsoft by assuming that this progression was intentional :)

Web development and standards are hard to create and enforce for a few reasons:

  • Hypertext markup, especially dynamic markup that includes scripting support, is complex
  • There are many browsers, each with its own implementation of complex presentation standards, and those implementations have problems of their own
  • Website developers tend to value aesthetics over ease of use and intuitive behavior and functionality
  • Website design is largely proprietary since standards only go as far as compatibility, not visual appeal
  • Because of the point above, visual designs frequently change, with design trends masquerading as standards

Websites are not ideal for business intelligence applications because

  • they have limited runtime ability due to non-existent scripting security standards
  • they have limited runtime ability due to inefficient languages and platforms (e.g., scripting)
  • they do not provide a rich, real-time user interface
  • they require network connectivity

Standardization

Totalitarianism isn't something I'm too fond of in government, but in software development and practice I'd prefer it if the company creating the tools I use would also enforce standards for using them. Standards ease interoperability, increasing the value of my applications. Interoperability relates to the new trend in software development, SOA (Service-Oriented Architecture), since services are commonly distributed among heterogeneous systems (e.g., Web Services).  So long as the standards are good ones, I favor them over no standards at all.

Microsoft produces and enforces standards in several ways:

  • The release of a new OS
  • The construction of new APIs
  • The authoring of documentation for standards and guidance
  • The creation of new training material and certifications for developers
  • Submission for standardization to ISO, ANSI and ECMA
  • Microsoft Open Specification Promise (OSP) [7]
  • Deprecation of legacy tools and support

And the community accepts shifts in software development paradigms, tools and support if, and usually when, Microsoft proves that the new technology and standards will make development easier and provide a better overall experience for end-users.  Out pour new books, seminars, user groups, newsgroups, training kits, on-line training, on-site training, certifications and standardization - and we're off to a new era in software development.

These shifts are common: a natural progression as new technology arrives in response to the demands of end-users and developers. It's worth noting, though, that some technology doesn't survive long (DirectAnimation comes to mind).

So it seems that sometime in the future, hardware permitting, there will be a conversion of current web programming trends into a standardized, distributed, service-oriented, conceptually-architected, "packaged", rich-client programming paradigm that spans operating systems from different vendors, with Microsoft in the driver's seat.

Buckle your seat belts :)

--
Appendix

¹ "[M]ultiple high-level languages may be executed in different system environments" [1]
² SQL Server 2005 Express is at the forefront of data encapsulation within rich-client programs, providing a pluggable data model that can execute .NET Framework code. If standardized, or if some future SQL Server Express edition ships standard with a Microsoft OS installation, fully encapsulated rich-client applications will be distributable without the need for the classic separation of data tier, middleware and presentation (disregarding OOP design patterns and techniques, of course).
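
To make that concrete, here's a minimal sketch of a CLR stored procedure as hosted by SQL Server 2005; the Customers table is hypothetical:

    using System.Data.SqlClient;
    using Microsoft.SqlServer.Server;

    public class Procedures
    {
        // Runs inside the database engine itself, adjacent to the data,
        // once the assembly is registered via CREATE ASSEMBLY / CREATE PROCEDURE.
        [SqlProcedure]
        public static void ListCustomers()
        {
            // "context connection" reuses the caller's connection in-process.
            using (SqlConnection connection =
                new SqlConnection("context connection=true"))
            {
                connection.Open();
                SqlCommand command =
                    new SqlCommand("SELECT Name, City FROM Customers", connection);
                SqlContext.Pipe.ExecuteAndSend(command);  // stream rows to the caller
            }
        }
    }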

--
References

[1] The Common Language Runtime (CLR)
http://msdn2.microsoft.com/en-us/library/aa497266.aspx

[2] Understanding Enterprise Platforms
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnpag/html/jdni_ch02.asp

[3] The ADO.NET Entity Framework Overview
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnvs05/html/ADONETEnFrmOvw.asp

[4] Peer To Peer, Harness The Power Of P2P Communication In Windows Vista And WCF
http://msdn.microsoft.com/msdnmag/issues/06/10/PeerToPeer/default.aspx

[5] Windows Vista Networking for Developers (September 1, 2006)
http://msdn.microsoft.com/chats/transcripts/windows/06_0901_msdn_vista.aspx

[6] Windows Vista § Development technologies
http://en.wikipedia.org/wiki/Windows_Vista

[7] Microsoft Open Specification Promise
http://www.microsoft.com/interop/osp/default.mspx

As always, I'd love to hear from anyone that has something to say about this post.  Drop me a comment.

December 01, 2006

References, Citations, Paraphrasing and Plagiarism

There is a fair amount of research and work that can go into obtaining information when a poster responds to a newsgroup question.  Information is commonly obtained through someone else's testing or research and is simply paraphrased or even quoted.  I believe that's understood by the general population of newsgroup readers.  So at what point, if ever, does a poster's information become plagiarism when their sources aren't referenced in a newsgroup post?

It's no secret that a large majority of the information found in newsgroups is based on the poster's memory of the subject matter.  However, much of it can be attributed to actual fact-based sources such as books, articles and other newsgroup posts.  Anything stated in a newsgroup post can easily be interpreted as fact even if it's not.  Information posted without the respondent having performed fundamental tests themselves is sometimes qualified with "AFAIK" (as far as I know) or "IIRC" (if I remember correctly).

If it's just common knowledge that unreferenced information found on newsgroups derives from the testing of unreferenced sources rather than from the respondent's own effort, then what about information that derives purely from the respondent's own testing and experience?  Posters would only get credit for having derived their own knowledge if they explicitly state that the information they provided was acquired solely through personal testing.

Maybe posting based on personal testing is uncommon enough that it's fair to just assume that posters have retrieved their information from other sources, or simply have so much experience with the subject matter that testing isn't required.  Since much or all of the information comes off the top of their heads, to enable quick responses, it might not be fair for OPs to expect references without explicitly asking for them.  In that respect, newsgroups are treated more like personal conversations.

Syndication

So this brings me to the syndication of USENET content, and how what was once simple, off-the-top-of-your-head opinion expressed in a conversation between one or more thread correspondents may become reference material, much like books and articles, for the possibly millions of users that search groups.google.com every day [1].

In school we learn about plagiarism and how bad it is, but we do it all the time in computer science.  In a way, there's just so much information out there that it's used on demand, like children with our hands in a jar full of jelly beans.  It might not make sense to reference the sources of all information provided in a newsgroup, a web article, or even a book.  Paraphrasing, to that point, is commonplace in newsgroups because it's just too difficult to locate all sources of information when speedy responses are preferred and expected (although not all OPs expect speedy responses, especially the ones that are familiar with newsgroups).

Usefulness in references

For one thing, quotations and fact-based opinions found in posts are usually not the entire story anyway.  References provide readers with a more complete description and reasoning, in their original context.  Without these references, information provided out of context may easily be misinterpreted and, even worse, assumed to be accurate and complete.  People who prefer newsgroup posts to be short can't have it both ways: either provide a reference to the complete information, or provide the complete information within the newsgroup post itself.  This reduces ambiguity in replies, providing more accurate information to readers.  I prefer a reference link over a long post where the source itself is easy to understand or interpret; if that's not the case, I believe a fuller description in the post may be in order.  There's also the idea that any missing references or incomplete information will be supplemented by another respondent, but I doubt that (possibly naive) expectation excuses plagiarism.  I'm sure in many cases OPs have read the references already and simply wanted clarification; I'm not sure how or whether that situation applies to this topic, however.

References also enable readers to perform further research on their own, given a good place to start.  Without references, readers are forced to perform their own searches, but in many cases they don't possess the necessary skills to perform Internet research without a helping hand, which is why they may have come to newsgroups in the first place.

In consideration of time and memory

To relax my arguments a bit, I must say that I don't expect everyone to always reference their sources in newsgroups.  If you're not performing research but instead answering from memory, then I think it's reasonable if you don't always reference your sources, but it's certainly desirable for the reasons I've listed above.  If you invest the time to find a good source that agrees with the information you are providing, I'm sure it will be much appreciated by readers.

Format and placement

Outdated links floating around in USENET posts are just clutter; however, a reference list at the bottom, rather than in-line referencing, may reduce some of that clutter.  Long after reference hyperlinks become invalid, the information in a post may still prove useful, but a dead link sitting between two informative paragraphs or sentences reduces the readability of the post and serves no purpose once it no longer works.

Proposal

Are newsgroup posts supposed to be small enough that citations and references (as opposed to occasional in-line referencing) aren't expected or don't need to be standardized?

I wonder if it's a lack of standardization or precedent that makes most respondents feel it's unnecessary (or simply not bother) to reference and cite the sources of their information, where applicable.  So here's my proposal for a simplified, standardized idiom for citing and referencing within newsgroup posts.

If anyone is aware that these standards or anything similar exists already, I'd love to see some links to your sources submitted as comments, please.

Newsgroup citation and referencing standardization based on [2,3]:

Considerations

  • Maximum length of lines in characters (based on common newsreader capabilities)
  • International character sets (e.g., reference to a book title published only in French)
  • Should we reference the author? publisher? dates?
  • How do we reference sections in web articles?  pages in books?

Rules

  1. Citations in posts should use the same standards as specified by [2].  For example, the [2] that I've just used to cite [2] :)
  2. Respondents should post a list of references after their entire signature.  Here is an example reference list, appearing after my signature, that could be included in newsgroup posts that cite these resources (also serves as the reference list for this blog entry):

--
Dave Sexton

[1] Search Engine Watch, Searches Per Day
http://searchenginewatch.com/showPage.html?page=2156461

[2] IEEE style documentation
http://www.ecf.toronto.edu/~writing/handbook-docum1b.html

[3] IEEE style edition § Web Page
http://www.ecf.toronto.edu/~writing/bbieee-help.html#wp

[4] IEEE style edition § Individual Author
http://www.ecf.toronto.edu/~writing/bbieee-help.html#indiv-auth

And here's an example book reference based on [4]:

[5] J. Writer, Computer Science, Reading-Material Press, 1996.
pp. 78-96: Artificial Intelligence

As always, I'd love to hear from anyone that has something to say about this post.  Drop me a comment.