This proposal was rejected. The page is kept here for historical reference.
Proposal
I propose removing things like task and round data from the database and placing them in the wiki as plain text (key=value should work just fine, but YAML, a superset of JSON, would rule). Data would be placed in ==data| ... == chunks in the text, which don't expand to anything in the processed HTML. Instead, you can ask the wiki parser for the structured data defined in the text.
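A minimal sketch of the extraction step, written in Python for brevity. It assumes the ==data| ... == delimiters used in the sample round page further down and the simple key=value variant mentioned above. The name extract_data is hypothetical, not an existing wiki parser function.

```python
import re

# Assumption: a chunk opens with "==data|" and closes with "==" alone on a line.
DATA_BLOCK = re.compile(r"^==data\|[ \t]*\n(.*?)^==[ \t]*$\n?", re.DOTALL | re.MULTILINE)

def extract_data(wiki_text):
    """Return (text_without_data_chunks, dict_of_key_value_pairs)."""
    data = {}
    for block in DATA_BLOCK.findall(wiki_text):
        for line in block.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            key, sep, value = line.partition("=")
            if sep:
                data[key.strip()] = value.strip()
    # The chunks expand to nothing in the rendered page.
    visible = DATA_BLOCK.sub("", wiki_text)
    return visible, data

page = """==data|
name = preoni2007a3
# why not 3 hours?
duration = 4h
==
Round 3 of Preoni2007.
==RoundTaskList()==
"""
visible, data = extract_data(page)
```

The macro line survives untouched because the closing pattern only matches a bare == line, so ordinary macros are not mistaken for the end of a data chunk.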
Conventional web-development wisdom says to place everything in a database. SQL databases are designed for very fast storage and manipulation of large amounts of data. When it comes to defining complex structured data, however, database tables are about as expressive as C structs (without pointers), and harder to use. A different serialization language can be better suited in some cases, and task/round definitions are a very good example. There is no need for complex queries on these objects, so the extra pain of mapping this data to SQL is not worthwhile.
Storing structured data in plain text doesn't really mean the data is harder to access and sort; you can always build a database from the things stored in text blocks and play with it. It also doesn't mean we won't have a task table, but it will consist of something like { task_name, wiki_page, .. various difficulty ratings }. See Rankings for my thoughts on a generalized rankings system.
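As a rough illustration of "building a database from the things stored in text blocks", the Python sketch below rebuilds a small task table from already-extracted page data. The table layout follows the { task_name, wiki_page, difficulty } idea above; the difficulty values and the build_task_index name are made up for the example.

```python
import sqlite3

# Hypothetical input: wiki page name -> data already extracted from its text block.
pages = {
    "task/capsuni": {"name": "capsuni", "difficulty": "2"},
    "task/elefanti": {"name": "elefanti", "difficulty": "4"},
    "task/gigel": {"name": "gigel", "difficulty": "1"},
}

def build_task_index(pages):
    """Rebuild a throwaway task table; the wiki text stays the authoritative copy."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE task (task_name TEXT PRIMARY KEY, wiki_page TEXT, difficulty INTEGER)"
    )
    db.executemany(
        "INSERT INTO task VALUES (?, ?, ?)",
        [(d["name"], page, int(d["difficulty"])) for page, d in pages.items()],
    )
    return db

db = build_task_index(pages)
hardest_first = [
    row[0] for row in db.execute("SELECT task_name FROM task ORDER BY difficulty DESC")
]
```

Once the table exists, sorting and filtering are ordinary SQL again, at the cost of having to rebuild the table whenever a page changes.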
Good points:
- We can define new task evaluator and round types with very little work and 0 UI. Sky is the limit.
- There would be no task/round views or controllers; tasks and rounds will just be normal wiki pages that also happen to define a task or a round.
- The database context would be removed, and macros will restrict themselves to data in the wiki page or to database queries. This is cleaner.
- We could allow task/round definitions anywhere and make task/xxx and round/xxx a convention only.
- Great way to attach tags (for searching perhaps).
Bad points:
- Wiki pages will need serious validation.
- This goes against the web-development tradition of keeping everything in the database.
- The judge needs to parse wiki pages. A caching system might be required.
- We have to be careful, or else we turn wiki processing into a programming language. See the next section.
- Automatically modifying data is hard. Think about changing time limits when switching the judge. Perhaps a YAML parser that doesn't touch comments?
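To make the last point concrete, here is one possible shortcut, sketched in Python: instead of a comment-preserving YAML parser, rewrite a single "key: value" line with a regular expression and leave every other line alone. This is an assumption-laden stand-in (it ignores quoting and values that contain #), and set_scalar is a hypothetical helper.

```python
import re

def set_scalar(block_text, key, new_value):
    """Replace the value on the first `key: value` line; all other lines stay byte-identical."""
    pattern = re.compile(
        r"^([ \t]*" + re.escape(key) + r":[ \t]*)([^#\n]*?)([ \t]*(?:#.*)?)$",
        re.MULTILINE,
    )
    # Keep the indentation and key (group 1) and any trailing comment (group 3).
    new_text, count = pattern.subn(
        lambda m: m.group(1) + new_value + m.group(3), block_text, count=1
    )
    if count == 0:
        raise KeyError(key)
    return new_text

block = (
    "round:\n"
    "  # Mircea: why not 3 hours?\n"
    "  duration: 4h  # agreed\n"
    "  live-eval: false\n"
)
updated = set_scalar(block, "duration", "3h")
```

A line-based edit like this keeps comments and formatting, but it does not understand nesting, so two keys with the same name in different sections would be ambiguous.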
Sample round page:
==data|
round:
  name: preoni2007a3
  start-date: 2007-01-04
  start-time: "9:00"
  # Mircea: Why not 3 hours?
  # Vladu: Just because.
  duration: 4h
  live-eval: false
  ranking: sum-of-latest-scores
  problems:
    - capsuni
    - elefanti
    - gigel
==
Round 3 of 'preoni':Preoni2007, 9th grade.
==RoundTaskList()==
==RoundRankings()==
See the global rankings on the 'main rankings page':Preoni2007/Rankings.
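Code that consumes a page like the sample above would probably want real date values rather than strings. The Python sketch below derives the start and end of the round from the start-date, start-time and duration fields. Treating the times as UTC and accepting only the "<N>h" duration form are assumptions; round_window is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the sample round page, as the plain strings a parser would return.
round_data = {"start-date": "2007-01-04", "start-time": "9:00", "duration": "4h"}

def round_window(data, tz=timezone.utc):
    """Return (start, end) as timezone-aware datetimes. UTC is an assumption."""
    start = datetime.strptime(
        data["start-date"] + " " + data["start-time"], "%Y-%m-%d %H:%M"
    ).replace(tzinfo=tz)
    hours = int(data["duration"].rstrip("h"))  # only the "<N>h" form is handled
    return start, start + timedelta(hours=hours)

start, end = round_window(round_data)
```

Note that start-time is quoted in the sample, which keeps a YAML parser from guessing a type for 9:00 and hands the consumer a plain string.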
YAML
YAML is a nice language for serializing data in a human-readable format (much more readable than XML). It is a superset of JSON: every JSON document is a valid YAML document, but YAML is a lot more powerful. We don't have to use it, but it maps particularly nicely to PHP arrays. There are two major PHP implementations:
- spyc is a tiny pure-PHP implementation. It only implements a small subset of YAML, but we don't really need advanced things like anchors and tags.
- syck is a multi-language implementation written in C. It's faster, more complete and better maintained (part of Ruby), but installing it is difficult (it's a non-standard extension in PHP).
All we really need is two functions, yaml_load and yaml_dump, that convert between strings and PHP arrays. We could place a copy of spyc in svn and only use syck if it's available; this way we don't make installation any harder.
Comments from Cristian
Short answer: not really a good idea
While I agree the "magic pages" and "view contexts" are design issues that need to be addressed, I don't think we're heading the right way with this proposal. It seems to me you're trying to make a web framework out of - what should be - simple wiki pages. I suggest that leveraging PHP/MySQL is the better option.
Here are some points against the proposal. Please point out stuff I might have misunderstood. I'll happily stand corrected.
- An obvious question is why stick various task attributes inside a task statement (ro. "enuntul problemei")? Basic database rules say no entity attribute shall depend on other attributes that are not primary keys.
- You've already highlighted the difficulty of updating YAML declarations. I'd like to emphasize this issue. When you take all these attributes out of the database and insert them inside wiki pages you suddenly need scripts and libraries to perform what would've been simple database operations. What happens when you need to migrate the schema, say drop or convert some attributes? How do you manipulate the "wiki database" at the data & meta-data (database schema) level when you're faced with hundreds of entities that share the same hard-coded schema? You need tools for that. You need to start by reinventing the wheel.
- I'd rather have hard-coded SQL schemas than hard-coded and duplicated YAML-encoded schemas
- Updating YAML inside static wiki text using a comment-preserving parser is an elaborate hack :)
- Allowing data declarations inside wiki pages isn't very helpful by itself. Static data is useless unless you expand it into meaningful model instances and their related parts. For instance, the view layer needs a round start-date in a native data type (unix timestamp) so it can display it (strftime) as it wishes. There's got to be some code that expands the YAML declaration into a "full-fledged" corresponding model (one that you normally get by, say task_get($task_id)).
- A possible work-around is to code a general "model factory" that spawns model instances out of yaml declarations. This hack merely masks the initial problem.
- Obvious performance issues and horrendous code when trying to aggregate data from both MySQL and the YAML wiki. Some query examples:
  - task monitor queries: show me user submissions along with information about tasks (name, title, author, task type ... )
  - judge queries: count solution submissions for tasks inside active speed-contests that are next in line to be evaluated
  - round queries: show me the list of tasks (id, title, author, maxpoints, task type ...) that belong to this round
This is not about creating a cache. A cache only affects performance, and it is something you can drop at any moment. The judge or the website should not rely on pre-calculated, synchronized SQL tables. Synchronization sucks and is not caching. Instead, you would have to write FORs, IFs and hashes, along with frequent calls to wiki_extract_data() and yaml_decode(). You can fix performance with a transparent cache, but you can't avoid the horrible code. What you used to do with natural SQL you end up doing in hard-coded PHP. It's like reinventing the database, isn't it?
- Some objects are still bound to bits of data inside SQL rows. That's 2 storage places.
- The UI argument: I think we can design and implement a better UI than you'd get by editing wiki macros inside textareas. Sure, we're coders and this is a coding website. We understand macros. However, I wouldn't mind having live search/suggest when filling in the author field of a task. I wouldn't mind the UI showing me which parameters I should specify for a given task type, along with appropriate error messages. I tend to think a more hard-coded, straightforward UI is better, albeit less flexible and harder to implement.
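Going back to the round-query example in the list above, the contrast between hand-written joins and SQL can be sketched like this (Python and SQLite standing in for PHP and MySQL). The table names, the title and author values, and both function names are made up for the illustration; the task and round identifiers come from the sample round page.

```python
import sqlite3

# Hypothetical result of extracting and decoding the round page and each task page.
round_page = {"name": "preoni2007a3", "problems": ["capsuni", "elefanti", "gigel"]}
task_pages = {
    "capsuni": {"title": "Capsuni", "author": "author-1"},
    "elefanti": {"title": "Elefanti", "author": "author-2"},
    "gigel": {"title": "Gigel", "author": "author-3"},
}

def round_tasks_from_wiki(round_page, task_pages):
    """The 'FORs, IFs and hashes' version: join round and tasks by hand."""
    rows = []
    for task_id in round_page["problems"]:
        task = task_pages.get(task_id)
        if task is None:
            continue  # dangling reference, has to be handled by hand
        rows.append((task_id, task["title"], task["author"]))
    return rows

def round_tasks_from_sql(db, round_id):
    """The same question asked as one query against hypothetical tables."""
    return db.execute(
        "SELECT t.id, t.title, t.author FROM round_task rt "
        "JOIN task t ON t.id = rt.task_id "
        "WHERE rt.round_id = ? ORDER BY rt.position",
        (round_id,),
    ).fetchall()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE task (id TEXT PRIMARY KEY, title TEXT, author TEXT)")
db.execute("CREATE TABLE round_task (round_id TEXT, task_id TEXT, position INTEGER)")
for position, task_id in enumerate(round_page["problems"]):
    task = task_pages[task_id]
    db.execute("INSERT INTO task VALUES (?, ?, ?)", (task_id, task["title"], task["author"]))
    db.execute("INSERT INTO round_task VALUES (?, ?, ?)", (round_page["name"], task_id, position))

from_wiki = round_tasks_from_wiki(round_page, task_pages)
from_sql = round_tasks_from_sql(db, "preoni2007a3")
```

Both paths return the same rows for this small case. The difference the comments point at shows up as the questions grow: each new filter or aggregate is another clause in the SQL version and another loop in the hand-written one.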