We just wrapped up the founding meeting of a Machine Learning Study Group here in Copenhagen. We're planning to meet approximately once a month to discuss machine learning algorithms. In between meetings there's a Facebook group where we hang out and discuss our current pet machine learning problems, and a Google Docs repository of machine learning resources (still to come).
Here's how we're going to run the group: The goal is to get smarter about machine learning. We're narrowing in on a couple of stacks we care about. Big data using databases: we're going to have a look at MADlib for PostgreSQL, or something based on Map/Reduce. Some of us occasionally work in Processing using OpenCV - which is neat, because it's also a good choice on the iPhone, in Xcode and in OpenFrameworks.
Some of us will probably look at Microsoft Solver Foundation in C# at some point, and some of us might implement algorithms for one of these environments if no one else has.
We'll be building out our knowledge of configuring and setting up these stacks. The idea is for everyone to work on a pet problem with a machine learning element. We'll share our ideas on how to proceed and how to approach each software stack.
The typical approach to a machine learning problem involves
Just to give a little landscape: The founding members are primarily interested in the following problem spaces: Image analysis - with a little time series analysis of sensor data thrown in for interactive fun - and "big data", running machine learning algorithms on web-scale data.
If you're into this problem space, and this style of working - project-driven, collaborative and conversational - feel free to join. If there's simply something you'd like to be able to do with a computer, feel free to join as well - maybe one of the more technical hangers-on will take an interest in your idea and start working on it.
Whether or not it's RESTian kosher I don't know, but the live Apollo 11 transcript is built natively around a JSON API. If you don't like mine, feel free to make your own.
Details: The API is at
http://www.classy.dk/cgi-bin/apollo_transcript.pl?q=transcript
For a JSONP-wrapped response, add a callback name:
http://www.classy.dk/cgi-bin/apollo_transcript.pl?q=transcript&jsonp=somejsonp_prefix
If something is unclear, just view source on http://classy.dk/moon - the source shows how to use it with jQuery, and gives tables describing the IDs of the speakers.
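To show what the jsonp parameter does, here is a little Python sketch of the JSONP convention: the JSON body gets wrapped in a call to the callback name you pass. The transcript fields below are hypothetical, and whether the endpoint appends a trailing semicolon is best checked against the live response.

```python
import json

def jsonp_wrap(payload, callback=None):
    """Serialize payload as JSON; given a callback name, wrap it
    the way a ?jsonp= parameter conventionally does."""
    body = json.dumps(payload)
    if callback:
        return "%s(%s);" % (callback, body)
    return body

# Hypothetical transcript entry - the real field names may differ.
entry = {"speaker": "CDR", "time": "102:45:58", "text": "The Eagle has landed."}

print(jsonp_wrap(entry))                      # plain JSON
print(jsonp_wrap(entry, "somejsonp_prefix"))  # wrapped for a <script src=...> include
```

The wrapped form is what lets a page on another domain consume the API via a script tag, which is the point of the jsonp parameter.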
This way of building websites - presentation with HTML+CSS+jQuery and a pure API backend - really appeals to me. Clearly there are fallback concerns for non-modern browsers, etc., but the separation of concerns is appealing - as is the instant availability of an API for others to use.
A quick rundown of the various physical interfaces I have for sensing information from the real world:
In short: Lots. I am going to build a compendium of how to talk to these things from the environments I care about, which are mainly Processing, Pure Data, and some previously nonexistent time-series glue that I might have to write on my own.
Shoot 5 seconds of bland video, use VirtualDub to split it into images, run the images through potrace, use VirtualDub to resequence them into video, and end with this (AVI). Yes, it's a work in progress. I have hopes of being able to extract parts (e.g. the laptop in this video) and composite them with other stuff and put it all back together again - but we'll have to see.
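The per-frame tracing step can be sketched as a Python dry run that just builds the potrace command lines - assuming BMP frames from VirtualDub's image export and potrace's -b (backend) and -o (output) flags; adjust for your versions:

```python
def trace_commands(frames, backend="svg"):
    """Build one potrace invocation per exported frame.
    This is a dry run: pass each list to subprocess.run to
    actually trace. potrace's -b picks the output backend,
    -o the output file."""
    cmds = []
    for frame in frames:
        out = frame.rsplit(".", 1)[0] + "." + backend
        cmds.append(["potrace", "-b", backend, frame, "-o", out])
    return cmds

# VirtualDub's image export gives numbered BMPs, which potrace reads directly:
for cmd in trace_commands(["frame0001.bmp", "frame0002.bmp"]):
    print(" ".join(cmd))
```

The same loop with an "eps" or "pgm" backend covers the other output formats potrace offers, and the resulting rasters can go straight back into VirtualDub for resequencing.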
Inkscape uses potrace combined with pre-trace color separation and gets good results from that.
Thomas Angermann reports on the response from the Danish Library Agency and from DBC - the company that makes most of the library technology in this country - to certain challenges, primarily through the project Brugernes Bibliotek ("The Users' Library"). The answers strike me as downright condescending, and moreover misguided along the same lines Angermann suggests. On top of that, the criticism is dismissed because it comes from "non-users" of the libraries. For my own part (I have chimed in a bit with the criticism too) I have to say that claim is absurd. I have been a frequent visitor - in my childhood a more or less daily visitor - to municipal libraries, school libraries, research libraries, departmental libraries and now digital libraries ever since I learned to read. I use them less now, but that's because they don't give me the answers I'm looking for.
Secondly, it's a bit too smug an argument: Of course the users the library does have think it's worth the trouble - otherwise they would have stopped using the library. But as a software developer I know this line of reasoning all too well. It's a power-user argument. You design for those who show the greatest interest in your work. That's always the people who are happy with your work - that is, the people who have gotten used to how things are done. It's almost a law of nature that no challenges come from them. On the other hand, you can be sure that the more you orient yourself towards these users, the more you push everyone else away.
On the other hand, you have to hand it to the Danish Library Agency: they are picking up the torch. For a while now it has been possible to get a dump of the entire national bibliography - that is, the master index of the libraries' materials - for free experimentation. It's a little unclear what the setup is. You're not simply allowed to run a competing library search service. DBC is a commercially run company - is the idea that they jump in afterwards and collect the winnings from the "non-user nerds'" unpaid expertise? Time will tell. I have asked for my copy, in any case.
We had fun with DBPedia the other night - but DBPedia is still a little confusing and rough around the edges (no snarkiness here - I think the project members think so too). I got an illustration of this when I had a look at the property set within DBPedia, the results of which are here. It was just a quick, naive survey: What are the properties I can query, and how distinctive/useful are they? Turns out most of DBPedia's properties are project-local and, as far as I can tell, so far have very little structure beyond being properties. Places and people have received a little modeling love, so names, geolocation, birth and death make a little more sense than the rest of the data.
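The kind of query such a survey boils down to can be sketched like this - a reconstruction, not the exact query behind the results, and the endpoint URL is simply DBpedia's public one:

```python
DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"  # public Virtuoso endpoint

def property_survey_query(limit=50):
    """A SPARQL query of the sort the property survey boils down
    to: count how often each predicate occurs, most frequent
    first. (Aggregates like COUNT/GROUP BY require an endpoint
    with SPARQL 1.1-style extensions, which DBpedia's has.)"""
    return (
        "SELECT ?property (COUNT(*) AS ?uses)\n"
        "WHERE { ?subject ?property ?object . }\n"
        "GROUP BY ?property\n"
        "ORDER BY DESC(?uses)\n"
        "LIMIT %d" % limit
    )

print(property_survey_query())
```

Paste the printed query into the endpoint's query form (or POST it with any HTTP client) to get the predicate frequency table back.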
I think this should temper the "the semantic web is here" optimism just a little bit. It is indeed nice to be able to filter by infobox properties and to project down to specific properties - but it is hardly the arrival of another world just yet.
There's a lot of fun to be had coming up with discovery tools though - and for that reason alone the DBPedia project is great.
It's all about the "data => tools => data => tools" virtuous circle.
Morten has the best write-up of the DBPedia hack night. Cozy, confusing and enlightening. It became clear to us that DBPedia in its current form is a bit frayed around the edges, but that the potential is there.
It's also likely that DBPedia will prompt a lot of data cleaning and consistency checking, and that's no bad thing either.
We got some ideas for quick tricks one ought to build with DBPedia to make the dataset a bit easier to explore, and with a bit of luck some of us will actually get around to acting on them...
As you could read the other day here on this channel, a "semantic" edition of Wikipedia has appeared: DBPedia, a gigantic collection of RDF assertions based on Wikipedia's quite comprehensive data. I know far too little about RDF, SPARQL and the semantic web in practice, and DBPedia is an excellent occasion to do something about that. Morten, who knows a lot about RDF, has fortunately promised to be our guide at a
DBPedia hack night.
at ITU at 8 pm
on April 24 (that's a Tuesday)
As far as I remember, the room is the Marie Curie meeting room on the 5th floor.
ACCESS: It may well get a bit messy, since I'll have to let you in through a side door. Call 22 90 18 86 if that fails. I'll try to put up signs.
Program: Bring your laptop.
Morten gives a DBPedia-based intro to RDF and SPARQL
We investigate hands-on what you can get out of DBPedia
We discuss good interfaces to, and uses of, the data that's there
We keep going until we've had enough
Imity is hosting at ITU. I'll provide coffee, plus a Linux box with sensible tools and a copy of the dataset, and enough network capacity for the roughly 10 people we have room for.
If you feel like joining, or have a suggestion for a better date, just sound off in the comments on this post.
I have been looking a little at CSound, because I wanted to do some musi-mathematical investigations, and text formats always make for nice accessibility. A text format, however, is no guarantee of readability. CSound looks like what you get if you try to construct a programming idiom without any knowledge of other programming languages. I know that sounds a bit harsh, and I do think there are likeable features, but there are so many strange things in the language that are just unsoftwarelike. Have a look at the sample in the Wikipedia entry for CSound, as an example. Let's begin to enumerate the strangeness: