Thursday 26 October 2017

Progressive Forecast Model Disadvantages


What follows is an alphabetical list of some commonly used words, phrases and abbreviations that you might see in the newsgroup (or via other weather-related sites on the WWW), but which may not be easily understood. The list was prepared in collaboration with Paul Bartlett, David Reynolds and many others: I would welcome suggestions for inclusion, but I will also scan contributions to the newsgroup, and if I see a word, phrase or abbreviation that is causing problems I will include that as well. Where a longer explanation is required, a Q&A will be worked into the FAQ and referred to below. For the professional and academic community these definitions will not be rigorous enough, but I ask for some understanding in this regard, since the list is meant to convey the idea of a concept, process etc., so that casual readers of the newsgroup can keep up with discussions without having to delve too deeply into a meteorological textbook. If, however, I have made a fundamental error, then by all means let me have a corrected entry for consideration.

Glossary terms A-F

Absolute drought: A period of at least 15 consecutive days, to none of which is credited 0.01 inches (0.2 mm) or more of precipitation. (See also Drought, Partial drought, Dry spell.)

Absolute vorticity: The absolute (or total) vorticity of air particles at a given point consists of two elements: (i) on the rotating earth, the air takes on the local vorticity due to the earth's solid-body rotation about its pole-to-pole axis, which is latitude-dependent and known as the Coriolis parameter. This increases to a maximum over the poles and decreases to zero at the equator. The Coriolis sense of rotation is always positive (or zero). (ii) The other element is known as the relative vorticity, the tendency of air particles to spin owing to their motion relative to the earth, driven by atmospheric forces. Relative vorticity can be either positive (cyclonic) or negative (anticyclonic). (See also: Vorticity, Relative vorticity.)

AC (abbr): Altocumulus (AC for METAR/SIGWX charts etc., Ac otherwise), a medium-level, layer cloud type, but associated with varying degrees of instability - the extreme case being ACCAST.

ACCAST (pronounced roughly "Alto-Q-Cast"): Altocumulus castellanus (or castellatus); the correct abbreviation for the daily register is Ac cas. These are medium-level clouds (between roughly 8000 and 18000 ft / 2.5 to 5.5 km) which show, at least in their upper parts, a marked cumuliform, turreted appearance - the convective towers are often taller than the width of the individual base; the bases are joined and often appear to be arranged in lines or streets. These clouds are a good indicator of medium-level instability and high moisture content, and are often the precursor of widespread thundery activity within the following 24 to 48 hours. Observers used to SYNOP coding may also refer to these (together with Altocumulus floccus / Ac flo) as CM8 clouds.

Adiabatic process: A process in which temperature changes occur in a system without heat being supplied to, or lost from, that system. In meteorology it is used in connection with changes affecting parcels of air moving vertically in the atmosphere. If heat exchange is involved, the process is non-adiabatic (or diabatic).

Adiabatic lapse rate: The change of temperature of a parcel of air ascending (cooling) or descending (warming) adiabatically. For dry (i.e. unsaturated) ascent or descent the rate is 10 °C per km; for moist or saturated processes it varies, but a useful average is 5 or 6 °C per km.
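As a rough worked illustration of those rates, here is a minimal sketch (it assumes the approximate figures quoted above, 10 °C per km dry and about 6 °C per km saturated; an estimate, not exact thermodynamics):

    # Rough parcel-temperature estimate using the approximate lapse rates quoted above.
    DALR = 10.0  # °C per km, dry (unsaturated) adiabatic rate
    SALR = 6.0   # °C per km, a useful average for saturated ascent (it varies in reality)

    def parcel_temperature(start_temp_c: float, ascent_km: float, saturated: bool = False) -> float:
        """Temperature of a parcel after adiabatic ascent through ascent_km kilometres."""
        rate = SALR if saturated else DALR
        return start_temp_c - rate * ascent_km

    # A dry parcel starting at 15 °C and lifted 2 km cools to about -5 °C;
    # if saturated throughout, it would arrive at roughly +3 °C.
    print(parcel_temperature(15.0, 2.0))                  # -5.0
    print(parcel_temperature(15.0, 2.0, saturated=True))  # 3.0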
Advection: The transfer, by horizontal air movement, of heat, moisture (or humidity), momentum etc. The atmosphere at all levels is usually in some form of motion at most times, so it is necessary to identify areas of significant advection. For the lower and middle troposphere, thickness products (e.g. 500-1000 hPa) are often used for this purpose.

Aerosols: Solid particles suspended in the air. These include dust, salt particles, combustion products etc. They are very important for the formation of water droplets or ice particles in the atmosphere, acting as nuclei for condensation or sublimation.

Ageostrophic effects

Air mass: An extensive volume of air with broadly uniform physical properties, in terms of humidity and temperature structure, at similar heights.

Albedo: The proportion of sunlight (after passage through the atmosphere) that is reflected by a surface (e.g. sea, cloud, ice etc.), expressed as a fraction or percentage of the incident light falling on that surface. Clouds have a widely varying albedo, depending on thickness and composition. Old snow is about 55% (or 0.55), fresh snow around 80% (or 0.8). Water surfaces vary from very low (around 5% or so) at high sun elevations to at least 70% (0.7) at low sun angles: very smooth water surfaces with a low sun angle can produce the phenomenon known as sun glint, sometimes seen in visible satellite imagery, where the albedo value is very high.

amsl (abbr): Above mean sea level.

Anabatic wind: A local wind which blows up a hill or mountain side after strong heating of the slope by the sun. Such upslope winds can sometimes draw fog or stratus lying in the valley bottom (formed after a cold night) up to higher ground that was previously clear of these phenomena. Airfields that were previously clear can therefore suddenly fog over again shortly after sunrise as a result of this effect.

Ana front: When the warm air is ascending relative to the cold air at a frontal surface, the front is described as an ana front. Such fronts are usually active, in that thick, precipitation-bearing cloud (possibly with embedded instability) is usually located in the warm air associated with both warm and cold fronts.

Anticyclone: A pressure feature in which a maximum of pressure is surrounded by relatively lower values. On a synoptic chart a system of closed isobars is found enclosing the central high. The circulation (of the wind) is clockwise in the northern hemisphere (anticlockwise in the southern hemisphere). The structure of such features in the vertical is that high-level convergence, coupled with gentle divergent outflow at the surface, leads to subsidence (or sinking) of air within the anticyclone, which in turn leads to a decrease in humidity and an increase in the stability of the air, often with an inversion near the surface. Two types are defined, cold and warm: see Cold anticyclone, Warm anticyclone.
Anticyclonic trough disruption: When the northern (southern in the southern hemisphere) portion of an upper trough moves ahead and warms, so that a quasi-stationary, decaying low is left in the base of the trough, the process is described as anticyclonic trough disruption, because the net result is a marked build-up of pressure / new high-cell formation behind the retreating trough. (See also Cyclonic trough disruption.)

Arctic Oscillation: The broad-scale, northern-hemisphere oscillation of the long-wave type, of which the North Atlantic Oscillation (NAO) is our regional component. (See here.)

AS (abbr): Altostratus (AS for METAR/SIGWX charts etc., As otherwise), a medium-level, layer cloud type formed by broad-scale upward motion in the troposphere, varying from the thin, non-precipitating kind through which the sun or moon can be seen, to thick layer(s) associated with frontal development, producing prolonged, significant precipitation and moderate in-flight turbulence and icing.

Asynoptic observations: Over the many years in which operational meteorology has developed, certain hours have been designated synoptic hours and observation times standardised around these points, the MAIN synoptic hours currently being 00, 06, 12 and 18 UTC (formerly GMT), with intermediate hours at 03, 09, 15 and 21 UTC. Increasingly, however, observing systems (e.g. satellites, radar networks, drifting buoys etc.) provide data at times other than these fixed hours - these are referred to as non-synoptic or asynoptic observations. NWP models can assimilate these observations during the initialisation process.

(abbr) Stands for aviation.

Backing: A term applied when a forecast wind direction changes in an anticlockwise sense, e.g. from southerly, backing north-easterly, via east. The opposite term is veering; thus "south-westerly 4, veering northerly 5 or 6" would mean a wind initially force 4 from the south-west, becoming a northerly force 5 or 6 by the end of the forecast period, changing via west.

Baroclinic: The temperature along a constant-pressure surface (e.g. 500 mbar) varies, i.e. there is a thickness gradient. The degree of baroclinicity is given by the product of the thermal wind of the layer (q.v.) and the Coriolis parameter. For practical purposes the strength of the thermal wind alone is a good guide.

Baroclinic leaf: An elongated cloud pattern formed in the jet-stream zone associated with marked baroclinicity (i.e. strong thermal contrast). The boundary (in satellite imagery) on the polar air-mass side of the development is well defined and has the appearance of an extended "S" shape. The downstream air-mass edge is less pronounced. This feature represents the initial (or frontogenetic) stage of a system development, certainly in the mid-troposphere and often (but not always) at the surface. Not all baroclinic leaves lead to marked cyclogenesis, although they will be the first stage of it. (See also Dry intrusion.)

Baroclinic zone: A region where there is a marked contrast between cold and warm air masses. It can be picked out on a thickness chart by a packing together of the thickness contours (q.v.); on an MSL chart it is usually associated with classical fronts, and is therefore an area for potential cyclonic development.

Barotropic: A (theoretical) state in which surfaces of constant pressure and constant temperature coincide at all levels. The atmosphere cannot support development, and the thickness gradient (q.v.) is zero. When thickness contours are widely spaced (the realistic state), the atmosphere is said to be quasi-barotropic.

BC (abbr): Patches (as in BCFG in METAR coding, i.e. fog patches).

Beaufort wind scale: This scale was originally devised by Francis Beaufort (later Admiral Sir Francis, Hydrographer to the Royal Navy), who lived from 1774 to 1857. He had a very active naval career and also took an interest, from an early stage, in meteorological observations over the water. In 1805 and 1806 he drew up a scale for his own use based on the amount of canvas that a sailing ship could carry in the given conditions. The scale underwent various modifications and was only introduced into general RN use in the middle part of the 19th century, but thereafter it quickly gained worldwide acceptance. Various versions were developed, however, and in 1906 the British Meteorological Office attempted to coordinate usage, at the same time supplying the first definitive wind-speed equivalents for each force level; since 1920 the scale has been used to express forecast wind conditions in the shipping forecasts for waters of the NE Atlantic / NW Europe continental shelf. For a description of the current scale in use, see zetnet.co.uk/sigs/weather/MetCodes/codes.htm

BECMG (abbr): "Becoming", used in aerodrome forecasts (TAFs) and elsewhere; a permanent change of conditions.

BKN (abbr): "Broken", 5 to 7 oktas (eighths) of cloud cover.

BL (abbr): "Blowing", used in conjunction with snow, sand etc.

Blizzard (for the UK Met Office only - other services have different criteria, and the definition has changed over time; it has not always been so strictly defined): "The simultaneous occurrence of moderate or heavy snowfall with winds of at least force 7, causing drifting snow and reduction of visibility to 200 m or less." (Moderate snow is taken to occur when visibility is appreciably reduced and the snow cover increases in depth at up to about 4 cm per hour. Heavy snow should reduce visibility to a low value (in the low hundreds of metres), with the snow cover increasing at a rate of more than 4 cm per hour.)

Blocked pattern: Large-scale obstruction of the normal west-to-east progression of surface cyclones in mid-latitudes. The upper flow changes from predominantly zonal (q.v.) to meridional (q.v.). In a meridional block the upper flow splits upwind of the block and flows around quasi-stationary vortices - one anticyclonic and the other cyclonic. In the omega-block case the strongest flow is diverted into lower latitudes, leaving a slow-moving anticyclonic vortex on the poleward flank of the displaced zonal flow.

Blocking anticyclone: A cellular pattern of high pressure in mid-latitudes which diverts, or prevents, the normal west-to-east movement of depressions (areas of low pressure).

Bomb: A name applied to mid-latitude depressions that deepen violently. The term was coined by Sanders and Gyakum (US Monthly Weather Review) in a 1980 paper dealing with such events, and requires a fall of pressure at the depression centre of 24 hPa (or mbar) or more in 24 hours at latitude 60 degrees for the name to apply. At latitude 45°N the required value is 19 hPa, and at 55°N, 23 hPa. (See Explosive cyclogenesis.)
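The latitude values quoted above follow the usual Sanders and Gyakum scaling of the 24 hPa in 24 h benchmark by sin(latitude)/sin(60°). A minimal sketch of that calculation (the function name is illustrative, not from any library):

    import math

    def bomb_threshold_hpa(latitude_deg: float) -> float:
        """Central pressure fall (hPa in 24 h) needed at this latitude for a depression
        to qualify as a 'bomb', scaling the 24 hPa benchmark defined at 60 degrees."""
        return 24.0 * math.sin(math.radians(latitude_deg)) / math.sin(math.radians(60.0))

    # Roughly matches the figures quoted above: ~24 hPa at 60N, ~23 hPa at 55N, ~19-20 hPa at 45N.
    for lat in (60, 55, 45):
        print(lat, round(bomb_threshold_hpa(lat), 1))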
Boundary layer: In operational synoptic meteorology this is usually the layer at the bottom of the atmosphere in which surface friction is important. It can vary in depth from as little as 100 m or less on a still, cold night to 1 km or more in a windy, well-mixed situation. Also variously known as the mixing layer or friction layer, it is a function of wind speed, the vertical temperature profile (i.e. stability) and surface roughness. (NB: micro-meteorologists regard the boundary layer as the first few cm of the lower atmosphere, and this can lead to confusion when reading some texts.)

BR: Mist (abbr. from the French) - visibility greater than 1000 m and not more than 5000 m, where the obscuration is caused by water droplets in suspension. Used in aviation reports, forecasts etc.

Bright band effect: As snow falls through the melting (or freezing) level, the melting snowflakes look to the radar like giant raindrops, which enhances the radar reflectivity and implies heavier precipitation than is actually occurring. Corrections can be applied, provided the calibration system has knowledge of the vertical temperature profile. The effect is usually confined to a layer about 1000 ft (300 m) thick.

Buys Ballot's law: As originally formulated, if you stand with your back to the wind (in the northern hemisphere), then the lower pressure lies to your left. In the southern hemisphere the lower pressure is to the right. (Personally I feel we should face the wind to see what is coming, so I would reverse these, i.e. face the wind, low pressure on the right, etc.) Prof. C. H. D. Buys Ballot (1817-1890) was a famous Dutch meteorologist who founded the Royal Netherlands Meteorological Institute (KNMI) in 1854 and played an important part in establishing the first international organisation for meteorology, following the opening session of the International Meteorological Congress in Vienna in 1873. However, although he is now best known for the law as stated above, it is possible that the credit for a rigorous treatment of the physics should go to Alexander Buchan, long-serving (1860-1907) Secretary of the Scottish Meteorological Society.

CA (abbr): Cloud-to-air lightning flash. Used when describing lightning which branches from a cumulonimbus cloud and terminates in clear air. This is an uncommon type of lightning. (See also CC, CG and GC.)

Calibration (radar): In addition to screening for permanent echoes and other adjustments, the radar returns of rainfall are calibrated in real time against a network of telemetering rain gauges. This of course means that where the returns are over areas without rain gauges (e.g. the sea), over-reading can occur, and for this reason caution is needed in blindly following radar-derived rainfall-rate calculations.
CAPE (Convective Available Potential Energy): A measure of the energy released once convection is initiated, often from the surface (for high values), though mid-level convective initiation is also very important. On a thermodynamic diagram (e.g. a tephigram) it is assessed by considering the area enclosed between the environment curve (i.e. the actual temperatures found by a radiosonde) and the parcel-path curve, up to where the latter cuts the environment curve at height. It has been used extensively in severe convective storm studies, although it should be noted that just because high values of CAPE are observed, other factors must still be right for a severe storm. (See also HERE.)

CAT (Clear Air Turbulence): Bumpy conditions in the upper atmosphere when no cloud is present to betray the possibility of such. Caused by sharp vertical and horizontal wind shear, often (but not exclusively) in association with upper-level jet streams (see "What are jet streams?"). It can also occur with, or be enhanced by, mountain-wave activity.

Conservation of absolute vorticity: The principle, outlined by Carl-Gustav Rossby in the 1930s, which accounts for the tendency of the upper atmospheric flow to take up a wave-like pattern. The theory can be used to predict the wavelength and the speed of translation of the long waves found in the atmosphere, which in turn determine the broad weather type at any particular location.

CAVOK ("Ceiling And Visibility OK"): No CB; no cloud with base below 1500 m / 5000 ft or below the highest minimum sector altitude, whichever is greater; visibility 10 km or more; and no weather of significance, i.e. none of DZ, RA, SN, SG, PL, IC, GR, GS, FG, BR, SA, DU, HZ, FU, VA, PO, SQ, FC, DS, SS and variants (see this glossary for decodes).

CB (strictly Cb): Abbreviation for cumulonimbus, the cloud type associated with a thunderstorm, in which the upper part of the cloud shows glaciation (supercooled water droplets converting to ice crystals). Broadly there are two types: Cumulonimbus calvus (Cb cal) and Cumulonimbus capillatus (Cb cap). The former is used when glaciation has only just begun, and is often the start of the most active phase of development - the transition from Cumulus congestus (towering Cu); the latter type, with the traditional anvil shape, occurs when the main activity is about to (but does not necessarily) begin to decline. (See also TCU.)

CC (abbr): Cloud-to-cloud lightning flash. Used when describing lightning which originates in cloud and ends in cloud. It thus covers lightning which passes from one cumulonimbus cloud to another, as well as lightning contained within a single cumulonimbus cloud. This includes the diffusely illuminated flashes (sheets) as well as those whose channel is directly visible as it loops out of the cloud before returning back into it. (See also CA, CG and GC.)

CC (abbr): Cirrocumulus (CC in aviation reports etc., Cc otherwise), a high-level, layer cloud type with elements of instability, but rarely of significance for aviation or general meteorology.

CCL (abbr): Convective condensation level. Provided the dew point of a parcel of air is high enough, the cooling that results from convective ascent will lead to condensation at some altitude, as the air temperature falls to the dew point. The exact height at which this is achieved depends on the difference between the initial air temperature and the dew point of the parcel, and also on the amount of mixing with the environment through which the parcel rises. The level at which condensation is achieved (and hence the theoretical cloud base for cumulus clouds) is known as the Convective Condensation Level (CCL). To a rough approximation the CCL is given by (T - D) x 400, where T = air temperature at the surface and D = dew-point temperature (at the surface): the answer will be in feet. Don't follow this slavishly - it is good only to the nearest few hundred feet.
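A minimal sketch of that rule of thumb (illustrative only; as the entry says, it is good only to the nearest few hundred feet):

    def convective_condensation_level_ft(surface_temp_c: float, surface_dewpoint_c: float) -> float:
        """Rough CCL estimate in feet using the (T - D) x 400 rule of thumb quoted above."""
        return (surface_temp_c - surface_dewpoint_c) * 400.0

    # Example: surface temperature 22 °C and dew point 12 °C give a cloud base of roughly 4000 ft.
    print(convective_condensation_level_ft(22.0, 12.0))  # 4000.0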
CET (abbr): Central England Temperature - a series used to track changes over time in average temperatures across a large area of central England. See "What is the Central England Temperature series?". CLICK HERE FOR LATEST DATA FROM THE HADLEY CENTRE.

CF: Abbreviation for cold front.

CG (abbr): Cloud-to-ground lightning flash. Used when describing lightning which branches from the cumulonimbus cloud to the ground. It is sometimes called forked lightning, from its appearance. (See also CA, CC and GC.) The lightning discharge process is complex: there are two discharges per stroke, and there may be several strokes in one flash (which produces the often-observed flicker). The initial, very faintly luminous discharge forms a conductive (ionised) and usually highly branched path through the air. The second, intensely luminous discharge travels in the opposite direction, draining the charge from ground/cloud/air to cloud/ground. So, for example, a CG refers to a stroke in which the initial discharge is from cloud to ground, even though the intensely luminous discharge that we see runs from ground to cloud.

Channel Rat (Dutch: Kanaalrat): An intense, small-scale developing depression which scuttles along the English Channel. It comes (and goes) in a matter of a few hours to half a day. Because of its speed of movement, coupled with its often rapid development (increase in wind speed), it can cause severe problems for areas adjacent to the Channel and the southern North Sea. The term has probably been in use (in the Netherlands) since at least the early 1980s. Examples: 12 May 1983 and 28 May 2000.

Climate: Average weather over fairly long intervals of time, usually greater than a year and often 30 years. Care should be taken to establish (or make sure of) the period to which particular climatological averages apply.

Cloud head (strictly baroclinic cloud head): In the early stages of explosive cyclogenesis (q.v.), a very marked area of dense, layered cloud - convex away from the developing depression - is observed in IR, VIS and WV imagery, detached from the cloud area associated with the development. This feature is the result of air ascending rapidly as the intense development gets under way. Studies have shown that all mid-latitude cyclogenetic events over oceanic areas leading to winds of hurricane force have been preceded by this feature. However, care is needed to establish correct identification, and true detection is only possible with animated imagery. (See also Baroclinic leaf, Dry intrusion.)

Cluster: In ensemble forecasting (q.v.), individual members often show a marked grouping around a few outcomes.
Each grouping is referred to as a cluster. The more members that make up a particular cluster, the higher the confidence in that particular solution. (See also Ensemble, Ensemble mean.)

Cold advection: The replacement (usually quasi-horizontally) of a warm air mass by a colder one. The process may be gradual or abrupt, the latter often occurring at well-marked cold-frontal boundaries.

Cold anticyclone: An area of high pressure with a cold core (relative to the surrounding air), in which the cold, dense air dominating the lower part of the troposphere contributes to the high surface pressure. It has a shallow circulation (i.e. the high-pressure characteristics are confined to the lower layers), with a low, warm tropopause etc. It often forms in the polar air behind a depression, moving with the synoptic-scale features with which it is associated. If, however, a major change of type (from mobile to blocked) is under way, the high may transform into a warm type. The persistent cooling of continental areas in winter at high latitudes (e.g. over Scandinavia, Russia, Siberia) produces semi-permanent cold anticyclones, with mean-sea-level pressure often above 1050 hPa. (See Anticyclone, Warm anticyclone.)

Cold occlusion: When the air (at the surface) behind an occluded front is colder than the air it is displacing (the usual case in the NE Atlantic / maritime NW Europe), the front is known as a cold occlusion. The occlusion can be shown on synoptic charts as a linear extension of the cold front. (See also Warm occlusion.)

Cold pool: An area where the atmosphere (in depth) is colder than its surroundings. The temperature is not measured using surface-based sensors (such as screen temperatures), but often (though not necessarily) using thickness values (q.v.), with the 500-1000 hPa measure (roughly sampling the lower half of the troposphere) the one most commonly used for this purpose. Closed centres of low thickness values (relative to adjacent regions) define a cold pool. The opposite term (for a closed centre of relatively high values) is "warm dome", although this term is not heard much nowadays.

Cold-front wave: A secondary low-pressure system which forms on a trailing cold front where the thermal contrast across the front (in the troposphere) is large and the upper pattern is conducive to falling pressure at the surface. The wave can move very quickly (in the direction of the general upper driving flow), and will lead to a hesitation in the clearance of the main cold front at the surface, or to its return to areas which had previously seen a clearance. Not all such waves develop closed-low characteristics: some simply run quickly along the length of the trailing cold front with little development other than an enhancement of the precipitation. Because of the small scale of the initial development, NWP models do not always place and forecast these correctly. (See also Warm-front wave.)

Cold undercut: A term often used in situations where the advection of relatively milder air, as traced by 850 hPa variables (e.g. actual 850 hPa temperature, ThetaE, ThetaW, partial thickness etc.), does not fully describe events in the lowest 50 to 100 hPa (i.e. within the planetary boundary layer).
In winter, warm-air advection may be indicated at levels above 850 hPa, but the surface wind is well backed and blowing from a colder direction - the cold (relatively denser) air undercuts the milder airstream above: a decoupling takes place, depressing temperatures and creating a marked inversion and potentially a cloudy mixed layer. The cold air may be due either to air-mass advection or to a local source, e.g. cold air off the North Sea in late winter or spring.

Colour states: METAR reports from military airfields operated by the RAF, some USAF and others may have a colour code appended (usually only when ATC is open) describing the fitness of the airfield: these run from BLU (best), through WHT, GRN, YLO (1 and 2) and AMB, to RED. The colour is based on the lowest cloud base (usually 3 oktas or more of cover, but some use 5 oktas) and the horizontal meteorological visibility. BLACK is also used, for an airfield closed for non-weather reasons.

Condensation: The transformation of water vapour into liquid water; in the process, latent heat (of vaporisation) is released. See also Evaporation.

Condensation nuclei: Microscopic particles in the atmosphere which act as a focus, or stimulant, for cloud-droplet growth.

Conduction: The transfer of heat through a substance from point to point by the motion (or agitation) of adjacent molecules.

Confluence: When streamlines (q.v.) approach one another, the pattern is confluent. Note, however, that streamlines define only the wind direction, not the wind speed; a confluent pattern is not necessarily a convergent pattern. The opposite of confluent is diffluent (often spelt differently in North American texts), which denotes the spreading apart of streamlines. Again, such diffluent patterns are not necessarily divergent. (See also Convergence, Divergence.)

Conservative property: Meteorologists are always keen to characterise an air mass using a value that can be computed from variables measured within that air mass (at various levels), but which remains constant, or nearly so, during vertical (adiabatic, q.v.) motion. Several such properties are defined, for example the potential temperature (theta), the equivalent potential temperature (theta-e) and the wet-bulb potential temperature (theta-w). This last measure is widely used in operational meteorology in NW Europe: at the 850 hPa level it is used as a tracer for air masses, and it is much used for defining frontal boundaries and the axes of plumes of warm air. (See also the entry under Wet Bulb Potential Temperature.)

Contours: Lines on an upper-air (constant-pressure) chart (actual or forecast) joining places of equal height (of the 700 mbar, 500 mbar etc. surface), or of equal thickness.

Condensation trail (contrail): Also abbreviated (from old coding practice) to COTRA. See "Why do some high-flying aircraft leave white trails?" and MINTRA (this glossary).

Control run: The principle of ensemble forecasting is to perturb an analysis slightly, by small amounts, and to see what the outcome of each change is.
Rather than using the centre's full operational model analysis and forecast output (which needs a lot of computer time and maximum data capture), an early analysis and forecast run is employed, using the same physics as the operational run (OP/OPER) but performed at lower resolution (typically half-scale). This is known as the control run. Given the increase in computing power in recent years, the control can have a higher specification than some operational models of less than a decade ago.

Convection: The transfer of heat by the actual movement of the heated substance, such as air or water. In meteorology, convection also implies vertical transport driven by density imbalance - a transport of mass, water vapour and aerosols as well as of heat.

Convective condensation level (see CCL).

Convective precipitation: For precipitation production (rain, snow etc.), other conditions being satisfied, there must be upward motion through the cloud producing the rain, snow, hail or whatever. In the case of convective precipitation, the upward motion is provided by the release of convection in an unstable environment. (See "Stable and unstable air masses".) Computer models in operational use handle instability features via parametrization schemes (q.v.), which model idealised convective towers in each model grid square, taking account of the entrainment of dry air, the moist convective depth, the temperature structure etc. Algorithms then assign model rainfall as either dynamic or convective: the type giving the greater rate of precipitation is (usually) the one that appears on the output chart. (See also Dynamic and Orographic precipitation.)

Convergence: When air flows in such a way that the area occupied by a particular group of air particles decreases (contracts), the pattern is said to be convergent. Convergence in the atmosphere is associated with vertical motion, and hence with the development (or weakening) of weather systems. For example, convergent flow near the surface is coupled to upward motion, and may be the primary cause of it, leading to widespread cloud formation. (See also Divergence, Confluence.)

Convergence zone: Usually refers to a low-level feature. It is a narrow, elongated region in which two distinct airstreams converge in such a way that air within the zone is forced to rise, leading to enhanced cloud and precipitation formation, particularly if the air mass is potentially unstable. The zone may propagate downstream with time, and its activity, location and extent are governed by the synoptic patterns that trigger the zone in the first place. Marked on a synoptic chart by a solid line along the axis of the zone, with angled branches indicating the convergence.

Conveyor belts: In synoptic systems (e.g. a developing depression), airflow is not uniformly horizontal, and the system velocity (i.e. the speed of translation of the low) must also be allowed for. High-velocity air aloft overtakes the synoptic feature, whilst lower down the system often moves faster in a given direction than the low-level airflow. To cope with all this, the concept of conveyor belts was adapted for use in synoptic and mesoscale meteorology as a means of explaining the movement of heat, moisture and momentum around such systems.
For example, in a developing, mobile depression a warm conveyor belt (WCB) is assumed to rise from low levels in the warm sector just ahead of the surface cold front, to middle and upper altitudes over and well forward of the surface warm front. A compensating cold conveyor belt (CCB) descends from medium/upper levels well ahead of the surface warm front, underneath the WCB, then tucks around the backside of the low, merging with the boundary-layer flow.

Coriolis effect: As a consequence of the earth's rotation, air moving across its surface appears to be deflected relative to an observer standing on the surface. The deflection is to the right of the direction of movement in the northern hemisphere, and to the left in the southern hemisphere. (Also known as the Coriolis acceleration, or deflection.)

Coriolis parameter: An important quantity in theoretical meteorology because it plays a major part in describing (mathematically) how air moves on our spinning planet under the influence of a pressure gradient. It is usually denoted as f, and defined as twice the product of the angular velocity of the earth and the sine of a particular latitude. The angular velocity of our spinning earth is (for practical purposes) constant, therefore the important variable is latitude: from the definition, f varies from a maximum at the poles (sine 90° = 1) to zero at the equator (sine 0° = 0). See also Absolute vorticity.
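A minimal sketch of that definition, f = 2 x Omega x sin(latitude) (the value of Omega, the earth's angular velocity, is a standard figure of about 7.292e-5 rad/s and is not quoted in the entry above):

    import math

    OMEGA = 7.292e-5  # earth's angular velocity in rad/s (standard value, assumed here)

    def coriolis_parameter(latitude_deg: float) -> float:
        """f = 2 * Omega * sin(latitude): maximum at the poles, zero at the equator."""
        return 2.0 * OMEGA * math.sin(math.radians(latitude_deg))

    for lat in (90, 55, 0):
        print(lat, coriolis_parameter(lat))  # ~1.46e-4 at the pole, ~1.19e-4 at 55N, 0.0 at the equator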
Cross-contour flow

CS (abbr): Cirrostratus (CS in aviation reports etc., Cs otherwise), a high-level, layer cloud type due to wide-scale ascent in the upper troposphere: of no significance for aviation, but a precursor of frontal activity to come.

CU (abbr): Cumulus (CU in METAR/SIGWX reports etc., Cu otherwise), a convective cloud type with a base in the lower part of the troposphere, varying from weak to vigorous vertical penetration, possibly into medium or upper levels. (See also TCU.)

Cut-off time: NWP models that are used in operational meteorology must have a nominal time at which the gates are closed to new data and the forecast computation cycle is started. For models used for primary forecast guidance at short lead times, only a couple of hours at most is allowed after the nominal data time. So, for example, the cut-off for 12 UTC data might be around 1345 UTC. For global models, i.e. those used for international aviation, a slightly longer time is allowed, but usually no more than 3.5 hours after data time. However, some centres (e.g. ECMWF) with less demand for immediate products allow 9 hours or more of data to be assimilated.

Cyclogenesis: The formation of a major low-pressure system along a baroclinic zone (q.v.) (or frontal boundary), with primary forcing due to imbalances along the upper jet.

Cyclones (depressions): Weather systems characterised by low pressure and rising air flows. The wind circulation is anticlockwise in the northern hemisphere and clockwise in the southern hemisphere.

Cyclonic trough disruption: The southern (northern in the southern hemisphere) portion of a trough advances, perhaps developing a cut-off circulation and slowly warming out, whilst the opposite (residual) portion of the trough becomes quasi-stationary, maintaining a cyclonic pattern at the surface. (See also: Anticyclonic trough disruption.)

DALR (abbr): Dry Adiabatic Lapse Rate. The rate of cooling (for ascending air) or warming (for descending air) when air parcels are displaced by whatever mechanism. Usually taken to be 10 °C per 1 km (or 3 °C per 1000 ft).

dam: Dekametres (i.e. tens of metres) - often used on upper-air charts: thus a 500 hPa height quoted as 540 dam is equivalent to 5400 metres. (NB: although DM may still be seen on some model output, this is regarded as a non-standard abbreviation; dm should definitely not be used, as it is the abbreviation for decimetres, i.e. tenths of a metre.)

Daughter cell: As the precipitation downdraught associated with a marked cumulonimbus event meets the ground, it spreads out in all directions. Where this cold outflow current meets the low-level inflow (relative to the cloud motion) head-on is a point of maximum convergence, leading to forced lifting of the air at that point; provided the air is unstable enough, and convection is not otherwise inhibited (e.g. by wide-scale descent), a new convective cloud event will be initiated - a daughter cell.

Decoupling: Even given the strongest pressure gradients, surface-based friction will slow the airflow in the lowest 800 m or so of the atmosphere, leading to the familiar cross-isobaric flow (from high to low pressure). With strong free-air gradients (Vgr) (roughly above 25 knots or 12 m/s), surface winds will bear some relationship to Vgr; however, below these (approximate) levels, come nightfall under clear skies, surface cooling will lead to stabilisation of the lowest layers, and the atmosphere finds it increasingly difficult to transfer momentum from the free-air levels to the near-surface. The surface wind may drop away completely as the surface-based inversion develops (often within the course of half an hour), allowing mist, fog or surface frost to form (other factors being in place): this process has come to be known as decoupling of the boundary-layer air from the flow inferred from the isobars. Once the flow is decoupled, the surface cools even more efficiently, thus reinforcing the nocturnal inversion. (See "Stable and unstable air masses" for discussion of stability etc., and "Why does the wind blow?" for matters concerning surface wind flow.)

(used in METAR reports) - fog dispersal operations are in progress (probably obsolete now, so included for historical purposes).

Deterministic forecast: A forecast that says rain will occur at such-and-such a place within a given time band, i.e. a yes/no forecast, is an example of deterministic forecasting. (See also Probability forecasting.)

Dew point (strictly dew-point temperature): The temperature (of an air sample that contains water vapour) to which that sample must be cooled (pressure and humidity content being held constant) to achieve saturation with respect to a water surface. It can be measured indirectly using a wet-and-dry hygrometer (an ordinary dry-bulb thermometer and another, adjacent thermometer with its bulb covered in damp muslin - hygrometric tables or a calculator then being used to calculate the dew point, relative humidity and vapour pressure), or by a dew-cell type of instrument that measures relative humidity, from which the dew point can be calculated, or it can be measured directly by a dew-point hygrometer. The screen/surface dew-point temperature is used in air-mass analysis, and also in the calculation of night-minimum and fog-point temperatures, as well as in the estimation of convective condensation levels, human-comfort indices, the probability of snow at the surface etc. Dew-point values above the surface (from radiosonde ascents) are used to define cloudy or potentially cloudy layers etc. in the upper air (see also Frost point).

Dew-point depression (DPD): The numerical difference between the temperature of a sample and its dew point. The greater the difference, the lower the relative humidity. Values (°C) of less than 3 would be considered to indicate high relative humidity; those of 7 or greater would indicate low relative humidity. (See "What is the dew point depression?")
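A trivial sketch of that relationship, using the thresholds quoted above (the category labels are purely illustrative):

    def dew_point_depression(temp_c: float, dew_point_c: float) -> float:
        """DPD = air temperature minus dew-point temperature (°C)."""
        return temp_c - dew_point_c

    def humidity_category(dpd_c: float) -> str:
        # Thresholds as quoted above: below 3 °C suggests high RH, 7 °C or more suggests low RH.
        if dpd_c < 3.0:
            return "high relative humidity"
        if dpd_c >= 7.0:
            return "low relative humidity"
        return "moderate relative humidity"

    print(humidity_category(dew_point_depression(14.0, 13.0)))  # high relative humidity
    print(humidity_category(dew_point_depression(25.0, 10.0)))  # low relative humidity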
Discontinuity: Where a steep gradient (i.e. a sharp change over a small horizontal distance) occurs in a meteorological variable (e.g. temperature, humidity, wind direction etc.), there is said to exist a discontinuity in that variable.

Diurnal cycle: Changes which take place over the course of a 24-hour period. The most obvious cycle is the rise and fall of surface temperature.

Divergence: When air flows in such a way that the area occupied by a particular group of air particles grows (spreads apart), the pattern is said to be divergent. Divergence in the atmosphere is also (along with convergence, q.v.) associated with vertical motion, and hence with the development (or weakening) of weather systems, depending upon the level at which the divergence is dominant in a particular atmospheric column. For example, divergent flow aloft is coupled to, and may be the primary cause of, upward motion, leading to widespread cloud formation, cyclogenesis etc. (See also Diffluent.)

DM: (obsolete abbreviation for dekametre - see the entry for dam).

Downward penetration of snow: Falling snow modifies the temperature structure of the atmospheric boundary layer as both melting and evaporation take place. (See this question in the FAQ.) Even if snow does not initially penetrate to the surface (after having fallen out of the parent cloud), then provided the wet-bulb freezing level (q.v.) is low enough, the intensity of the precipitation is more than just light, and the mean wind strength in the melting layer is not too strong, the snow level can descend considerably below initial conditions. The depth of this "downward penetration of snow", as it is called, increases as the intensity of the rain increases and/or the wind speed decreases. It will be immediately apparent that the prospect for error in snow forecasting due to these variables in marginal situations is large.

Used in METAR reports - low drifting (snow, sand etc.), not appreciably affecting the visibility, e.g. DRSN.

Partial and absolute droughts are terms that are no longer used in official summaries; they were introduced in 1887 by G. J. Symons in British Rainfall (with the term "dry spell" added in 1919), but ceased to be used circa 1960. Drought hydrology is a complex field of study, and as statistical and data-processing techniques have become more sophisticated, there was no longer an official requirement for the use of these rather crude definitions. It is, however, useful to know what the definitions were, and even today, for hobby use, defining periods of drought using these standards can be an interesting exercise. They are detailed elsewhere in this glossary. (See Absolute drought, Partial drought, Dry spell.)

Droughts - types of: Droughts as defined above are essentially meteorological; in other words, they are defined in terms of the amount of rain (or rather the lack of rain) that occurs. Hydrological drought episodes, by contrast, take account of the wider cycle of water use, with a focus on the imbalance between precipitation input (rain, snow etc.) and water availability via aquifer storage, reservoir levels, land-surface run-off, river flow etc.
Finally, agricultural droughts are usually defined in terms of the soil moisture deficit (SMD) across a growing season - the degree of irrigation (natural or artificial) needed to bring a particular soil type back to optimal production, balancing the moisture lost to evaporation and transpiration.

Dry intrusion (or dry slot): A narrow, virtually cloud-free region which separates a baroclinic leaf (q.v.) and the adjacent frontal cloud. This region is the result of upper-tropospheric / lower-stratospheric air descending abruptly into a rapidly developing and potentially damaging low-pressure system - hence the low humidity content and absence of cloud. Water-vapour imagery (see "What are the various types of satellite imagery available?") in particular is used to diagnose this feature, and the rate of darkening of the dry slot gives a clue to the rate of development of the whole storm complex.

Dry spell: A period of at least 15 consecutive days, to none of which is credited 0.04 inches (1.0 mm) of precipitation. (See also Drought, Absolute drought, Partial drought.)

Duststorm (used in METAR/TAF reports etc.): visibility generally below 1 km due to dust raised by strong winds over a large area.

Dust (widespread, in suspension) (used in METAR/TAF reports etc.): visibility is 5000 m or less.

Deutscher Wetterdienst (German Weather Service), based at Offenbach. Visit their web site at: dwd.de

Dynamic precipitation: For precipitation production, other conditions being satisfied (i.e. enough humidity, the required temperature structure, sufficient depth of cloud), there must be a supply of upward motion through the cloud producing the rain, snow or whatever. In the case of dynamic precipitation, the primary agent providing upward motion is broad-scale ascent due to, for example, short-wave troughs in the prevailing upper flow, jet-stream developmental areas, mass convergence or strong warm advection. Computer models in operational use deduce dynamic precipitation by testing for super-saturation of a layer, taking into account the total water content (all phases) in that layer: the excess found is precipitated out. The type (dynamic or convective) giving the greatest amount is (usually) that seen on output charts. (See also Convective and Orographic precipitation.)

DZ: Drizzle (as in METAR/TAF reports).

Eclipse (of a geostationary satellite): The earth's equator (and therefore a geostationary satellite's orbit) is inclined to the orbit of the earth around the sun. This inclination allows sunlight to power the satellite's on-board systems for most of the year. However, there is a period of about 3 weeks either side of the vernal and autumnal equinoxes when a satellite will be in the earth's shadow for about 70 minutes each day (around local midnight). Because most of these platforms do not carry sufficient battery power to tide them over this gap, no imagery is generated and thus a local-midnight image is missing.

ECMWF: European Centre for Medium-Range Weather Forecasts, located on the southern outskirts of Reading, Berkshire, UK. Visit their web site at ecmwf.int

Eden Winter Snow Index

Embedded (in cloud, as in CB embedded in LYR cloud).

Ensemble: A collection of NWP runs (typically in excess of 15, many having 50 or more) from the same start time (t=0) and using the same model physics, but with each run (or member) having a slightly perturbed (altered) set of initial conditions compared with the control run (q.v.).
The alterations are constrained within limits which are calculated in various ways - one example being that of performing a separate short-range model run and identifying the errors that would grow most over a 48-hour period. These errors are then applied in varying amounts to the initial conditions before performing the operational ensemble run. Another technique is to use (known) errors from a previous run and apply these in small amounts to the initial conditions of the new run. NB: these outputs are in addition to (and run some time after) the operational model output, i.e. the deterministic run, which is the set of charts most often seen on web sites: it should not be assumed that the operational run (OP/OPER, q.v.) is close to the ensemble mean (q.v.) or mode - significant deviations can and do occur at longer lead times. Also note that a particular centre's operational model is often run at a higher spatial resolution than that used for the ensemble generation - the control.

Ensemble mean: An average of the ensemble output from a particular computer run - this is usually more accurate than just following one of the individual forecasts that make up the average. Further, by comparing the individual members' spread about the mean, some estimate can be made of the reliability of the forecast: if there is strong agreement, and therefore small divergence from the mean solution, then high confidence can be assigned to the average solution. Wide divergence, or clustering of groups of individual members well away from the mean, will lead to considerable caution about using the output too slavishly, and to lower confidence in issued forecasts.
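A minimal sketch of that idea (the member values and the spread threshold below are made up purely for illustration):

    from statistics import mean, stdev

    # Hypothetical ensemble members: forecast MSL pressure (hPa) at one point from each perturbed run.
    members = [988.0, 990.5, 989.2, 991.0, 987.4, 990.1, 989.8, 988.6]

    ens_mean = mean(members)
    spread = stdev(members)  # small spread about the mean suggests higher confidence in the mean solution

    print(f"ensemble mean: {ens_mean:.1f} hPa, spread: {spread:.1f} hPa")
    print("high confidence" if spread < 2.0 else "lower confidence")  # the 2.0 hPa cut-off is illustrative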
Environmental lapse rate: The actual plot of temperature against height (or equivalent) on a thermodynamic diagram.

Evaporation: The transformation of liquid water to water vapour - in the process absorbing latent heat (of vaporisation).

EWP (abbr) (also EWR): England and Wales Precipitation (or Rainfall). A data series combining the rainfall (and melted snowfall) amounts from a matrix of recording stations (well over 30), averaged to produce a single figure for an area taken to represent England and Wales. The series runs from 1766, maintained (separately) by the Hadley Centre (EWR) and the University of East Anglia (EWP), though I understand that a unified data set is to be (or has been) produced. CLICK HERE FOR LATEST DATA FROM THE HADLEY CENTRE.

Explosive cyclogenesis: Sometimes, in an otherwise normal cyclogenetic situation, factors are conducive to rapid falls of pressure leading to very tight isobaric gradients and extremely low pressure. These situations often give rise to damaging, stormy or hurricane-force winds: watch for 3-hourly pressure falls in excess of 10.0 mbar. (Sometimes referred to as "bombs", particularly in North American meteorological circles.)

FC (abbr.): Funnel cloud - (in METAR/TAF this includes tornado/waterspout, so it differs from the classical distinction between a funnel cloud not touching down and one that does. See the entry for Funnel cloud.)

FEW: 1 or 2 oktas of cloud amount, used in aviation/METAR reports etc. (See also SCT.)

FG (abbr): Fog (visibility below 1000 m, except when qualified by MI, BC, PR, VC), used in METAR/TAF reports etc.

FL (abbr): Flight level (e.g. FL240 = 24000 ft amsl / standard atmosphere), used in aviation reports, forecasts etc.

Foehn (or Föhn) effect: Mechanisms which give rise to a warm, dry wind on the leeward side of mountains or significant hills. Broadly, there are two: (i) the subsidence type, where air at and just above the hill or mountain crest descends by lee-wave action, becoming even drier and warmer than when it started out; (ii) all the air in a moist airstream on the upstream side of the hill or mountain rises, leading to cloud and precipitation formation and thence a lowering of the humidity content, this air then descending and warming adiabatically on the lee side.

Fog: Reduction in visibility to under 1 km caused by a suspension of minute water droplets (water fog) or ice crystals (ice fog, q.v.). Water fogs are further sub-divided according to the process by which the fog forms, e.g. radiation fog (caused mainly by loss of surface heat from the ground at night in conditions of near-calm wind and high relative humidity); advection fog (caused by movement of humid air over a relatively colder surface); upslope fog (adiabatic cooling of air having high relative humidity as it climbs over high ground); and evaporation fog (caused by evaporation into cold air which lies over a relatively warm water surface). (If the visibility is below 200 m but greater than 50 m it is usually referred to as thick fog (and colloquially in the UK as "motoring fog"), and if below 50 m then dense fog. However, there are different criteria for climatological stations, and other services will have different rules - treat this note as a guide only.)

Fog point (strictly fog-point temperature): The air temperature (as measured in a standard thermometer screen) at which fog is expected to form, or does form. Its calculation (before an event) is usually based on empirical work which employs either the surface air temperature and dew point at some time earlier in the day, or a construction on a thermodynamic diagram. The fog point is lower than the air-mass dew point, because as air cools through the evening and night, moisture is condensed out on contact with the chilled land surface, and this lowers the dew point from afternoon values.

Freezing fog: As for the definition of fog (above), but the droplets are supercooled (i.e. temperature below zero), and strictly the fog should be depositing rime ice. However, in METAR/TAF coding, as long as the air temperature is below 0 °C the fog is coded as freezing, irrespective of whether rime is observed.

Freezing level: Taken as the altitude where the air temperature is 0 °C. However, it should be carefully noted that in the free atmosphere liquid water does not necessarily freeze at this level, or indeed at altitudes some way above this value - it should more correctly be called the melting level or, as in operational aviation meteorology, the level (or altitude) of the zero-degree isotherm. (See ZDL and Wet Bulb Freezing Level.)

Front: A boundary separating two air masses, such as warm, moist air and cold, dry air. If the cold air pushes into a region of warm air, a cold front occurs; if the warm air advances relative to the cold, a warm front occurs.

Frontal fracture: During rapid cyclogenesis events, a weakness appears along the portion of the cold front nearest to the depression centre, thought to be due to a combination of subsidence in this region plus differential thermal advection, since, unlike in the Norwegian model, cold air is not advected quickly enough eastwards to maintain the baroclinicity in this region. (See "What is the Shapiro-Keyser cyclone model?")

Frontogenetic: Any atmospheric process which leads to frontal formation, or which causes an existing weak frontal zone to become enhanced, is termed frontogenetic.
On charts issued by some national meteorological services, such fronts are shown with the normally solid line defining the front broken by spaces and large dots.

Frontolysis: When fronts weaken markedly due to, for example, marked anticyclonic subsidence across the front, the feature is said to be undergoing frontolysis. On charts issued by some national meteorological services, such fronts are shown with the line defining the front struck through by short inclined strokes.

Frost point (strictly frost-point temperature): The temperature (of an air sample that contains water vapour) to which that sample must be cooled (pressure and humidity content being held constant) to achieve saturation with respect to an ice surface. (See also Dew point.)

FRQ (abbr): Frequent (hardly or not separated, as in FRQ CB in aviation forecasts).

FU (abbr): Smoke (as used in METAR/TAF reports etc.)

Funnel cloud (FC): A visible rotating tube of condensation particles, formed as the pressure falls in an intensifying vortex (extending below a cumulonimbus cloud) - perhaps reaching the ground or sea. It should be noted that the funnel cloud simply betrays the zone where the pressure is low enough, and the humidity high enough, for cloud to form - the vortical circulation may well be in contact with the ground, but of such relatively weak intensity that it either causes little or no damage, or is detected only by surface dust and soil disturbance. There are well-documented cases where tornadoes (as defined elsewhere) do not have cloudy funnels all the way to the surface. (Based on Doswell, C. A. III, 2001.)

FZ (abbr): Freezing (used in connection with rain, drizzle and fog, all giving rise to ice deposition due to supercooled water droplets impacting upon surfaces with temperatures below 0.0 °C, OR when the temperature is below 0.0 °C anyway, whether or not ice deposits are observed.)

Glossary terms G-L

The word gale is used in everyday speech in a rather loose way to describe any strong wind, for example "it's blowing a gale outside", when it may be just a strong blow in inland areas of southern Britain. Meteorologists must work to a strict definition of a gale. For operational forecasting (UK Met Office practice), both for land and sea use, a gale (Force 8 on the Beaufort scale) is defined as a mean wind (over 10 minutes) of 34 knots (39 mph, 63 km/h, 17 m/s) or more, or gusts of 43 knots (49 mph, 79 km/h, 22 m/s) or more. This definition is also used for verifying Shipping Forecasts and Gale Warnings. Isolated gusts accompanying squalls or thunderstorms are not counted. However, for climatological purposes (i.e. post-event analysis) only the mean wind is considered, i.e. a mean wind of 34 knots or more, as specified in the Beaufort wind scale (q.v.). See also the definitions for Severe Gale, Storm, Violent Storm and Hurricane Force.

GC (abbr): Ground-to-cloud lightning flash. Used when describing lightning which branches from the ground to the cloud. The upward branching often results in an appearance like a trident etc. This is an uncommon type of lightning. (See also CA, CC and CG.)

GEFS (abbr): Global Ensemble Forecast System (of NCEP, q.v.)

Geopotential: "Potential energy per unit mass of a body due to the earth's gravitational field, referred to an arbitrary zero" (The Meteorological Glossary, UK; in meteorology, mean sea level is the reference level). A geopotential metre (by this definition) is related to the dynamic metre (a straightforward unit of length) by the expression: 1 gpm = 0.98 dynamic m.
Geopotential height differs from geometric height where the value of the gravitational acceleration (g) departs from 9.8 m/s². Gravity does vary, both with altitude and latitude, but for practical purposes, when looking at NWP output on the web, you can ignore these slight differences. Geopotential heights are used in meteorology because flow along a geopotential surface involves no loss or gain of energy, whereas flow along a geometric surface may do so - so for strict physical/mathematical calculations within computer models, the distinction between the two must be maintained.
Geostrophic wind: Defined as the (theoretical) wind that would blow on a rotating planet, resulting from a balance between the pressure gradient causing the initial displacement of the air and the apparent (to us on the earth) deflecting force due to the planetary rotation. Many corrections are needed to find the true wind vector, amongst which are the effects of friction and the several forces involved when the pressure pattern changes - which is the usual case. However, by this definition we get the general statement that the speed of the geostrophic wind is proportional to the pressure gradient, or inversely proportional to the distance between isobars/contours. Curvature of the flow must also be taken into account (see Gradient wind).
Global Forecast System: The primary forecast model (NWP) from the US NCEP service (q.v.). The model suite is run to T+384 hr, in two bursts: one to T+120 (5 days), then a further run out to 16 days (384 hr). The model is run four times daily, though not all WWW sites hold all runs (or full output for each run).
Hail (abbr. from French): dia. > 0.5 cm; used in METAR/aviation reports etc.
Gradient wind: When the path that an air parcel takes is curved (relative to the earth's surface), as so often in meteorology, that airflow is subject to an additional force necessary to maintain a curved path. For cyclonic flow, the true wind that blows will be less than the theoretical/geostrophic wind; for anticyclonic flow the true wind is greater, subject to a limiting maximum. This is why, for example, around what initially looks like a dramatically intense depression, the wind may not be quite so excessive: cyclonic curvature will account for a substantial negative correction to the theoretical value. Around a surface ridge, the wind is often surprisingly stronger than might be implied by the isobaric spacing.
Small hail: dia. < 0.5 cm; used in METAR/aviation reports etc.
Gust: Given that the wind in the surface boundary layer varies markedly about the mean wind (q.v.), it is often necessary to report the accompanying instantaneous maximum (or gust speed) in a defined period. For METAR reports, the period over which this peak wind is reported is between 2 and 10 minutes (depending upon the country). For SYNOP reports, the period is either the last hour (most likely in NW Europe), or the period covered by the past-weather group in the report - reference to the accompanying amplifying groups will usually sort this out.
Hectopascal (see hPa).
Horizontal vorticity: Even the classical vertical vorticity term (q.v.) has some upward/downward component, but this is usually ignored for practical synoptic-scale meteorology. However, when coming down a scale or two, to local/mesoscale development (e.g. severe convective storms), vorticity about a horizontal axis is most important.
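As a rough illustration of the geostrophic relationship described above (speed proportional to the pressure gradient, inversely proportional to isobar spacing), the following sketch computes a geostrophic speed from an isobar spacing, assuming a fixed near-surface air density; any gradient-wind (curvature) correction is deliberately left out.

```python
import math

OMEGA = 7.292e-5   # earth's rotation rate, rad/s
RHO_AIR = 1.25     # assumed near-surface air density, kg/m^3

def geostrophic_speed(delta_p_hpa: float, spacing_km: float, lat_deg: float) -> float:
    """Geostrophic wind speed (m/s) for isobars delta_p_hpa apart over spacing_km,
    at latitude lat_deg: V = (1 / (rho * f)) * dp/dn."""
    f = 2.0 * OMEGA * math.sin(math.radians(lat_deg))       # Coriolis parameter
    dp_dn = (delta_p_hpa * 100.0) / (spacing_km * 1000.0)   # Pa per metre
    return dp_dn / (RHO_AIR * f)

if __name__ == "__main__":
    # 4 hPa isobars 200 km apart at 55N - a fairly tight mid-latitude gradient.
    v = geostrophic_speed(4.0, 200.0, 55.0)
    print(f"~{v:.0f} m/s ({v * 1.944:.0f} kt) before any gradient-wind correction")
```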
Horizontal vorticity is often assessed in the lowest 3 km of the atmosphere, and is driven by two terms: vertical speed shear (increase/decrease of wind speed with increasing altitude) and directional (twisting) shear, the change of wind direction with increasing altitude. If, in the lowest 3 km of the atmosphere (up to 700 hPa), there is both a sharp increase of wind speed and a directional veer of wind with height, then horizontal vorticity will be potentially significant, provided it is coupled to the vigour of a developing cumulonimbus complex. (See also Vorticity, Vertical vorticity and "What is Helicity?")
(abbr) Hectopascal - equivalent to a millibar (q.v.). An attempt to use SI units without doing away with the idea of millibars (from the c.g.s. system). 1 hPa = 100 Pa (or N/m²).
Hurricane Force: This term (in UK Met Office use) is only used in shipping bulletins and associated Gale/Storm warnings in the form "Hurricane Force 12", from the modified Beaufort scale. It is strictly defined as a mean (10 minute) wind of 64 knots or more (gusts not defined). (See also the comments at Severe Gale.) Please note carefully that just because an area of low pressure produces winds to hurricane force as defined here, that does NOT make the feature a hurricane. For more on this, see the question on the October 1987 storm.
Haze: used in METAR/TAF reports etc. when visibility is reduced in a dry atmosphere (visibility > 1 km, relative humidity roughly < 90%).
(abbr) Ice crystals (also known as diamond dust); used in METAR/aviation reports.
When used in aviation weather reports/forecasts, implies aircraft superstructure icing.
Ice day: A period of 24 hr (conventionally beginning 0900 UTC) during which the air temperature remains below 0 °C.
Ice fog: Visibility reduced to less than 1000 m by a suspension of minute ice crystals in high concentration. The crystals will glitter and may give rise to optical phenomena. (NB: this is NOT the same as freezing fog, which is composed of water droplets - see the definition elsewhere.)
Initialisation: The process whereby a model analysis is produced by utilising model fields from an earlier run and integrating synoptic and asynoptic observations to produce the initial state at t=0. The model analysis may not be the same (in detail) as a hand-drawn analysis, and intervention (q.v.) is sometimes needed as a result to preserve some small-scale features which can influence the forecast run.
Insolation: Radiant energy received from the sun on any particular surface. Often used when discussing the receipt of solar (short-wave) radiation at the surface of the earth.
Instant Occlusion (or pseudo-occlusion): The name that has been coined to label the cloud mass associated with an active trough in the cold air that comes close to, and interacts with, a pre-existing baroclinic zone, forming a pattern that looks superficially as if it were part of a traditional occlusion process.
Intertropical Convergence Zone (usually abbr. ITCZ): A zone (often rather broad, but sometimes quite narrow) which separates the air masses brought together by the low-level outflow from the sub-tropical high-pressure belts north and south of the equator. Over the oceans the zone can be well marked; over land, sensible heating usually leads to breaks or other anomalies, and the regional-scale monsoon circulations also distort, or swamp, the idealised structure of the ITCZ. Cloudiness (and hence precipitation activity) can vary sharply over a period of 24 hr.
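The horizontal vorticity entry above leans on speed shear and directional veer in the lowest 3 km. The toy calculation below, with hypothetical wind values, converts two wind reports into components and measures the bulk shear vector between them; it is only a crude proxy for the fuller assessment a forecaster would make.

```python
import math

def wind_components(speed_ms: float, direction_deg: float) -> tuple[float, float]:
    """u (eastward) and v (northward) components from speed and the direction
    the wind is blowing FROM (meteorological convention)."""
    rad = math.radians(direction_deg)
    return -speed_ms * math.sin(rad), -speed_ms * math.cos(rad)

def bulk_shear(sfc: tuple[float, float], upper: tuple[float, float]) -> float:
    """Magnitude (m/s) of the vector wind difference between two levels - a crude
    proxy for the speed-shear part of low-level horizontal vorticity."""
    u0, v0 = wind_components(*sfc)
    u1, v1 = wind_components(*upper)
    return math.hypot(u1 - u0, v1 - v0)

if __name__ == "__main__":
    # Hypothetical profile: 10 m/s from 160 deg at the surface, 25 m/s from 230 deg near 3 km.
    print(f"0-3 km bulk shear ~ {bulk_shear((10.0, 160.0), (25.0, 230.0)):.1f} m/s, with a marked veer")
```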
The ITCZ's day-to-day change of position is often small, but the zone migrates north and south through the course of a year, roughly in sympathy with the changing position of the sun.
Intervention: A process whereby forecasters force acceptance of a report rejected by the model initialisation routine (supporting), or use bogus observations to input a conceptual model observed in imagery.
(abbr) Intensifying (as used in SIGMETs for a phenomenon becoming more intense or extensive).
Inversion (of temperature): A layer in the atmosphere (usually very shallow, < 0.4 km) in which temperature rises with increasing height. Two of the best known in operational meteorology are the nocturnal inversion (formed due to strong cooling of land surfaces after sunset) and the subsidence inversion (due to descent and adiabatic warming of air associated with anticyclones). Another near-surface type is that formed when warm air travels over a cold surface (e.g. cold seas or ice/snow).
(abbr) Isentropic Potential Vorticity - the product of the absolute vorticity of an air parcel and its static stability, calculated along a constant surface of theta (potential temperature), hence "isentropic". Anomalies in IPV around the level of the tropopause (and hence in the region of the driving jet stream) can be related to developments through the troposphere, leading to cyclogenesis. Because IPV is a highly conservative property for any sample of air, it is found to be particularly useful for tracking the path that stratospheric air (high IPV values) will take as it enters the upper troposphere during rapid cyclogenesis events. NWP models can be programmed to output the height of a particular IPV value - defined such that it samples air in the model stratosphere. These patterns are then overlaid on water-vapour imagery, and any mismatch between model and reality is quickly seen and allowed for. (See also Potential Vorticity, and the article on Water Vapour Imagery.)
(abbr) International Standard Atmosphere. A standard reference for the temperature, pressure and relative density structure of the troposphere and lower stratosphere, used for the calibration of (pressure) altimeters.
Isobar: A line on a synoptic chart joining points of equal atmospheric pressure.
Isolated (as in ISOL CB etc.)
(abbr) Used in aviation work to stand for "over-land".
Lapse rate (of temperature): The decrease of temperature with height in the atmosphere. Confusingly, the opposite case, an increase of temperature with height, is known as a negative lapse rate.
Latent heat: The amount of energy needed to accomplish a phase change. The latent heat of fusion is the amount of energy required to melt ice, and at 0 °C is 3.34 × 10^5 J/kg (about 80 cal/g). The latent heat of vaporisation is the amount of energy needed to evaporate liquid water; it is equivalent to 2.50 × 10^6 J/kg (about 600 cal/g) at 0 °C. The latent heat of sublimation is the energy needed to carry out a change from solid (ice) to gas (vapour); it is the sum of the latent heats of fusion and vaporisation, i.e. 2.83 × 10^6 J/kg (about 680 cal/g) at 0 °C. When water freezes, condenses or changes from a gas to a solid, 80 cal/g, 600 cal/g and 680 cal/g respectively are released to the environment. The processes are all reversible.
Left exit: Lies on the cold side of the jet axis, in the region of marked deceleration of flow. A preferred region for cyclonic development.
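Using the latent-heat values quoted in the entry above, a one-line calculation shows the energy involved in a phase change; the figures are those given in the glossary, at 0 °C.

```python
# Latent heats quoted in the glossary entry above (J/kg at 0 deg C)
L_FUSION = 3.34e5
L_VAPORISATION = 2.50e6
L_SUBLIMATION = 2.83e6  # = fusion + vaporisation

def energy_to_evaporate(mass_kg: float) -> float:
    """Energy (J) needed to evaporate liquid water at 0 deg C."""
    return mass_kg * L_VAPORISATION

if __name__ == "__main__":
    # Evaporating 1 kg of water takes about 2.5 MJ; the same amount is released
    # to the environment when that vapour condenses in cloud.
    print(f"{energy_to_evaporate(1.0) / 1e6:.2f} MJ")
```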
Lenticular clouds: These form within the crest(s) of orographic (or lee) wave-trains, over and downwind of hills/mountains/islands, provided of course that the air is humid enough. The clouds form because air cools as it is forced to rise and, if condensation takes place, lens-shaped clouds are observed with clear space in between the elements. The cloud forms within the upwind leg of each wave-crest and dissipates (evaporates) on the downwind leg: the air is therefore flowing through the cloud, with the cloud itself staying quasi-stationary; change in the cloud requires an alteration in the windflow or the temperature/humidity environment. Sometimes, under very special circumstances, a "pile of plates" is observed, where lenticular clouds are stacked vertically. The most common form of wave-forced cloud is perhaps Altocumulus lenticularis (Ac len), but lenticular cloud forms are found at all levels. Standing-wave motion can also lead to a previously uniform sheet of cloud developing a lenticular appearance and, on occasion, complete dispersal. (See also MTW.)
"Loaded gun" scenario: On a day of instability through a great depth of the troposphere, and high values of CAPE (q.v.), rising surface temperatures will at some point ensure that convective parcels leave the surface, the condensation level will be reached, cloud will grow (given sufficient moisture), and a heavy shower, or even a thunderstorm, will result. It sometimes happens, though, that although the atmosphere is markedly unstable above, say, 2 km, a "lid" opposing surface-based convection exists at or below this level, often due to a layer of warm/dry air that has become entrained in the airflow from some source. This means that surface temperatures must become very high to overcome the lid, often requiring additional triggers, such as low-level convergence or release of medium-level potential instability by a mid-level trough, thus lifting the whole column and releasing the pent-up energy in a sudden burst - and the loaded gun will be fired, perhaps leading to a severe storm/supercell event. (See Spanish plume.)
Layer(s) (as used in cloud forecasting in aviation products).
Glossary terms M-R
Operational model: A term used to differentiate the primary NWP output from a particular centre from any ensemble products from the same source. The operational model will almost always be run at a higher resolution than that used for ensemble output. It must not, however, be assumed that the Op/OPER is necessarily the best outcome, particularly beyond 3 days or so. (See Ensemble.)
Orographic forcing: An airstream encountering a barrier to its passage is forced to go around or over the obstacle. The upward deflection of the airflow is sufficient to give rise to adiabatic cooling and, if the air is moist enough, the formation of cloud, precipitation etc. In addition, convergence of the flow on the windward side (due to a rapid decrease in velocity) when the air encounters a sharply graded barrier not only enhances the vertical motion but also leads to a deformation of the flow, which in turn alters the vorticity of the air particles. Thus, hill and mountain ranges are most important in the study of meteorology.
Orographic rainfall/snowfall: For precipitation to occur, other conditions being satisfied (i.e. enough humidity, the required temperature structure, sufficient depth of cloud etc.), there must be a supply of upward motion through the cloud producing the rain, snow or whatever.
In orographic precipitation, the forcing agent is provided by large ranges of hills/mountains blocking the flow of humid air in such a way that vertical (upward) currents of air are produced, leading to adiabatic cooling >> condensation >> cloud formation/enhancement >> precipitation-element growth. Orographic forcing OF ITSELF usually only produces small amounts of precipitation, but it can be the means of enhancing or triggering other mechanisms (e.g. convective activity), and is one of the important elements in the seeder-feeder model (q.v.). Computer models in operational use now have sufficiently realistic orography and vertical resolution to represent this, but the output (usually) does not explicitly define orographic precipitation.
Orographic waves: see Lenticular clouds and MTW.
Outliers (in an NWP ensemble suite): When considering a collection of solutions at a particular lead time (from a single centre, or as part of a "Poor Man's" ensemble), some clustering is usually observed - i.e. a large number of members pointing to a similar outcome. However, as lead times get longer (especially beyond 72 hours), one or two members may depart significantly from the ensemble mean (and/or the mode of larger clusters) - these are termed outliers; such indications carry small weight, but cannot be totally ignored. In particular, at extended range (beyond about 7 days), there may be no clear clustering signal, and an outlier is just as likely to be right as a solution nearer the mean/median of the output. (See also ensemble, clusters, ensemble mean etc.)
(abbr) Overcast = 8 oktas (cloud amount, as used in aviation reports, forecasts etc.)
Overconvection: The promise of a fine, sunny day is sometimes spoiled because cumulus cloud builds and spreads out into an almost unbroken sheet of stratocumulus by late morning - which then refuses to break up for the rest of the day. For this to occur, there must be a marked inversion (see "What is an inversion?") within 100 to 300 hPa of the surface, which must be intense enough to stop convective currents breaking through the inversion even at maximum temperature; the convective condensation level (CCL) must be at least 60 hPa below the inversion level, and the layer between the CCL and the inversion must have a reasonably high relative humidity. For some rather obscure reason, this phenomenon has come to be called overconvection (at least in the UK, probably originating within the gliding community) - possibly because convective cumulus "spills over" to cover the sky (though, other conditions being right, the cloud may disperse around or just after dusk).
Over-running trough: An active, mid-latitude frontal system is associated with a marked short-wave trough. The active weather associated with the front lies forward of the trough, driven by the dynamics associated with it. At some stage in its life, though, the trough (or a portion of it) will relax (and effectively weaken), allowing the trough to run well ahead of the lower-tropospheric portion of the frontal system - it over-runs the (surface) location of the front, and the activity at that position will decay. Note, however, that the upper trough will still have weather associated with it - and may be the means of driving an upper/split frontal structure well away from the classical surface front as drawn on conventional analyses. (See also "Why fronts die".)
Pascal: named in honour of Blaise Pascal; a unit of one N/m², the basic unit of pressure in the SI system.
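Relating to the outlier discussion in the ensemble entry above, here is a minimal sketch of how departures from the ensemble mean might be flagged. The member values and the two-standard-deviation threshold are invented for illustration; operational clustering methods are considerably more sophisticated.

```python
from statistics import mean, stdev

def flag_outliers(members: list[float], n_sigma: float = 2.0) -> list[int]:
    """Indices of ensemble members further than n_sigma standard deviations
    from the ensemble mean. The 2-sigma threshold is arbitrary, for illustration."""
    m, s = mean(members), stdev(members)
    return [i for i, x in enumerate(members) if s > 0 and abs(x - m) > n_sigma * s]

if __name__ == "__main__":
    # Hypothetical long-range forecasts of central pressure (hPa) for a developing low.
    msl = [978, 981, 975, 979, 983, 980, 977, 1002, 976, 982]
    print("outlier member(s):", flag_outliers(msl))
```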
Parametrisation: Some atmospheric processes are below the grid-scale/wavelength of operational meteorological computer models and cannot be handled explicitly by such schemes - for example individual showers, which are not only important for local weather but have a feedback effect within the atmosphere that needs to be included in the NWP routines to maintain a realistic model of the real atmosphere. Larger-scale model parameters (e.g. wind vector, temperature, humidity) are used to diagnose and represent the effects of such sub-gridscale processes: this is known as parametrisation. See HERE.
Partial drought: A period of at least 29 consecutive days whose mean daily precipitation does not exceed 0.01 inches/0.2 mm. (See also drought, absolute drought, dry spell.)
Phenology: The study of the times of naturally occurring events, such as the first blossom appearance in a long-established species, or the departure of migratory birds. From 1875 to 1948 a register of such events was maintained by the Royal Met. Society, but after a period when the science was in the doldrums, the Woodland Trust and the Centre for Ecology & Hydrology combined in the late 1990s to kick-start the observing network, recognising that such data can complement studies into long-term climate change. For more detail, see phenology.org.uk
(abbr) Ice pellets (was PE); used in aviation weather reports.
(abbr) Polar mesospheric clouds (or Noctilucent clouds).
(abbr) Pressure at mean sea level: often seen in connection with NWP model products.
(abbr) Well-developed sand/dust swirls, as used in aviation weather reports.
Polar front: A boundary that separates polar air masses from tropical air masses.
Polar mesocyclones: A term now used to encompass the whole family of disturbances resulting from arctic air flowing equatorward over progressively warmer seas; the term Polar Low (q.v. above) is now often used only for extreme systems where gale or near gale-force winds are observed.
Polar mesospheric clouds: see PMC.
"Poor Man's" Ensemble Technique: A true NWP ensemble (q.v.) is the product of multiple iterations of a single atmospheric model on a single centre's computer: the individual members of the ensemble run are obtained by perturbing the initial conditions very slightly to simulate the uncertainty that is always present at analysis time. However, long before these techniques were perfected, operational forecasters would (and still do) absorb the differing output from various international centres (e.g. EC, NCEP, DWD etc.) and/or different runs from the same centre - treating all the various outputs as members of what has been dubbed a "Poor Man's Ensemble". As with true ensembles, the more model runs that agree at a certain lead time, the higher the confidence in that particular solution.
Potential Instability (also known as Convective Instability): Said to exist when forced lifting (e.g. ascent over mountains or broad-scale/dynamic ascent) causes a layer initially (just) stable to such forced ascent to become unstable. Decreasing humidity aloft is required within the layer, and heavy rain/thunder can be the result. Theta-W or Theta-E (q.v.) difference charts are often used to find such areas of potential instability: the usual levels used are 850 hPa and 500 hPa. The value at 850 hPa is subtracted from that found for 500 hPa, and negative values so found indicate potential instability. Even slightly negative differences can lead to significant convective activity, all other factors being favourable of course.
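The 850/500 hPa Theta-E difference test described above can be sketched in a few lines. The equivalent potential temperature below uses a crude textbook approximation (theta multiplied by exp(L*r/(cp*T))) together with Bolton's vapour-pressure formula; the profile values in the example are invented, and the method is illustrative rather than operational.

```python
import math

def theta_e_k(p_hpa: float, temp_c: float, dewpoint_c: float) -> float:
    """Crude equivalent potential temperature (K): theta * exp(L*r / (cp*T)).
    Uses Bolton's vapour-pressure formula for the mixing ratio; good enough to
    illustrate the 500-850 hPa difference test, not for operational use."""
    t_k = temp_c + 273.15
    theta = t_k * (1000.0 / p_hpa) ** 0.2854                              # dry potential temperature
    e = 6.112 * math.exp(17.67 * dewpoint_c / (dewpoint_c + 243.5))       # vapour pressure, hPa
    r = 0.622 * e / (p_hpa - e)                                           # mixing ratio, kg/kg
    return theta * math.exp(2.5e6 * r / (1004.0 * t_k))

if __name__ == "__main__":
    # Invented profile: warm, moist air at 850 hPa under much drier air at 500 hPa.
    diff = theta_e_k(500.0, -18.0, -30.0) - theta_e_k(850.0, 12.0, 10.0)
    print(f"theta-e(500) - theta-e(850) = {diff:+.1f} K "
          f"({'potentially unstable' if diff < 0 else 'stable'})")
```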
Potentially unstable layers can also be inferred using a thermodynamic diagram (or a tabular listing of Theta-E or Theta-W), noting where values decrease with increasing altitude within the low-to-middle troposphere (roughly up to 400 hPa).
Potential Vorticity: The ratio of the absolute vorticity (q.v.) of an atmospheric column to the (defined) pressure difference across the column. This quantity is used to label air in much the same way as we use other conservative properties. As a column of air moves along, if it shrinks vertically (due to mass divergence) its absolute vorticity decreases in just the right amount; if it expands vertically (due to mass convergence), its absolute vorticity increases. Therefore Potential Vorticity tends to remain constant following the motion of the flow, for adiabatic motion.
Precipitation: Anything precipitated by clouds (rain, snow, hail, drizzle etc.) is covered by this noun. Often abbreviated to ppn or pptn. (For definitions of the various types of precipitation, see "Beaufort Letters".)
Pressure Gradient: The difference in atmospheric pressure over a defined (usually horizontal) distance. (See "Why does the wind blow?")
Pressure Gradient Force (abbr. PGF): The force exerted on the air due to a pressure gradient, causing a tendency for movement (i.e. wind) from areas of high pressure to areas of low pressure. (See "Why does the wind blow?")
Prevailing visibility
Prevailing wind: The most frequent wind direction for any particular location in a given period, e.g. a day, month, year or climatological period.
Partial fog (i.e. fog "banks"; a substantial portion of the airfield covered by fog - but not completely; visibility < 1000 m).
Probability (as used in aviation forecasts, e.g. TAFs): in the latter, under current (2002) rules only PROB30 or PROB40 are allowed, i.e. a moderate probability of an event occurring.
Probability forecasting: Given that there is always a measure of uncertainty in forecasting the weather, the likelihood of an event happening can be expressed as a probability: thus a 70% chance of rain, a 20% chance of thunderstorms etc. Often useful in finely balanced situations, i.e. rain vs. snow, severe storms vs. no storm etc. (See also Deterministic forecasts.)
Progression: When large-scale features in the upper air, such as a 500 or 300 hPa trough/vortex, drift west-to-east, this is said to be a normal progression of the pattern. (See also retrogression.)
(abbr: Polar Stratospheric Clouds) During the polar night (i.e. the period in the middle of the winter when insolation does not penetrate to ultra-high latitudes), the stratosphere cools significantly, leading to closed-loop circulations (both vertical and horizontal) which virtually isolate these polar stratospheric regions - the "polar night vortex" - within which temperatures can fall well below -75 °C. In these extremely cold conditions, clouds are observed to form in the stratosphere which appear to be composed of a combination of nitric acid and water. Stratospheric clouds can also form from ordinary water ice (i.e. as in the troposphere), but these are much less common at these high altitudes, as the stratosphere is normally very dry and water-ice clouds only form at the lowest temperatures. The presence of PSCs, and the part they play in the chemical interactions at these levels, have been a subject of much debate in recent years. (See the main FAQ here for the Stratospheric Night Jet and here for the Stratosphere, and various web-sites dealing with upper-atmosphere ozone depletion.)
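The conservation statement in the Potential Vorticity entry above can be illustrated with a simple column-stretching example, assuming (zeta + f)/depth is conserved and holding the Coriolis parameter fixed; the numbers are hypothetical.

```python
import math

OMEGA = 7.292e-5  # earth's rotation rate, rad/s

def coriolis(lat_deg: float) -> float:
    """Coriolis parameter f at the given latitude."""
    return 2.0 * OMEGA * math.sin(math.radians(lat_deg))

def spun_up_zeta(zeta0: float, lat_deg: float, depth0_hpa: float, depth1_hpa: float) -> float:
    """Relative vorticity after a column changes depth, assuming conservation of
    (zeta + f) / depth as in the Potential Vorticity entry (f held constant here)."""
    f = coriolis(lat_deg)
    return (zeta0 + f) * (depth1_hpa / depth0_hpa) - f

if __name__ == "__main__":
    # A column at 50N, initially with no relative spin, stretched from 400 to 500 hPa depth.
    zeta1 = spun_up_zeta(0.0, 50.0, 400.0, 500.0)
    print(f"induced cyclonic relative vorticity ~ {zeta1:.1e} s^-1")
```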
Pulse storms (a term often used in North America): Random air-mass thunderstorms forming in an environment of little or no vertical wind shear, which appear as individual returns (without any obvious organisation) on radar/high-resolution satellite imagery systems. They usually last 20 to 30 minutes, perhaps up to 60 minutes, and give rise to small hail, sometimes heavy rain and perhaps weak tornadoes. They can be regarded as a more intense version of the single-cell convective type discussed in the main FAQ here, i.e. higher CAPE values are involved than for an ordinary shower.
PVA region: An area where marked advection (movement) of positive, or cyclonic, vorticity (q.v.) is occurring - hence Positive Vorticity Advection; often associated with a small upper trough running through the broad-scale upper pattern. Cyclonic development will occur - other factors being favourable.
QFE: Pressure at airfield level, set on an aircraft (pressure) altimeter when height above local aerodrome level (strictly the official threshold elevation) is required.
QFF: Pressure at mean sea level (reduced according to the actual/mean temperature).
QNH: Pressure at mean sea level (reduced according to the ISA profile), set on an aircraft (pressure) altimeter when height above local mean sea level is required.
(abbr) Rain, as used in aviation (e.g. METAR/TAF) reports.
Radiation: The transmission of energy by electromagnetic waves, which may be propagated through a substance or through a vacuum at the speed of light. Electromagnetic radiation is divided into various classes on the basis of wavelength; these are, in order of increasing wavelength: gamma radiation, X-rays, ultra-violet (UV) radiation, visible (VIS) light, infra-red (IR) radiation and radio waves.
Radiosonde (sometimes abbreviated to RS): An instrument that measures the temperature, pressure and humidity of the atmosphere as it is carried aloft on a balloon. The "sonde" transmits its measurements to a ground-based receiver via radio signals, and by accurate tracking (radar or satellite) of the sonde unit, upper winds can be deduced.
Rain day: A period of 24 hr, conventionally beginning at 09 UTC, during which precipitation of 0.2 mm or more has been recorded. (See also Wet day.)
Relative humidity (RH): See the main FAQ here.
Relative Vorticity: The vorticity (or tendency for air particles to spin) relative to the earth. It can be considered for practical purposes (and crudely assessed on meteorological charts) as the combination of two factors: (i) the spin imparted due to the curved path that air takes in its passage through the atmosphere (cyclonically curved contours = positive, anticyclonically curved contours = negative); (ii) the shear developed along the flow due to the differing velocities of the moving particles. Swiftly moving air will generate a twist element relative to the lower-velocity flow on either side (shear vorticity): where the twisting generated is in a cyclonic sense, it is counted as positive; where in the anticyclonic sense, negative. (See also Vorticity, Absolute Vorticity.)
Relaxation: When the amplitude of a trough decreases with time, the trough is said to have undergone relaxation. The change is usually measured in terms of a latitude change of a chosen contour or thickness line.
Retrogression: When an upper trough (or ridge) moves against the normal west-to-east flow in mid-latitudes, the feature is retrogressing, or undergoing retrogression.
(abbr) Relative Humidity (expressed as a percentage value).
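Relating to the Relative Vorticity entry above, the sketch below estimates zeta = dv/dx - du/dy by centred finite differences on a tiny, invented grid of wind components; real charts would of course use full analysed fields.

```python
def relative_vorticity(u: list[list[float]], v: list[list[float]],
                       dx_m: float, dy_m: float, i: int, j: int) -> float:
    """Centred-difference estimate of zeta = dv/dx - du/dy (s^-1) at grid point
    (i, j); u and v are 2-D wind component fields indexed [j][i], j increasing north."""
    dvdx = (v[j][i + 1] - v[j][i - 1]) / (2.0 * dx_m)
    dudy = (u[j + 1][i] - u[j - 1][i]) / (2.0 * dy_m)
    return dvdx - dudy

if __name__ == "__main__":
    # Tiny hypothetical 3x3 field with cyclonic shear: v increasing eastwards,
    # u decreasing northwards (both contribute positive, i.e. cyclonic, vorticity).
    u = [[12.0, 12.0, 12.0], [10.0, 10.0, 10.0], [8.0, 8.0, 8.0]]
    v = [[-2.0, 0.0, 2.0], [-2.0, 0.0, 2.0], [-2.0, 0.0, 2.0]]
    print(f"{relative_vorticity(u, v, 50_000.0, 50_000.0, 1, 1):.1e} s^-1")
```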
See Upper ridge.
Ridge amplification: When contour heights along the axis of an upper ridge increase, the ridge is amplifying.
Right entrance: On the warm side of the jet core, in the region of maximum acceleration of flow. Often associated with marked cyclogenesis.
(abbr) Saturated adiabatic lapse rate: the (variable) rate of cooling of a saturated air sample rising in the atmosphere. (See "Stable and unstable air masses".)
Saturation: The condition air reaches when it contains the most water in the vapour state that it is capable of holding at any particular temperature. If any more vapour is injected into the sample (or if the sample is cooled), condensation will occur.
(abbr) Stratocumulus (SC in METAR/SIGWX charts etc., Sc otherwise): a low-level cloud type, varying from thin, well-broken layers with little impact for aviation/general weather, to deep, sometimes unstable cloud giving rise to persistent PPN and a risk of moderate turbulence and moderate (in some situations severe) icing.
(abbr) Scattered (3 or 4 oktas): cloud amount used in aviation reports, forecasts etc. (Of historical note, SCT used to mean 1 to 4 oktas, until the introduction of FEW on the revamp of the METAR code in the 1990s.)
Seasons: In mid-latitudes we are used to the idea of the four seasons: spring, summer, autumn and winter. For climatological accounting purposes these are defined using three-calendar-month blocks, thus: March, April & May = spring; June, July & August = summer; September, October & November = autumn; and December, January & February = winter. (For more, see "How are the seasons defined?")
Seclusion process: During the process of rapid cyclogenesis (q.v.), the standard Norwegian theory of development leading to an occluded front is not appropriate. What appears to happen is that the original cold front becomes weak/ill-defined (close to the low centre), and a new cold front appears further to the west. (This is effectively what has been drawn in the past as a back-bent occlusion.) So, what happens to the warm air associated with the warm frontal zone near the low centre? Around and immediately to the equatorward side of the low, it becomes trapped or secluded from the rest of the development, in a discrete region enclosed by relatively colder air encircling the development - a so-called seclusion. (This is therefore a different process from that producing the classical occlusion, whereby warm-sector air is lifted by the advancing cold air.) (See "What is the Shapiro-Keyser cyclone model?")
Seeder/feeder mechanism: When a very moist (e.g. tropical maritime) airflow is forced to rise over upland areas, thick layers of stratus or stratocumulus cloud form. As noted elsewhere, these orographic clouds of themselves produce relatively little rainfall (in a thermally stable environment). If, however, rain is already occurring from medium-level layer cloud (thick altostratus, nimbostratus) - the "seeder" clouds - it will have to fall through the low-level "feeder" cloud, with collision/collection processes markedly enhancing the net rainfall rate at the surface. This effect often produces prolonged heavy rainfall in the warm conveyor regime within a warm sector, particularly if the system is slow-moving.
Sensible heat transfer: The transfer of heat by conduction and convection (i.e. it can be sensed or detected directly).
(abbr) Severe (as in SEV ICE, for severe icing).
Severe Gale: The definition of a Severe Gale/Force 9 is strict for operational (UK) forecasting for maritime purposes.
Either the mean (10 minute) wind must be 41 knots or more, up to 47 knots, or the gusts must be 52 knots or more, up to 60 knots. The term will also be heard on broadcast weather forecasts, although it is arguable that the general population cannot be expected to know this definition, and the practice now is to forecast gust values explicitly rather than relying on the adjective "severe" to imply possible problems. (See also Gale, Storm and the notes at the Beaufort wind scale.)
Part of the WMO header code used in bulletins that carry reports of atmospherics, more commonly known as sferics, or SFLOCS. (See "What are sferics?")
(abbr) Snow grains, as used in aviation weather reports.
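Finally, the gale-related definitions above lend themselves to a simple classification sketch. The Gale, Severe Gale and Hurricane Force mean-wind thresholds follow the text; the Storm (48-55 kt) and Violent Storm (56-63 kt) bands are taken to be the standard Beaufort Force 10 and 11 ranges, which is an assumption here.

```python
def warning_category(mean_wind_kt: float) -> str:
    """Categorise a 10-minute mean wind (knots) against Beaufort-based warning bands.
    Gale, Severe Gale and Hurricane Force bounds follow the glossary text; Storm
    (48-55 kt) and Violent Storm (56-63 kt) are assumed standard Force 10/11 bands."""
    if mean_wind_kt >= 64:
        return "Hurricane Force 12"
    if mean_wind_kt >= 56:
        return "Violent Storm 11"
    if mean_wind_kt >= 48:
        return "Storm 10"
    if mean_wind_kt >= 41:
        return "Severe Gale 9"
    if mean_wind_kt >= 34:
        return "Gale 8"
    return "below gale"

if __name__ == "__main__":
    for w in (30, 36, 44, 50, 58, 70):
        print(w, "kt ->", warning_category(w))
```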
The new model is essentially equivalent to the original model, with the same partition function and allowing recovery of the original marginals or a MAP conguration, yet may have very different computational properties that allow much more efficient inference. This meta-approach deepens our understanding, may be applied to any existing algorithm to yield improved methods in practice, generalizes earlier theoretical results, and reveals a remarkable interpretation of the triplet-consistent polytope. We show how deep learning methods can be applied in the context of crowdsourcing and unsupervised ensemble learning. First, we prove that the popular model of Dawid and Skene, which assumes that all classifiers are conditionally independent, is to a Restricted Boltzmann Machine (RBM) with a single hidden node. Hence, under this model, the posterior probabilities of the true labels can be instead estimated via a trained RBM. Next, to address the more general case, where classifiers may strongly violate the conditional independence assumption, we propose to apply RBM-based Deep Neural Net (DNN). Experimental results on various simulated and real-world datasets demonstrate that our proposed DNN approach outperforms other state-of-the-art methods, in particular when the data violates the conditional independence assumption. Revisiting Semi-Supervised Learning with Graph Embeddings Zhilin Yang Carnegie Mellon University . William Cohen CMU . Ruslan Salakhudinov U. of Toronto Paper AbstractWe present a semi-supervised learning framework based on graph embeddings. Given a graph between instances, we train an embedding for each instance to jointly predict the class label and the neighborhood context in the graph. We develop both transductive and inductive variants of our method. In the transductive variant of our method, the class labels are determined by both the learned embeddings and input feature vectors, while in the inductive variant, the embeddings are defined as a parametric function of the feature vectors, so predictions can be made on instances not seen during training. On a large and diverse set of benchmark tasks, including text classification, distantly supervised entity extraction, and entity classification, we show improved performance over many of the existing models. Reinforcement learning can acquire complex behaviors from high-level specifications. However, defining a cost function that can be optimized effectively and encodes the correct task is challenging in practice. We explore how inverse optimal control (IOC) can be used to learn behaviors from demonstrations, with applications to torque control of high-dimensional robotic systems. Our method addresses two key challenges in inverse optimal control: first, the need for informative features and effective regularization to impose structure on the cost, and second, the difficulty of learning the cost function under unknown dynamics for high-dimensional continuous systems. To address the former challenge, we present an algorithm capable of learning arbitrary nonlinear cost functions, such as neural networks, without meticulous feature engineering. To address the latter challenge, we formulate an efficient sample-based approximation for MaxEnt IOC. We evaluate our method on a series of simulated tasks and real-world robotic manipulation problems, demonstrating substantial improvement over prior methods both in terms of task complexity and sample efficiency. 
In learning latent variable models (LVMs), it is important to effectively capture infrequent patterns and shrink model size without sacrificing modeling power. Various studies have been done to 8220diversify8221 a LVM, which aim to learn a diverse set of latent components in LVMs. Most existing studies fall into a frequentist-style regularization framework, where the components are learned via point estimation. In this paper, we investigate how to 8220diversify8221 LVMs in the paradigm of Bayesian learning, which has advantages complementary to point estimation, such as alleviating overfitting via model averaging and quantifying uncertainty. We propose two approaches that have complementary advantages. One is to define diversity-promoting mutual angular priors which assign larger density to components with larger mutual angles based on Bayesian network and von Mises-Fisher distribution and use these priors to affect the posterior via Bayes rule. We develop two efficient approximate posterior inference algorithms based on variational inference and Markov chain Monte Carlo sampling. The other approach is to impose diversity-promoting regularization directly over the post-data distribution of components. These two methods are applied to the Bayesian mixture of experts model to encourage the 8220experts8221 to be diverse and experimental results demonstrate the effectiveness and efficiency of our methods. High dimensional nonparametric regression is an inherently difficult problem with known lower bounds depending exponentially in dimension. A popular strategy to alleviate this curse of dimensionality has been to use additive models of emph , which model the regression function as a sum of independent functions on each dimension. Though useful in controlling the variance of the estimate, such models are often too restrictive in practical settings. Between non-additive models which often have large variance and first order additive models which have large bias, there has been little work to exploit the trade-off in the middle via additive models of intermediate order. In this work, we propose salsa, which bridges this gap by allowing interactions between variables, but controls model capacity by limiting the order of interactions. salsas minimises the residual sum of squares with squared RKHS norm penalties. Algorithmically, it can be viewed as Kernel Ridge Regression with an additive kernel. When the regression function is additive, the excess risk is only polynomial in dimension. Using the Girard-Newton formulae, we efficiently sum over a combinatorial number of terms in the additive expansion. Via a comparison on 15 real datasets, we show that our method is competitive against 21 other alternatives. We propose an extension to Hawkes processes by treating the levels of self-excitation as a stochastic differential equation. Our new point process allows better approximation in application domains where events and intensities accelerate each other with correlated levels of contagion. We generalize a recent algorithm for simulating draws from Hawkes processes whose levels of excitation are stochastic processes, and propose a hybrid Markov chain Monte Carlo approach for model fitting. Our sampling procedure scales linearly with the number of required events and does not require stationarity of the point process. A modular inference procedure consisting of a combination between Gibbs and Metropolis Hastings steps is put forward. We recover expectation maximization as a special case. 
Our general approach is illustrated for contagion following geometric Brownian motion and exponential Langevin dynamics. Rank aggregation systems collect ordinal preferences from individuals to produce a global ranking that represents the social preference. To reduce the computational complexity of learning the global ranking, a common practice is to use rank-breaking. Individuals preferences are broken into pairwise comparisons and then applied to efficient algorithms tailored for independent pairwise comparisons. However, due to the ignored dependencies, naive rank-breaking approaches can result in inconsistent estimates. The key idea to produce unbiased and accurate estimates is to treat the paired comparisons outcomes unequally, depending on the topology of the collected data. In this paper, we provide the optimal rank-breaking estimator, which not only achieves consistency but also achieves the best error bound. This allows us to characterize the fundamental tradeoff between accuracy and complexity in some canonical scenarios. Further, we identify how the accuracy depends on the spectral gap of a corresponding comparison graph. Dropout distillation Samuel Rota Bul FBK . Lorenzo Porzi FBK . Peter Kontschieder Microsoft Research Cambridge Paper AbstractDropout is a popular stochastic regularization technique for deep neural networks that works by randomly dropping (i. e. zeroing) units from the network during training. This randomization process allows to implicitly train an ensemble of exponentially many networks sharing the same parametrization, which should be averaged at test time to deliver the final prediction. A typical workaround for this intractable averaging operation consists in scaling the layers undergoing dropout randomization. This simple rule called 8216standard dropout8217 is efficient, but might degrade the accuracy of the prediction. In this work we introduce a novel approach, coined 8216dropout distillation8217, that allows us to train a predictor in a way to better approximate the intractable, but preferable, averaging process, while keeping under control its computational efficiency. We are thus able to construct models that are as efficient as standard dropout, or even more efficient, while being more accurate. Experiments on standard benchmark datasets demonstrate the validity of our method, yielding consistent improvements over conventional dropout. Metadata-conscious anonymous messaging Giulia Fanti UIUC . Peter Kairouz UIUC . Sewoong Oh UIUC . Kannan Ramchandran UC Berkeley . Pramod Viswanath UIUC Paper AbstractAnonymous messaging platforms like Whisper and Yik Yak allow users to spread messages over a network (e. g. a social network) without revealing message authorship to other users. The spread of messages on these platforms can be modeled by a diffusion process over a graph. Recent advances in network analysis have revealed that such diffusion processes are vulnerable to author deanonymization by adversaries with access to metadata, such as timing information. In this work, we ask the fundamental question of how to propagate anonymous messages over a graph to make it difficult for adversaries to infer the source. In particular, we study the performance of a message propagation protocol called adaptive diffusion introduced in (Fanti et al. 2015). We prove that when the adversary has access to metadata at a fraction of corrupted graph nodes, adaptive diffusion achieves asymptotically optimal source-hiding and significantly outperforms standard diffusion. 
We further demonstrate empirically that adaptive diffusion hides the source effectively on real social networks. The Teaching Dimension of Linear Learners Ji Liu University of Rochester . Xiaojin Zhu University of Wisconsin . Hrag Ohannessian University of Wisconsin-Madison Paper AbstractTeaching dimension is a learning theoretic quantity that specifies the minimum training set size to teach a target model to a learner. Previous studies on teaching dimension focused on version-space learners which maintain all hypotheses consistent with the training data, and cannot be applied to modern machine learners which select a specific hypothesis via optimization. This paper presents the first known teaching dimension for ridge regression, support vector machines, and logistic regression. We also exhibit optimal training sets that match these teaching dimensions. Our approach generalizes to other linear learners. Truthful Univariate Estimators Ioannis Caragiannis University of Patras . Ariel Procaccia Carnegie Mellon University . Nisarg Shah Carnegie Mellon University Paper AbstractWe revisit the classic problem of estimating the population mean of an unknown single-dimensional distribution from samples, taking a game-theoretic viewpoint. In our setting, samples are supplied by strategic agents, who wish to pull the estimate as close as possible to their own value. In this setting, the sample mean gives rise to manipulation opportunities, whereas the sample median does not. Our key question is whether the sample median is the best (in terms of mean squared error) truthful estimator of the population mean. We show that when the underlying distribution is symmetric, there are truthful estimators that dominate the median. Our main result is a characterization of worst-case optimal truthful estimators, which provably outperform the median, for possibly asymmetric distributions with bounded support. Why Regularized Auto-Encoders learn Sparse Representation Devansh Arpit SUNY Buffalo . Yingbo Zhou SUNY Buffalo . Hung Ngo SUNY Buffalo . Venu Govindaraju SUNY Buffalo Paper AbstractSparse distributed representation is the key to learning useful features in deep learning algorithms, because not only it is an efficient mode of data representation, but also 8212 more importantly 8212 it captures the generation process of most real world data. While a number of regularized auto-encoders (AE) enforce sparsity explicitly in their learned representation and others don8217t, there has been little formal analysis on what encourages sparsity in these models in general. Our objective is to formally study this general problem for regularized auto-encoders. We provide sufficient conditions on both regularization and activation functions that encourage sparsity. We show that multiple popular models (de-noising and contractive auto encoders, e. g.) and activations (rectified linear and sigmoid, e. g.) satisfy these conditions thus, our conditions help explain sparsity in their learned representation. Thus our theoretical and empirical analysis together shed light on the properties of regularizationactivation that are conductive to sparsity and unify a number of existing auto-encoder models and activation functions under the same analytical framework. k-variates: more pluses in the k-means Richard Nock Nicta 038 ANU . Raphael Canyasse Ecole Polytechnique and The Technion . Roksana Boreli Data61 . Frank Nielsen Ecole Polytechnique and Sony CS Labs Inc. 
Paper Abstractk-means seeding has become a de facto standard for hard clustering algorithms. In this paper, our first contribution is a two-way generalisation of this seeding, k-variates, that includes the sampling of general densities rather than just a discrete set of Dirac densities anchored at the point locations, textit a generalisation of the well known Arthur-Vassilvitskii (AV) approximation guarantee, in the form of a textit approximation bound of the textit optimum. This approximation exhibits a reduced dependency on the 8220noise8221 component with respect to the optimal potential 8212 actually approaching the statistical lower bound. We show that k-variates textit to efficient (biased seeding) clustering algorithms tailored to specific frameworks these include distributed, streaming and on-line clustering, with textit approximation results for these algorithms. Finally, we present a novel application of k-variates to differential privacy. For either the specific frameworks considered here, or for the differential privacy setting, there is little to no prior results on the direct application of k-means and its approximation bounds 8212 state of the art contenders appear to be significantly more complex and or display less favorable (approximation) properties. We stress that our algorithms can still be run in cases where there is textit closed form solution for the population minimizer. We demonstrate the applicability of our analysis via experimental evaluation on several domains and settings, displaying competitive performances vs state of the art. Multi-Player Bandits 8212 a Musical Chairs Approach Jonathan Rosenski Weizmann Institute of Science . Ohad Shamir Weizmann Institute of Science . Liran Szlak Weizmann Institute of Science Paper AbstractWe consider a variant of the stochastic multi-armed bandit problem, where multiple players simultaneously choose from the same set of arms and may collide, receiving no reward. This setting has been motivated by problems arising in cognitive radio networks, and is especially challenging under the realistic assumption that communication between players is limited. We provide a communication-free algorithm (Musical Chairs) which attains constant regret with high probability, as well as a sublinear-regret, communication-free algorithm (Dynamic Musical Chairs) for the more difficult setting of players dynamically entering and leaving throughout the game. Moreover, both algorithms do not require prior knowledge of the number of players. To the best of our knowledge, these are the first communication-free algorithms with these types of formal guarantees. The Information Sieve Greg Ver Steeg Information Sciences Institute . Aram Galstyan Information Sciences Institute Paper AbstractWe introduce a new framework for unsupervised learning of representations based on a novel hierarchical decomposition of information. Intuitively, data is passed through a series of progressively fine-grained sieves. Each layer of the sieve recovers a single latent factor that is maximally informative about multivariate dependence in the data. The data is transformed after each pass so that the remaining unexplained information trickles down to the next layer. Ultimately, we are left with a set of latent factors explaining all the dependence in the original data and remainder information consisting of independent noise. 
We present a practical implementation of this framework for discrete variables and apply it to a variety of fundamental tasks in unsupervised learning including independent component analysis, lossy and lossless compression, and predicting missing values in data. Deep Speech 2. End-to-End Speech Recognition in English and Mandarin Dario Amodei . Rishita Anubhai . Eric Battenberg . Carl Case . Jared Casper . Bryan Catanzaro . JingDong Chen . Mike Chrzanowski Baidu USA, Inc. . Adam Coates . Greg Diamos Baidu USA, Inc. . Erich Elsen Baidu USA, Inc. . Jesse Engel . Linxi Fan . Christopher Fougner . Awni Hannun Baidu USA, Inc. . Billy Jun . Tony Han . Patrick LeGresley . Xiangang Li Baidu . Libby Lin . Sharan Narang . Andrew Ng . Sherjil Ozair . Ryan Prenger . Sheng Qian Baidu . Jonathan Raiman . Sanjeev Satheesh Baidu SVAIL . David Seetapun . Shubho Sengupta . Chong Wang . Yi Wang . Zhiqian Wang . Bo Xiao . Yan Xie Baidu . Dani Yogatama . Jun Zhan . zhenyao Zhu Paper AbstractWe show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speechtwo vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech including noisy environments, accents and different languages. Key to our approach is our application of HPC techniques, enabling experiments that previously took weeks to now run in days. This allows us to iterate more quickly to identify superior architectures and algorithms. As a result, in several cases, our system is competitive with the transcription of human workers when benchmarked on standard datasets. Finally, using a technique called Batch Dispatch with GPUs in the data center, we show that our system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale. An important question in feature selection is whether a selection strategy recovers the 8220true8221 set of features, given enough data. We study this question in the context of the popular Least Absolute Shrinkage and Selection Operator (Lasso) feature selection strategy. In particular, we consider the scenario when the model is misspecified so that the learned model is linear while the underlying real target is nonlinear. Surprisingly, we prove that under certain conditions, Lasso is still able to recover the correct features in this case. We also carry out numerical studies to empirically verify the theoretical results and explore the necessity of the conditions under which the proof holds. We propose minimum regret search (MRS), a novel acquisition function for Bayesian optimization. MRS bears similarities with information-theoretic approaches such as entropy search (ES). However, while ES aims in each query at maximizing the information gain with respect to the global maximum, MRS aims at minimizing the expected simple regret of its ultimate recommendation for the optimum. While empirically ES and MRS perform similar in most of the cases, MRS produces fewer outliers with high simple regret than ES. We provide empirical results both for a synthetic single-task optimization problem as well as for a simulated multi-task robotic control problem. CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy Ran Gilad-Bachrach Microsoft Research . Nathan Dowlin Princeton . Kim Laine Microsoft Research . Kristin Lauter Microsoft Research . Michael Naehrig Microsoft Research . 
John Wernsing Microsoft Research Paper AbstractApplying machine learning to a problem which involves medical, financial, or other types of sensitive data, not only requires accurate predictions but also careful attention to maintaining data privacy and security. Legal and ethical requirements may prevent the use of cloud-based machine learning solutions for such tasks. In this work, we will present a method to convert learned neural networks to CryptoNets, neural networks that can be applied to encrypted data. This allows a data owner to send their data in an encrypted form to a cloud service that hosts the network. The encryption ensures that the data remains confidential since the cloud does not have access to the keys needed to decrypt it. Nevertheless, we will show that the cloud service is capable of applying the neural network to the encrypted data to make encrypted predictions, and also return them in encrypted form. These encrypted predictions can be sent back to the owner of the secret key who can decrypt them. Therefore, the cloud service does not gain any information about the raw data nor about the prediction it made. We demonstrate CryptoNets on the MNIST optical character recognition tasks. CryptoNets achieve 99 accuracy and can make around 59000 predictions per hour on a single PC. Therefore, they allow high throughput, accurate, and private predictions. Spectral methods for dimensionality reduction and clustering require solving an eigenproblem defined by a sparse affinity matrix. When this matrix is large, one seeks an approximate solution. The standard way to do this is the Nystrom method, which first solves a small eigenproblem considering only a subset of landmark points, and then applies an out-of-sample formula to extrapolate the solution to the entire dataset. We show that by constraining the original problem to satisfy the Nystrom formula, we obtain an approximation that is computationally simple and efficient, but achieves a lower approximation error using fewer landmarks and less runtime. We also study the role of normalization in the computational cost and quality of the resulting solution. As a widely used non-linear activation, Rectified Linear Unit (ReLU) separates noise and signal in a feature map by learning a threshold or bias. However, we argue that the classification of noise and signal not only depends on the magnitude of responses, but also the context of how the feature responses would be used to detect more abstract patterns in higher layers. In order to output multiple response maps with magnitude in different ranges for a particular visual pattern, existing networks employing ReLU and its variants have to learn a large number of redundant filters. In this paper, we propose a multi-bias non-linear activation (MBA) layer to explore the information hidden in the magnitudes of responses. It is placed after the convolution layer to decouple the responses to a convolution kernel into multiple maps by multi-thresholding magnitudes, thus generating more patterns in the feature space at a low computational cost. It provides great flexibility of selecting responses to different visual patterns in different magnitude ranges to form rich representations in higher layers. Such a simple and yet effective scheme achieves the state-of-the-art performance on several benchmarks. 
We propose a novel multi-task learning method that can minimize the effect of negative transfer by allowing asymmetric transfer between the tasks based on task relatedness as well as the amount of individual task losses, which we refer to as Asymmetric Multi-task Learning (AMTL). To tackle this problem, we couple multiple tasks via a sparse, directed regularization graph that enforces each task parameter to be reconstructed as a sparse combination of other tasks, which are selected based on the task-wise loss. We present two different algorithms to solve this joint learning of the task predictors and the regularization graph. The first algorithm solves for the original learning objective using alternating optimization, and the second algorithm solves an approximation of it using a curriculum learning strategy that learns one task at a time. We perform experiments on multiple datasets for classification and regression, on which we obtain significant improvements in performance over the single-task learning and symmetric multitask learning baselines.

This paper illustrates a novel approach to the estimation of generalization error of decision tree classifiers. We set out the study of decision tree errors in the context of consistency analysis theory, which proved that the Bayes error can be achieved only when the number of data samples thrown into each leaf node goes to infinity. For the more challenging and practical case where the sample size is finite or small, a novel sampling error term is introduced in this paper to cope with the small sample problem effectively and efficiently. Extensive experimental results show that the proposed error estimate is superior to the well-known K-fold cross validation methods in terms of robustness and accuracy. Moreover, it is orders of magnitude more efficient than cross validation methods.

We study the convergence properties of the VR-PCA algorithm introduced by Shamir (2015) for fast computation of leading singular vectors. We prove several new results, including a formal analysis of a block version of the algorithm, and convergence from random initialization. We also make a few observations of independent interest, such as how pre-initializing with just a single exact power iteration can significantly improve the analysis, and what are the convexity and non-convexity properties of the underlying optimization problem.

We consider the problem of principal component analysis (PCA) in a streaming stochastic setting, where our goal is to find a direction of approximate maximal variance, based on a stream of i.i.d. data points in $\mathbb{R}^d$. A simple and computationally cheap algorithm for this is stochastic gradient descent (SGD), which incrementally updates its estimate based on each new data point. However, due to the non-convex nature of the problem, analyzing its performance has been a challenge. In particular, existing guarantees rely on a non-trivial eigengap assumption on the covariance matrix, which is intuitively unnecessary. In this paper, we provide (to the best of our knowledge) the first eigengap-free convergence guarantees for SGD in the context of PCA. This also partially resolves an open problem posed in prior work. Moreover, under an eigengap assumption, we show that the same techniques lead to new SGD convergence guarantees with better dependence on the eigengap.

Dealbreaker: A Nonlinear Latent Variable Model for Educational Data Andrew Lan Rice University . Tom Goldstein University of Maryland . Richard Baraniuk Rice University . 
Christoph Studer Cornell University Paper Abstract: Statistical models of student responses on assessment questions, such as those in homeworks and exams, enable educators and computer-based personalized learning systems to gain insights into students' knowledge using machine learning. Popular student-response models, including the Rasch model and item response theory models, represent the probability of a student answering a question correctly using an affine function of latent factors. While such models can accurately predict student responses, their ability to interpret the underlying knowledge structure (which is certainly nonlinear) is limited. In response, we develop a new, nonlinear latent variable model that we call the dealbreaker model, in which a student's success probability is determined by their weakest concept mastery. We develop efficient parameter inference algorithms for this model using novel methods for nonconvex optimization. We show that the dealbreaker model achieves comparable or better prediction performance than affine models on real-world educational datasets. We further demonstrate that the parameters learned by the dealbreaker model are interpretable: they provide key insights into which concepts are critical (i.e. the dealbreaker) to answering a question correctly. We conclude by reporting preliminary results for a movie-rating dataset, which illustrate the broader applicability of the dealbreaker model.

We derive a new discrepancy statistic for measuring differences between two probability distributions based on combining Stein's identity and reproducing kernel Hilbert space theory. We apply our result to test how well a probabilistic model fits a set of observations, and derive a new class of powerful goodness-of-fit tests that are widely applicable for complex and high-dimensional distributions, even for those with computationally intractable normalization constants. Both theoretical and empirical properties of our methods are studied thoroughly.

Variable Elimination in the Fourier Domain Yexiang Xue Cornell University . Stefano Ermon . Ronan Le Bras Cornell University . Carla Gomes . Bart Selman Paper Abstract: The ability to represent complex, high-dimensional probability distributions in a compact form is one of the key insights in the field of graphical models. Factored representations are ubiquitous in machine learning and lead to major computational advantages. We explore a different type of compact representation based on discrete Fourier representations, complementing the classical approach based on conditional independencies. We show that a large class of probabilistic graphical models has a compact Fourier representation. This theoretical result opens up an entirely new way of approximating a probability distribution. We demonstrate the significance of this approach by applying it to the variable elimination algorithm. Compared with the traditional bucket representation and other approximate inference algorithms, we obtain significant improvements.

Low-rank matrix approximation has been widely adopted in machine learning applications with sparse data, such as recommender systems. However, the sparsity of the data, which is incomplete and noisy, introduces challenges to algorithm stability: small changes in the training data may significantly change the models. 
As a result, existing low-rank matrix approximation solutions yield low generalization performance, exhibiting high error variance on the training dataset, and minimizing the training error may not guarantee error reduction on the testing dataset. In this paper, we investigate the algorithm stability problem of low-rank matrix approximations. We present a new algorithm design framework, which (1) introduces new optimization objectives to guide stable matrix approximation algorithm design, and (2) solves the optimization problem to obtain stable low-rank approximation solutions with good generalization performance. Experimental results on real-world datasets demonstrate that the proposed work can achieve better prediction accuracy compared with both state-of-the-art low-rank matrix approximation methods and ensemble methods in recommendation tasks.

Given samples from two densities p and q, density ratio estimation (DRE) is the problem of estimating the ratio p/q. Two popular discriminative approaches to DRE are KL importance estimation (KLIEP) and least squares importance fitting (LSIF). In this paper, we show that KLIEP and LSIF both employ class-probability estimation (CPE) losses. Motivated by this, we formally relate DRE and CPE, and demonstrate the viability of using existing losses from one problem for the other. For the DRE problem, we show that essentially any CPE loss (e.g. logistic, exponential) can be used, as this equivalently minimises a Bregman divergence to the true density ratio. We show how different losses focus on accurately modelling different ranges of the density ratio, and use this to design new CPE losses for DRE. For the CPE problem, we argue that the LSIF loss is useful in the regime where one wishes to rank instances with maximal accuracy at the head of the ranking. In the course of our analysis, we establish a Bregman divergence identity that may be of independent interest.

We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them. SVRG and related methods have recently surged into prominence for convex optimization given their edge over stochastic gradient descent (SGD), but their theoretical analysis almost exclusively assumes convexity. In contrast, we prove non-asymptotic rates of convergence (to stationary points) of SVRG for nonconvex optimization, and show that it is provably faster than SGD and gradient descent. We also analyze a subclass of nonconvex problems on which SVRG attains linear convergence to the global optimum. We extend our analysis to mini-batch variants of SVRG, showing (theoretical) linear speedup due to minibatching in parallel settings.

Hierarchical Variational Models Rajesh Ranganath . Dustin Tran Columbia University . David Blei Columbia University Paper Abstract: Black box variational inference allows researchers to easily prototype and evaluate an array of models. Recent advances allow such algorithms to scale to high dimensions. However, a central question remains: how can we specify an expressive variational distribution that maintains efficient computation? To address this, we develop hierarchical variational models (HVMs). HVMs augment a variational approximation with a prior on its parameters, which allows it to capture complex structure for both discrete and continuous latent variables. The algorithm we develop is black box, can be used for any HVM, and has the same computational efficiency as the original approximation. We study HVMs on a variety of deep discrete latent variable models. 
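As a concrete reference point for the SVRG scheme analyzed above, here is a minimal numpy sketch of the basic variance-reduced update on a finite-sum objective (a toy least-squares problem; the step size, epoch length, and function names are illustrative assumptions, not the paper's experimental setup):

```python
import numpy as np

def svrg(grad_i, x0, n, step=0.1, epochs=20, m=None):
    """Basic SVRG loop for min_x (1/n) sum_i f_i(x).

    grad_i(x, i) returns the gradient of the i-th component at x.
    """
    m = m or 2 * n                      # inner-loop length per epoch
    x_tilde = x0.copy()
    for _ in range(epochs):
        # full gradient at the snapshot point
        full_grad = np.mean([grad_i(x_tilde, i) for i in range(n)], axis=0)
        x = x_tilde.copy()
        for _ in range(m):
            i = np.random.randint(n)
            # variance-reduced stochastic gradient
            v = grad_i(x, i) - grad_i(x_tilde, i) + full_grad
            x -= step * v
        x_tilde = x                     # snapshot for the next epoch
    return x_tilde

# toy usage: least squares with rows of A as components
rng = np.random.default_rng(0)
A, b = rng.normal(size=(50, 5)), rng.normal(size=50)
g = lambda x, i: (A[i] @ x - b[i]) * A[i]
x_hat = svrg(g, np.zeros(5), n=50)
print(np.linalg.norm(A @ x_hat - b))
```

The control variate built from the snapshot gradient is what lets SVRG use a constant step size, in contrast to the decaying schedules plain SGD needs.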
HVMs generalize other expressive variational distributions and maintain higher fidelity to the posterior.

The field of mobile health (mHealth) has the potential to yield new insights into health and behavior through the analysis of continuously recorded data from wearable health and activity sensors. In this paper, we present a hierarchical span-based conditional random field model for the key problem of jointly detecting discrete events in such sensor data streams and segmenting these events into high-level activity sessions. Our model includes higher-order cardinality factors and inter-event duration factors to capture domain-specific structure in the label space. We show that our model supports exact MAP inference in quadratic time via dynamic programming, which we leverage to perform learning in the structured support vector machine framework. We apply the model to the problems of smoking and eating detection using four real data sets. Our results show statistically significant improvements in segmentation performance relative to a hierarchical pairwise CRF.

Binary embeddings with structured hashed projections Anna Choromanska Courant Institute, NYU . Krzysztof Choromanski Google Research NYC . Mariusz Bojarski NVIDIA . Tony Jebara Columbia . Sanjiv Kumar . Yann LeCun Paper Abstract: We consider the hashing mechanism for constructing binary embeddings that involves pseudo-random projections followed by nonlinear (sign function) mappings. The pseudorandom projection is described by a matrix, where not all entries are independent random variables but instead a fixed budget of randomness is distributed across the matrix. Such matrices can be efficiently stored in sub-quadratic or even linear space, provide a reduction in randomness usage (i.e. the number of required random values), and very often lead to computational speed-ups. We prove several theoretical results showing that projections via various structured matrices followed by nonlinear mappings accurately preserve the angular distance between input high-dimensional vectors. To the best of our knowledge, these results are the first that give theoretical ground for the use of general structured matrices in the nonlinear setting. In particular, they generalize previous extensions of the Johnson-Lindenstrauss lemma and prove the plausibility of the approach that was so far only heuristically confirmed for some special structured matrices. Consequently, we show that many structured matrices can be used as an efficient information compression mechanism. Our findings build a better understanding of certain deep architectures, which contain randomly weighted and untrained layers, and yet achieve high performance on different learning tasks. We empirically verify our theoretical findings and show how learning via structured hashed projections affects the performance of a neural network as well as a nearest neighbor classifier.

A Variational Analysis of Stochastic Gradient Algorithms Stephan Mandt Columbia University . Matthew Hoffman Adobe Research . David Blei Columbia University Paper Abstract: Stochastic Gradient Descent (SGD) is an important algorithm in machine learning. With constant learning rates, it is a stochastic process that, after an initial phase of convergence, generates samples from a stationary distribution. We show that SGD with constant rates can be effectively used as an approximate posterior inference algorithm for probabilistic modeling. 
Specifically, we show how to adjust the tuning parameters of SGD so as to match the resulting stationary distribution to the posterior. This analysis rests on interpreting SGD as a continuous-time stochastic process and then minimizing the Kullback-Leibler divergence between its stationary distribution and the target posterior. (This is in the spirit of variational inference.) In more detail, we model SGD as a multivariate Ornstein-Uhlenbeck process and then use properties of this process to derive the optimal parameters. This theoretical framework also connects SGD to modern scalable inference algorithms; we analyze the recently proposed stochastic gradient Fisher scoring under this perspective. We demonstrate that SGD with properly chosen constant rates gives a new way to optimize hyperparameters in probabilistic models.

This paper proposes a new mechanism for sampling training instances for stochastic gradient descent (SGD) methods by exploiting any side information associated with the instances (e.g. class labels) to improve convergence. Previous methods have relied either on sampling from a distribution defined over training instances or from a static distribution that is fixed before training. This results in two problems: (a) any distribution that is set a priori is independent of how the optimization progresses, and (b) maintaining a distribution over individual instances could be infeasible in large-scale scenarios. In this paper, we exploit the side information associated with the instances to tackle both problems. More specifically, we maintain a distribution over classes (instead of individual instances) that is adaptively estimated during the course of optimization to give the maximum reduction in the variance of the gradient. Intuitively, we sample more from those regions in space that have a larger gradient contribution. Our experiments on highly multiclass datasets show that our proposal converges significantly faster than existing techniques.

Tensor regression has been shown to be advantageous in learning tasks with multi-directional relatedness. Given massive multiway data, traditional methods are often too slow to operate on or suffer from memory bottlenecks. In this paper, we introduce subsampled tensor projected gradient to solve the problem. Our algorithm is impressively simple and efficient. It is built upon the projected gradient method with fast tensor power iterations, leveraging randomized sketching for further acceleration. Theoretical analysis shows that our algorithm converges to the correct solution in a fixed number of iterations. The memory requirement grows linearly with the size of the problem. We demonstrate superior empirical performance on both multi-linear multi-task learning and spatio-temporal applications.

This paper presents a novel distributed variational inference framework that unifies many parallel sparse Gaussian process regression (SGPR) models for scalable hyperparameter learning with big data. To achieve this, our framework exploits a structured correlated noise process model that represents the observation noises as a finite realization of a high-order Gaussian Markov random process. By varying the Markov order and covariance function for the noise process model, different variational SGPR models result. This consequently allows the correlation structure of the noise process model to be characterized for which a particular variational SGPR model is optimal. 
We empirically evaluate the predictive performance and scalability of the distributed variational SGPR models unified by our framework on two real-world datasets.

Online Stochastic Linear Optimization under One-bit Feedback Lijun Zhang Nanjing University . Tianbao Yang University of Iowa . Rong Jin Alibaba Group . Yichi Xiao Nanjing University . Zhi-Hua Zhou Paper Abstract: In this paper, we study a special bandit setting of online stochastic linear optimization, where only one bit of information is revealed to the learner at each round. This problem has found many applications, including online advertising and online recommendation. We assume the binary feedback is a random variable generated from the logit model, and aim to minimize the regret defined by the unknown linear function. Although the existing method for generalized linear bandits can be applied to our problem, the high computational cost makes it impractical for real-world applications. To address this challenge, we develop an efficient online learning algorithm by exploiting particular structures of the observation model. Specifically, we adopt an online Newton step to estimate the unknown parameter and derive a tight confidence region based on the exponential concavity of the logistic loss. Our analysis shows that the proposed algorithm achieves a regret bound of $O(d\sqrt{T})$, which matches the optimal result of stochastic linear bandits.

We present an adaptive online gradient descent algorithm to solve online convex optimization problems with long-term constraints, which are constraints that need to be satisfied when accumulated over a finite number of rounds $T$, but can be violated in intermediate rounds. For some user-defined trade-off parameter $\beta \in (0, 1)$, the proposed algorithm achieves cumulative regret bounds of $O(T^{\max\{\beta,\,1-\beta\}})$ and $O(T^{1-\beta/2})$, respectively, for the loss and the constraint violations. Our results hold for convex losses, can handle arbitrary convex constraints and rely on a single computationally efficient algorithm. Our contributions improve over the best known cumulative regret bounds of Mahdavi et al. (2012), which are respectively $O(T^{1/2})$ and $O(T^{3/4})$ for general convex domains, and respectively $O(T^{2/3})$ and $O(T^{2/3})$ when the domain is further restricted to be a polyhedral set. We supplement the analysis with experiments validating the performance of our algorithm in practice.

Motivated by an application of eliciting users' preferences, we investigate the problem of learning hemimetrics, i.e. pairwise distances among a set of $n$ items that satisfy triangle inequalities and non-negativity constraints. In our application, the (asymmetric) distances quantify private costs a user incurs when substituting one item by another. We aim to learn these distances (costs) by asking the users whether they are willing to switch from one item to another for a given incentive offer. Without exploiting structural constraints of the hemimetric polytope, learning the distances between each pair of items requires $\Theta(n^2)$ queries. We propose an active learning algorithm that substantially reduces this sample complexity by exploiting the structural constraints on the version space of hemimetrics. Our proposed algorithm achieves provably-optimal sample complexity for various instances of the task. For example, when the items are embedded into $K$ tight clusters, the sample complexity of our algorithm reduces to $O(nK)$. Extensive experiments on a restaurant recommendation data set support the conclusions of our theoretical analysis. 
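The one-bit logit feedback setting described earlier in this paragraph can be illustrated with a simplified sketch: an online Newton-style update for a logistic model, with a running curvature matrix updated by rank-one terms via the Sherman-Morrison formula. This is only a schematic of the general idea under assumed names and constants; the confidence-region construction and arm selection of the actual algorithm are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class OnlineNewtonLogit:
    """Online Newton-style estimation of theta in a logit model
    P(y = 1 | x) = sigmoid(<theta, x>) from one-bit feedback."""

    def __init__(self, d, reg=1.0):
        self.theta = np.zeros(d)
        self.A_inv = np.eye(d) / reg        # inverse of a regularized curvature proxy

    def update(self, x, y):
        # gradient of the logistic loss at the current estimate
        g = (sigmoid(self.theta @ x) - y) * x
        # rank-one curvature update via Sherman-Morrison
        Ax = self.A_inv @ x
        self.A_inv -= np.outer(Ax, Ax) / (1.0 + x @ Ax)
        # Newton-style step
        self.theta -= self.A_inv @ g

# toy usage: recover a hidden parameter from binary responses
rng = np.random.default_rng(0)
theta_star = rng.normal(size=5)
theta_star /= np.linalg.norm(theta_star)
learner = OnlineNewtonLogit(d=5)
for _ in range(2000):
    x = rng.normal(size=5)
    y = float(rng.random() < sigmoid(theta_star @ x))
    learner.update(x, y)
print(np.linalg.norm(learner.theta - theta_star))
```

The effective step size shrinks automatically as the curvature matrix accumulates observations, which is the property the confidence-region analysis builds on.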
We present an approach for learning simple algorithms such as copying, multi-digit addition and single-digit multiplication directly from examples. Our framework consists of a set of interfaces, accessed by a controller. Typical interfaces are 1-D tapes or 2-D grids that hold the input and output data. For the controller, we explore a range of neural network-based models which vary in their ability to abstract the underlying algorithm from training instances and generalize to test examples with many thousands of digits. The controller is trained using Q-learning with several enhancements, and we show that the bottleneck is in the capabilities of the controller rather than in the search incurred by Q-learning.

Learning Physical Intuition of Block Towers by Example Adam Lerer Facebook AI Research . Sam Gross Facebook AI Research . Rob Fergus Facebook AI Research Paper Abstract: Wooden blocks are a common toy for infants, allowing them to develop motor skills and gain intuition about the physical behavior of the world. In this paper, we explore the ability of deep feed-forward models to learn such intuitive physics. Using a 3D game engine, we create small towers of wooden blocks whose stability is randomized and render them collapsing (or remaining upright). This data allows us to train large convolutional network models which can accurately predict the outcome, as well as estimate the trajectories of the blocks. The models are also able to generalize in two important ways: (i) to new physical scenarios, e.g. towers with an additional block, and (ii) to images of real wooden blocks, where they obtain performance comparable to human subjects.

Structure Learning of Partitioned Markov Networks Song Liu The Institute of Statistical Mathematics . Taiji Suzuki . Masashi Sugiyama University of Tokyo . Kenji Fukumizu The Institute of Statistical Mathematics Paper Abstract: We learn the structure of a Markov Network between two groups of random variables from joint observations. Since modelling and learning the full MN structure may be hard, learning the links between two groups directly may be a preferable option. We introduce a novel concept called the partitioned ratio, whose factorization directly associates with the Markovian properties of random variables across two groups. A simple one-shot convex optimization procedure is proposed for learning the factorization of the partitioned ratio, and it is theoretically guaranteed to recover the correct inter-group structure under mild conditions. The performance of the proposed method is experimentally compared with state-of-the-art MN structure learning methods using ROC curves. Real applications on analyzing bipartisanship in the US Congress and pairwise DNA/time-series alignments are also reported.

This work focuses on the dynamic regret of online convex optimization, which compares the performance of online learning to a clairvoyant who knows the sequence of loss functions in advance and hence selects the minimizer of the loss function at each step. By assuming that the clairvoyant moves slowly (i.e. the minimizers change slowly), we present several improved variation-based upper bounds on the dynamic regret under the true and noisy gradient feedback, which match the presented lower bounds. The key to our analysis is to explore a regularity metric that measures the temporal changes in the clairvoyant's minimizers, to which we refer as path variation. 
Firstly, we present a general lower bound in terms of the path variation, and then show that under full information or gradient feedback we are able to achieve an optimal dynamic regret. Secondly, we present a lower bound with noisy gradient feedback and then show that we can achieve optimal dynamic regrets under a stochastic gradient feedback and two-point bandit feedback. Moreover, for a sequence of smooth loss functions that admit a small variation in the gradients, our dynamic regret under the two-point bandit feedback matches the one achieved with full information.

Beyond CCA: Moment Matching for Multi-View Models Anastasia Podosinnikova INRIA - ENS . Francis Bach Inria . Simon Lacoste-Julien INRIA Paper Abstract: We introduce three novel semi-parametric extensions of probabilistic canonical correlation analysis with identifiability guarantees. We consider moment matching techniques for estimation in these models. For that, by drawing explicit links between the new models and a discrete version of independent component analysis (DICA), we first extend the DICA cumulant tensors to the new discrete version of CCA. By further using a close connection with independent component analysis, we introduce generalized covariance matrices, which can replace the cumulant tensors in the moment matching framework, and, therefore, improve sample complexity and simplify derivations and algorithms significantly. As the tensor power method or orthogonal joint diagonalization are not applicable in the new setting, we use non-orthogonal joint diagonalization techniques for matching the cumulants. We demonstrate the performance of the proposed models and estimation techniques in experiments with both synthetic and real datasets.

We present two computationally inexpensive techniques for estimating the numerical rank of a matrix, combining powerful tools from computational linear algebra. These techniques exploit three key ingredients. The first is to approximate the projector onto the non-null invariant subspace of the matrix by using a polynomial filter. Two types of filters are discussed, one based on Hermite interpolation and the other based on Chebyshev expansions. The second ingredient employs stochastic trace estimators to compute the rank of this wanted eigen-projector, which yields the desired rank of the matrix. In order to obtain a good filter, it is necessary to detect a gap between the eigenvalues that correspond to noise and the relevant eigenvalues that correspond to the non-null invariant subspace. The third ingredient of the proposed approaches exploits the idea of spectral density, popular in physics, and the Lanczos spectroscopic method to locate this gap.

Unsupervised Deep Embedding for Clustering Analysis Junyuan Xie University of Washington . Ross Girshick Facebook . Ali Farhadi University of Washington Paper Abstract: Clustering is central to many data-driven application domains and has been studied extensively in terms of distance functions and grouping algorithms. Relatively little work has focused on learning representations for clustering. In this paper, we propose Deep Embedded Clustering (DEC), a method that simultaneously learns feature representations and cluster assignments using deep neural networks. DEC learns a mapping from the data space to a lower-dimensional feature space in which it iteratively optimizes a clustering objective. Our experimental evaluations on image and text corpora show significant improvement over state-of-the-art methods. 
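A small numpy sketch of the clustering step that DEC iterates (shapes and variable names here are assumptions, and the deep feature extractor itself is omitted): soft assignments follow a Student's t kernel between embedded points and cluster centroids, and a sharpened target distribution is derived from them for a KL-divergence objective.

```python
import numpy as np

def soft_assignments(Z, mu):
    """Student's t similarity (one degree of freedom) between embedded points
    Z (n, d) and cluster centroids mu (k, d), normalized per point."""
    d2 = ((Z[:, None, :] - mu[None, :, :]) ** 2).sum(-1)   # (n, k) squared distances
    q = 1.0 / (1.0 + d2)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened targets: square the assignments and renormalize by the soft
    cluster frequencies, emphasizing high-confidence points."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)

def kl_objective(p, q):
    return np.sum(p * np.log(p / q))

# toy usage: random embeddings and 3 centroids
rng = np.random.default_rng(0)
Z, mu = rng.normal(size=(100, 10)), rng.normal(size=(3, 10))
q = soft_assignments(Z, mu)
p = target_distribution(q)
print(kl_objective(p, q))   # the loss minimized with respect to Z and mu
```

In training, gradients of this KL objective flow back through Z into the network that produced the embeddings, so representation and cluster assignments improve jointly.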
Dimensionality reduction is a popular approach for dealing with high-dimensional data that leads to substantial computational savings. Random projections are a simple and effective method for universal dimensionality reduction with rigorous theoretical guarantees. In this paper, we theoretically study the problem of differentially private empirical risk minimization in the projected subspace (compressed domain). Empirical risk minimization (ERM) is a fundamental technique in statistical machine learning that forms the basis for various learning algorithms. Starting from the results of Chaudhuri et al. (NIPS 2009, JMLR 2011), there is a long line of work in designing differentially private algorithms for empirical risk minimization problems that operate in the original data space. We ask: is it possible to design differentially private algorithms with small excess risk given access to only projected data? In this paper, we answer this question in the affirmative by showing that, for the class of generalized linear functions, we can obtain excess risk bounds under $\epsilon$-differential privacy and under $(\epsilon, \delta)$-differential privacy, given only the projected data and the projection matrix; the bounds depend on the sample size $n$ and the Gaussian width $w(\Theta)$ of the parameter space that we optimize over. Our strategy is based on adding noise for privacy in the projected subspace and then lifting the solution to the original space by using high-dimensional estimation techniques. A simple consequence of these results is that, for a large class of ERM problems, in the traditional setting (i.e. with access to the original data), under $\epsilon$-differential privacy, we improve the worst-case risk bounds of Bassily et al. (FOCS 2014).

We consider the maximum likelihood parameter estimation problem for a generalized Thurstone choice model, where choices are from comparison sets of two or more items. We provide tight characterizations of the mean square error, as well as necessary and sufficient conditions for correct classification when each item belongs to one of two classes. These results provide insights into how the estimation accuracy depends on the choice of a generalized Thurstone choice model and the structure of comparison sets. We find that for a priori unbiased structures of comparisons, e.g. when comparison sets are drawn independently and uniformly at random, the number of observations needed to achieve a prescribed estimation accuracy depends on the choice of a generalized Thurstone choice model. For a broad set of generalized Thurstone choice models, which includes all popular instances used in practice, the estimation error is shown to be largely insensitive to the cardinality of comparison sets. On the other hand, we find that there exist generalized Thurstone choice models for which the estimation error decreases much faster with the cardinality of comparison sets.

Large-Margin Softmax Loss for Convolutional Neural Networks Weiyang Liu Peking University . Yandong Wen South China University of Technology . Zhiding Yu Carnegie Mellon University . Meng Yang Shenzhen University Paper Abstract: Cross-entropy loss together with softmax is arguably one of the most commonly used supervision components in convolutional neural networks (CNNs). Despite its simplicity, popularity and excellent performance, the component does not explicitly encourage discriminative learning of features. 
In this paper, we propose a generalized large-margin softmax (L-Softmax) loss which explicitly encourages intra-class compactness and inter-class separability between learned features. Moreover, L-Softmax can not only adjust the desired margin but also avoid overfitting. We also show that the L-Softmax loss can be optimized by typical stochastic gradient descent. Extensive experiments on four benchmark datasets demonstrate that the deeply-learned features with the L-Softmax loss become more discriminative, hence significantly boosting the performance on a variety of visual classification and verification tasks.

A Random Matrix Approach to Echo-State Neural Networks Romain Couillet CentraleSupelec . Gilles Wainrib ENS Ulm, Paris, France . Hafiz Tiomoko Ali CentraleSupelec, Gif-sur-Yvette, France . Harry Sevi ENS Lyon Paper Abstract: Recurrent neural networks, especially in their linear version, have provided many qualitative insights into their performance under different configurations. This article provides, through a novel random matrix framework, the quantitative counterpart of these performance results, specifically in the case of echo-state networks. Beyond mere insights, our approach conveys a deeper understanding of the core mechanism at play for both training and testing.

One-hot CNN (convolutional neural network) has been shown to be effective for text categorization (Johnson & Zhang, 2015). We view it as a special case of a general framework which jointly trains a linear model with a non-linear feature generator consisting of 'text region embedding + pooling'. Under this framework, we explore a more sophisticated region embedding method using Long Short-Term Memory (LSTM). LSTM can embed text regions of variable (and possibly large) sizes, whereas the region size needs to be fixed in a CNN. We seek effective and efficient use of LSTM for this purpose in the supervised and semi-supervised settings. The best results were obtained by combining region embeddings in the form of LSTM and convolution layers trained on unlabeled data. The results indicate that on this task, embeddings of text regions, which can convey complex concepts, are more useful than embeddings of single words in isolation. We report performances exceeding the previous best results on four benchmark datasets.

Crowdsourcing systems are popular for solving large-scale labelling tasks with low-paid (or even non-paid) workers. We study the problem of recovering the true labels from noisy crowdsourced labels under the popular Dawid-Skene model. To address this inference problem, several algorithms have recently been proposed, but the best known guarantee is still significantly larger than the fundamental limit. We close this gap under a simple but canonical scenario where each worker is assigned at most two tasks. In particular, we introduce a tighter lower bound on the fundamental limit and prove that Belief Propagation (BP) exactly matches this lower bound. The guaranteed optimality of BP is the strongest in the sense that it is information-theoretically impossible for any other algorithm to correctly label a larger fraction of the tasks. In the general setting, when more than two tasks are assigned to each worker, we establish a dominance result for BP: it outperforms other existing algorithms with known provable guarantees. Experimental results suggest that BP is close to optimal for all regimes considered, while existing state-of-the-art algorithms exhibit suboptimal performances. 
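To make the margin idea behind L-Softmax (described earlier in this paragraph) concrete, here is a minimal numpy sketch of the forward pass only, with assumed shapes, margin parameter m, and function names; it is not the authors' implementation. The target-class logit is rescaled by a piecewise angular function that shrinks it unless the angle to the class weight vector is small:

```python
import numpy as np

def psi(theta, m):
    # piecewise angular margin: psi(theta) = (-1)^k cos(m*theta) - 2k
    # for theta in [k*pi/m, (k+1)*pi/m]
    k = np.floor(theta * m / np.pi)
    return ((-1.0) ** k) * np.cos(m * theta) - 2.0 * k

def l_softmax_logits(x, W, y, m=2):
    """x: (d,) feature vector, W: (d, C) class weights, y: true class index."""
    logits = x @ W                              # ||W_j|| ||x|| cos(theta_j)
    norm_x = np.linalg.norm(x)
    norm_wy = np.linalg.norm(W[:, y])
    cos_ty = logits[y] / (norm_wy * norm_x + 1e-12)
    theta_y = np.arccos(np.clip(cos_ty, -1.0, 1.0))
    # enlarge the angular margin on the target class only
    logits[y] = norm_wy * norm_x * psi(theta_y, m)
    return logits

# toy usage: 3 classes in a 4-dimensional feature space
rng = np.random.default_rng(0)
print(l_softmax_logits(rng.normal(size=4), rng.normal(size=(4, 3)), y=1, m=2))
```

With m = 1 the function reduces to cos(theta) and the standard softmax logits are recovered; larger m forces a larger angular margin before the target class can dominate.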
Learning control has become an appealing alternative to the derivation of control laws based on classic control theory. However, a major shortcoming of learning control is the lack of performance guarantees, which prevents its application in many real-world scenarios. As a step in this direction, we provide a stability analysis tool for controllers acting on dynamics represented by Gaussian processes (GPs). We consider arbitrary Markovian control policies and system dynamics given as (i) the mean of a GP, and (ii) the full GP distribution. For the first case, our tool finds a state-space region where the closed-loop system is provably stable. In the second case, it is well known that infinite-horizon stability guarantees cannot exist. Instead, our tool analyzes finite-time stability. Empirical evaluations on simulated benchmark problems support our theoretical results.

Learning a classifier from private data distributed across multiple parties is an important problem that has many potential applications. How can we build an accurate and differentially private global classifier by combining locally-trained classifiers from different parties, without access to any party's private data? We propose to transfer the knowledge of the local classifier ensemble by first creating labeled data from auxiliary unlabeled data, and then training a global differentially private classifier. We show that majority voting is too sensitive and therefore propose a new risk weighted by class probabilities estimated from the ensemble. Relative to a non-private solution, our private solution has a generalization error bound that vanishes as the number of participating parties $M$ grows. This allows strong privacy without performance loss when $M$ is large, such as in crowdsensing applications. We demonstrate the performance of our framework with realistic tasks of activity recognition, network intrusion detection, and malicious URL detection.

Network Morphism Tao Wei University at Buffalo . Changhu Wang Microsoft Research . Yong Rui Microsoft Research . Chang Wen Chen Paper Abstract: We present a systematic study on how to morph a well-trained neural network into a new one so that its network function can be completely preserved. We define this as network morphism in this research. After morphing a parent network, the child network is expected to inherit the knowledge from its parent network and also to have the potential to continue growing into a more powerful one with a much shortened training time. The first requirement for this network morphism is its ability to handle diverse morphing types of networks, including changes of depth, width, kernel size, and even subnet. To meet this requirement, we first introduce the network morphism equations, and then develop novel morphing algorithms for all these morphing types for both classic and convolutional neural networks. The second requirement is its ability to deal with non-linearity in a network. We propose a family of parametric-activation functions to facilitate the morphing of any continuous non-linear activation neurons. Experimental results on benchmark datasets and typical neural networks demonstrate the effectiveness of the proposed network morphism scheme.

Second-order optimization methods such as natural gradient descent have the potential to speed up training of neural networks by correcting for the curvature of the loss function. 
Unfortunately, the exact natural gradient is impractical to compute for large models, and most approximations either require an expensive iterative procedure or make crude approximations to the curvature. We present Kronecker Factors for Convolution (KFC), a tractable approximation to the Fisher matrix for convolutional networks based on a structured probabilistic model for the distribution over backpropagated derivatives. Similarly to the recently proposed Kronecker-Factored Approximate Curvature (K-FAC), each block of the approximate Fisher matrix decomposes as the Kronecker product of small matrices, allowing for efficient inversion. KFC captures important curvature information while still yielding comparably efficient updates to stochastic gradient descent (SGD). We show that the updates are invariant to commonly used reparameterizations, such as centering of the activations. In our experiments, approximate natural gradient descent with KFC was able to train convolutional networks several times faster than carefully tuned SGD. Furthermore, it was able to train the networks in 10-20 times fewer iterations than SGD, suggesting its potential applicability in a distributed setting.

Budget-constrained optimal design of experiments is a classical problem in statistics. Although the optimal design literature is very mature, few efficient strategies are available when these design problems appear in the context of sparse linear models commonly encountered in high-dimensional machine learning and statistics. In this work, we study experimental design for the setting where the underlying regression model is characterized by an $\ell_1$-regularized linear function. We propose two novel strategies: the first is motivated geometrically whereas the second is algebraic in nature. We obtain tractable algorithms for this problem, and our results also hold for a more general class of sparse linear models. We perform an extensive set of experiments, on benchmarks and a large multi-site neuroscience study, showing that the proposed models are effective in practice. The latter experiment suggests that these ideas may play a small role in informing enrollment strategies for similar scientific studies in the short-to-medium term future.

Minding the Gaps for Block Frank-Wolfe Optimization of Structured SVMs Anton Osokin . Jean-Baptiste Alayrac ENS . Isabella Lukasewitz INRIA . Puneet Dokania INRIA and Ecole Centrale Paris . Simon Lacoste-Julien INRIA Paper Abstract: In this paper, we propose several improvements on the block-coordinate Frank-Wolfe (BCFW) algorithm from Lacoste-Julien et al. (2013), recently used to optimize the structured support vector machine (SSVM) objective in the context of structured prediction, though it has wider applications. The key intuition behind our improvements is that the estimates of block gaps maintained by BCFW reveal the block suboptimality that can be used as an adaptive criterion. First, we sample objects at each iteration of BCFW in an adaptive non-uniform way via gap-based sampling. Second, we incorporate pairwise and away-step variants of Frank-Wolfe into the block-coordinate setting. Third, we cache oracle calls with a cache-hit criterion based on the block gaps. Fourth, we provide the first method to compute an approximate regularization path for SSVM. Finally, we provide an exhaustive empirical evaluation of all our methods on four structured prediction datasets.

Exact Exponent in Optimal Rates for Crowdsourcing Chao Gao Yale University . Yu Lu Yale University . 
Dengyong Zhou Microsoft Research Paper Abstract: Crowdsourcing has become a popular tool for labeling large datasets. This paper studies the optimal error rate for aggregating crowdsourced labels provided by a collection of amateur workers. Under the Dawid-Skene probabilistic model, we establish matching upper and lower bounds with an exact exponent $mI(\pi)$, where $m$ is the number of workers and $I(\pi)$ is the average Chernoff information that characterizes the workers' collective ability. Such an exact characterization of the error exponent allows us to state a precise sample size requirement $m \ge \frac{1}{I(\pi)}\log\frac{1}{\epsilon}$ in order to achieve a misclassification error of $\epsilon$. In addition, our results imply optimality of various forms of EM algorithms given accurate initializers of the model parameters.

Unsupervised learning and supervised learning are key research topics in deep learning. However, as high-capacity supervised neural networks trained with a large number of labels have achieved remarkable success in many computer vision tasks, the availability of large-scale labeled images has reduced the significance of unsupervised learning. Inspired by the recent trend toward revisiting the importance of unsupervised learning, we investigate joint supervised and unsupervised learning in a large-scale setting by augmenting existing neural networks with decoding pathways for reconstruction. First, we demonstrate that the intermediate activations of pretrained large-scale classification networks preserve almost all the information of input images except a portion of local spatial details. Then, by end-to-end training of the entire augmented architecture with the reconstructive objective, we show improvement of the network performance for supervised tasks. We evaluate several variants of autoencoders, including the recently proposed "what-where" autoencoder that uses the encoder pooling switches, to study the importance of the architecture design. Taking the 16-layer VGGNet trained under the ImageNet ILSVRC 2012 protocol as a strong baseline for image classification, our methods improve the validation-set accuracy by a noticeable margin.

Low-rank representation (LRR) has been a significant method for segmenting data that are generated from a union of subspaces. It is also known that solving LRR is challenging in terms of time complexity and memory footprint, in that the size of the nuclear-norm-regularized matrix is $n$-by-$n$ (where $n$ is the number of samples). In this paper, we thereby develop a novel online implementation of LRR that reduces the memory cost from $O(n^2)$ to $O(pd)$, with $p$ being the ambient dimension and $d$ being some estimated rank ($d \ll n$).

Converting deep convolutional networks (DCNs) to fixed point yields more than a 20% reduction in the model size without any loss in accuracy on the CIFAR-10 benchmark. We also demonstrate that fine-tuning can further enhance the accuracy of fixed point DCNs beyond that of the original floating point model. In doing so, we report a new state-of-the-art fixed point performance of a 6.78% error rate on the CIFAR-10 benchmark.

Provable Algorithms for Inference in Topic Models Sanjeev Arora Princeton University . Rong Ge . Frederic Koehler Princeton University . Tengyu Ma Princeton University . Ankur Moitra Paper Abstract: Recently, there has been considerable progress on designing algorithms with provable guarantees (typically using linear algebraic methods) for parameter learning in latent variable models. Designing provable algorithms for inference has proved more difficult. Here we take a first step towards provable inference in topic models. 
We leverage a property of topic models that enables us to construct simple linear estimators for the unknown topic proportions that have small variance, and consequently can work with short documents. Our estimators also correspond to finding an estimate around which the posterior is well-concentrated. We also show lower bounds demonstrating that for shorter documents it can be information-theoretically impossible to find the hidden topics. Finally, we give empirical results that demonstrate that our algorithm works on realistic topic models. It yields good solutions on synthetic data and runs in time comparable to a single iteration of Gibbs sampling.

This paper develops an approach for efficiently solving general convex optimization problems specified as disciplined convex programs (DCP), a common general-purpose modeling framework. Specifically, we develop an algorithm based upon fast epigraph projections, projections onto the epigraph of a convex function, an approach closely linked to proximal operator methods. We show that by using these operators, we can solve any disciplined convex program without transforming the problem to a standard cone form, as is done by current DCP libraries. We then develop a large library of efficient epigraph projection operators, mirroring and extending work on fast proximal algorithms, for many common convex functions. Finally, we evaluate the performance of the algorithm, and show it often achieves order-of-magnitude speedups over existing general-purpose optimization solvers.

We study the fixed design segmented regression problem: given noisy samples from a piecewise linear function f, we want to recover f up to a desired accuracy in mean-squared error. Previous rigorous approaches for this problem rely on dynamic programming (DP) and, while sample efficient, have running time quadratic in the sample size. As our main contribution, we provide new near-linear time algorithms for the problem that, while not being minimax optimal, achieve a significantly better sample-time tradeoff on large datasets compared to the DP approach. Our experimental evaluation shows that, compared with the DP approach, our algorithms provide a convergence rate that is only off by a factor of 2 to 4, while achieving speedups of three orders of magnitude.

Energetic Natural Gradient Descent Philip Thomas CMU . Bruno Castro da Silva . Christoph Dann Carnegie Mellon University . Emma Brunskill Paper Abstract: We propose a new class of algorithms for minimizing or maximizing functions of parametric probabilistic models. These new algorithms are natural gradient algorithms that leverage more information than prior methods by using a new metric tensor in place of the commonly used Fisher information matrix. This new metric tensor is derived by computing directions of steepest ascent where the distance between distributions is measured using an approximation of energy distance (as opposed to Kullback-Leibler divergence, which produces the Fisher information matrix), and so we refer to our new ascent direction as the energetic natural gradient.

Partition Functions from Rao-Blackwellized Tempered Sampling David Carlson Columbia University . Patrick Stinson Columbia University . Ari Pakman Columbia University . Liam Paninski Paper Abstract: Partition functions of probability distributions are important quantities for model evaluation and comparisons. We present a new method to compute partition functions of complex and multimodal distributions. 
Such distributions are often sampled using simulated tempering, which augments the target space with an auxiliary inverse temperature variable. Our method exploits the multinomial probability law of the inverse temperatures, and provides estimates of the partition function in terms of a simple quotient of Rao-Blackwellized marginal inverse temperature probability estimates, which are updated while sampling. We show that the method has interesting connections with several alternative popular methods, and offers some significant advantages. In particular, we empirically find that the new method provides more accurate estimates than Annealed Importance Sampling when calculating partition functions of large Restricted Boltzmann Machines (RBMs); moreover, the method is sufficiently accurate to track training and validation log-likelihoods during learning of RBMs, at minimal computational cost.

In this paper we address the identifiability and efficient learning problems of finite mixtures of Plackett-Luce models for rank data. We prove that for any $k \ge 2$, the mixture of $k$ Plackett-Luce models for no more than $2k-1$ alternatives is non-identifiable, and this bound is tight for $k = 2$. For generic identifiability, we prove that the mixture of $k$ Plackett-Luce models over $m$ alternatives is generically identifiable if $k \le \lfloor \frac{m-2}{2} \rfloor!$. We also propose an efficient generalized method of moments (GMM) algorithm to learn the mixture of two Plackett-Luce models and show that the algorithm is consistent. Our experiments show that our GMM algorithm is significantly faster than the EMM algorithm by Gormley & Murphy (2008), while achieving competitive statistical efficiency.

The combinatorial explosion that plagues planning and reinforcement learning (RL) algorithms can be moderated using state abstraction. Prohibitively large task representations can be condensed such that essential information is preserved, and consequently, solutions are tractably computable. However, exact abstractions, which treat only fully-identical situations as equivalent, fail to present opportunities for abstraction in environments where no two situations are exactly alike. In this work, we investigate approximate state abstractions, which treat nearly-identical situations as equivalent. We present theoretical guarantees on the quality of behaviors derived from four types of approximate abstractions. Additionally, we empirically demonstrate that approximate abstractions lead to a reduction in task complexity and bounded loss of optimality of behavior in a variety of environments.

Power of Ordered Hypothesis Testing Lihua Lei . William Fithian UC Berkeley, Department of Statistics Paper Abstract: Ordered testing procedures are multiple testing procedures that exploit a pre-specified ordering of the null hypotheses, from most to least promising. We analyze and compare the power of several recent proposals using the asymptotic framework of Li & Barber (2015). While accumulation tests including ForwardStop can be quite powerful when the ordering is very informative, they are asymptotically powerless when the ordering is weaker. By contrast, Selective SeqStep, proposed by Barber & Candès (2015), is much less sensitive to the quality of the ordering. We compare the power of these procedures in different regimes, concluding that Selective SeqStep dominates accumulation tests if either the ordering is weak or non-null hypotheses are sparse or weak. 
Motivated by our asymptotic analysis, we derive an improved version of Selective SeqStep which we call Adaptive SeqStep, analogous to Storey's improvement on the Benjamini-Hochberg procedure. We compare these methods using the GEO-Query data set analyzed by Li & Barber (2015) and find that Adaptive SeqStep has favorable performance for both good and bad prior orderings.

PHOG: Probabilistic Model for Code Pavol Bielik ETH Zurich . Veselin Raychev ETH Zurich . Martin Vechev ETH Zurich Paper Abstract: We introduce a new generative model for code called probabilistic higher-order grammar (PHOG). PHOG generalizes probabilistic context-free grammars (PCFGs) by allowing conditioning of a production rule beyond the parent non-terminal, thus capturing rich contexts relevant to programs. Even though PHOG is more powerful than a PCFG, it can be learned from data just as efficiently. We trained a PHOG model on a large JavaScript code corpus and show that it is more precise than existing models, while similarly fast. As a result, PHOG can immediately benefit existing programming tools based on probabilistic models of code.

We consider the problem of online prediction in changing environments. In this framework, the performance of a predictor is evaluated as the loss relative to an arbitrarily changing predictor, whose individual components come from a base class of predictors. Typical results in the literature consider different base classes (experts, linear predictors on the simplex, etc.) separately. Introducing an arbitrary mapping inside the mirror descent algorithm, we provide a framework that unifies and extends existing results. As an example, we prove new shifting regret bounds for matrix prediction problems.

Hyperparameter selection generally relies on running multiple full training trials, with selection based on validation-set performance. We propose a gradient-based approach for locally adjusting hyperparameters during training of the model. Hyperparameters are adjusted so as to make the model parameter gradients, and hence updates, more advantageous for the validation cost. We explore the approach for tuning regularization hyperparameters and find that in experiments on MNIST, SVHN and CIFAR-10, the resulting regularization levels are within the optimal regions. The additional computational cost depends on how frequently the hyperparameters are trained, but the tested scheme adds only 30% computational overhead regardless of the model size. Since the method is significantly less computationally demanding than similar gradient-based approaches to hyperparameter optimization, and consistently finds good hyperparameter values, it can be a useful tool for training neural network models.

Many of the recent Trajectory Optimization algorithms alternate between local approximation of the dynamics and a conservative policy update. However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy. In this article, we propose a new model-free algorithm that backpropagates a local quadratic time-dependent Q-Function, allowing the derivation of the policy update in closed form. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics, demonstrating improved performance in comparison to related Trajectory Optimization algorithms that linearize the dynamics. 
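For reference on the ordered-testing discussion above, here is a small numpy sketch of one accumulation-test stopping rule of the kind those procedures are compared against, the ForwardStop rule: given p-values in the pre-specified order, it rejects the longest prefix whose average accumulated penalty stays below the target level alpha. Variable names are assumptions, and Adaptive SeqStep itself is not reproduced here.

```python
import numpy as np

def forward_stop(pvals, alpha=0.1):
    """ForwardStop accumulation test for ordered hypotheses.

    pvals: p-values listed from most to least promising hypothesis.
    Returns k, the number of leading hypotheses to reject (0 if none).
    """
    p = np.clip(np.asarray(pvals, dtype=float), 0.0, 1.0 - 1e-12)
    penalties = -np.log1p(-p)                                # -log(1 - p_i)
    running_avg = np.cumsum(penalties) / np.arange(1, len(p) + 1)
    below = np.where(running_avg <= alpha)[0]
    return 0 if below.size == 0 else int(below[-1] + 1)

# toy usage: strong signals early in the ordering, nulls later
rng = np.random.default_rng(0)
ordered_p = np.concatenate([rng.uniform(0, 0.01, 20), rng.uniform(0, 1, 80)])
print(forward_stop(ordered_p, alpha=0.1))
```

Because the penalty accumulates along the ordering, a few large p-values early on can stop the procedure entirely, which is exactly the sensitivity to ordering quality that the analysis above contrasts with SeqStep-style rules.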
Due to its numerous applications, rank aggregation has become a problem of major interest across many fields of the computer science literature. In the vast majority of situations, Kemeny consensus(es) are considered the ideal solutions. It is, however, well known that their computation is NP-hard. Many contributions have thus established various results to apprehend this complexity. In this paper, we introduce a practical method to predict, for a ranking and a dataset, how close the Kemeny consensus(es) are to this ranking. A major strength of this method is its generality: it does not require any assumption on the dataset or the ranking. Furthermore, it relies on a new geometric interpretation of Kemeny aggregation that, we believe, could lead to many other results.

Horizontally Scalable Submodular Maximization Mario Lucic ETH Zurich . Olivier Bachem ETH Zurich . Morteza Zadimoghaddam Google Research . Andreas Krause Paper Abstract: A variety of large-scale machine learning problems can be cast as instances of constrained submodular maximization. Existing approaches for distributed submodular maximization have a critical drawback: the capacity (the number of instances that can fit in memory) must grow with the data set size. In practice, while one can provision many machines, the capacity of each machine is limited by physical constraints. We propose a truly scalable approach for distributed submodular maximization under fixed capacity. The proposed framework applies to a broad class of algorithms and constraints and provides theoretical guarantees on the approximation factor for any available capacity. We empirically evaluate the proposed algorithm on a variety of data sets and demonstrate that it achieves performance competitive with the centralized greedy solution.

Group Equivariant Convolutional Networks Taco Cohen University of Amsterdam . Max Welling University of Amsterdam CIFAR Paper Abstract: We introduce Group equivariant Convolutional Neural Networks (G-CNNs), a natural generalization of convolutional neural networks that reduces sample complexity by exploiting symmetries. G-CNNs use G-convolutions, a new type of layer that enjoys a substantially higher degree of weight sharing than regular convolution layers. G-convolutions increase the expressive capacity of the network without increasing the number of parameters. Group convolution layers are easy to use and can be implemented with negligible computational overhead for discrete groups generated by translations, reflections and rotations. G-CNNs achieve state-of-the-art results on CIFAR-10 and rotated MNIST.

The partition function is fundamental for probabilistic graphical models: it is required for inference, parameter estimation, and model selection. Evaluating this function corresponds to discrete integration, namely a weighted sum over an exponentially large set. This task quickly becomes intractable as the dimensionality of the problem increases. We propose an approximation scheme that, for any discrete graphical model whose parameter vector has bounded norm, estimates the partition function with arbitrarily small error. Our algorithm relies on a near-minimax-optimal polynomial approximation to the potential function and a Clenshaw-Curtis-style quadrature. Furthermore, we show that this algorithm can be randomized to split the computation into a high-complexity part and a low-complexity part, where the latter may be carried out on small computational devices. 
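The distributed scheme above is evaluated against the centralized greedy solution; for reference, here is a minimal sketch of that greedy baseline for monotone submodular maximization under a cardinality constraint (the coverage function and variable names are illustrative assumptions, not part of the paper):

```python
def greedy_submodular(ground_set, f, k):
    """Centralized greedy maximization of a monotone submodular set function f
    under the cardinality constraint |S| <= k."""
    S = []
    for _ in range(k):
        # pick the element with the largest marginal gain
        gains = [(f(S + [e]) - f(S), e) for e in ground_set if e not in S]
        gain, best = max(gains)
        if gain <= 0:
            break
        S.append(best)
    return S

# toy usage: a small coverage function
universe_sets = {0: {1, 2}, 1: {2, 3, 4}, 2: {4, 5}, 3: {1, 5, 6}}
coverage = lambda S: len(set().union(*[universe_sets[e] for e in S])) if S else 0
print(greedy_submodular(list(universe_sets), coverage, k=2))
```

The classic (1 - 1/e) approximation guarantee of this greedy rule is the benchmark that fixed-capacity distributed variants try to approach without ever holding the full ground set in one machine's memory.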
Experiments confirm that the new randomized algorithm is highly accurate if the parameter norm is small, and is otherwise comparable to methods with unbounded error.

Correcting Forecasts with Multifactor Neural Attention. Matthew Riemer IBM . Aditya Vempaty IBM . Flavio Calmon IBM . Fenno Heath IBM . Richard Hull IBM . Elham Khabiri IBM. Paper Abstract: Automatic forecasting of time series data is a challenging problem in many industries. Current forecast models adopted by businesses do not provide adequate means for including data representing external factors that may have a significant impact on the time series, such as weather, national events, local events, social media trends, promotions, etc. This paper introduces a novel neural network attention mechanism that naturally incorporates data from multiple external sources without the feature engineering needed to get other techniques to work. We demonstrate empirically that the proposed model achieves superior performance for predicting the demand of 20 commodities across 107 stores of one of America's largest retailers when compared to other baseline models, including neural networks, linear models, certain kernel methods, Bayesian regression, and decision trees. Our method ultimately accounts for a 23.9% relative improvement as a result of the incorporation of external data sources, and provides an unprecedented level of descriptive ability for a neural network forecasting model.

Observational studies are rising in importance due to the widespread accumulation of data in fields such as healthcare, education, employment and ecology. We consider the task of answering counterfactual questions such as, "Would this patient have lower blood sugar had she received a different medication?". We propose a new algorithmic framework for counterfactual inference which brings together ideas from domain adaptation and representation learning. In addition to a theoretical justification, we perform an empirical comparison with previous approaches to causal inference from observational data. Our deep learning algorithm significantly outperforms the previous state-of-the-art.

Gaussian Processes (GPs) provide a general and analytically tractable way of modeling complex time-varying, nonparametric functions. The Automatic Bayesian Covariance Discovery (ABCD) system constructs natural-language descriptions of time-series data by treating unknown time-series data nonparametrically using a GP with a composite covariance kernel function. Unfortunately, learning a composite covariance kernel with a single time-series data set often results in a less informative kernel that may not give qualitative, distinctive descriptions of data. We address this challenge by proposing two relational kernel learning methods which can model multiple time-series data sets by finding common, shared causes of changes. We show that the relational kernel learning methods find more accurate models for regression problems on several real-world data sets: US stock data, US house price index data and currency exchange rate data.

We introduce a new approach for amortizing inference in directed graphical models by learning heuristic approximations to stochastic inverses, designed specifically for use as proposal distributions in sequential Monte Carlo methods.
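As a rough illustration of the kind of composite covariance kernel that the ABCD abstract refers to, the sketch below (mine, not the ABCD code) combines a squared-exponential kernel with a periodic kernel and draws a few functions from the resulting GP prior. All hyperparameter values and the "trend plus seasonal" interpretation are arbitrary assumptions for illustration.

```python
import numpy as np

def se_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel: smooth, slowly varying trends."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def periodic_kernel(x1, x2, period=1.0, lengthscale=1.0, variance=1.0):
    """Periodic kernel: exactly repeating structure with the given period."""
    d = np.abs(x1[:, None] - x2[None, :])
    return variance * np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale ** 2)

# A composite kernel of the kind a kernel-structure search might propose:
# a smooth trend plus a seasonal component (time units are arbitrary here).
x = np.linspace(0.0, 5.0, 200)
K = se_kernel(x, x, lengthscale=2.0) + periodic_kernel(x, x, period=1.0, lengthscale=0.7)

# Draw a few functions from the corresponding GP prior.
rng = np.random.default_rng(1)
samples = rng.multivariate_normal(np.zeros_like(x), K + 1e-8 * np.eye(len(x)), size=3)
print(samples.shape)  # (3, 200)
```

Each additive or multiplicative component of such a kernel corresponds to a qualitative statement about the data ("a smooth trend", "a yearly cycle"), which is what allows systems like ABCD to turn a learned kernel into a natural-language description.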
We describe a procedure for constructing and learning a structured neural network which represents an inverse factorization of the graphical model, resulting in a conditional density estimator that takes as input particular values of the observed random variables, and returns an approximation to the distribution of the latent variables. This recognition model can be learned offline, independent of any particular dataset, prior to performing inference. The output of these networks can be used as automatically-learned high-quality proposal distributions to accelerate sequential Monte Carlo across a diverse range of problem settings.

Slice Sampling on Hamiltonian Trajectories. Benjamin Bloem-Reddy Columbia University . John Cunningham Columbia University. Paper Abstract: Hamiltonian Monte Carlo and slice sampling are amongst the most widely used and studied classes of Markov Chain Monte Carlo samplers. We connect these two methods and present Hamiltonian slice sampling, which allows slice sampling to be carried out along Hamiltonian trajectories, or transformations thereof. Hamiltonian slice sampling clarifies a class of model priors that induce closed-form slice samplers. More pragmatically, inheriting properties of slice samplers, it offers advantages over Hamiltonian Monte Carlo, in that it has fewer tunable hyperparameters and does not require gradient information. We demonstrate the utility of Hamiltonian slice sampling out of the box on problems ranging from Gaussian process regression to Pitman-Yor based mixture models.

Noisy Activation Functions. Caglar Gulcehre . Marcin Moczulski . Misha Denil . Yoshua Bengio U. of Montreal. Paper Abstract: Common nonlinear activation functions used in neural networks can cause training difficulties due to the saturation behavior of the activation function, which may hide dependencies that are not visible to vanilla-SGD (using first order gradients only). Gating mechanisms that use softly saturating activation functions to emulate the discrete switching of digital logic circuits are good examples of this. We propose to exploit the injection of appropriate noise so that the gradients may flow easily, even if the noiseless application of the activation function would yield zero gradients. Large noise will dominate the noise-free gradient and allow stochastic gradient descent to explore more. By adding noise only to the problematic parts of the activation function, we allow the optimization procedure to explore the boundary between the degenerate (saturating) and the well-behaved parts of the activation function. We also establish connections to simulated annealing, when the amount of noise is annealed down, making it easier to optimize hard objective functions. We find experimentally that replacing such saturating activation functions by noisy variants helps optimization in many contexts, yielding state-of-the-art or competitive results on different datasets and tasks, especially when training seems to be the most difficult, e.g. when curriculum learning is necessary to obtain good results.

PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification. Ian En-Hsu Yen University of Texas at Austin . Xiangru Huang UT Austin . Pradeep Ravikumar UT Austin . Kai Zhong ICES department, University of Texas at Austin . Inderjit Dhillon. Paper Abstract: We consider Multiclass and Multilabel classification with an extremely large number of classes, of which only a few are labeled for each instance.
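The noisy-activation idea can be sketched in a few lines. The version below is a deliberate simplification of my own, not the authors' exact formulation: during training it adds zero-mean noise only where a hard-tanh saturates, with the noise scale tied to how far the pre-activation sits inside the saturated region, so that SGD still receives a signal where the deterministic gradient would be zero.

```python
import numpy as np

def noisy_hard_tanh(x, train=True, noise_scale=1.0, rng=None):
    """Hard-tanh with noise injected only in the saturated regions.

    Simplified illustration of the idea: where the deterministic activation
    would have zero gradient (|x| > 1), return a noisy value whose spread
    grows with the distance into saturation.
    """
    rng = rng or np.random.default_rng()
    clipped = np.clip(x, -1.0, 1.0)
    if not train:
        return clipped                       # deterministic at test time
    overflow = x - clipped                   # zero inside the linear region
    noise = rng.normal(size=np.shape(x)) * noise_scale * np.abs(overflow)
    return clipped + noise

x = np.linspace(-4, 4, 9)
print(noisy_hard_tanh(x, train=True, rng=np.random.default_rng(0)))
print(noisy_hard_tanh(x, train=False))
```

The paper's actual construction is more refined (learned noise scale, directed half-normal noise, annealing), but this captures the core mechanism described in the abstract: noise is injected only in the problematic, saturating parts of the activation.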
In such a setting, standard methods whose training and prediction costs are linear in the number of classes become intractable. State-of-the-art methods thus aim to reduce the complexity by exploiting correlation between labels, under the assumption that the similarity between labels can be captured by structures such as a low-rank matrix or a balanced tree. However, as the diversity of labels increases in the feature space, the structural assumption can easily be violated, which degrades testing performance. In this work, we show that a margin-maximizing loss with l1 penalty, in the case of Extreme Classification, yields an extremely sparse solution in both the primal and the dual without sacrificing the expressive power of the predictor. We thus propose a Fully-Corrective Block-Coordinate Frank-Wolfe (FC-BCFW) algorithm that exploits both primal and dual sparsity to achieve a complexity sublinear in the number of primal and dual variables. A bi-stochastic search method is proposed to further improve the efficiency. In our experiments on both Multiclass and Multilabel problems, the proposed method achieves significantly higher accuracy than existing approaches to Extreme Classification, with very competitive training and prediction time.

Strategy Portfolio Construction

For many decades the principles of portfolio construction laid out by Harry Markowitz in the 1950s have been broadly accepted as one of the cornerstones of modern portfolio theory (as summarized, for example, in this Wikipedia article). The strengths and weaknesses of the mean-variance methodology are by now widely understood and broadly accepted. But alternatives exist, one of which is the strategy portfolio approach to portfolio construction.

The essence of the strategy portfolio approach lies in the understanding that it is much easier to create a diversified portfolio of equity strategies than a diversified portfolio of the underlying assets. In principle, it is quite feasible to create a basket of strategies for highly correlated stocks that are uncorrelated, or even negatively correlated, with one another. For example, one could combine a mean-reversion strategy on a stock like Merck & Co. (MRK) with a momentum strategy on a correlated stock such as Pfizer (PFE), so that the two will typically counteract one another rather than moving in lockstep as the underlying equities tend to do. In fact this approach is widely employed by hedge fund investors as well as proprietary trading firms and family offices, who seek to diversify their investments across a broad range of underlying strategies in many different asset classes, rather than by investing in the underlying assets directly.

What is to be gained from such an approach to portfolio construction, compared to the typical methodology? The answer is straightforward: lower risk, which results from the lower correlation between strategies, compared to the correlation between the assets themselves. That in turn produces a more diversified portfolio, reducing strategy beta, volatility and tail risk. We see all of these characteristics in the Systematic Strategies Quantitative Equity Strategy, described below, which has an average beta of only 0.2 and a maximum realized drawdown of -2.08%. Risk reduction is only part of the story, however. The Quantitative Equity Strategy has also yielded a combined alpha in excess of 4% per annum, outperforming the benchmark by a total of 16.35% (net of fees) since 2013.
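The claim that strategies on highly correlated stocks can themselves be uncorrelated, or negatively correlated, is easy to check numerically. The sketch below uses synthetic price paths (not actual MRK or PFE data) and deliberately crude toy rules for "mean reversion" and "momentum"; it merely illustrates the correlation arithmetic, not a tradable strategy.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2500                                      # roughly ten years of synthetic daily returns
common = rng.normal(0, 0.01, n)               # shared factor driving both stocks
r_a = common + rng.normal(0, 0.004, n)        # "MRK-like" returns (synthetic)
r_b = common + rng.normal(0, 0.004, n)        # "PFE-like" returns (synthetic)

def momentum_signal(r, lookback=20):
    """Long (+1) after a positive trailing return, short (-1) otherwise."""
    trailing = np.convolve(r, np.ones(lookback), mode="full")[:len(r)]
    return np.sign(np.where(trailing == 0, 1e-12, trailing))

def mean_reversion_signal(r, lookback=20):
    """The opposite bet: fade the trailing move."""
    return -momentum_signal(r, lookback)

# Apply yesterday's signal to today's return (shift by one day to avoid lookahead).
s_a = np.roll(mean_reversion_signal(r_a), 1); s_a[0] = 0
s_b = np.roll(momentum_signal(r_b), 1);       s_b[0] = 0
strat_a, strat_b = s_a * r_a, s_b * r_b

print("asset return correlation:    %.2f" % np.corrcoef(r_a, r_b)[0, 1])
print("strategy return correlation: %.2f" % np.corrcoef(strat_a, strat_b)[0, 1])
```

Even though the two synthetic stocks are strongly correlated through the common factor, the mean-reversion and momentum overlays take opposite positions most of the time, so the strategy return streams come out negatively correlated, which is the diversification effect the post is describing.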
In other words, the individual equity strategies produce alphas that are conserved when combined together to create the overall portfolio. But even if the strategies produced little or no alpha, the risk-reduction benefits of the approach would likely still apply.

What are the drawbacks to this approach to portfolio construction? One major challenge is that it is much harder to create strategy portfolios than asset portfolios. The analyst is obliged to create (at least) one individual strategy for each asset in the universe, rather than a single strategy for the portfolio as a whole. This constrains the rate at which the investment universe can grow (it takes time to research and develop good strategies), limiting the rate of growth of assets under management. So it is not an approach that I would necessarily recommend if your goal is to deploy multiple billions of dollars, but AUM up to around a billion dollars is certainly a feasible target.

From a risk perspective, the chief limitation is that we lack the ability to control the makeup of the resulting portfolio as closely as we can with traditional approaches to portfolio construction. In a typical equity long/short strategy the portfolio is constrained to have a specified maximum beta and overall net exposure. With the strategy portfolio approach we are unable to guarantee a specific limit on the net exposure, or the beta, of the portfolio. We may be able to quantify that, historically, the portfolio has had an average net long exposure of, say, 25% and an average beta of 0.2, but there is nothing to guarantee that a situation may not arise in future in which all of the strategies align to produce a 100% net long or net short exposure and a beta of ±1, or greater. This is extremely unlikely, of course, and may never happen even on geological timescales, as can be demonstrated by Monte Carlo simulation. What is likely, however, is that there will be periods in which the beta and net exposure of the portfolio may fluctuate widely.

Is this a problem? Yes and no. The point of constraining the portfolio beta and net exposure in a typical long/short strategy is to manage portfolio risk. Such constraints decrease the likelihood of substantial losses, but they cannot guarantee that outcome, as has been demonstrated during prior periods of severe market stress such as 2000/01 and 2008/09. Asset correlations tend to increase significantly when markets suffer major declines, often undermining the assumptions on which the portfolio and its associated risk limits were originally based. Similarly, the way in which we construct strategy portfolios takes a statistical approach to risk control, using stochastic process control. Just as with the traditional approach to portfolio construction, statistical analysis cannot guarantee that market conditions may not arise that give rise to substantial losses, however unlikely such circumstances may be.

Read more about stochastic process control:
More on genetic programming:

Is the risk of as-yet-unknown future market conditions greater for strategy portfolios than for traditional equity long/short portfolios? I would say not. In fact, during periods of market stress I would prefer to be in a portfolio of strategies, some of which at least are likely to perform well, rather than in a portfolio of equities which are now all likely to be highly correlated to the benchmark index.
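The remark about Monte Carlo simulation can be made concrete with a toy experiment: simulate correlated long/short signals for a basket of strategies and count how often they all line up on the same side. The basket size, the signal correlation and the Gaussian thresholding used here are assumptions for illustration only, not the firm's actual risk model.

```python
import numpy as np

rng = np.random.default_rng(7)
n_strategies, n_days, rho = 20, 100_000, 0.2   # assumed basket size and signal correlation

# Correlated latent signals, thresholded into +1 (long) / -1 (short) positions.
cov = np.full((n_strategies, n_strategies), rho) + (1 - rho) * np.eye(n_strategies)
latent = rng.multivariate_normal(np.zeros(n_strategies), cov, size=n_days)
positions = np.sign(latent)

net_exposure = positions.mean(axis=1)          # -1 (all short) ... +1 (all long)
all_aligned = np.mean(np.abs(net_exposure) == 1.0)
print("fraction of days with 100%% net long or short exposure: %.6f" % all_aligned)
print("95th percentile of |net exposure|: %.2f" % np.quantile(np.abs(net_exposure), 0.95))
```

With modest signal correlation and a reasonable number of strategies, the simulated frequency of a fully aligned book is vanishingly small, while the typical net exposure still wanders over a fairly wide range, which is exactly the "unlikely extremes, but noticeable fluctuation" picture described above.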
The benefit of being able to check the boxes for risk controls such as limits on portfolio beta and net exposure is, I would argue, somewhat illusory. They might be characterized as just a placebo for those who are, literally, unable (or, more likely, not paid) to think outside the box. But if this argument is not sufficiently persuasive, it is perfectly feasible to overlay risk controls on a portfolio comprising several underlying strategies. For example, one can:

Hedge out portfolio beta and net exposure above a specified level, using index ETFs, futures or options
Reduce the capital allocations during periods of market stress, or when portfolio beta or market exposure exceeds a threshold level
Turn off strategies that are under-performing in current market conditions

All of these measures will impact strategy performance. By and large, our preference is to let the strategy play out as it may, trusting in the robustness of the strategy portfolio to produce an acceptable outcome. We leave it to the investor to decide what additional risk controls he wishes to implement, either independently or within a separately managed account.

The Systematic Strategies Quantitative Equity Strategy

Systematic Strategies started out in 2009 as a proprietary trading firm engaged in high frequency trading. In 2012 the firm expanded into low-frequency systematic trading strategies with the launch of our VIX ETF strategy, which was replaced in 2015 by the Systematic Volatility Strategy. The firm began managing external capital on its managed account platform in 2015. In 2012 we created a research program into quantitative equity strategies using machine learning techniques, earlier versions of which were successfully employed at Proteom Capital, a hedge fund that pioneered the approach in the early 2000s. Proteom managed capital for Paul Tudor Jones's Tudor Capital and several other major institutional investors until 2007. Systematic Strategies began trading its Quantitative Equity Strategy, the result of its research program, in 2013. Having built a four-year live track record, the firm is opening the strategy to external investors in 2017. The investment program will be offered in managed account format and, for the first time, as a hedge fund product. For more details about the hedge fund offering, please see this post about the Systematic Strategies Fund.

Designed to achieve returns comparable to the benchmark S&P 500 Index, but with much lower risk, the Quantitative Equity Strategy has produced annual returns at a compound rate of 14.74% (net of fees) since 2013, with an annual volatility of 4.46% and a realized Sharpe Ratio of 3.31. By contrast, the benchmark S&P 500 Index yielded a compound annual rate of return of 11.93%, with annual volatility of 10.40% and a Sharpe Ratio of 1.15. In other words, the strategy has outperformed the benchmark S&P 500 Index by a significant margin, but with much reduced volatility and downside risk.

If you are an Accredited Investor and wish to receive a copy of the fund offering documents, please write to us at info@systematic-stratgies.
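For readers who want to reproduce this kind of summary statistic from a track record, the sketch below computes compound annual return, annualized volatility, Sharpe ratio and maximum drawdown from a monthly return series. The series used here is randomly generated purely as a placeholder; it is not the strategy's actual performance, and the zero risk-free rate and monthly frequency are assumptions.

```python
import numpy as np

def performance_summary(monthly_returns, periods_per_year=12, risk_free=0.0):
    """Compound annual return, annualized volatility, Sharpe ratio, max drawdown."""
    r = np.asarray(monthly_returns, dtype=float)
    growth = np.cumprod(1.0 + r)                         # cumulative equity curve
    years = len(r) / periods_per_year
    cagr = growth[-1] ** (1.0 / years) - 1.0
    vol = r.std(ddof=1) * np.sqrt(periods_per_year)
    sharpe = (r.mean() * periods_per_year - risk_free) / vol   # simple annualization
    drawdown = growth / np.maximum.accumulate(growth) - 1.0
    return {"CAGR": cagr, "AnnVol": vol, "Sharpe": sharpe, "MaxDD": drawdown.min()}

# Placeholder series only - NOT the strategy's track record.
rng = np.random.default_rng(3)
fake_monthly = rng.normal(0.012, 0.013, 48)     # four years of made-up monthly returns
print(performance_summary(fake_monthly))
```

Running the same function on an actual monthly return series is all that is needed to check figures of the kind quoted above (compound return, volatility, Sharpe ratio and drawdown) on a like-for-like basis against a benchmark index.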
